[ofa-general] IPOIB/CM increase retry counts

Wed Feb 13 23:14:44 PST 2008

Jack Morgenstein wrote:
> On Wednesday 13 February 2008 20:08, Sean Hefty wrote:
>> IMO, the fact that TCP implements reliability doesn't mean it's unnecessary in
>> underlying layers.  For example, wireless typically adds reliability at the link
>> layer because the link itself is so unreliable.  If adding in reliability in the
>> underlying layers improves overall performance, then it makes sense to add it,
>> independent of the upper level protocol.
>>
>> Since RC is our 'link layer', overrunning the receiver doesn't just result in IP
>> resending the packet, but transitioning the QP into an error state, cleaning up,
>> re-establishing the connection, and then resending the packet.  This works, just
>> not well based on what Pradeep has seen.
>>
> On the other hand, if the remote host is actually down, you will make "retry storms"
> worse by retrying both at the link layer AND at the TCP layer (each TCP retry resulting
> in multiple lower-layer retries).  This will have an effect on the fabric.

If the remote host is down, the question of establishing an RC connection
does not arise. The UD connection itself will fail.
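
To put rough numbers on the retry-multiplication concern quoted above, here
is a back-of-the-envelope sketch. It is only an illustration under loud
assumptions: the function name and the model are hypothetical, it assumes
every TCP attempt reaches the RC layer and uses up the whole RC retry budget,
and it ignores the QP error/reconnect cycle described earlier in the thread.
The values 15 and 7 are example inputs, not measurements.

```python
def worst_case_transmissions(tcp_retries: int, rc_retry_count: int) -> int:
    """Upper bound on wire transmissions for one segment to a silent peer.

    Hypothetical model: each TCP attempt (the original send plus every TCP
    retry) is handed to the RC layer, which sends once and then retries
    rc_retry_count times before giving up.
    """
    if tcp_retries < 0 or rc_retry_count < 0:
        raise ValueError("retry counts must be non-negative")
    tcp_attempts = 1 + tcp_retries             # original send + TCP retries
    rc_sends_per_attempt = 1 + rc_retry_count  # original send + RC retries
    return tcp_attempts * rc_sends_per_attempt


if __name__ == "__main__":
    # Example inputs only: 15 TCP retries, RC retry count of 0 versus 7.
    for rc in (0, 7):
        print(f"rc_retry_count={rc}: {worst_case_transmissions(15, rc)}")
```

Under this model the load grows as the product of the two retry budgets.
That only matters if an RC connection exists to retry on, which is the case
the reply above argues does not occur when the host is down.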

Pradeep