[ofa-general] IPOIB/CM increase retry counts

Sean Hefty sean.hefty at intel.com
Thu Feb 14 10:26:44 PST 2008


>On the other hand, if the remote host is actually down, you will make "retry
>storms" worse by retrying both at the link layer AND at the TCP layer (each TCP
>retry resulting in multiple lower-layer retries).  This will have an effect on
>the fabric.

I don't think I would call retrying a send a few more times a storm; it's a
point-to-point send.  When the remote host drops, the first thing IPoIB will do
is try to reconnect, which involves sending CM MADs to the unavailable node in
an effort to reestablish the connection anyway.  I don't think we try to
optimize for the case when systems crash.
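
To put rough numbers on the multiplication described in the quoted text, here
is a minimal sketch.  It assumes each upper-layer (e.g. TCP) attempt produces
one initial send plus a fixed number of lower-layer retries toward the dead
peer.  The function name and the counts are hypothetical, chosen only for
illustration; they are not taken from IPoIB or from any measurement.

```python
def total_wire_sends(upper_attempts: int, link_retries: int) -> int:
    """Worst-case number of sends toward an unreachable peer.

    Assumes every upper-layer attempt results in one initial send plus
    `link_retries` lower-layer retries, all of which fail.
    """
    if upper_attempts < 0 or link_retries < 0:
        raise ValueError("counts must be non-negative")
    return upper_attempts * (1 + link_retries)


# Hypothetical values: 5 upper-layer attempts, varying lower-layer retry count.
for link_retries in (0, 3, 7):
    print(link_retries, total_wire_sends(upper_attempts=5, link_retries=link_retries))
```

Under these assumptions the total grows linearly with the lower-layer retry
count and stays confined to the single path toward the failed node.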

In any case, I thought the problem was more related to RNR Nacks than simple
retries, but that doesn't seem to be the case.

- Sean



