[ofa-general] IPOIB/CM increase retry counts

Pradeep Satyanarayana pradeeps at linux.vnet.ibm.com
Tue Feb 12 14:29:24 PST 2008


Eli Cohen wrote:
> Sean Hefty wrote:
>>> That said, I don't think we want to have --any RNR retries--. As
>>> for retries, I am open to hearing what others think.
>>
>> I'm really not all that familiar with the ipoib protocol, but if it's
>> being implemented over an RC connection, then adding an RNR retry seems
>> to make sense to me.  I believe using UC is better, but if it's over RC,
>> I don't know that we want to take the hit of tearing down and
>> re-establishing the connection just because we have a fast sender.
>> (This is just an opinion based on no fact whatsoever.)
>>
> 
> I don't see how setting the RNR retry count can help if we have a fast
> sender. If the sender is faster than the receiver, the RNR retry counter
> will eventually expire and the connection will close.
> 
> As for the retry count, I don't know how common the errors that count
> against the retry counter are. If anyone has statistics on this, I'd be
> glad to know.
> 
> Pradeep, can you identify which part of the patch you sent actually
> solved the problem you were seeing, and also give some description of the
> problem?
> 
My recollection is that I brought this issue up on the mailing list sometime
in the summer of 2007, but I could not locate that thread with a quick search
of the archives. I will probably search again later.

However, the crux of the issue is that I was seeing "send completion errors",
and that is what prompted me to change the retry counts. Please see Table 78,
"Completion Error Handling for RC Send Queues", in the IB Spec for reference.
Changing the retry counts did help.
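
As a rough illustration of the two counters under discussion, here is a toy model (not IPoIB, kernel, or libibverbs code) of how they bound a single RC send. It assumes only what the IB spec and the libibverbs `ibv_modify_qp` documentation state about the 3-bit `retry_cnt` and `rnr_retry` QP attributes: each exhausts after the configured number of retries, except that an `rnr_retry` of 7 means "retry indefinitely". The names `send_outcome` and `WcStatus` are hypothetical; the status names only mirror libibverbs' `IBV_WC_RETRY_EXC_ERR` and `IBV_WC_RNR_RETRY_EXC_ERR`. It also simplifies by checking the two counters independently rather than replaying an interleaved event sequence.

```python
from enum import Enum

# Hypothetical toy model of the RC requester's two retry counters.
# Assumption (IB spec): rnr_retry == 7 means retry indefinitely;
# retry_cnt has no such special value.
RNR_RETRY_INFINITE = 7


class WcStatus(Enum):
    SUCCESS = "success"
    RETRY_EXC_ERR = "transport retry counter exceeded"  # cf. IBV_WC_RETRY_EXC_ERR
    RNR_RETRY_EXC_ERR = "RNR retry counter exceeded"    # cf. IBV_WC_RNR_RETRY_EXC_ERR


def send_outcome(retry_cnt, rnr_retry, timeouts=0, rnr_naks=0):
    """Return the modeled completion status of one RC send.

    timeouts: consecutive ACK timeouts seen for this send (remote silent).
    rnr_naks: consecutive RNR NAKs (receiver had no receive buffer posted).
    Simplification: the counters are checked independently, timeouts first.
    """
    for name, value in (("retry_cnt", retry_cnt), ("rnr_retry", rnr_retry)):
        if not 0 <= value <= 7:
            raise ValueError(f"{name} is a 3-bit field (0..7), got {value}")
    if timeouts < 0 or rnr_naks < 0:
        raise ValueError("event counts must be non-negative")

    # Each ACK timeout consumes one transport retry; 7 is NOT infinite here.
    if timeouts > retry_cnt:
        return WcStatus.RETRY_EXC_ERR
    # Each RNR NAK consumes one RNR retry, unless rnr_retry is 7 (infinite).
    if rnr_retry != RNR_RETRY_INFINITE and rnr_naks > rnr_retry:
        return WcStatus.RNR_RETRY_EXC_ERR
    return WcStatus.SUCCESS


if __name__ == "__main__":
    # No RNR retries: the first RNR NAK fails the send.
    print(send_outcome(retry_cnt=0, rnr_retry=0, rnr_naks=1))
    # rnr_retry=7 absorbs any number of RNR NAKs in this model.
    print(send_outcome(retry_cnt=0, rnr_retry=7, rnr_naks=1000))
    # Even the largest retry_cnt gives up after 8 consecutive timeouts.
    print(send_outcome(retry_cnt=7, rnr_retry=7, timeouts=8))
```

In this model, a receiver that never catches up corresponds to an unbounded `rnr_naks`, so every finite setting (0..6) eventually yields `RNR_RETRY_EXC_ERR`, which matches the "fast sender" concern above, while short bursts of RNR NAKs are absorbed without tearing down the connection.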

Pradeep
