[ewg] Re: [ofa-general] Re: [PATCH] IPOIB/CM Increase retry counts for OFED-1.3

Roland Dreier rdreier at cisco.com
Tue Feb 19 10:49:17 PST 2008


 > The "send completion errors" indicate the packet hasn't been sent out
 > to the wire. It seems the retries you have added introduced a little
 > delay before the packet was sent out successfully, which might indicate
 > some flow control or other issue in the device transport layer?

Actually, for RC a send completion error can occur if an ACK is not
received for the message.  It would be useful to know what the status
of the first failed send is, though.

 > Do you have any suggestions on how to debug this problem? How can we
 > hack the mthca/ipoib code to narrow down the root cause of the problem?
 > From the behavior it looks like the local resource is temporarily
 > unavailable, but it could be something else.

I definitely think we want to understand what the problem is.  For
example, does it go away if you increase the RNR retry count but not
the ACK timeout retry count?  When the problem occurs, is the receive
SRQ empty (or is it only happening with ehca's non-SRQ IPoIB/cm)?

 - R.
