[ofa-general] IPOIB/CM increase retry counts
Sean Hefty
sean.hefty at intel.com
Wed Feb 13 10:08:18 PST 2008
>As I see it, the issue here is that from the viewpoint of upper layers
>(TCP, UDP, etc) the IP service is expected to provide unreliable
>service. Hence layers that do need reliability, such as TCP, add it in
>their protocol, so adding it in the IP layer and below (eg IPoIB or the
>HW it uses) is in a way redundant since the upper layer is not aware of
>it.
IMO, the fact that TCP implements reliability doesn't mean it's unnecessary in
underlying layers. For example, wireless typically adds reliability at the link
layer because the link itself is so unreliable. If adding in reliability in the
underlying layers improves overall performance, then it makes sense to add it,
independent of the upper level protocol.
Since RC is our 'link layer', overrunning the receiver doesn't just result in IP
resending the packet. The QP transitions into an error state, and we have to
clean up and re-establish the connection before the packet can be resent. This
works, just not well, based on what Pradeep has seen.
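
To make the sequence above concrete, here is a rough toy model in Python. It is only a sketch, not the IPoIB/CM code: the class ToyRcSender, its send() method, and the event names are all made up for illustration. The one thing taken from the IB spec is that an RNR retry count of 7 means "retry indefinitely". The model assumes the receiver posts a buffer only after a given number of RNR NAKs.

```python
from dataclasses import dataclass, field

# Per the IB spec, an RNR retry count of 7 means "retry indefinitely".
INFINITE_RNR_RETRY = 7


@dataclass
class ToyRcSender:
    """Hypothetical model of an RC sender; not a real verbs or IPoIB API."""
    rnr_retry: int
    log: list = field(default_factory=list)

    def send(self, receiver_ready_after: int) -> list:
        """Send one message to a receiver that answers with an RNR NAK
        `receiver_ready_after` times before it has a buffer posted."""
        naks = 0
        while naks < receiver_ready_after:
            naks += 1
            out_of_retries = (self.rnr_retry != INFINITE_RNR_RETRY
                              and naks > self.rnr_retry)
            if out_of_retries:
                # Retry budget exhausted: the expensive path described above.
                self.log += ["qp_error", "cleanup", "reconnect", "resend"]
                return self.log
            # Cheap path: the transport retries on its own.
            self.log.append("rnr_retry")
        self.log.append("delivered")
        return self.log


if __name__ == "__main__":
    print(ToyRcSender(rnr_retry=0).send(receiver_ready_after=1))
    print(ToyRcSender(rnr_retry=INFINITE_RNR_RETRY).send(receiver_ready_after=3))
```

With a retry count of 0 a single overrun already takes the teardown path, while a larger count lets the transport absorb a short burst without touching the connection. The model says nothing about how long each step takes; that would have to come from measurements.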
>With all that, I am not religiously against adding the retries...
>however, I prefer to understand the original problem, which seems to be
>an issue related to HCA interoperability, before putting the solution in
>the code. We both agree that UC is the way to go, and in that case the
>real problem would pop up again, but higher layers would have to take care
>of it.
I definitely think UC is worth trying, but I would like to see how it performs
against RC. UC doesn't quite have the same issue as RC, since overrunning the
receiver doesn't require tearing down the connection.
- Sean