[openib-general] ucma into kernel.org

Sean Hefty sean.hefty at intel.com
Tue Jul 25 12:42:37 PDT 2006


>I thought about this some more. I think there's value in making it generic.
>Can we maybe emulate TCP by changing the TID, or is this better done in the
>ULP, in your opinion?

Thinking out loud here...

I don't think that it makes sense to change the IB CM to retry a REQ more times
than the spec allows.  Max CM retries is also used by other CM messages.  There's
also the problem that what the active side sends as a retry is really a new
request to the passive side, and both requests carry the same active QPN.

The problem that we were seeing running Intel MPI, and I'm guessing at least a
couple of the other MPI implementations are hitting as well, wasn't that the
number of retries was too small, but that the remote_cm_response_timeout was
too short.  Connections were taking minutes to form.  Setting max CM retries to
the largest value only helped to a point.
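
To put rough numbers on this, here is a small sketch (not the kernel's code) of
how the two REQ fields bound the time the active side keeps trying.  It assumes
the usual IB timeout encoding of 4.096 us * 2^n, a 5-bit timeout field and a
4-bit max CM retries field, and it ignores packet lifetime.  The function names
are made up for illustration.

```python
# Simplified model: packet lifetime and local processing time are ignored.
IB_CM_TIMEOUT_BASE_US = 4.096   # IB timeouts are encoded as 4.096 us * 2**n
MAX_TIMEOUT_EXPONENT = 31       # 5-bit field in the REQ
MAX_CM_RETRIES = 15             # 4-bit field in the REQ


def cm_timeout_seconds(exponent: int) -> float:
    """Decode an encoded CM response timeout into seconds."""
    if not 0 <= exponent <= MAX_TIMEOUT_EXPONENT:
        raise ValueError("timeout exponent must fit in 5 bits")
    return IB_CM_TIMEOUT_BASE_US * (2 ** exponent) / 1e6


def worst_case_req_wait(exponent: int, max_cm_retries: int) -> float:
    """Upper bound on how long the active side keeps sending a REQ:
    the original send plus max_cm_retries retransmissions."""
    if not 0 <= max_cm_retries <= MAX_CM_RETRIES:
        raise ValueError("max CM retries must fit in 4 bits")
    return (max_cm_retries + 1) * cm_timeout_seconds(exponent)
```

With an exponent of 20 each attempt waits about 4.3 seconds, so even 15 retries
give the passive side only a little over a minute.  Raising the exponent doubles
the budget at every step, which is why the timeout mattered more than the retry
count.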

My solution was to allow the user to override the IB CM REQ parameters used by
the RDMA CM.  This included local and remote CM response timeouts, plus max CM
retries.  It sounds like the only value that you want to make generic is max CM
retries.

Could the CMA retry a connection request after the IB CM has timed it out?  I
think so, but that gets back to the issue of the passive IB CM seeing different
connection requests for the same QP.  For the actual problem I was trying to
solve, the original REQ had been received, so a second REQ would have been
rejected due to a duplicate QPN.
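
A toy model of that passive-side behavior, as I understand it, might look like
the following.  This is only an illustration: the class, field names and return
strings are hypothetical and do not correspond to the kernel's data structures.
It assumes a retransmitted REQ keeps its communication ID, while a CMA-level
retry would arrive with a new communication ID but the same QPN.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Req:
    active_node: str   # stands in for the active side's address
    comm_id: int       # active side's communication ID
    qpn: int           # active side's QP number


@dataclass
class PassiveCm:
    by_comm_id: dict = field(default_factory=dict)
    by_qpn: dict = field(default_factory=dict)

    def handle_req(self, req: Req) -> str:
        id_key = (req.active_node, req.comm_id)
        qpn_key = (req.active_node, req.qpn)
        if id_key in self.by_comm_id:
            # Same communication ID: a retransmission of a REQ already seen.
            return "duplicate"
        if qpn_key in self.by_qpn:
            # New communication ID but the QPN is already in use: looks like
            # a second connection request for the same QP, so reject it.
            return "reject"
        self.by_comm_id[id_key] = req
        self.by_qpn[qpn_key] = req
        return "accept"
```

In this model a CM-level retry is absorbed as a duplicate, but a fresh request
issued by a higher layer for the same QP is rejected while the first one is
still outstanding.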

I think that a generic solution would have to scale down to the lowest value.

- Sean



