[openib-general] ucma into kernel.org

Michael S. Tsirkin mst at mellanox.co.il
Tue Jul 25 13:15:07 PDT 2006


Quoting r. Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: [openib-general] ucma into kernel.org
> 
> >I thought about this some more. I think there's value in making it generic.
> >Can we maybe emulate TCP by changing the TID, or is this better done in the
> >ULP, in your opinion?
> 
> Thinking out loud here...
> 
> I don't think that it makes sense to change the IB CM to support retrying a REQ
> more than that specified by the spec.  Max CM retries is also used by other CM
> messages, plus there's the problem that what the active side is sending as a
> retry is really a new request to the passive side, and both requests carry the
> same active QPN.
> 
> The problem that we were seeing running Intel MPI, and I'm guessing at least a
> couple of the other MPI implementations are hitting as well, wasn't that the
> number of retries was too small, but that the remote_cm_response_timeout was.
> Connections were taking minutes to form.  Setting max CM retries to the largest
> value only helped to a point.
> 
> My solution was to allow the user to override the IB CM REQ parameters used by
> the RDMA CM.  This included local and remote CM response timeouts, plus max CM
> retries.  It sounds like the only value that you want to make generic is max CM
> retries.
> 
> Could the CMA retry a connection request after it times out by the IB CM?  I
> think so, but that gets back to the issue of the passive IB CM seeing different
> connection requests for the same QP.  For the actual problem I was trying to
> solve, the original REQ had been received, so a second REQ would have been
> rejected due to a duplicate QPN.
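
To put rough numbers on the timeout vs. retries point above, here is a minimal
sketch, not taken from the CM code. It assumes the usual IB encoding where a
timeout field holds an exponent n meaning 4.096 us * 2^n, and that the active
side waits one full timeout for the initial REQ and for each retry. The helper
names are hypothetical, and packet lifetime and any other terms a real CM adds
to the wait are ignored.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert an IB-style timeout exponent to microseconds.
 * Assumes the encoding 4.096 us * 2^n; the exponent is clamped to 31 on the
 * assumption that the field is 5 bits wide.
 */
static double cm_timeout_us(unsigned int n)
{
    if (n > 31)
        n = 31;
    return 4.096 * (double)(1ULL << n);
}

/*
 * Hypothetical helper: rough upper bound on how long the active side waits
 * before giving up on a REQ, counting the initial send plus max_retries
 * resends, each waiting one full response timeout.
 */
static double req_worst_case_us(unsigned int timeout_exp,
                                unsigned int max_retries)
{
    return cm_timeout_us(timeout_exp) * (double)(max_retries + 1);
}
```

Under these assumptions an exponent of 20 is about 4.3 seconds per attempt, and
each step of the exponent doubles the total wait, while adding retries only
grows it linearly up to the retry limit. That would fit with raising max CM
retries alone helping only to a point.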

What do you mean by duplicate QPN? You can't track remote QPNs, can you?

> I think that a generic solution would have to scale down to the lowest value.
> 
> - Sean
> 

-- 
MST



