[openib-general] ucma into kernel.org
Michael S. Tsirkin
mst at mellanox.co.il
Tue Jul 25 13:15:07 PDT 2006
Quoting r. Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: [openib-general] ucma into kernel.org
>
> >I thought about this some more. I think there's value in making it generic.
> >Can we maybe emulate TCP by changing the TID, or is this better done in the
> >ULP, in your opinion?
>
> Thinking out loud here...
>
> I don't think that it makes sense to change the IB CM to support retrying a REQ
> more than that specified by the spec. Max CM retries is also used by other CM
> messages, plus there's the problem that what the active side is sending as a
> retry is really a new request to the passive side, and both requests carry the
> same active QPN.
>
> The problem that we were seeing running Intel MPI, and I'm guessing at least a
> couple of the other MPI implementations are hitting as well, wasn't that the
> number of retries was too small, but that the remote_cm_response_timeout was.
> Connections were taking minutes to form. Setting max CM retries to the largest
> value only helped to a point.
>
> My solution was to allow the user to override the IB CM REQ parameters used by
> the RDMA CM. This included local and remote CM response timeouts, plus max CM
> retries. It sounds like the only value that you want to make generic is max CM
> retries.
>
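
A rough sketch of the arithmetic behind the timeout-versus-retries point
above, assuming the IB spec encoding in which the CM response timeout fields
are 5-bit exponents meaning 4.096 us * 2^value, and max CM retries is a 4-bit
count. The function names are hypothetical (not ib_cm or rdma_cm API), and the
bound ignores packet lifetime and any MRA service timeout:

```c
#include <stdint.h>

/* Decode a 5-bit IB CM timeout exponent into microseconds:
 * timeout = 4.096 us * 2^encoded. Hypothetical helper, not kernel API. */
static double cm_timeout_usec(uint8_t encoded)
{
    return 4.096 * (double)(1ULL << (encoded & 0x1f));
}

/* Rough upper bound, in seconds, on how long the active side keeps
 * trying: one initial REQ plus max_cm_retries resends, each waiting one
 * full response timeout. Assumes packet lifetime is negligible. */
static double cm_worst_case_sec(uint8_t encoded_timeout, uint8_t max_cm_retries)
{
    unsigned int sends = (unsigned int)(max_cm_retries & 0x0f) + 1;

    return sends * cm_timeout_usec(encoded_timeout) / 1e6;
}
```

Under these assumptions an encoded timeout of 20 is about 4.3 seconds, so even
15 retries gives only about 69 seconds in total. Each increment of the
exponent doubles that, while retries are capped at 15, which fits the
observation that raising max CM retries only helps to a point.
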
> Could the CMA retry a connection request after it times out by the IB CM? I
> think so, but that gets back to the issue of the passive IB CM seeing different
> connection requests for the same QP. For the actual problem I was trying to
> solve, the original REQ had been received, so a second REQ would have been
> rejected due to a duplicate QPN.
How do you mean duplicate QPN? You can't track remote QPNs, can you?
> I think that a generic solution would have to scale down to the lowest value.
>
> - Sean
>
--
MST