[ofa-general] Re: some questions on stale connection handling at the IB CM
Or Gerlitz
ogerlitz at voltaire.com
Thu Dec 20 05:41:52 PST 2007
Sean Hefty wrote:
>> So in the case of lost DREQ etc, in cm_match_req() we will pass the
>> checking for duplicate REQs but fall in the check for stale connections
>> and it can happen in endless loop? this seems like a bug to me.
> This problem isn't limited to stale connections. If a client tries to
> connect, gets a reject for whatever reason, ignores the reject, then
> tries to reconnect with the same parameters, then they've put themselves
> into an endless loop.
I don't follow: if they don't ignore the reject but reuse the same QP
for their successive connection requests, each new REQ will pass the ID
check (duplicate REQs) but will fail the remote QPN check, correct?
So what can a client do to avoid falling into that? What does it mean to
not ignore the reject? Note that even if, on getting a reject, they
release the QP and allocate a new one, they can get the same QP number.
>> Yes, this seems to be able to solve the keep-alive thing in a generic
>> fashion for all ULPs using the IB CM, will you be able to look on this
>> during the next weeks or so?
> This method can be used by apps today. The only enhancement that I can
> see being made is having the CM automatically send the messages at
> regular intervals. But I hesitate to add this to the CM since it
> doesn't have knowledge of traffic occurring over the QP, and may
> interfere with an app wanting to actually change alternate path information.
You mean one side sends a LAP message with the current path and the
peer replies with an APR message confirming this is fine? I guess this
LAP sending has to be carried out by both sides, correct? And it's not
supported for RDMA-CM users...
As for your comments: assuming an app must notify the CM that it no
longer uses a QP (and if it does not, we declare it an RTFM bug), then as
long as the QP is alive from the CM's viewpoint, it's perfectly fine to
send these LAPs. Doing this once every few seconds or tens of seconds
will not create heavy load, I think. As for the point of interfering with
apps that want to use LAP/APR for an APM implementation over their
protocols, we can let the CM consumer specify whether they want the CM to
issue keep-alives for them, and how frequently to send the messages.
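A rough sketch of what such an opt-in could look like, written as a toy Python model rather than CM code. Everything here is an assumption for illustration: KeepAliveConfig, ConnState and should_send_keepalive are hypothetical names, not an existing IB CM interface, and the rule of skipping a keep-alive while another LAP is outstanding is just one possible way to avoid colliding with an app's own APM use.

```python
from dataclasses import dataclass


@dataclass
class KeepAliveConfig:
    """Hypothetical per-connection settings chosen by the CM consumer."""
    enabled: bool = False      # opt-in: off unless the consumer asks for it
    interval_s: float = 10.0   # seconds between keep-alive LAPs


@dataclass
class ConnState:
    """Hypothetical view the CM has of one connection."""
    qp_in_use: bool = True         # cleared when the app tells the CM it is done
    lap_outstanding: bool = False  # a LAP is already awaiting its APR
    last_sent_s: float = 0.0       # time the last keep-alive LAP was sent


def should_send_keepalive(cfg: KeepAliveConfig, conn: ConnState, now_s: float) -> bool:
    # Never send unless the consumer opted in and the QP is still in use.
    if not cfg.enabled or not conn.qp_in_use:
        return False
    # Do not collide with a LAP exchange that is already in flight.
    if conn.lap_outstanding:
        return False
    # Send once the configured interval has elapsed.
    return now_s - conn.last_sent_s >= cfg.interval_s
```

With the default configuration the function always returns False, so consumers that run their own LAP/APR exchanges would see no change in behavior.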
Or.