[ofa-general] Re: some questions on stale connection handling at the IB CM

Or Gerlitz ogerlitz at voltaire.com
Thu Dec 20 05:41:52 PST 2007


Sean Hefty wrote:
>> So in the case of a lost DREQ etc., in cm_match_req() we will pass the
>> check for duplicate REQs but fail the check for stale connections,
>> and this can repeat in an endless loop? This seems like a bug to me.

> This problem isn't limited to stale connections.  If a client tries to
> connect, gets a reject for whatever reason, ignores the reject, then
> tries to reconnect with the same parameters, then they've put themselves
> into an endless loop.

I don't follow: if they don't ignore the reject, but reuse the same QP 
for their successive connection requests, each new REQ will pass the ID 
check (duplicate REQs) but will fail the remote QPN check, correct? 
So what can a client do to avoid that? What does it mean to not 
ignore the reject? Note that even if, on getting a reject, they release 
the QP and allocate a new one, they can get the same QP number.
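
To make the scenario concrete, here is a toy model of the two checks as I 
understand them. It is only a sketch and not the actual cm_match_req() code: 
the struct, the enum and the match_req() helper are hypothetical names, and 
I am assuming both lookups are keyed together with the remote CA GUID.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of matching an incoming REQ (not kernel names). */
enum match_result { MATCH_NEW, MATCH_DUPLICATE, MATCH_STALE };

/* Hypothetical record of a connection the passive side still tracks. */
struct conn_entry {
    uint64_t remote_ca_guid;
    uint32_t remote_comm_id;
    uint32_t remote_qpn;
};

/*
 * First check: same remote comm ID means a retransmitted (duplicate) REQ.
 * Second check: new comm ID but a remote QPN that is still tracked means
 * the old connection is stale, so the new REQ is rejected.
 */
static enum match_result match_req(const struct conn_entry *table, size_t n,
                                   const struct conn_entry *req)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (table[i].remote_ca_guid == req->remote_ca_guid &&
            table[i].remote_comm_id == req->remote_comm_id)
            return MATCH_DUPLICATE;

    for (i = 0; i < n; i++)
        if (table[i].remote_ca_guid == req->remote_ca_guid &&
            table[i].remote_qpn == req->remote_qpn)
            return MATCH_STALE;

    return MATCH_NEW;
}
```

In this model, a client that retries with a fresh comm ID but the same QPN 
gets MATCH_STALE every time for as long as the old entry stays in the table, 
which is the loop I am asking about.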

>> Yes, this seems to be able to solve the keep-alive thing in a generic
>> fashion for all ULPs using the IB CM, will you be able to look on this
>> during the next weeks or so?

> This method can be used by apps today.  The only enhancement that I can
> see being made is having the CM automatically send the messages at
> regular intervals.  But I hesitate to add this to the CM since it
> doesn't have knowledge of traffic occurring over the QP, and may
> interfere with an app that actually wants to change alternate path information.

You mean one side sends a LAP message with the current path and the 
peer replies with an APR message confirming this is fine? I guess this LAP 
sending has to be carried out by both sides, correct? And it's not supported 
for RDMA-CM users...

As for your comments, assuming an app must notify the CM that it no 
longer uses a QP (and if it does not, we declare it an RTFM bug), as long as the 
QP is alive from the CM's viewpoint, it's perfectly fine to send these 
LAPs. Doing this once every few seconds or tens of seconds will not 
create heavy load, I think. As for the point of interfering with apps 
that want to use LAP/APR for an APM implementation over their protocols, we 
can let the CM consumer specify whether they want the CM to issue keep-alives 
for them, and how frequently to send the messages.
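
Roughly, the opt-in could look like the sketch below. None of this exists in 
the CM today: struct keepalive_cfg, struct conn_state and keepalive_due() are 
hypothetical names I made up to illustrate the idea, and the time source is 
assumed to be a plain seconds counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection policy supplied by the CM consumer. */
struct keepalive_cfg {
    bool enabled;          /* consumer asked the CM to send keep-alives */
    uint32_t interval_sec; /* seconds between keep-alive LAPs */
};

/* Hypothetical state the CM would track for the connection. */
struct conn_state {
    bool qp_in_use;         /* cleared once the app tells the CM the QP is gone */
    uint64_t last_sent_sec; /* when the last keep-alive LAP was sent */
};

/* Decide whether a keep-alive LAP should be sent at time now_sec. */
static bool keepalive_due(const struct keepalive_cfg *cfg,
                          const struct conn_state *conn, uint64_t now_sec)
{
    /* Never send unless the consumer opted in and the QP is still alive. */
    if (!cfg->enabled || cfg->interval_sec == 0 || !conn->qp_in_use)
        return false;
    /* Guard against a clock value older than the last send. */
    if (now_sec < conn->last_sent_sec)
        return false;
    return now_sec - conn->last_sent_sec >= cfg->interval_sec;
}
```

Consumers that run their own LAP/APR exchange for APM would simply leave 
enabled cleared, so the CM never injects messages behind their back.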

Or.




