[ofa-general] Re: RDMA/iwarp CM question

Kanoj Sarcar kanojsarcar at yahoo.com
Wed Sep 12 14:07:19 PDT 2007


--- Steve Wise <swise at opengridcomputing.com> wrote:

> 
> >>>            
> >> It looks to me like ucma_clean_events() calls
> >> rdma_destroy_id() / 
> >> iw_destroy_cm_id() / destroy_cm_id() which calls
> the
> >> provider reject 
> >> function.  Or NOT! :)  There's a comment in the
> >> IW_CM_STATE_CONN_RECV 
> >> case inside destroy_cm_id():
> >>
> >>>                 /*
> >>>                  * App called destroy
> >> before/without calling accept after
> >>>                  * receiving connection request
> >> event notification or
> >>>                  * returned non zero from the
> >> event callback function.
> >>>                  * In either case, must tell the
> >> provider to reject.
> >>>                  */
> >> But I don't see the call to reject the
> connection...
> >>
> >> Maybe you could add it and see if it clears up
> your
> >> issue?
> > 
> > I haven't hit a problem yet, I am looking at what
> my
> > driver should/should not do ...
> > 
> >>
> >>> Doesn't this sound like a problem (namely
> >>> provider/card resource leak due to races with
> >> listener
> >>> destruct)?
> >>>         
> >> It does.
> >>
> >> But MPA mandates a timeout so the connections
> will
> >> get aborted 
> >> eventually by the provider or peer...
> >>
> > 
> > I believe the timeout you are talking about applies to
> > limiting how long it takes (on responder side) from an
> > incoming SYN to receipt of complete MPA request. I
> > don't believe there is much logic in having a timeout
> > between the incoming-connect upcall sent by the driver
> > and an eventual accept/reject done by the app, but
> > that's a separate discussion.
> >
> 
> My point is the peer will abort the TCP connection
> if the passive side 
> never accepts or rejects.

Agreed, but even though the connection is aborted, the
handle for it cannot be deallocated unless we can
guarantee the local system will not make accept/reject
calls.
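
A minimal sketch of that constraint, assuming a hypothetical per-child handle (the names child_handle, child_abort and child_verdict are made up for illustration and are not iw_cm or provider APIs): an abort only marks the handle dead, and the memory is released by the one accept/reject downcall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical states for a child connection handle (not the iw_cm states). */
enum child_state {
    CHILD_CONN_RECV, /* connect request delivered upward, no verdict yet */
    CHILD_ABORTED    /* TCP connection is gone, verdict still outstanding */
};

struct child_handle {
    enum child_state state;
    int *live_count; /* handles still allocated, tracked for illustration */
};

/* Allocate a handle for an incoming connect request. */
struct child_handle *child_alloc(int *live_count)
{
    struct child_handle *h = malloc(sizeof(*h));

    if (!h)
        return NULL;
    h->state = CHILD_CONN_RECV;
    h->live_count = live_count;
    (*live_count)++;
    return h;
}

/* Peer abort or timeout: drop the connection, but keep the handle. */
void child_abort(struct child_handle *h)
{
    assert(h != NULL);
    h->state = CHILD_ABORTED;
}

/*
 * The single accept/reject downcall. Returns 0 if the connection was
 * still alive, -1 if it had already been aborted. Either way this is
 * the only place where the handle is freed.
 */
int child_verdict(struct child_handle *h, bool accept)
{
    int rc;

    assert(h != NULL);
    (void)accept; /* a real driver would act on the verdict here */
    rc = (h->state == CHILD_ABORTED) ? -1 : 0;
    (*h->live_count)--;
    free(h);
    return rc;
}
```

With this shape, a late accept on an aborted child just gets an error back instead of touching freed state, at the cost of holding the handle until the app acts.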

> 
> 
> > The core problem is this though. On a listener
> > destruct, the driver can either do:
> > 
> > a. destroy all children on which an accept/reject
> has
> > not yet been invoked, and OFA stack then must stop
> app
> > from sending an accept/reject down in such case.
> There
> > is currently an attempt to do this at the ucma
> layer
> > (eg cleanup unpolled events), but it is not race
> free.
> > 
> 
> This code is only cleaning up cm_id's that have _not_ been
> reaped by the application via get_rdma_cm_event().  Any
> connection requests that have been reaped will stay around
> until the application disposes of them via rdma_accept(),
> rdma_reject(), rdma_destroy_id(), or when the process exits.

Exactly, that's why I said the attempt is not
race-free; it is quite possible that the app has
already polled some of the events and will eventually
accept or reject them.

> 
> 
> > b. OFA guarantees that an eventual accept/reject
> > downcall will be made, and driver can rely on that to
> > prevent resource leakage.
> >
> 
> Yes I think the rdma core must guarantee an eventual
> accept/reject downcall.
> 
> 
> > Any other solution will have some problem
> somewhere.
> > EG, in your timeout suggestion, if the driver goes
> > ahead and cleans up the state on on-card resource
> for
> > the child, due to the race mentioned in a) above,
> the
> > app might succeed in making an eventual
> accept/reject,
> > leading to a kernel crash.
> > 
> > 
> >> But I think you've found a bug...
> >>
> >> Steve.
> >>
> > 
> > Are folks filing bugs in bugzilla or similar? 
> > 
> 
> You can if you want.  There is a bugzilla db on the
> ofa site...
> 
> Or provide the fix and test it. That would be
> ideal...
> 
> 
> Steve.
> 

I will likely file a bug just to ensure this is not
lost, since it's unlikely I will be able to work on a
solution "soon".

Kanoj


       