[openib-general] Re: [PATCH 3/3] iWARP CM

Thu Mar 16 10:16:17 PST 2006

openib-general-bounces at openib.org wrote:
> Tom Tucker wrote:
>> The iWARP CM prevents this from happening by having a state
>> (DESTROYING) that prevents events from being delivered after
>> iw_destroy_cm_id has returned. This approach avoids having either the
>> kernel application thread or the event thread blocked while waiting
>> for the last reference to be released.
> 
> You need to consider destroy being called by a separate
> thread from the one processing events.  An event can be
> generated, queued, and just about to callback the user when
> the user calls destroy.  Place the event thread at the top of
> the user's callback routine.  There's no way to halt the
> execution of the callback at this point.  Now let the thread
> calling destroy execute and return to the user.  The callback
> code is still running, but the user is not even aware at this point.
> 
>> Unlike the IB side, the iWARP side has orderly shutdown semantics
>> that can delay the delivery of the CLOSE event for minutes. With this
>> implementation, life goes on and the object simply stays around until
>> the last reference is removed.
> 
> Even in IB, there's a CM object that hangs around after the
> user has called destroy, and it has returned.  This is fine;
> the user is unaware of this object.
> 
>> Please look at the handling of events in cm_event_handler. If the
>> state is DESTROYING, events are not queued for delivery. This handles
>> events that are generated by the provider after iw_destroy_cm_id has
>> returned. 
> 
> The problem is when the user calls destroy at the same time
> that an event is being generated.  If the event gets there
> first, a callback will run.  Destroy does not wait for that
> callback to complete before returning.  Hopefully, I've
> explained the situation a little better.
> 

I agree that the protocol difference does nothing to address
the problem of an unreaped event that was generated by or for a
now-deceased object. The same problem exists for QPs.

There are at least two approaches to this: check for deleted
objects when reaping the event (and discard any event associated
with a deleted object), or defer the final deletion until all
outstanding events *have* been reaped.
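
To make the two approaches concrete, here is a minimal single-threaded
sketch in C that combines them. All names (cm_obj, event_queue,
event_reap, obj_destroy) are hypothetical and are not taken from the
patch. A real implementation would need a lock or atomics around the
state and the reference count, and the sketch does not by itself cover
a callback that is already executing when destroy is called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object states; only the two needed for the sketch. */
enum obj_state { OBJ_ACTIVE, OBJ_DESTROYING };

struct cm_obj {
    enum obj_state state;
    int refcount;      /* one for the user, one per queued event */
    int delivered;     /* number of callbacks actually delivered */
    bool *freed_flag;  /* set when the memory is released (for observation) */
};

struct event {
    struct cm_obj *obj;
};

static struct cm_obj *obj_create(bool *freed_flag)
{
    struct cm_obj *o = malloc(sizeof(*o));
    if (!o)
        return NULL;
    o->state = OBJ_ACTIVE;
    o->refcount = 1;            /* the user's reference */
    o->delivered = 0;
    o->freed_flag = freed_flag;
    *freed_flag = false;
    return o;
}

/* Drop one reference; the final deletion happens only at zero. */
static void obj_put(struct cm_obj *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0) {
        *o->freed_flag = true;
        free(o);
    }
}

/* Provider side: refuse new events once the object is being destroyed,
 * otherwise pin the object for as long as the event is queued. */
static bool event_queue(struct event *ev, struct cm_obj *o)
{
    if (o->state == OBJ_DESTROYING)
        return false;
    o->refcount++;
    ev->obj = o;
    return true;
}

/* User side: mark the object and drop the user's reference.
 * Returns immediately; the memory may outlive this call. */
static void obj_destroy(struct cm_obj *o)
{
    o->state = OBJ_DESTROYING;
    obj_put(o);
}

/* Reaper: discard events whose object was destroyed in the meantime,
 * then drop the event's reference. Returns true if delivered. */
static bool event_reap(struct event *ev)
{
    struct cm_obj *o = ev->obj;
    bool deliver = (o->state != OBJ_DESTROYING);

    if (deliver)
        o->delivered++;         /* stands in for the user callback */
    ev->obj = NULL;
    obj_put(o);                 /* may free the object */
    return deliver;
}
```

With these helpers, queuing an event, calling obj_destroy(), and then
calling event_reap() leaves the object allocated until the reap drops
the last reference, and the stale event is discarded rather than
delivered.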