[openib-general] Re: [PATCH 3/3] iWARP CM - iWARP ConnectionManager

Sean Hefty sean.hefty at intel.com
Wed Mar 22 10:28:50 PST 2006


>Once the cm_id is connected, the provider must post a CLOSE event
>when it is done with the cm_id.  That's the model.  The IWCM will not
>free the cm_id until the CLOSE upcall happens.   Adding an explicit
>alloc_context/dealloc_context in the provider will just push this logic
>down into each provider.  IE:  The chelsio provider would block the
>dealloc_context call until the LLP connection is fully shut down.

I just think that this approach is susceptible to subtle race conditions that will
be extremely difficult to debug.  And so far all of the patches submitted have
had some sort of race.  I do not know if there's a race in the latest
submission.  I'm just saying that the destruction is complex -- involving a
cm_id state, bit-flag, event state, and reference count -- which makes it
difficult to verify its correctness.

For example, as soon as the user calls connect(), can they receive a CLOSE
event, even before the connect() call returns?  If so, are there any issues
here?  Is it possible for the user to call down to the provider, after the
provider has generated a CLOSE event, resulting in accessing the wrong
connection, or crashing in the provider?

Note that I'm not saying that providers need to block a call until everything
is shut down.  A provider only needs to ensure that no callbacks will occur after
dealloc_context() returns.  Destroy_listen() should be providing similar logic,
so it ends up being in each provider anyway.
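
To make the "no callbacks after dealloc_context() returns" requirement concrete,
here is a rough user-space sketch of one way it could be met.  This is not code
from any provider or from the IWCM patch: every name in it (sketch_context,
sketch_deliver_event, and so on) is made up for illustration, and it uses
pthreads where kernel code would use its own locking and wait primitives.  The
idea is just a "dead" flag plus a count of callbacks in flight, both under one
lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical event code and handler type, for illustration only. */
enum sketch_event { SKETCH_EVENT_CLOSE };
typedef void (*sketch_handler)(void *user, enum sketch_event ev);

struct sketch_context {
    pthread_mutex_t lock;
    pthread_cond_t idle;     /* signalled when in_flight drops to 0 */
    int in_flight;           /* callbacks currently running */
    bool dead;               /* set once dealloc has started */
    sketch_handler handler;
    void *user;
};

void sketch_alloc_context(struct sketch_context *ctx,
                          sketch_handler handler, void *user)
{
    pthread_mutex_init(&ctx->lock, NULL);
    pthread_cond_init(&ctx->idle, NULL);
    ctx->in_flight = 0;
    ctx->dead = false;
    ctx->handler = handler;
    ctx->user = user;
}

/* Provider side: returns 0 if the event was delivered, or -1 if the
 * context has already been torn down and the event was dropped. */
int sketch_deliver_event(struct sketch_context *ctx, enum sketch_event ev)
{
    pthread_mutex_lock(&ctx->lock);
    if (ctx->dead) {
        pthread_mutex_unlock(&ctx->lock);
        return -1;
    }
    ctx->in_flight++;
    pthread_mutex_unlock(&ctx->lock);

    /* Run the user's callback without holding the lock. */
    ctx->handler(ctx->user, ev);

    pthread_mutex_lock(&ctx->lock);
    if (--ctx->in_flight == 0)
        pthread_cond_broadcast(&ctx->idle);
    pthread_mutex_unlock(&ctx->lock);
    return 0;
}

/* User side: once this returns, no callback is running and none will
 * start.  It waits only for callbacks in flight, not for the connection
 * itself to finish shutting down.  It must not be called from inside
 * the handler, or it would wait on itself. */
void sketch_dealloc_context(struct sketch_context *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    ctx->dead = true;
    while (ctx->in_flight > 0)
        pthread_cond_wait(&ctx->idle, &ctx->lock);
    assert(ctx->in_flight == 0);
    pthread_mutex_unlock(&ctx->lock);
}

/* Example handler: counts the events it sees. */
void sketch_count_handler(void *user, enum sketch_event ev)
{
    (void)ev;
    ++*(int *)user;
}
```

In this sketch the context memory stays valid after sketch_dealloc_context()
so that a late sketch_deliver_event() can still see the dead flag.  A real
provider would also have to make sure it drops its own pointer to the context
before the memory is freed, which is a separate problem from the one shown here.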

At this point, I'm still trying to understand the operation.  When does the
provider allocate a context for the user?  My guess is when calling connect() or
listen().  When does the provider deallocate this context?  If it's not always
done in response to the user invoking a function, then we're almost certain to
have a race.

- Sean


