[openib-general] Re: [PATCH 3/3] iWARP CM - iWARP ConnectionManager

Tom Tucker tom at opengridcomputing.com
Wed Mar 22 12:21:08 PST 2006


On Wed, 2006-03-22 at 13:03 -0600, Steve Wise wrote:
> On Wed, 2006-03-22 at 10:28 -0800, Sean Hefty wrote:
> > >Once the cm_id is connected, the provider must post a CLOSE event
> > >when it is done with the cm_id.  That's the model.  The IWCM will not
> > >free the cm_id until the CLOSE upcall happens.   Adding an explicit
> > >alloc_context/dealloc_context in the provider will just push this logic
> > >down into each provider.  IE:  The chelsio provider would block the
> > >dealloc_context call until the LLP connection is fully shut down.
> > 
> > I just think that this approach is susceptible to subtle race conditions that will
> > be extremely difficult to debug.  And so far all of the patches submitted have
> > had some sort of race.  I do not know if there's a race in the latest
> > submission.  I'm just saying that the destruction is complex -- involving a
> > cm_id state, bit-flag, event state, and reference count -- which makes it
> > difficult to verify its correctness.

The previous patch used the filtering approach, which was even more
complex than the current block-and-wait approach.

> 
> We're struggling with implementing an approach that meets your
> requirements -and- supports RDMAC verbs and iWARP providers...
> 
> > For example, as soon as the user calls connect(), can they receive a CLOSE
> > event, even before the connect() call returns?  
> 
> No.  connect results in a CONNECT_REPLY event always. Not a CLOSE
> event.   
> 
> Tom, can you post more info on the various events and their relationship
> to the cm_id states?  Maybe that will help?
[] around a string represent client calls
<> around a string represent provider events

IDLE [iw_cm_connect] --> 
	CONN_SENT <CONNECT_REPLY(accept)> --> 
		ESTABLISHED 

IDLE [iw_cm_listen] -->
	LISTENING <CONNECT_REQUEST> -->
		new_endpoint in CONN_RECV

CONN_RECV [iw_cm_accept] -->
	CONN_RECV <ESTABLISHED**> --> ESTABLISHED 

CONN_RECV [iw_cm_reject] --> 
	IDLE

ESTABLISHED [iw_cm_destroy] --> DESTROYING 

ESTABLISHED <DISCONNECT> --> CLOSING		// normal close

CLOSING     <CLOSE>      --> IDLE		// abortive close


** On iWARP there is no ESTABLISHED event in the provider. This
   event is generated by the IW CM to provide a vehicle for
   delivering the passive side connect complete event
   to the app via a callback. 
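
For anyone who finds a table easier to check than the diagram, here is
a minimal sketch of the same transitions as a lookup table in C. It is
not the IWCM code: every sk_* name is hypothetical and made up for this
illustration. The <CONNECT_REQUEST> event is left out because it creates
a new endpoint in CONN_RECV rather than moving an existing cm_id.

```c
#include <stddef.h>

/* States from the diagram. All sk_* names are hypothetical. */
enum sk_state {
	SK_IDLE, SK_LISTENING, SK_CONN_SENT, SK_CONN_RECV,
	SK_ESTABLISHED, SK_CLOSING, SK_DESTROYING
};

/* Inputs: client calls ([...]) and provider events (<...>). */
enum sk_input {
	SK_CALL_CONNECT, SK_CALL_LISTEN, SK_CALL_ACCEPT,
	SK_CALL_REJECT, SK_CALL_DESTROY,
	SK_EV_CONNECT_REPLY_ACCEPT, SK_EV_ESTABLISHED,
	SK_EV_DISCONNECT, SK_EV_CLOSE
};

struct sk_transition {
	enum sk_state from;
	enum sk_input in;
	enum sk_state to;
};

/* One row per arrow in the diagram. */
static const struct sk_transition sk_table[] = {
	{ SK_IDLE,        SK_CALL_CONNECT,            SK_CONN_SENT   },
	{ SK_CONN_SENT,   SK_EV_CONNECT_REPLY_ACCEPT, SK_ESTABLISHED },
	{ SK_IDLE,        SK_CALL_LISTEN,             SK_LISTENING   },
	{ SK_CONN_RECV,   SK_CALL_ACCEPT,             SK_CONN_RECV   },
	{ SK_CONN_RECV,   SK_EV_ESTABLISHED,          SK_ESTABLISHED },
	{ SK_CONN_RECV,   SK_CALL_REJECT,             SK_IDLE        },
	{ SK_ESTABLISHED, SK_CALL_DESTROY,            SK_DESTROYING  },
	{ SK_ESTABLISHED, SK_EV_DISCONNECT,           SK_CLOSING     },
	{ SK_CLOSING,     SK_EV_CLOSE,                SK_IDLE        },
};

/*
 * Look up the next state. Returns 0 and stores the result in *next,
 * or -1 if the input is not listed for the current state.
 */
int sk_next_state(enum sk_state cur, enum sk_input in, enum sk_state *next)
{
	size_t i;

	for (i = 0; i < sizeof(sk_table) / sizeof(sk_table[0]); i++) {
		if (sk_table[i].from == cur && sk_table[i].in == in) {
			*next = sk_table[i].to;
			return 0;
		}
	}
	return -1;
}
```

Any (state, input) pair that is not in the table is rejected, which is
one simple way to fail a call or event that arrives in the wrong state.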

> 
> > If so, are there any issues
> > here?  Is it possible for the user to call down to the provider, after the
> > provider has generated a CLOSE event, resulting in accessing the wrong
> > connection, or crashing in the provider?
> 
> The IWCM prevents this, methinks, by failing any downcalls once the
> cm_id is no longer in a CONNECTED state.
> 
> > 
> > Note, that I'm not saying that providers need to block a call until everything
> > is shutdown.  It only needs to ensure that no callbacks will occur after
> > dealloc_context() returns.  Destroy_listen() should be providing similar logic,
> > so it ends up being in each provider anyway.
> 
> And destroy_listen() blocks until the RNIC acknowledges the destruction
> of the listening endpoint.  
> 
> For orderly close, the provider needs to wait until the close completes
> or is aborted due to protocol errors.  This can be implemented in many
> ways, but blocking the caller is perhaps the simplest.  Also, note, that
> the QP needs to stay around until it transitions out of CLOSING by the
> provider.  This further complicates things and is different from IB.    
> 
> 
> > At this point, I'm still trying to understand the operation.  When does the
> > provider allocate a context for the user?  My guess is when calling connect() or
> > listen().  When does the provider deallocate this context?  If it's not always
> > done in response to the user invoking a function, then we're almost certain to
> > have a race.

I didn't think that IB worked this way either since the user can
'deallocate' the context by returning a non-zero value from a callback.

But anyway...

RDMAC has a different model. The connection context is given to 
the provider, not allocated by the provider. The IW CM allocates 
this context and gives it to the provider at connect or accept time. 
RDMAC actually specifies that the context is given to the
provider via a modify_qp attribute, but I was trying to avoid 
changing the existing API.  An important fact is that the QP becomes 
bound to the connection and stays bound until the CLOSE event 
is received from the provider. The initiating peer starts the close by
moving the QP to either ERROR or CLOSING (SQD), but the QP remains 
associated until the provider says he's done via a CLOSE event.

Please see page 43 of the enclosed RDMAC Verbs Spec. I'm not saying 
whether this approach is right or wrong, only that that's the way
iWARP vendors implemented it because that's what was spec'd.


> connect(), listen(), and when incoming iwarp connections get
> established.
> 
> Deallocation for connected contexts can only happen after the TCP
> protocol has finished tearing down the connection.  This could be via an
> orderly close or an abortive close.
> 
> I hope I'm helping here...
> 
> Stevo.
> 

One more important thing to note is that the CLOSE event 
holds the last reference on the cm_id. A destroy initiated in 
the event thread therefore cannot wait for the refcount to go 
to zero: the event that holds this reference may not have 
occurred yet, so waiting would deadlock the event thread. The 
purpose of the destroy_flags in the cm_id_priv is to note 
whether the client thread or the event thread initiated the 
destruction of the cm_id. If it was the client thread, then 
the CLOSE event will remove the last reference and wake up the 
client thread, which will then kfree the cm_id. If it was the 
event thread (via a non-zero return value from a user callback), 
then the CLOSE event will remove the last reference and kfree 
the cm_id synchronously.
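
To make the two destroy paths concrete, here is a rough single-threaded
sketch in C. It is not the patch code: the sk_* names, the boolean that
stands in for destroy_flags, and the client_woken field that stands in
for a real wait/wake mechanism are all assumptions made for this
illustration, and malloc/free stand in for kmalloc/kfree.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for cm_id_priv. */
struct sk_cm_id {
	int refcount;            /* the pending CLOSE event holds the last ref */
	bool destroy_from_event; /* stand-in for destroy_flags */
	bool client_woken;       /* stand-in for waking a blocked client thread */
};

enum sk_close_result { SK_STILL_REFERENCED, SK_WOKE_CLIENT, SK_FREED_IN_EVENT };

/* Allocate an id whose single reference belongs to the future CLOSE event. */
struct sk_cm_id *sk_alloc(void)
{
	struct sk_cm_id *id = calloc(1, sizeof(*id));

	if (id)
		id->refcount = 1;
	return id;
}

/* Record which thread started the destroy. */
void sk_destroy_begin(struct sk_cm_id *id, bool from_event_thread)
{
	id->destroy_from_event = from_event_thread;
}

/*
 * CLOSE event handler: drop the reference the event was holding.
 * Assumes sk_destroy_begin() has already been called.
 */
enum sk_close_result sk_close_event(struct sk_cm_id *id)
{
	if (--id->refcount > 0)
		return SK_STILL_REFERENCED;

	if (id->destroy_from_event) {
		/* Event thread started the destroy: free right here, no waiting. */
		free(id);
		return SK_FREED_IN_EVENT;
	}

	/* Client thread started the destroy: wake it so it can free the id. */
	id->client_woken = true;
	return SK_WOKE_CLIENT;
}

/*
 * Client side of the destroy. Real code would block until woken; this
 * sketch returns -1 if CLOSE has not arrived yet, or frees and returns 0.
 */
int sk_client_destroy_finish(struct sk_cm_id *id)
{
	if (!id->client_woken)
		return -1;
	free(id);
	return 0;
}
```

The point of the split is that only the client path ever waits. The
event path never blocks on a refcount that its own CLOSE event is still
holding.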

> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf
Type: application/pdf
Size: 819455 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20060322/bf1695e2/attachment.pdf>

