[openib-general] Re: [PATCH 3/3] iWARP CM - iWARPConnectionManager

Wed Mar 22 16:35:13 PST 2006

>> Tom, can you post more info on the various events and their relationship
>> to the cm_id states?  Maybe that will help?
>[] around a string represent client calls
><> around a string represent provider events

Thanks - just to be clear, my concern is that neither a client nor the iwcm tries
to access memory that has been freed.  I'm trying to limit the state/event model
discussion to this scope.

For the client, this means that a callback is never invoked with a context that
the user has freed.  For the user to know when they can free their context, my
recommendation was to block iw_destroy_cm_id() until all outstanding callbacks
had completed, and no new callbacks would be invoked.
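
To make the blocking-destroy idea concrete, here is a rough userspace sketch
using pthreads.  It is not the iwcm code: struct demo_cm_id, demo_init(),
demo_deliver_event(), demo_destroy() and demo_count_handler() are all
hypothetical names, and a kernel version would use its own locking and wait
primitives.  The point is only that destroy marks the id as dying, then waits
for running callbacks to drain, and that no new callback starts after that.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cm_id; not the real iw_cm structure. */
struct demo_cm_id {
	pthread_mutex_t lock;
	pthread_cond_t  idle;        /* signalled when no callback is running */
	int             in_callback; /* callbacks currently executing */
	bool            destroying;  /* set once destroy has started */
	int           (*handler)(void *context);
	void           *context;     /* owned by the user */
};

static void demo_init(struct demo_cm_id *id, int (*handler)(void *),
		      void *context)
{
	pthread_mutex_init(&id->lock, NULL);
	pthread_cond_init(&id->idle, NULL);
	id->in_callback = 0;
	id->destroying = false;
	id->handler = handler;
	id->context = context;
}

/* Event path: returns false if the event was dropped because destroy began. */
static bool demo_deliver_event(struct demo_cm_id *id)
{
	pthread_mutex_lock(&id->lock);
	if (id->destroying) {
		pthread_mutex_unlock(&id->lock);
		return false;
	}
	id->in_callback++;
	pthread_mutex_unlock(&id->lock);

	id->handler(id->context);	/* user callback runs unlocked */

	pthread_mutex_lock(&id->lock);
	assert(id->in_callback > 0);
	if (--id->in_callback == 0)
		pthread_cond_broadcast(&id->idle);
	pthread_mutex_unlock(&id->lock);
	return true;
}

/* Client path: must not be called from inside a callback, since it would
 * then wait for itself. */
static void demo_destroy(struct demo_cm_id *id)
{
	pthread_mutex_lock(&id->lock);
	id->destroying = true;		/* no new callbacks from here on */
	while (id->in_callback > 0)
		pthread_cond_wait(&id->idle, &id->lock);
	pthread_mutex_unlock(&id->lock);
	/* Once this returns, the caller may free its context. */
}

/* Example handler: counts invocations in the user's context. */
static int demo_count_handler(void *context)
{
	(*(int *)context)++;
	return 0;
}
```

With this shape, the user's rule is simple: once demo_destroy() returns, the
context will not be touched again.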

For the iwcm, I was suggesting to use the same model, but I'm fine with an
alternate approach, as long as it's relatively simple to verify its correctness.

>IDLE [iw_cm_connect] -->
>	CONN_SENT <CONNECT_REPLY(accept)> -->
>		ESTABLISHED
>
>IDLE [iw_cm_listen] -->
>	LISTENING <CONNECT_REQUEST> -->
>		new_endpoint in CONN_RECV
>
>CONN_RECV [iw_cm_accept] -->
>	CONN_RECV <ESTABLISHED**> --> ESTABLISHED
>
>CONN_RECV [iw_cm_reject] -->
>	IDLE
>
>ESTABLISHED [iw_cm_destroy] --> DESTROYING
>
>ESTABLISHED <DISCONNECT>--> CLOSING		// normal close
>
>CLOSING     <CLOSE>	--> IDLE		// abortive close
>
>
>** On iWARP there is no ESTABLISHED event in the provider. This
>   event is generated by the IW CM to provide a vehicle for
>   delivering the passive side connect complete event
>   to the app via a callback.

Is it possible to receive either a DISCONNECT or CLOSE event between calling
accept() and queuing the ESTABLISHED event?  It seems that it is.
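
To show what I mean, here is the quoted table transcribed into a small C
lookup.  The enum and function names are mine and purely illustrative, and the
LISTENING case assumes the listening cm_id stays in LISTENING while the new
endpoint starts in CONN_RECV.  Anything not in the table comes back as
ST_INVALID, which is what you get for DISCONNECT or CLOSE in CONN_RECV as the
table is written.

```c
#include <assert.h>

enum demo_state {
	ST_IDLE, ST_LISTENING, ST_CONN_SENT, ST_CONN_RECV,
	ST_ESTABLISHED, ST_CLOSING, ST_DESTROYING,
	ST_INVALID	/* no transition listed for this state/input pair */
};

enum demo_input {
	/* client calls, written [] in the table */
	CALL_CONNECT, CALL_LISTEN, CALL_ACCEPT, CALL_REJECT, CALL_DESTROY,
	/* provider events, written <> in the table */
	EV_CONNECT_REQUEST, EV_CONNECT_REPLY_ACCEPT, EV_ESTABLISHED,
	EV_DISCONNECT, EV_CLOSE
};

/* Returns the next state of the cm_id that receives the input. */
static enum demo_state demo_next(enum demo_state s, enum demo_input in)
{
	assert(s != ST_INVALID);

	switch (s) {
	case ST_IDLE:
		if (in == CALL_CONNECT)
			return ST_CONN_SENT;
		if (in == CALL_LISTEN)
			return ST_LISTENING;
		break;
	case ST_LISTENING:
		/* Assumption: the listener itself stays in LISTENING; the
		 * new endpoint is created in CONN_RECV. */
		if (in == EV_CONNECT_REQUEST)
			return ST_LISTENING;
		break;
	case ST_CONN_SENT:
		if (in == EV_CONNECT_REPLY_ACCEPT)
			return ST_ESTABLISHED;
		break;
	case ST_CONN_RECV:
		if (in == CALL_ACCEPT)
			return ST_CONN_RECV;
		if (in == CALL_REJECT)
			return ST_IDLE;
		if (in == EV_ESTABLISHED)
			return ST_ESTABLISHED;
		break;
	case ST_ESTABLISHED:
		if (in == CALL_DESTROY)
			return ST_DESTROYING;
		if (in == EV_DISCONNECT)
			return ST_CLOSING;
		break;
	case ST_CLOSING:
		if (in == EV_CLOSE)
			return ST_IDLE;
		break;
	default:
		break;
	}
	return ST_INVALID;
}
```

For example, demo_next(ST_CONN_RECV, EV_DISCONNECT) returns ST_INVALID, so
that case needs a defined answer if the event can really show up there.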

>I didn't think that IB worked this way either since the user can
>'deallocate' the context by returning a non-zero value from a callback.

This results in calling the standard ib_destroy_cm_id() routine after returning
from the callback.  The user is unable to call this routine directly from their
callback because a reference is held on the cm_id while in the callback, which
would result in deadlock.  The alternative is for the user to spawn a thread to
call destroy.
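
Roughly, the dispatch side looks like the sketch below.  This is a simplified,
single-threaded model with made-up names (struct demo_id, demo_dispatch(),
demo_destroy(), demo_keep(), demo_drop()), not the actual ib_cm code.  It only
shows the ordering: the callback's reference is dropped first, and destroy is
called afterwards on the user's behalf.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cm_id. */
struct demo_id {
	int  refcount;		/* starts at 1: the owner's reference */
	bool destroyed;
	int (*handler)(struct demo_id *id);
};

static void demo_destroy(struct demo_id *id)
{
	/* A blocking destroy would wait here until only the owner's
	 * reference remains.  Called from inside a callback, that wait
	 * could never finish; this model asserts instead of hanging. */
	assert(id->refcount == 1);
	id->refcount = 0;
	id->destroyed = true;
}

static void demo_dispatch(struct demo_id *id)
{
	int ret;

	id->refcount++;		/* reference held across the callback */
	ret = id->handler(id);
	id->refcount--;		/* callback has returned */

	if (ret)
		demo_destroy(id);	/* destroy after the callback, not in it */
}

/* Example handlers: keep the id, or ask for it to be destroyed. */
static int demo_keep(struct demo_id *id) { (void)id; return 0; }
static int demo_drop(struct demo_id *id) { (void)id; return 1; }
```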

>One more important thing to note is that the CLOSE event
>holds the last reference on the cm_id and therefore a destroy
>initiated in the event thread cannot wait until the refcount
>goes to zero because the event that has this reference may
>not have occurred yet and will deadlock the event thread. The
>purpose of the destroy_flags in the cm_id_priv is to note
>whether the client thread or the event thread initiated the
>destruction of the cm_id. If it was the client thread, then
>the CLOSE event will remove the last reference and wake up the
>client thread that will then kfree the cm_id. If it was the
>event thread (via a non-zero return value from a user callback),
>then the CLOSE event will remove the last reference and kfree
>the cm_id synchronously.

This is one of the areas that concerns me.  The thread acquiring the reference
until the CLOSE event can occur is separate from the thread releasing it.  I'm
unsure that the synchronization is there to ensure that the acquire will always
occur before the release.
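
Here is a toy model of the ordering I'm worried about.  It is deliberately
single-threaded so that the two orders can be forced; the names (struct
demo_ref, demo_get(), demo_put(), demo_run()) are invented for the example,
and setting the freed flag stands in for kfree().  I'm not claiming the patch
behaves this way, only that nothing in the description rules it out.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_ref {
	int  refcount;
	bool freed;	/* stands in for the object having been kfree'd */
};

/* Take a reference; fails if the object is already gone.  In real code
 * this failure would be a use after free, not a clean error. */
static bool demo_get(struct demo_ref *r)
{
	if (r->freed)
		return false;
	r->refcount++;
	return true;
}

/* Drop a reference; the last put frees the object. */
static void demo_put(struct demo_ref *r)
{
	assert(r->refcount > 0);
	if (--r->refcount == 0)
		r->freed = true;
}

/*
 * One thread takes the reference that is meant to be held until CLOSE;
 * the CLOSE handler, on another thread, drops it.  acquire_first selects
 * which of the two runs first.  Returns true if the object survives with
 * only the client's own reference left.
 */
static bool demo_run(bool acquire_first)
{
	struct demo_ref r = { 1, false };	/* client's reference */
	bool got;

	if (acquire_first) {
		got = demo_get(&r);	/* 1 -> 2 */
		demo_put(&r);		/* CLOSE: 2 -> 1 */
	} else {
		demo_put(&r);		/* CLOSE: 1 -> 0, object freed */
		got = demo_get(&r);	/* too late */
	}
	return got && !r.freed && r.refcount == 1;
}
```

demo_run(true) leaves the object alive with the client's reference intact;
demo_run(false) frees it while the client still thinks it holds one.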

- Sean