[openib-general] Re: uCM create connection ID

Thu Jun 30 12:42:12 PDT 2005

On Thu, Jun 30, 2005 at 09:13:28AM -0700, Sean Hefty wrote:
> Libor Michalek wrote:
> >   Assume that the userspace 'struct ib_cm_event' contains the cm_id as
> > well as a new 'u64 context' which is inherited from the cm_id, and is
> > set at the time of the cm_id creation. This is what I'm assuming that
> > Arlin would like to see.
> > 
> >   In the case of two threads accessing the CM at once there's a race
> > condition if you are going to use the 'context' variable as a pointer
> > to memory:
> > 
> > Thread 1                              Thread 2
> > ------------------------------------- -----------------------------------
> > cm_object = malloc(sizeof(*cm_object));
> > ib_cm_create_id(&cm_object->cm_id,
> >                 (u64)cm_object);
> > 
> >                                     ib_cm_event_get(&event)
> > ib_cm_destroy_id(cm_object->cm_id);
> > free(cm_object);
> >                                     process_event((void *)event->context);
> 
> I see.  This appears to come from a difference between the event reporting 
> model used by the kernel CM versus the usermode CM (callback versus 
> calldown).

  Do you block the destroy on a lock while a callback for that cm_id
is active? I wouldn't attribute the difference to callback vs.
calldown; in both cases it's a matter of serializing the destroy
with the event.

>  Maybe there's a way to assist the user here.  Can we report a 
> destruction event, or require a second call to indicate that an event has 
> been processed?

  A destruction event could work, but with some limits which might make
it impractical. The user would have to be really careful not to do
_anything_ with the object after calling destroy, and only clean up in
the same thread that is used to get the destroy completion event. The
destroy completion event could be retrieved and processed before the
original destroy call returns. Also, the user would need to make sure
that they are getting events in a _single_ thread, since multiple event
get threads could pose the same problem as before.
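
  To make the destruction event idea concrete, here is a minimal sketch
of a single event thread that owns all cleanup. The names EV_CONNECT,
EV_DESTROYED, 'struct cm_object' and handle_event() are hypothetical
and not part of the CM API; the sketch assumes the CM would queue one
final event per destroyed cm_id, and that no other thread touches the
object after calling destroy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical event types; EV_DESTROYED stands in for the proposed
 * "destroy completed" event and is an assumption, not an existing API. */
enum ev_type { EV_CONNECT, EV_DESTROYED };

struct event {
    enum ev_type type;
    uint64_t     context;   /* inherited from the cm_id at creation */
};

/* Hypothetical consumer object whose address is used as the context. */
struct cm_object {
    int events_seen;
};

/*
 * Called only from the single event thread.  Returns 1 if the object
 * was freed by this call, 0 if it is still live.
 */
int handle_event(const struct event *ev)
{
    struct cm_object *obj = (struct cm_object *)(uintptr_t)ev->context;

    assert(obj != NULL);
    if (ev->type == EV_DESTROYED) {
        /* Assumed to be the last event for this cm_id, so no later
         * event can carry a pointer to obj. */
        free(obj);
        return 1;
    }
    obj->events_seen++;
    return 0;
}
```

  The thread that calls destroy never frees the object here; it only
stops using it, and the free happens when the event thread sees the
final event.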

  Blocking on the destroy seems like it could be error prone, since you
could easily deadlock the user, who probably holds a lock around the
object which contains the cm_id...

  We could build the serialization table for the API consumer, and have
all cm_id calls and events go through a level of indirection in a
table locked against multiple threads. This was the way we ended up
doing it in our old code for the userCM that we used for uDAPL. I
had left this out since it seems reasonable that not all apps would
want/need this guarantee from the API, and that they could implement
it themselves if they did want it... I could be wrong.
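
  For reference, a rough sketch of what such a table might look like
on the consumer side. This is not the old userCM code; every name here
(table_insert, table_remove, table_dispatch, count_event, TABLE_SIZE)
is hypothetical. The idea is that the 'context' stored with the cm_id
is a table handle rather than a raw pointer, so an event that arrives
after the destroy simply fails the lookup.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64   /* hypothetical fixed capacity */

struct slot {
    void    *obj;   /* consumer object, NULL while the slot is free */
    uint32_t gen;   /* bumped on each insert so stale handles mismatch */
};

static struct slot table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Caller holds table_lock.  Returns the live object for h, or NULL. */
static void *lookup_locked(uint64_t h)
{
    uint32_t idx = (uint32_t)h;
    uint32_t gen = (uint32_t)(h >> 32);

    if (idx >= TABLE_SIZE || table[idx].gen != gen)
        return NULL;
    return table[idx].obj;
}

/* Register obj; returns a handle to use as the context, 0 if full. */
uint64_t table_insert(void *obj)
{
    uint64_t h = 0;

    assert(obj != NULL);
    pthread_mutex_lock(&table_lock);
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        if (table[i].obj == NULL) {
            table[i].obj = obj;
            if (++table[i].gen == 0)    /* keep 0 reserved for "invalid" */
                table[i].gen = 1;
            /* generation in the high 32 bits, slot index in the low 32 */
            h = ((uint64_t)table[i].gen << 32) | i;
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return h;
}

/* Unregister h; returns the object so the caller can free it, or NULL. */
void *table_remove(uint64_t h)
{
    void *obj;

    pthread_mutex_lock(&table_lock);
    obj = lookup_locked(h);
    if (obj != NULL)
        table[(uint32_t)h].obj = NULL;
    pthread_mutex_unlock(&table_lock);
    return obj;
}

/* Run fn on the object while holding the lock.  Returns 0 if fn ran,
 * -1 if the handle is stale (already removed or slot reused). */
int table_dispatch(uint64_t h, void (*fn)(void *obj, void *arg), void *arg)
{
    void *obj;
    int ret = -1;

    pthread_mutex_lock(&table_lock);
    obj = lookup_locked(h);
    if (obj != NULL) {
        fn(obj, arg);
        ret = 0;
    }
    pthread_mutex_unlock(&table_lock);
    return ret;
}

/* Example handler: treats the object as an int event counter. */
void count_event(void *obj, void *arg)
{
    (void)arg;
    (*(int *)obj)++;
}
```

  Note that the handler runs with the table lock held, so in this
sketch it must not call table_insert() or table_remove() itself, and
one global lock serializes all event processing. That is part of the
cost an app may not want to pay.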

-Libor