[openib-general] RE: cm crash

Sean Hefty sean.hefty at intel.com
Sun May 7 19:57:43 PDT 2006


>cm_process_work does:
>
>        cm_deref_id(cm_id_priv);
>        if (ret)
>                ib_destroy_cm_id(&cm_id_priv->id);
>
>assume that another thread calls ib_destroy_cm_id.
>Now
>
>        wait_event(cm_id_priv->wait, !atomic_read(&cm_id_priv->refcount));
>        while ((work = cm_dequeue_work(cm_id_priv)) != NULL)
>                cm_free_work(work);
>        kfree(cm_id_priv->compare_data);
>        kfree(cm_id_priv->private_data);
>        kfree(cm_id_priv);
>
>once the reference count reaches 0, this thread will wake.
>We now have two threads running destroy on the same id!

This is a user error: the cm_id is being destroyed twice.  A user must not both
call ib_destroy_cm_id() and return non-zero from a callback on that same ID.

We cannot fix this in the CM.  If the thread calling ib_destroy_cm_id() is
delayed, the callback handler will return first and the CM will clean up the
cm_id.  The delayed thread will then call ib_destroy_cm_id() on memory that has
already been freed.
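
One way a user could keep the two destroy paths from both running is to make
them compete for a single "claim" held in the user's own per-connection state.
The following is only a sketch under stated assumptions, written as plain C11
rather than kernel code: struct conn, conn_claim_destroy(), conn_event_handler()
and conn_teardown() are hypothetical names, and the destroy_calls counter stands
in for the real ib_destroy_cm_id() call.  It assumes the user's state outlives
the cm_id.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-side state; not part of the CM API. */
struct conn {
        atomic_bool destroy_claimed;  /* set by whichever path wins teardown */
        int destroy_calls;            /* stand-in for calls to ib_destroy_cm_id() */
};

static void conn_init(struct conn *c)
{
        atomic_init(&c->destroy_claimed, false);
        c->destroy_calls = 0;
}

/* Returns true for exactly one caller: the one that now owns teardown. */
static bool conn_claim_destroy(struct conn *c)
{
        return !atomic_exchange(&c->destroy_claimed, true);
}

/* Models the single destroy of the id; a second call would be the bug. */
static void conn_do_destroy(struct conn *c)
{
        assert(c->destroy_calls == 0);
        c->destroy_calls++;
}

/*
 * Callback path: return non-zero (which asks the CM to destroy the id)
 * only if this path won the claim.
 */
static int conn_event_handler(struct conn *c, bool fatal_event)
{
        if (fatal_event && conn_claim_destroy(c)) {
                conn_do_destroy(c);  /* models the destroy done by the CM */
                return 1;
        }
        return 0;
}

/* Explicit path: destroy only if the callback has not already claimed it. */
static void conn_teardown(struct conn *c)
{
        if (conn_claim_destroy(c))
                conn_do_destroy(c);  /* models a direct ib_destroy_cm_id() */
}
```

Whichever path loses the claim must not touch the cm_id afterwards, since the
winner may already have freed it.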

- Sean


