[openib-general] ucm.c review

Sean Hefty sean.hefty at intel.com
Tue Aug 2 15:13:10 PDT 2005


Libor,

Arlin has been seeing some hangs running some multi-threaded DAPL tests with the
CM.  I looked over the ucm.c code and wanted to get feedback on some potential
changes before working on them.  (Some of them are fairly minor.)

The main issue that I found is possible deadlock calling ib_destroy_cm_id() from
within a CM callback, which requires some rework.  See below for more details.
I will begin work on some of these, but wanted to verify the approach to take
regarding the destruction of the ucm_id.

- Sean

-------------------

ib_ucm_ctx_get() - Add a check that ctx-file == file.  This check is made in
most places where ib_ucm_ctx_get() is called.

ib_ucm_ctx_put() - Remove the automatic destruction when the last reference
count goes to 0.  Destruction and cleanup of struct ib_ucm_context would be done
only in ib_ucm_destroy_id().

I believe that we only need to prevent ib_destroy_cm_id() from being called
while another kernel CM call is in progress.  We may be able to use the mutex in
struct ib_ucm_context for this.  (It doesn't appear to be used anywhere.)
Reference counting may still work better though.

ib_ucm_event_rej_get()
ib_ucm_event_mra_get()
ib_ucm_event_lap_get()
ib_ucm_event_apr_get()
ib_ucm_event_sidr_req_get()
Any objection to removing these functions?

ib_ucm_event_process() - for IB_CM_REQ_RECEIVED, the primary path must exist, so
we can remove the check.  As a minor optimization, we could combine the three
allocations associated with struct ib_ucm_event (structure itself, private data,
and info).

ib_ucm_event_handler() - We could store a reference to struct ib_ucm_context
directly in the cm_id, rather than needing to perform a lookup.

We also need to avoid call paths to ib_destroy_cm_id.  Note that calls from a
separate thread into ib_destroy_cm_id will block while we're in this callback,
so the cm_id context is safe to access.

I'm not sure how to handle errors reporting events here.  Currently, the code
returns an error status back to the CM, which will result in the destruction of
the cm_id.  The cm_id will be destroyed a second time when the ucm_context is
destroyed.  It may be best to just drop the event.

ib_ucm_create_id() - We need to serialize with file->mutex around the call to
ib_ucm_ctx_alloc().

ib_ucm_destroy_id() - Missing ctx->file check against file parameter.  I believe
that we want to move the majority of the ib_ucm_ctx_put() routine to here.

ib_ucm_close() - Change call to ib_ucm_ctx_put() to ib_ucm_destroy_id().





More information about the general mailing list