[ofa-general] [PATCH] Fix racy deadlock in cma
Kanoj Sarcar
kanoj at netxen.com
Wed Oct 3 10:09:16 PDT 2007
Hello,
Here's a possible (aka easily reproducible) deadlock scenario involving
cma.c's global mutex "lock" while destroying a listener.
Assume the provider has done an IW_CM_EVENT_CONNECT_REQUEST upcall on behalf
of the listener; iwcm.c:cm_event_handler() will then bump the listener's
refcount and schedule iw_cm_wq to execute cm_work_handler().
cma.c:rdma_destroy_id() is invoked on the listener causing invocation of
the call chain
cma_cancel_operation():cma_cancel_listens():cma_destroy_listen():iw_destroy_cm_id()
with the global "lock" held; iw_destroy_cm_id() will do
wait_for_completion(), waiting for the listener refcount to get to 0.
When iw_cm_wq gets to run, it executes
cm_work_handler():process_event():cm_conn_req_handler():iw_conn_req_handler(),
which tries to get the global "lock" (held as described previously) and
goes to sleep. The deadlock arises because iw_cm_wq must go on to execute
cm_work_handler():iwcm_deref_id() to drop the reference that the destroy
path is waiting on, and it cannot do so while it sleeps on the mutex.
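For readers who want to see the shape of the problem outside the kernel, here is a rough userspace sketch using pthreads. It is only an analogy, not kernel code: destroy_listener_demo(), work_handler(), ref_mutex and ref_zero are hypothetical names, a condition variable stands in for the completion, and a plain thread stands in for iw_cm_wq. It shows the destroy path dropping "lock" before waiting for the refcount to reach 0, which is the idea behind the patch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins for the kernel objects. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;      /* plays cma.c's global "lock" */
static pthread_mutex_t ref_mutex = PTHREAD_MUTEX_INITIALIZER; /* protects refcount */
static pthread_cond_t ref_zero = PTHREAD_COND_INITIALIZER;    /* plays the completion */
static int refcount;

/* Plays the queued work item: it needs "lock" before it can drop its reference. */
static void *work_handler(void *arg)
{
    (void)arg;

    pthread_mutex_lock(&lock);      /* analogous to iw_conn_req_handler() */
    /* The listener is being destroyed, so there is nothing to do here. */
    pthread_mutex_unlock(&lock);

    pthread_mutex_lock(&ref_mutex); /* analogous to iwcm_deref_id() */
    if (--refcount == 0)
        pthread_cond_signal(&ref_zero);
    pthread_mutex_unlock(&ref_mutex);
    return NULL;
}

/* Plays the destroy path. Returns the final refcount (0 on success), -1 on error. */
static int destroy_listener_demo(void)
{
    pthread_t worker;
    int remaining;
    int rc;

    refcount = 1;                   /* reference taken by the event upcall */

    pthread_mutex_lock(&lock);
    if (pthread_create(&worker, NULL, work_handler, NULL) != 0) {
        pthread_mutex_unlock(&lock);
        return -1;
    }

    /*
     * The fix: release "lock" before waiting. If this unlock is removed,
     * the worker sleeps on "lock" forever and the wait below never ends.
     */
    pthread_mutex_unlock(&lock);

    pthread_mutex_lock(&ref_mutex);
    while (refcount != 0)
        pthread_cond_wait(&ref_zero, &ref_mutex);
    remaining = refcount;
    pthread_mutex_unlock(&ref_mutex);

    pthread_mutex_lock(&lock);      /* re-take "lock" for the rest of teardown */
    pthread_mutex_unlock(&lock);

    rc = pthread_join(worker, NULL);
    assert(rc == 0);
    (void)rc;
    return remaining;
}
```

Calling destroy_listener_demo() from a main() built with -pthread should return 0. The sketch only models the lock ordering, it says nothing about what else "lock" protects in cma.c while it is dropped.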
Notice that cm_conn_req_handler() tries to exit early if listener
destruct has started (by checking IW_CM_STATE_LISTEN).
iw_conn_req_handler() does similar checks on CMA_LISTEN. But there is a
race window with the destruct path, such that the upcall path waits for
the mutex which the destruct path acquires.
Appended patch fixes the problem.
Thanks.
Kanoj
--- drivers/infiniband/core/cma.c 2006-12-13 17:14:23.000000000 -0800
+++ /tmp/cma.c 2007-10-03 00:48:32.000000000 -0700
@@ -624,6 +624,7 @@
cma_exch(id_priv, CMA_DESTROYING);
if (id_priv->cma_dev) {
+ mutex_unlock(&lock);
switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
case RDMA_TRANSPORT_IB:
if (id_priv->cm_id.ib && !IS_ERR(id_priv->cm_id.ib))
@@ -636,6 +637,7 @@
default:
break;
}
+ mutex_lock(&lock);
cma_detach_from_dev(id_priv);
}
list_del(&id_priv->listen_list);