[ofa-general] hotplug event handle question

Michael S. Tsirkin mst at dev.mellanox.co.il
Wed Apr 25 21:05:53 PDT 2007


> Quoting Sean Hefty <mshefty at ichips.intel.com>:
> Subject: Re: [ofa-general] hotplug event handle question
> 
> >        if (!cma_comp(id_priv, CMA_CONNECT) &&
> >            !cma_comp(id_priv, CMA_DISCONNECT))
> >                return -EINVAL;
> 
> This check only ensures that we have a valid underlying cm_id (cm_id.ib or 
> cm_id.iw) and are bound to a device, with the underlying cm's providing the 
> synchronization that we need.  To allow rdma_disconnect() to be called 
> after the device has been removed will likely take a slight re-working of 
> the states. (I'm more concerned about userspace clients calling 
> rdma_disconnect at the wrong time and crashing the kernel than a misuse 
> from kernel clients.)  I don't think that the device removal code should 
> transition the QP into the error state underneath the user, so fixing 
> rdma_disconnect seems like the way to go.

I think the problem is that cma_remove_id_dev overrides the current state,
losing state information in the process. Why do we need CMA_DEVICE_REMOVAL
at all? Everything seems to work fine just by forwarding RDMA_CM_EVENT_DEVICE_REMOVAL
to user, without touching state.

> 
> I will work on a fix for this.  In the meantime, the alternatives are 
> either to remove the check or have the ULP transitioning QP into the error 
> state.
>
> 
> Steve, I don't see where the iwarp code transitions the cm_id to 
> CMA_DISCONNECT. Does iwarp keep the cm_id in the CMA_CONNECT state until 
>  it is destroyed?

Don't lose state on RDMA_CM_EVENT_DEVICE_REMOVAL.
Without this patch, rdma_disconnect won't move the QP to error.

Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>

---

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index fde92ce..9f37eac 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -92,7 +92,6 @@ enum cma_state {
 	CMA_DISCONNECT,
 	CMA_ADDR_BOUND,
 	CMA_LISTEN,
-	CMA_DEVICE_REMOVAL,
 	CMA_DESTROYING
 };
 
@@ -2689,20 +2688,13 @@ static void cma_add_one(struct ib_device *device)
 static int cma_remove_id_dev(struct rdma_id_private *id_priv)
 {
 	struct rdma_cm_event event;
-	enum cma_state state;
 
-	/* Record that we want to remove the device */
-	state = cma_exch(id_priv, CMA_DEVICE_REMOVAL);
-	if (state == CMA_DESTROYING)
+	if (cma_comp(id_priv, CMA_DESTROYING))
 		return 0;
 
-	cma_cancel_operation(id_priv, state);
+	cma_cancel_operation(id_priv, id_priv->state);
 	wait_event(id_priv->wait_remove, !atomic_read(&id_priv->dev_remove));
 
-	/* Check for destruction from another callback. */
-	if (!cma_comp(id_priv, CMA_DEVICE_REMOVAL))
-		return 0;
-
 	memset(&event, 0, sizeof event);
 	event.event = RDMA_CM_EVENT_DEVICE_REMOVAL;
 	return id_priv->id.event_handler(&id_priv->id, &event);

-- 
MST



More information about the general mailing list