[ofa-general] [PATCH] fix flow of handling duplicate SIDR REQs

Or Gerlitz ogerlitz at voltaire.com
Mon Jul 2 02:46:27 PDT 2007


Hi Sean,

When the process on the passive side is slow to react to a SIDR
request, a retry sent by the active side CM causes the passive side CM
to send a SIDR REP with IB_SIDR_REJECT status. This makes the active
side CM deliver an IB_CM_SIDR_REP_RECEIVED event with status
IB_SIDR_REJECT to its consumer.

Later, when the process calls rdma_accept --> ib_send_cm_sidr_rep etc.,
another SIDR REP with status IB_SIDR_SUCCESS is sent, but by then it's
too late: the active side has already seen the reject.

This seems to be solved by the patch below. However, I see that the
code for duplicate REQs is much more involved, which means I might be
over-simplifying here...
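
To make the sequence above concrete, here is a toy user-space model of
it. This is only a sketch, not the kernel code: the enum values, struct
and function names (passive_recv_req, passive_accept, simulate) are all
made up for illustration, and the model assumes the active side acts on
the first REP it receives.

```c
/* Toy stand-ins for the real status codes; the values are illustrative. */
enum sidr_status { SIDR_NONE = -1, SIDR_SUCCESS = 0, SIDR_REJECT = 1 };

enum passive_state { PASSIVE_IDLE, PASSIVE_REQ_RCVD, PASSIVE_REP_SENT };

struct passive_cm {
	enum passive_state state;
	int drop_duplicates;	/* 0 = model without the patch, 1 = with it */
};

/*
 * Passive CM receives a SIDR REQ. Returns the REP status it puts on the
 * wire right away, or SIDR_NONE if nothing is sent yet.
 */
static enum sidr_status passive_recv_req(struct passive_cm *cm)
{
	if (cm->state == PASSIVE_IDLE) {
		/* First REQ: an event is queued for the (slow) process. */
		cm->state = PASSIVE_REQ_RCVD;
		return SIDR_NONE;
	}
	/* Duplicate REQ arriving before the process has answered. */
	if (cm->state == PASSIVE_REQ_RCVD && !cm->drop_duplicates)
		return SIDR_REJECT;
	return SIDR_NONE;	/* duplicate silently dropped */
}

/* The process finally accepts (rdma_accept in the real flow). */
static enum sidr_status passive_accept(struct passive_cm *cm)
{
	if (cm->state != PASSIVE_REQ_RCVD)
		return SIDR_NONE;
	cm->state = PASSIVE_REP_SENT;
	return SIDR_SUCCESS;
}

/*
 * Send one REQ plus `retries` retransmissions, all before the process
 * accepts, and return the first REP status the active side would see.
 */
enum sidr_status simulate(int drop_duplicates, int retries)
{
	struct passive_cm cm = { PASSIVE_IDLE, drop_duplicates };
	enum sidr_status first = SIDR_NONE, rep;
	int i;

	for (i = 0; i <= retries; i++) {
		rep = passive_recv_req(&cm);
		if (first == SIDR_NONE)
			first = rep;
	}
	rep = passive_accept(&cm);
	if (first == SIDR_NONE)
		first = rep;
	return first;
}
```

In this model simulate(0, 1) returns SIDR_REJECT (the retry is rejected
before the accept), while simulate(1, 1) returns SIDR_SUCCESS (the
retry is dropped and the later accept wins).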

To reproduce the problem and see the effect of the fix, run udaddy on
the passive side, suspend it, run udaddy on the active side, and then
resume the passive one. Without the patch, the active side gets
RDMA_CM_EVENT_UNREACHABLE with status 2, whereas with the patch it
works fine.

Or.
----------------------------------

Don't reject SIDR REQ retries which are received before
the passive side has had a chance to send a SIDR REP.

Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>

Index: ofa_kernel-1.2/drivers/infiniband/core/cm.c
===================================================================
--- ofa_kernel-1.2.orig/drivers/infiniband/core/cm.c	2007-07-02 12:20:13.000000000 +0300
+++ ofa_kernel-1.2/drivers/infiniband/core/cm.c	2007-07-02 12:35:17.000000000 +0300
@@ -746,7 +746,8 @@ retest:
 		break;
 	case IB_CM_SIDR_REQ_RCVD:
 		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
-		cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT);
+		if (!err)
+			cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT);
 		break;
 	case IB_CM_REQ_SENT:
 		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
@@ -2835,7 +2836,7 @@ static int cm_sidr_req_handler(struct cm
 	cur_cm_id_priv = cm_insert_remote_sidr(cm_id_priv);
 	if (cur_cm_id_priv) {
 		spin_unlock_irqrestore(&cm.lock, flags);
-		goto out; /* Duplicate message. */
+		goto out_dup; /* Duplicate message. */
 	}
 	cur_cm_id_priv = cm_find_listen(cm_id->device,
 					sidr_req_msg->service_id,
@@ -2858,6 +2859,9 @@ static int cm_sidr_req_handler(struct cm
 	cm_process_work(cm_id_priv, work);
 	cm_deref_id(cur_cm_id_priv);
 	return 0;
+out_dup:
+	cm_destroy_id(&cm_id_priv->id, -1);
+	return -EINVAL;
 out:
 	ib_destroy_cm_id(&cm_id_priv->id);
 	return -EINVAL;


