[ofa-general] [PATCH] fix flow of handling duplicate SIDR REQs
Or Gerlitz
ogerlitz at voltaire.com
Mon Jul 2 02:46:27 PDT 2007
Hi Sean,
When the process on the passive side is slow to react to a SIDR
request, a retry sent by the active side CM causes the passive side
CM to send a SIDR REP with IB_SIDR_REJECT status. This in turn causes
the active side CM to deliver an IB_CM_SIDR_REP_RECEIVED event with
status IB_SIDR_REJECT.
Later, when the process calls rdma_accept --> ib_send_cm_sidr_rep,
another SIDR REP with status IB_SIDR_SUCCESS is sent, but by then it's too late.
This seems to be solved by the patch below. However, I see that
for duplicate REQs the code is much more involved, which means I
might be over-simplifying here...
To reproduce the problem and see the effect of the fix, run the passive
udaddy, suspend it, run the active udaddy, and then resume the passive one.
Without the patch, the active side gets RDMA_CM_EVENT_UNREACHABLE with
status 2, whereas with the patch it works fine.
Or.
----------------------------------
Don't reject SIDR REQ retries which are received before
the passive side had the chance to send SIDR REP.
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Index: ofa_kernel-1.2/drivers/infiniband/core/cm.c
===================================================================
--- ofa_kernel-1.2.orig/drivers/infiniband/core/cm.c 2007-07-02 12:20:13.000000000 +0300
+++ ofa_kernel-1.2/drivers/infiniband/core/cm.c 2007-07-02 12:35:17.000000000 +0300
@@ -746,7 +746,8 @@ retest:
 		break;
 	case IB_CM_SIDR_REQ_RCVD:
 		spin_unlock_irqrestore(&cm_id_priv->lock, flags);
-		cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT);
+		if (!err)
+			cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT);
 		break;
 	case IB_CM_REQ_SENT:
 		ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg);
@@ -2835,7 +2836,7 @@ static int cm_sidr_req_handler(struct cm
 	cur_cm_id_priv = cm_insert_remote_sidr(cm_id_priv);
 	if (cur_cm_id_priv) {
 		spin_unlock_irqrestore(&cm.lock, flags);
-		goto out; /* Duplicate message. */
+		goto out_dup; /* Duplicate message. */
 	}
 	cur_cm_id_priv = cm_find_listen(cm_id->device,
 					sidr_req_msg->service_id,
@@ -2858,6 +2859,9 @@ static int cm_sidr_req_handler(struct cm
 	cm_process_work(cm_id_priv, work);
 	cm_deref_id(cur_cm_id_priv);
 	return 0;
+out_dup:
+	cm_destroy_id(&cm_id_priv->id, -1);
+	return -EINVAL;
 out:
 	ib_destroy_cm_id(&cm_id_priv->id);
 	return -EINVAL;