[openib-general] [CM] possible problem with crossing DREQs.

Sean Hefty sean.hefty at intel.com
Thu Jun 9 12:47:03 PDT 2005


>  I'm seeing an unusual problem when both halves of a connection
>actively disconnect at the same time. Each connection peer issues
>a DREQ, then each receives the peer's DREQ and responds
>with a DREP, and finally each connection gets a callback for the
>transition to the idle state. However, at this point it appears
>that each CM keeps retransmitting DREQ requests, which then seems
>to interfere with new connection establishment.

I think I understand what's happening.  Receiving the DREQ changes
the state of the cm_id, but does not cancel the DREQ that was already
sent, so that send keeps being retried.
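
To make the suspected sequence concrete, here is a minimal stand-alone
sketch of it.  This is only a toy model under stated assumptions, not
the cm.c code: every name in it (model_cm_id, model_send_dreq,
model_recv_dreq, model_retry_timer) is hypothetical, and the real
retransmission is done by the MAD layer rather than by a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not the cm.c structures. */
enum model_state { MODEL_ESTABLISHED, MODEL_DREQ_SENT, MODEL_DREQ_RCVD };

struct model_cm_id {
	enum model_state state;
	bool send_pending;	/* an unanswered DREQ is queued for retry */
};

/* Local side starts a disconnect: queue a DREQ and wait for the DREP. */
static void model_send_dreq(struct model_cm_id *id)
{
	assert(id != NULL);
	id->state = MODEL_DREQ_SENT;
	id->send_pending = true;
}

/*
 * The peer's DREQ arrives.  With cancel_send == false this mirrors the
 * suspected bug: the state changes but the queued DREQ is left alone.
 * With cancel_send == true the queued DREQ is dropped first.
 */
static void model_recv_dreq(struct model_cm_id *id, bool cancel_send)
{
	assert(id != NULL);
	if (id->state == MODEL_DREQ_SENT && cancel_send)
		id->send_pending = false;
	id->state = MODEL_DREQ_RCVD;
}

/* Retry timer fires: returns true if a DREQ would be retransmitted. */
static bool model_retry_timer(const struct model_cm_id *id)
{
	assert(id != NULL);
	return id->send_pending;
}
```

Calling model_send_dreq() and then model_recv_dreq(id, false) leaves
send_pending set, so model_retry_timer() still reports a retransmit
even though the state has moved on.  Passing true instead clears it,
which is the effect the cancel is meant to have.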

I'm actually out on vacation for a little over two weeks (and will
be totally away from e-mail after Friday), but something
like the patch below might fix the issue.  (Note that I have not compiled
or tested this.)  If it does work for you, feel free to commit it.

Signed-off-by: Sean Hefty <sean.hefty at intel.com>


Index: cm.c
===================================================================
--- cm.c	(revision 2568)
+++ cm.c	(working copy)
@@ -1813,9 +1813,12 @@ static int cm_dreq_handler(struct cm_wor
 
 	switch (cm_id_priv->id.state) {
 	case IB_CM_REP_SENT:
-	case IB_CM_MRA_REP_RCVD:
-	case IB_CM_ESTABLISHED:
 	case IB_CM_DREQ_SENT:
+		ib_cancel_mad(cm_id_priv->av.port->mad_agent,
+			      (unsigned long) cm_id_priv->msg);
+		break;
+	case IB_CM_ESTABLISHED:
+	case IB_CM_MRA_REP_RCVD:
 		break;
 	case IB_CM_TIMEWAIT:
 		if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))





