[openib-general] [CM] possible problem with crossing DREQs.
Sean Hefty
sean.hefty at intel.com
Thu Jun 9 12:47:03 PDT 2005
> I'm seeing an unusual problem when both halves of a connection
> actively disconnect at the same time. Each connection peer issues
> a DREQ at the same time; next, each receives the other's DREQ and
> responds with a DREP; finally, each side gets a callback for the
> transition to the idle state. However, at this point it appears
> that each CM keeps retransmitting its DREQ, which then seems
> to interfere with new connection establishment.
I think I understand what's happening. Receiving the DREQ changed
the state of the cm_id, but did not cancel the previous send, so the
DREQ that was already outstanding kept being retransmitted.
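To make the failure mode concrete, here is a toy model of one endpoint during crossing DREQs. It is only a sketch under stated assumptions, not the ib_cm code: the enum, `struct endpoint`, and every function are hypothetical, the MAD retry limit is ignored, and the outstanding DREQ is assumed to stop retransmitting only when it is explicitly cancelled (the peer's DREP is not modeled). The `cancel_send` flag switches between the reported behavior and the behavior the patch below is aiming for.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified states; names echo the CM's, but this is not ib_cm. */
enum cm_state { ESTABLISHED, DREQ_SENT, DREQ_RCVD, TIMEWAIT, IDLE };

struct endpoint {
	enum cm_state state;
	bool send_pending; /* our DREQ MAD is still queued for retransmission */
	int retransmits;   /* number of DREQs re-sent by the retry timer */
};

/* Start a disconnect: send a DREQ and leave it pending for retries. */
static void send_dreq(struct endpoint *ep)
{
	ep->state = DREQ_SENT;
	ep->send_pending = true;
}

/* The peer's DREQ arrives. With cancel_send set, the pending DREQ is
 * cancelled first (loosely analogous to the ib_cancel_mad() call in the
 * patch); without it, only the state changes. */
static void recv_dreq(struct endpoint *ep, bool cancel_send)
{
	if (cancel_send && ep->state == DREQ_SENT)
		ep->send_pending = false;
	ep->state = DREQ_RCVD;
}

/* Answer the peer's DREQ with a DREP and enter timewait. */
static void send_drep(struct endpoint *ep)
{
	assert(ep->state == DREQ_RCVD); /* a DREP only answers a received DREQ */
	ep->state = TIMEWAIT;
}

/* Retry timer tick: a DREQ that was never cancelled is sent again,
 * regardless of the endpoint's current state. */
static void retry_timer_fires(struct endpoint *ep)
{
	if (ep->send_pending)
		ep->retransmits++;
}

/* Run the crossing-DREQ sequence on one endpoint and return how many
 * DREQ retransmits happen after it has already reached IDLE. */
static int simulate(bool cancel_send, int timer_ticks)
{
	struct endpoint ep = { ESTABLISHED, false, 0 };

	send_dreq(&ep);              /* both sides send a DREQ at once...  */
	recv_dreq(&ep, cancel_send); /* ...and each receives the other's   */
	send_drep(&ep);
	ep.state = IDLE;             /* timewait ends; user sees idle      */

	for (int i = 0; i < timer_ticks; i++)
		retry_timer_fires(&ep);
	return ep.retransmits;
}
```

In this model `simulate(false, 3)` returns 3, one DREQ per timer tick even though the endpoint is already idle, while `simulate(true, 3)` returns 0.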
I'm actually out on vacation for a little over two weeks (and will
be totally away from e-mail after Friday), but something
like the patch below might fix the issue. (Note that I haven't tested
or compiled this.) If it does work for you, feel free to commit it.
Signed-off-by: Sean Hefty <sean.hefty at intel.com>
Index: cm.c
===================================================================
--- cm.c (revision 2568)
+++ cm.c (working copy)
@@ -1813,9 +1813,12 @@ static int cm_dreq_handler(struct cm_wor
switch (cm_id_priv->id.state) {
case IB_CM_REP_SENT:
- case IB_CM_MRA_REP_RCVD:
- case IB_CM_ESTABLISHED:
case IB_CM_DREQ_SENT:
+ ib_cancel_mad(cm_id_priv->av.port->mad_agent,
+ (unsigned long) cm_id_priv->msg);
+ break;
+ case IB_CM_ESTABLISHED:
+ case IB_CM_MRA_REP_RCVD:
break;
case IB_CM_TIMEWAIT:
if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))
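The control flow of the patched switch may be easier to follow as a plain function than as a diff. The sketch below is hypothetical: `enum cm_state`, `enum dreq_action`, and `on_dreq()` are stand-ins invented for illustration, and only the state names come from the patch. What "handle the DREQ" involves happens after the switch and isn't shown in the excerpt, and the treatment of states beyond the excerpt (`CM_OTHER`) is an assumption. The TIMEWAIT branch is labeled as sending a response because the excerpt allocates a response message there, presumably to answer the repeated DREQ.

```c
#include <assert.h>

/* Local stand-ins; only the state names are taken from the patch. */
enum cm_state {
	CM_REP_SENT,
	CM_MRA_REP_RCVD,
	CM_ESTABLISHED,
	CM_DREQ_SENT,
	CM_TIMEWAIT,
	CM_OTHER /* assumption: any state the excerpt cuts off */
};

enum dreq_action {
	ACT_CANCEL_THEN_HANDLE, /* stop retransmitting our own MAD, then handle the DREQ */
	ACT_HANDLE,             /* the patch cancels nothing here; handle the DREQ */
	ACT_SEND_RESPONSE,      /* already in timewait: reply to the repeated DREQ */
	ACT_DROP                /* assumption for states not shown in the excerpt */
};

/* Mirror of the patched switch: which action a received DREQ triggers. */
static enum dreq_action on_dreq(enum cm_state state)
{
	/* reject values outside the enum */
	assert((int)state >= (int)CM_REP_SENT && (int)state <= (int)CM_OTHER);

	switch (state) {
	case CM_REP_SENT:  /* our REP may still be retransmitting */
	case CM_DREQ_SENT: /* our DREQ is still retransmitting (the crossing case) */
		return ACT_CANCEL_THEN_HANDLE;
	case CM_ESTABLISHED:
	case CM_MRA_REP_RCVD:
		return ACT_HANDLE;
	case CM_TIMEWAIT:
		return ACT_SEND_RESPONSE;
	default:
		return ACT_DROP;
	}
}
```

The essential change is that `IB_CM_ESTABLISHED` and `IB_CM_MRA_REP_RCVD` no longer share a `break` with `IB_CM_REP_SENT` and `IB_CM_DREQ_SENT`; the latter two now cancel the pending MAD before breaking.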