[openib-general] multiple RDMA_CM_EVENT_DISCONNECTED callbacks

Eric Barton eeb at bartonsoftware.com
Fri May 19 10:10:43 PDT 2006


Hi,

I'm using the rdma_cm API.  I've seen it call my CM callback with
RDMA_CM_EVENT_DISCONNECTED twice.  Is this a bug?

I've implemented a connection teardown procedure that checks whether this is
the first time it has been called on a given connection.  If not, it's a NOOP.
Otherwise it schedules the connection for teardown by a thread.  This thread
calls rdma_disconnect() and then explicitly moves the QP state to error (but
maybe that's redundant?).
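
For illustration, the first-call check described above could look roughly like
this.  It is a user-space C11 sketch, not the actual code: struct demo_conn,
demo_conn_init() and demo_conn_teardown() are hypothetical names, and a plain
counter stands in for handing the connection to the teardown thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical connection state; not the poster's actual structure. */
struct demo_conn {
    atomic_bool closing;      /* true once teardown has been requested */
    int teardowns_scheduled;  /* stands in for queueing work to a thread */
};

static void demo_conn_init(struct demo_conn *c)
{
    atomic_init(&c->closing, false);
    c->teardowns_scheduled = 0;
}

/* Returns true only for the one caller that actually schedules teardown. */
static bool demo_conn_teardown(struct demo_conn *c)
{
    /* atomic_exchange returns the previous value, so only the first
     * caller sees false here; every later call is a no-op. */
    if (atomic_exchange(&c->closing, true))
        return false;

    /* Real code would hand the connection to the teardown thread, which
     * would then call rdma_disconnect(). */
    c->teardowns_scheduled++;
    return true;
}
```

The atomic exchange means that a completion error and a CM callback arriving at
the same time cannot both schedule the teardown.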

I call this teardown procedure any time sends or receives complete with error,
or when I get the RDMA_CM_EVENT_DISCONNECTED callback.  I refcount my
connections, so after I've called the teardown function, I'm basically just
waiting for the refs to drain to zero, including the CM ref which is released
when I see RDMA_CM_EVENT_DISCONNECTED.
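
A rough sketch of why a second RDMA_CM_EVENT_DISCONNECTED matters for this
scheme: if the CM ref is dropped on every such event, a duplicate event drops
one ref too many.  The guard below is only an assumption about how one might
defend against that, not something the rdma_cm API provides; struct demo_refs,
demo_put() and demo_on_disconnected() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted connection; user-space C11 stand-in. */
struct demo_refs {
    atomic_int refcount;      /* one ref per holder, including the CM */
    atomic_bool cm_ref_held;  /* true until the CM's ref has been dropped */
    bool destroyed;           /* set when the last ref goes away */
};

static void demo_refs_init(struct demo_refs *r, int initial_refs)
{
    atomic_init(&r->refcount, initial_refs);
    atomic_init(&r->cm_ref_held, true);
    r->destroyed = false;
}

/* Drop one ref; the holder that drops the last one destroys the object. */
static void demo_put(struct demo_refs *r)
{
    /* atomic_fetch_sub returns the value before the subtraction. */
    if (atomic_fetch_sub(&r->refcount, 1) == 1)
        r->destroyed = true;
}

/* Called for each DISCONNECTED event; releases the CM ref at most once. */
static bool demo_on_disconnected(struct demo_refs *r)
{
    if (!atomic_exchange(&r->cm_ref_held, false))
        return false;  /* duplicate event: the CM ref is already gone */
    demo_put(r);
    return true;
}
```

With two refs held (one for the CM, one for an in-flight operation), two
disconnect events followed by one demo_put() leave the count at exactly zero
rather than below it.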

I had a typo in my code that meant that I was sending with an opcode
IB_WC_SEND, rather than IB_WR_SEND, causing a remote access error (IB_WC_SEND
== 0 == IB_WR_RDMA_WRITE).  Posted receives on the remote QP all completed with
error (I guess openib moved the QP state to error) and the send completed
locally with error.  This meant that both sides were racing to call
rdma_disconnect(), and that's when I got 2 CM callbacks with
RDMA_CM_EVENT_DISCONNECTED for the same connection.
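
The typo compiles cleanly because C converts freely between enum types.  The
sketch below reproduces the effect with stand-in enums: the demo_* names are
hypothetical, and the only property taken from the real headers is the one
stated above, that the "WC send" and "WR RDMA write" values are both 0.  The
remaining values are placeholders.

```c
#include <assert.h>

/* Stand-in for the completion opcodes: "send" is the first value (0). */
enum demo_wc_opcode { DEMO_WC_SEND, DEMO_WC_RDMA_WRITE };

/* Stand-in for the work request opcodes: "RDMA write" is the first value. */
enum demo_wr_opcode {
    DEMO_WR_RDMA_WRITE,
    DEMO_WR_RDMA_WRITE_WITH_IMM,
    DEMO_WR_SEND
};

/* Hypothetical work request carrying only the opcode. */
struct demo_send_wr {
    enum demo_wr_opcode opcode;
};

/* Store whatever opcode the caller wrote and report what was posted. */
static enum demo_wr_opcode demo_posted_opcode(int opcode_as_written)
{
    struct demo_send_wr wr;

    /* No diagnostic is required here: the int converts silently, so a
     * completion opcode is accepted where a request opcode is expected. */
    wr.opcode = opcode_as_written;
    return wr.opcode;
}
```

Passing DEMO_WC_SEND therefore posts DEMO_WR_RDMA_WRITE, an RDMA write where a
send was intended.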

All this was running on a 2.6.9 EL based kernel (LLNL) and openib subversion
version #6829.

-- 

                Cheers,
                        Eric

---------------------------------------------------
|Eric Barton        Barton Software               |
|9 York Gardens     Tel:    +44 (117) 330 1575    |
|Clifton            Mobile: +44 (7909) 680 356    |
|Bristol BS8 4LL    Fax:    call first            |
|United Kingdom     E-Mail: eeb at bartonsoftware.com|
---------------------------------------------------



