[ofa-general] Problem with dropped CQEs on RDMA CM channel
Mike Heffner
mike.heffner at evergrid.com
Tue Mar 20 12:52:56 PDT 2007
Forgot to mention that this is with OFED 1.1 on a SUSE 10 box:
Linux amd13 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
with a "Mellanox Technologies MT23108 InfiniHost (rev a1)" PCI-X card
with firmware version 3.5.0.
Mike Heffner wrote:
> Hi,
>
> I'm writing a program that lets two clients communicate over an RC
> channel that is connected using the RDMA CM. To negotiate a clean
> shutdown of the channel, both clients post an IBV_WR_SEND with the
> IBV_SEND_SIGNALED flag set. A client calls rdma_disconnect() only
> after it has received two CQEs: the completion for its own signaled
> send and the completion for the peer's incoming IBV_WR_SEND (i.e.,
> once the peer's send has been received). This ensures that both
> clients have conceptually called "close()" on their ends of the
> connection before the connection is torn down and rdma_disconnect()
> moves the QP into the error state.
>
> The problem I'm seeing is that occasionally one peer does not receive
> both CQEs, even though the other peer has received both and has called
> rdma_disconnect(). What's odd is that a client may never get the local
> CQE for its signaled IBV_WR_SEND even though the peer has received that
> send. Since that client never sees both CQEs, the connection remains
> open and is never cleaned up properly.
>
> Can rdma_disconnect() be called immediately after posting sends on the
> QP? I don't see any CQEs complete with errors; they seem to
> "disappear" and are never reported on one side. Are there any known
> races to avoid here? It only happens on about one out of every 100
> connections.
>
> Any assistance would be greatly appreciated.
>
>
> Thanks,
>
> Mike
>
--
Mike Heffner <mike.heffner at evergrid.com>
EverGrid Software
Blacksburg, VA USA
Voice: (540) 443-3500 x603