[ofa-general] Problem with dropped CQE's on RDMA CM channel
Mike Heffner
mike.heffner at evergrid.com
Tue Mar 20 15:52:39 PDT 2007
Sean Hefty wrote:
>> Can you call rdma_disconnect() immediately after posting sends on the
>> QP? I don't see any CQE's come back with errors but they appear to
>> "disappear" and never get signaled on one peer side. Are there any
>> potential race issues to avoid here (it only happens about one out of
>> every 100 connections)?
>
> rdma_disconnect() will immediately transition the QP into the error
> state, which can affect queued send operations.
>
> I think the situation that you're describing could happen if the side
> that received the send transitioned the QP into the error state, but the
> ACK sent back to the sender was lost.
That's what I was worried about. Is there any way to manually flush a QP
before disconnecting it?
>
> You could try having one side initiate the rdma_disconnect, with the
> other side waiting for the disconnect event before calling
> rdma_disconnect itself.
Unfortunately, I still end up in a situation where whatever data was
last sent on the connection may not get flushed correctly to the
receiver if the sender shuts down the channel by immediately calling
rdma_disconnect() and transitioning the QP to the error/reset state. The
handshake with send messages was my attempt at avoiding that problem. ;-)
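
One possible way to approximate a "flush" is to count the signaled sends
that have been posted and keep polling the send CQ until each one has
produced a completion, and only then disconnect. The sketch below shows
that ordering only. It is not taken from this thread: the names
drain_then_disconnect, poll_fn, disconnect_fn and fake_cq are
hypothetical, and the callbacks stand in for what would presumably be
wrappers around ibv_poll_cq() and rdma_disconnect() in a real program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns how many completions were reaped by
 * this call (>= 0), or a negative value on error. A real version might
 * wrap ibv_poll_cq() on the send CQ. */
typedef int (*poll_fn)(void *ctx);

/* Hypothetical callback: tears down the connection, returning 0 on
 * success. A real version might wrap rdma_disconnect(). */
typedef int (*disconnect_fn)(void *ctx);

/* Poll until every outstanding signaled send has completed, then
 * disconnect. Returns the disconnect result, or -1 if polling fails or
 * max_polls is used up first (no disconnect is issued in that case). */
static int drain_then_disconnect(unsigned outstanding, unsigned max_polls,
                                 poll_fn poll, disconnect_fn disconnect,
                                 void *ctx)
{
    assert(poll != NULL && disconnect != NULL);

    while (outstanding > 0) {
        if (max_polls-- == 0)
            return -1;              /* gave up: sends still in flight */
        int n = poll(ctx);
        if (n < 0)
            return -1;              /* CQ polling error */
        /* Never subtract more than is actually outstanding. */
        outstanding -= ((unsigned)n < outstanding) ? (unsigned)n : outstanding;
    }
    return disconnect(ctx);
}

/* Stand-in for a CQ, for illustration only. */
struct fake_cq {
    int ready;          /* completions waiting to be reaped */
    int disconnected;   /* set once the disconnect callback runs */
};

/* Reaps at most one completion per call. */
static int fake_poll(void *ctx)
{
    struct fake_cq *cq = ctx;
    if (cq->ready > 0) {
        cq->ready--;
        return 1;
    }
    return 0;
}

static int fake_disconnect(void *ctx)
{
    struct fake_cq *cq = ctx;
    cq->disconnected = 1;
    return 0;
}
```

Note that on a reliable connection a send completion only indicates
that the remote HCA acknowledged the message, not that the remote
application has consumed it, so an application-level handshake may
still be needed on top of this.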
>
> - Sean
>
>
Mike
--
Mike Heffner <mike.heffner at evergrid.com>
EverGrid Software
Blacksburg, VA USA
Voice: (540) 443-3500 x603