[ofa-general] Problem with dropped CQEs on RDMA CM channel

Mike Heffner mike.heffner at evergrid.com
Tue Mar 20 15:52:39 PDT 2007


Sean Hefty wrote:
>> Can you call rdma_disconnect() immediately after posting sends on the 
>> QP? I don't see any CQEs come back with errors, but they appear to 
>> "disappear" and are never signaled on one peer's side. Are there any 
>> potential races to avoid here (it only happens in about one out of 
>> every 100 connections)?
> 
> rdma_disconnect() will immediately transition the QP into the error 
> state, which can affect queued send operations.
> 
> I think the situation that you're describing could happen if the side 
> that received the send transitioned the QP into the error state, but the 
> ACK sent back to the sender was lost.

That's what I was worried about. Is there any way to manually flush a QP 
before disconnecting it?

> 
> You could try having one side initiate the rdma_disconnect, with the 
> other side waiting for the disconnect event before calling 
> rdma_disconnect itself.
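The ordering suggested above can be modeled as a small per-connection state machine. This is a hedged sketch in plain C so it runs without RDMA hardware: the state, input and action names are hypothetical, `IN_CM_DISCONNECTED` stands in for receiving `RDMA_CM_EVENT_DISCONNECTED` from `rdma_get_cm_event()`, and `ACT_CALL_RDMA_DISCONNECT` means the caller should invoke `rdma_disconnect()` on its cm_id.

```c
/* Hypothetical model of the disconnect ordering. These enum values are
 * stand-ins for illustration, not librdmacm identifiers. */
enum conn_state { ST_CONNECTED, ST_DISCONNECTING, ST_CLOSED };
enum conn_input { IN_APP_CLOSE, IN_CM_DISCONNECTED };
enum conn_action { ACT_NONE, ACT_CALL_RDMA_DISCONNECT };

/* Advance the connection state for one input and return the action to take.
 * Reaching ST_CLOSED means the caller may tear down the QP and cm_id. */
enum conn_action disconnect_step(enum conn_state *s, enum conn_input in)
{
    switch (*s) {
    case ST_CONNECTED:
        if (in == IN_APP_CLOSE) {
            /* Initiating side: disconnect first, then wait for the CM event. */
            *s = ST_DISCONNECTING;
            return ACT_CALL_RDMA_DISCONNECT;
        }
        /* Responding side: the peer disconnected; only now disconnect locally. */
        *s = ST_CLOSED;
        return ACT_CALL_RDMA_DISCONNECT;
    case ST_DISCONNECTING:
        /* Already disconnected locally: never call rdma_disconnect() twice. */
        if (in == IN_CM_DISCONNECTED)
            *s = ST_CLOSED; /* our own disconnect has completed */
        return ACT_NONE;
    case ST_CLOSED:
    default:
        return ACT_NONE;
    }
}
```

Only the side that decides to close ever feeds `IN_APP_CLOSE`; the other side reacts solely to the CM event, so exactly one peer decides when the connection ends. The sketch assumes the initiator also receives a DISCONNECTED event once its own disconnect completes and treats that as the point where teardown is safe.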

Unfortunately, I still end up in a situation where whatever data was 
last sent on the connection may never reach the receiver if the 
sender shuts down the channel, i.e., immediately calls 
rdma_disconnect() and transitions the QP to the error/reset state. The 
handshake with send messages was my attempt at avoiding that problem. ;-)
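One way a sender can make sure its last sends reached the peer before tearing down is to wait for a successful completion of every outstanding send before calling rdma_disconnect(): on a reliable-connection (RC) QP, a send completes successfully only after the remote side has acknowledged it. This is a sketch under stated assumptions: every send is signaled and the send CQ serves only this QP. It is plain C so it runs without RDMA hardware; `struct send_wc`, `poll_fn`, `account_send_batch()` and `drain_send_cq()` are hypothetical stand-ins. In real code the callback would wrap `ibv_poll_cq()` and compare each `wc.status` with `IBV_WC_SUCCESS`.

```c
/* Hypothetical stand-in for struct ibv_wc: only the field this sketch needs.
 * A status of 0 plays the role of IBV_WC_SUCCESS. */
struct send_wc {
    int status;
};

/* Hypothetical poll callback shaped like ibv_poll_cq(): returns the number of
 * completions written to wc, 0 if the CQ is empty, or a negative value on error. */
typedef int (*poll_fn)(void *cq, int max, struct send_wc *wc);

/* Account for one batch of n completions: decrement *outstanding by n and
 * return -1 if any completion in the batch failed, 0 otherwise. */
int account_send_batch(const struct send_wc *wc, int n, int *outstanding)
{
    int rc = 0;
    for (int i = 0; i < n; i++) {
        if (wc[i].status != 0)
            rc = -1; /* e.g. a flushed send: the peer never acknowledged it */
    }
    *outstanding -= n;
    return rc;
}

/* Busy-poll until all `outstanding` signaled sends have completed. Returns 0
 * only if every send completed successfully; the caller would then call
 * rdma_disconnect(). Real code would block on a completion channel or apply a
 * timeout instead of spinning forever on an empty CQ. */
int drain_send_cq(poll_fn poll_cq, void *cq, int outstanding)
{
    struct send_wc wc[16];
    int rc = 0;

    while (outstanding > 0) {
        int n = poll_cq(cq, 16, wc);
        if (n < 0)
            return -1;
        if (account_send_batch(wc, n, &outstanding) != 0)
            rc = -1;
    }
    return rc;
}
```

Draining only protects the sender's side of the shutdown. If the receiver moves its QP to error first and the ACK is lost, as described earlier in the thread, the sender's send completes with an error instead of succeeding. Combining the drain with a single initiating side covers both cases.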

> 
> - Sean


Mike

-- 

   Mike Heffner <mike.heffner at evergrid.com>
   EverGrid Software
   Blacksburg, VA USA

   Voice: (540) 443-3500 x603


