[ofa-general] Problem with dropped CQE's on RDMA CM channel

Sean Hefty mshefty at ichips.intel.com
Tue Mar 20 12:02:06 PDT 2007


> Can you call rdma_disconnect() immediately after posting sends on the
> QP? I don't see any CQEs come back with errors, but they appear to
> "disappear" and are never signaled on one peer. Are there any
> potential races to avoid here (it only happens in about one out of
> every 100 connections)?

rdma_disconnect() will immediately transition the QP into the error state, which 
can affect queued send operations.
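A minimal sketch of one way to avoid disconnecting while sends are still queued, assuming a reliable-connected (RC) QP: poll the send CQ until every signaled send has completed, and only then call rdma_disconnect(). On RC, a successful send completion is reported only after the remote side has acknowledged the message. To keep the sketch runnable without hardware, `struct send_wc` and the `poll_send_cq_fn` callback are hypothetical stand-ins; in real code the callback would wrap ibv_poll_cq() on the send CQ and check the completion status against IBV_WC_SUCCESS.

```c
/* Hypothetical stand-in for the fields of struct ibv_wc used here. */
struct send_wc {
    unsigned long wr_id;
    int success;            /* nonzero if status == IBV_WC_SUCCESS */
};

/* Hypothetical poll callback with the same return convention as
 * ibv_poll_cq(): number of completions stored in `out` (0 if none yet),
 * negative on failure. Real code would wrap ibv_poll_cq() here. */
typedef int (*poll_send_cq_fn)(void *ctx, int max, struct send_wc *out);

#define DRAIN_BATCH 16

/* Busy-poll until `outstanding` signaled sends have completed.
 * Returns 0 if every completion succeeded, -1 on a poll error, a failed
 * completion, or too many consecutive empty polls. The caller should
 * call rdma_disconnect() only after this returns 0. */
int drain_sends(poll_send_cq_fn poll, void *ctx, int outstanding,
                long max_empty_polls)
{
    struct send_wc batch[DRAIN_BATCH];
    long empty = 0;

    while (outstanding > 0) {
        int n = poll(ctx, DRAIN_BATCH, batch);
        if (n < 0)
            return -1;                 /* polling itself failed */
        if (n == 0) {
            if (++empty > max_empty_polls)
                return -1;             /* completion never arrived */
            continue;
        }
        empty = 0;
        for (int i = 0; i < n; i++) {
            if (!batch[i].success)
                return -1;             /* e.g. a flush or retry error */
            outstanding--;
        }
    }
    return 0;
}
```

The `max_empty_polls` bound is a choice of this sketch, not part of the verbs API: it turns a completion that never shows up into an error instead of a hang.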

I think the situation that you're describing could happen if the side that 
received the send transitioned the QP into the error state, but the ACK sent 
back to the sender was lost.  Can anyone confirm this?

You could try having only one side initiate the disconnect by calling
rdma_disconnect(), with the other side waiting for the
RDMA_CM_EVENT_DISCONNECTED event before calling rdma_disconnect() itself.
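The ordering suggested above can be sketched as follows. This is a hypothetical sketch, not librdmacm code: `enum cm_event` and the `cm_ops` callbacks are stand-ins so the logic runs without RDMA hardware. In a real program `next_event` would wrap rdma_get_cm_event() followed by rdma_ack_cm_event(), and `disconnect` would wrap rdma_disconnect() on the connection's rdma_cm_id. librdmacm documents that RDMA_CM_EVENT_DISCONNECTED is generated on both sides, so the initiator also waits for it.

```c
/* Hypothetical stand-in for the two cases of enum rdma_cm_event_type
 * this sketch cares about. */
enum cm_event { CM_EVENT_DISCONNECTED, CM_EVENT_OTHER };

/* Hypothetical callbacks: next_event() would wrap rdma_get_cm_event()
 * plus rdma_ack_cm_event(); disconnect() would wrap rdma_disconnect(). */
struct cm_ops {
    void *ctx;
    enum cm_event (*next_event)(void *ctx);
    int (*disconnect)(void *ctx);
};

/* Ordered teardown: only the initiator calls disconnect first; the
 * other side waits for the DISCONNECTED event and only then calls
 * disconnect itself. Returns 0 on success, -1 if disconnect fails. */
int ordered_disconnect(const struct cm_ops *ops, int is_initiator)
{
    if (is_initiator && ops->disconnect(ops->ctx) != 0)
        return -1;

    /* Both sides see a DISCONNECTED event. Other events are skipped
     * here for brevity; a real event loop would dispatch them. */
    while (ops->next_event(ops->ctx) != CM_EVENT_DISCONNECTED)
        ;

    if (!is_initiator && ops->disconnect(ops->ctx) != 0)
        return -1;
    return 0;
}
```

Which side acts as the initiator is up to the application; the point is that the two sides never race to call rdma_disconnect() at the same moment.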

- Sean
