[ofa-general] [RFC] XRC -- make receiving XRC QP independent of any one user process

Jack Morgenstein jackm at dev.mellanox.co.il
Fri Dec 21 10:09:26 PST 2007


On Friday 21 December 2007 19:13, Tang, Changqing wrote:
> This kernel QP is for receiving only, so when there is no activity on this QP,
> can the kernel send a heartbeat message to check whether the remote sending QP
> is still there (still connected)? If not, the kernel can safely clean up
> this QP.
> 
> So whenever the RC connection is broken, the kernel can destroy this QP.
> 
This increases the XRC complexity considerably:

1. We would need a separate kernel thread that scans ALL XRC domains on this host for XRC receive QPs.
   This thread would have to use some form of RDMA_READ/WRITE, because a send-type operation would
   interfere with the remote (sending-side) operation. Furthermore, the sending-side XRC QP may not have
   anyone listening on an associated XRC SRQ; it is not meant to be set up to receive. All we need is an
   operation that yields a RETRY_EXCEEDED error completion if the connection has broken.

2. This opens the door to all sorts of nasty race conditions, since we would now have a bidirectional
   protocol. For example, suppose this feature is combined with APM (valid for RC QPs) and we are
   in the middle of a path migration, during which communication may be temporarily interrupted.
   We would kill off the QP without giving any error-recovery mechanism a chance to work.

3. Application complexity goes up: the sending side now needs to register a memory region and send
   that region's address to the receiving side, so that the receiving side (the kernel thread mentioned
   above) can periodically try to read from it.

Still, I'll give this some thought. For example, maybe we can RDMA-read some random (illegal) address:
if the connection is alive, we'll get a "remote access error" completion, while if it's dead, we'll get
"retry exceeded" (we need to check that the bad RDMA read request does not cause the QPs to enter an error state).
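The liveness test above reduces to a three-way decision on the completion status of the deliberately bad read. A minimal sketch, assuming the statuses observed would be libibverbs' `IBV_WC_REM_ACCESS_ERR` and `IBV_WC_RETRY_EXC_ERR`; the enum here is a hypothetical local mirror (so the sketch builds without libibverbs) and its values are not the real ones. It does not answer the open question of whether the rejected read moves the QPs to an error state.

```c
#include <assert.h>

/* Hypothetical mirror of the relevant ibv_wc_status cases. */
enum probe_wc_status {
    PROBE_WC_SUCCESS,           /* read unexpectedly succeeded */
    PROBE_WC_REM_ACCESS_ERR,    /* like IBV_WC_REM_ACCESS_ERR: remote side answered */
    PROBE_WC_RETRY_EXC_ERR,     /* like IBV_WC_RETRY_EXC_ERR: no answer within retries */
    PROBE_WC_OTHER              /* flush, local error, anything else */
};

enum peer_liveness { PEER_ALIVE, PEER_DEAD, PEER_UNKNOWN };

/* Map the probe's completion status to a liveness verdict. */
static enum peer_liveness classify_probe(enum probe_wc_status st)
{
    switch (st) {
    case PROBE_WC_SUCCESS:
    case PROBE_WC_REM_ACCESS_ERR:
        return PEER_ALIVE;      /* someone on the far end responded */
    case PROBE_WC_RETRY_EXC_ERR:
        return PEER_DEAD;       /* connection presumed broken */
    default:
        return PEER_UNKNOWN;    /* ambiguous: do not tear down the QP */
    }
}
```

Returning "unknown" for everything except retry-exceeded keeps cleanup conservative, which matters given the APM race described in item 2.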

- Jack


