[ofa-general] [RFC] XRC -- make receiving XRC QP independent of any one user process

Fri Dec 21 10:22:29 PST 2007

For the heart-beat we use a zero-byte rdma_write. The message goes only to the peer QP,
so nothing needs to be posted on the remote side and no pinned memory is needed.

--CQ

> -----Original Message-----
> From: Jack Morgenstein [mailto:jackm at dev.mellanox.co.il]
> Sent: Friday, December 21, 2007 12:09 PM
> To: Tang, Changqing
> Cc: pasha at dev.mellanox.co.il;
> mvapich-discuss at cse.ohio-state.edu;
> general at lists.openfabrics.org; Open MPI Developers
> Subject: Re: [ofa-general] [RFC] XRC -- make receiving XRC QP independent of any one user process
>
> On Friday 21 December 2007 19:13, Tang, Changqing wrote:
> > This kernel QP is for receiving only, so when there is no activity on
> > this QP, can the kernel send a heart-beat message to check whether the
> > remote sending QP is still there (still connected)? If not, it is
> > safe for the kernel to clean up this QP.
> >
> > So whenever the RC connection is broken, the kernel can destroy this QP.
> >
> This increases the XRC complexity considerably:
>
> 1. Need to have a separate kernel thread which will scan ALL xrc domains
>    on this host for XRC receive QPs. This thread will need to do some
>    form of RDMA_READ/WRITE, because otherwise it will interfere with the
>    remote (sending side) operation. Furthermore, the sending-side XRC QP
>    may not have anyone listening on an associated XRC SRQ qp -- it is
>    not meant to be set up to receive. We only need an operation that
>    will yield a RETRY_EXCEEDED error completion if the connection has
>    broken.
>
> 2. This opens the door for all sorts of nasty race conditions, since we
>    will now have a bi-directional protocol. For example, what if this
>    feature is combined with APM (valid for RC QPs), and we are simply in
>    the middle of a migration, with communication temporarily
>    interrupted? We would be killing off the QP without allowing any
>    error recovery mechanism to work.
>
> 3. The application complexity goes up -- we now need the sending-side QP
>    to declare a memory region and send this region's address to the
>    receiving side so that the receiving side (the kernel thread
>    mentioned above) can periodically try to read from this region.
>
> Still, I'll give this some thought. For example, maybe we can rdma_read
> some random (illegal) address -- if the connection is alive, we'll get
> a "remote access error" completion, while if it's dead, we'll get
> retry exceeded (need to check that the bad rdma read request does not
> cause the QPs to enter an error state).
>
> - Jack
>
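
The liveness decision discussed in this thread can be sketched roughly as below. This is only a model in Python and does not talk to any HCA. The status strings follow the libibverbs `ibv_wc_status` names; `PeerState` and `classify_probe` are hypothetical names introduced for illustration. Treating a remote access error as proof of life is Jack's untested suggestion, and the sketch does not answer his open question of whether the QPs remain usable after such a probe.

```python
from enum import Enum


class PeerState(Enum):
    ALIVE = "alive"      # the remote QP answered the probe
    DEAD = "dead"        # transport retries ran out; nobody answered
    UNKNOWN = "unknown"  # any other completion; do not act on it


# Completion status names as spelled in libibverbs (enum ibv_wc_status).
# Plain strings are used so the sketch runs without any verbs binding.
ANSWERED = {
    "IBV_WC_SUCCESS",         # e.g. a zero-byte rdma_write was acknowledged
    "IBV_WC_REM_ACCESS_ERR",  # peer rejected the illegal rdma_read, so it is there
}
UNANSWERED = {"IBV_WC_RETRY_EXC_ERR"}  # "retry exceeded"


def classify_probe(status: str) -> PeerState:
    """Map the completion status of one heart-beat probe to a peer state."""
    if status in ANSWERED:
        return PeerState.ALIVE
    if status in UNANSWERED:
        return PeerState.DEAD
    # Assumption: anything else (flush errors, local errors) says nothing
    # about the peer, so the receive QP should not be reaped because of it.
    return PeerState.UNKNOWN
```

With these definitions, `classify_probe("IBV_WC_RETRY_EXC_ERR")` returns `PeerState.DEAD`, which is the only outcome on which the kernel would consider cleaning up the receive QP.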