[ofa-general] [RFC] XRC -- make receiving XRC QP independent of any one user process

Fri Dec 21 09:13:26 PST 2007

> -----Original Message-----
> From: Jack Morgenstein [mailto:jackm at dev.mellanox.co.il]
> Sent: Friday, December 21, 2007 2:32 AM
> To: Tang, Changqing
> Cc: pasha at dev.mellanox.co.il;
> mvapich-discuss at cse.ohio-state.edu;
> general at lists.openfabrics.org; Open MPI Developers
> Subject: Re: [ofa-general] [RFC] XRC -- make receiving XRC QP
> independent of any one user process
>
> On Thursday 20 December 2007 18:24, Tang, Changqing wrote:
> >        If I have an MPI server process on a node, many other MPI
> > client processes will dynamically connect to and disconnect from
> > the server. The server uses the same XRC domain.
> >
> >         Will this cause "kernel" QPs to accumulate for such an
> > application? We want the server to run 365 days a year.
>
> Yes, it will.  I have no way of knowing when a given
> receiving XRC QP is no longer needed -- except when the
> domain it belongs to is finally closed.
>
> I don't see that adding a userspace "destroy" verb for this
> QP will help:

This kernel QP is for receiving only, so when there is no activity on it,
can the kernel send a heartbeat message to check whether the remote sending
QP is still there (still connected)? If not, the kernel can safely clean up
this QP.

So whenever the RC connection is broken, the kernel can destroy this QP.
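To make the heartbeat idea concrete, here is a minimal Python sketch of the reclamation policy only; it is not kernel or verbs code. `RcvQP`, `reap_idle_qps`, `idle_timeout`, `max_missed` and the `probe` callback are all hypothetical names. `probe` stands in for whatever liveness check the kernel would really use (how a receive-only QP would send a heartbeat is left open here). A QP is torn down only after its remote sender misses several consecutive probes, so a single lost heartbeat does not destroy a live connection.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RcvQP:
    """Hypothetical kernel-side record for one receiving XRC QP."""
    qpn: int
    last_activity: float   # time of the last message or heartbeat reply
    missed_probes: int = 0


def reap_idle_qps(qps: Dict[int, RcvQP], now: float, idle_timeout: float,
                  max_missed: int, probe: Callable[[int], bool]) -> List[int]:
    """Probe every QP idle for at least idle_timeout; destroy those whose
    remote sender has failed max_missed consecutive probes.
    Returns the QP numbers destroyed on this pass."""
    destroyed = []
    for qpn, qp in list(qps.items()):
        if now - qp.last_activity < idle_timeout:
            continue  # recent traffic: the remote sender is evidently alive
        if probe(qpn):
            # Sender answered the heartbeat: still connected, keep the QP.
            qp.last_activity = now
            qp.missed_probes = 0
        else:
            qp.missed_probes += 1
            if qp.missed_probes >= max_missed:
                del qps[qpn]  # sender is gone: safe to clean up this QP
                destroyed.append(qpn)
    return destroyed


# Toy run: QP 1's sender keeps answering, QP 2's sender has vanished,
# QP 3 saw traffic recently and then went quiet as well.
qps = {1: RcvQP(1, 0.0), 2: RcvQP(2, 0.0), 3: RcvQP(3, 95.0)}
alive = {1}
for t in (100.0, 200.0, 300.0):
    gone = reap_idle_qps(qps, t, idle_timeout=60.0, max_missed=3,
                         probe=lambda qpn: qpn in alive)
    print(t, gone)
```

In this run QP 2 is reclaimed on the third pass (after three failed probes), QP 3 has missed two probes and is still pending, and QP 1 is kept for as long as its sender keeps answering.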

>
> The only one who actually knows that the XRC QP is no longer
> required is the userspace process which created the QP at the
> remote end of the RC connection of the receiving XRC QP.
>
> This remote process can only send a request to destroy the QP
> to some local process (via its own private protocol).
> However, you pointed out that the process which originally
> created the QP may not be around any more (this was the
> source of the problem which led to the RFC in this thread) --
> and sending the destroy request to all the remote processes
> on that node which it communicates with is REALLY ugly.
>
> I'm not familiar with MPI, so this may be a silly question:
> Can the MPI server process create a new domain for each
> client process, and destroy that domain when the client
> process is done (i.e., is this MPI server process a
> supervisor of resources for distributed computations (but is
> not a participant in these computations)?).

The server could be a process group spanning multiple nodes; a parallel
database search engine is one example.

>
> (Actually, what I'm asking -- is it possible to allocate a
> new XRC domain for a distributed computation, and destroy
> that domain at the end of that computation?)

Yes, it could, but that makes the MPI code harder to manage. We also have
a concern about connect/accept speed.

We would prefer not to do it this way.
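The trade-off in this exchange can be modelled in a few lines. The Python sketch below is a toy, not MPI or verbs code: `XrcDomain`, `run_server` and the reference count are hypothetical, and it assumes, as described earlier in this thread, that receiving XRC QPs are released only when their domain is finally closed. With one shared domain, every client that connects and leaves strands a receive QP; with a domain per client nothing accumulates, but every connect/accept pays for an extra domain open and close.

```python
from typing import Set, Tuple


class XrcDomain:
    """Hypothetical model: receiving QPs live until the domain's last user closes it."""

    def __init__(self) -> None:
        self.rcv_qps: Set[int] = set()
        self.users = 0  # processes currently holding the domain open

    def open(self) -> None:
        self.users += 1

    def create_rcv_qp(self, qpn: int) -> None:
        self.rcv_qps.add(qpn)

    def close(self) -> None:
        self.users -= 1
        if self.users == 0:
            self.rcv_qps.clear()  # only now are the receive QPs reclaimed


def run_server(num_clients: int, per_client_domain: bool) -> Tuple[int, int]:
    """Return (receive QPs still held in the server's shared domain,
    domains opened) after num_clients clients connect and disconnect."""
    shared = XrcDomain()
    shared.open()  # the long-running server keeps this open all year
    domains_opened = 1
    for qpn in range(num_clients):
        if per_client_domain:
            dom = XrcDomain()
            dom.open()
            domains_opened += 1  # extra work on every connect/accept
            dom.create_rcv_qp(qpn)
            dom.close()  # client done: its receive QP goes with the domain
        else:
            shared.create_rcv_qp(qpn)  # disconnect frees nothing
    return len(shared.rcv_qps), domains_opened
```

For example, `run_server(1000, per_client_domain=False)` returns `(1000, 1)`, while `run_server(1000, per_client_domain=True)` returns `(0, 1001)`: no leak, but a thousand extra domain opens.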

--CQ

>
>
> -- Jack
>