[ofa-general] [RFC] XRC -- make receiving XRC QP independent of any one user process

Fri Dec 21 00:31:59 PST 2007

On Thursday 20 December 2007 18:24, Tang, Changqing wrote:
>        If I have an MPI server process on a node, many other MPI client processes will dynamically
> connect to and disconnect from the server. The server uses the same XRC domain.
> 
>         Will this cause "kernel" QPs to accumulate for such an application? We want the server to run 365 days
> a year.

Yes, it will.  I have no way of knowing when a given receiving XRC QP is no longer needed --
except when the domain it belongs to is finally closed, at which point all of its receiving QPs can be released.

I don't see that adding a userspace "destroy" verb for this QP will help:

The only party that actually knows that the XRC QP is no longer required is the userspace process that created
the QP at the remote end of the receiving XRC QP's RC connection.

This remote process can only send a request to destroy the QP to some process on the receiving node (via its own
private protocol). However, you pointed out that the process which originally created the QP may no longer be around
(this was the source of the problem which led to the RFC in this thread) -- and sending the destroy request to every
process on that node with which it communicates is REALLY ugly.

I'm not familiar with MPI, so this may be a silly question: Can the MPI server process create a
new domain for each client process, and destroy that domain when the client process is done?
(That is, is this MPI server process a supervisor of resources for distributed computations,
rather than a participant in those computations?)

(Actually, what I'm asking is: is it possible to allocate a new XRC domain for each distributed computation, and
destroy that domain at the end of that computation?)
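
To make the difference concrete, here is a toy model of the two lifetimes discussed above. It is only a sketch:
the class and function names (XrcDomain, create_recv_qp, shared_domain, domain_per_computation) are hypothetical
and are not the verbs API. It assumes only what is stated above, namely that a receiving QP is released when
its domain is closed and not before.

```python
# Toy model only: hypothetical names, NOT the libibverbs / XRC verbs API.

class XrcDomain:
    """Stands in for an XRC domain that owns kernel-held receiving QPs."""

    def __init__(self, name):
        self.name = name
        self.recv_qps = []   # receiving QPs held on behalf of this domain
        self.closed = False

    def create_recv_qp(self, client):
        if self.closed:
            raise RuntimeError("domain is closed")
        qp = f"{self.name}/qp-for-{client}"
        self.recv_qps.append(qp)
        return qp

    def close(self):
        """Assumed to be the only point at which receiving QPs are released."""
        released = len(self.recv_qps)
        self.recv_qps.clear()
        self.closed = True
        return released


def shared_domain(clients):
    """One long-lived domain: QPs remain after each client disconnects."""
    domain = XrcDomain("shared")
    for client in clients:
        domain.create_recv_qp(client)  # client disconnects later; QP stays
    return len(domain.recv_qps)        # QPs still held at the end


def domain_per_computation(computations):
    """One domain per computation, closed when that computation ends."""
    peak = 0
    for i, clients in enumerate(computations):
        domain = XrcDomain(f"comp{i}")
        for client in clients:
            domain.create_recv_qp(client)
        peak = max(peak, len(domain.recv_qps))
        domain.close()                 # releases this computation's QPs
    return peak                        # most QPs held at any one time
```

In this model the shared domain holds one QP per client that has ever connected, while the per-computation
variant never holds more than the QPs of the computation currently running.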

-- Jack