[ofa-general] [RFC] XRC -- make receiving XRC QP independent of any one user process

Tang, Changqing changquing.tang at hp.com
Wed Jan 2 07:26:55 PST 2008


This interface is OK for me.

Now, every rank on a node that wants to receive messages from the same remote rank must
know the same receiving QP number, and must register for receiving with that QP number.

Suppose rank B does not register (the receiving QP has been created by another rank A on
the node), but the sender knows B's SRQ number. If the sender sends a message to B, can B
still receive it? (I hope not: no register, no receive.)

I would like to hear the opinions of other MPI teams and other XRC users.



--CQ



> -----Original Message-----
> From: Jack Morgenstein [mailto:jackm at dev.mellanox.co.il]
> Sent: Monday, December 31, 2007 5:40 AM
> To: pasha at mellanox.co.il
> Cc: ishai at mellanox.co.il; Gleb Natapov; Roland Dreier; Tang,
> Changqing; general at lists.openfabrics.org
> Subject: Re: [ofa-general] [RFC] XRC -- make receiving XRC QP
> independent of any one user process
>
> > Tang, Changqing wrote:
> > >         If I have an MPI server process on a node, many other MPI
> > > client processes will dynamically connect to and disconnect from
> > > the server. The server uses the same XRC domain.
> > >
> > >         Will this cause "kernel" QPs to accumulate for such an
> > > application? We want the server to run 365 days a year.
> > >
> > > I have a question about the scenario above. Do you call MPI
> > > disconnect on both ends (server/client) before the client
> > > exits? (Must we do so?)
> >
> > Yes, both ends will call disconnect. But for us, MPI_Comm_disconnect()
> > is not a collective call; it is just a local operation.
> >
> > --CQ
> >
> Possible solution (internal review as yet):
>
>   Each user process registers with the XRC QP:
>     a. Each process registers ONCE. If it registers multiple times, there
>        is no reference increment -- rather the registration succeeds, but
>        only one entry per PID is kept for the QP.
>     b. Can have cleanup in the event of a process dying suddenly.
>     c. QP cannot be destroyed while there are any user processes still
>        registered with it.
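
The three rules above can be pictured with a small userspace model. This is
only a sketch under stated assumptions, not the kernel code: the names
rcv_qp_model, model_init, model_register, model_unregister and the MAX_REG
cap are hypothetical, and a PID is represented as a plain int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REG 64                /* hypothetical cap on registrants per QP */

struct rcv_qp_model {             /* hypothetical stand-in for the kernel QP */
    int    pids[MAX_REG];         /* one entry per registered process */
    size_t npids;
    bool   alive;                 /* false once the QP has been destroyed */
};

static void model_init(struct rcv_qp_model *qp)
{
    assert(qp != NULL);
    qp->npids = 0;
    qp->alive = true;
}

/* Rule (a): a repeated registration succeeds but adds no second entry. */
static int model_register(struct rcv_qp_model *qp, int pid)
{
    assert(qp != NULL);
    if (!qp->alive)
        return -1;
    for (size_t i = 0; i < qp->npids; i++)
        if (qp->pids[i] == pid)
            return 0;             /* already registered: no ref increment */
    if (qp->npids == MAX_REG)
        return -1;
    qp->pids[qp->npids++] = pid;
    return 0;
}

/* Rule (c): the QP goes away only when its last registrant leaves.
 * Rule (b): a cleanup path for a dead process could call this as well. */
static int model_unregister(struct rcv_qp_model *qp, int pid)
{
    assert(qp != NULL);
    if (!qp->alive)
        return -1;
    for (size_t i = 0; i < qp->npids; i++) {
        if (qp->pids[i] != pid)
            continue;
        qp->pids[i] = qp->pids[--qp->npids];   /* swap-remove the entry */
        if (qp->npids == 0)
            qp->alive = false;                 /* last one out destroys it */
        return 0;
    }
    return -1;                    /* pid was not registered */
}
```

Because a double registration leaves a single entry, one unregister call
per process is enough to release the QP in this model.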
>
> libibverbs API is as follows:
>
> ======================================================================
> /**
>  * ibv_xrc_rcv_qp_alloc - creates an XRC QP for serving as a receive-side
>  *      only QP, and moves the created QP through the RESET->INIT and
>  *      INIT->RTR transitions. (The RTR->RTS transition is not needed,
>  *      since this QP does no sending.)
>  *      The sending XRC QP uses this QP as destination, while specifying
>  *      an XRC SRQ for actually receiving the transmissions and generating
>  *      all completions on the receiving side.
>  *
>  *      This QP is created in kernel space, and persists until the last
>  *      process registered for the QP calls ibv_xrc_rcv_qp_unregister()
>  *      (at which time the QP is destroyed).
>  *
>  * @pd: protection domain to use.  At lower layer, this provides access
>  *      to userspace obj
>  * @xrc_domain: xrc domain to use for the QP.
>  * @attr: modify-qp attributes needed to bring the QP to RTR.
>  * @attr_mask: bitmap indicating which attributes are provided in the
>  *      attr struct; used for validity checking.
>  * @xrc_rcv_qpn: qp_num of created QP (if success). To be passed to the
>  *      remote node (sender). The remote node will use xrc_rcv_qpn in
>  *      ibv_post_send when sending to XRC SRQs on this host in the same
>  *      xrc domain.
>  *
>  * RETURNS: success (0), or a (negative) error value.
>  *
>  * NOTE: this verb also registers the calling user process with the QP at
>  *       its creation time (implicit call to ibv_xrc_rcv_qp_register), to
>  *       avoid race conditions. The creating process will need to call
>  *       ibv_xrc_rcv_qp_unregister() for the QP to release it from this
>  *       process.
>  */
>
> int ibv_xrc_rcv_qp_alloc(struct ibv_pd *pd,
>                          struct ibv_xrc_domain *xrc_domain,
>                          struct ibv_qp_attr *attr,
>                          enum ibv_qp_attr_mask attr_mask,
>                          uint32_t *xrc_rcv_qpn);
>
> =====================================================================
>
> /**
>  * ibv_xrc_rcv_qp_register: registers a user process with an XRC QP which
>  *         serves as a receive-side only QP.
>  *
>  * @xrc_domain: xrc domain the QP belongs to (for verification).
>  * @xrc_qp_num: The (24 bit) number of the XRC QP.
>  *
>  * RETURNS: success (0),
>  *          or error (-EINVAL), if:
>  *            1. There is no such QP_num allocated.
>  *            2. The QP is allocated, but is not a receive XRC QP
>  *            3. The XRC QP does not belong to the given domain.
>  */
> int ibv_xrc_rcv_qp_register(struct ibv_xrc_domain *xrc_domain,
>                             uint32_t xrc_qp_num);
>
> =====================================================================
> /**
>  * ibv_xrc_rcv_qp_unregister: detaches a user process from an XRC QP
>  *         serving as a receive-side only QP. If as a result, there are no
>  *         remaining userspace processes registered for this XRC QP, it is
>  *         destroyed.
>  *
>  * @xrc_domain: xrc domain the QP belongs to (for verification).
>  * @xrc_qp_num: The (24 bit) number of the XRC QP.
>  *
>  * RETURNS: success (0),
>  *          or error (-EINVAL), if:
>  *            1. There is no such QP_num allocated.
>  *            2. The QP is allocated, but is not an XRC QP
>  *            3. The XRC QP does not belong to the given domain.
>  * NOTE: I don't see any reason to return a special code if the QP is
>  *       destroyed -- the unregister simply succeeds.
>  */
> int ibv_xrc_rcv_qp_unregister(struct ibv_xrc_domain *xrc_domain,
>                               uint32_t xrc_qp_num);
> ======================================================================
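
The three -EINVAL cases listed for ibv_xrc_rcv_qp_register() amount to a
lookup followed by two checks. A minimal sketch of that validation,
assuming a flat table of QPs: struct qp_entry, check_rcv_qp() and the
integer domain_id (standing in for struct ibv_xrc_domain) are hypothetical
names for illustration and are not part of the proposed API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct qp_entry {              /* hypothetical QP table entry */
    uint32_t qpn;              /* 24-bit QP number */
    int      domain_id;        /* stand-in for the owning XRC domain */
    bool     is_xrc_rcv;       /* true for a receive-side XRC QP */
};

/* Returns 0 if qpn names a receive XRC QP in the given domain,
 * otherwise -EINVAL for each of the three documented cases. */
static int check_rcv_qp(const struct qp_entry *tbl, size_t n,
                        int domain_id, uint32_t qpn)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].qpn != qpn)
            continue;
        if (!tbl[i].is_xrc_rcv)
            return -EINVAL;    /* case 2: not a receive XRC QP */
        if (tbl[i].domain_id != domain_id)
            return -EINVAL;    /* case 3: wrong XRC domain */
        return 0;
    }
    return -EINVAL;            /* case 1: no such QP number allocated */
}
```

All three failures map to the same error code, so a caller of this sketch
cannot tell them apart; that matches the RETURNS text as written.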
>
> Usage:
>
> 1. Sender creates an XRC QP (sending QP).
> 2. Sender sends some receiving process on a remote node (say R1) a
>    request to provide an XRC QP and XRC SRQ for receiving messages
>    (the request includes the sending QP number).
> 3. R1 calls ibv_xrc_rcv_qp_alloc() to create a receiving XRC QP in
>    kernel space, and move that QP up to RTR state. This function also
>    registers process R1 with the XRC QP.
> 4. R1 calls ibv_create_xrc_srq() to create an SRQ for receiving
>    messages via the just-created XRC QP.
> 5. R1 responds to the request, providing the XRC QP number and XRC SRQ
>    number to be used in communication.
> 6. Sender then may wish to communicate with another receiving process
>    on the remote host (say R2). It sends a request to R2 containing
>    the remote XRC QP number (obtained from R1) which it will use to
>    send messages.
> 7. R2 creates an XRC SRQ (if one does not already exist for the
>    domain), and also calls ibv_xrc_rcv_qp_register() to register the
>    process R2 with the XRC QP created by R1.
> 8. If R1 no longer needs to communicate with the sender, it calls
>    ibv_xrc_rcv_qp_unregister() for the QP. The QP will not yet be
>    destroyed, since R2 is still registered with it.
> 9. If R2 no longer needs to communicate with the sender, it calls
>    ibv_xrc_rcv_qp_unregister() for the QP. At this point, the QP is
>    destroyed, since no processes remain registered with it.
>
> NOTES:
> 1. The problem of the QP being destroyed and quickly re-allocated does
>    not exist -- the upper bits of the QP number are incremented at each
>    allocation (except for the MSB, which is always 1 for XRC QPs). Thus,
>    even if the same QP is re-allocated, its QP number (stored in the QP
>    object) will be different than expected (unless it is
>    re-destroyed/re-allocated several hundred times).
>
> 2. With this model, we do not need a heartbeat: if a receiving process
>    dies, all XRC QPs it has registered for will be unregistered as part
>    of process cleanup in kernel space.
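
Note 1 can be illustrated with a little bit arithmetic. The RFC does not
give the exact field split, so the 15 slot bits and 8 generation bits used
here are an assumption chosen so that wrap-around occurs after 256
reallocations; make_qpn() and the macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define QPN_BITS 24                          /* QP numbers are 24 bits */
#define IDX_BITS 15                          /* assumed: slot index width */
#define GEN_BITS (QPN_BITS - 1 - IDX_BITS)   /* assumed: 8 generation bits */
#define XRC_MSB  (1u << (QPN_BITS - 1))      /* always set for XRC QPs */

/* Build a QP number from a hardware slot and an allocation counter.
 * The counter occupies the upper bits below the MSB, so reusing a slot
 * yields a different number until the counter wraps. */
static uint32_t make_qpn(uint32_t slot, uint32_t generation)
{
    assert(slot < (1u << IDX_BITS));
    uint32_t gen = generation & ((1u << GEN_BITS) - 1);
    return XRC_MSB | (gen << IDX_BITS) | slot;
}
```

With this split, a stale QP number for a slot would match again only after
that slot has been reallocated 256 times.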
>
> - Jack
>
>


