[openib-general] RFC userspace / MPI multicast support

Hal Rosenstock halr at voltaire.com
Thu Apr 20 04:55:12 PDT 2006


Hi Sean,

On Wed, 2006-04-19 at 15:05, Sean Hefty wrote:
> I'd like to get some feedback regarding the following approach to supporting
> multicast groups in userspace, and in particular for MPI.  Based on side
> conversations, I need to know if this approach would meet the needs of MPI
> developers.
> 
> To join / leave a multicast group,

MC groups also need to be created and deleted. Creation and deletion
are assumed to happen under the covers (first joiner, last leaver), so
the additional MC parameters needed for creation must be available on
all adds.
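
A minimal sketch of the first-joiner / last-leaver behaviour described
above. It is only an illustration: struct mc_params, struct mc_group,
mc_join() and mc_leave() are hypothetical names, not part of the rdma_cm
or of any existing library, and the parameter set is deliberately
incomplete.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical creation parameters; names are illustrative only. */
struct mc_params {
	uint32_t qkey;
	uint16_t pkey;
	uint8_t  sl;
};

/* Hypothetical per-group state shared by all local joiners. */
struct mc_group {
	struct mc_params params;
	unsigned int     members;	/* 0 means the group does not exist */
};

/*
 * Join: the first joiner implicitly creates the group, so every caller
 * has to supply creation parameters because any of them may be first.
 * Returns 1 if this call created the group, 0 if it already existed.
 */
static int mc_join(struct mc_group *grp, const struct mc_params *params)
{
	int created = 0;

	assert(grp != NULL && params != NULL);
	if (grp->members == 0) {
		grp->params = *params;
		created = 1;
	}
	grp->members++;
	return created;
}

/*
 * Leave: the last leaver implicitly deletes the group.
 * Returns 1 if this call deleted the group, 0 otherwise.
 */
static int mc_leave(struct mc_group *grp)
{
	assert(grp != NULL && grp->members > 0);
	if (--grp->members == 0) {
		memset(&grp->params, 0, sizeof(grp->params));
		return 1;
	}
	return 0;
}
```

The point of the sketch is that mc_join() cannot know in advance whether
it is the creating call, which is why the creation parameters have to
travel with every add.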

>  my proposal is to add the following APIs to
> the rdma_cm.  (Note I haven't implemented this yet, so I'm just assuming that
> it's possible at this point.)
> 
> /* Asynchronously join a multicast group. */
> int rdma_set_option(struct rdma_cm_id *id, int level, int optname,
> 			  void *optval, size_t optlen);
> 
> /* Retrieve multicast group information - not usually called. */
> int rdma_get_option(struct rdma_cm_id *id, int level, int optname,
> 			  void *optval, size_t optlen);
> 
> /*
>  * Post a message on the QP associated with the cm_id for the
>  * specified multicast address.
> */
> int rdma_sendto(struct rdma_cm_id *id, struct ibv_send_wr *send_wr,
> 		    struct sockaddr *to);
> 
> ---
> 
> As an example of how these APIs would be used:
> 
> /* The cm_id provides event handling and context. */
> rdma_create_id(&id, context);
> 
> /* Bind to a local interface to attach to a local device. */
> rdma_bind_addr(id, local_addr);
> 
> /* Allocate a PD, CQs, etc. */
> pd = ibv_alloc_pd(id->verbs);
> ...
> 
> /*
>  * Create a UD QP associated with the cm_id.
>  * TBD: automatically transition the QP to RTS for UD QP types?
>  */
> rdma_create_qp(id, pd, init_attr);
> 
> /* Bind to multicast group. */
> mcast_ip = 224.0.0.74.71; /* some fine mcast addr */

How are the MGIDs formed from this IP address? Is the same algorithm as
IPoIB used?

Are the MGIDs constrained to use 0x401B in the signature part (and
0x601B if this is extended to IPv6)?
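
For reference, a sketch of the IPoIB-style IPv4 mapping these questions
refer to, assuming the RFC 4391 layout (0xFF, flags/scope, 0x401B
signature, P_Key, then the low 28 bits of the IPv4 group address). The
function name ipv4_to_mgid() and the macro IPOIB_SIG_IPV4 are
hypothetical, and whether the rdma_cm would use this exact mapping is
precisely what is being asked.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IPOIB_SIG_IPV4 0x401B	/* IPv4 signature discussed above */

/*
 * Build a 16-byte MGID from an IPv4 multicast address (host byte order),
 * a P_Key and a scope, following the IPoIB layout as assumed here:
 *   byte 0      0xFF
 *   byte 1      flags (high nibble) and scope (low nibble)
 *   bytes 2-3   signature
 *   bytes 4-5   P_Key
 *   bytes 6-15  group ID; only the low 28 bits are used for IPv4
 */
static void ipv4_to_mgid(uint32_t group, uint16_t pkey, uint8_t scope,
			 uint8_t mgid[16])
{
	assert(mgid != NULL);
	memset(mgid, 0, 16);
	mgid[0]  = 0xFF;
	mgid[1]  = 0x10 | (scope & 0x0F);	/* transient flag set */
	mgid[2]  = IPOIB_SIG_IPV4 >> 8;
	mgid[3]  = IPOIB_SIG_IPV4 & 0xFF;
	mgid[4]  = pkey >> 8;
	mgid[5]  = pkey & 0xFF;
	mgid[12] = (group >> 24) & 0x0F;	/* top 4 bits of the address dropped */
	mgid[13] = (group >> 16) & 0xFF;
	mgid[14] = (group >> 8) & 0xFF;
	mgid[15] = group & 0xFF;
}
```

Under these assumptions, 224.0.0.74 with P_Key 0xFFFF and link-local
scope (2) maps to ff12:401b:ffff:0000:0000:0000:0000:004a.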

BTW, this example has too many bytes...

> ip_mreq.imr_multiaddr = mcast_ip.in_addr;
> rdma_set_option(id, RDMA_PROTO_IP, IP_ADD_MEMBERSHIP, &ip_mreq,
> 		    sizeof(ip_mreq));

The API only supports ADD/DROP. It lacks support for JoinStates.
(I don't think the IP semantics are rich enough for IB; this was
previously pointed out in the context of IP routers quite a while ago).

> /* Wait for join to complete. */
> rdma_get_cm_event(&event);
> if (event->event == RDMA_CM_EVENT_JOIN_COMPLETE)
> 	/* join worked - we could call rdma_get_option() here */
> 	/* The rdma_cm attached the QP to the multicast group for us. */
> ...
> rdma_ack_cm_event(event);
> 
> /*
>  * Format a send wr.  The ah, remote_qpn, and remote_qkey are
>  * filled out by the rdma_cm based on the provided destination
>  * address.
>  */
> rdma_sendto(id, send_wr, &mcast_ip);
> 
> ---
> 
> The multicast group information is created / managed by the rdma_cm.  The
> rdma_cm defines the mgid, q_key, p_key, sl, flowlabel, tclass, and joinstate.
> Except for mgid, these would most likely match the values used by the ipoib
> broadcast group.  The mgid mapping would be similar to that used by ipoib.

Does that limit the MGIDs to using IP signatures?

-- Hal

>   The
> actual MCMember record would be available to the user by calling
> rdma_get_option.

> I don't believe that there would be any restriction on the use of the QP that is
> attached to the multicast group, but it would take more work to support more
> than one multicast group per QP.  The purpose of the rdma_sendto() routine is to
> map a given IP address to an allocated address handle and Qkey.  At this point,
> rdma_sendto would only work for multicast addresses that have been joined by the
> user.
> 
> If a user wanted more control over the multicast group, we could support a call
> such as:
> 
> struct ib_mreq {
> 	struct ib_sa_mcmember_rec	rec;
> 	ib_sa_comp_mask			comp_mask;
> };
> 
> rdma_set_option(id, RDMA_PROTO_IB, IB_ADD_MEMBERSHIP, &ib_mreq,
> 		    sizeof(ib_mreq));
> 
> Thoughts?
> 
> - Sean