[openib-general] RDMA CM multicast

Andrew Friedley afriedle at open-mpi.org
Wed Jan 24 09:39:38 PST 2007


Sean Hefty wrote:
> Posting to openib-general list...
> 
>> RDMA CM has multicast of course, though it seems to provide no means of
>> preventing address collisions (to me, that means two separate MPI jobs using the
>> same multicast address).  I know that part of the new multicast support
>> you had developed a few months ago was the ability to specify a '0'
>> MGID/MLID to indicate that an unused multicast address should be used
>> and returned.
>>
>> How hard would it be to add this functionality to RDMA CM?
> 
> I looked into this, and it seems doable.  I hacked the kernel rdma_cm to join a
> multicast group with an mgid of 0, and it seemed to work as far as I could test
> it without more extensive changes.  (My test didn't actually transfer data, but
> the join succeeded, the MGID/MLID was exported to userspace, and different
> applications joined different groups.)
> 
> What would be needed is a way for the user to indicate that they need a unique
> address.  An obvious way to accomplish this is for the user to specify an IP
> address of 0.0.0.0 when calling rdma_join_multicast().  The user would first
> need to bind to a specific device by calling rdma_bind_addr() with a local IP
> address.

Fine with me -- I would be using rdma_bind_addr() all the time anyway. 
Though finding a way to remove this requirement might be useful to 
someone else.
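
For reference, the binding step on my end would look roughly like this (an
untested librdmacm sketch; the local address is a placeholder and error
handling on inet_pton is omitted):

#include <string.h>
#include <arpa/inet.h>
#include <rdma/rdma_cma.h>

/* Create a cm_id and bind it to a specific local IP before any joins. */
static int bind_local(struct rdma_event_channel *ch, struct rdma_cm_id **id)
{
    struct sockaddr_in sin;

    if (rdma_create_id(ch, id, NULL, RDMA_PS_UDP))
        return -1;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    inet_pton(AF_INET, "10.0.0.1", &sin.sin_addr);  /* placeholder local IP */

    return rdma_bind_addr(*id, (struct sockaddr *) &sin);
}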

> If more than one group is joined this way, then rdma_leave_multicast() would
> need some way to distinguish between the different groups joined by a single
> user.  (rdma_leave_multicast takes the IP address of the group to leave.)
> Providing a "port number" with the sockaddr would work.  The port number would
> need to match when joining/leaving, but is not part of the multicast address,
> essentially making it a join index specified by the user.
> 
> Your code would look something like this:
> 
> rdma_bind_addr(local IP address)
> rdma_join_multicast(0.0.0.0, port 0)	<- exchange group info out of band
> rdma_join_multicast(0.0.0.0, port 1)	<- exchange group info out of band
> send data to a lot of nodes at once
> rdma_leave_multicast(0.0.0.0, port 0)
> rdma_leave_multicast(0.0.0.0, port 1)
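
For concreteness, I read that as translating into librdmacm calls roughly
like the following (an untested sketch, assuming the proposed wildcard
address and port-as-join-index semantics; 'id' is a cm_id already bound
with rdma_bind_addr() as above):

#include <string.h>
#include <netinet/in.h>
#include <rdma/rdma_cma.h>

/* 'id' has already been bound to a local address with rdma_bind_addr(). */
static void join_leave_two_groups(struct rdma_cm_id *id)
{
    struct sockaddr_in grp;

    memset(&grp, 0, sizeof grp);
    grp.sin_family = AF_INET;
    grp.sin_addr.s_addr = htonl(INADDR_ANY);  /* 0.0.0.0: pick an unused group */

    grp.sin_port = htons(0);                  /* "port" acts as a join index */
    rdma_join_multicast(id, (struct sockaddr *) &grp, NULL);

    grp.sin_port = htons(1);
    rdma_join_multicast(id, (struct sockaddr *) &grp, NULL);

    /* ... exchange group info out of band, send data ... */

    grp.sin_port = htons(0);
    rdma_leave_multicast(id, (struct sockaddr *) &grp);

    grp.sin_port = htons(1);
    rdma_leave_multicast(id, (struct sockaddr *) &grp);
}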

Not sure I understand this -- RDMA CM will be exchanging whatever 
multicast address is selected as part of the rdma_join_multicast call? 
How exactly do you plan to do that -- i.e., how do you know who to 
communicate that information to?  Unless I'm missing something, it 
sounds like this approach just moves the address collision problem over 
to port numbers, which wouldn't really accomplish anything.

What I would have expected is a means to get at the address that gets 
chosen in place of 0.0.0.0 -- either returned directly from 
rdma_join_multicast, delivered with the successful join notification, or 
made available in a data structure somewhere.  From there it's trivial 
(for me) to pass the address to the other peers I care about and have 
them join by explicitly specifying the multicast address.
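
Something along these lines is what I had in mind -- read the group back
out of the join notification (an untested sketch; I'm assuming the chosen
MGID is carried in the UD address handle attributes on the join event):

#include <rdma/rdma_cma.h>

/* Block until the multicast join completes and return the MGID that was
 * actually joined, so it can be handed to the other peers out of band. */
static int get_joined_mgid(struct rdma_event_channel *ch, union ibv_gid *mgid)
{
    struct rdma_cm_event *event;

    if (rdma_get_cm_event(ch, &event))
        return -1;

    if (event->event != RDMA_CM_EVENT_MULTICAST_JOIN) {
        rdma_ack_cm_event(event);
        return -1;
    }

    *mgid = event->param.ud.ah_attr.grh.dgid;  /* group actually joined */

    rdma_ack_cm_event(event);
    return 0;
}

From there the 16-byte GID can go into whatever out-of-band exchange the
job already does.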

Andrew

> If this sounds like it would work for you, let me know, and I can create a patch
> to test this idea more.
> 
> - Sean