[openib-general] FW: [PATCH] [RFC] librdmacm: expose device list to users

Andrew Friedley afriedle at open-mpi.org
Wed Jul 26 08:46:22 PDT 2006


Sean Hefty wrote:
> I was trying to ask if there was any way for the processes to generate unique
> addresses.  For example, what TCP port number do the processes listen on when
> establishing their out of band connections?  Is there some way that you can map
> the addresses that are used for out of band communication to a multicast IP
> address, such that the processes get unique addresses?  From reading down into
> your mail, it doesn't sound like this would help much.

Not without breaking many layers of abstraction.  Although TCP is all we
support for OOB right now, the framework is in place to support
other (non-TCP/IP) protocols in the future.

I'm asking some of our runtime developers whether there's anything I could
use, but it doesn't look like it right now.

> I think the same basic API can be exposed in userspace.  It may be possible to
> expose a couple of extra helper functions to simplify creating and joining a
> group, but I'm not sure if they will be worth it.

The existing interface seems reasonable - I don't see how adding extra 
functions would improve anything.

> This doesn't end up working well for userspace apps.  To get a callback, the
> library ends up needing to create a thread to poll for events from the kernel.
> It makes more sense to give the application control over the threading, and let
> it poll for the events.

I figured you would say that.  So this would be a polling interface
separate from both the CQ and what the RDMA CM provides?
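
To make the model in question concrete, here is a minimal sketch of
application-driven polling, where the application (not a library thread)
decides when to wait for events.  Everything here is an assumption for
illustration: struct mcast_event and wait_for_event() are hypothetical
names, and an ordinary file descriptor stands in for whatever channel the
library would actually expose.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical event record; a real library would define its own. */
struct mcast_event {
    int status;
};

/*
 * Wait up to timeout_ms for one event on event_fd.
 * Returns 1 if an event was read, 0 on timeout, -1 on error.
 * The caller picks the thread and the timeout; nothing runs in
 * the background on its behalf.
 */
int wait_for_event(int event_fd, int timeout_ms, struct mcast_event *ev)
{
    struct pollfd pfd = { .fd = event_fd, .events = POLLIN };
    int ret;

    assert(ev != NULL);

    ret = poll(&pfd, 1, timeout_ms);
    if (ret <= 0)
        return ret;               /* 0: timeout, -1: poll() failed */
    if (!(pfd.revents & POLLIN))
        return -1;                /* hangup or error on the descriptor */

    /* Assumption: each event arrives as one fixed-size record. */
    if (read(event_fd, ev, sizeof(*ev)) != (ssize_t)sizeof(*ev))
        return -1;
    return 1;
}
```

With something like this, the descriptor could be added to the same
poll() set the application already uses for its other descriptors, so no
extra thread would be needed.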

> Well, after looking at the code, an MGID of 0 doesn't currently work.  The
> implementation doesn't handle it.  I worked on a design to add support for MGID
> 0 to the multicast module, and will start on it in the next day or so.

Okay, I look forward to seeing the patch.

> Another thought I had is to allow ib_get_mcmember_rec() be called with an MGID
> of 0.  Doing so would return an MCMemberRecord with reasonable default values
> that could be used when creating a group.  (The returned values would either be
> hard-coded or copy those from the first join on a given port, if one had
> occurred.  In almost all cases, the first join would come from ipoib.)

This would be very good - it would allow for adjusting such values 
before the group is actually joined.

I see a possible race condition though - consider two processes calling
ib_get_mcmember_rec().  Both calls return before either process can
call ib_join_multicast() and create the multicast group.  Is it possible
for the same MGID to be returned from ib_get_mcmember_rec() in this
scenario?

> There is no way to do this.  Note that there may be a delay between a node
> joining a group and the programming of the switch tables.

Thought I'd try.  Are you saying that a completed join doesn't
imply the network is fully ready to handle multicast messages for
that group?

Andrew



