[openib-general] RDMA CM multicast
Andrew Friedley
afriedle at open-mpi.org
Fri Jan 26 19:47:00 PST 2007
> Once this routing is in place, the only thing they need is to enhance
> the MPI job starter/etc to allocate to each job (say) two unique
> multicast --IP-- addresses on the relevant subnet and provide these IP
> addresses to each rank. Now the rank can use the RDMA CM without any
> hack.
I don't think this is as easy as you've made it sound. I see two
approaches to preventing address collision -- both require voluntary
participation. First is a centralized authority approach (this has been
used for IP multicast-based protocols). This means running some sort of
daemon in a location all peers can communicate with. I'm not really
keen on the idea of requiring a separate daemon just to support
multicast in Open MPI. Second is peer-to-peer based approaches. These
are doable, but difficult due to numerous race conditions. It's also
highly desirable to minimize the time cost of joining a multicast
group; this is especially difficult with a peer-to-peer solution.
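To make the contrast concrete, here is a minimal sketch of the centralized-authority idea: a single allocator that every peer asks for multicast addresses, so collisions are impossible by construction (the race conditions only appear once you try to do this peer-to-peer, without the single authority). All names here are hypothetical illustrations, not Open MPI code; the 239.255.0.0 range is the administratively scoped IPv4 multicast block.

```python
import ipaddress

class MulticastAllocator:
    """Hypothetical core of a centralized allocation daemon: hands out
    unique multicast addresses from a fixed pool, one requester at a time."""

    def __init__(self, base="239.255.0.0", count=256):
        # Carve a pool out of the administratively scoped multicast range.
        start = int(ipaddress.IPv4Address(base))
        self.free = [ipaddress.IPv4Address(start + i) for i in range(count)]
        self.in_use = set()

    def allocate(self):
        # Because one authority serves all requests, no two callers can
        # ever receive the same address.
        addr = self.free.pop(0)
        self.in_use.add(addr)
        return addr

    def release(self, addr):
        # Returned addresses go back in the pool for reuse.
        self.in_use.discard(addr)
        self.free.append(addr)

alloc = MulticastAllocator()
a = alloc.allocate()
b = alloc.allocate()
assert a != b  # distinct ranks get distinct groups
```

The catch, of course, is exactly the one raised above: this only works if the daemon is reachable by all peers, and every join now pays a round-trip to it.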
Also, I'd rather not assume a single MPI job requires a constant (small)
number of multicast groups/addresses. The obvious mapping is one
multicast group per MPI communicator. Most applications will
use only a few, though some may use hundreds, and may even vary the
number in use as the app executes. I've also been considering
approaches utilizing many groups per communicator, so again we could be
looking at hundreds of multicast groups per MPI job.
As I've said, implementing solutions at the MPI level is doable but
difficult. I knew from earlier discussions that IB is able to allocate
new, unused multicast addresses, and was hoping to expose that
functionality and avoid the multicast address allocation problem.
However, I hadn't considered that other networks supported by the RDMA
CM might not have similar functionality, so this might not be
appropriate there. But maybe it is worth considering how hard it would
be for those other networks to provide the functionality?
Andrew