[ofa-general] Limited number of multicast groups that can be joined?

Andrew Friedley afriedle at open-mpi.org
Fri Jun 8 11:05:33 PDT 2007


I've run into a problem where it appears that I cannot join more than 14 
multicast groups from a single HCA.  I'm using the RDMA CM UD/multicast 
interface from an OFED v1.2 nightly build, and using a '0' address when 
joining to have the SM allocate an unused address.  The first 14 
rdma_join_multicast() calls succeed: a MULTICAST_JOIN event comes 
through for each of them and everything works.  But the 15th call to 
rdma_join_multicast() returns -1 and sets errno to 99, 'Cannot assign 
requested address'.

Note that I'm using a single QP per process to do all the joins.  Things 
get weirder if I run two instances of my program on the same node -- as 
soon as the total between the two instances reaches 14, neither instance 
can join any more groups.  Also, right now my code hangs when this happens 
-- if I kill off one of the two instances and run a third instance 
(while leaving the other hung, holding some number of groups), the third 
instance is not able to join ANY groups.  The behavior resets when I 
kill all instances.

Two instances running on separate nodes (on the same network) do not 
appear to interfere with each other as described above; they do still 
error out on the 15th join.

This feels like a bug to me; regardless, this limit is WAY too 
low.  Any ideas what might be going on, or how I can work around it?

Andrew


