[ofa-general] Limited number of multicasts groups that can be joined?

Hal Rosenstock halr at voltaire.com
Thu Jun 28 14:33:11 PDT 2007


On Thu, 2007-06-28 at 17:23, Sean Hefty wrote:
> > Now the more interesting part.  I'm now able to run on a 128-node
> > machine using OpenSM running on a node (before, I was running on an
> > 8-node machine which I'm told is running the Cisco SM on a Topspin
> > switch).  On this machine, if I run my benchmark with two processes per
> > node (instead of one, i.e. mpirun -np 16 with 8 nodes), I'm able to join
> > more than 750 groups simultaneously from one QP on each process.  To
> > make this stranger, I can join only 4 groups running the same thing on
> > the 8-node machine.
> 
> Are the switches and HCAs in the two setups the same?  If you run the 
> same SM on both clusters, do you see the same results?
> 
> > While doing so I noticed that the time from calling
> > rdma_join_multicast() to the event arrival stayed fairly constant (in
> > the 0.001 sec range), while the time from the join call to actually
> > receiving messages on the group steadily increased from around 0.1 secs
> > to around 2.7 secs with 750+ groups.  Furthermore, this time does not
> > drop back to 0.1 secs if I stop the benchmark and run it (or any of my
> > other multicast code) again.  This is understandable within a single
> > program run, but the fact that the behavior persists across runs concerns
> > me -- it feels like a bug, but I don't have much concrete evidence here.
> 
> Even after all nodes leave all multicast groups, I don't believe that 
> there's a requirement for the SA to reprogram the switches immediately. 

Right, that is allowed to be "lazy".

Nit: it's the SM rather than the SA that reprograms the switches, but the
SA multicast leave is what initiates this process.
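
A toy model of that split, purely as an illustration: the class and
method names below are made up, and this is not how OpenSM (or any real
SM) is implemented. It assumes the SA only records membership and marks
the fabric as changed, while the SM updates the switch tables on some
later sweep, so stale entries can outlive the last leave.

```python
class ToyFabric:
    """Hypothetical model: SA tracks membership, SM programs switches lazily."""

    def __init__(self):
        self.sa_members = {}        # group id -> set of member ports (SA's view)
        self.switch_groups = set()  # groups present in the switch tables
        self.dirty = False          # set when the SA sees a join or leave

    def sa_join(self, mgid, port):
        # The SA records the member; the switches are not touched here.
        self.sa_members.setdefault(mgid, set()).add(port)
        self.dirty = True

    def sa_leave(self, mgid, port):
        # The SA drops the member; the switch entry stays until a sweep.
        members = self.sa_members.get(mgid, set())
        members.discard(port)
        if not members:
            self.sa_members.pop(mgid, None)
        self.dirty = True

    def sm_sweep(self):
        # The SM brings the switch tables in line with the SA's view.
        if self.dirty:
            self.switch_groups = set(self.sa_members)
            self.dirty = False


def stale_entries_demo(num_groups=750):
    """Return (entries left right after all leaves, entries after a sweep)."""
    fabric = ToyFabric()
    for mgid in range(num_groups):
        fabric.sa_join(mgid, "port-1")
    fabric.sm_sweep()
    for mgid in range(num_groups):
        fabric.sa_leave(mgid, "port-1")
    before_sweep = len(fabric.switch_groups)
    fabric.sm_sweep()
    return before_sweep, len(fabric.switch_groups)
```

In this model, a second benchmark run started before the next sweep
would still find the old groups in the switch tables.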

-- Hal

>   So if the switches or the configuration of the switches are part of 
> the problem, I can imagine seeing issues between runs.
> 
> When rdma_join_multicast() reports the join event, it means either: the 
> SA has been notified of the join request, or, if the port has already 
> joined the group, that a reference count on the group has been 
> incremented.  The SA may still require time to program the switch 
> forwarding tables.
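
The two cases in the paragraph above can be sketched with a small toy
model. This is not the librdmacm API: ToyPort and its fields are
hypothetical names, and the model only covers the join side as
described. The first join of a group on a port produces a request to
the SA; later joins of the same group only bump a reference count.
Either way the caller gets its join event, which says nothing about
whether the switches have been programmed yet.

```python
class ToyPort:
    """Hypothetical per-port join bookkeeping (illustration only)."""

    def __init__(self):
        self.refcount = {}     # group id -> number of local joins
        self.sa_requests = []  # join requests that would be sent to the SA

    def join(self, mgid):
        count = self.refcount.get(mgid, 0)
        if count == 0:
            # First join on this port: the SA has to be notified.
            self.sa_requests.append(("join", mgid))
        # Already joined or not, the local reference count goes up.
        self.refcount[mgid] = count + 1
        # The caller sees a join event in both cases.
        return "join_event"


port = ToyPort()
first = port.join("group-a")   # goes to the SA
second = port.join("group-a")  # reference count only
```

With two processes per node joining the same groups, only the first
join per group on each port would reach the SA in this model.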
> 
> - Sean