[ofa-general] Limited number of multicasts groups that can be joined?
Hal Rosenstock
halr at voltaire.com
Thu Jun 28 14:33:11 PDT 2007
On Thu, 2007-06-28 at 17:23, Sean Hefty wrote:
> > Now the more interesting part. I'm now able to run on a 128 node
> > machine using OpenSM running on a node (before, I was running on an 8
> > node machine which I'm told is running the Cisco SM on a Topspin
> > switch). On this machine, if I run my benchmark with two processes per
> > node (instead of one, i.e. mpirun -np 16 with 8 nodes), I'm able to join
> > more than 750 groups simultaneously from one QP on each process. To make this
> > stranger, I can join only 4 groups running the same thing on the 8-node
> > machine.
>
> Are the switches and HCAs in the two setups the same? If you run the
> same SM on both clusters, do you see the same results?
>
> > While doing so I noticed that the time from calling
> > rdma_join_multicast() to the event arrival stayed fairly constant (in
> > the .001sec range), while the time from the join call to actually
> > receiving messages on the group steadily increased from around .1 secs
> > to around 2.7 secs with 750+ groups. Furthermore, this time does not
> > drop back to .1 secs if I stop the benchmark and run it (or any of my
> > other multicast code) again. This is understandable within a single
> > program run, but the fact that behavior persists across runs concerns me
> > -- feels like a bug, but I don't have much concrete here.
>
> Even after all nodes leave all multicast groups, I don't believe that
> there's a requirement for the SA to reprogram the switches immediately.
Right, that is allowed to be "lazy".
Nit: it's the SM rather than the SA that reprograms the switches, but the SA
multicast leaves are what initiate this process.
-- Hal
> So if the switches or the configuration of the switches are part of
> the problem, I can imagine seeing issues between runs.
>
> When rdma_join_multicast() reports the join event, it means either: the
> SA has been notified of the join request, or, if the port has already
> joined the group, that a reference count on the group has been
> incremented. The SA may still require time to program the switch
> forwarding tables.
>
> - Sean
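
As an illustration of the join semantics Sean describes above, here is a
toy model in C of per-port, reference-counted group membership. It is
only a sketch and not librdmacm or kernel code: the names mc_table,
mc_join, mc_leave and the fixed table size are hypothetical, and the
leave behavior (only the last leave reaches the SA) is an assumption
made for symmetry with the join case. Note that neither return value
says anything about whether the switch forwarding tables have been
programmed yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MC_MAX_GROUPS 16   /* arbitrary size for this toy model */

enum mc_join_result {
    MC_JOIN_SENT_TO_SA,    /* first join on this port: SA must be notified */
    MC_JOIN_REF_ONLY,      /* port already a member: only the count changes */
    MC_JOIN_TABLE_FULL     /* no free slot in the toy table */
};

struct mc_entry {
    unsigned char mgid[16];   /* multicast GID of the group */
    unsigned refcount;        /* 0 means the slot is free */
};

struct mc_table {
    struct mc_entry entries[MC_MAX_GROUPS];
};

/* Return the entry for mgid if this port is already a member, else NULL. */
static struct mc_entry *mc_find(struct mc_table *t,
                                const unsigned char mgid[16])
{
    for (size_t i = 0; i < MC_MAX_GROUPS; i++) {
        struct mc_entry *e = &t->entries[i];
        if (e->refcount > 0 && memcmp(e->mgid, mgid, 16) == 0)
            return e;
    }
    return NULL;
}

/* Join a group: bump the count if already joined, else claim a slot. */
static enum mc_join_result mc_join(struct mc_table *t,
                                   const unsigned char mgid[16])
{
    assert(t != NULL && mgid != NULL);
    struct mc_entry *e = mc_find(t, mgid);
    if (e) {
        e->refcount++;
        return MC_JOIN_REF_ONLY;
    }
    for (size_t i = 0; i < MC_MAX_GROUPS; i++) {
        if (t->entries[i].refcount == 0) {
            memcpy(t->entries[i].mgid, mgid, 16);
            t->entries[i].refcount = 1;
            return MC_JOIN_SENT_TO_SA;
        }
    }
    return MC_JOIN_TABLE_FULL;
}

/* Leave a group. Returns 1 if this was the last reference (assumed to be
 * the point where a leave goes to the SA), 0 if other references remain,
 * and -1 if the port was not a member. */
static int mc_leave(struct mc_table *t, const unsigned char mgid[16])
{
    assert(t != NULL && mgid != NULL);
    struct mc_entry *e = mc_find(t, mgid);
    if (!e)
        return -1;
    e->refcount--;
    return e->refcount == 0 ? 1 : 0;
}
```

With a zeroed mc_table, the first mc_join() for a given MGID returns
MC_JOIN_SENT_TO_SA and a second one for the same MGID returns
MC_JOIN_REF_ONLY, which corresponds to the two cases in which the join
event is reported.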