[ofa-general] Limited number of multicasts groups that can be joined?
Sean Hefty
mshefty at ichips.intel.com
Thu Jun 28 14:23:02 PDT 2007
> Now the more interesting part. I'm now able to run on a 128 node
> machine using open SM running on a node (before, I was running on an 8
> node machine which I'm told is running the Cisco SM on a Topspin
> switch). On this machine, if I run my benchmark with two processes per
> node (instead of one, i.e. mpirun -np 16 with 8 nodes), I'm able to join
> more than 750 groups simultaneously from one QP on each process. To make this
> stranger, I can join only 4 groups running the same thing on the 8-node
> machine.
Are the switches and HCAs in the two setups the same? If you run the
same SM on both clusters, do you see the same results?
> While doing so I noticed that the time from calling
> rdma_join_multicast() to the event arrival stayed fairly constant (in
> the .001sec range), while the time from the join call to actually
> receiving messages on the group steadily increased from around .1 secs
> to around 2.7 secs with 750+ groups. Furthermore, this time does not
> drop back to .1 secs if I stop the benchmark and run it (or any of my
> other multicast code) again. This is understandable within a single
> program run, but the fact that behavior persists across runs concerns me
> -- feels like a bug, but I don't have much concrete here.
Even after all nodes leave all multicast groups, I don't believe that
there's a requirement for the SA to reprogram the switches immediately.
So if the switches or their configuration are part of the problem, I
can imagine seeing issues persist between runs.
When rdma_join_multicast() reports the join event, it means one of two
things: either the SA has been notified of the join request, or, if the
port had already joined the group, a reference count on the group has
been incremented. In either case, the SA may still need time to program
the switch forwarding tables before traffic actually flows.
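To make the distinction concrete, here is a minimal sketch of the join
flow with librdmacm (link with -lrdmacm). It assumes the rdma_cm_id has
already been created on an event channel and the multicast address has
been resolved; QP setup and error details are omitted. The comment marks
exactly what the join event does and does not guarantee:

```c
/* Hedged sketch, not a complete program: assumes "id" was created with
 * rdma_create_id() on an event channel and bound/resolved, and that
 * "mcast_addr" already holds the multicast group address. */
#include <rdma/rdma_cma.h>
#include <stdio.h>

static int join_group(struct rdma_cm_id *id, struct sockaddr *mcast_addr)
{
	struct rdma_cm_event *event;

	if (rdma_join_multicast(id, mcast_addr, NULL))
		return -1;

	if (rdma_get_cm_event(id->channel, &event))
		return -1;

	if (event->event != RDMA_CM_EVENT_MULTICAST_JOIN) {
		rdma_ack_cm_event(event);
		return -1;
	}

	/* At this point the SA has accepted the request (or a local
	 * reference count was bumped).  It does NOT mean the switch
	 * forwarding tables are programmed yet, so messages sent to the
	 * group immediately after this event may still be dropped. */
	printf("joined: qpn 0x%x qkey 0x%x\n",
	       event->param.ud.qp_num, event->param.ud.qkey);
	rdma_ack_cm_event(event);
	return 0;
}
```

The gap you measured between the join event and the first received
message corresponds to the window noted in the comment above.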
- Sean