[ofa-general] Re: multicast join failed for...
Michael S. Tsirkin
mst at dev.mellanox.co.il
Thu Apr 12 23:17:12 PDT 2007
> Quoting Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: [ofa-general] Re: multicast join failed for...
>
> >When the node is diagnosed and disconnected, SM will bring the rate
> >back up.
>
> But how? Doesn't it require re-registration of all multicast groups and
> clients registered for SA events?
The SA detects that the rate can be increased and sends another reregister MAD.
> >As I said, there are tens of ways a bad node can hurt performance,
> >and we don't/can't handle them. Why focus on ipoib? It's
> >the only way to connect to node on some fabrics, it
> >really must be up at all times.
>
> But the solution is affecting all multicast traffic, not just that
> related to ipoib. If you want all nodes to be able to join the ipoib
> multicast group, why not just create the group at the lower rate?
If the group is created at a lower rate, there would be no problem.
But the default configuration should be "plug and play".
> ipoib multicast performance doesn't seem that critical.
This is a policy that can be made optional, but should not
be forced on users by default.
> Whereas disrupting
> other multicast groups, which could actively be in use by MPI, may be.
The disruption would be very minor: it would happen at most once when the
rate changes from DDR to SDR, and once when it changes back.
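
To illustrate the "at most once each way" point, here is a rough sketch of
the behaviour being discussed. It is a toy model, not OpenSM code: the
Group class, its join/leave methods and the reregisters counter are all
hypothetical names, and it assumes the group rate simply follows the
slowest current member (4x SDR = 10 Gb/s, 4x DDR = 20 Gb/s).

```python
# Toy model only (assumption: group rate = rate of the slowest member).
# Rates are 4x link signalling rates in Gb/s.
RATES = {"SDR": 10, "DDR": 20}


class Group:
    def __init__(self):
        self.members = {}      # node name -> rate in Gb/s
        self.rate = None       # current group rate, None while empty
        self.reregisters = 0   # reregister requests sent so far

    def _update(self):
        new_rate = min(self.members.values()) if self.members else None
        # Only a rate change on an existing group makes clients rejoin.
        if self.rate is not None and new_rate is not None \
                and new_rate != self.rate:
            self.reregisters += 1
        self.rate = new_rate

    def join(self, node, rate):
        self.members[node] = RATES[rate]
        self._update()

    def leave(self, node):
        self.members.pop(node, None)
        self._update()


g = Group()
g.join("a", "DDR")
g.join("b", "DDR")
g.join("slow", "SDR")   # rate drops DDR -> SDR: first reregister
g.join("c", "DDR")      # rate unchanged, nothing sent
g.leave("slow")         # rate goes back SDR -> DDR: second reregister
print(g.rate, g.reregisters)
```

In this model other joins and leaves cost nothing; only the two rate
transitions trigger a reregister.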
--
MST