[ofa-general] Re: multicast join failed for...

Michael S. Tsirkin mst at dev.mellanox.co.il
Thu Apr 12 23:17:12 PDT 2007


> Quoting Sean Hefty <sean.hefty at intel.com>:
> Subject: RE: [ofa-general] Re: multicast join failed for...
> 
> >When the node is diagnosed and disconnected, SM will bring the rate
> >back up.
> 
> But how?  Doesn't it require re-registration of all multicast groups and
> clients registered for SA events?

SA detects that the rate can be increased and sends another reregister MAD.

> >As I said, there are tens of ways a bad node can hurt performance,
> >and we don't/can't handle them. Why focus on ipoib? It's
> >the only way to connect to node on some fabrics, it
> >really must be up at all times.
> 
> But the solution is affecting all multicast traffic, not just that
> related to ipoib.  If you want all nodes to be able to join the ipoib
> multicast group, why not just create the group at the lower rate?

If the group is created at a lower rate, there would be no problem.
But the default configuration should be "plug and play".

> ipoib multicast performance doesn't seem that critical.

This is a policy that can be made optional, but should not
be forced on users by default.

> Whereas disrupting
> other multicast groups, which could actively be in use by MPI, may be. 

The disruption would be very minor: it would happen at most once when
the rate changes from DDR to SDR, and once when it changes back.


-- 
MST


