[openib-general] Re: Failed multicast join with new multicast module
Sean Hefty
mshefty at ichips.intel.com
Wed Jun 7 11:25:08 PDT 2006
Sean Hefty wrote:
> The multicast module should work in this specific case, since the only
> client is ipoib, and ipoib first leaves the group before re-joining.
I think that there's a race here. If ipoib leaves and then re-joins quickly
enough, the join request will be processed before the leave. As a result, the
join will be fulfilled locally, without an additional MAD being sent. (Trying to
process the leave immediately doesn't fix the problem in the generic case, where
there may be multiple users of a group.)
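A minimal Python model of this race follows. It assumes, hypothetically, that the module reference-counts local users of a group, sends a join MAD only for the first local user, and sends a leave MAD only for the last. `LocalGroup` and `process` are illustrative names, not the multicast module's API.

```python
class LocalGroup:
    """Hypothetical model of one multicast group inside the local module."""

    def __init__(self):
        self.users = 0        # local users currently joined
        self.mads_sent = []   # MADs that would have gone to the SA

    def join(self):
        # Only the first local user causes a join MAD; later joins are
        # fulfilled locally from the existing membership.
        if self.users == 0:
            self.mads_sent.append("join")
        self.users += 1

    def leave(self):
        self.users -= 1
        # Only the last local user causes a leave MAD.
        if self.users == 0:
            self.mads_sent.append("leave")


def process(order):
    """Process ipoib's leave and re-join in the given order; return MADs sent."""
    group = LocalGroup()
    group.join()              # ipoib's existing membership
    group.mads_sent.clear()   # only track MADs caused by the leave/re-join
    for request in order:
        getattr(group, request)()
    return group.mads_sent


print(process(["leave", "join"]))  # ['leave', 'join']
print(process(["join", "leave"]))  # []
```

Processed in the order ipoib issued them, both the leave and the re-join reach the SA. In the racing order the join is fulfilled locally and the late leave only drops the count back to one, so no MAD is sent at all.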
A temporary fix would be to always send a MAD, even if the join can be fulfilled
locally. But I'm looking at having the multicast module re-join on an event.
This raises the possibility that the new join request may fail, which would
require the multicast module to report that a membership is no longer active.
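Both approaches in the paragraph above can be sketched as follows, assuming a hypothetical `sa_join` callable that stands in for sending a join MAD to the SA and returns whether the join succeeded. `RejoiningGroup`, `always_send_mad`, and `on_event` are illustrative names only; the post does not say which event would trigger the re-join.

```python
class RejoiningGroup:
    """Hypothetical model of a group membership shared by local users."""

    def __init__(self, sa_join, always_send_mad=False):
        # sa_join: hypothetical stand-in for sending a join MAD to the SA;
        # it returns True if the SA accepted the join.
        self.sa_join = sa_join
        self.always_send_mad = always_send_mad
        self.joined = False
        self.users = []  # status callbacks of local users

    def join(self, callback):
        # Temporary fix: with always_send_mad, send a MAD even when the
        # join could be fulfilled locally from the existing membership.
        if not self.joined or self.always_send_mad:
            if not self.sa_join():
                return False
            self.joined = True
        self.users.append(callback)
        return True

    def on_event(self):
        # Re-join on behalf of all local users. The new join may fail, so
        # each user must then be told its membership is no longer active.
        self.joined = self.sa_join()
        if not self.joined:
            for callback in self.users:
                callback("membership no longer active")
            self.users.clear()


mads = []

def sa_join_ok():
    mads.append("join")
    return True

group = RejoiningGroup(sa_join_ok)
group.join(print)
group.join(print)
print(len(mads))  # 1: the second join is fulfilled locally
```

With `always_send_mad=True` the second local join costs a second MAD instead of being fulfilled locally. With the event-driven re-join, a failed SA join is reported to every local user through its callback.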
Another problem arises if some nodes are joined as NonMembers or SendOnlyNonMembers:
the SM will not create the multicast group when they try to re-join. This
leads to a race where NonMembers and SendOnlyNonMembers will fail to re-join
until one of the FullMembers joins.
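The ordering problem can be modelled as below, assuming the SM's group table is empty when the nodes re-join and that groups are keyed by a hypothetical string id. The join-state names match the post; `SubnetManager` and its `join` method are illustrative only, not an SM implementation.

```python
FULL_MEMBER = "FullMember"
NON_MEMBER = "NonMember"
SEND_ONLY_NON_MEMBER = "SendOnlyNonMember"


class SubnetManager:
    """Hypothetical model of the SM's multicast group table."""

    def __init__(self):
        self.groups = {}  # group id -> set of (node, join state)

    def join(self, group_id, node, state):
        if group_id not in self.groups:
            # Only a FullMember join creates a missing group.
            if state != FULL_MEMBER:
                return False
            self.groups[group_id] = set()
        self.groups[group_id].add((node, state))
        return True


sm = SubnetManager()  # group table empty when the nodes re-join
print(sm.join("ipoib-bcast", "node-a", SEND_ONLY_NON_MEMBER))  # False
print(sm.join("ipoib-bcast", "node-b", FULL_MEMBER))           # True
print(sm.join("ipoib-bcast", "node-a", SEND_ONLY_NON_MEMBER))  # True
```

In this model node-a's first re-join fails only because no FullMember has recreated the group yet; the same request succeeds once node-b has joined.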
- Sean