[openib-general] Re: Failed multicast join with new multicast module

Sean Hefty mshefty at ichips.intel.com
Wed Jun 7 11:25:08 PDT 2006


Sean Hefty wrote:
> The multicast module should work in this specific case, since the only 
> client is ipoib, and ipoib first leaves the group before re-joining.

I think that there's a race here.  If ipoib leaves, then re-joins quickly 
enough, the join request will be processed before the leave.  The result is that 
the join will be fulfilled locally, without an additional MAD sent.  (Trying to 
process the leave immediately doesn't fix the problem in the generic case, where 
there may be multiple users of a group.)
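
To illustrate the ordering problem, here is a toy C model. It is not the 
actual multicast module code: the struct, the function names, and the rule 
"only the first join and the last leave send a MAD" are simplifying 
assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; names are hypothetical, not the real module's. */
struct toy_group {
    int members;    /* local users currently joined to the group */
    int mads_sent;  /* join/leave MADs that would have gone to the SA */
};

/* A join is fulfilled locally unless this is the first local member. */
static void toy_join(struct toy_group *g)
{
    if (g->members++ == 0)
        g->mads_sent++;         /* would send a join MAD */
}

/* A leave only reaches the SA when the last local member goes away. */
static void toy_leave(struct toy_group *g)
{
    assert(g->members > 0);
    if (--g->members == 0)
        g->mads_sent++;         /* would send a leave MAD */
}

/* ipoib is already joined, then leaves and re-joins.  Returns the number
 * of MADs sent for the given processing order. */
static int toy_mads_for_rejoin(bool join_processed_first)
{
    struct toy_group g = { .members = 1, .mads_sent = 0 };

    if (join_processed_first) {
        toy_join(&g);           /* 1 -> 2 members, fulfilled locally */
        toy_leave(&g);          /* 2 -> 1 members, nothing sent */
    } else {
        toy_leave(&g);          /* 1 -> 0 members, leave MAD */
        toy_join(&g);           /* 0 -> 1 members, join MAD */
    }
    return g.mads_sent;
}
```

In this model toy_mads_for_rejoin(false) counts two MADs, while 
toy_mads_for_rejoin(true) counts none, which is the case where the SA never 
sees the re-join.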

A temporary fix would be to always send a MAD, even if the join can be fulfilled 
locally.  But I'm looking at having the multicast module re-join on an event. 
This raises the possibility that the new join request may fail, which would 
require the multicast module to report that a membership is no longer active.
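
A rough sketch of what reporting a lost membership could look like, under 
stated assumptions: the membership struct, the callback signature, and the 
send function below are all hypothetical and exist only for this example. 
None of them are the module's real interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client membership record. */
struct toy_membership {
    bool active;        /* false once a re-join has failed */
    int  last_status;   /* status passed to the most recent callback */
    void (*callback)(struct toy_membership *m, int status);
};

/* Stand-in for sending a join MAD: 0 on success, nonzero on failure. */
typedef int (*toy_send_join_fn)(void);

/* On an event, re-join and tell the client if the membership was lost. */
static void toy_rejoin_on_event(struct toy_membership *m,
                                toy_send_join_fn send_join)
{
    int status;

    assert(m != NULL && send_join != NULL);
    status = send_join();
    m->active = (status == 0);
    if (status != 0 && m->callback)
        m->callback(m, status);     /* membership is no longer active */
}

/* Example client callback and two fake transports for trying it out. */
static void toy_record_status(struct toy_membership *m, int status)
{
    m->last_status = status;
}

static int toy_join_ok(void)    { return 0; }
static int toy_join_fails(void) { return -1; }
```

With these stand-ins, calling toy_rejoin_on_event() with toy_join_fails 
leaves the membership inactive and hands the error to the client, while 
toy_join_ok keeps it active without invoking the callback.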

Another problem: if some nodes joined as NonMembers or SendOnlyNonMembers, 
the SM will not create the multicast group when they try to re-join.  This 
leads to a race where NonMembers and SendOnlyNonMembers fail to re-join 
until one of the FullMembers has joined.
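
The dependency on join order can be shown with another toy C model.  The 
assumption here, taken from the description above, is that only a 
FullMember join creates the group; the types and function names are made 
up for the example and do not come from any real SM or SA code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_join_state {
    TOY_FULL_MEMBER,
    TOY_NON_MEMBER,
    TOY_SEND_ONLY_NON_MEMBER
};

/* Toy view of the SM's state for a single multicast group. */
struct toy_sm_group {
    bool exists;
};

/* Assumed rule: a missing group is created only by a FullMember join. */
static bool toy_sm_join(struct toy_sm_group *g, enum toy_join_state state)
{
    if (!g->exists) {
        if (state != TOY_FULL_MEMBER)
            return false;       /* nothing to join yet */
        g->exists = true;
    }
    return true;
}

/* Replay re-joins in the given order, starting with no group at the SM,
 * and count how many of them fail. */
static int toy_count_failed_joins(const enum toy_join_state *order, size_t n)
{
    struct toy_sm_group g = { .exists = false };
    int failed = 0;
    size_t i;

    assert(order != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (!toy_sm_join(&g, order[i]))
            failed++;
    return failed;
}
```

Replaying the order NonMember, SendOnlyNonMember, FullMember gives two 
failed joins in this model; putting the FullMember first gives none.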

- Sean



