[PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.)
Ira Weiny
weiny2 at llnl.gov
Mon Apr 28 09:19:23 PDT 2008
On Sun, 27 Apr 2008 11:47:54 +0300
Or Gerlitz <ogerlitz at voltaire.com> wrote:
> Ira Weiny wrote:
> >
> > I did not get any output with multicast_debug_level!
> why should you, as from the node's point of view nothing has happened
> (the exact param name is mcast_debug_level)
> >
> > Here is a patch which fixes the problem. (At least with the partial sub-nets
> > configuration I explained before.) I will have to verify this fixes the problem
> > I originally reported.
> OK, good. Does this problem exist in the released openSM? if yes, what
> would be the trigger for the SM to "really discover" (i.e do PortInfo
> SET) this sub-fabric and how much time would it take to reach this
> trigger, worst case wise?
Yes, AFAICT this is in the current released version of OpenSM. The trigger is
the single link separating the partial subnet coming up: the resulting trap
will cause OpenSM to resweep. I believe this will happen on the next resweep
cycle, which is 10 sec by default (but this is configurable). I don't think
there is an issue with allowing OpenSM to resweep as designed.
>
> The failure configuration you have set to reproduce the problem is very
> untypical, I think.
I agree. I made a patch to turn off the processing of MADs in the kernel to
test my original theory that the node is not responding to MADs. Using this
patch I have been able to verify that if a node stops responding, the rereg
is sent by OpenSM when the node comes back.
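
For illustration only, the idea in the subject line (setting the rereg bit
must itself force a PortInfo Set) can be sketched as a standalone toy model.
This is not the OpenSM code or the patch: port_info_t, its fields, and
needs_port_info_set() are hypothetical names chosen for the sketch, and only
the "send_set" flag name comes from the patch title.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the PortInfo fields an SM compares. */
typedef struct {
    uint16_t base_lid;
    uint16_t master_sm_lid;
    bool     client_rereg;   /* client reregister bit */
} port_info_t;

/*
 * Decide whether a PortInfo Set needs to be sent.
 *   current: what the port last reported
 *   wanted:  what the SM intends to program
 */
static bool needs_port_info_set(const port_info_t *current,
                                const port_info_t *wanted)
{
    bool send_set = false;

    /* Any changed field requires a Set. */
    if (current->base_lid != wanted->base_lid)
        send_set = true;
    if (current->master_sm_lid != wanted->master_sm_lid)
        send_set = true;

    /*
     * Assumed behaviour modelled on the patch title: requesting
     * reregistration forces the Set even when nothing else changed.
     * Without this, a port whose other fields are unchanged would
     * never be sent the bit.
     */
    if (wanted->client_rereg)
        send_set = true;

    return send_set;
}
```

In this model, a port whose LID and SM LID are unchanged gets a Set only when
the rereg bit is requested.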
See my next email response to Sasha concerning the original issue.
Ira
>
> Since under common clos etc topologies which don't
> have a 1:n blocking nature, failure of such link would cause re-route
> etc by the SM which would not (and should not) be noted by the nodes (I
> hope I am not falling into another problem here...)
>
> Or.
>
>
>