[ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.
Hal Rosenstock
hrosenstock at xsigo.com
Wed Apr 23 18:27:21 PDT 2008
On Wed, 2008-04-23 at 17:05 -0700, Hal Rosenstock wrote:
> On Wed, 2008-04-23 at 13:38 -0700, Ira Weiny wrote:
> > Hey all,
> >
> > We have just started to experience a situation which I don't think is strictly
> > a bug but I think could be fixed within the OFED software.
> >
> > The symptom is that nodes drop out of the IPoIB mcast group after a node
> > temporarily goes catatonic. The details are:
> >
> > 1) Issues on a node cause a soft lockup of the node.
> > 2) OpenSM does a normal light sweep.
> > 3) MADs to the node time out since the node is in a "bad state"
> > 4) OpenSM marks the node down and drops it from internal tables, including
> > mcast groups.
> > 5) Node recovers from soft lockup condition.
> > 6) A subsequent sweep causes OpenSM to see the node and add it back to the
> > fabric.
> > 7) Node is fully functional on the verbs layer but IPoIB never knew anything
> > was wrong so it does _not_ rejoin the mcast groups. (This is different
> > from the condition where the link actually goes down.)
> >
> > As far as we can see there is nothing wrong with the node. It just went
> > catatonic for a while. Obviously this is not a good condition; however, I was
> > thinking of a couple of things which could be done to "fix" the above
> > situation. I am writing here to see which solution might be best, and accepted
> > by the community. Alternatively this may have already been addressed.
> > However, I don't see a bug in the bug list, nor do I find anything in the
> > archive.
> >
> > Solutions I can think of are:
> >
> > A) Modify OpenSM to move the node to a "questionable" state for a period of X
> > sweeps. If after X sweeps the node still does not respond, drop it. If
> > the node does respond, return it to its original state.
> > B) When OpenSM queries the node as if it is new on the fabric and the SMA
> > "thinks" it is not new, have the SMA detect this and notify the IPoIB
> > layer (or ULPs in general) that something has gone wrong. The IPoIB
> > layer could then check/rejoin the group.
> > C) put some code in IPoIB which might detect "lost cycles" and check/rejoin
> > the mcast group.
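A minimal sketch of option A, in Python rather than OpenSM's C. The names (`NodeTracker`, `max_missed_sweeps`, the state strings) are hypothetical and none of this is existing OpenSM code. A node that misses a sweep becomes "questionable" and keeps its multicast memberships; it is dropped only after X consecutive missed sweeps.

```python
from dataclasses import dataclass, field

ACTIVE, QUESTIONABLE, DROPPED = "active", "questionable", "dropped"


@dataclass
class Node:
    guid: str
    state: str = ACTIVE
    missed_sweeps: int = 0
    mcast_groups: set = field(default_factory=set)


class NodeTracker:
    """Hypothetical sweep bookkeeping; not OpenSM's actual data structures."""

    def __init__(self, max_missed_sweeps=3):
        self.max_missed_sweeps = max_missed_sweeps  # the "X" in option A
        self.nodes = {}

    def add(self, guid, groups=()):
        self.nodes[guid] = Node(guid, mcast_groups=set(groups))

    def sweep(self, responding):
        """Apply one sweep; `responding` is the set of GUIDs that answered."""
        for node in self.nodes.values():
            if node.guid in responding:
                # Answered in time: return the node to its original state.
                node.state = ACTIVE
                node.missed_sweeps = 0
                continue
            if node.state == DROPPED:
                continue
            node.missed_sweeps += 1
            if node.missed_sweeps >= self.max_missed_sweeps:
                # Still silent after X sweeps: drop it and its memberships.
                node.state = DROPPED
                node.mcast_groups.clear()
            else:
                # Keep memberships while the node is only questionable.
                node.state = QUESTIONABLE
```

With this bookkeeping a node that misses one or two sweeps and then answers again never loses its group memberships, so IPoIB has nothing to rejoin. A node that is actually dropped and later returns still comes back with no memberships, so the original problem remains for longer outages.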
> >
> > I have not worked out details for any solution. I believe that A and B are
> > "outside the spec". However, I can see merit in A and B.
> >
> > Solution A would help if MADs are lost due to reasons other than node issues.
> > (Perhaps a bad link. Although I don't know of anyone having problems like
> > that.)
> >
> > Solution B puts the solution closer to the original problem but I am unsure how
> > the SMA would know what is going on.
> >
> > Solution C is really close to the problem; however, I don't know how it would be
> > done. I do think that this would be within the specification as it really is
> > the ULP's job to maintain its membership in the group. But how would it do
> > this without help from the lower layers? (Of course it could poll for
> > membership but I think that is a bad idea.)
>
> > Thoughts?
>
> Having OpenSM request client reregistration (used in other places by
> OpenSM) of such nodes will resolve this issue. As little or as much
> policy can be built into OpenSM in determining "such" nodes to scope
> down the application of this mechanism for this case.
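A toy model of that idea, in Python rather than OpenSM's C. `ToySM`, `ToyClient`, and `reregister_returning` are hypothetical names for illustration only; the real mechanism is the ClientReregister bit the SM sets in PortInfo, which this sketch reduces to a callback. The policy assumed here is the simplest one: every node that reappears after having been dropped gets the request.

```python
class ToyClient:
    """Stand-in for a ULP such as IPoIB; hypothetical, not the kernel driver."""

    def __init__(self, guid, wanted_groups):
        self.guid = guid
        self.wanted_groups = set(wanted_groups)

    def on_client_reregister(self, sm):
        # Rejoin every group this client believes it belongs to.
        for group in self.wanted_groups:
            sm.join(self.guid, group)


class ToySM:
    """Hypothetical subnet manager model; not OpenSM code."""

    def __init__(self):
        self.members = {}   # group name -> set of member GUIDs
        self.known = set()  # GUIDs currently in the fabric tables
        self.clients = {}   # GUID -> ToyClient

    def attach(self, client):
        self.clients[client.guid] = client

    def join(self, guid, group):
        self.members.setdefault(group, set()).add(guid)

    def sweep(self, responding, reregister_returning=True):
        """Apply one sweep and return the GUIDs that (re)appeared."""
        for guid in self.known - responding:
            # Node timed out: drop it from all multicast groups.
            for group_members in self.members.values():
                group_members.discard(guid)
        returning = responding - self.known
        self.known = set(responding)
        if reregister_returning:
            for guid in returning:
                client = self.clients.get(guid)
                if client is not None:
                    client.on_client_reregister(self)
        return returning
```

Calling `sweep` with `reregister_returning=False` reproduces the reported behavior: the node is back in the fabric tables but absent from its groups. A narrower policy would replace the `returning` set with whatever subset of nodes OpenSM decides qualifies.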
One side comment on the non-OpenSM aspect of this:
Why is the node temporarily unavailable? There is a "contract" that the
node makes with the SM that it clearly isn't honoring. Is any
investigation going on relative to this aspect of the issue?
-- Hal
> -- Hal
>
> > Ira Weiny
> > Lawrence Livermore National Lab
> > weiny2 at llnl.gov
> >
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general