[ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.

Hal Rosenstock hrosenstock at xsigo.com
Wed Apr 23 18:27:21 PDT 2008


On Wed, 2008-04-23 at 17:05 -0700, Hal Rosenstock wrote:
> On Wed, 2008-04-23 at 13:38 -0700, Ira Weiny wrote:
> > Hey all,
> > 
> > We have just started to experience a situation which I don't think is strictly
> > a bug but I think could be fixed within the OFED software.
> > 
> > The symptom is that nodes drop out of the IPoIB mcast group after a node
> > temporarily goes catatonic.  The details are:
> > 
> >    1) Issues on a node cause a soft lockup of the node.
> >    2) OpenSM does a normal light sweep.
> >    3) MADs to the node time out since the node is in a "bad state".
> >    4) OpenSM marks the node down and drops it from internal tables, including
> >       mcast groups.
> >    5) Node recovers from soft lock up condition.
> >    6) A subsequent sweep causes OpenSM to see the node and add it back to the
> >       fabric.
> >    7) Node is fully functional on the verbs layer but IPoIB never knew anything
> >       was wrong so it does _not_ rejoin the mcast groups.  (This is different
> >       from the condition where the link actually goes down.)
> > 
> > As far as we can see there is nothing wrong with the node.  It just went
> > catatonic for a while.  Obviously this is not a good condition; however, I was
> > thinking of a couple of things which could be done to "fix" the above
> > situation.  I am writing here to see which solution might be best and most
> > likely to be accepted by the community.  Alternatively, this may have already
> > been addressed.
> > However, I don't see a bug in the bug list, nor do I find anything in the
> > archive.
> > 
> > Solutions I can think of are:
> > 
> >    A) Modify OpenSM to move the node to a "questionable" state for a period of X
> >       sweeps.  If after X sweeps the node still does not respond, drop it.  If
> >       the node does respond, return it to its original state.
> >    B) When OpenSM queries the node as if it is new on the fabric and the SMA
> >       "thinks" it is not new, have the SMA detect this and notify the IPoIB
> >       layer (or ULPs in general) that something has gone wrong.  The IPoIB
> >       layer could then check/rejoin the group.
> >    C) Put some code in IPoIB which might detect "lost cycles" and check/rejoin
> >       the mcast group.
> > 
> > I have not worked out details for any solution.  I believe that A and B are
> > "outside the spec".  However, I can see merit in A and B.
> > 
> > Solution A would help if MADs are lost for reasons other than node issues
> > (perhaps a bad link, although I don't know of anyone having problems like
> > that).
> > 
> > Solution B puts the solution closer to the original problem, but I am unsure how
> > the SMA would know what is going on.
> > 
> > Solution C is really close to the problem; however, I don't know how it would
> > be done.  I do think that this would be within the specification, as it really
> > is the ULP's job to maintain its membership in the group.  But how would it do
> > this without help from the lower layers?  (Of course it could poll for
> > membership, but I think that is a bad idea.)
> 
> > Thoughts?
> 
> Having OpenSM request client reregistration of such nodes (a mechanism
> OpenSM already uses in other places) will resolve this issue. As little or
> as much policy as desired can be built into OpenSM to determine which nodes
> count as "such" nodes, scoping down the application of this mechanism to
> this case.

One side comment on the non-OpenSM aspect of this:

Why is the node temporarily unavailable? There is a "contract" that the
node makes with the SM that it clearly isn't honoring. Is any
investigation going on relative to this aspect of the issue?

-- Hal

> -- Hal
> 
> > Ira Weiny
> > Lawrence Livermore National Lab
> > weiny2 at llnl.gov
> > 
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> > 
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general