[ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.
Ira Weiny
weiny2 at llnl.gov
Thu Apr 24 14:31:25 PDT 2008
On Thu, 24 Apr 2008 16:52:07 +0300
Or Gerlitz <ogerlitz at voltaire.com> wrote:
> Ira Weiny wrote:
> > The symptom is that nodes drop out of the IPoIB mcast group after a node
> > temporarily goes catatonic. The details are:
> >
> > 1) Issues on a node cause a soft lockup of the node.
> > 2) OpenSM does a normal light sweep.
> > 3) MADs to the node time out since the node is in a "bad state".
> > 4) OpenSM marks the node down and drops it from internal tables, including
> > mcast groups.
> > 5) Node recovers from the soft lockup condition.
> > 6) A subsequent sweep causes OpenSM to see the node and add it back to the
> > fabric.
> As Hal noted, client reregister is the way to go.
>
> In a similar discussion in the past the conclusion was that the SM
> should (maybe even according to the spec, but according to common sense
> is fine as well, I think) set the re-register bit, in which case
> IPoIB rejoins and we are done. At the time, I understood that OpenSM
> would do so
> (http://lists.openfabrics.org/pipermail/general/2007-September/041237.html),
> am I wrong, or maybe the case brought up in that thread (switch/port going
> down and a whole sub-fabric is removed from the SM's point of view while
> the links remain up from the viewpoint of the nodes) was different? The
> basic point is a case where a node link is UP and the SM lost this node
> for some time and now sees it again. We used to call it "the
> active/active" transition, and an SM may need special logic for it.
>
I have set up the following as a test situation:

           switch B
          /        \ (link X)
    switch A      switch C
      /            /    \
   Node1        node2  node3
   (SM)

When I down link X and re-enable it, nodes 2 and 3 do _not_ rejoin the mcast
group.
Debug output from OpenSM indicates it is setting the rereg bit, but I don't see
the rejoin in the debug output from node 2's IPoIB mcast layer. Perhaps
there is a bug to be squashed here?
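
As a rough illustration of the sequence discussed in this thread, here is a toy model of an SM that drops an unresponsive node from its tables and, on rediscovery, requests client reregistration. It is only a sketch: `ToySM`, `ToyNode`, `honors_rereg`, and `run` are hypothetical names made up for this example, not OpenSM or IPoIB interfaces, and the model says nothing about where the real bug (if any) lies.

```python
# Toy model only: all names here are hypothetical, not OpenSM or IPoIB APIs.


class ToySM:
    """Minimal stand-in for a subnet manager with one mcast group."""

    def __init__(self):
        self.known = set()          # nodes currently in the SM's tables
        self.mcast_members = set()  # members of the single mcast group

    def sweep(self, nodes, set_rereg=True):
        for node in nodes:
            if not node.responsive:
                # MADs time out: drop the node and its group membership.
                self.known.discard(node.name)
                self.mcast_members.discard(node.name)
            elif node.name not in self.known:
                # Node is visible again although its link never went down.
                self.known.add(node.name)
                if set_rereg:
                    node.on_client_reregister(self)


class ToyNode:
    """Minimal stand-in for a host running an IPoIB-like client."""

    def __init__(self, name, honors_rereg=True):
        self.name = name
        self.responsive = True
        self.honors_rereg = honors_rereg

    def join(self, sm):
        sm.known.add(self.name)
        sm.mcast_members.add(self.name)

    def on_client_reregister(self, sm):
        # A client that acts on the rereg request rejoins its group.
        if self.honors_rereg:
            sm.mcast_members.add(self.name)


def run(honors_rereg):
    """Return True if the node is a group member after recovery."""
    sm = ToySM()
    node = ToyNode("node2", honors_rereg=honors_rereg)
    node.join(sm)
    node.responsive = False
    sm.sweep([node])    # node is dropped from the tables and the group
    node.responsive = True
    sm.sweep([node])    # node is rediscovered; rereg is requested
    return node.name in sm.mcast_members
```

In this model, `run(True)` leaves the node in the group, while `run(False)` reproduces the reported symptom: the SM requests reregistration, but the node never rejoins.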
Just in case anyone is curious, this is with OFED 1.2.5 on a RHEL 5.1 based
kernel, and OpenSM 3.2.1-8341058-dirty.
I am in the process of tracking this down,
Ira