[ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.
weiny2 at llnl.gov
Thu Apr 24 09:57:52 PDT 2008
On Wed, 23 Apr 2008 18:27:21 -0700
Hal Rosenstock <hrosenstock at xsigo.com> wrote:
> On Wed, 2008-04-23 at 17:05 -0700, Hal Rosenstock wrote:
> > On Wed, 2008-04-23 at 13:38 -0700, Ira Weiny wrote:
> > > Hey all,
> > >
> > > Thoughts?
> > Having OpenSM request client reregistration (used in other places by
> > OpenSM) of such nodes will resolve this issue. As little or as much
> > policy can be built into OpenSM in determining "such" nodes to scope
> > down the application of this mechanism for this case.
> One side comment on the non-OpenSM aspect of this:
> Why is the node temporarily unavailable? There is a "contract" that the
> node makes with the SM that it clearly isn't honoring. Is any
> investigation going on relative to this aspect of the issue?
Yes, we are working on finding the root cause. I agree that the "contract" is
not being honored. This is one of the reasons I was hesitant to implement and
submit a fix. I don't think this is truly a bug in the stack.
However, I could see this causing issues for people[*] and it might be nice to
have a "fix".
[*] Particularly those who do not have any other connection to nodes other than