[ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.

Ira Weiny weiny2 at llnl.gov
Thu Apr 24 09:57:52 PDT 2008


On Wed, 23 Apr 2008 18:27:21 -0700
Hal Rosenstock <hrosenstock at xsigo.com> wrote:

> On Wed, 2008-04-23 at 17:05 -0700, Hal Rosenstock wrote:
> > On Wed, 2008-04-23 at 13:38 -0700, Ira Weiny wrote:
> > > Hey all,
> > > 

<snip>

> > 
> > > Thoughts?
> > 
> > Having OpenSM request client reregistration (used in other places by
> > OpenSM) of such nodes will resolve this issue. As little or as much
> > policy can be built into OpenSM in determining "such" nodes to scope
> > down the application of this mechanism for this case.
> 
> One side comment on the non-OpenSM aspect of this:
> 
> Why is the node temporarily unavailable? There is a "contract" that the
> node makes with the SM that it clearly isn't honoring. Is any
> investigation going on relative to this aspect of the issue?
> 

Yes, we are working on finding the root cause.  I agree that the "contract" is
not being honored.  That is one of the reasons I was hesitant to implement and
submit a fix: I don't think this is truly a bug in the stack.  However, I could
see this causing issues for people[*], and it might be nice to have a "fix".

Ira

[*] Particularly those who have no connection to their nodes other than IB.



