[ofa-general] Re: OpenSM and fat tree

Hal Rosenstock hrosenstock at xsigo.com
Thu May 15 10:50:40 PDT 2008


On Thu, 2008-05-15 at 11:35 -0600, Chris Worley wrote:
> On Thu, May 15, 2008 at 11:12 AM, Hal Rosenstock <hrosenstock at xsigo.com> wrote:
> > On Thu, 2008-05-15 at 10:26 -0600, Chris Worley wrote:
> >> On Thu, May 15, 2008 at 10:10 AM, Hal Rosenstock <hrosenstock at xsigo.com> wrote:
> >> > Chris,
> >> >
> >> > On Thu, 2008-05-15 at 09:52 -0600, Chris Worley wrote:
> >> >> Is there any command line utility to tell nodes that don't see the
> >> >> route change to "go ask the SM again for your routes"... or "clear the
> >> >> route table"?
> >> >
> >> > I'm not sure what you're asking. There is no route table at end nodes;
> >> > only switch nodes and the SM maintain these. The end node only has path
> >> > records which it has retrieved and perhaps cached. Path records should
> >> > be refreshed when the SM or the local LID changes, both of which are
> >> > local events to the end node.
> >>
> >> After an SM change (i.e. using the "-r" switch),
> >
> > That should be a local LID change.
> >
> >>  nodes can't ping each
> >> other over IPoIB (other protocols also can't communicate).
> >
> > Sounds like ULP issue(s) in handling this. What kernel and/or OFED
> > version are you running?
> 
> Currently, the SM is running OFED 1.3 on an RHEL4 2.6.9-67.0.4 kernel
> with Lustre 1.6.4.2 changes.
> 
> The compute nodes are running the same kernel w/ OFED 1.2.5.5... which
> will be upgraded to 1.3 by the end of the day.

Maybe that will handle the LID change better; let us know.

-- Hal

> 
> Chris
> >
> > -- Hal
> >
> >> Restarting the OFED stack works, but modules won't unload if there was
> >> something active (i.e. Lustre), so the only recourse for getting the
> >> OFED stack working again is a hard reboot.
> >>
> >> That's what I'd like to avoid if possible.
> >>
> >> Chris
> >> >
> >> > -- Hal
> >> >
> >> >> On Thu, May 15, 2008 at 5:19 AM, Sasha Khapyorsky <sashak at voltaire.com> wrote:
> >> >> > Hi Hal,
> >> >> >
> >> >> > On 04:19 Mon 12 May     , Hal Rosenstock wrote:
> >> >> >>
> >> >> >> I filed this as bug 1031:
> >> >> >> https://bugs.openfabrics.org/show_bug.cgi?id=1031
> >> >> >>
> >> >> >> > It would be nice if I could reproduce it in simulation.
> >> >> >>
> >> >> >> Yes, that would be nice; but I don't have a sim case.
> >> >> >
> >> >> > Do you have an ibnetdiscover file for this case? If not, where is the
> >> >> > report coming from?
> >> >> >
> >> >> > Sasha
> >> >> > _______________________________________________
> >> >> > general mailing list
> >> >> > general at lists.openfabrics.org
> >> >> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >> >> >
> >> >> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >> >> >
> >> >
> >> >
> >
> >



