[ofa-general] Re: OpenSM and fat tree

Chris Worley worleys at gmail.com
Thu May 15 11:45:08 PDT 2008


On Thu, May 15, 2008 at 11:50 AM, Hal Rosenstock <hrosenstock at xsigo.com> wrote:
> On Thu, 2008-05-15 at 11:35 -0600, Chris Worley wrote:
>> On Thu, May 15, 2008 at 11:12 AM, Hal Rosenstock <hrosenstock at xsigo.com> wrote:
>> > On Thu, 2008-05-15 at 10:26 -0600, Chris Worley wrote:
>> >> On Thu, May 15, 2008 at 10:10 AM, Hal Rosenstock <hrosenstock at xsigo.com> wrote:
>> >> > Chris,
>> >> >
>> >> > On Thu, 2008-05-15 at 09:52 -0600, Chris Worley wrote:
>> >> >> Is there any command line utility to tell nodes that don't see the
>> >> >> route change to "go ask the SM again for your routes"... or "clear the
>> >> >> route table"?
>> >> >
>> >> > I'm not sure what you're asking. There is no route table at end nodes;
>> >> > only switch nodes and the SM maintains these. The end node only has path
>> >> > records which it has retrieved and perhaps cached. Path records should
>> >> > be refreshed when SM or local LID changes which are local events to the
>> >> > end node.
>> >>
>> >> After an sm change (i.e. using the "-r" switch),
>> >
>> > That should be a local LID change.
>> >
>> >>  nodes can't ping each
>> >> other over IPoIB (other protocols also can't communicate).
>> >
>> > Sounds like ULP issue(s) in handling this. What kernel and/or OFED
>> > version are you running ?
>>
>> Currently, the SM is running OFED 1.3 on an RHEL4 2.6.9-67.0.4 kernel
>> with Lustre 1.6.4.2 changes.
>>
>> The compute nodes are running the same kernel w/ OFED 1.2.5.5... which
>> will be upgraded to 1.3 by the end of the day.
>
> Maybe that will be better for LID change; Let us know.

Unfortunately, it isn't a good day to test; a critical job is running.
After upgrading all but the nodes the critical job was running on, I
found opensmd hung and the rebooted nodes were not getting
initialized, so I had to "kill -9" it.  Upon opensmd restart, I couldn't
risk using the "-r" switch, but, w/o it, the fat-tree came up w/o
error.

Chris


