[openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5
Paul
paul.lundin at gmail.com
Tue May 30 08:06:01 PDT 2006
Hi All,
I will be working on this as time permits this week. Unfortunately my
employer is not crazy about giving out remote access, so I will have to be
your hands on this. If you want me to do something just tell me what it is.
I know it's a pain; I have been there myself.
Regards.
On 5/30/06, Don.Albert at bull.com <Don.Albert at bull.com> wrote:
>
>
> Hal,
>
> With your patch to OpenSM, I think everything is ok on the local node.
> The remote node is definitely having some problems, resulting in not
> responding to the MAD packets. I have entered a separate message on the
> problems with the "ib0" interface on that machine.
>
> >
> > On Fri, 2006-05-26 at 20:59, Hal Rosenstock wrote:
> > > > What next, coach?
> > >
> > > Can you turn on madeye on the remote node and see what packets are
> > > > received and sent ? Let me know if you need help with that. I think
> > > > you said you were running OFED, right ?
> >
>
> Yes, I am running kernel 2.6.16 with the OFED RC5 release. I will
> investigate how to run madeye, but the hangs on the remote machine are
> probably the root cause of the link failure.
>
>
> > I don't think madeye is part of OFED :-( Can it get added for RC6,
> > Tziporet ? I think it would be a useful tool to add for problems like
> > this.
> >
> > Also, was this a working setup before ? Did anything else change besides
> > installing RC5 on both nodes ?
> >
>
> This back to back setup was working originally with a backported 2.6.11-34
> kernel and I believe it was revision 6500 from the OpenIB svn trunk at that
> time. The problems started when I tried to move to RC4 and now RC5 of the
> OFED release, with the 2.6.16 kernel.
>
>
> > I have two more experiments I'd like you to try, before we go down the
> > madeye "route":
> >
> > 1. Do you have another IB cable to try ?
> >
> > 2. Can you completely shutdown and repower the remote node and see if it
> > starts responding ?
> >
>
> It is difficult for me to debug this sort of thing, since I telecommute
> from Tucson and the machines are located in Phoenix. But I can get someone
> there to power the machine down and reboot.
>
> -Don Albert-