[openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5
    Don.Albert at Bull.com 
       
    Tue May 30 07:55:34 PDT 2006
    
    
  
Hal,
With your patch to OpenSM, I think everything is ok on the local node. The 
remote node is definitely having some problems, resulting in not 
responding to the MAD packets.  I have entered a separate message on the 
problems with the "ib0" interface on that machine.
> 
> On Fri, 2006-05-26 at 20:59, Hal Rosenstock wrote:
> > > What next, coach?
> > 
> > Can you turn on madeye on the remote node and see what packets are
> > received and sent ? Let me know if you need help with that. I think
> > you said you were running OFED, right ?
> 
Yes, I am running kernel 2.6.16 with the OFED RC5 release.  I will 
investigate how to run madeye, but the hangs on the remote machine are 
probably the root cause of the link failure.
> I don't think madeye is part of OFED :-( Can it get added for RC6,
> Tziporet ? I think it would be a useful tool to add for problems like
> this.
> 
> Also, was this a working setup before ? Did anything else change besides
> installing RC5 on both nodes ?
> 
This back-to-back setup was originally working with a backported 2.6.11-34 
kernel and, I believe, revision 6500 from the OpenIB svn trunk at 
that time.  The problems started when I tried to move to RC4 and now RC5 
of the OFED release, with the 2.6.16 kernel.
> I have two more experiments I'd like you to try, before we go down the
> madeye "route":
> 
> 1. Do you have another IB cable to try ?
> 
> 2. Can you completely shutdown and repower the remote node and see if it
> starts responding ?
> 
It is difficult for me to debug this sort of thing, since I telecommute 
from Tucson and the machines are located in Phoenix.  But I can get 
someone there to power the machine down and reboot.
  -Don Albert-