[openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5
Don.Albert at Bull.com
Fri May 26 11:35:18 PDT 2006
Hal,
> Yes, that is very useful. I had been working on trying to come up with
> what the problem was but this narrows it down to something I was
> thinking might be going on.
>
> It looks like you are running back to back HCAs, right ?
Yes, the HCAs are 4X DDR, connected back to back.
>
> It also looks to me like your remote (in terms of OpenSM) CA node is not
> responding to SMA requests like SubnGet NodeInfo yet the link is active.
> Can you describe what state that node is in (what modules are loaded,
> etc.) ? Can you do an ibstat/ibstatus on that node ?
Both systems are booted and the link is physically up (LinkUp), although
both ports are still in the INIT state. Here is the information you asked for:
>>>>>>>>>>>>>>>>>>>
Local System (where OpenSM is attempting to run)
[koa] (ib) ib> ibstat
CA 'mthca0'
CA type: MT25204
Number of ports: 1
Firmware version: 1.0.800
Hardware version: a0
Node GUID: 0x0002c90200216dc4
System image GUID: 0x0002c90200216dc7
Port 1:
State: Initializing
Physical state: LinkUp
Rate: 20
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x02510a68
Port GUID: 0x0002c90200216dc5
[koa] (ib) ib> ibstatus
Infiniband device 'mthca0' port 1 status:
default gid: fe80:0000:0000:0000:0002:c902:0021:6dc5
base lid: 0x0
sm lid: 0x0
state: 2: INIT
phys state: 5: LinkUp
rate: 20 Gb/sec (4X DDR)
[koa] (ib) ib> /sbin/lsmod
Module Size Used by
parport_pc 28008 0
lp 12872 0
parport 37260 2 parport_pc,lp
ib_ipath 58392 0
ipath_core 154596 1 ib_ipath
pcmcia 34864 0
yenta_socket 25484 0
rsrc_nonstatic 12160 1 yenta_socket
pcmcia_core 38068 3 pcmcia,yenta_socket,rsrc_nonstatic
button 7328 0
battery 10120 0
ac 5512 0
uhci_hcd 31776 0
hw_random 6824 0
i2c_i801 10260 0
i2c_core 20992 1 i2c_i801
ib_mthca 109744 0
ib_ipoib 48792 0
ib_uverbs 34128 0
ib_umad 14000 0
ib_ucm 16520 0
ib_sa 13884 1 ib_ipoib
ib_cm 30144 1 ib_ucm
ib_mad 35896 4 ib_mthca,ib_umad,ib_sa,ib_cm
ib_core 45952 9 ib_ipath,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad
floppy 67400 0
>>>>>>>>>>>>>>>>>>>
Remote system (no OpenSM instance)
[jatoba] (ib) ib> ibstat
CA 'mthca0'
CA type: MT25204
Number of ports: 1
Firmware version: 1.0.800
Hardware version: a0
Node GUID: 0x0002c90200216e40
System image GUID: 0x0002c90200216e43
Port 1:
State: Initializing
Physical state: LinkUp
Rate: 20
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x02510a68
Port GUID: 0x0002c90200216e41
[jatoba] (ib) ib> ibstatus
Infiniband device 'mthca0' port 1 status:
default gid: fe80:0000:0000:0000:0002:c902:0021:6e41
base lid: 0x0
sm lid: 0x0
state: 2: INIT
phys state: 5: LinkUp
rate: 20 Gb/sec (4X DDR)
[jatoba] (ib) ib> /sbin/lsmod
Module Size Used by
parport_pc 28008 0
lp 12872 0
parport 37260 2 parport_pc,lp
ib_ipath 58392 0
ipath_core 154596 1 ib_ipath
pcmcia 34864 0
yenta_socket 25484 0
rsrc_nonstatic 12160 1 yenta_socket
pcmcia_core 38068 3 pcmcia,yenta_socket,rsrc_nonstatic
button 7328 0
battery 10120 0
ac 5512 0
uhci_hcd 31776 0
hw_random 6824 0
i2c_i801 10260 0
i2c_core 20992 1 i2c_i801
ib_mthca 109744 0
ib_ipoib 48792 0
ib_uverbs 34128 0
ib_umad 14000 2
ib_ucm 16520 0
ib_sa 13884 1 ib_ipoib
ib_cm 30144 1 ib_ucm
ib_mad 35896 4 ib_mthca,ib_umad,ib_sa,ib_cm
ib_core 45952 9 ib_ipath,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad
floppy 67400 0
>>>>>>>>>>>>>>>>>>>
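For anyone comparing the two dumps above: both ports show Physical state LinkUp, State Initializing, and Base lid 0, which is what I would expect when no SM has finished configuring the subnet. A minimal sketch of that check in Python follows. It assumes the "Key: value" layout of the ibstat output pasted above; the function names parse_ibstat_ports and unconfigured_ports are hypothetical helpers, not part of any OpenIB tool.

```python
import re

def parse_ibstat_ports(text):
    """Return one dict per 'Port N:' block found in ibstat-style output."""
    ports = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Port (\d+):$", line)
        if m:
            # Start of a new port block; later key/value lines belong to it.
            current = {"port": int(m.group(1))}
            ports.append(current)
            continue
        if current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return ports

def unconfigured_ports(text):
    """Port numbers that are physically up but still in INIT with LID 0."""
    return [
        p["port"]
        for p in parse_ibstat_ports(text)
        if p.get("Physical state") == "LinkUp"
        and p.get("State") == "Initializing"
        and int(p.get("Base lid", "0")) == 0
    ]

# Abbreviated copy of the ibstat output from koa shown above.
SAMPLE = """CA 'mthca0'
CA type: MT25204
Number of ports: 1
Port 1:
State: Initializing
Physical state: LinkUp
Rate: 20
Base lid: 0
LMC: 0
SM lid: 0
Port GUID: 0x0002c90200216dc5
"""

if __name__ == "__main__":
    print(unconfigured_ports(SAMPLE))
```

Running the same check against the output of ibstat on each host should list port 1 on both systems for as long as OpenSM fails to complete its sweep.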
>
> Can you try this patch to see if it gets you further and let me know ?
> Note that this is just a potential workaround right now.
>
I will try rebuilding with the patch and let you know the results.
Thanks,
-Don Albert-