[openfabrics-ewg] Re: [openib-general] OpenSM segmentation fault on RC5

Don.Albert at Bull.com
Fri May 26 11:35:18 PDT 2006


Hal,
 
> Yes, that is very useful. I had been working on trying to come up with
> what the problem was but this narrows it down to something I was
> thinking might be going on.
> 
> It looks like you are running back-to-back HCAs, right?

Yes, the HCAs are 4X DDR, connected back to back.

> 
> It also looks to me like your remote (in terms of OpenSM) CA node is not
> responding to SMA requests like SubnGet NodeInfo yet the link is active.
> Can you describe what state that node is in (what modules are loaded,
> etc.)? Can you do an ibstat/ibstatus on that node?

Both systems are booted and the physical link is up (LinkUp), although
both ports are still in the INIT state. Here is the information you
asked for:

>>>>>>>>>>>>>>>>>>>

Local System (where OpenSM is attempting to run)

[koa] (ib) ib> ibstat
CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.0.800
        Hardware version: a0
        Node GUID: 0x0002c90200216dc4
        System image GUID: 0x0002c90200216dc7
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 20
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0002c90200216dc5
[koa] (ib) ib> ibstatus
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c902:0021:6dc5
        base lid:        0x0
        sm lid:          0x0
        state:           2: INIT
        phys state:      5: LinkUp
        rate:            20 Gb/sec (4X DDR)

[koa] (ib) ib> /sbin/lsmod
Module                  Size  Used by
parport_pc             28008  0
lp                     12872  0
parport                37260  2 parport_pc,lp
ib_ipath               58392  0
ipath_core            154596  1 ib_ipath
pcmcia                 34864  0
yenta_socket           25484  0
rsrc_nonstatic         12160  1 yenta_socket
pcmcia_core            38068  3 pcmcia,yenta_socket,rsrc_nonstatic
button                  7328  0
battery                10120  0
ac                      5512  0
uhci_hcd               31776  0
hw_random               6824  0
i2c_i801               10260  0
i2c_core               20992  1 i2c_i801
ib_mthca              109744  0
ib_ipoib               48792  0
ib_uverbs              34128  0
ib_umad                14000  0
ib_ucm                 16520  0
ib_sa                  13884  1 ib_ipoib
ib_cm                  30144  1 ib_ucm
ib_mad                 35896  4 ib_mthca,ib_umad,ib_sa,ib_cm
ib_core                45952  9 ib_ipath,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad
floppy                 67400  0

>>>>>>>>>>>>>>>>>>>

Remote system (no OpenSM instance)

[jatoba] (ib) ib> ibstat
CA 'mthca0'
        CA type: MT25204
        Number of ports: 1
        Firmware version: 1.0.800
        Hardware version: a0
        Node GUID: 0x0002c90200216e40
        System image GUID: 0x0002c90200216e43
        Port 1:
                State: Initializing
                Physical state: LinkUp
                Rate: 20
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510a68
                Port GUID: 0x0002c90200216e41
[jatoba] (ib) ib> ibstatus
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c902:0021:6e41
        base lid:        0x0
        sm lid:          0x0
        state:           2: INIT
        phys state:      5: LinkUp
        rate:            20 Gb/sec (4X DDR)

[jatoba] (ib) ib> /sbin/lsmod
Module                  Size  Used by
parport_pc             28008  0
lp                     12872  0
parport                37260  2 parport_pc,lp
ib_ipath               58392  0
ipath_core            154596  1 ib_ipath
pcmcia                 34864  0
yenta_socket           25484  0
rsrc_nonstatic         12160  1 yenta_socket
pcmcia_core            38068  3 pcmcia,yenta_socket,rsrc_nonstatic
button                  7328  0
battery                10120  0
ac                      5512  0
uhci_hcd               31776  0
hw_random               6824  0
i2c_i801               10260  0
i2c_core               20992  1 i2c_i801
ib_mthca              109744  0
ib_ipoib               48792  0
ib_uverbs              34128  0
ib_umad                14000  2
ib_ucm                 16520  0
ib_sa                  13884  1 ib_ipoib
ib_cm                  30144  1 ib_ucm
ib_mad                 35896  4 ib_mthca,ib_umad,ib_sa,ib_cm
ib_core                45952  9 ib_ipath,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad
floppy                 67400  0

>>>>>>>>>>>>>>>>>>>
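Both ibstatus reports show the same pattern: phys state "5: LinkUp" but
state "2: INIT", with base lid and sm lid both 0x0. An InfiniBand port
stays in INIT until a subnet manager assigns it a LID and moves it
through ARMED to ACTIVE, so neither port appears to have been configured
by an SM yet. A minimal Python sketch of that check follows, assuming the
ibstatus output format shown above; the helper names (parse_ibstatus,
awaiting_sm) are hypothetical and not part of any OFED tool.

```python
import re

# Logical port states as printed by ibstatus ("state: N: NAME").
PORT_STATES = {1: "DOWN", 2: "INIT", 3: "ARMED", 4: "ACTIVE"}

def parse_ibstatus(text):
    """Collect the 'key: value' fields from one ibstatus port block."""
    fields = {}
    for line in text.splitlines():
        # Lowercase keys only, so the "Infiniband device ..." header is skipped.
        m = re.match(r"\s*([a-z ]+):\s+(.*\S)\s*$", line)
        if m:
            fields[m.group(1).strip()] = m.group(2)
    return fields

def awaiting_sm(fields):
    """True if the physical link is up but no SM has configured the port."""
    state = int(fields["state"].split(":")[0])
    phys = fields["phys state"].split(":", 1)[1].strip()
    base_lid = int(fields["base lid"], 16)
    sm_lid = int(fields["sm lid"], 16)
    return (phys == "LinkUp" and PORT_STATES.get(state) == "INIT"
            and base_lid == 0 and sm_lid == 0)

# Output captured on koa, as shown above.
KOA_STATUS = """\
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c902:0021:6dc5
        base lid:        0x0
        sm lid:          0x0
        state:           2: INIT
        phys state:      5: LinkUp
        rate:            20 Gb/sec (4X DDR)
"""

if __name__ == "__main__":
    print(awaiting_sm(parse_ibstatus(KOA_STATUS)))  # True
```

The jatoba output gives the same result, which is consistent with no SM
having completed a sweep of this two-node subnet.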

> 
> Can you try this patch to see if it gets you further and let me know?
> Note that this is just a potential workaround right now.
> 

I will try rebuilding with the patch and let you know the results.

Thanks,
        -Don Albert-