[openfabrics-ewg] Re: NOP problem in ib_mthca on OFED RC4

Don.Albert at Bull.com Don.Albert at Bull.com
Tue May 30 07:35:18 PDT 2006


Michael,

> >   The ib_mthca module now initializes correctly on
> > both EM64T machines.  I noticed some discussion between you and Roland
> > about making the parameter "fw_cmd_doorbell=0" the default.  Did this
> > occur in RC5?
> 
> Yes, we changed fw_cmd_doorbell to 0 by default for now because it seemed
> safer. I expect if you load mthca with fw_cmd_doorbell=1 you still get an
> error, isn't that right?
> 

Although the change in RC5 for fw_cmd_doorbell *seemed* to allow the
ib_mthca module to initialize, I don't think I am out of the woods yet on
this particular machine.  The port never becomes active (it stays in the
Initializing state), and the other machine, which is connected back to
back with this one and on which I am trying to run OpenSM, does not get a
response to its MAD packets.  When I try to shut down the openib stack
with the "/etc/init.d/openibd stop" script, the processes hang trying to
set device "ib0" down.  Here is an excerpt from a terminal session:

    [jatoba] (ib) ib> ibstat
    CA 'mthca0'
    CA type: MT25204
    Number of ports: 1
    Firmware version: 1.0.800
    Hardware version: a0
    Node GUID: 0x0002c90200216e40
    System image GUID: 0x0002c90200216e43
    Port 1:
    State: Initializing
    Physical state: LinkUp
    Rate: 20
    Base lid: 0
    LMC: 0
    SM lid: 0
    Capability mask: 0x02510a68
    Port GUID: 0x0002c90200216e41
    [jatoba] (ib) ib> ibstatus
    Infiniband device 'mthca0' port 1 status:
    default gid:     fe80:0000:0000:0000:0002:c902:0021:6e41
    base lid:        0x0
    sm lid:          0x0
    state:           2: INIT
    phys state:      5: LinkUp
    rate:            20 Gb/sec (4X DDR)

    [jatoba] (ib) ib> /etc/init.d/opensmd status
    opensm is stopped
    [jatoba] (ib) ib> /etc/init.d/openibd status

    HCA driver loaded

    Configured devices:
    ib0

    Currently active devices:
    ib0

    The following modules are also loaded:

    ib_cm

    [jatoba] (ib) ib> /etc/init.d/openibd stop


At this point the command hangs.  Doing a "ps -ef" from another terminal 
reveals:

    root      6882  6755  0 15:31 pts/0    00:00:00 /bin/bash /etc/init.d/openibd stop
    root      7012  6882  0 15:31 pts/0    00:00:00 /bin/bash /sbin/ifdown ib0
    root      7031  7012  0 15:31 pts/0    00:00:00 ip link set dev ib0 down

I tried using gdb to "attach" to process 7031 to see its stack, but that
hung too, as did an attempt to check the status of the interface with
"/sbin/ifconfig".
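
A gdb attach that hangs is consistent with the target being stuck in
uninterruptible sleep inside the kernel, although that is only a guess
from the symptoms above.  One way to check without attaching is to read
the process state and wait channel from /proc.  The following is a minimal
sketch, not a tested diagnostic for this machine; the helper names
parse_state and describe are made up for illustration, and it assumes the
kernel exposes /proc/<pid>/stat and /proc/<pid>/wchan (the latter may be
missing or empty depending on kernel configuration):

```python
import sys
from pathlib import Path


def parse_state(stat_text):
    """Return the one-letter state field from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take the first field after the LAST ')'.
    rest = stat_text[stat_text.rfind(")") + 1:].split()
    return rest[0]


def describe(pid, proc_root="/proc"):
    """Return (state, wchan) for one process; wchan may be unavailable."""
    base = Path(proc_root) / str(pid)
    state = parse_state((base / "stat").read_text())
    try:
        wchan = (base / "wchan").read_text().strip() or "-"
    except OSError:
        wchan = "(unavailable)"
    return state, wchan


if __name__ == "__main__":
    # Usage (hypothetical script name): python procstate.py 6882 7012 7031
    for pid in sys.argv[1:]:
        state, wchan = describe(pid)
        note = "uninterruptible sleep" if state == "D" else "not in D state"
        print(f"pid {pid}: state={state} ({note}), wchan={wchan}")
```

A state of "D" for process 7031 would suggest the hang is in the kernel
path taken when the interface is brought down, and the wait channel, if
available, names the kernel function the process is sleeping in.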

It is rather difficult for me to debug this sort of hang, since I 
telecommute from Tucson and the machines are located in Phoenix.  Anyone 
have any suggestions?

  -Don Albert-