[openfabrics-ewg] Re: NOP problem in ib_mthca on OFED RC4
Don.Albert at Bull.com
Tue May 30 07:35:18 PDT 2006
Michael,
> > The ib_mthca module now initializes correctly on
> > both EM64T machines. I noticed some discussion between you and Roland about
> > making the parameter "fw_cmd_doorbell=0" the default. Did this
> > occur in RC5?
>
> Yes, we changed fw_cmd_doorbell to 0 by default for now because it seemed
> safer. I expect if you load mthca with fw_cmd_doorbell=1 you still get an
> error, isn't that right?
>
Although the change in RC5 for fw_cmd_doorbell *seemed* to allow the
ib_mthca module to initialize, I don't think I am out of the woods yet on
this particular machine. The link never comes up, and the other machine,
which is connected back to back with this one, and on which I am trying to
run OpenSM, does not get a response to its MAD packets. When I try to
shut down the openib stack with the "/etc/init.d/openibd stop" script, the
processes hang trying to set device "ib0" down. Here is an excerpt from a
terminal session:
[jatoba] (ib) ib> ibstat
CA 'mthca0'
    CA type: MT25204
    Number of ports: 1
    Firmware version: 1.0.800
    Hardware version: a0
    Node GUID: 0x0002c90200216e40
    System image GUID: 0x0002c90200216e43
    Port 1:
        State: Initializing
        Physical state: LinkUp
        Rate: 20
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510a68
        Port GUID: 0x0002c90200216e41
[jatoba] (ib) ib> ibstatus
Infiniband device 'mthca0' port 1 status:
    default gid: fe80:0000:0000:0000:0002:c902:0021:6e41
    base lid: 0x0
    sm lid: 0x0
    state: 2: INIT
    phys state: 5: LinkUp
    rate: 20 Gb/sec (4X DDR)
[jatoba] (ib) ib> /etc/init.d/opensmd status
opensm is stopped
[jatoba] (ib) ib> /etc/init.d/openibd status
HCA driver loaded
Configured devices:
ib0
Currently active devices:
ib0
The following modules are also loaded:
ib_cm
[jatoba] (ib) ib> /etc/init.d/openibd stop
At this point the command hangs. Doing a "ps -ef" from another terminal
reveals:
root 6882 6755 0 15:31 pts/0 00:00:00 /bin/bash /etc/init.d/openibd stop
root 7012 6882 0 15:31 pts/0 00:00:00 /bin/bash /sbin/ifdown ib0
root 7031 7012 0 15:31 pts/0 00:00:00 ip link set dev ib0 down
I tried using gdb to "attach" to process 7031 to see its stack, but that
hung too, as did an attempt to check the status of the interface with
"/sbin/ifconfig".
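When a debugger attach hangs like this, one option that does not involve attaching to the process is to read its state from /proc. The following is only a sketch, and it was not run on these machines. It assumes the hung processes are in uninterruptible sleep (state "D"), that /proc/<pid>/status is readable, and that the kernel exposes /proc/<pid>/wchan (it may be missing or show "0" on some kernels). The function names and the proc_root parameter are made up for illustration.

```python
from pathlib import Path


def parse_state(status_text):
    """Return the value of the 'State:' line from /proc/<pid>/status text,
    e.g. 'D (disk sleep)', or None if no such line is present."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            return line.split(":", 1)[1].strip()
    return None


def describe_process(pid, proc_root="/proc"):
    """Return (state, wchan) for a process without attaching to it.

    proc_root is a hypothetical parameter so the function can be pointed
    at a copy of the files; on a live system it stays at /proc.
    """
    base = Path(proc_root) / str(pid)
    state = parse_state((base / "status").read_text())
    try:
        # wchan names the kernel function the task is sleeping in, if known.
        wchan = (base / "wchan").read_text().strip() or None
    except OSError:
        # Assumption: treat a missing or unreadable wchan as "unknown".
        wchan = None
    return state, wchan


if __name__ == "__main__":
    import sys

    # Example: python procstate.py 7031 7012 6882
    for pid in sys.argv[1:]:
        state, wchan = describe_process(pid)
        print(f"{pid}: state={state} wchan={wchan}")
```

Run against the PIDs from the "ps -ef" listing (7031, 7012, 6882), this would show whether "ip link set dev ib0 down" is blocked in the kernel and, where wchan is available, the name of the function it is waiting in.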
It is rather difficult for me to debug this sort of hang, since I
telecommute from Tucson and the machines are located in Phoenix. Anyone
have any suggestions?
-Don Albert-