[ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition
Celine Bourde
celine.bourde at ext.bull.net
Wed Dec 17 07:04:00 PST 2008
Hal Rosenstock wrote:
> Hi,
>
> On Wed, Dec 17, 2008 at 7:56 AM, Celine Bourde
> <celine.bourde at ext.bull.net> wrote:
>
>> Hi,
>>
>> I can't mount an NFS/RDMA partition.
>> I've followed the instructions in
>> http://www.openfabrics.org//downloads/OFED/ofed-1.4/OFED-1.4-docs/nfs-rdma.release-notes.txt
>>
>> Every step (loading modules, setting up /etc/exports, starting the NFS
>> daemon, etc.) seems to be OK, but when I run the last command:
>> mount -o rdma,port=2050 192.168.0.13:/export /tmp/nfs_client/
>> the mount process blocks, even though the last dmesg output looks correct:
>> "RPC: Registered rdma transport module.
>> rpcrdma: connection to 192.168.0.13:2050 on mlx4_0, memreg 5 slots 32 ird 16
>> "
>> If I run "ibstat" after that, I get a kernel panic message:
>> "ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device or
>> resource busy)" because the device is in use.
>>
>
> That's an application "panic", meaning ibstat hit some abnormal condition;
> it is not a kernel panic.
>
> I'm not familiar with what NFS/RDMA does with the MAD layer, but there
> may be some conflict with the diagnostic tools in this area. Another
> possibility is that the firmware error is causing this condition.
>
>
I sometimes see this in the dmesg log: mlx4_core 0000:01:00.0: HW2SW_MPT failed (-16).
But I don't think it is related to the mount bug. I read that this
error can occur with old firmware versions, but mine is 2.5.900.
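The (-16) in that message follows the usual kernel convention of returning a negated errno value. A minimal Python sketch (the helper name describe_kernel_error is hypothetical) decodes it with the standard errno and os modules; on Linux, 16 is EBUSY, the same "Device or resource busy" text that ibstat printed. Sharing an error code does not by itself show that the two symptoms are connected.

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Decode a kernel-style return value such as -16 into 'NAME: message'."""
    err = abs(code)  # kernel functions return -errno; errno tables use positive values
    name = errno.errorcode.get(err, "UNKNOWN")
    return f"{name}: {os.strerror(err)}"


if __name__ == "__main__":
    # On Linux (glibc) this prints "EBUSY: Device or resource busy".
    print(describe_kernel_error(-16))
```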
My configuration:
  kernel:    2.6.27 with NFS options enabled
  OFED:      latest stable release, 1.4
  mount.nfs: linux nfs-utils 1.1.4
ibstat output (before running mount):
CA 'mlx4_0'
    CA type: MT26428
    Number of ports: 2
    Firmware version: 2.5.900
    Hardware version: a0
    Node GUID: 0x0002c903000290b2
    System image GUID: 0x0002c903000290b5
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 2
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510868
        Port GUID: 0x0002c903000290b3
    Port 2:
        State: Initializing
        Physical state: LinkUp
        Rate: 40
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02510868
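In the ibstat output above, port 1 is Active, while port 2 is physically up (LinkUp) but logically Initializing with base LID 0, which in InfiniBand usually means no subnet manager has configured that port yet. For comparing ibstat snapshots taken before and after the mount, a small parser can help. This is a Python sketch that assumes the "Port N:" / "State:" layout shown above (indentation is ignored); the function name port_states is hypothetical and not part of the OFED tools.

```python
import re
from typing import Dict, Optional

# Matches "Port 1:" but not "Port GUID: ..." or "Number of ports: 2".
PORT_HEADER = re.compile(r"Port (\d+):$")


def port_states(ibstat_output: str) -> Dict[int, str]:
    """Map each port number in ibstat text output to its logical State."""
    states: Dict[int, str] = {}
    current_port: Optional[int] = None
    for raw in ibstat_output.splitlines():
        line = raw.strip()
        header = PORT_HEADER.match(line)
        if header:
            current_port = int(header.group(1))
            continue
        # "Physical state:" does not start with "State:", so it is skipped.
        if current_port is not None and line.startswith("State:"):
            states[current_port] = line.split(":", 1)[1].strip()
    return states


SAMPLE = """CA 'mlx4_0'
    Number of ports: 2
    Port 1:
        State: Active
        Physical state: LinkUp
        Port GUID: 0x0002c903000290b3
    Port 2:
        State: Initializing
        Physical state: LinkUp
"""

if __name__ == "__main__":
    for port, state in sorted(port_states(SAMPLE).items()):
        note = "" if state == "Active" else "  <- not Active"
        print(f"port {port}: {state}{note}")
```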
>> ib_mad1 is using 100% of a CPU:
>>
>
>
>> [root at test]top
>> top - 14:55:07 up 19 min,  3 users,  load average: 2.00, 1.87, 1.12
>> Tasks: 190 total,   2 running, 188 sleeping,   0 stopped,   0 zombie
>> Cpu(s):  0.0%us, 12.5%sy,  0.0%ni, 87.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:   8066156k total,   615096k used,  7451060k free,    45604k buffers
>> Swap:  8193140k total,        0k used,  8193140k free,   343436k cached
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
>>  2952 root      15  -5     0    0    0 R  100  0.0   5:23.55 ib_mad1
>>     1 root      20   0 10320  688  572 S    0  0.0   0:02.04 init
>>     2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
>>     3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
>>     4 root      15  -5     0    0    0 S    0  0.0   0:00.01 ksoftirqd/0
>>
>>
>> I can't kill the mount process (kill -9, shutdown -R and echo b >
>> sysrq-trigger all fail), so I have to restart the computer with
>> "ipmitool target chassis power reset".
>>
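A process that survives kill -9 is typically stuck in uninterruptible sleep (state "D") inside the kernel, and a pending SIGKILL is only acted on once it leaves that wait. The thread does not confirm that mount was in this state, so treat it as an assumption to check. A minimal Python sketch reads the state letter from /proc/<pid>/stat; the helper names parse_stat and uninterruptible_processes are hypothetical.

```python
import os
from typing import List, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces
    or ')', so split on the last ')' rather than on whitespace.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren].strip())
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]  # single-letter state field
    return pid, comm, state


def uninterruptible_processes() -> List[Tuple[int, str]]:
    """List (pid, comm) for every process currently in state D (Linux only)."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited (or is unreadable) while scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(f"{pid} {comm} is in uninterruptible sleep (D)")
```

Run as root while the mount is hung; a mount or mount.nfs entry in the list would support the uninterruptible-sleep explanation.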
>> Any ideas?
>>
>
> Is there anything in dmesg or /var/log/messages relating to ib_mad?
>
No, there is no message relating to ib_mad.
Céline Bourde.
> -- Hal
>
>
>> Moreover, I sometimes see this in the dmesg log: mlx4_core 0000:01:00.0:
>> HW2SW_MPT failed (-16). (I don't think it is related to the mount bug.)
>> I read that this error can occur with old firmware versions, but mine is
>> 2.5.900.
>> For more details see bug report :
>> https://bugs.openfabrics.org/show_bug.cgi?id=1459
>>
>> Thanks for your help.
>>
>> Céline Bourde.
>>
>> _______________________________________________
>> general mailing list
>> general at lists.openfabrics.org
>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general