[ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition

Celine Bourde celine.bourde at ext.bull.net
Wed Dec 17 07:04:00 PST 2008


Hal Rosenstock wrote:
> Hi,
>
> On Wed, Dec 17, 2008 at 7:56 AM, Celine Bourde
> <celine.bourde at ext.bull.net> wrote:
>   
>> Hi,
>>
>> I can't mount an NFS/RDMA partition.
>> I've applied
>> http://www.openfabrics.org//downloads/OFED/ofed-1.4/OFED-1.4-docs/nfs-rdma.release-notes.txt
>> instructions.
>>
>> Every step (loading modules, setting up /etc/exports, starting the nfs
>> daemon, etc.) seems to be OK, but when I run the last command:
>> mount -o rdma,port=2050 192.168.0.13:/export /tmp/nfs_client/
>> the mount process blocks, even though the latest dmesg output seems correct:
>> "RPC: Registered rdma transport module.
>> rpcrdma: connection to 192.168.0.13:2050 on mlx4_0, memreg 5 slots 32 ird 16
>> "
>> If I try "ibstat" after that, I have a kernel panic message :
>> "ibpanic: [4826] main: stat of IB device 'mlx4_0' failed: (Device or
>> resource busy)" because the device is in use.
>>     
>
> That's an application "panic", meaning some sort of abnormal condition.
>
> I'm not familiar with what NFS/RDMA does with the MAD layer, but there
> may be some conflict with the diagnostic tools in this area. Another
> possibility is that a firmware error is causing this condition.
>
>   
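[Editor's note: for context, the release-notes procedure referenced above
boils down to roughly the following sketch. The module names and port number
come from my reading of the OFED 1.4 notes and from the mount line quoted
above, so treat them as assumptions rather than an exact transcript:

    # on the server
    modprobe svcrdma
    /etc/init.d/nfs start
    echo rdma 2050 > /proc/fs/nfsd/portlist

    # on the client
    modprobe xprtrdma
    mount -o rdma,port=2050 192.168.0.13:/export /tmp/nfs_client/
]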
I sometimes have this dmesg log: mlx4_core 0000:01:00.0: HW2SW_MPT failed (-16).
But I don't think it is related to the mount bug. I have seen that this error
can occur with old firmware versions, but mine is 2.5.9.
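[Editor's note: the firmware version can be cross-checked independently of
ibstat; ibv_devinfo from libibverbs reports it as fw_ver:

    ibv_devinfo -d mlx4_0 | grep fw_ver
]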


My configuration is:

kernel: 2.6.27 with NFS options
latest stable OFED 1.4
mount.nfs (Linux nfs-utils 1.1.4)
ibstat output (before doing the mount):

CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.5.900
        Hardware version: a0
        Node GUID: 0x0002c903000290b2
        System image GUID: 0x0002c903000290b5
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 2
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0002c903000290b3
        Port 2:
                State: Initializing
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x02510868
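[Editor's note: "NFS options" above presumably means the kernel was built with
at least the following; these option names are my assumption for a 2.6.27
kernel, not part of the original report:

    CONFIG_INFINIBAND=m
    CONFIG_MLX4_INFINIBAND=m
    CONFIG_NFS_FS=m
    CONFIG_NFSD=m
    CONFIG_SUNRPC_XPRT_RDMA=m
]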


>> ib_mad1 uses 100% of a CPU:
>>     
>
>   
>> [root@test] top
>> top - 14:55:07 up 19 min,  3 users,  load average: 2.00, 1.87, 1.12
>> Tasks: 190 total,   2 running, 188 sleeping,   0 stopped,   0 zombie
>> Cpu(s):  0.0%us, 12.5%sy,  0.0%ni, 87.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>> Mem:   8066156k total,   615096k used,  7451060k free,    45604k buffers
>> Swap:  8193140k total,        0k used,  8193140k free,   343436k cached
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 2952 root      15  -5     0    0    0 R  100  0.0   5:23.55 ib_mad1
>>   1 root      20   0 10320  688  572 S    0  0.0   0:02.04 init
>>   2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
>>   3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
>>   4 root      15  -5     0    0    0 S    0  0.0   0:00.01 ksoftirqd/0
>>
>>
>> I can't kill the mount process (kill -9, shutdown -r, or echo b >
>> /proc/sysrq-trigger),
>> and I have to restart the computer using "ipmitool chassis power
>> reset".
>>
>> Any ideas?
>>     
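[Editor's note: before falling back to an out-of-band reset, the stuck ib_mad1
thread above can sometimes be inspected via sysrq, assuming the kernel was
built with CONFIG_MAGIC_SYSRQ; "t" dumps all task stack traces to the kernel
log:

    echo t > /proc/sysrq-trigger
    dmesg | grep -A 20 ib_mad1

For the reset itself, a remote variant of the ipmitool command, assuming a
reachable BMC (address and credentials are placeholders):

    ipmitool -I lanplus -H <bmc-address> -U <user> -P <password> chassis power reset
]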
>
> Is there anything in dmesg or /var/log/messages relating to ib_mad?
>   
No, there is no message relating to ib_mad.
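[Editor's note: a quick way to perform that check is something like:

    dmesg | grep -i ib_mad
    grep -i ib_mad /var/log/messages
]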

Céline Bourde.

> -- Hal
>
>   
>> Moreover, I sometimes have this dmesg log: mlx4_core 0000:01:00.0: HW2SW_MPT
>> failed (-16). (I don't think it is related to the mount bug.) I have seen
>> that this error can occur with old firmware versions, but mine is 2.5.9.
>> For more details, see the bug report:
>> https://bugs.openfabrics.org/show_bug.cgi?id=1459
>>
>> Thanks for your help.
>>
>> Céline Bourde.



