[Fwd: Re: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition]

Diego Moreno Diego.Moreno-Lazaro at bull.net
Tue Apr 28 05:45:01 PDT 2009


Hi,

I'm working with Celine trying to make NFS/RDMA work. We installed new 
firmware (2.6.636). We still have the problem, but now we have more 
information on the client side.

- With the workaround (memreg 6) we can mount without any problem. We 
can read a file, but if we try to create a file with dd, the application 
hangs and we then have to do 'umount -f'. There is no message on the 
server. Message on the client:

rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 
ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
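The failing sequence above can be sketched as follows, under the assumptions from this thread (an RDMA-capable client, server 192.168.0.215 exporting /vol0 on port 2050; the file names are hypothetical). The run() helper only prints each step, so the sketch is harmless to execute; replace it with direct execution (as root) to actually reproduce:

```shell
# Dry-run helper: print each step instead of executing it.
run() { printf '%s\n' "$*"; }

run sysctl -w sunrpc.rdma_memreg_strategy=6           # all-physical workaround
run mount -o rdma,port=2050 192.168.0.215:/vol0 /mnt  # mounts cleanly
run dd if=/mnt/somefile of=/dev/null bs=1M            # reads succeed
run dd if=/dev/zero of=/mnt/newfile bs=1M count=1     # the write hangs here
run umount -f /mnt                                    # only way to recover
```

The sysctl must be set before the mount; the strategy is read at connection setup.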


- With fast registration:

There is no message on the server. The client dmesg output:


rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 
ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 
ird 16
------------[ cut here ]------------
WARNING: at kernel/softirq.c:136 local_bh_enable_ip+0x3c/0x92()
Modules linked in: xprtrdma autofs4 hidp nfs lockd nfs_acl rfcomm l2cap 
bluetooth sunrpc iptable_filter ip_tables ip6t_REJECT xt_tcpudp 
ip6table_filter ip6_tables x_tables cpufreq_ondemand acpi_cpufreq 
freq_table rdma_ucm ib_sdp rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa 
ipv6 ib_uverbs ib_umad iw_nes ib_ipath ib_mthca dm_multipath scsi_dh 
raid0 sbs sbshc battery acpi_memhotplug ac parport_pc lp parport mlx4_ib 
ib_mad ib_core e1000e sr_mod joydev cdrom mlx4_core i5000_edac edac_core 
shpchp rtc_cmos sg pcspkr rtc_core rtc_lib i2c_i801 i2c_core serio_raw 
button dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ata_piix 
libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last 
unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.27_ofa_compil #2

Call Trace:
  <IRQ>  [<ffffffff80235b8d>] warn_on_slowpath+0x51/0x77
  [<ffffffff80229b79>] __wake_up+0x38/0x4f
  [<ffffffff80246d57>] __wake_up_bit+0x28/0x2d
  [<ffffffffa05485af>] rpc_wake_up_task_queue_locked+0x223/0x24b [sunrpc]
  [<ffffffffa054861e>] rpc_wake_up_status+0x47/0x82 [sunrpc]
  [<ffffffff80239c49>] local_bh_enable_ip+0x3c/0x92
  [<ffffffffa0638fd1>] rpcrdma_conn_func+0x6d/0x7c [xprtrdma]
  [<ffffffffa063b316>] rpcrdma_qp_async_error_upcall+0x45/0x5a [xprtrdma]
  [<ffffffffa0294bb3>] mlx4_ib_qp_event+0xf9/0x100 [mlx4_ib]
  [<ffffffff802443da>] __queue_work+0x22/0x32
  [<ffffffffa01fc5d4>] mlx4_qp_event+0x8a/0xad [mlx4_core]
  [<ffffffffa01f50a5>] mlx4_eq_int+0x55/0x291 [mlx4_core]
  [<ffffffffa01f52f0>] mlx4_msi_x_interrupt+0xf/0x16 [mlx4_core]
  [<ffffffff802624f4>] handle_IRQ_event+0x25/0x53
  [<ffffffff80263c0a>] handle_edge_irq+0xe3/0x123
  [<ffffffff8020e907>] do_IRQ+0xf1/0x15e
  [<ffffffff8020c381>] ret_from_intr+0x0/0xa
  <EOI>  [<ffffffffa0549c3e>] nul_marshal+0x0/0x20 [sunrpc]
  [<ffffffff80212474>] mwait_idle+0x41/0x45
  [<ffffffff8020abdf>] cpu_idle+0x7e/0x9c

---[ end trace 5cc994fbe7e141af ]---
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 
ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)


Thanks,

Diego

Vu Pham wrote:
> Celine Bourde wrote:
>> We still have the same problem, even after changing the registration 
>> method.
>>
>> mount does not return; this is the dmesg output on the client:
>>
>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 
>> ird 16
>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 
>> ird 16
>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>> ib0: multicast join failed for 
>> ff12:401b:ffff:0000:0000:0000:0000:0001, status -22
>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 
>> ird 16
>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>>
>> I still have one more doubt: if the firmware is the problem, why does 
>> NFS/RDMA work with kernel 2.6.27.10 and without OFED 1.4 on these 
>> same cards?
> 
> On 2.6.27.10, nfsrdma does not use fast registration work requests; 
> therefore, it works well with ConnectX.
> 
> From 2.6.28 onward, nfsrdma started using fast registration work 
> requests, and that change was committed without verifying it against 
> ConnectX.
> 
> I'm looking into these glitches/issues and trying to resolve them now.
> 
> -vu
> 
>>
>> Thanks,
>>
>> Céline Bourde.
>>
>> Tom Talpey wrote:
>>> At 06:56 AM 4/27/2009, Celine Bourde wrote:
>>>  
>>>> Thanks for the explanation.
>>>> Let me know if you have additional information.
>>>>
>>>> We have a contact at Mellanox. I will contact him.
>>>>
>>>> Thanks,
>>>>
>>>> Céline.
>>>>
>>>> Vu Pham wrote:
>>>>   
>>>>> Celine,
>>>>>
>>>>> I'm seeing mlx4 in the log, so it is ConnectX.
>>>>>
>>>>> nfsrdma does not work with any official ConnectX firmware release 
>>>>> 2.6.0 because of fast registration work request problems between 
>>>>> nfsrdma and the firmware.
>>>>>       
>>>
>>> There is a very simple workaround if you don't have the latest mlx4 
>>> firmware.
>>>
>>> Just set the client to use the all-physical memory registration mode. 
>>> This avoids making the fast-registration requests that the firmware 
>>> advertises but does not correctly support.
>>>
>>> Before mounting, enter (as root)
>>>
>>>     sysctl -w sunrpc.rdma_memreg_strategy=6
>>>
>>> The client should work properly after this.
>>>
>>> If you do have access to the fixed firmware, I recommend using the 
>>> default
>>> setting (5) as it provides greater safety on the client.
>>>
>>> Tom.
>>>
>>>  
>>>>> We are currently debugging/fixing those problems.
>>>>>
>>>>> Do you have direct contact with a Mellanox field application 
>>>>> engineer? If so, please contact him/her.
>>>>> If not, I can send you a contact through a private channel.
>>>>>
>>>>> thanks,
>>>>> -vu
>>>>>
>>>>>     
>>>>>> Hi Celine,
>>>>>>
>>>>>> What HCA do you have on your system? Is it ConnectX? If yes, what 
>>>>>> is its firmware version?
>>>>>>
>>>>>> -vu
>>>>>>
>>>>>>       
>>>>>>> Hey Celine,
>>>>>>>
>>>>>>> Thanks for gathering all this info!  So the rdma connections work 
>>>>>>> fine with everything _but_ nfsrdma.  And errno 103 indicates the 
>>>>>>> connection was aborted, maybe by the server (since no failures 
>>>>>>> are logged by the client).
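Steve's reading of the -103 can be double-checked: kernel code reports errors as negated errno values, and on Linux errno 103 is ECONNABORTED. A quick decode (python3 assumed available):

```shell
# Decode errno 103 on Linux; the rpcrdma "closed (-103)" messages carry
# the negated value of this errno.
python3 - <<'EOF'
import errno, os
print(errno.errorcode[103])   # the symbolic name
print(os.strerror(103))       # the human-readable description
EOF
```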
>>>>>>>
>>>>>>>
>>>>>>> More below:
>>>>>>>
>>>>>>>
>>>>>>> Celine Bourde wrote:
>>>>>>>         
>>>>>>>> Hi Steve,
>>>>>>>>
>>>>>>>> This email summarizes the situation:
>>>>>>>>
>>>>>>>> Standard mount -> OK
>>>>>>>> ---------------------
>>>>>>>>
>>>>>>>> [root@twind ~]# mount -o rw 192.168.0.215:/vol0 /mnt/
>>>>>>>> Command works fine.
>>>>>>>>
>>>>>>>> rdma mount -> KO
>>>>>>>> -----------------
>>>>>>>>
>>>>>>>> [root@twind ~]# mount -o rdma,port=2050 192.168.0.215:/vol0 /mnt/
>>>>>>>> The command blocks! I have to press Ctrl+C to kill the process.
>>>>>>>>
>>>>>>>> or
>>>>>>>>
>>>>>>>> [root@twind ofa_kernel-1.4.1]# strace mount.nfs 
>>>>>>>> 192.168.0.215:/vol0 /mnt/ -o rdma,port=2050
>>>>>>>> [..]
>>>>>>>> fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
>>>>>>>> connect(3, {sa_family=AF_INET, sin_port=htons(610), 
>>>>>>>> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
>>>>>>>> fcntl(3, F_SETFL, O_RDWR)               = 0
>>>>>>>> sendto(3,
>>>>>>>>             
>>>> "-3\245\357\0\0\0\0\0\0\0\2\0\1\206\270\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0"..., 
>>>>   
>>>>>>>> 40, 0, {sa_family=AF_INET, sin_port=htons(610), 
>>>>>>>> sin_addr=inet_addr("127.0.0.1")}, 16) = 40
>>>>>>>> poll([{fd=3, events=POLLIN}], 1, 3000)  = 1 ([{fd=3, 
>>>>>>>> revents=POLLIN}])
>>>>>>>> recvfrom(3, 
>>>>>>>> "-3\245\357\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 8800, 
>>>>>>>> MSG_DONTWAIT, {sa_family=AF_INET, sin_port=htons(610), 
>>>>>>>> sin_addr=inet_addr("127.0.0.1")}, [16]) = 24
>>>>>>>> close(3)                                = 0
>>>>>>>> mount("192.168.0.215:/vol0", "/mnt", "nfs", 0, 
>>>>>>>> "rdma,port=2050,addr=192.168.0.215"
>>>>>>>> ..same problem
>>>>>>>>
>>>>>>>> [root@twind tmp]# dmesg
>>>>>>>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 
>>>>>>>> slots 32 ird 16
>>>>>>>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>>>>>>>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 
>>>>>>>> slots 32 ird 16
>>>>>>>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>>>>>>>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 
>>>>>>>> slots 32 ird 16
>>>>>>>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>>>>>>>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 
>>>>>>>> slots 32 ird 16
>>>>>>>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>>>>>>>>
>>>>>>>>
>>>>>>>>             
>>>>>>> Is there anything logged on the server side?
>>>>>>>
>>>>>>> Also, can you try this again, but on both systems do this before 
>>>>>>> attempting the mount:
>>>>>>>
>>>>>>> echo 32768 > /proc/sys/sunrpc/rpc_debug
>>>>>>>
>>>>>>> This will enable all the RPC trace points and add a lot of 
>>>>>>> logging to /var/log/messages.
>>>>>>> Maybe that will show us something.  I think the server is 
>>>>>>> aborting the connection for some reason.
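Steve's rpc_debug toggle can be sketched as below. The outer echo wrappers only print the privileged commands, so the sketch runs safely anywhere; drop the outer echo and run as root (on both client and server) to apply them:

```shell
echo 'echo 32768 > /proc/sys/sunrpc/rpc_debug'  # enable verbose RPC tracing
echo 'echo 0 > /proc/sys/sunrpc/rpc_debug'      # restore the quiet default (0)
printf '32768 = 0x%x\n' 32768                   # the mask value in hex
```

Remember to write 0 back after capturing the failure, or /var/log/messages will keep filling up.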
>>>>>>>
>>>>>>> Steve.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> general mailing list
>>>>>>> general at lists.openfabrics.org
>>>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>>>>>>>
>>>>>>> To unsubscribe, please visit 
>>>>>>> http://openib.org/mailman/listinfo/openib-general
>>>>>>>           



