[Fwd: Re: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition]]

Celine Bourde celine.bourde at ext.bull.net
Mon Apr 27 03:56:46 PDT 2009


Thanks for the explanation.
Let me know if you have additional information.

We have a contact at Mellanox. I will contact him.

Thanks,

Céline.

Vu Pham wrote:
> Celine,
>
> I see mlx4 in the log, so it is a ConnectX HCA.
>
> nfsrdma does not work with any official ConnectX firmware release (2.6.0) 
> because of problems with fast-register work requests between nfsrdma and 
> the firmware.
>
> We are currently debugging/fixing those problems.
>
> Do you have a direct contact with a Mellanox field application engineer? 
> If so, please contact him/her.
> If not, I can send you a contact privately.
>
> thanks,
> -vu
>
>> Hi Celine,
>>
>> What HCA do you have on your system? Is it ConnectX? If yes, what is 
>> its firmware version?
>>
>> -vu
>>
>>> Hey Celine,
>>>
>>> Thanks for gathering all this info!  So the rdma connections work 
>>> fine with everything _but_ nfsrdma.  And errno 103 indicates the 
>>> connection was aborted, maybe by the server (since no failures are 
>>> logged by the client).
>>>
>>>
>>> More below:
>>>
>>>
>>> Celine Bourde wrote:
>>>> Hi Steve,
>>>>
>>>> This email summarizes the situation:
>>>>
>>>> Standard mount -> OK
>>>> ---------------------
>>>>
>>>> [root@twind ~]# mount -o rw 192.168.0.215:/vol0 /mnt/
>>>> The command works fine.
>>>>
>>>> RDMA mount -> fails
>>>> --------------------
>>>>
>>>> [root@twind ~]# mount -o rdma,port=2050 192.168.0.215:/vol0 /mnt/
>>>> The command hangs; I have to press Ctrl+C to kill the process.
>>>>
>>>> or
>>>>
>>>> [root@twind ofa_kernel-1.4.1]# strace mount.nfs 192.168.0.215:/vol0 /mnt/ -o rdma,port=2050
>>>> [..]
>>>> fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
>>>> connect(3, {sa_family=AF_INET, sin_port=htons(610), 
>>>> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
>>>> fcntl(3, F_SETFL, O_RDWR)               = 0
>>>> sendto(3, 
>>>> "-3\245\357\0\0\0\0\0\0\0\2\0\1\206\270\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0"..., 
>>>> 40, 0, {sa_family=AF_INET, sin_port=htons(610), 
>>>> sin_addr=inet_addr("127.0.0.1")}, 16) = 40
>>>> poll([{fd=3, events=POLLIN}], 1, 3000)  = 1 ([{fd=3, revents=POLLIN}])
>>>> recvfrom(3, "-3\245\357\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 
>>>> 8800, MSG_DONTWAIT, {sa_family=AF_INET, sin_port=htons(610), 
>>>> sin_addr=inet_addr("127.0.0.1")}, [16]) = 24
>>>> close(3)                                = 0
>>>> mount("192.168.0.215:/vol0", "/mnt", "nfs", 0, 
>>>> "rdma,port=2050,addr=192.168.0.215"
>>>> ...same problem: the command hangs.
>>>>
>>>> [root@twind tmp]# dmesg
>>>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
>>>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>>>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
>>>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>>>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
>>>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>>>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
>>>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>>>>
>>>>
>>>
>>> Is there anything logged on the server side?
>>>
>>> Also, can you try this again, but on both systems do this before 
>>> attempting the mount:
>>>
>>> echo 32768 > /proc/sys/sunrpc/rpc_debug
>>>
>>> This will enable all the rpc trace points and add a bunch of logging 
>>> to /var/log/messages.
>>> Maybe that will show us something. I think the server is aborting 
>>> the connection for some reason.
>>>
>>> Steve.
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> general mailing list
>>> general at lists.openfabrics.org
>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>>>
>>> To unsubscribe, please visit 
>>> http://openib.org/mailman/listinfo/openib-general
>>
>



