[Fwd: Re: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition]]

Steve Wise swise at opengridcomputing.com
Fri Apr 24 07:40:44 PDT 2009


Hey Celine,

Thanks for gathering all this info!  So the rdma connections work fine 
with everything _but_ nfsrdma.  And errno 103 indicates the connection 
was aborted, maybe by the server (since no failures are logged by the 
client).
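
As a quick sanity check on the (-103) in the rpcrdma log lines, the value can be looked up in the local errno table. This is only a minimal sketch: describe_errno is a made-up helper name, and the result depends on the platform's errno numbering (on Linux, 103 is ECONNABORTED).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value.

    Kernel log lines such as "closed (-103)" print the negated errno,
    so the sign is dropped before the lookup.
    """
    code = abs(code)
    # errno.errorcode maps numeric values to names on the local platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Value taken from the "closed (-103)" lines in the client's dmesg output.
name, message = describe_errno(-103)
print(name, "-", message)
```

Run on the client, this should report ECONNABORTED, which matches the reading that the connection was aborted rather than refused or timed out.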


More below:


Celine Bourde wrote:
> Hi Steve,
>
> This email summarizes the situation:
>
> Standard mount -> OK
> ---------------------
>
> [root at twind ~]# mount -o rw 192.168.0.215:/vol0 /mnt/
> Command works fine.
>
> rdma mount -> KO
> -----------------
>
> [root at twind ~]# mount -o rdma,port=2050 192.168.0.215:/vol0 /mnt/
> Command blocks! I have to press Ctrl+C to kill the process.
>
> or
>
> [root at twind ofa_kernel-1.4.1]# strace mount.nfs 192.168.0.215:/vol0 
> /mnt/ -o rdma,port=2050
> [..]
> fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
> connect(3, {sa_family=AF_INET, sin_port=htons(610), 
> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> fcntl(3, F_SETFL, O_RDWR)               = 0
> sendto(3, 
> "-3\245\357\0\0\0\0\0\0\0\2\0\1\206\270\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0"..., 
> 40, 0, {sa_family=AF_INET, sin_port=htons(610), 
> sin_addr=inet_addr("127.0.0.1")}, 16) = 40
> poll([{fd=3, events=POLLIN}], 1, 3000)  = 1 ([{fd=3, revents=POLLIN}])
> recvfrom(3, "-3\245\357\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 
> 8800, MSG_DONTWAIT, {sa_family=AF_INET, sin_port=htons(610), 
> sin_addr=inet_addr("127.0.0.1")}, [16]) = 24
> close(3)                                = 0
> mount("192.168.0.215:/vol0", "/mnt", "nfs", 0, 
> "rdma,port=2050,addr=192.168.0.215"
> ..same problem
>
> [root at twind tmp]# dmesg
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 
> ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 
> ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 
> ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 
> ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>
>

Is there anything logged on the server side?

Also, can you try this again, but on both systems do this before 
attempting the mount:

echo 32768 > /proc/sys/sunrpc/rpc_debug

This will enable all the rpc trace points and add a bunch of logging to 
/var/log/messages. 

Maybe that will show us something.  I think the server is aborting the 
connection for some reason. 


Steve.






