[Fwd: Re: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition]]
Steve Wise
swise at opengridcomputing.com
Fri Apr 24 07:40:44 PDT 2009
Hey Celine,
Thanks for gathering all this info! So the rdma connections work fine
with everything _but_ nfsrdma. And errno 103 indicates the connection
was aborted, maybe by the server (since no failures are logged by the
client).
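
For reference, the kernel reports the error as a negative errno, so -103 in the rpcrdma messages corresponds to errno 103. A minimal sketch for looking up the symbolic name and message on the machine at hand (the helper name `describe_errno` is made up for illustration; errno numbering is platform-specific, so run it on the Linux client itself):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value.

    Kernel logs print errors as negative numbers, so the sign is dropped first.
    """
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    # -103 is the value shown in the rpcrdma "closed" messages.
    print(describe_errno(-103))
```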
More below:
Celine Bourde wrote:
> Hi Steve,
>
> This email summarizes the situation:
>
> Standard mount -> OK
> ---------------------
>
> [root@twind ~]# mount -o rw 192.168.0.215:/vol0 /mnt/
> Command works fine.
>
> rdma mount -> KO
> -----------------
>
> [root@twind ~]# mount -o rdma,port=2050 192.168.0.215:/vol0 /mnt/
> The command hangs! I have to press Ctrl+C to kill the process.
>
> or
>
> [root@twind ofa_kernel-1.4.1]# strace mount.nfs 192.168.0.215:/vol0 /mnt/ -o rdma,port=2050
> [..]
> fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> connect(3, {sa_family=AF_INET, sin_port=htons(610),
> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> fcntl(3, F_SETFL, O_RDWR) = 0
> sendto(3,
> "-3\245\357\0\0\0\0\0\0\0\2\0\1\206\270\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 40, 0, {sa_family=AF_INET, sin_port=htons(610),
> sin_addr=inet_addr("127.0.0.1")}, 16) = 40
> poll([{fd=3, events=POLLIN}], 1, 3000) = 1 ([{fd=3, revents=POLLIN}])
> recvfrom(3, "-3\245\357\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
> 8800, MSG_DONTWAIT, {sa_family=AF_INET, sin_port=htons(610),
> sin_addr=inet_addr("127.0.0.1")}, [16]) = 24
> close(3) = 0
> mount("192.168.0.215:/vol0", "/mnt", "nfs", 0,
> "rdma,port=2050,addr=192.168.0.215"
> ..same problem
>
> [root@twind tmp]# dmesg
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>
>
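
The dmesg output above shows the same connect/close pair repeating. A rough sketch for tallying those closures per server and error code, assuming the "closed" lines keep exactly the format quoted above (the function name `count_closures` and the regular expression are illustrative, not part of any rpcrdma tooling):

```python
import re
from collections import Counter

# Matches lines such as:
#   rpcrdma: connection to 192.168.0.215:2050 closed (-103)
CLOSED_RE = re.compile(r"rpcrdma: connection to (\S+):(\d+) closed \((-?\d+)\)")


def count_closures(log_text):
    """Count rpcrdma connection closures, keyed by (address, port, error code)."""
    counts = Counter()
    for match in CLOSED_RE.finditer(log_text):
        address = match.group(1)
        port = int(match.group(2))
        code = int(match.group(3))
        counts[(address, port, code)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python this_script.py
    for key, count in count_closures(sys.stdin.read()).items():
        print(key, count)
```

If every closure carries the same code, that points at one consistent failure rather than several different ones.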
Is there anything logged on the server side?
Also, can you try this again, but on both systems do this before
attempting the mount:
echo 32768 > /proc/sys/sunrpc/rpc_debug
This will enable all the rpc trace points and add a bunch of logging to
/var/log/messages.
Maybe that will show us something. I think the server is aborting the
connection for some reason.
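
The value written to rpc_debug is a bitmask of debug facilities. A small sketch for checking which facilities a given value selects; the flag table is an assumption taken from the kernel's include/linux/sunrpc/debug.h and should be verified against the headers of the kernel actually running. Under that table, 32767 (0x7fff) covers every facility, while 32768 (0x8000) sets only bit 15, so it is worth confirming which value was intended before relying on the logs.

```python
# Flag values assumed from include/linux/sunrpc/debug.h; verify locally.
RPCDBG_FLAGS = {
    0x0001: "XPRT",
    0x0002: "BIND",
    0x0004: "AUTH",
    0x0008: "CALL",
    0x0010: "SCHED",
    0x0020: "TRANS",
    0x0040: "SVCXPRT",
    0x0080: "SVCDSP",
    0x0100: "MISC",
    0x0200: "CACHE",
}
RPCDBG_ALL = 0x7FFF  # assumed "all facilities" mask


def decode_rpc_debug(mask):
    """Return the names of the assumed facilities enabled by the mask."""
    return [name for bit, name in sorted(RPCDBG_FLAGS.items()) if mask & bit]


if __name__ == "__main__":
    for value in (32767, 32768):
        print(value, hex(value), decode_rpc_debug(value))
```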
Steve.