[ewg] [Fwd: nfs-rdma hanging with Ubuntu 9.10]
Ross Smith
myxiplx at googlemail.com
Tue Jan 26 07:09:33 PST 2010
Hmm, the portlist doesn't look good:
$ cat /proc/fs/nfsd/portlist
tcp 2049
udp 2049
But attempting to modify that fails:
# echo 20049 > /proc/fs/nfsd/portlist
-bash: echo: write error: Bad file descriptor
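Looking at the kernel's nfs-rdma documentation, it seems that write needs
the transport name as well as the port, and nfsd must already be running
when it's issued, so presumably:

# /etc/init.d/nfs-kernel-server start
# echo rdma 20049 > /proc/fs/nfsd/portlist

though I haven't confirmed that cures the error.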
And I get similar problems attempting to enable the debugging logs:
# echo 32767 > /proc/sys/sunrpc/rpc_debug
-bash: /proc/sys/sunrpc/rpc_debug: Permission denied
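If the shell doing the echo isn't genuinely root (sudo elevates the echo
but not the redirection), that write fails; a root shell avoids it, or,
assuming nfs-utils is installed, its rpcdebug utility does the same job:

# sh -c 'echo 32767 > /proc/sys/sunrpc/rpc_debug'
# rpcdebug -m rpc -s all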
Up to that point, though, everything looks like it's loading fine:
Ubuntu server:
===========
# modprobe mlx4_ib
# modprobe ib_ipoib
# ifconfig ib0 192.168.101.5 netmask 255.255.255.0 up
dmesg results:
[ 456.793661] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
[ 456.987043] ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 459.988683] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 470.686631] ib0: no IPv6 routers present
# modprobe svcrdma
# /etc/init.d/nfs-kernel-server restart
dmesg:
[ 524.520198] nfsd: last server has exited, flushing export cache
[ 529.292366] svc: failed to register lockdv1 RPC service (errno 97).
[ 529.293289] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state
recovery directory
[ 529.293304] NFSD: starting 90-second grace period
Ubuntu client:
==========
# modprobe mlx4_ib
# modprobe ib_ipoib
# ifconfig ib0 192.168.101.4 netmask 255.255.255.0 up
dmesg:
[ 97.576507] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
[ 97.769582] ADDRCONF(NETDEV_UP): ib0: link is not ready
[ 100.765318] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[ 110.899591] ib0: no IPv6 routers present
# modprobe xprtrdma
dmesg:
[ 169.269689] RPC: Registered udp transport module.
[ 169.269691] RPC: Registered tcp transport module.
[ 169.289755] RPC: Registered rdma transport module.
Ross
On Tue, Jan 26, 2010 at 2:32 PM, Tom Tucker <tom at opengridcomputing.com> wrote:
> Ross Smith wrote:
>>
>> A quick addendum to that: I've just had a look at rpcinfo on both the
>> Ubuntu and Solaris NFS servers. Does this indicate that nfs-rdma is
>> not actually running?
>>
>> rpcinfo -p
>> program vers proto port
>> 100000 2 tcp 111 portmapper
>> 100000 2 udp 111 portmapper
>> 100024 1 udp 37031 status
>> 100024 1 tcp 58463 status
>> 100021 1 udp 34989 nlockmgr
>> 100021 3 udp 34989 nlockmgr
>> 100021 4 udp 34989 nlockmgr
>> 100021 1 tcp 47979 nlockmgr
>> 100021 3 tcp 47979 nlockmgr
>> 100021 4 tcp 47979 nlockmgr
>> 100003 2 udp 2049 nfs
>> 100003 3 udp 2049 nfs
>> 100003 4 udp 2049 nfs
>> 100003 2 tcp 2049 nfs
>> 100003 3 tcp 2049 nfs
>> 100003 4 tcp 2049 nfs
>>
>>
>
> Hi Ross:
>
> No. Although that would be very nice, the Linux network maintainer didn't
> want RDMA transports sharing the network port space, unfortunately, so the
> RDMA listener never shows up in rpcinfo.
>
> You would need to do this on the server to see if it is listening:
>
> # cat /proc/fs/nfsd/portlist
>
> You should see something like this:
>
> rdma 20049
> tcp 2049
> udp 2049
>
> The top line indicates that the rdma transport is listening on port 20049.
>
> If it's not showing, do this:
>
> # echo 20049 > /proc/fs/nfsd/portlist
>
> and repeat the 'cat' step above.
>
> To give us a little more detail to help debug, do this:
>
> # echo 32767 > /proc/sys/sunrpc/rpc_debug
>
> on both the client and server, then try the mount again. The dmesg log
> should have a detailed trace of what is happening.
>
> Turn off the debug output as follows:
>
> # echo 0 > /proc/sys/sunrpc/rpc_debug
>
> Tom
>
>>
>> On Tue, Jan 26, 2010 at 12:24 PM, Ross Smith <myxiplx at googlemail.com>
>> wrote:
>>
>>>
>>> Hey everyone,
>>>
>>> It's taken me a week, but I've finally gotten the 2.7.00 firmware for
>>> this system. I've also taken the step of installing a Ubuntu 9.10
>>> server for testing in addition to the Solaris server I already have.
>>>
>>> So far I'm still having no joy: NFS mounts fine over TCP, but if I try
>>> to use RDMA it fails.
>>>
>>> Machines in use:
>>> ============
>>> Solaris Server, build 129 (about 4 weeks old), using built-in Infiniband
>>> drivers
>>> Solaris Client, same build
>>> Ubuntu 9.10 Server, using kernel drivers
>>> Ubuntu 9.10 Client
>>> CentOS 5.2 Client, with OFED 1.4.2 and nfs-utils 1.1.6
>>>
>>> All five machines are on identical hardware, with Mellanox ConnectX
>>> infiniband cards running firmware 2.7.00.
>>>
>>> They all seem to be running Infiniband fine: ipoib works perfectly, and
>>> I can connect regular tcp nfs mounts over the infiniband links without
>>> any issues.
>>>
>>> With regular tcp nfs I'm getting consistent speeds of 300MB/s.
>>>
>>> However, nfs-rdma just does not want to work, no matter which
>>> combination of servers and clients I try:
>>>
>>> Ubuntu Client -> Solaris
>>> =================
>>> Commands used:
>>> # modprobe xprtrdma
>>> # mount -o proto=rdma,port=20049 192.168.101.1:/test/rdma ./nfstest
>>>
>>> This is the entire dmesg log, from first loading the driver, to
>>> attempting to connect nfs-rdma:
>>>
>>> [ 46.834146] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
>>> [ 47.028093] ADDRCONF(NETDEV_UP): ib0: link is not ready
>>> [ 52.018562] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
>>> [ 52.018698] ib0: multicast join failed for
>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>> [ 54.014289] ib0: multicast join failed for
>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>> [ 58.006864] ib0: multicast join failed for
>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>> [ 62.027202] ib0: no IPv6 routers present
>>> [ 65.120791] RPC: Registered udp transport module.
>>> [ 65.120795] RPC: Registered tcp transport module.
>>> [ 65.129162] RPC: Registered rdma transport module.
>>> [ 65.992081] ib0: multicast join failed for
>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>> [ 81.962465] ib0: multicast join failed for
>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>> [ 83.593144] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>> memreg 5 slots 32 ird 4
>>> [ 148.476967] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>>> [ 148.480488] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>>> [ 148.484421] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>>> [ 148.488376] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>>> [ 4311.663188] svc: failed to register lockdv1 RPC service (errno 97).
>>>
>>> At this point, the attempt crashed the Solaris server and hung the
>>> mount attempt on the Ubuntu client, requiring a ctrl-c on the client;
>>> the server rebooted automatically.
>>>
>>> I then tried again, connecting to the Ubuntu NFS server. This time
>>> neither machine hung nor crashed, but I had very similar messages in the
>>> client log:
>>>
>>> # mount -o proto=rdma,port=20049 192.168.101.5:/home/ross/nfsexport
>>> ./nfstest
>>>
>>> [ 4435.102852] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>> [ 4435.107492] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>> [ 4435.111471] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>> [ 4435.115468] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>
>>> So it seems the problem isn't specific to one server: both Solaris and
>>> Ubuntu fail the same way, although Ubuntu at least does not crash when
>>> clients attempt to connect.
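>>>
>>> (For reference, -111 here is ECONNREFUSED, suggesting nothing is
>>> actually listening on port 20049 on the server. Assuming svcrdma is
>>> loaded there, a quick check on the server would be:
>>>
>>> # cat /proc/fs/nfsd/portlist
>>>
>>> and looking for an "rdma 20049" line.)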
>>>
>>> I also get the same error if I attempt to connect to the Ubuntu server
>>> from the CentOS 5.2 machine, which is using regular OFED:
>>>
>>> CentOS 5.2 -> Ubuntu
>>> ================
>>> This time I'm running mount.rnfs directly as per the instructions in
>>> the OFED nfs-rdma release notes.
>>>
>>> commands used:
>>> # modprobe xprtrdma
>>> # mount.rnfs 192.168.101.5:/home/ross/nfsexport ./rdmatest -i -o
>>> proto=rdma,port=20049
>>>
>>> dmesg results look very similar:
>>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>
>>> However, attempting this has a bad effect on CentOS: the client
>>> crashes and I lose my ssh session.
>>>
>>> Does anybody have any ideas?
>>>
>>> thanks,
>>>
>>> Ross
>>>
>>>
>>>
>>> On Mon, Jan 18, 2010 at 6:31 PM, David Brean <David.Brean at sun.com> wrote:
>>>
>>>>
>>>> Hello,
>>>>
>>>> I agree, update the HCA firmware before proceeding. [The description in
>>>> Bugzilla Bug 1711 seems to match the problem that you are observing.]
>>>>
>>>> Also, if you want to help diagnose the "ib0: post_send failed", take a
>>>> look at http://lists.openfabrics.org/pipermail/general/2009-July/061118.html.
>>>>
>>>> -David
>>>>
>>>> Ross Smith wrote:
>>>>
>>>> Hi Tom,
>>>>
>>>> No, you're right - I'm just using the support that's built into the
>>>> kernel, and I agree, diagnostics on Solaris are proving very tricky.
>>>> I do have a Solaris client connected to this and showing some decent
>>>> speeds (over 900Mb/s), but I've been thinking that I might need to get
>>>> a Linux server running for testing before I spend much more time
>>>> trying to get the two separate systems working.
>>>>
>>>> However, I have found over the weekend that I'm running older firmware
>>>> that needs updating. I'd missed that in the nfs-rdma readme, so I'm
>>>> pretty sure it's causing problems. I'm trying to get that resolved
>>>> before I do much more testing.
>>>>
>>>> Regular NFS running over the ipoib link seems fine, and I don't get
>>>> any extra warnings using that. I can also run a full virtual machine
>>>> quite happily over NFS, so despite the warnings, the link does appear
>>>> stable and reliable.
>>>>
>>>> Ross
>>>>
>>>>
>>>>
>>>> On Mon, Jan 18, 2010 at 4:30 PM, Tom Tucker <tom at opengridcomputing.com>
>>>> wrote:
>>>>
>>>>
>>>> Hi Ross:
>>>>
>>>> I would check that you have IB RDMA actually working. The core transport
>>>> issues suggest that there may be network problems that will prevent
>>>> NFSRDMA from working properly.
>>>>
>>>> The first question is whether or not you are actually using OFED. You're
>>>> not -- right? You're just using the support built into the 2.6.31 kernel?
>>>>
>>>> Second, I don't think the mount is actually completing. I think the
>>>> command is returning, but the mount never actually finishes. It's
>>>> sitting there hung, trying to perform the first RPC to the server
>>>> (RPC_NOP), and it's never succeeding. That's why you see all those
>>>> connect/disconnect messages in your log file. It tries to send, gets an
>>>> error, disconnects, reconnects, tries to send... you get the picture.
>>>>
>>>> Step 1, I think, would be to ensure that you actually have IB up and
>>>> running. IPoIB between the two seems a little dodgy given the dmesg
>>>> log. Do you have another Linux box you can use to test out
>>>> connectivity/configuration with your victim? There are test programs
>>>> in OFED (rping) that would help you do this, but I don't believe they
>>>> are available on Solaris.
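>>>>
>>>> A minimal rping check between two Linux nodes, assuming librdmacm's
>>>> rping is installed on both, would be something like:
>>>>
>>>> server# rping -s -a 192.168.101.5 -v -C 10
>>>> client# rping -c -a 192.168.101.5 -v -C 10
>>>>
>>>> If ten ping/pong lines print on each side, the RDMA CM path itself is
>>>> working.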
>>>>
>>>> Tom
>>>>
>>>> Steve Wise wrote:
>>>>
>>>>
>>>> nfsrdma hang on ewg...
>>>>
>>>>
>>>>
>>>> -------- Original Message --------
>>>> Subject: [ewg] nfs-rdma hanging with Ubuntu 9.10
>>>> Date: Fri, 15 Jan 2010 13:28:31 +0000
>>>> From: Ross Smith <myxiplx at googlemail.com>
>>>> To: ewg at openfabrics.org
>>>>
>>>>
>>>>
>>>> Hi folks, it's me again, I'm afraid.
>>>>
>>>> Thanks to the help from this list, I have ipoib working; however, I
>>>> seem to be having a few problems, not least of which is commands
>>>> hanging if I attempt to use nfs-rdma.
>>>>
>>>> Although the rdma mount command completes, the system then becomes
>>>> unresponsive if I attempt any command such as 'ls', even outside of
>>>> the mounted folder. Umount also fails with the error "device is
>>>> busy".
>>>>
>>>> If anybody can spare the time to help, it would be very much
>>>> appreciated. I do seem to have a lot of warnings in the logs, but
>>>> although I've tried searching for solutions, I haven't found anything
>>>> yet.
>>>>
>>>>
>>>> System details
>>>> ============
>>>> - Ubuntu 9.10 (kernel 2.6.31)
>>>> - Mellanox ConnectX QDR card
>>>> - Flextronics DDR switch
>>>> - OpenSolaris NFS server, running one of the latest builds for
>>>> troubleshooting
>>>> - OpenSM running on another Ubuntu 9.10 box with a Mellanox
>>>> Infinihost III Lx card
>>>>
>>>> I am using the kernel drivers only, I have not installed OFED on this
>>>> machine.
>>>>
>>>>
>>>> Loading driver
>>>> ============
>>>> The driver appears to load, and ipoib works, but there are rather a
>>>> lot of warnings from dmesg.
>>>>
>>>> I am loading the driver with:
>>>> $ sudo modprobe mlx4_ib
>>>> $ sudo modprobe ib_ipoib
>>>> $ sudo ifconfig ib0 192.168.101.4 netmask 255.255.255.0 up
>>>>
>>>> And that leaves me with:
>>>> $ lsmod
>>>> Module Size Used by
>>>> ib_ipoib 72452 0
>>>> ib_cm 37196 1 ib_ipoib
>>>> ib_sa 19812 2 ib_ipoib,ib_cm
>>>> mlx4_ib 42720 0
>>>> ib_mad 37524 3 ib_cm,ib_sa,mlx4_ib
>>>> ib_core 57884 5 ib_ipoib,ib_cm,ib_sa,mlx4_ib,ib_mad
>>>> binfmt_misc 8356 1
>>>> ppdev 6688 0
>>>> psmouse 56180 0
>>>> serio_raw 5280 0
>>>> mlx4_core 84728 1 mlx4_ib
>>>> joydev 10272 0
>>>> lp 8964 0
>>>> parport 35340 2 ppdev,lp
>>>> iptable_filter 3100 0
>>>> ip_tables 11692 1 iptable_filter
>>>> x_tables 16544 1 ip_tables
>>>> usbhid 38208 0
>>>> e1000e 122124 0
>>>>
>>>>
>>>> At this point I can ping the Solaris server over the IP link,
>>>> although I do need to issue a ping from Solaris before I get a reply.
>>>> I'm mentioning that in case it's relevant, but at this point I'm
>>>> assuming it's just a firewall setting on the server.
>>>>
>>>> But although ping works, I am starting to get some dmesg warnings; I
>>>> just don't know if they are relevant:
>>>> [ 313.692072] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
>>>> [ 313.885220] ADDRCONF(NETDEV_UP): ib0: link is not ready
>>>> [ 316.880450] ib0: multicast join failed for
>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>> [ 316.880573] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
>>>> [ 316.880789] ib0: multicast join failed for
>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>> [ 320.873613] ib0: multicast join failed for
>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>> [ 327.147114] ib0: no IPv6 routers present
>>>> [ 328.861550] ib0: multicast join failed for
>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>> [ 344.834440] ib0: multicast join failed for
>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>> [ 360.808312] ib0: multicast join failed for
>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>> [ 376.782186] ib0: multicast join failed for
>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
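>>>>
>>>> From what I've read, status -22 is -EINVAL, which often points at the
>>>> subnet manager side -- e.g. the IPoIB broadcast group parameters not
>>>> matching the port, or a partition key mismatch. Assuming the
>>>> infiniband-diags tools are available, port and SM state can be checked
>>>> with:
>>>>
>>>> $ ibstat
>>>> $ sminfo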
>>>>
>>>> At this point, however, regular NFS mounts work fine over the ipoib
>>>> link:
>>>> $ sudo mount 192.168.100.1:/test/rdma ./nfstest
>>>>
>>>> But again, that adds warnings to dmesg:
>>>> [ 826.456902] RPC: Registered udp transport module.
>>>> [ 826.456905] RPC: Registered tcp transport module.
>>>> [ 841.553135] svc: failed to register lockdv1 RPC service (errno 97).
>>>>
>>>> And the speed is definitely nothing to write home about: copying a
>>>> 100MB file takes over 10 seconds:
>>>> $ time cp ./100mb ./100mb2
>>>>
>>>> real 0m10.472s
>>>> user 0m0.000s
>>>> sys 0m1.248s
>>>>
>>>> And again with warnings appearing in dmesg:
>>>> [ 872.373364] ib0: post_send failed
>>>> [ 872.373407] ib0: post_send failed
>>>> [ 872.373448] ib0: post_send failed
>>>>
>>>> I think this is a client issue rather than a problem on the server,
>>>> as the same test on an OpenSolaris client takes under half a second:
>>>> # time cp ./100mb ./100mb2
>>>>
>>>> real 0m0.334s
>>>> user 0m0.001s
>>>> sys 0m0.176s
>>>>
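>>>> One thing I may still try, assuming this kernel has IPoIB
>>>> connected-mode support built in, is switching ib0 to connected mode
>>>> with a larger MTU, which is a common IPoIB throughput fix:
>>>>
>>>> $ sudo sh -c 'echo connected > /sys/class/net/ib0/mode'
>>>> $ sudo ifconfig ib0 mtu 65520
>>>>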
>>>> Although the system is definitely not right, my long-term aim is to
>>>> run nfs-rdma on this system, so my next test was to try that and see
>>>> if the speed improved:
>>>>
>>>> $ sudo umount ./nfstest
>>>> $ sudo mount -o rdma,port=20049 192.168.101.1:/test/rdma ./nfstest
>>>>
>>>> That takes a long time to connect. It does eventually go through, but
>>>> only after the following errors in dmesg:
>>>>
>>>> [ 1140.698659] RPC: Registered rdma transport module.
>>>> [ 1155.697672] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>>> memreg 5 slots 32 ird 4
>>>> [ 1160.688455] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>> [ 1160.693818] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>>> memreg 5 slots 32 ird 4
>>>> [ 1160.695131] svc: failed to register lockdv1 RPC service (errno 97).
>>>> [ 1170.676049] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>> [ 1170.681458] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>>> memreg 5 slots 32 ird 4
>>>> [ 1190.647355] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>> [ 1190.652778] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>>> memreg 5 slots 32 ird 4
>>>> [ 1220.602353] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>> [ 1220.607809] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>>> memreg 5 slots 32 ird 4
>>>> [ 1250.557397] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>> [ 1250.562817] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>>> memreg 5 slots 32 ird 4
>>>> [ 1281.522735] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>> [ 1281.528442] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>>> memreg 5 slots 32 ird 4
>>>> [ 1311.477845] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>> [ 1311.482983] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>>> memreg 5 slots 32 ird 4
>>>> [ 1341.432758] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>> [ 1341.438212] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>>> memreg 5 slots 32 ird 4
>>>>
>>>> However, at this point my shell session becomes unresponsive if I
>>>> attempt so much as an 'ls'. The system hasn't hung completely, though,
>>>> as I can still connect another ssh session and restart with:
>>>> $ sudo init 6
>>>>
>>>> Can anybody help? Is there anything obvious I am doing wrong here?
>>>>
>>>> thanks,
>>>>
>>>> Ross
>>
>>
>
>