[ewg] [Fwd: nfs-rdma hanging with Ubuntu 9.10]

Tom Tucker tom at opengridcomputing.com
Tue Jan 26 06:32:51 PST 2010


Ross Smith wrote:
> A quick addendum to that: I've just had a look at rpcinfo on both the
> Ubuntu and Solaris NFS servers. Does this indicate that nfs-rdma is
> not actually running?
>
> rpcinfo -p
>    program vers proto   port
>     100000    2   tcp    111  portmapper
>     100000    2   udp    111  portmapper
>     100024    1   udp  37031  status
>     100024    1   tcp  58463  status
>     100021    1   udp  34989  nlockmgr
>     100021    3   udp  34989  nlockmgr
>     100021    4   udp  34989  nlockmgr
>     100021    1   tcp  47979  nlockmgr
>     100021    3   tcp  47979  nlockmgr
>     100021    4   tcp  47979  nlockmgr
>     100003    2   udp   2049  nfs
>     100003    3   udp   2049  nfs
>     100003    4   udp   2049  nfs
>     100003    2   tcp   2049  nfs
>     100003    3   tcp   2049  nfs
>     100003    4   tcp   2049  nfs
>
>   
Hi Ross:

No - the rpcinfo output won't show the RDMA listener either way. Although 
that would be very nice, the Linux network maintainer unfortunately didn't 
want RDMA transports sharing the network port space.

To see whether the server is actually listening for RDMA, run this on the server:

# cat /proc/fs/nfsd/portlist

You should see something like this:

rdma 20049
tcp 2049
udp 2049

The top line indicates that the rdma transport is listening on port 20049.

If the rdma line isn't showing, add it:

# echo rdma 20049 > /proc/fs/nfsd/portlist

and repeat the 'cat' step above.

To give us a little more detail to help debug, do this:

# echo 32767 > /proc/sys/sunrpc/rpc_debug

on both the client and server, then try the mount again. The dmesg log 
should then contain a detailed trace of what is happening.

Turn off the debug output as follows:

# echo 0 > /proc/sys/sunrpc/rpc_debug
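
If the rpcdebug utility from nfs-utils happens to be installed (an assumption
on my part - it isn't mentioned anywhere above), the same thing can be done
with it:

# rpcdebug -m rpc -s all
# rpcdebug -m rpc -c all

where -m selects the module (rpc, nfs, nfsd or nlm), -s sets the debug flags
and -c clears them again.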

Tom

>
> On Tue, Jan 26, 2010 at 12:24 PM, Ross Smith <myxiplx at googlemail.com> wrote:
>   
>> Hey everyone,
>>
>> It's taken me a week, but I've finally gotten the 2.7.00 firmware for
>> this system.  I've also taken the step of installing an Ubuntu 9.10
>> server for testing in addition to the Solaris server I already have.
>>
>> So far I'm still having no joy: NFS mounts fine over TCP, but if I try
>> to use RDMA it fails.
>>
>> Machines in use:
>> ============
>> Solaris Server, build 129 (about 4 weeks old), using built-in InfiniBand drivers
>> Solaris Client, same build
>> Ubuntu 9.10 Server, using kernel drivers
>> Ubuntu 9.10 Client
>> CentOS 5.2 Client, with OFED 1.4.2 and nfs-utils 1.1.6
>>
>> All five machines are on identical hardware, with Mellanox ConnectX
>> infiniband cards running firmware 2.7.00.
>>
>> They all seem to be running InfiniBand fine: IPoIB works perfectly, and
>> I can make regular TCP NFS mounts over the InfiniBand links without
>> any issues.
>>
>> With regular tcp nfs I'm getting consistent speeds of 300MB/s.
>>
>> However, nfs-rdma just does not want to work, no matter which
>> combination of servers and clients I try:
>>
>> Ubuntu Client -> Solaris
>> =================
>> Commands used:
>> # modprobe xprtrdma
>> # mount -o proto=rdma,port=20049 192.168.101.1:/test/rdma ./nfstest
>>
>> This is the entire dmesg log, from first loading the driver, to
>> attempting to connect nfs-rdma:
>>
>> [   46.834146] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
>> [   47.028093] ADDRCONF(NETDEV_UP): ib0: link is not ready
>> [   52.018562] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
>> [   52.018698] ib0: multicast join failed for
>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>> [   54.014289] ib0: multicast join failed for
>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>> [   58.006864] ib0: multicast join failed for
>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>> [   62.027202] ib0: no IPv6 routers present
>> [   65.120791] RPC: Registered udp transport module.
>> [   65.120795] RPC: Registered tcp transport module.
>> [   65.129162] RPC: Registered rdma transport module.
>> [   65.992081] ib0: multicast join failed for
>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>> [   81.962465] ib0: multicast join failed for
>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>> [   83.593144] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>> memreg 5 slots 32 ird 4
>> [  148.476967] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>> [  148.480488] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>> [  148.484421] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>> [  148.488376] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>> [ 4311.663188] svc: failed to register lockdv1 RPC service (errno 97).
>>
>> At this point the attempt crashed the Solaris server and hung the
>> mount attempt on the Ubuntu client, requiring a ctrl-c on the client,
>> while the server rebooted itself automatically.
>>
>> I then tried again, connecting to the Ubuntu NFS server.  This time
>> neither machine hung nor crashed, but I had very similar messages in
>> the client log:
>>
>> # mount -o proto=rdma,port=20049 192.168.101.5:/home/ross/nfsexport ./nfstest
>>
>> [ 4435.102852] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>> [ 4435.107492] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>> [ 4435.111471] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>> [ 4435.115468] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>
>> So it seems that it's not the server:  both Solaris and Ubuntu have
>> the same problem, although Ubuntu at least does not crash when clients
>> attempt to connect.
>>
>> I also get the same error if I attempt to connect from the CentOS 5.2
>> machine which is using regular OFED to the Ubuntu server:
>>
>> CentOS 5.2 -> Ubuntu
>> ================
>> This time I'm running mount.rnfs directly as per the instructions in
>> the OFED nfs-rdma release notes.
>>
>> commands used:
>> # modprobe xprtrdma
>> # mount.rnfs 192.168.101.5:/home/ross/nfsexport ./rdmatest -i -o
>> proto=rdma,port=20049
>>
>> dmesg results look very similar:
>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>
>> However, attempting this has a bad effect on CentOS - the client
>> crashes and I lose my ssh session.
>>
>> Does anybody have any ideas?
>>
>> thanks,
>>
>> Ross
>>
>>
>>
>> On Mon, Jan 18, 2010 at 6:31 PM, David Brean <David.Brean at sun.com> wrote:
>>     
>>> Hello,
>>>
>>> I agree, update the HCA firmware before proceeding.  [The description in
>>> Bugzilla Bug 1711 seems to match the problem that you are observing.]
>>>
>>> Also, if you want to help diagnose the "ib0: post_send failed", take a look
>>> at http://lists.openfabrics.org/pipermail/general/2009-July/061118.html.
>>>
>>> -David
>>>
>>> Ross Smith wrote:
>>>
>>> Hi Tom,
>>>
>>> No, you're right - I'm just using the support that's built into the
>>> kernel, and I agree, diagnostics from Solaris are proving very tricky.
>>> I do have a Solaris client connected to this and showing some decent
>>> speeds (over 900Mb/s), but I've been thinking that I might need to get
>>> a Linux server running for testing before I spend much more time
>>> trying to get the two separate systems working.
>>>
>>> However, over the weekend I found that I'm running older firmware
>>> that needs updating.  I'd missed that in the nfs-rdma readme, so I'm
>>> pretty sure it's going to be causing problems.  I'm trying to get
>>> that resolved before I do too much other testing.
>>>
>>> Regular NFS running over the ipoib link seems fine, and I don't get
>>> any extra warnings using that.  I can also run a full virtual machine
>>> quite happily over NFS, so despite the warnings, the link does appear
>>> stable and reliable.
>>>
>>> Ross
>>>
>>>
>>>
>>> On Mon, Jan 18, 2010 at 4:30 PM, Tom Tucker <tom at opengridcomputing.com>
>>> wrote:
>>>
>>>
>>> Hi Ross:
>>>
>>> I would check that you have IB RDMA actually working. The core transport
>>> issues suggest that there may be network problems that will prevent NFSRDMA
>>> from working properly.
>>>
>>> The first question is whether or not you are actually using OFED. You're
>>> not -- right? You're just using the support built into the 2.6.31 kernel?
>>>
>>> Second, I don't think the mount is actually completing. I think the command
>>> is returning, but the mount never actually finishes. It's sitting there hung
>>> trying to perform the first RPC to the server (RPC_NOP) and it's never
>>> succeeding. That's why you see all those connect/disconnect messages in your
>>> log file. It tries to send, gets an error, disconnects, reconnects, tries to
>>> send .... you get the picture.
>>>
>>> Step 1 I think would be to ensure that you actually have IB up and running.
>>> IPoIB between the two seems a little dodgy given the dmesg log. Do you have
>>> another Linux box you can use to test out connectivity/configuration with
>>> your victim? There are test programs in OFED (rping) that would help you do
>>> this, but I don't believe they are available on Solaris.
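>>>
>>> A minimal rping check between two Linux boxes (a sketch only - I'm assuming
>>> the stock rping from librdmacm, and "<server ipoib address>" below is a
>>> placeholder for whichever IPoIB address you listen on) would be:
>>>
>>> # rping -s -v -C 10                              (on the box acting as server)
>>> # rping -c -a <server ipoib address> -v -C 10    (on the other box)
>>>
>>> If the ten ping/pong iterations complete cleanly, basic RDMA CM connectivity
>>> is fine and the problem is further up the stack.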
>>>
>>> Tom
>>>
>>> Steve Wise wrote:
>>>
>>>
>>> nfsrdma hang on ewg...
>>>
>>>
>>>
>>> -------- Original Message --------
>>> Subject:     [ewg] nfs-rdma hanging with Ubuntu 9.10
>>> Date:     Fri, 15 Jan 2010 13:28:31 +0000
>>> From:     Ross Smith <myxiplx at googlemail.com>
>>> To:     ewg at openfabrics.org
>>>
>>>
>>>
>>> Hi folks, it's me again I'm afraid.
>>>
>>> Thanks to the help from this list I have IPoIB working; however, I
>>> seem to be having a few problems, not least of which is commands
>>> hanging if I attempt to use nfs-rdma.
>>>
>>> Although the rdma mount command completes, the system then becomes
>>> unresponsive if I attempt any command such as 'ls', even outside of
>>> the mounted folder.  Umount also fails with the error "device is
>>> busy".
>>>
>>> If anybody can spare the time to help it would be very much
>>> appreciated.  I do seem to have a lot of warnings in the logs, and
>>> although I've tried searching for solutions I haven't found anything
>>> yet.
>>>
>>>
>>> System details
>>> ============
>>> - Ubuntu 9.10
>>>  (kernel 2.6.31)
>>> - Mellanox ConnectX QDR card
>>> - Flextronics DDR switch
>>> - OpenSolaris NFS server, running one of the latest builds for
>>> troubleshooting
>>> - OpenSM running on another Ubuntu 9.10 box with a Mellanox
>>> Infinihost III Lx card
>>>
>>> I am using the kernel drivers only; I have not installed OFED on this
>>> machine.
>>>
>>>
>>> Loading driver
>>> ============
>>> The driver appears to load, and ipoib works, but there are rather a
>>> lot of warnings from dmesg.
>>>
>>> I am loading the driver with:
>>> $ sudo modprobe mlx4_ib
>>> $ sudo modprobe ib_ipoib
>>> $ sudo ifconfig ib0 192.168.101.4 netmask 255.255.255.0 up
>>>
>>> And that leaves me with:
>>> $ lsmod
>>> Module                  Size  Used by
>>> ib_ipoib               72452  0
>>> ib_cm                  37196  1 ib_ipoib
>>> ib_sa                  19812  2 ib_ipoib,ib_cm
>>> mlx4_ib                42720  0
>>> ib_mad                 37524  3 ib_cm,ib_sa,mlx4_ib
>>> ib_core                57884  5 ib_ipoib,ib_cm,ib_sa,mlx4_ib,ib_mad
>>> binfmt_misc             8356  1
>>> ppdev                   6688  0
>>> psmouse                56180  0
>>> serio_raw               5280  0
>>> mlx4_core              84728  1 mlx4_ib
>>> joydev                 10272  0
>>> lp                      8964  0
>>> parport                35340  2 ppdev,lp
>>> iptable_filter          3100  0
>>> ip_tables              11692  1 iptable_filter
>>> x_tables               16544  1 ip_tables
>>> usbhid                 38208  0
>>> e1000e                122124  0
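>>>
>>> (A quick sanity check at this point - assuming the standard libibverbs
>>> utilities are installed, which I haven't shown above - is to run:
>>>
>>> $ ibv_devinfo
>>>
>>> and confirm the port reports "state: PORT_ACTIVE"; a port stuck in INIT or
>>> DOWN would be one possible cause of the multicast join failures below.)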
>>>
>>>
>>> At this point I can ping the Solaris server over the IP link,
>>> although I do need to issue a ping from Solaris before I get a reply.
>>> I'm mentioning it in case it's relevant, but at this point I'm
>>> assuming that's just a firewall setting on the server.
>>>
>>> But although ping works, I am starting to get some dmesg warnings; I
>>> just don't know if they are relevant:
>>> [  313.692072] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4,
>>> 2008)
>>> [  313.885220] ADDRCONF(NETDEV_UP): ib0: link is not ready
>>> [  316.880450] ib0: multicast join failed for
>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>> [  316.880573] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
>>> [  316.880789] ib0: multicast join failed for
>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>> [  320.873613] ib0: multicast join failed for
>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>> [  327.147114] ib0: no IPv6 routers present
>>> [  328.861550] ib0: multicast join failed for
>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>> [  344.834440] ib0: multicast join failed for
>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>> [  360.808312] ib0: multicast join failed for
>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>> [  376.782186] ib0: multicast join failed for
>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>
>>> At this point, however, regular NFS mounts work fine over the IPoIB
>>> link:
>>> $ sudo mount 192.168.100.1:/test/rdma ./nfstest
>>>
>>> But again, that adds warnings to dmesg:
>>> [  826.456902] RPC: Registered udp transport module.
>>> [  826.456905] RPC: Registered tcp transport module.
>>> [  841.553135] svc: failed to register lockdv1 RPC service (errno 97).
>>>
>>> And the speed is definitely nothing to write home about; copying a
>>> 100MB file takes over 10 seconds:
>>> $ time cp ./100mb ./100mb2
>>>
>>> real    0m10.472s
>>> user    0m0.000s
>>> sys    0m1.248s
>>>
>>> And again with warnings appearing in dmesg:
>>> [  872.373364] ib0: post_send failed
>>> [  872.373407] ib0: post_send failed
>>> [  872.373448] ib0: post_send failed
>>>
>>> I think this is a client issue rather than a problem on the server,
>>> as the same test on an OpenSolaris client takes under half a second:
>>> # time cp ./100mb ./100mb2
>>>
>>> real    0m0.334s
>>> user    0m0.001s
>>> sys     0m0.176s
>>>
>>> Although the system is definitely not right, my long-term aim is to
>>> run nfs-rdma on this system, so my next test was to try that and see
>>> if the speed improved:
>>>
>>> $ sudo umount ./nfstest
>>> $ sudo mount -o rdma,port=20049 192.168.101.1:/test/rdma ./nfstest
>>>
>>> The mount takes a long time to connect.  It does eventually go through, but
>>> only after the following errors appear in dmesg:
>>>
>>> [ 1140.698659] RPC: Registered rdma transport module.
>>> [ 1155.697672] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>> memreg 5 slots 32 ird 4
>>> [ 1160.688455] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>> [ 1160.693818] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>> memreg 5 slots 32 ird 4
>>> [ 1160.695131] svc: failed to register lockdv1 RPC service (errno 97).
>>> [ 1170.676049] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>> [ 1170.681458] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>> memreg 5 slots 32 ird 4
>>> [ 1190.647355] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>> [ 1190.652778] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>> memreg 5 slots 32 ird 4
>>> [ 1220.602353] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>> [ 1220.607809] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>> memreg 5 slots 32 ird 4
>>> [ 1250.557397] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>> [ 1250.562817] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>> memreg 5 slots 32 ird 4
>>> [ 1281.522735] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>> [ 1281.528442] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>> memreg 5 slots 32 ird 4
>>> [ 1311.477845] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>> [ 1311.482983] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>> memreg 5 slots 32 ird 4
>>> [ 1341.432758] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>> [ 1341.438212] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0,
>>> memreg 5 slots 32 ird 4
>>>
>>> However, at this point my shell session becomes unresponsive if I
>>> attempt so much as an 'ls'.  The system hasn't hung completely,
>>> however, as I can still connect another ssh session and restart with:
>>> $ sudo init 6
>>>
>>> Can anybody help?  Is there anything obvious I am doing wrong here?
>>>
>>> thanks,
>>>
>>> Ross
>>> _______________________________________________
>>> ewg mailing list
>>> ewg at lists.openfabrics.org
>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> ewg mailing list
>>> ewg at lists.openfabrics.org
>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>
>>>       
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>   



