[ewg] [Fwd: nfs-rdma hanging with Ubuntu 9.10]

Tue Jan 26 07:49:21 PST 2010

Ross Smith wrote:
> Hmm, the portlist doesn't look good:
>
> $ cat /proc/fs/nfsd/portlist
> tcp 2049
> udp 2049
>
>   

No, it looks great: that's an easy one! Nothing is listening on port 20049,
so the client gets error 111 (ECONNREFUSED).
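
For reference, the negative numbers in the rpcrdma "closed (...)" lines and the
"errno" in the lockd message are Linux errno values. Below is a minimal sketch
for decoding the three that show up in this thread. errno_name is a
hypothetical helper written for illustration, not part of any tool, and the
numbers are the Linux values; other systems may number them differently.

```shell
# Decode the errno values seen in this thread's dmesg output.
# errno_name is a hypothetical helper; the numbers are the Linux values.
errno_name() {
    # Strip a leading "-" so that both "111" and "-111" are accepted.
    case "${1#-}" in
        97)  echo "EAFNOSUPPORT" ;;   # address family not supported
        103) echo "ECONNABORTED" ;;   # connection aborted
        111) echo "ECONNREFUSED" ;;   # connection refused
        *)   echo "unknown" ;;
    esac
}
```

For example, "errno_name -111" prints ECONNREFUSED, which matches the client
log lines quoted further down.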

> But attempting to modify that fails:
>
> # echo 20049 > /proc/fs/nfsd/portlist
> -bash: echo: write error: Bad file descriptor
>
>   
That's because I gave you the wrong syntax for the write command. It 
should be the following:

# echo "rdma 20049" > /proc/fs/nfsd/portlist

Sorry about that.
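
The check-then-add steps can be sketched as a pair of shell functions. This is
only an illustration: rdma_listening and ensure_rdma_listener are hypothetical
names, and the PORTLIST variable is an assumption added so the path can be
overridden. The write uses the same "rdma <port>" syntax as the command above
and needs root on a real server.

```shell
# Hypothetical helpers; PORTLIST defaults to the path used in this thread.
PORTLIST="${PORTLIST:-/proc/fs/nfsd/portlist}"

# Succeeds if the portlist already has an exact "rdma <port>" line.
rdma_listening() {
    grep -qx "rdma $1" "$PORTLIST"
}

# Writes "rdma <port>" to the portlist only when the listener is missing.
ensure_rdma_listener() {
    rdma_listening "$1" || echo "rdma $1" > "$PORTLIST"
}
```

Running "ensure_rdma_listener 20049" as root and then repeating the 'cat' step
should show whether the rdma line appeared.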

Tom

> And I get similar problems attempting to enable the debugging logs:
>
> # echo 32767 > /proc/sys/sunrpc/rpc_debug
> -bash: /proc/sys/sunrpc/rpc_debug: Permission denied
>
> Up to that point though, everything looks like it's loading fine:
>
> Ubuntu server:
> ===========
> # modprobe mlx4_ib
> # modprobe ib_ipoib
> # ifconfig ib0 192.168.101.5 netmask 255.255.255.0 up
>
> dmesg results:
> [  456.793661] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
> [  456.987043] ADDRCONF(NETDEV_UP): ib0: link is not ready
> [  459.988683] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
> [  470.686631] ib0: no IPv6 routers present
>
> # modprobe svcrdma
> # /etc/init.d/nfs-kernel-server restart
>
> dmesg:
> [  524.520198] nfsd: last server has exited, flushing export cache
> [  529.292366] svc: failed to register lockdv1 RPC service (errno 97).
> [  529.293289] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
> [  529.293304] NFSD: starting 90-second grace period
>
> Ubuntu client:
> ==========
> # modprobe mlx4_ib
> # modprobe ib_ipoib
> # ifconfig ib0 192.168.101.4 netmask 255.255.255.0 up
>
> dmesg:
> [   97.576507] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
> [   97.769582] ADDRCONF(NETDEV_UP): ib0: link is not ready
> [  100.765318] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
> [  110.899591] ib0: no IPv6 routers present
>
> # modprobe xprtrdma
>
> dmesg:
> [  169.269689] RPC: Registered udp transport module.
> [  169.269691] RPC: Registered tcp transport module.
> [  169.289755] RPC: Registered rdma transport module.
>
> Ross
>
>
> On Tue, Jan 26, 2010 at 2:32 PM, Tom Tucker <tom at opengridcomputing.com> wrote:
>   
>> Ross Smith wrote:
>>     
>>> A quick addendum to that, I've just had a look at rpcinfo on both the
>>> Ubuntu and Solaris NFS servers, does this indicate that nfs-rdma is
>>> not actually running?
>>>
>>> rpcinfo -p
>>>   program vers proto   port
>>>    100000    2   tcp    111  portmapper
>>>    100000    2   udp    111  portmapper
>>>    100024    1   udp  37031  status
>>>    100024    1   tcp  58463  status
>>>    100021    1   udp  34989  nlockmgr
>>>    100021    3   udp  34989  nlockmgr
>>>    100021    4   udp  34989  nlockmgr
>>>    100021    1   tcp  47979  nlockmgr
>>>    100021    3   tcp  47979  nlockmgr
>>>    100021    4   tcp  47979  nlockmgr
>>>    100003    2   udp   2049  nfs
>>>    100003    3   udp   2049  nfs
>>>    100003    4   udp   2049  nfs
>>>    100003    2   tcp   2049  nfs
>>>    100003    3   tcp   2049  nfs
>>>    100003    4   tcp   2049  nfs
>>>
>>>
>>>       
>> Hi Ross:
>>
>> No, although that would be very nice, the Linux network maintainer didn't
>> want RDMA transports sharing the network port space unfortunately.
>>
>> You would need to do this on the server to see if it is listening:
>>
>> # cat /proc/fs/nfsd/portlist
>>
>> You should see something like this:
>>
>> rdma 20049
>> tcp 2049
>> udp 2049
>>
>> The top line indicates that the rdma transport is listening on port 20049.
>>
>> If it's not showing, do this:
>>
>> # echo 20049 > /proc/fs/nfsd/portlist
>>
>> and repeat the 'cat' step above.
>>
>> To give us a little more detail to help debug, do this:
>>
>> # echo 32767 > /proc/sys/sunrpc/rpc_debug
>>
>> on both the client and server, then try the mount again. The dmesg log
>> should have a detailed trace of what is happening.
>>
>> Turn off the debug output as follows:
>>
>> # echo 0 > /proc/sys/sunrpc/rpc_debug
>>
>> Tom
>>
>>     
>>> On Tue, Jan 26, 2010 at 12:24 PM, Ross Smith <myxiplx at googlemail.com>
>>> wrote:
>>>
>>>       
>>>> Hey everyone,
>>>>
>>>> It's taken me a week, but I've finally gotten the 2.7.00 firmware for
>>>> this system.  I've also taken the step of installing an Ubuntu 9.10
>>>> server for testing in addition to the Solaris server I already have.
>>>>
>>>> So far I'm still having no joy, nfs mounts fine over TCP, but if I try
>>>> to use RDMA it fails.
>>>>
>>>> Machines in use:
>>>> ============
>>>> Solaris Server, build 129 (about 4 weeks old), using built in Infiniband drivers
>>>> Solaris Client, same build
>>>> Ubuntu 9.10 Server, using kernel drivers
>>>> Ubuntu 9.10 Client
>>>> CentOS 5.2 Client, with OFED 1.4.2 and nfs-utils 1.1.6
>>>>
>>>> All five machines are on identical hardware, with Mellanox ConnectX
>>>> infiniband cards running firmware 2.7.00.
>>>>
>>>> They all seem to be running Infiniband fine, ipoib works perfectly and
>>>> I can connect regular tcp nfs mounts over the infiniband links without
>>>> any issues.
>>>>
>>>> With regular tcp nfs I'm getting consistent speeds of 300MB/s.
>>>>
>>>> However, nfs-rdma just does not want to work, no matter which
>>>> combination of servers and clients I try:
>>>>
>>>> Ubuntu Client -> Solaris
>>>> =================
>>>> Commands used:
>>>> # modprobe xprtrdma
>>>> # mount -o proto=rdma,port=20049 192.168.101.1:/test/rdma ./nfstest
>>>>
>>>> This is the entire dmesg log, from first loading the driver, to
>>>> attempting to connect nfs-rdma:
>>>>
>>>> [   46.834146] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
>>>> [   47.028093] ADDRCONF(NETDEV_UP): ib0: link is not ready
>>>> [   52.018562] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
>>>> [   52.018698] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>> [   54.014289] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>> [   58.006864] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>> [   62.027202] ib0: no IPv6 routers present
>>>> [   65.120791] RPC: Registered udp transport module.
>>>> [   65.120795] RPC: Registered tcp transport module.
>>>> [   65.129162] RPC: Registered rdma transport module.
>>>> [   65.992081] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>> [   81.962465] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>> [   83.593144] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>> [  148.476967] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>>>> [  148.480488] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>>>> [  148.484421] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>>>> [  148.488376] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>>>> [ 4311.663188] svc: failed to register lockdv1 RPC service (errno 97).
>>>>
>>>> At this point, the attempt crashed the Solaris server, and hung the
>>>> mount attempt on the Ubuntu client, requiring ctrl-c on the client,
>>>> and automatically rebooting the server.
>>>>
>>>> I then tried again, connecting to the Ubuntu nfs server.  This time
>>>> neither device hung or crashed, but I had very similar messages in the
>>>> client log:
>>>>
>>>> # mount -o proto=rdma,port=20049 192.168.101.5:/home/ross/nfsexport ./nfstest
>>>>
>>>> [ 4435.102852] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>> [ 4435.107492] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>> [ 4435.111471] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>> [ 4435.115468] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>>
>>>> So it seems that it's not the server:  both Solaris and Ubuntu have
>>>> the same problem, although Ubuntu at least does not crash when clients
>>>> attempt to connect.
>>>>
>>>> I also get the same error if I attempt to connect from the CentOS 5.2
>>>> machine which is using regular OFED to the Ubuntu server:
>>>>
>>>> CentOS 5.2 -> Ubuntu
>>>> ================
>>>> This time I'm running mount.rnfs directly as per the instructions in
>>>> the OFED nfs-rdma release notes.
>>>>
>>>> commands used:
>>>> # modprobe xprtrdma
>>>> # mount.rnfs 192.168.101.5:/home/ross/nfsexport ./rdmatest -i -o proto=rdma,port=20049
>>>>
>>>> dmesg results look very similar:
>>>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>>
>>>> However attempting this has a bad effect on CentOS - the client
>>>> crashes and I lose my ssh session.
>>>>
>>>> Does anybody have any ideas?
>>>>
>>>> thanks,
>>>>
>>>> Ross
>>>>
>>>>
>>>>
>>>> On Mon, Jan 18, 2010 at 6:31 PM, David Brean <David.Brean at sun.com> wrote:
>>>>
>>>>         
>>>>> Hello,
>>>>>
>>>>> I agree, update the HCA firmware before proceeding.  [The description in
>>>>> Bugzilla Bug 1711 seems to match the problem that you are observing.]
>>>>>
>>>>> Also, if you want to help diagnose the "ib0: post_send failed", take a look at
>>>>> http://lists.openfabrics.org/pipermail/general/2009-July/061118.html.
>>>>>
>>>>> -David
>>>>>
>>>>> Ross Smith wrote:
>>>>>
>>>>> Hi Tom,
>>>>>
>>>>> No, you're right - I'm just using the support that's built into the
>>>>> kernel, and I agree, diagnostics from Solaris is proving very tricky.
>>>>> I do have a Solaris client connected to this and showing some decent
>>>>> speeds (over 900Mb/s), but I've been thinking that I might need to get
>>>>> a Linux server running for testing before I spend much more time
>>>>> trying to get the two separate systems working.
>>>>>
>>>>> However, I have found over the weekend that I'm running older firmware
>>>>> and need that updating.  I'd missed that in the nfs-rdma readme so I'm
>>>>> pretty sure that's going to be causing problems.  I'm trying to get
>>>>> that resolved before I do too much other testing.
>>>>>
>>>>> Regular NFS running over the ipoib link seems fine, and I don't get
>>>>> any extra warnings using that.  I can also run a full virtual machine
>>>>> quite happily over NFS, so despite the warnings, the link does appear
>>>>> stable and reliable.
>>>>>
>>>>> Ross
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jan 18, 2010 at 4:30 PM, Tom Tucker <tom at opengridcomputing.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>> Hi Ross:
>>>>>
>>>>> I would check that you have IB RDMA actually working. The core transport
>>>>> issues suggest that there may be network problems that will prevent NFSRDMA
>>>>> from working properly.
>>>>>
>>>>> The first question is whether or not you are actually using OFED. You're
>>>>> not -- right? You're just using the support built into the 2.6.31 kernel?
>>>>>
>>>>> Second I don't think the mount is actually completing. I think the command
>>>>> is returning, but the mount never actually finishes. It's sitting there hung
>>>>> trying to perform the first RPC to the server (RPC_NOP) and it's never
>>>>> succeeding. That's why you see all those connect/disconnect messages in your
>>>>> log file. It tries to send, gets an error, disconnects, reconnects, tries to
>>>>> send .... you get the picture.
>>>>>
>>>>> Step 1 I think would be to ensure that you actually have IB up and running.
>>>>> IPoIB between the two seems a little dodgy given the dmesg log. Do you have
>>>>> another Linux box you can use to test out connectivity/configuration with
>>>>> your victim? There are test programs in OFED (rping) that would help you do
>>>>> this, but I don't believe they are available on Solaris.
>>>>>
>>>>> Tom
>>>>>
>>>>> Steve Wise wrote:
>>>>>
>>>>>
>>>>> nfsrdma hang on ewg...
>>>>>
>>>>>
>>>>>
>>>>> -------- Original Message --------
>>>>> Subject:     [ewg] nfs-rdma hanging with Ubuntu 9.10
>>>>> Date:     Fri, 15 Jan 2010 13:28:31 +0000
>>>>> From:     Ross Smith <myxiplx at googlemail.com>
>>>>> To:     ewg at openfabrics.org
>>>>>
>>>>>
>>>>>
>>>>> Hi folks, it's me again I'm afraid.
>>>>>
>>>>> Thanks to the help from this list, I have ipoib working, however I
>>>>> seem to be having a few problems, not least of which is commands
>>>>> hanging if I attempt to use nfs-rdma.
>>>>>
>>>>> Although the rdma mount command completes, the system then becomes
>>>>> unresponsive if I attempt any command such as 'ls', even outside of
>>>>> the mounted folder.  Umount also fails with the error "device is
>>>>> busy".
>>>>>
>>>>> If anybody can spare the time to help it would be very much
>>>>> appreciated.  I do seem to have a lot of warnings in the logs, but
>>>>> although I've tried searching for solutions haven't found anything
>>>>> yet.
>>>>>
>>>>>
>>>>> System details
>>>>> ============
>>>>> - Ubuntu 9.10
>>>>>  (kernel 2.6.31)
>>>>> - Mellanox ConnectX QDR card
>>>>> - Flextronics DDR switch
>>>>> - OpenSolaris NFS server, running one of the latest builds for
>>>>> troubleshooting
>>>>> - OpenSM running on another Ubuntu 9.10 box with a Mellanox
>>>>> Infinihost III Lx card
>>>>>
>>>>> I am using the kernel drivers only, I have not installed OFED on this
>>>>> machine.
>>>>>
>>>>>
>>>>> Loading driver
>>>>> ============
>>>>> The driver appears to load, and ipoib works, but there are rather a
>>>>> lot of warnings from dmesg.
>>>>>
>>>>> I am loading the driver with:
>>>>> $ sudo modprobe mlx4_ib
>>>>> $ sudo modprobe ib_ipoib
>>>>> $ sudo ifconfig ib0 192.168.101.4 netmask 255.255.255.0 up
>>>>>
>>>>> And that leaves me with:
>>>>> $ lsmod
>>>>> Module                  Size  Used by
>>>>> ib_ipoib               72452  0
>>>>> ib_cm                  37196  1 ib_ipoib
>>>>> ib_sa                  19812  2 ib_ipoib,ib_cm
>>>>> mlx4_ib                42720  0
>>>>> ib_mad                 37524  3 ib_cm,ib_sa,mlx4_ib
>>>>> ib_core                57884  5 ib_ipoib,ib_cm,ib_sa,mlx4_ib,ib_mad
>>>>> binfmt_misc             8356  1
>>>>> ppdev                   6688  0
>>>>> psmouse                56180  0
>>>>> serio_raw               5280  0
>>>>> mlx4_core              84728  1 mlx4_ib
>>>>> joydev                 10272  0
>>>>> lp                      8964  0
>>>>> parport                35340  2 ppdev,lp
>>>>> iptable_filter          3100  0
>>>>> ip_tables              11692  1 iptable_filter
>>>>> x_tables               16544  1 ip_tables
>>>>> usbhid                 38208  0
>>>>> e1000e                122124  0
>>>>>
>>>>>
>>>>> At this point I can ping the Solaris server over the IP link.
>>>>> Although I do need to issue a ping from Solaris before I get a reply.
>>>>> I'm mentioning that in case it's relevant, but at this point I'm
>>>>> assuming that's just a firewall setting on the server.
>>>>>
>>>>> But although ping works, I am starting to get some dmesg warnings, I
>>>>> just don't know if they are relevant:
>>>>> [  313.692072] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
>>>>> [  313.885220] ADDRCONF(NETDEV_UP): ib0: link is not ready
>>>>> [  316.880450] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>> [  316.880573] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
>>>>> [  316.880789] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>> [  320.873613] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>> [  327.147114] ib0: no IPv6 routers present
>>>>> [  328.861550] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>> [  344.834440] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>> [  360.808312] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>> [  376.782186] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>
>>>>> And at this point however, regular nfs mounts work fine over the ipoib
>>>>> link:
>>>>> $ sudo mount 192.168.100.1:/test/rdma ./nfstest
>>>>>
>>>>> But again, that adds warnings to dmesg:
>>>>> [  826.456902] RPC: Registered udp transport module.
>>>>> [  826.456905] RPC: Registered tcp transport module.
>>>>> [  841.553135] svc: failed to register lockdv1 RPC service (errno 97).
>>>>>
>>>>> And the speed is definitely nothing to write home about, copying a
>>>>> 100mb file takes over 10 seconds:
>>>>> $ time cp ./100mb ./100mb2
>>>>>
>>>>> real    0m10.472s
>>>>> user    0m0.000s
>>>>> sys    0m1.248s
>>>>>
>>>>> And again with warnings appearing in dmesg:
>>>>> [  872.373364] ib0: post_send failed
>>>>> [  872.373407] ib0: post_send failed
>>>>> [  872.373448] ib0: post_send failed
>>>>>
>>>>> I think this is a client issue rather than a problem on the server as
>>>>> the same test on an OpenSolaris client takes under half a second:
>>>>> # time cp ./100mb ./100mb2
>>>>>
>>>>> real    0m0.334s
>>>>> user    0m0.001s
>>>>> sys     0m0.176s
>>>>>
>>>>> Although the system is definitely not right, my long term aim is to
>>>>> run nfs-rdma on this system, so my next test was to try that and see
>>>>> if the speed improved:
>>>>>
>>>>> $ sudo umount ./nfstest
>>>>> $ sudo mount -o rdma,port=20049 192.168.101.1:/test/rdma ./nfstest
>>>>>
>>>>> That takes a long time to connect.  It does eventually go through, but
>>>>> only after the following errors in dmesg:
>>>>>
>>>>> [ 1140.698659] RPC: Registered rdma transport module.
>>>>> [ 1155.697672] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>> [ 1160.688455] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>> [ 1160.693818] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>> [ 1160.695131] svc: failed to register lockdv1 RPC service (errno 97).
>>>>> [ 1170.676049] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>> [ 1170.681458] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>> [ 1190.647355] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>> [ 1190.652778] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>> [ 1220.602353] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>> [ 1220.607809] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>> [ 1250.557397] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>> [ 1250.562817] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>> [ 1281.522735] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>> [ 1281.528442] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>> [ 1311.477845] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>> [ 1311.482983] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>> [ 1341.432758] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>> [ 1341.438212] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>>
>>>>> However, at this point my shell session becomes unresponsive if I
>>>>> attempt so much as a 'ls'.  The system hasn't hung completely however
>>>>> as I can still connect another ssh session and restart with
>>>>> $ sudo init 6
>>>>>
>>>>> Can anybody help?  Is there anything obvious I am doing wrong here?
>>>>>
>>>>> thanks,
>>>>>
>>>>> Ross
>>>>> _______________________________________________
>>>>> ewg mailing list
>>>>> ewg at lists.openfabrics.org
>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>       
>>