[ewg] [Fwd: nfs-rdma hanging with Ubuntu 9.10]

Ross Smith myxiplx at googlemail.com
Tue Jan 26 04:24:43 PST 2010


Hey everyone,

It's taken me a week, but I've finally gotten the 2.7.00 firmware for
this system.  I've also taken the step of installing a Ubuntu 9.10
server for testing in addition to the Solaris server I already have.

So far I'm still having no joy: nfs mounts fine over TCP, but if I try
to use RDMA it fails.

Machines in use:
============
Solaris Server, build 129 (about 4 weeks old), using the built-in Infiniband drivers
Solaris Client, same build
Ubuntu 9.10 Server, using kernel drivers
Ubuntu 9.10 Client
CentOS 5.2 Client, with OFED 1.4.2 and nfs-utils 1.1.6

All five machines are on identical hardware, with Mellanox ConnectX
infiniband cards running firmware 2.7.00.

They all seem to be running Infiniband fine: ipoib works perfectly, and
I can mount regular tcp nfs shares over the infiniband links without
any issues.
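
Before digging any further into NFS itself, my next step is to try and
rule out the RDMA layer (as opposed to just ipoib), along the lines Tom
suggested earlier.  A rough sketch of what I'm planning, assuming the
librdmacm test programs (rping) are installed; they come with OFED on
the CentOS box, and I believe there's a package for Ubuntu too:

On one Linux box (acting as the rping server):
# rping -s -v -C 10

On a second Linux box, pointing at the first box's ib0 address:
# rping -c -a 192.168.101.5 -v -C 10

If those ten transfers complete cleanly I'll assume RDMA connections
themselves are fine and the problem is somewhere in the NFS layer.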

With regular tcp nfs I'm getting consistent speeds of 300MB/s.
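
For anyone who wants to compare numbers, a crude large sequential write
over the mounted share is enough to get a rough throughput figure.
Something like the dd below works (the target path is just an example;
dd's summary line reports the transfer rate):

$ dd if=/dev/zero of=./nfstest/ddtest bs=1M count=2048 conv=fsync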

However, nfs-rdma just does not want to work, no matter which
combination of servers and clients I try:

Ubuntu Client -> Solaris
=================
Commands used:
# modprobe xprtrdma
# mount -o proto=rdma,port=20049 192.168.101.1:/test/rdma ./nfstest

This is the entire dmesg log, from first loading the driver to
attempting the nfs-rdma mount:

[   46.834146] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
[   47.028093] ADDRCONF(NETDEV_UP): ib0: link is not ready
[   52.018562] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
[   52.018698] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
[   54.014289] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
[   58.006864] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
[   62.027202] ib0: no IPv6 routers present
[   65.120791] RPC: Registered udp transport module.
[   65.120795] RPC: Registered tcp transport module.
[   65.129162] RPC: Registered rdma transport module.
[   65.992081] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
[   81.962465] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
[   83.593144] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
[  148.476967] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
[  148.480488] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
[  148.484421] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
[  148.488376] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
[ 4311.663188] svc: failed to register lockdv1 RPC service (errno 97).

At this point the attempt crashed the Solaris server and hung the mount
on the Ubuntu client: I had to ctrl-c the mount on the client, and the
server rebooted itself automatically.

I then tried again, connecting to the Ubuntu nfs server.  This time
neither machine hung nor crashed, but I got very similar messages in
the client log:

# mount -o proto=rdma,port=20049 192.168.101.5:/home/ross/nfsexport ./nfstest

[ 4435.102852] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
[ 4435.107492] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
[ 4435.111471] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
[ 4435.115468] rpcrdma: connection to 192.168.101.5:20049 closed (-111)

So it doesn't seem to be specific to one server: both Solaris and
Ubuntu show the same problem, although Ubuntu at least does not crash
when clients attempt to connect.
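
Having said that, one thing I have not been able to confirm on the
Ubuntu server is whether the RDMA listener is actually enabled at all.
The -111 in the client log looks like ECONNREFUSED, which would fit
nothing listening on port 20049.  As far as I understand the kernel's
nfs-rdma documentation, the Linux server side needs something along
these lines once nfsd is running (a sketch I haven't verified on this
box yet):

# modprobe svcrdma
# echo rdma 20049 > /proc/fs/nfsd/portlist
# cat /proc/fs/nfsd/portlist

If 'rdma 20049' never shows up in that portlist file then presumably no
client is going to get an RDMA connection accepted.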

I also get the same error if I attempt to connect from the CentOS 5.2
machine, which is using regular OFED, to the Ubuntu server:

CentOS 5.2 -> Ubuntu
================
This time I'm running mount.rnfs directly as per the instructions in
the OFED nfs-rdma release notes.

Commands used:
# modprobe xprtrdma
# mount.rnfs 192.168.101.5:/home/ross/nfsexport ./rdmatest -i -o proto=rdma,port=20049

dmesg results look very similar:
rpcrdma: connection to 192.168.101.5:20049 closed (-111)
rpcrdma: connection to 192.168.101.5:20049 closed (-111)
rpcrdma: connection to 192.168.101.5:20049 closed (-111)
rpcrdma: connection to 192.168.101.5:20049 closed (-111)

However, attempting this has a bad effect on CentOS: the client crashes
and I lose my ssh session.
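
If it would help, my plan for the next attempt is to capture that crash
over netconsole before running the mount, so the oops isn't lost along
with the ssh session.  Roughly the following, where the addresses,
interface and MAC are placeholders for my ethernet management network,
and the netcat listen flags vary between versions:

On the CentOS client:
# modprobe netconsole netconsole=6665@10.0.0.2/eth0,6666@10.0.0.1/00:11:22:33:44:55

On the receiving box (10.0.0.1 / 00:11:22:33:44:55 above):
$ nc -u -l 6666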

Does anybody have any ideas?

thanks,

Ross



On Mon, Jan 18, 2010 at 6:31 PM, David Brean <David.Brean at sun.com> wrote:
> Hello,
>
> I agree, update the HCA firmware before proceeding.  [The description in
> Bugzilla Bug 1711 seems to match the problem that you are observing.]
>
> Also, if you want to help diagnose the "ib0: post_send failed", take a look
> at http://lists.openfabrics.org/pipermail/general/2009-July/061118.html.
>
> -David
>
> Ross Smith wrote:
>
> Hi Tom,
>
> No, you're right - I'm just using the support that's built into the
> kernel, and I agree, getting diagnostics out of Solaris is proving very tricky.
> I do have a Solaris client connected to this and showing some decent
> speeds (over 900Mb/s), but I've been thinking that I might need to get
> a Linux server running for testing before I spend much more time
> trying to get the two separate systems working.
>
> However, I found over the weekend that I'm running older firmware
> that needs updating.  I'd missed that in the nfs-rdma readme, so I'm
> pretty sure it's causing problems.  I'm trying to get that resolved
> before I do too much other testing.
>
> Regular NFS running over the ipoib link seems fine, and I don't get
> any extra warnings using that.  I can also run a full virtual machine
> quite happily over NFS, so despite the warnings, the link does appear
> stable and reliable.
>
> Ross
>
>
>
> On Mon, Jan 18, 2010 at 4:30 PM, Tom Tucker <tom at opengridcomputing.com>
> wrote:
>
>
> Hi Ross:
>
> I would check that you have IB RDMA actually working. The core transport
> issues suggest that there may be network problems that will prevent NFSRDMA
> from working properly.
>
> The first question is whether or not you are actually using OFED. You're not
>  -- right? You're just using the support built into the 2.6.31 kernel?
>
> Second I don't think the mount is actually completing. I think the command
> is returning, but the mount never actually finishes. It's sitting there hung
> trying to perform the first RPC to the server (RPC_NOP) and it's never
> succeeding. That's why you see all those connect/disconnect messages in your
> log file. It tries to send, gets an error, disconnects, reconnects, tries to
> send .... you get the picture.
>
> Step 1 I think would be to ensure that you actually have IB up and running.
> IPoIB between the two seems a little dodgy given the dmesg log. Do you have
> another Linux box you can use to test out connectivity/configuration with
> your victim? There are test programs in OFED (rping) that would help you do
> this, but I don't believe they are available on Solaris.
>
> Tom
>
> Steve Wise wrote:
>
>
> nfsrdma hang on ewg...
>
>
>
> -------- Original Message --------
> Subject:     [ewg] nfs-rdma hanging with Ubuntu 9.10
> Date:     Fri, 15 Jan 2010 13:28:31 +0000
> From:     Ross Smith <myxiplx at googlemail.com>
> To:     ewg at openfabrics.org
>
>
>
> Hi folks, it's me again I'm afraid.
>
> Thanks to the help from this list, I have ipoib working; however, I
> seem to be having a few problems, not least of which is commands
> hanging if I attempt to use nfs-rdma.
>
> Although the rdma mount command completes, the system then becomes
> unresponsive if I attempt any command such as 'ls', even outside of
> the mounted folder.  Umount also fails with the error "device is
> busy".
>
> If anybody can spare the time to help it would be very much
> appreciated.  I do seem to have a lot of warnings in the logs, but
> although I've tried searching for solutions I haven't found anything
> yet.
>
>
> System details
> ============
> - Ubuntu 9.10
>  (kernel 2.6.31)
> - Mellanox ConnectX QDR card
> - Flextronics DDR switch
> - OpenSolaris NFS server, running one of the latest builds for
> troubleshooting
> - OpenSM running on another Ubuntu 9.10 box with a Mellanox
> Infinihost III Lx card
>
> I am using the kernel drivers only, I have not installed OFED on this
> machine.
>
>
> Loading driver
> ============
> The driver appears to load, and ipoib works, but there are rather a
> lot of warnings from dmesg.
>
> I am loading the driver with:
> $ sudo modprobe mlx4_ib
> $ sudo modprobe ib_ipoib
> $ sudo ifconfig ib0 192.168.101.4 netmask 255.255.255.0 up
>
> And that leaves me with:
> $ lsmod
> Module                  Size  Used by
> ib_ipoib               72452  0
> ib_cm                  37196  1 ib_ipoib
> ib_sa                  19812  2 ib_ipoib,ib_cm
> mlx4_ib                42720  0
> ib_mad                 37524  3 ib_cm,ib_sa,mlx4_ib
> ib_core                57884  5 ib_ipoib,ib_cm,ib_sa,mlx4_ib,ib_mad
> binfmt_misc             8356  1
> ppdev                   6688  0
> psmouse                56180  0
> serio_raw               5280  0
> mlx4_core              84728  1 mlx4_ib
> joydev                 10272  0
> lp                      8964  0
> parport                35340  2 ppdev,lp
> iptable_filter          3100  0
> ip_tables              11692  1 iptable_filter
> x_tables               16544  1 ip_tables
> usbhid                 38208  0
> e1000e                122124  0
>
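> As a quick sanity check on the port itself, independent of ipoib, the
> port state can be read straight out of sysfs with no extra packages.
> The device name is whatever appears under /sys/class/infiniband,
> mlx4_0 in my case:
>
> $ cat /sys/class/infiniband/mlx4_0/ports/1/state
> $ cat /sys/class/infiniband/mlx4_0/ports/1/rate
>
> I'd expect the state to read '4: ACTIVE' once the subnet manager has
> brought the port up; if it stays at INIT or DOWN that would presumably
> explain the multicast join failures further down.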
>
> At this point I can ping the Solaris server over the IP link,
> although I do need to issue a ping from Solaris before I get a reply.
> I'm mentioning it in case it's relevant, but at this point I'm
> assuming that's just a firewall setting on the server.
>
> But although ping works, I am starting to get some dmesg warnings; I
> just don't know if they are relevant:
> [  313.692072] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
> [  313.885220] ADDRCONF(NETDEV_UP): ib0: link is not ready
> [  316.880450] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
> [  316.880573] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
> [  316.880789] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
> [  320.873613] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
> [  327.147114] ib0: no IPv6 routers present
> [  328.861550] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
> [  344.834440] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
> [  360.808312] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
> [  376.782186] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
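>
> Since those multicast joins have to go through the subnet manager, one
> check I'm noting here mostly as a reminder to myself is to make sure
> OpenSM really is up and master.  A rough sketch, assuming the
> infiniband-diags package is available on one of the nodes:
>
> $ sudo sminfo
>
> and on the OpenSM box itself, checking the daemon and its log (the log
> path is the default one, I believe):
>
> $ pgrep opensm
> $ tail /var/log/opensm.log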
>
> At this point, however, regular nfs mounts work fine over the ipoib
> link:
> $ sudo mount 192.168.100.1:/test/rdma ./nfstest
>
> But again, that adds more warnings to dmesg:
> [  826.456902] RPC: Registered udp transport module.
> [  826.456905] RPC: Registered tcp transport module.
> [  841.553135] svc: failed to register lockdv1 RPC service (errno 97).
>
> And the speed is definitely nothing to write home about: copying a
> 100mb file takes over 10 seconds:
> $ time cp ./100mb ./100mb2
>
> real    0m10.472s
> user    0m0.000s
> sys    0m1.248s
>
> And again with warnings appearing in dmesg:
> [  872.373364] ib0: post_send failed
> [  872.373407] ib0: post_send failed
> [  872.373448] ib0: post_send failed
>
> I think this is a client issue rather than a problem on the server as
> the same test on an OpenSolaris client takes under half a second:
> # time cp ./100mb ./100mb2
>
> real    0m0.334s
> user    0m0.001s
> sys     0m0.176s
>
> Although the system is definitely not right, my long-term aim is to
> run nfs-rdma on this system, so my next test was to try that and see
> if the speed improved:
>
> $ sudo umount ./nfstest
> $ sudo mount -o rdma,port=20049 192.168.101.1:/test/rdma ./nfstest
>
> That takes a long time to connect.  It does eventually go through, but
> only after the following errors in dmesg:
>
> [ 1140.698659] RPC: Registered rdma transport module.
> [ 1155.697672] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
> [ 1160.688455] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
> [ 1160.693818] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
> [ 1160.695131] svc: failed to register lockdv1 RPC service (errno 97).
> [ 1170.676049] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
> [ 1170.681458] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
> [ 1190.647355] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
> [ 1190.652778] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
> [ 1220.602353] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
> [ 1220.607809] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
> [ 1250.557397] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
> [ 1250.562817] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
> [ 1281.522735] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
> [ 1281.528442] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
> [ 1311.477845] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
> [ 1311.482983] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
> [ 1341.432758] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
> [ 1341.438212] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>
> However, at this point my shell session becomes unresponsive if I
> attempt so much as an 'ls'.  The system hasn't hung completely,
> though, as I can still connect another ssh session and restart with
> $ sudo init 6
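>
> Next time it wedges like that, the plan (just a sketch) is to dump the
> kernel's view of the hung tasks from the second ssh session before
> rebooting, using sysrq:
>
> $ echo 1 | sudo tee /proc/sys/kernel/sysrq
> $ echo t | sudo tee /proc/sysrq-trigger
> $ dmesg | tail -n 200
>
> That should show the stack of whatever the mount and the 'ls' are
> stuck in, which might make it clearer whether they are blocked in
> rpcrdma or somewhere else.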
>
> Can anybody help?  Is there anything obvious I am doing wrong here?
>
> thanks,
>
> Ross
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>
>
>
>
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>


