[ewg] [Fwd: nfs-rdma hanging with Ubuntu 9.10]

Tom Tucker tom at opengridcomputing.com
Tue Jan 26 08:35:30 PST 2010


Ross Smith wrote:
> No problem, reporting odd stuff is one of the few things I can contribute :)
>
> $ uname -a
> Linux ubuntu-server 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16
> 14:04:26 UTC 2009 i686 GNU/Linux
>
Hi Ross:

I don't see this behavior, and my review of the code implies that it
should not work this way; otherwise, any error would end up shutting
down all listening endpoints.

So could you try this again? Something like this:

# cat /proc/fs/nfsd/portlist
# echo "rdma 10049" > /proc/fs/nfsd/portlist
# cat /proc/fs/nfsd/portlist
# echo "rdma 20049" > /proc/fs/nfsd/portlist
# cat /proc/fs/nfsd/portlist
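
If the writes behave as intended, the final cat should show both rdma
listeners alongside the defaults, something like:

rdma 10049
rdma 20049
tcp 2049
udp 2049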

It should be additive.  To shut down a transport, prepend the transport
name with a '-' character, like this:

# echo "-rdma 20049" > /proc/fs/nfsd/portlist

Thanks,
Tom
>
> On Tue, Jan 26, 2010 at 4:26 PM, Tom Tucker <tom at opengridcomputing.com> wrote:
>   
>> Ross Smith wrote:
>>     
>>> Interesting, the single '>' didn't work for me; it removed the tcp and
>>> udp entries, leaving me with just rdma.  It looks like you do need the
>>> extra '>' on Ubuntu 9.10.
>>>
>> Huh. Can you do a 'uname -a' for me? Someone has changed that. Thank
>> you for the heads up.
>>
>> Tom
>>
>>> On Tue, Jan 26, 2010 at 4:20 PM, Tom Tucker <tom at opengridcomputing.com>
>>> wrote:
>>>
>>>> Ross Smith wrote:
>>>>
>>>>> No problem, but I think you need an extra > too :)
>>>>>
>>>>> # echo "rdma 20049" >> /proc/fs/nfsd/portlist
>>>>>
>>>> Actually, it's not a "real" file, so just the single '>' will work
>>>> fine. There is logic inside the kernel that handles the write and
>>>> converts 'rdma 20049' into calls that create a listening endpoint for
>>>> the rdma transport.
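>>>>
>>>> (A rough sketch of what happens, going from memory of the 2.6.31
>>>> sources, so treat the names as assumptions: the write lands in
>>>> write_ports() in fs/nfsd/nfsctl.c, which parses the "transport port"
>>>> string and calls svc_create_xprt(nfsd_serv, "rdma", 20049, ...) to
>>>> create the listener. Each write is parsed on its own, which is why
>>>> the single '>' is enough.)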
>>>>
>>>>> And that was enough to get me going, although there was one more step
>>>>> I'd missed:
>>>>>
>>>>> # mount 192.168.101.5:/home/ross/nfsexport ./nfstest -o
>>>>> proto=rdma,port=20049
>>>>> mount.nfs: Operation not permitted
>>>>>
>>>>> Googling that led me to modify /etc/exports on the server to add the
>>>>> insecure option.  With that added it works fine.
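>>>>>
>>>>> (For reference, an illustrative /etc/exports line with that option --
>>>>> the client subnet here is hypothetical:
>>>>>
>>>>> /home/ross/nfsexport 192.168.101.0/24(rw,insecure,sync)
>>>>>
>>>>> followed by 'exportfs -ra' to re-export. The insecure option is
>>>>> presumably needed because the NFS/RDMA connection doesn't originate
>>>>> from a privileged source port.)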
>>>>>
>>>> Awesome.
>>>>
>>>>> Now I just need to get it connecting to Solaris without crashing the
>>>>> server :)
>>>>>
>>>>> Many thanks for all the help.
>>>>>
>>>>> Ross
>>>>>
>>>>> On Tue, Jan 26, 2010 at 3:49 PM, Tom Tucker <tom at opengridcomputing.com>
>>>>> wrote:
>>>>>
>>>>>> Ross Smith wrote:
>>>>>>
>>>>>>> Hmm, the portlist doesn't look good:
>>>>>>>
>>>>>>> $ cat /proc/fs/nfsd/portlist
>>>>>>> tcp 2049
>>>>>>> udp 2049
>>>>>>>
>>>>>> No, it looks great - that's an easy one! No one is listening on
>>>>>> 20049, so you get 111 (ECONNREFUSED).
>>>>>>
>>>>>>> But attempting to modify that fails:
>>>>>>>
>>>>>>> # echo 20049 > /proc/fs/nfsd/portlist
>>>>>>> -bash: echo: write error: Bad file descriptor
>>>>>>>
>>>>>> That's because I gave you the wrong syntax for the write command. It
>>>>>> should be the following:
>>>>>>
>>>>>> # echo "rdma 20049" > /proc/fs/nfsd/portlist
>>>>>>
>>>>>> Sorry about that.
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>>> And I get similar problems attempting to enable the debugging logs:
>>>>>>>
>>>>>>> # echo 32767 > /proc/sys/sunrpc/rpc_debug
>>>>>>> -bash: /proc/sys/sunrpc/rpc_debug: Permission denied
>>>>>>>
>>>>>>> Up to that point, though, everything looks like it's loading fine:
>>>>>>>
>>>>>>> Ubuntu server:
>>>>>>> ===========
>>>>>>> # modprobe mlx4_ib
>>>>>>> # modprobe ib_ipoib
>>>>>>> # ifconfig ib0 192.168.101.5 netmask 255.255.255.0 up
>>>>>>>
>>>>>>> dmesg results:
>>>>>>> [  456.793661] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
>>>>>>> [  456.987043] ADDRCONF(NETDEV_UP): ib0: link is not ready
>>>>>>> [  459.988683] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
>>>>>>> [  470.686631] ib0: no IPv6 routers present
>>>>>>>
>>>>>>> # modprobe svcrdma
>>>>>>> # /etc/init.d/nfs-kernel-server restart
>>>>>>>
>>>>>>> dmesg:
>>>>>>> [  524.520198] nfsd: last server has exited, flushing export cache
>>>>>>> [  529.292366] svc: failed to register lockdv1 RPC service (errno 97).
>>>>>>> [  529.293289] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state
>>>>>>> recovery directory
>>>>>>> [  529.293304] NFSD: starting 90-second grace period
>>>>>>>
>>>>>>> Ubuntu client:
>>>>>>> ==========
>>>>>>> # modprobe mlx4_ib
>>>>>>> # modprobe ib_ipoib
>>>>>>> # ifconfig ib0 192.168.101.4 netmask 255.255.255.0 up
>>>>>>>
>>>>>>> dmesg:
>>>>>>> [   97.576507] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
>>>>>>> [   97.769582] ADDRCONF(NETDEV_UP): ib0: link is not ready
>>>>>>> [  100.765318] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
>>>>>>> [  110.899591] ib0: no IPv6 routers present
>>>>>>>
>>>>>>> # modprobe xprtrdma
>>>>>>>
>>>>>>> dmesg:
>>>>>>> [  169.269689] RPC: Registered udp transport module.
>>>>>>> [  169.269691] RPC: Registered tcp transport module.
>>>>>>> [  169.289755] RPC: Registered rdma transport module.
>>>>>>>
>>>>>>> Ross
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 26, 2010 at 2:32 PM, Tom Tucker
>>>>>>> <tom at opengridcomputing.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Ross Smith wrote:
>>>>>>>>
>>>>>>>>> A quick addendum to that: I've just had a look at rpcinfo on both
>>>>>>>>> the Ubuntu and Solaris NFS servers. Does this indicate that
>>>>>>>>> nfs-rdma is not actually running?
>>>>>>>>>
>>>>>>>>> rpcinfo -p
>>>>>>>>>  program vers proto   port
>>>>>>>>>  100000    2   tcp    111  portmapper
>>>>>>>>>  100000    2   udp    111  portmapper
>>>>>>>>>  100024    1   udp  37031  status
>>>>>>>>>  100024    1   tcp  58463  status
>>>>>>>>>  100021    1   udp  34989  nlockmgr
>>>>>>>>>  100021    3   udp  34989  nlockmgr
>>>>>>>>>  100021    4   udp  34989  nlockmgr
>>>>>>>>>  100021    1   tcp  47979  nlockmgr
>>>>>>>>>  100021    3   tcp  47979  nlockmgr
>>>>>>>>>  100021    4   tcp  47979  nlockmgr
>>>>>>>>>  100003    2   udp   2049  nfs
>>>>>>>>>  100003    3   udp   2049  nfs
>>>>>>>>>  100003    4   udp   2049  nfs
>>>>>>>>>  100003    2   tcp   2049  nfs
>>>>>>>>>  100003    3   tcp   2049  nfs
>>>>>>>>>  100003    4   tcp   2049  nfs
>>>>>>>>>
>>>>>>>> Hi Ross:
>>>>>>>>
>>>>>>>> No. Although that would be very nice, the Linux network maintainer
>>>>>>>> unfortunately didn't want RDMA transports sharing the network port
>>>>>>>> space.
>>>>>>>>
>>>>>>>> You would need to do this on the server to see if it is listening:
>>>>>>>>
>>>>>>>> # cat /proc/fs/nfsd/portlist
>>>>>>>>
>>>>>>>> You should see something like this:
>>>>>>>>
>>>>>>>> rdma 20049
>>>>>>>> tcp 2049
>>>>>>>> udp 2049
>>>>>>>>
>>>>>>>> The top line indicates that the rdma transport is listening on port
>>>>>>>> 20049.
>>>>>>>>
>>>>>>>> If it's not showing, do this:
>>>>>>>>
>>>>>>>> # echo 20049 > /proc/fs/nfsd/portlist
>>>>>>>>
>>>>>>>> and repeat the 'cat' step above.
>>>>>>>>
>>>>>>>> To give us a little more detail to help debug, do this:
>>>>>>>>
>>>>>>>> # echo 32767 > /proc/sys/sunrpc/rpc_debug
>>>>>>>>
>>>>>>>> on both the client and server, then try the mount again. The dmesg
>>>>>>>> log should have a detailed trace of what is happening.
>>>>>>>>
>>>>>>>> Turn off the debug output as follows:
>>>>>>>>
>>>>>>>> # echo 0 > /proc/sys/sunrpc/rpc_debug
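>>>>>>>>
>>>>>>>> (If the echo gets "Permission denied" under sudo -- the redirect
>>>>>>>> runs in your unprivileged shell -- the rpcdebug utility from
>>>>>>>> nfs-utils should do the same job:
>>>>>>>>
>>>>>>>> # rpcdebug -m rpc -s all
>>>>>>>> # rpcdebug -m rpc -c all
>>>>>>>>
>>>>>>>> or pipe through tee: echo 32767 | sudo tee /proc/sys/sunrpc/rpc_debug)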
>>>>>>>>
>>>>>>>> Tom
>>>>>>>>
>>>>>>>>> On Tue, Jan 26, 2010 at 12:24 PM, Ross Smith
>>>>>>>>> <myxiplx at googlemail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hey everyone,
>>>>>>>>>>
>>>>>>>>>> It's taken me a week, but I've finally gotten the 2.7.00 firmware
>>>>>>>>>> for this system.  I've also taken the step of installing an
>>>>>>>>>> Ubuntu 9.10 server for testing, in addition to the Solaris server
>>>>>>>>>> I already have.
>>>>>>>>>>
>>>>>>>>>> So far I'm still having no joy: nfs mounts fine over TCP, but if
>>>>>>>>>> I try to use RDMA it fails.
>>>>>>>>>>
>>>>>>>>>> Machines in use:
>>>>>>>>>> ============
>>>>>>>>>> Solaris Server, build 129 (about 4 weeks old), using built-in
>>>>>>>>>> Infiniband drivers
>>>>>>>>>> Solaris Client, same build
>>>>>>>>>> Ubuntu 9.10 Server, using kernel drivers
>>>>>>>>>> Ubuntu 9.10 Client
>>>>>>>>>> CentOS 5.2 Client, with OFED 1.4.2 and nfs-utils 1.1.6
>>>>>>>>>>
>>>>>>>>>> All five machines are on identical hardware, with Mellanox ConnectX
>>>>>>>>>> infiniband cards running firmware 2.7.00.
>>>>>>>>>>
>>>>>>>>>> They all seem to be running Infiniband fine; ipoib works
>>>>>>>>>> perfectly, and I can connect regular tcp nfs mounts over the
>>>>>>>>>> infiniband links without any issues.
>>>>>>>>>>
>>>>>>>>>> With regular tcp nfs I'm getting consistent speeds of 300MB/s.
>>>>>>>>>>
>>>>>>>>>> However, nfs-rdma just does not want to work, no matter which
>>>>>>>>>> combination of servers and clients I try:
>>>>>>>>>>
>>>>>>>>>> Ubuntu Client -> Solaris
>>>>>>>>>> =================
>>>>>>>>>> Commands used:
>>>>>>>>>> # modprobe xprtrdma
>>>>>>>>>> # mount -o proto=rdma,port=20049 192.168.101.1:/test/rdma ./nfstest
>>>>>>>>>>
>>>>>>>>>> This is the entire dmesg log, from first loading the driver, to
>>>>>>>>>> attempting to connect nfs-rdma:
>>>>>>>>>>
>>>>>>>>>> [   46.834146] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
>>>>>>>>>> [   47.028093] ADDRCONF(NETDEV_UP): ib0: link is not ready
>>>>>>>>>> [   52.018562] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
>>>>>>>>>> [   52.018698] ib0: multicast join failed for
>>>>>>>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>>>>>> [   54.014289] ib0: multicast join failed for
>>>>>>>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>>>>>> [   58.006864] ib0: multicast join failed for
>>>>>>>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>>>>>> [   62.027202] ib0: no IPv6 routers present
>>>>>>>>>> [   65.120791] RPC: Registered udp transport module.
>>>>>>>>>> [   65.120795] RPC: Registered tcp transport module.
>>>>>>>>>> [   65.129162] RPC: Registered rdma transport module.
>>>>>>>>>> [   65.992081] ib0: multicast join failed for
>>>>>>>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>>>>>> [   81.962465] ib0: multicast join failed for
>>>>>>>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>>>>>> [   83.593144] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>>>>>>> [  148.476967] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>>>>>>>>>> [  148.480488] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>>>>>>>>>> [  148.484421] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>>>>>>>>>> [  148.488376] rpcrdma: connection to 192.168.101.1:20049 closed (-111)
>>>>>>>>>> [ 4311.663188] svc: failed to register lockdv1 RPC service (errno 97).
>>>>>>>>>>
>>>>>>>>>> At this point the attempt crashed the Solaris server and hung the
>>>>>>>>>> mount attempt on the Ubuntu client; I had to ctrl-c on the client,
>>>>>>>>>> and the server rebooted automatically.
>>>>>>>>>>
>>>>>>>>>> I then tried again, connecting to the Ubuntu nfs server.  This
>>>>>>>>>> time neither machine hung nor crashed, but I had very similar
>>>>>>>>>> messages in the client log:
>>>>>>>>>>
>>>>>>>>>> # mount -o proto=rdma,port=20049 192.168.101.5:/home/ross/nfsexport
>>>>>>>>>> ./nfstest
>>>>>>>>>>
>>>>>>>>>> [ 4435.102852] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>>>>>>>> [ 4435.107492] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>>>>>>>> [ 4435.111471] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>>>>>>>> [ 4435.115468] rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>>>>>>>>
>>>>>>>>>> So it seems that it's not the server: both Solaris and Ubuntu
>>>>>>>>>> have the same problem, although Ubuntu at least does not crash
>>>>>>>>>> when clients attempt to connect.
>>>>>>>>>>
>>>>>>>>>> I also get the same error if I attempt to connect from the CentOS
>>>>>>>>>> 5.2 machine, which is using regular OFED, to the Ubuntu server:
>>>>>>>>>>
>>>>>>>>>> CentOS 5.2 -> Ubuntu
>>>>>>>>>> ================
>>>>>>>>>> This time I'm running mount.rnfs directly, as per the
>>>>>>>>>> instructions in the OFED nfs-rdma release notes.
>>>>>>>>>>
>>>>>>>>>> commands used:
>>>>>>>>>> # modprobe xprtrdma
>>>>>>>>>> # mount.rnfs 192.168.101.5:/home/ross/nfsexport ./rdmatest -i -o
>>>>>>>>>> proto=rdma,port=20049
>>>>>>>>>>
>>>>>>>>>> dmesg results look very similar:
>>>>>>>>>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>>>>>>>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>>>>>>>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>>>>>>>> rpcrdma: connection to 192.168.101.5:20049 closed (-111)
>>>>>>>>>>
>>>>>>>>>> However, attempting this has a bad effect on CentOS - the client
>>>>>>>>>> crashes and I lose my ssh session.
>>>>>>>>>>
>>>>>>>>>> Does anybody have any ideas?
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>>
>>>>>>>>>> Ross
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 18, 2010 at 6:31 PM, David Brean <David.Brean at sun.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I agree, update the HCA firmware before proceeding.  [The
>>>>>>>>>>> description in Bugzilla Bug 1711 seems to match the problem that
>>>>>>>>>>> you are observing.]
>>>>>>>>>>>
>>>>>>>>>>> Also, if you want to help diagnose the "ib0: post_send failed",
>>>>>>>>>>> take a look at
>>>>>>>>>>> http://lists.openfabrics.org/pipermail/general/2009-July/061118.html.
>>>>>>>>>>>
>>>>>>>>>>> -David
>>>>>>>>>>>
>>>>>>>>>>> Ross Smith wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Tom,
>>>>>>>>>>>
>>>>>>>>>>> No, you're right - I'm just using the support that's built into
>>>>>>>>>>> the kernel, and I agree, diagnostics from Solaris are proving
>>>>>>>>>>> very tricky.  I do have a Solaris client connected to this and
>>>>>>>>>>> showing some decent speeds (over 900Mb/s), but I've been thinking
>>>>>>>>>>> that I might need to get a Linux server running for testing
>>>>>>>>>>> before I spend much more time trying to get the two separate
>>>>>>>>>>> systems working.
>>>>>>>>>>>
>>>>>>>>>>> However, I have found over the weekend that I'm running older
>>>>>>>>>>> firmware that needs updating.  I'd missed that in the nfs-rdma
>>>>>>>>>>> readme, so I'm pretty sure it's going to be causing problems.
>>>>>>>>>>> I'm trying to get that resolved before I do too much other
>>>>>>>>>>> testing.
>>>>>>>>>>>
>>>>>>>>>>> Regular NFS running over the ipoib link seems fine, and I don't
>>>>>>>>>>> get any extra warnings using that.  I can also run a full virtual
>>>>>>>>>>> machine quite happily over NFS, so despite the warnings, the link
>>>>>>>>>>> does appear stable and reliable.
>>>>>>>>>>>
>>>>>>>>>>> Ross
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 18, 2010 at 4:30 PM, Tom Tucker
>>>>>>>>>>> <tom at opengridcomputing.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Ross:
>>>>>>>>>>>
>>>>>>>>>>> I would check that you have IB RDMA actually working.  The core
>>>>>>>>>>> transport issues suggest that there may be network problems that
>>>>>>>>>>> will prevent NFSRDMA from working properly.
>>>>>>>>>>>
>>>>>>>>>>> The first question is whether or not you are actually using
>>>>>>>>>>> OFED.  You're not -- right?  You're just using the support built
>>>>>>>>>>> into the 2.6.31 kernel?
>>>>>>>>>>>
>>>>>>>>>>> Second, I don't think the mount is actually completing.  I think
>>>>>>>>>>> the command is returning, but the mount never actually finishes.
>>>>>>>>>>> It's sitting there hung, trying to perform the first RPC to the
>>>>>>>>>>> server (RPC_NOP), and it's never succeeding.  That's why you see
>>>>>>>>>>> all those connect/disconnect messages in your log file.  It tries
>>>>>>>>>>> to send, gets an error, disconnects, reconnects, tries to send
>>>>>>>>>>> .... you get the picture.
>>>>>>>>>>>
>>>>>>>>>>> Step 1, I think, would be to ensure that you actually have IB up
>>>>>>>>>>> and running.  IPoIB between the two seems a little dodgy given
>>>>>>>>>>> the dmesg log.  Do you have another Linux box you can use to test
>>>>>>>>>>> out connectivity/configuration with your victim?  There are test
>>>>>>>>>>> programs in OFED (rping) that would help you do this, but I don't
>>>>>>>>>>> believe they are available on Solaris.
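>>>>>>>>>>>
>>>>>>>>>>> (For example, with illustrative addresses: on one box run
>>>>>>>>>>>
>>>>>>>>>>> # rping -s -a 192.168.101.5 -v
>>>>>>>>>>>
>>>>>>>>>>> and on the other
>>>>>>>>>>>
>>>>>>>>>>> # rping -c -a 192.168.101.5 -v -C 10
>>>>>>>>>>>
>>>>>>>>>>> If RDMA CM is healthy, you should see ten ping/pong exchanges
>>>>>>>>>>> logged on each side.)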
>>>>>>>>>>>
>>>>>>>>>>> Tom
>>>>>>>>>>>
>>>>>>>>>>> Steve Wise wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> nfsrdma hang on ewg...
>>>>>>>>>>>
>>>>>>>>>>> -------- Original Message --------
>>>>>>>>>>> Subject:     [ewg] nfs-rdma hanging with Ubuntu 9.10
>>>>>>>>>>> Date:     Fri, 15 Jan 2010 13:28:31 +0000
>>>>>>>>>>> From:     Ross Smith <myxiplx at googlemail.com>
>>>>>>>>>>> To:     ewg at openfabrics.org
>>>>>>>>>>>
>>>>>>>>>>> Hi folks, it's me again I'm afraid.
>>>>>>>>>>>
>>>>>>>>>>> Thanks to the help from this list, I have ipoib working; however,
>>>>>>>>>>> I seem to be having a few problems, not least of which is commands
>>>>>>>>>>> hanging if I attempt to use nfs-rdma.
>>>>>>>>>>>
>>>>>>>>>>> Although the rdma mount command completes, the system then
>>>>>>>>>>> becomes unresponsive if I attempt any command such as 'ls', even
>>>>>>>>>>> outside of the mounted folder.  Umount also fails with the error
>>>>>>>>>>> "device is busy".
>>>>>>>>>>>
>>>>>>>>>>> If anybody can spare the time to help, it would be very much
>>>>>>>>>>> appreciated.  I do seem to have a lot of warnings in the logs,
>>>>>>>>>>> but although I've tried searching for solutions, I haven't found
>>>>>>>>>>> anything yet.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> System details
>>>>>>>>>>> ============
>>>>>>>>>>> - Ubuntu 9.10 (kernel 2.6.31)
>>>>>>>>>>> - Mellanox ConnectX QDR card
>>>>>>>>>>> - Flextronics DDR switch
>>>>>>>>>>> - OpenSolaris NFS server, running one of the latest builds for
>>>>>>>>>>> troubleshooting
>>>>>>>>>>> - OpenSM running on another Ubuntu 9.10 box with a Mellanox
>>>>>>>>>>> Infinihost III Lx card
>>>>>>>>>>>
>>>>>>>>>>> I am using the kernel drivers only, I have not installed OFED on
>>>>>>>>>>> this
>>>>>>>>>>> machine.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Loading driver
>>>>>>>>>>> ============
>>>>>>>>>>> The driver appears to load, and ipoib works, but there are
>>>>>>>>>>> rather a lot of warnings from dmesg.
>>>>>>>>>>>
>>>>>>>>>>> I am loading the driver with:
>>>>>>>>>>> $ sudo modprobe mlx4_ib
>>>>>>>>>>> $ sudo modprobe ib_ipoib
>>>>>>>>>>> $ sudo ifconfig ib0 192.168.101.4 netmask 255.255.255.0 up
>>>>>>>>>>>
>>>>>>>>>>> And that leaves me with:
>>>>>>>>>>> $ lsmod
>>>>>>>>>>> Module                  Size  Used by
>>>>>>>>>>> ib_ipoib               72452  0
>>>>>>>>>>> ib_cm                  37196  1 ib_ipoib
>>>>>>>>>>> ib_sa                  19812  2 ib_ipoib,ib_cm
>>>>>>>>>>> mlx4_ib                42720  0
>>>>>>>>>>> ib_mad                 37524  3 ib_cm,ib_sa,mlx4_ib
>>>>>>>>>>> ib_core                57884  5 ib_ipoib,ib_cm,ib_sa,mlx4_ib,ib_mad
>>>>>>>>>>> binfmt_misc             8356  1
>>>>>>>>>>> ppdev                   6688  0
>>>>>>>>>>> psmouse                56180  0
>>>>>>>>>>> serio_raw               5280  0
>>>>>>>>>>> mlx4_core              84728  1 mlx4_ib
>>>>>>>>>>> joydev                 10272  0
>>>>>>>>>>> lp                      8964  0
>>>>>>>>>>> parport                35340  2 ppdev,lp
>>>>>>>>>>> iptable_filter          3100  0
>>>>>>>>>>> ip_tables              11692  1 iptable_filter
>>>>>>>>>>> x_tables               16544  1 ip_tables
>>>>>>>>>>> usbhid                 38208  0
>>>>>>>>>>> e1000e                122124  0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> At this point I can ping the Solaris server over the IP link,
>>>>>>>>>>> although I do need to issue a ping from Solaris first before I
>>>>>>>>>>> get a reply.  I'm mentioning it in case it's relevant, but at
>>>>>>>>>>> this point I'm assuming that's just a firewall setting on the
>>>>>>>>>>> server.
>>>>>>>>>>>
>>>>>>>>>>> But although ping works, I am starting to get some dmesg
>>>>>>>>>>> warnings; I just don't know if they are relevant:
>>>>>>>>>>> [  313.692072] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
>>>>>>>>>>> [  313.885220] ADDRCONF(NETDEV_UP): ib0: link is not ready
>>>>>>>>>>> [  316.880450] ib0: multicast join failed for
>>>>>>>>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>>>>>>> [  316.880573] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready
>>>>>>>>>>> [  316.880789] ib0: multicast join failed for
>>>>>>>>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>>>>>>> [  320.873613] ib0: multicast join failed for
>>>>>>>>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>>>>>>> [  327.147114] ib0: no IPv6 routers present
>>>>>>>>>>> [  328.861550] ib0: multicast join failed for
>>>>>>>>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>>>>>>> [  344.834440] ib0: multicast join failed for
>>>>>>>>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>>>>>>> [  360.808312] ib0: multicast join failed for
>>>>>>>>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>>>>>>> [  376.782186] ib0: multicast join failed for
>>>>>>>>>>> ff12:401b:ffff:0000:0000:0000:0000:00fb, status -22
>>>>>>>>>>>
>>>>>>>>>>> At this point, however, regular nfs mounts work fine over the
>>>>>>>>>>> ipoib link:
>>>>>>>>>>> $ sudo mount 192.168.100.1:/test/rdma ./nfstest
>>>>>>>>>>>
>>>>>>>>>>> But again, that adds warnings to dmesg:
>>>>>>>>>>> [  826.456902] RPC: Registered udp transport module.
>>>>>>>>>>> [  826.456905] RPC: Registered tcp transport module.
>>>>>>>>>>> [  841.553135] svc: failed to register lockdv1 RPC service (errno 97).
>>>>>>>>>>>
>>>>>>>>>>> And the speed is definitely nothing to write home about;
>>>>>>>>>>> copying a 100mb file takes over 10 seconds:
>>>>>>>>>>> $ time cp ./100mb ./100mb2
>>>>>>>>>>>
>>>>>>>>>>> real    0m10.472s
>>>>>>>>>>> user    0m0.000s
>>>>>>>>>>> sys    0m1.248s
>>>>>>>>>>>
>>>>>>>>>>> And again with warnings appearing in dmesg:
>>>>>>>>>>> [  872.373364] ib0: post_send failed
>>>>>>>>>>> [  872.373407] ib0: post_send failed
>>>>>>>>>>> [  872.373448] ib0: post_send failed
>>>>>>>>>>>
>>>>>>>>>>> I think this is a client issue rather than a problem on the
>>>>>>>>>>> server, as the same test on an OpenSolaris client takes under
>>>>>>>>>>> half a second:
>>>>>>>>>>> # time cp ./100mb ./100mb2
>>>>>>>>>>>
>>>>>>>>>>> real    0m0.334s
>>>>>>>>>>> user    0m0.001s
>>>>>>>>>>> sys     0m0.176s
>>>>>>>>>>>
>>>>>>>>>>> Although the system is definitely not right, my long-term aim is
>>>>>>>>>>> to run nfs-rdma on this system, so my next test was to try that
>>>>>>>>>>> and see if the speed improved:
>>>>>>>>>>>
>>>>>>>>>>> $ sudo umount ./nfstest
>>>>>>>>>>> $ sudo mount -o rdma,port=20049 192.168.101.1:/test/rdma ./nfstest
>>>>>>>>>>>
>>>>>>>>>>> That takes a long time to connect.  It does eventually go
>>>>>>>>>>> through, but only after the following errors in dmesg:
>>>>>>>>>>>
>>>>>>>>>>> [ 1140.698659] RPC: Registered rdma transport module.
>>>>>>>>>>> [ 1155.697672] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>>>>>>>> [ 1160.688455] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>>>>>>>> [ 1160.693818] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>>>>>>>> [ 1160.695131] svc: failed to register lockdv1 RPC service (errno 97).
>>>>>>>>>>> [ 1170.676049] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>>>>>>>> [ 1170.681458] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>>>>>>>> [ 1190.647355] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>>>>>>>> [ 1190.652778] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>>>>>>>> [ 1220.602353] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>>>>>>>> [ 1220.607809] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>>>>>>>> [ 1250.557397] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>>>>>>>> [ 1250.562817] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>>>>>>>> [ 1281.522735] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>>>>>>>> [ 1281.528442] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>>>>>>>> [ 1311.477845] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>>>>>>>> [ 1311.482983] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>>>>>>>> [ 1341.432758] rpcrdma: connection to 192.168.101.1:20049 closed (-103)
>>>>>>>>>>> [ 1341.438212] rpcrdma: connection to 192.168.101.1:20049 on mlx4_0, memreg 5 slots 32 ird 4
>>>>>>>>>>>
>>>>>>>>>>> However, at this point my shell session becomes unresponsive if
>>>>>>>>>>> I attempt so much as an 'ls'.  The system hasn't hung completely,
>>>>>>>>>>> though, as I can still connect another ssh session and restart
>>>>>>>>>>> with
>>>>>>>>>>> $ sudo init 6
>>>>>>>>>>>
>>>>>>>>>>> Can anybody help?  Is there anything obvious I am doing wrong
>>>>>>>>>>> here?
>>>>>>>>>>>
>>>>>>>>>>> thanks,
>>>>>>>>>>>
>>>>>>>>>>> Ross
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> ewg mailing list
>>>>>>>>>>> ewg at lists.openfabrics.org
>>>>>>>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> ewg mailing list
>>>>>>>>>>> ewg at lists.openfabrics.org
>>>>>>>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> ewg mailing list
>>>>>>>>> ewg at lists.openfabrics.org
>>>>>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
