[Fwd: Re: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition]]

Diego Moreno Diego.Moreno-Lazaro at bull.net
Tue Apr 28 06:07:50 PDT 2009


Hi Tom,

I'm running a vanilla 2.6.27.10 kernel, but I'll try 2.6.29.

Thanks,

Diego

Sysctl config on server:

[root@twing ~]# cat /etc/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename
# Useful for debugging multi-threaded applications
kernel.core_uses_pid = 1

# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1

# Controls the maximum size of a message, in bytes
kernel.msgmnb = 65536

# Controls the default maximum size of a message queue
kernel.msgmax = 65536

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296
## MLX4_EN tuning parameters ##
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.core.netdev_max_backlog = 250000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 16777216
net.ipv4.tcp_mem = 16777216 16777216 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
## END MLX4_EN ##
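
To compare these settings with what is on the client, a rough sketch like the one below could be used. It only parses sysctl.conf-style text and reports keys whose values differ. The function names are made up for illustration, and the client values in the sample are hypothetical (no client sysctl.conf was posted in this thread).

```python
def parse_sysctl(text):
    """Return {key: value} for each 'key = value' line, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # sysctl.conf treats lines starting with '#' or ';' as comments
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a 'key = value' line
        settings[key.strip()] = value.strip()
    return settings


def diff_settings(server, client):
    """Return {key: (server_value, client_value)} for keys that differ or exist on one side only."""
    return {
        key: (server.get(key), client.get(key))
        for key in sorted(set(server) | set(client))
        if server.get(key) != client.get(key)
    }


# A few lines taken from the server file above.
SERVER_SAMPLE = """\
## MLX4_EN tuning parameters ##
net.ipv4.tcp_timestamps = 0
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
"""

# HYPOTHETICAL client values, invented only to exercise the diff.
CLIENT_SAMPLE = """\
net.ipv4.tcp_timestamps = 1
net.core.rmem_max = 16777216
"""

if __name__ == "__main__":
    differences = diff_settings(parse_sysctl(SERVER_SAMPLE), parse_sysctl(CLIENT_SAMPLE))
    for key, (server_value, client_value) in differences.items():
        print(f"{key}: server={server_value!r} client={client_value!r}")
```

A value of None on one side means the key is not set in that file, so the kernel default applies there.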



tmtalpey at gmail.com wrote:
> In both cases the connection is being lost under load. This usually indicates a credit (slot count) mismatch, or an IRD/ORD one. What kernel version are you running on each end? Any special sysctl settings on the server?
> 
> The oops on the client is troubling, but it's happening in the error upcall and resembles a problem I fixed a while back. I'll check it when I get back to a source repo. It's not the cause of the issue though.
> 
> Tom.
> 
> 
> -----Original Message-----
> 
> From:  Diego Moreno <Diego.Moreno-Lazaro at bull.net>
> Subj:  Re: [Fwd: Re: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition]]
> Date:  Tue Apr 28, 2009 8:44 am
> Size:  3K
> To:  Vu Pham <vuhuong at mellanox.com>
> cc:  OpenIB <general at lists.openfabrics.org>
> 
> Hi,
> 
> I'm working with Celine on getting NFS/RDMA to work. We installed new
> firmware (2.6.636). We still have the problem, but now we have more
> information on the client side.
> 
> - With the workaround (memreg 6) we can mount without any problem. We
> can read a file, but if we try to create a file with dd, the application
> hangs and we then have to do 'umount -f'. There is no message on the
> server. Message on the client:
> 
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
> 
> 
> - With fast registration:
> 
> There is no message on the server. Client dmesg output with fast registration:
> 
> 
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
> ------------[ cut here ]------------
> WARNING: at kernel/softirq.c:136 local_bh_enable_ip+0x3c/0x92()
> Modules linked in: xprtrdma autofs4 hidp nfs lockd nfs_acl rfcomm l2cap 
> bluetooth sunrpc iptable_filter ip_tables ip6t_REJECT xt_tcpudp 
> ip6table_filter ip6_tables x_tables cpufreq_ondemand acpi_cpufreq 
> freq_table rdma_ucm ib_sdp rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa 
> ipv6 ib_uverbs ib_umad iw_nes ib_ipath ib_mthca dm_multipath scsi_dh 
> raid0 sbs sbshc battery acpi_memhotplug ac parport_pc lp parport mlx4_ib 
> ib_mad ib_core e1000e sr_mod joydev cdrom mlx4_core i5000_edac edac_core 
> shpchp rtc_cmos sg pcspkr rtc_core rtc_lib i2c_i801 i2c_core serio_raw 
> button dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ata_piix 
> libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last 
> unloaded: microcode]
> Pid: 0, comm: swapper Not tainted 2.6.27_ofa_compil #2
> 
> Call Trace:
>   <IRQ>  [<ffffffff80235b8d>] warn_on_slowpath+0x51/0x77
>   [<ffffffff80229b79>] __wake_up+0x38/0x4f
>   [<ffffffff80246d57>] __wake_up_bit+0x28/0x2d
>   [<ffffffffa05485af>] rpc_wake_up_task_queue_locked+0x223/0x24b [sunrpc]
>   [<ffffffffa054861e>] rpc_wake_up_status+0x47/0x82 [sunrpc]
>   [<ffffffff80239c49>] local_bh_enable_ip+0x3c/0x92
>   [<ffffffffa0638fd1>] rpcrdma_conn_func+0x6d/0x7c [xprtrdma]
>   [<ffffffffa063b316>] rpcrdma_qp_async_error_upcall+0x45/0x5a [xprtrdma]
>   [<ffffffffa0294bb3>] mlx4_ib_qp_event+0xf9/0x100 [mlx4_ib]
>   [<ffffffff802443da>] __queue_work+0x22/0x32
>   [<ffffffffa01fc5d4>] mlx4_qp_event+0x8a/0xad [mlx4_core]
>   [<ffffffffa01f50a5>] mlx4_eq_int+0x55/0x291 [mlx4_core]
>   [<ffffffffa01f52f0>] mlx4_msi_x_interrupt+0xf/0x16 [mlx4_core]
>   [<ffffffff802624f4>] handle_IRQ_event+0x25/0x53
>   [<ffffffff80263c0a>] handle_edge_irq+0xe3/0x123
>   [<ffffffff8020e907>] do_IRQ+0xf1/0x15e
>   [<ffffffff8020c381>] ret_from_intr+0x0/0xa
>   <EOI>  [<ffffffffa0549c3e>] nul_marshal+0x0/0x20 [sunrpc]
>   [<ffffffff80212474>] mwait_idle+0x41/0x45
>   [<ffffffff8020abdf>] cpu_idle+0x7e/0x9c
> 
> ---[ end trace 5cc994fbe7e141af ]---
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
> 
> 
> Thanks,
> 
> Diego
> 
> Vu Pham wrote:
>> Celine Bourde wrote:
>>> We still have the same problem, even after changing the registration method.
>>>
>>> mount doesn't reply, and this is the output of dmesg on the client:
>>>
>>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 ird 16
>>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 ird 16
> 
> --- message truncated ---
> 
> 
> 
> 
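
Since Tom's question is about a slot count or IRD/ORD mismatch, it may help to pull the memreg, slots and ird values and the close codes out of the client dmesg so they can be set against the server side. The following is only a sketch: the regular expressions are written against the rpcrdma lines quoted in this thread (with the wrapped lines joined), and the function name is made up for illustration. On Linux, errno 103 is ECONNABORTED, which is what the "closed (-103)" lines report.

```python
import re

# Patterns assume the message format seen in the quoted client dmesg output.
CONNECT_RE = re.compile(
    r"rpcrdma: connection to (?P<peer>\S+) on (?P<dev>\w+), "
    r"memreg (?P<memreg>\d+) slots (?P<slots>\d+) ird (?P<ird>\d+)"
)
CLOSE_RE = re.compile(
    r"rpcrdma: connection to (?P<peer>\S+) closed \((?P<rc>-?\d+)\)"
)


def summarize(log_text):
    """Return connect/close events in the order they appear in the log."""
    events = []
    for line in log_text.splitlines():
        match = CONNECT_RE.search(line)
        if match:
            events.append((
                "connect",
                match.group("peer"),
                int(match.group("memreg")),
                int(match.group("slots")),
                int(match.group("ird")),
            ))
            continue
        match = CLOSE_RE.search(line)
        if match:
            events.append(("close", match.group("peer"), int(match.group("rc"))))
    return events


# Lines copied from the client output quoted above (wrapped lines joined).
SAMPLE = """\
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
rpcrdma: connection to 192.168.0.215:2050 closed (-103)
"""

if __name__ == "__main__":
    for event in summarize(SAMPLE):
        print(event)
```

Running it over a full dmesg capture, for example by passing the captured text to summarize(), gives a quick count of reconnect attempts and shows whether the advertised slots and ird values change between attempts.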


