[Fwd: Re: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition]]
Diego Moreno
Diego.Moreno-Lazaro at bull.net
Tue Apr 28 06:07:50 PDT 2009
Hi Tom,
I'm running a vanilla 2.6.27.10 kernel, but I'll try with 2.6.29.
Thanks,
Diego
Sysctl config on server:
[root at twing ~]# cat /etc/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and
# sysctl.conf(5) for more details.
# Controls IP packet forwarding
net.ipv4.ip_forward = 0
# Controls source route verification
net.ipv4.conf.default.rp_filter = 1
# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0
# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0
# Controls whether core dumps will append the PID to the core filename
# Useful for debugging multi-threaded applications
kernel.core_uses_pid = 1
# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1
# Controls the maximum size of a message, in bytes
kernel.msgmnb = 65536
# Controls the default maxmimum size of a mesage queue
kernel.msgmax = 65536
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296
## MLX4_EN tuning parameters ##
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.core.netdev_max_backlog = 250000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 16777216
net.ipv4.tcp_mem = 16777216 16777216 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
## END MLX4_EN ##
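
For comparing the settings on both ends, here is a rough Python sketch that parses a sysctl.conf and pulls out only the block between the two MLX4_EN marker lines shown above. The function names (parse_sysctl, section) and the embedded SAMPLE text are mine, for illustration only; nothing here comes from OFED or the kernel tree.

```python
def parse_sysctl(text):
    """Parse sysctl.conf-style text into a dict of key -> value strings.

    Blank lines and lines starting with '#' or ';' are treated as comments.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line, skip it
        settings[key.strip()] = value.strip()
    return settings


def section(text, start_marker, end_marker):
    """Return the lines strictly between two marker lines."""
    inside = False
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == start_marker:
            inside = True
            continue
        if stripped == end_marker:
            break
        if inside:
            kept.append(line)
    return "\n".join(kept)


# Illustrative excerpt copied from the configuration above.
SAMPLE = """\
# Controls IP packet forwarding
net.ipv4.ip_forward = 0
## MLX4_EN tuning parameters ##
net.ipv4.tcp_timestamps = 0
net.core.rmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
## END MLX4_EN ##
"""

tuned = parse_sysctl(
    section(SAMPLE, "## MLX4_EN tuning parameters ##", "## END MLX4_EN ##")
)
for key in sorted(tuned):
    print(key, "=", tuned[key])
```

Running the same two functions over /etc/sysctl.conf from the client and the server and comparing the resulting dicts would show which tuning values differ between the two machines.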
tmtalpey at gmail.com wrote:
> In both cases the connection is being lost under load. This usually indicates a credit (slot count) mismatch, or an IRD/ORD one. What kernel version are you running on each end? Any special sysctl settings on the server?
>
> The oops on the client is troubling, but it's happening in the error upcall and resembles a problem I fixed a while back. I'll check it when I get back to a source repo. It's not the cause of the issue though.
>
> Tom.
>
>
> -----Original Message-----
>
> From: Diego Moreno <Diego.Moreno-Lazaro at bull.net>
> Subj: Re: [Fwd: Re: [ofa-general] [NFS/RDMA] Can't mount NFS/RDMA partition]]
> Date: Tue Apr 28, 2009 8:44 am
> Size: 3K
> To: Vu Pham <vuhuong at mellanox.com>
> cc: OpenIB <general at lists.openfabrics.org>
>
> Hi,
>
> I'm working with Celine on getting NFS/RDMA to work. We installed new
> firmware (2.6.636). We still have the problem, but now we have more
> information on the client side.
>
> - With the workaround (memreg 6) we can mount without any problem. We
> can read a file, but if we try to create a file with dd, the application
> hangs and we have to do 'umount -f'. There is no message on the server.
> Message on the client:
>
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>
>
> - With fast registration:
>
> There is no message on the server. Client dmesg output with fast registration:
>
>
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
> ------------[ cut here ]------------
> WARNING: at kernel/softirq.c:136 local_bh_enable_ip+0x3c/0x92()
> Modules linked in: xprtrdma autofs4 hidp nfs lockd nfs_acl rfcomm l2cap
> bluetooth sunrpc iptable_filter ip_tables ip6t_REJECT xt_tcpudp
> ip6table_filter ip6_tables x_tables cpufreq_ondemand acpi_cpufreq
> freq_table rdma_ucm ib_sdp rdma_cm iw_cm ib_addr ib_ipoib ib_cm ib_sa
> ipv6 ib_uverbs ib_umad iw_nes ib_ipath ib_mthca dm_multipath scsi_dh
> raid0 sbs sbshc battery acpi_memhotplug ac parport_pc lp parport mlx4_ib
> ib_mad ib_core e1000e sr_mod joydev cdrom mlx4_core i5000_edac edac_core
> shpchp rtc_cmos sg pcspkr rtc_core rtc_lib i2c_i801 i2c_core serio_raw
> button dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage ata_piix
> libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last
> unloaded: microcode]
> Pid: 0, comm: swapper Not tainted 2.6.27_ofa_compil #2
>
> Call Trace:
> <IRQ> [<ffffffff80235b8d>] warn_on_slowpath+0x51/0x77
> [<ffffffff80229b79>] __wake_up+0x38/0x4f
> [<ffffffff80246d57>] __wake_up_bit+0x28/0x2d
> [<ffffffffa05485af>] rpc_wake_up_task_queue_locked+0x223/0x24b [sunrpc]
> [<ffffffffa054861e>] rpc_wake_up_status+0x47/0x82 [sunrpc]
> [<ffffffff80239c49>] local_bh_enable_ip+0x3c/0x92
> [<ffffffffa0638fd1>] rpcrdma_conn_func+0x6d/0x7c [xprtrdma]
> [<ffffffffa063b316>] rpcrdma_qp_async_error_upcall+0x45/0x5a [xprtrdma]
> [<ffffffffa0294bb3>] mlx4_ib_qp_event+0xf9/0x100 [mlx4_ib]
> [<ffffffff802443da>] __queue_work+0x22/0x32
> [<ffffffffa01fc5d4>] mlx4_qp_event+0x8a/0xad [mlx4_core]
> [<ffffffffa01f50a5>] mlx4_eq_int+0x55/0x291 [mlx4_core]
> [<ffffffffa01f52f0>] mlx4_msi_x_interrupt+0xf/0x16 [mlx4_core]
> [<ffffffff802624f4>] handle_IRQ_event+0x25/0x53
> [<ffffffff80263c0a>] handle_edge_irq+0xe3/0x123
> [<ffffffff8020e907>] do_IRQ+0xf1/0x15e
> [<ffffffff8020c381>] ret_from_intr+0x0/0xa
> <EOI> [<ffffffffa0549c3e>] nul_marshal+0x0/0x20 [sunrpc]
> [<ffffffff80212474>] mwait_idle+0x41/0x45
> [<ffffffff8020abdf>] cpu_idle+0x7e/0x9c
>
> ---[ end trace 5cc994fbe7e141af ]---
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32 ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>
>
> Thanks,
>
> Diego
>
> Vu Pham wrote:
>> Celine Bourde wrote:
>>> We still have the same problem, even after changing the registration method.
>>>
>>> mount doesn't reply and this is the output of dmesg on client:
>>>
>>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 ird 16
>>> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
>>> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 6 slots 32 ird 16
>
> --- message truncated ---
>
>
>
>
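
To make the quoted client logs easier to compare across attempts, the sketch below (Python) extracts the memreg, slots and ird values from the rpcrdma connect lines and decodes the close status. It tolerates mail quote markers and wrapped lines. The regular expressions are inferred only from the log lines quoted in this thread, and the names summarize and unquote are hypothetical. The errno table is limited to the one value seen here: 103 is ECONNABORTED on Linux.

```python
import re

# Linux errno value seen in the quoted logs (assumption: Linux numbering).
LINUX_ERRNO = {103: "ECONNABORTED"}

OPEN_RE = re.compile(
    r"rpcrdma: connection to (\S+) on (\S+), memreg (\d+) slots (\d+)\s+ird (\d+)"
)
CLOSE_RE = re.compile(r"rpcrdma: connection to (\S+) closed \((-?\d+)\)")


def unquote(text):
    """Drop leading '>' quote markers and rejoin mail-wrapped lines."""
    lines = [re.sub(r"^(>\s?)+", "", ln).strip() for ln in text.splitlines()]
    return " ".join(ln for ln in lines if ln)


def summarize(text):
    """Return the connect parameters and close codes found in a dmesg excerpt."""
    flat = unquote(text)
    opens = [
        {
            "peer": m.group(1),
            "device": m.group(2),
            "memreg": int(m.group(3)),
            "slots": int(m.group(4)),
            "ird": int(m.group(5)),
        }
        for m in OPEN_RE.finditer(flat)
    ]
    closes = [
        {
            "peer": m.group(1),
            "status": int(m.group(2)),
            "errno_name": LINUX_ERRNO.get(abs(int(m.group(2))), "unknown"),
        }
        for m in CLOSE_RE.finditer(flat)
    ]
    return {"opens": opens, "closes": closes}


# Excerpt copied from the client log quoted above.
SAMPLE = """\
> rpcrdma: connection to 192.168.0.215:2050 on mlx4_0, memreg 5 slots 32
> ird 16
> rpcrdma: connection to 192.168.0.215:2050 closed (-103)
"""

result = summarize(SAMPLE)
for event in result["opens"] + result["closes"]:
    print(event)
```

This only reports what the client negotiated and how the connection ended. Checking for the credit or IRD/ORD mismatch Tom mentions would still need the corresponding values from the server side.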