[ewg] nfsrdma fails to write big file,
Vu Pham
vuhuong at mellanox.com
Wed Feb 24 10:56:09 PST 2010
Tom,
Did you make any changes to get bonnie++, dd of a 10G file, and vdbench
to run concurrently and finish?
I keep hitting the WQE overflow error below.
I saw that most of the requests have two chunks (a 32K chunk and a
some-bytes chunk), and each chunk requires an frmr register WR plus an
invalidate WR. However, you set
ep->rep_attr.cap.max_send_wr = cdata->max_requests and then, for the
frmr case, you do ep->rep_attr.cap.max_send_wr *= 3; which is not
enough. Moreover, you also set ep->rep_cqinit = max_send_wr/2 for the
send completion signal, which makes the WQE overflow happen sooner.
After applying the following patch, I have had vdbench, dd, and a copy
of 10g_file running overnight.
-vu
--- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c 2010-02-24 10:41:22.000000000 -0800
+++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c 2010-02-24 10:03:18.000000000 -0800
@@ -649,8 +654,15 @@
ep->rep_attr.cap.max_send_wr = cdata->max_requests;
switch (ia->ri_memreg_strategy) {
case RPCRDMA_FRMR:
- /* Add room for frmr register and invalidate WRs */
- ep->rep_attr.cap.max_send_wr *= 3;
+ /*
+ * Add room for frmr register and invalidate WRs
+ * Requests sometimes have two chunks, each chunk
+ * requires to have different frmr. The safest
+ * WRs required are max_send_wr * 6; however, we
+ * get send completions and poll fast enough, it
+ * is pretty safe to have max_send_wr * 4.
+ */
+ ep->rep_attr.cap.max_send_wr *= 4;
if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
return -EINVAL;
break;
@@ -682,7 +694,8 @@
ep->rep_attr.cap.max_recv_sge);
/* set trigger for requesting send completion */
- ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /* - 1*/;
+ ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
+
switch (ia->ri_memreg_strategy) {
case RPCRDMA_MEMWINDOWS_ASYNC:
case RPCRDMA_MEMWINDOWS:
> -----Original Message-----
> From: ewg-bounces at lists.openfabrics.org [mailto:ewg-
> bounces at lists.openfabrics.org] On Behalf Of Vu Pham
> Sent: Monday, February 22, 2010 12:23 PM
> To: Tom Tucker
> Cc: linux-rdma at vger.kernel.org; Mahesh Siddheshwar;
> ewg at lists.openfabrics.org
> Subject: Re: [ewg] nfsrdma fails to write big file,
>
> Tom,
>
> Some more info on the problem:
> 1. Running with memreg=4 (FMR) I can not reproduce the problem
> 2. I also see different error on client
>
> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name 'nobody' does not map into domain 'localdomain'
> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send returned -12 cq_init 48 cq_count 32
> Feb 22 12:17:00 mellanox-2 kernel: RPC: rpcrdma_event_process: send WC status 5, vend_err F5
> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to 13.20.1.9:20049 closed (-103)
>
> -vu
>
> > -----Original Message-----
> > From: Tom Tucker [mailto:tom at opengridcomputing.com]
> > Sent: Monday, February 22, 2010 10:49 AM
> > To: Vu Pham
> > Cc: linux-rdma at vger.kernel.org; Mahesh Siddheshwar;
> > ewg at lists.openfabrics.org
> > Subject: Re: [ewg] nfsrdma fails to write big file,
> >
> > Vu Pham wrote:
> > > Setup:
> > > 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
> > > ConnectX2 QDR HCAs fw 2.7.8-6, RHEL 5.2.
> > > 2. Solaris nfsrdma server snv 130, ConnectX QDR HCA.
> > >
> > >
> > > Running vdbench on a 10g file or *dd if=/dev/zero of=10g_file bs=1M
> > > count=10000*, the operation fails, the connection gets dropped, and
> > > the client cannot re-establish the connection to the server.
> > > After rebooting only the client, I can mount again.
> > >
> > > It happens with both solaris and linux nfsrdma servers.
> > >
> > > For linux client/server, I run memreg=5 (FRMR); I don't see the
> > > problem with memreg=6 (global dma key).
> > >
> > >
> >
> > Awesome. This is the key I think.
> >
> > Thanks for the info Vu,
> > Tom
> >
> >
> > > On the Solaris server snv 130, we see a problem decoding a write
> > > request of 32K. The client sends two read chunks (32K & 16-byte), and
> > > the server fails to do the rdma read on the 16-byte chunk
> > > (cqe.status = 10, i.e. IB_WC_REM_ACCESS_ERR); therefore, the server
> > > terminates the connection. We don't see this problem on nfs version 3
> > > on Solaris. The Solaris server runs in normal memory registration
> > > mode.
> > >
> > > On the linux client, I see cqe.status = 12, i.e. IB_WC_RETRY_EXC_ERR.
> > >
> > > I added these notes to bug #1919 (bugs.openfabrics.org) to track the
> > > issue.
> > >
> > > thanks,
> > > -vu
> > > _______________________________________________
> > > ewg mailing list
> > > ewg at lists.openfabrics.org
> > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
> > >
>