[ewg] nfsrdma fails to write big file,
Tom Tucker
tom at opengridcomputing.com
Wed Feb 24 16:51:31 PST 2010
Vu,
I ran the number of slots down to 8 (echo 8 > rdma_slot_table_entries)
and I can reproduce the issue now. I'm going to try setting the
allocation multiple to 5 and see if I can't prove to myself and Roland
that we've accurately computed the correct factor.
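
Roughly, the sizing I intend to test looks like this (a sketch only of
the rpcrdma_ep_create() change in verbs.c, untested; the multiplier is
the only difference from the patch quoted below):

    /* Per credit, the practical worst case: an frmr register for the
     * head, another for the pagelist, an invalidate for each of those,
     * plus the send WR itself == 5 WRs per request. */
    ep->rep_attr.cap.max_send_wr = cdata->max_requests;
    if (ia->ri_memreg_strategy == RPCRDMA_FRMR) {
            ep->rep_attr.cap.max_send_wr *= 5;
            if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
                    return -EINVAL;
    }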
I think that overall a better solution might be a different credit
system; however, that's a much more substantial change than we can
tackle at this point.
Tom
Tom Tucker wrote:
> Vu,
>
> Based on the mapping code, it looks to me like the worst case is
> RPCRDMA_MAX_SEGS * 2 + 1 as the multiplier.
> However, I think that in practice, due to the way the iovs are built,
> the actual max is 5 (an frmr for the head plus one for the pagelist,
> invalidates for both, plus one WR for the send itself). Why did you
> think the max was 6?
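>
> In other words (rough arithmetic; this helper and its name are made up
> for illustration, only RPCRDMA_MAX_SEGS is from the tree):
>
>     /* Hypothetical helper: SQ depth needed for 'credits' concurrent
>      * RPCs under FRMR registration. */
>     static unsigned int frmr_sq_depth(unsigned int credits, int worst)
>     {
>             /* worst case: a register + invalidate per segment, plus one
>              * send; in practice: head + pagelist only, so 2*2 + 1 = 5 */
>             unsigned int per_rpc = worst ? RPCRDMA_MAX_SEGS * 2 + 1
>                                          : 2 * 2 + 1;
>             return credits * per_rpc;
>     }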
>
> Thanks,
> Tom
>
> Tom Tucker wrote:
>
>> Vu,
>>
>> Are you changing any of the default settings, for example rsize/wsize,
>> etc.? I'd like to reproduce this problem if I can.
>>
>> Thanks,
>>
>> Tom
>>
>> Vu Pham wrote:
>>
>>
>>> Tom,
>>>
>>> Did you make any changes to get bonnie++, dd of a 10G file, and
>>> vdbench to run concurrently and finish?
>>>
>>> I keep hitting the WQE overflow error below.
>>> I saw that most of the requests have two chunks (a 32K chunk and a
>>> some-bytes chunk), and each chunk requires frmr + invalidate WRs;
>>> however, you set ep->rep_attr.cap.max_send_wr = cdata->max_requests,
>>> and then for the frmr case you do
>>> ep->rep_attr.cap.max_send_wr *= 3; which is not enough. Moreover, you
>>> also set ep->rep_cqinit = max_send_wr/2 for the send completion
>>> signal, which causes the WQE overflow to happen faster.
>>>
>>> After applying the following patch, I have had vdbench, dd, and a copy
>>> of the 10g file running overnight.
>>>
>>> -vu
>>>
>>>
>>> --- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c 2010-02-24 10:41:22.000000000 -0800
>>> +++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c 2010-02-24 10:03:18.000000000 -0800
>>> @@ -649,8 +654,15 @@
>>> ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>> switch (ia->ri_memreg_strategy) {
>>> case RPCRDMA_FRMR:
>>> - /* Add room for frmr register and invalidate WRs */
>>> - ep->rep_attr.cap.max_send_wr *= 3;
>>> + /*
>>> + * Add room for frmr register and invalidate WRs
>>> + * Requests sometimes have two chunks, and each
>>> + * chunk requires its own frmr. The safest sizing
>>> + * is max_send_wr * 6; however, since we get send
>>> + * completions and poll fast enough, it is pretty
>>> + * safe to use max_send_wr * 4.
>>> + */
>>> + ep->rep_attr.cap.max_send_wr *= 4;
>>> if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
>>> return -EINVAL;
>>> break;
>>> @@ -682,7 +694,8 @@
>>> ep->rep_attr.cap.max_recv_sge);
>>>
>>> /* set trigger for requesting send completion */
>>> - ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /* - 1*/;
>>> + ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
>>> +
>>> switch (ia->ri_memreg_strategy) {
>>> case RPCRDMA_MEMWINDOWS_ASYNC:
>>> case RPCRDMA_MEMWINDOWS:
>>>
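>>> (For context: rep_cqinit is the trigger for requesting a signaled send
>>> completion. Paraphrasing the posting path in verbs.c from memory --
>>> worth double-checking against the tree -- it does roughly:
>>>
>>>     if (DECR_CQCOUNT(ep) > 0)
>>>             send_wr.send_flags = 0;
>>>     else {
>>>             /* provider must take a send completion now and then */
>>>             INIT_CQCOUNT(ep);
>>>             send_wr.send_flags = IB_SEND_SIGNALED;
>>>     }
>>>
>>> so with the SQ four times deeper, signaling every max_send_wr/4 posts
>>> keeps completions frequent enough to recycle WQEs.)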
>>>> -----Original Message-----
>>>> From: ewg-bounces at lists.openfabrics.org [mailto:ewg-
>>>> bounces at lists.openfabrics.org] On Behalf Of Vu Pham
>>>> Sent: Monday, February 22, 2010 12:23 PM
>>>> To: Tom Tucker
>>>> Cc: linux-rdma at vger.kernel.org; Mahesh Siddheshwar;
>>>> ewg at lists.openfabrics.org
>>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>>
>>>> Tom,
>>>>
>>>> Some more info on the problem:
>>>> 1. Running with memreg=4 (FMR), I cannot reproduce the problem.
>>>> 2. I also see a different error on the client:
>>>>
>>>> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name
>>>> 'nobody'
>>>> does not map into domain 'localdomain'
>>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
>>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>>>> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send
>>>> returned -12 cq_init 48 cq_count 32
>>>> Feb 22 12:17:00 mellanox-2 kernel: RPC: rpcrdma_event_process:
>>>> send WC status 5, vend_err F5
>>>> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to
>>>> 13.20.1.9:20049 closed (-103)
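>>>> (The -12 from ib_post_send is -ENOMEM: the provider refuses the post
>>>> once the send queue has no free WQEs. Schematically -- an
>>>> illustration with made-up field names, not the actual mlx4 source:
>>>>
>>>>     /* inside the provider's post_send path */
>>>>     if (qp->sq_head - qp->sq_tail + nreq > qp->sq_max_wr) {
>>>>             pr_err("QP 0x%x: WQE overflow\n", qp->qpn);
>>>>             return -ENOMEM;         /* the "returned -12" above */
>>>>     }
>>>> )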
>>>>
>>>> -vu
>>>>
>>>>> -----Original Message-----
>>>>> From: Tom Tucker [mailto:tom at opengridcomputing.com]
>>>>> Sent: Monday, February 22, 2010 10:49 AM
>>>>> To: Vu Pham
>>>>> Cc: linux-rdma at vger.kernel.org; Mahesh Siddheshwar;
>>>>> ewg at lists.openfabrics.org
>>>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>>>
>>>>> Vu Pham wrote:
>>>>>
>>>>>> Setup:
>>>>>> 1. Linux nfsrdma client/server with OFED-1.5.1-20100217-0600,
>>>>>> ConnectX2 QDR HCAs fw 2.7.8-6, RHEL 5.2.
>>>>>> 2. Solaris nfsrdma server snv 130, ConnectX QDR HCA.
>>>>>>
>>>>>> Running vdbench on a 10g file or *dd if=/dev/zero of=10g_file bs=1M
>>>>>> count=10000*, the operation fails, the connection gets dropped, and
>>>>>> the client cannot re-establish a connection to the server.
>>>>>> After rebooting only the client, I can mount again.
>>>>>>
>>>>>> It happens with both Solaris and Linux nfsrdma servers.
>>>>>>
>>>>>> For the Linux client/server, I see the problem with memreg=5 (FRMR);
>>>>>> I don't see the problem with memreg=6 (global dma key).
>>>>>>
>>>>>>
>>>>> Awesome. This is the key I think.
>>>>>
>>>>> Thanks for the info Vu,
>>>>> Tom
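>>>>>
>>>>> (For anyone following along: memreg is xprtrdma's registration-strategy
>>>>> module parameter. From memory -- verify against
>>>>> net/sunrpc/xprtrdma/xprt_rdma.h -- the values map as:
>>>>>
>>>>>     enum rpcrdma_memreg {
>>>>>             RPCRDMA_BOUNCEBUFFERS = 0,
>>>>>             RPCRDMA_REGISTER,
>>>>>             RPCRDMA_MEMWINDOWS,
>>>>>             RPCRDMA_MEMWINDOWS_ASYNC,
>>>>>             RPCRDMA_MTHCAFMR,       /* memreg=4: FMR */
>>>>>             RPCRDMA_FRMR,           /* memreg=5: fails here */
>>>>>             RPCRDMA_ALLPHYSICAL,    /* memreg=6: global dma key, works */
>>>>>             RPCRDMA_LAST
>>>>>     };
>>>>> )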
>>>>>
>>>>>> On the Solaris server (snv 130), we see a problem decoding a write
>>>>>> request of 32K. The client sends two read chunks (a 32K chunk and a
>>>>>> 16-byte chunk); the server fails to do the rdma read on the 16-byte
>>>>>> chunk (cqe.status = 10, i.e. IB_WC_REM_ACCESS_ERR); therefore, the
>>>>>> server terminates the connection. We don't see this problem with NFS
>>>>>> version 3 on Solaris. The Solaris server runs in normal memory
>>>>>> registration mode.
>>>>>>
>>>>>> On the Linux client, I see cqe.status = 12, i.e. IB_WC_RETRY_EXC_ERR.
>>>>>>
>>>>>> I added these notes in bug #1919 (bugs.openfabrics.org) to track the
>>>>>> issue.
>>>>>>
>>>>>> thanks,
>>>>>> -vu
>
> _______________________________________________
> ewg mailing list
> ewg at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
>