[ewg] nfsrdma fails to write big file,

Tom Tucker tom at opengridcomputing.com
Mon Mar 1 19:17:16 PST 2010


Roland:

I'll put together a patch that uses a multiplier of 5, with a comment
explaining why I think 5 is the right number. Since Vu has also verified
this behaviorally, I'm comfortable that our understanding of the code is
sound. I'm on the road right now, so it won't be until tomorrow, though.

Thanks,
Tom


Vu Pham wrote:
>   
>> -----Original Message-----
>> From: Tom Tucker [mailto:tom at opengridcomputing.com]
>> Sent: Saturday, February 27, 2010 8:23 PM
>> To: Vu Pham
>> Cc: Roland Dreier; linux-rdma at vger.kernel.org; Mahesh Siddheshwar;
>> ewg at lists.openfabrics.org
>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>
>> Roland Dreier wrote:
>>     
>>>  > +               /*
>>>  > +                * Add room for frmr register and invalidate WRs
>>>  > +                * Requests sometimes have two chunks, each chunk
>>>  > +                * requires to have different frmr. The safest
>>>  > +                * WRs required are max_send_wr * 6; however, we
>>>  > +                * get send completions and poll fast enough, it
>>>  > +                * is pretty safe to have max_send_wr * 4.
>>>  > +                */
>>>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>>>
>>> Seems like a bad design if there is a possibility of work queue
>>> overflow; if you're counting on events occurring in a particular
>>> order or completions being handled "fast enough", then your design is
>>> going to fail in some high load situations, which I don't think you want.
>>
>> Vu,
>>
>> Would you please try the following:
>>
>> - Set the multiplier to 5
>> - Set the number of buffer credits to a small value as follows:
>>   "echo 4 > /proc/sys/sunrpc/rdma_slot_table_entries"
>> - Rerun your test and see whether you can reproduce the problem.
>>
>> I did the above and was unable to reproduce, but I would like to see
>> if you can, to convince ourselves that 5 is the right number.
>>
>
> Tom,
>
> I did the above and cannot reproduce it either.
>
> I think 5 is the right number; however, we should optimize it later.
>
> -vu
>   



