[ewg] nfsrdma fails to write big file,

Mahesh Siddheshwar siddheshwar.mahesh at sun.com
Wed Mar 3 12:26:40 PST 2010


Hi Tom, Vu,

Tom Tucker wrote:
> Roland Dreier wrote:
>>  > +               /*
>>  > +                * Add room for frmr register and invalidate WRs
>>  > +                * Requests sometimes have two chunks, each chunk
>>  > +                * requires to have different frmr. The safest
>>  > +                * WRs required are max_send_wr * 6; however, we
>>  > +                * get send completions and poll fast enough, it
>>  > +                * is pretty safe to have max_send_wr * 4.
>>  > +                */
>>  > +               ep->rep_attr.cap.max_send_wr *= 4;
>>
>> Seems like a bad design if there is a possibility of work queue
>> overflow; if you're counting on events occurring in a particular order
>> or completions being handled "fast enough", then your design is going to
>> fail in some high load situations, which I don't think you want.   
>
> Vu,
>
> Would you please try the following:
>
> - Set the multiplier to 5
While trying to test this between a Linux client and a Solaris server,
I made the following changes in
/usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c:

diff verbs.c.org verbs.c
653c653
<               ep->rep_attr.cap.max_send_wr *= 3;
---
 >               ep->rep_attr.cap.max_send_wr *= 8;
685c685
<       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
---
 >       ep->rep_cqinit = ep->rep_attr.cap.max

(I bumped the multiplier to 8 rather than 5.)

Then I did make install.
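
For reference, the multiplier can be thought of as a worst-case count of
WRs per RPC. Below is a rough sketch in C. It assumes (this is my reading
of the patch comment, not something verbs.c states) that each chunk costs
one frmr register WR and one invalidate WR on top of a single send WR.
The helper names are made up for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helpers, not taken from verbs.c.
 * Assumption: one send WR per request, plus one frmr register WR and
 * one invalidate WR for every chunk the request carries.
 */
static unsigned int wrs_per_request(unsigned int chunks)
{
    return 1u + 2u * chunks;
}

/* Send queue depth needed if every outstanding request hits the worst case. */
static unsigned int send_queue_depth(unsigned int max_requests,
                                     unsigned int max_chunks)
{
    assert(max_requests > 0);
    return max_requests * wrs_per_request(max_chunks);
}
```

Under that assumption a request with two chunks needs 5 WRs. The sketch
does not derive the figure of 6 mentioned in the patch comment, so the
real accounting may well include WRs that are not modelled here.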

After the reboot I now see the errors on NFS READs, rather than on WRITEs
as before, when I try to read a 10G file from the server.

The client is running: RHEL 5.3 (2.6.18-128.el5PAE) with
OFED-1.5.1-20100223-0740 bits. The client has a Sun IB
HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0.
The server is running Solaris based on snv_128.

rpcdebug output from the client:

==
RPC:    85 call_bind (status 0)
RPC:    85 call_connect xprt ec78d800 is connected
RPC:    85 call_transmit (status 0)
RPC:    85 xprt_prepare_transmit
RPC:    85 xprt_cwnd_limited cong = 0 cwnd = 8192
RPC:    85 rpc_xdr_encode (status 0)
RPC:    85 marshaling UNIX cred eddb4dc0
RPC:    85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
RPC:    85 xprt_transmit(164)
RPC:       rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 hdrlen 164
RPC:       rpcrdma_register_frmr_external: Using frmr ec7da920 to map 4 segments
RPC:       rpcrdma_create_chunks: write chunk elem 16384@0x38536d000:0xa601 (more)
RPC:       rpcrdma_register_frmr_external: Using frmr ec7da960 to map 1 segments
RPC:       rpcrdma_create_chunks: write chunk elem 108@0x31dd153c:0xaa01 (last)
RPC:       rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 padlen 0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
RPC:    85 xmit complete
RPC:    85 sleep_on(queue "xprt_pending" time 4683109)
RPC:    85 added to queue ec78d994 "xprt_pending"
RPC:    85 setting alarm for 60000 ms
RPC:       wake_up_next(ec78d944 "xprt_resend")
RPC:       wake_up_next(ec78d8f4 "xprt_sending")
RPC:       rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep ec78db40
RPC:    85 __rpc_wake_up_task (now 4683110)
RPC:    85 disabling timer
RPC:    85 removed from queue ec78d994 "xprt_pending"
RPC:       __rpc_wake_up_task done
RPC:    85 __rpc_execute flags=0x1
RPC:    85 call_status (status -107)
RPC:    85 call_bind (status 0)
RPC:    85 call_connect xprt ec78d800 is not connected
RPC:    85 xprt_connect xprt ec78d800 is not connected
RPC:    85 sleep_on(queue "xprt_pending" time 4683110)
RPC:    85 added to queue ec78d994 "xprt_pending"
RPC:    85 setting alarm for 60000 ms
RPC:       rpcrdma_event_process: event rep ec116800 status 5 opcode 80 length 2493606
RPC:       rpcrdma_event_process: recv WC status 5, connection lost
RPC:       rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 0xec78db40 event 0xa)
RPC:       rpcrdma_conn_upcall: disconnected
rpcrdma: connection to ec78dbccI4:20049 closed (-103)
RPC:       xprt_rdma_connect_worker: reconnect
==
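
For anyone reading the trace, here is how I decode the numeric codes in
it. This is a small lookup sketch in C, not kernel code. The errno values
are the Linux ones; the IB names are as I read enum ib_event_type and
enum ib_wc_status in include/rdma/ib_verbs.h, so treat them as
assumptions to verify against the OFED 1.5.1 tree.

```c
#include <assert.h>
#include <string.h>

/* Linux errno values that appear (negated) in the RPC trace. */
static const char *linux_errno_name(int err)
{
    switch (err < 0 ? -err : err) {
    case 103: return "ECONNABORTED";
    case 107: return "ENOTCONN";
    default:  return "unknown";
    }
}

/* Async QP event printed as "QP error N" (assumed: enum ib_event_type). */
static const char *qp_event_name(int event)
{
    switch (event) {
    case 1:  return "IB_EVENT_QP_FATAL";
    case 2:  return "IB_EVENT_QP_REQ_ERR";
    case 3:  return "IB_EVENT_QP_ACCESS_ERR";
    default: return "unknown";
    }
}

/* Work completion status printed as "WC status N" (assumed: enum ib_wc_status). */
static const char *wc_status_name(int status)
{
    switch (status) {
    case 0:  return "IB_WC_SUCCESS";
    case 5:  return "IB_WC_WR_FLUSH_ERR";
    default: return "unknown";
    }
}

/* True when one of the lookups above recognised the code. */
static int is_known(const char *name)
{
    return strcmp(name, "unknown") != 0;
}
```

If that reading is right, the client QP takes an access error (3), the
posted receives are then flushed (5), and the RPC fails with ENOTCONN
(-107), which would be consistent with the remote access error reported
by the server.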

On the server I see:

Mar  3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
Mar  3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply
Mar  3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
Mar  3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply

The remote access error is actually seen on RDMA_WRITE.
Doing some more debug on the server with DTrace, I see that
the destination address and length match the write chunk
element in the Linux debug output above.


  0   9385                  rib_write:entry daddr 38536d000, len 4000, hdl a601
  0   9358         rib_init_sendwait:return ffffff44a715d308
  1   9296       rib_svc_scq_handler:return 1f7
  1   9356              rib_sendwait:return 14
  1   9386                 rib_write:return 14

^^^ that is RDMA_FAILED in 

  1  63295    xdrrdma_send_read_data:return 0
  1   5969              xdr_READ3res:return
  1   5969              xdr_READ3res:return 0
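
To spell out the match: the Linux client log describes the first write
chunk as 16384 bytes at address 0x38536d000 with handle 0xa601, and
rib_write on the server is called with daddr 38536d000, len 4000,
hdl a601. A minimal sketch in C of that comparison follows. It assumes
the DTrace probe prints all three fields in hex and the Linux log prints
the length in decimal; the struct and function names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* One RDMA write target: remote address, length in bytes, handle. */
struct chunk {
    uint64_t addr;
    uint32_t len;
    uint32_t handle;
};

/* Client side: length is decimal, address and handle are hex. */
static struct chunk from_linux_log(const char *len_dec, const char *addr_hex,
                                   const char *handle_hex)
{
    struct chunk c;

    assert(len_dec && addr_hex && handle_hex);
    c.addr = strtoull(addr_hex, NULL, 16);
    c.len = (uint32_t)strtoul(len_dec, NULL, 10);
    c.handle = (uint32_t)strtoul(handle_hex, NULL, 16);
    return c;
}

/* Server side: assumed to be hex for all three fields, without "0x". */
static struct chunk from_dtrace(const char *daddr_hex, const char *len_hex,
                                const char *hdl_hex)
{
    struct chunk c;

    assert(daddr_hex && len_hex && hdl_hex);
    c.addr = strtoull(daddr_hex, NULL, 16);
    c.len = (uint32_t)strtoul(len_hex, NULL, 16);
    c.handle = (uint32_t)strtoul(hdl_hex, NULL, 16);
    return c;
}

/* Two chunks match when address, length and handle are all equal. */
static bool chunk_equal(struct chunk a, struct chunk b)
{
    return a.addr == b.addr && a.len == b.len && a.handle == b.handle;
}
```

With the values above, 16384 decimal is 0x4000, so the two descriptions
refer to the same address, length and handle.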

Is this a variation of the previously discussed issue or something new?

Thanks,
Mahesh

> - Set the number of buffer credits small as follows "echo 4 > 
> /proc/sys/sunrpc/rdma_slot_table_entries"
> - Rerun your test and see if you can reproduce the problem?
>
> I did the above and was unable to reproduce, but I would like to see 
> if you can to convince ourselves that 5 is the right number.
>
> Thanks,
> Tom
>
>>  - R.
>>   
>



