[ewg] nfsrdma fails to write big file,
Tom Tucker
tom at opengridcomputing.com
Wed Mar 3 14:52:03 PST 2010
Mahesh Siddheshwar wrote:
> Hi Tom, Vu,
>
> Tom Tucker wrote:
>> Roland Dreier wrote:
>>> > + /*
>>> > + * Add room for frmr register and invalidate WRs
>>> > + * Requests sometimes have two chunks, each chunk
>>> > + * requires to have different frmr. The safest
>>> > + * WRs required are max_send_wr * 6; however, we
>>> > + * get send completions and poll fast enough, it
>>> > + * is pretty safe to have max_send_wr * 4.
>>> > + */
>>> > + ep->rep_attr.cap.max_send_wr *= 4;
>>>
>>> Seems like a bad design if there is a possibility of work queue
>>> overflow; if you're counting on events occurring in a particular order
>>> or completions being handled "fast enough", then your design is going
>>> to fail in some high load situations, which I don't think you want.
>>
>> Vu,
>>
>> Would you please try the following:
>>
>> - Set the multiplier to 5
> While trying to test this between a Linux client and a Solaris server,
> I made the following changes in:
> /usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c
>
> diff verbs.c.org verbs.c
> 653c653
> < ep->rep_attr.cap.max_send_wr *= 3;
> ---
> > ep->rep_attr.cap.max_send_wr *= 8;
> 685c685
> < ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /* - 1*/;
> ---
> > ep->rep_cqinit = ep->rep_attr.cap.max
>
> (I bumped it to 8)
>
> Ran make install.
> After a reboot I see the errors on NFS READs, as opposed to the WRITEs
> seen before, when I try to read a 10G file from the server.
>
> The client is running RHEL 5.3 (2.6.18-128.el5PAE) with
> OFED-1.5.1-20100223-0740 bits. The client has a Sun IB
> HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0.
> The server is running Solaris based on snv_128.
>
> rpcdebug output from the client:
>
> ==
> RPC: 85 call_bind (status 0)
> RPC: 85 call_connect xprt ec78d800 is connected
> RPC: 85 call_transmit (status 0)
> RPC: 85 xprt_prepare_transmit
> RPC: 85 xprt_cwnd_limited cong = 0 cwnd = 8192
> RPC: 85 rpc_xdr_encode (status 0)
> RPC: 85 marshaling UNIX cred eddb4dc0
> RPC: 85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
> RPC: 85 xprt_transmit(164)
> RPC: rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 hdrlen 164
> RPC: rpcrdma_register_frmr_external: Using frmr ec7da920 to map 4 segments
> RPC: rpcrdma_create_chunks: write chunk elem 16384 at 0x38536d000:0xa601 (more)
> RPC: rpcrdma_register_frmr_external: Using frmr ec7da960 to map 1 segments
> RPC: rpcrdma_create_chunks: write chunk elem 108 at 0x31dd153c:0xaa01 (last)
> RPC: rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 padlen 0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
> RPC: 85 xmit complete
> RPC: 85 sleep_on(queue "xprt_pending" time 4683109)
> RPC: 85 added to queue ec78d994 "xprt_pending"
> RPC: 85 setting alarm for 60000 ms
> RPC: wake_up_next(ec78d944 "xprt_resend")
> RPC: wake_up_next(ec78d8f4 "xprt_sending")
> RPC: rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep ec78db40
> RPC: 85 __rpc_wake_up_task (now 4683110)
> RPC: 85 disabling timer
> RPC: 85 removed from queue ec78d994 "xprt_pending"
> RPC: __rpc_wake_up_task done
> RPC: 85 __rpc_execute flags=0x1
> RPC: 85 call_status (status -107)
> RPC: 85 call_bind (status 0)
> RPC: 85 call_connect xprt ec78d800 is not connected
> RPC: 85 xprt_connect xprt ec78d800 is not connected
> RPC: 85 sleep_on(queue "xprt_pending" time 4683110)
> RPC: 85 added to queue ec78d994 "xprt_pending"
> RPC: 85 setting alarm for 60000 ms
> RPC: rpcrdma_event_process: event rep ec116800 status 5 opcode 80 length 2493606
> RPC: rpcrdma_event_process: recv WC status 5, connection lost
> RPC: rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 0xec78db40 event 0xa)
> RPC: rpcrdma_conn_upcall: disconnected
> rpcrdma: connection to ec78dbccI4:20049 closed (-103)
> RPC: xprt_rdma_connect_worker: reconnect
> ==
>
> On the server I see:
>
> Mar 3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
> Mar 3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply
> Mar 3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
> Mar 3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply
>
> The remote access error is actually seen on RDMA_WRITE.
> Doing some more debugging on the server with DTrace, I see that
> the destination address and length match the write chunk
> element in the Linux debug output above.
>
>
> 0 9385 rib_write:entry daddr 38536d000, len 4000, hdl a601
> 0 9358 rib_init_sendwait:return ffffff44a715d308
> 1 9296 rib_svc_scq_handler:return 1f7
> 1 9356 rib_sendwait:return 14
> 1 9386 rib_write:return 14
>
> ^^^ that is RDMA_FAILED in
> 1 63295 xdrrdma_send_read_data:return 0
> 1 5969 xdr_READ3res:return
> 1 5969 xdr_READ3res:return 0
>
> Is this a variation of the previously discussed issue or something new?
>
I think this is new. This seems to be some kind of base/bounds or access
violation or perhaps an invalid rkey.
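
A minimal sketch of the cross-check Mahesh describes, comparing the first
write chunk element from the client log with the rib_write:entry arguments
from DTrace. The variable and function names here are hypothetical, and it
assumes the DTrace daddr, len and hdl values are printed in hexadecimal
without a 0x prefix:

```python
# Values copied from the client rpcdebug output (first write chunk element):
#   "write chunk elem 16384 at 0x38536d000:0xa601 (more)"
client_chunk = {"length": 16384, "addr": 0x38536D000, "handle": 0xA601}

# Values copied from the server DTrace output for rib_write:entry.
# Assumption: DTrace printed these in hexadecimal without a 0x prefix.
server_write = {
    "addr": int("38536d000", 16),
    "length": int("4000", 16),
    "handle": int("a601", 16),
}


def chunk_matches(client, server):
    """Return True when address, length, and handle all agree."""
    return all(client[key] == server[key] for key in ("addr", "length", "handle"))


print(chunk_matches(client_chunk, server_write))  # prints True
```

Under that assumption the three values agree (0x4000 is 16384), so the
server appears to be targeting the region the client advertised. That
would point at the registration itself (rkey or access rights) rather
than at the address or length arithmetic.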
> Thanks,
> Mahesh
>
>> - Set the number of buffer credits small as follows "echo 4 >
>> /proc/sys/sunrpc/rdma_slot_table_entries"
>> - Rerun your test and see if you can reproduce the problem?
>>
>> I did the above and was unable to reproduce, but I would like to see
>> if you can, to convince ourselves that 5 is the right number.
>>
>> Thanks,
>> Tom
>>
>>> - R.
>>>
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html