[ofa-general] RDMA Write Error

Dotan Barak dotanba at gmail.com
Wed Aug 6 07:41:55 PDT 2008


Philip Frey1 wrote:
> Hi,
>
> I am trying to figure out how efficient MR registration followed by an 
> RDMA write is.
> To measure this, I am running the following loop:
>
> // create MR of size 64KB
>
> for (i = 0; i < max_writes; i++) {
>
>     // destroy old MR
>
>     // create MR of size 64KB
>
>     // RDMA write from new MR to some remote buffer
>
> }
>
>
> At some point (varying) I get the following error:
>
> iwch_ev_dispatch - CQE Err qpid 0x3d00 opcode 0 status 0x1 type 1 
> wrid.hi 0xb3 wrid.lo 0x0
> post_qp_event - AE qpid 0x3d00 opcode 0 status 0x1 type 1 wrid.hi 0xb3 
> wrid.lo 0x0
>
> ...which basically tells me that the egress (type 1) RDMA write 
> (opcode 0) has failed due to an invalid STag
> (status 0x1 = STag invalid: the STag is off-limits, is 0, or has a 
> STAG_key mismatch).
>
> The error occurs at ibv_post_send().
>
> Here is a trace of the WRs posted shortly before the 'crash':
>
> wr_id=178
> loc_addr=0x2aaaab64f010
> loc_len=65536
> lkey=4552191
> num_sge=1
> rem_addr=0x2aaaab5d0010
> rkey=1459967
>
> wr_id=179
> loc_addr=0x2aaaab65f010
> loc_len=65536
> lkey=4555263
> num_sge=1
> rem_addr=0x2aaaab5e0010
> rkey=1459967
>
> ASYNC_EVENT: [QP] Local access violation error
> wr_id=180
> loc_addr=0x2aaaab66f010
> loc_len=65536
> lkey=4555519
> num_sge=1
> rem_addr=0x2aaaab5f0010
> rkey=1459967
> ERROR: [rdma_write] failed to post rdma write wr
> ERROR:  rdma write (180/1000) failed
>
>
> Do you have any idea what could be happening here? I noticed that if I 
> do signaled writes and wait for each individual completion, this does 
> not happen. It is also not an issue when posting RDMA writes of size 
> 32KB. When using 64KB or larger this happens... but why? I assume that 
> as soon as ibv_reg_mr() returns I am free to use the MR, right?
Yes.

Do you post the RDMA Write and wait for its completion BEFORE 
deregistering the MR that it references?

Dotan
>
> Many thanks for your advice,
>  Phil
> ------------------------------------------------------------------------
>
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
