[openib-general] RNR_RETRY_EXC_ERR and completion opcode in "send_lat"
Sayantan Sur
surs at cse.ohio-state.edu
Sat Dec 2 13:34:56 PST 2006
Hi,
I have a question about the "status" field for a completion which is due
to RNR retry exceeded error. I trivially modified the `send_lat' program
(from the Gen2 perftest directory) to use SRQ and not post receives
after some specified time. Since the QP's "rnr_retry" attribute is not
set to 7 (infinite retry), I expect the sender to get an error
completion with status IBV_WC_RNR_RETRY_EXC_ERR.
So far so good ... however, the completion I pull out of the send_cq
lists its opcode as IBV_WC_RECV! Is this expected?
I am using OFED 1.1 on dual Intel Xeon machines with two-port Mellanox
DDR HCAs in MemFree mode. The distribution is RH AS4 (Nahant Update 3)
with kernel version 2.6.17.7.
If someone could explain this behavior, or suggest a workaround, it'd be
great.
TIA,
Sayantan.
=======
<--Print out at client-->
Send Completion wth error at client:
wc.status 13, IBV_WC_RNR_RETRY_EXC_ERR 13, wc.opcode 128
Failed status 13: wr_id 1
scnt=26, rcnt=25, ccnt=0
<--Print out-->
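To make the numbers in the printout easier to read, here is a minimal sketch that maps them back to symbolic names. It does not include <infiniband/verbs.h>; the `SKETCH_*` constants and the `sketch_*` helpers are hypothetical names used only for illustration. The value 13 is taken from the printout itself, and 128 is assumed to be the receive-side opcode flag (1 << 7), which is what IBV_WC_RECV is defined as in the verbs header.

```c
#include <stdio.h>
#include <stddef.h>

/* Local stand-ins for two libibverbs constants (assumed values, see above). */
enum {
    SKETCH_WC_RNR_RETRY_EXC_ERR = 13,      /* wc.status seen in the printout */
    SKETCH_WC_RECV              = 1 << 7   /* wc.opcode seen in the printout */
};

/* Name the one status value this sketch knows about. */
static const char *sketch_status_name(int status)
{
    return status == SKETCH_WC_RNR_RETRY_EXC_ERR
        ? "IBV_WC_RNR_RETRY_EXC_ERR" : "other";
}

/* Name the one opcode value this sketch knows about. */
static const char *sketch_opcode_name(int opcode)
{
    return opcode == SKETCH_WC_RECV ? "IBV_WC_RECV" : "other";
}

/* Format both fields into buf; returns snprintf's result. */
static int sketch_describe(char *buf, size_t len, int status, int opcode)
{
    return snprintf(buf, len, "status %d (%s), opcode %d (%s)",
                    status, sketch_status_name(status),
                    opcode, sketch_opcode_name(opcode));
}
```

Calling `sketch_describe(buf, sizeof buf, 13, 128)` fills `buf` with `status 13 (IBV_WC_RNR_RETRY_EXC_ERR), opcode 128 (IBV_WC_RECV)`, which matches the values reported at the client.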
<--Poll CQ snippet-->
/* poll on scq */
do {
        ne = ibv_poll_cq(ctx->scq, 1, &wc);
} while (!user_param->use_event && ne < 1);

if (ne < 0) {
        fprintf(stderr, "poll SCQ failed %d\n", ne);
        return 12;
}
if (wc.status != IBV_WC_SUCCESS) {
        fprintf(stderr, "Send Completion wth error at %s:\n",
                user_param->servername ? "client" : "server");
        fprintf(stderr, "wc.status %d, IBV_WC_RNR_RETRY_EXC_ERR %d, wc.opcode %d\n",
                wc.status, IBV_WC_RNR_RETRY_EXC_ERR, wc.opcode);
        fprintf(stderr, "Failed status %d: wr_id %d\n",
                wc.status, (int) wc.wr_id);
        fprintf(stderr, "scnt=%d, rcnt=%d, ccnt=%d\n",
                scnt, rcnt, ccnt);
{
...
<--Poll CQ snippet-->
--
http://www.cse.ohio-state.edu/~surs