[ofa-general] IBV_WC_RETRY_EXC_ERR causes

Dotan Barak dotanba at gmail.com
Fri Jun 20 03:49:23 PDT 2008


Krishnamoorthy, Sriram wrote:
>
> Can someone please explain what can cause IBV_WC_RETRY_EXC_ERR? I am 
> using a combination of send-receive and RDMA. I have the reliable 
> connection queue pairs initialized as:
>
IBV_WC_RETRY_EXC_ERR means that the receiver did not send any ACK within 
4.096 * (2 to the power of 18) * 7 usec, i.e. 4.096 usec * 2^timeout, 
multiplied by retry_cnt (about 7.5 seconds with your settings).
It can happen for several reasons:
1) bad QP attributes
2) the remote side does not exist or is in a bad state
3) rarely, congestion in the network can cause this too
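
The arithmetic above can be sketched in C as follows. This is only a rough 
illustration of the formula as stated in this reply; the function names 
ack_timeout_usec and retry_exhausted_usec are made up for the example and 
are not part of libibverbs.

```c
#include <assert.h>
#include <stdint.h>

/* Per-attempt local ACK timeout in microseconds: 4.096 * 2^timeout.
 * Assumption: timeout is the 5-bit value from qp_attr.timeout. The value 0
 * is excluded here because it does not select a finite interval. */
double ack_timeout_usec(uint8_t timeout)
{
    assert(timeout >= 1 && timeout <= 31);
    return 4.096 * (double)(1ULL << timeout);
}

/* Total wait before the retry counter is exhausted, estimated the same
 * way as in the text: per-attempt timeout multiplied by retry_cnt. */
double retry_exhausted_usec(uint8_t timeout, uint8_t retry_cnt)
{
    assert(retry_cnt <= 7);
    return ack_timeout_usec(timeout) * (double)retry_cnt;
}
```

With timeout = 18 and retry_cnt = 7, retry_exhausted_usec(18, 7) comes out 
at roughly 7.5 million usec, i.e. about 7.5 seconds.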
>
>     qp_attr.timeout             = 18;
>     qp_attr.retry_cnt           = 7;
>     qp_attr.rnr_retry           = 7;
>
> From the documentation, I assumed a value of 7 meant infinite retry. 
> Can lack of receive buffers cause this error? I understand 
> IBV_WC_RNR_RETRY_EXC_ERR to be the error caused by lack of receive 
> buffers. Could it be congestion in the network?
>
A value of 7 means infinite retry only for the RNR flow (rnr_retry); for 
the retry flow (retry_cnt), 7 is the actual number of retransmissions.
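
To make the difference between the two fields explicit, here is a small C 
sketch. The helper names are hypothetical and exist only for this example; 
they just encode the rule stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Both fields are 3 bits wide, so valid values are 0..7. */

/* rnr_retry: 7 is a special value meaning "retry forever". */
bool rnr_retry_is_infinite(uint8_t rnr_retry)
{
    assert(rnr_retry <= 7);
    return rnr_retry == 7;
}

/* retry_cnt: there is no special value; 7 is simply the largest
 * finite number of retransmissions. */
bool retry_cnt_is_infinite(uint8_t retry_cnt)
{
    assert(retry_cnt <= 7);
    (void)retry_cnt;
    return false;
}
```

So with rnr_retry = 7 and retry_cnt = 7, a missing receive buffer is retried 
indefinitely, while a missing ACK still ends in IBV_WC_RETRY_EXC_ERR once 
the retransmissions are used up.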
>
> I could not find much from earlier queries related to this error.  It 
> often occurs in the middle of the computation on large (>=1024) 
> processor counts, when I try to have multiple outstanding send-recvs 
> between a pair of processes (each pair of processes has a RC queue 
> pair initialized). I do not have a small test case yet that can repeat 
> this error.
>

How do you connect the two sides?
Maybe the sender posts messages to a QP that has not yet been moved to 
(at least) the RTR state?

Dotan



