[ofa-general] IBV_WC_RETRY_EXC_ERR causes
Dotan Barak
dotanba at gmail.com
Fri Jun 20 03:49:23 PDT 2008
Krishnamoorthy, Sriram wrote:
>
> Can someone please explain what can cause IBV_WC_RETRY_EXC_ERR? I am
> using a combination of send-receive and RDMA. I have the reliable
> connection queue pairs initialized as:
>
IBV_WC_RETRY_EXC_ERR means that the receiver didn't send any ack within
4.096 usec * (2 to the power of 18) * 7, which is about 7.5 seconds.
It can happen for several reasons:
1) bad QP attributes
2) the remote side doesn't exist or it is in a bad state
3) rarely, congestion in the network can cause this too
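
The timeout arithmetic above can be sketched in C. This is only a minimal
sketch: it assumes the usual encoding in which one local ack timeout
interval lasts 4.096 usec * 2^timeout, and it follows the estimate above of
one interval per retry. The helper names ack_timeout_usec and
retry_window_usec are hypothetical and are not part of libibverbs; a real
HCA may also round the timeout, so treat the result as an approximation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a libibverbs call): length of one local ack
 * timeout interval in microseconds, i.e. 4.096 usec * 2^timeout. */
static double ack_timeout_usec(uint8_t timeout)
{
    /* timeout is a 5-bit field; 0 is excluded here on the assumption
     * that it means "wait forever" rather than a finite interval. */
    assert(timeout >= 1 && timeout <= 31);
    return 4.096 * (double)(UINT64_C(1) << timeout);
}

/* Hypothetical helper: the estimate used in the text, one timeout
 * interval per retry. */
static double retry_window_usec(uint8_t timeout, uint8_t retry_cnt)
{
    assert(retry_cnt <= 7); /* retry_cnt is a 3-bit field */
    return ack_timeout_usec(timeout) * retry_cnt;
}
```

With timeout = 18 and retry_cnt = 7, one interval is about 1.07 seconds
and the whole window is about 7.5 seconds.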
>
> qp_attr.timeout = 18;
> qp_attr.retry_cnt = 7;
> qp_attr.rnr_retry = 7;
>
> From the documentation, I assumed a value of 7 meant infinite retry.
> Can lack of receive buffers cause this error? I understand
> IBV_WC_RNR_RETRY_EXC_ERR to be the error caused by lack of receive
> buffers. Could it be congestion in the network?
>
7 means infinite retry only for the RNR flow (rnr_retry); for the retry
flow (retry_cnt), 7 is the number of retransmissions.
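
The difference between the two counters can be written down as a small C
sketch. The enum retry_flow and the function max_retries are hypothetical
names made up for illustration, not part of libibverbs; the sketch only
encodes the rule stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum retry_flow {
    RETRY_FLOW_TIMEOUT, /* governed by qp_attr.retry_cnt */
    RETRY_FLOW_RNR      /* governed by qp_attr.rnr_retry */
};

/* Returns the maximum number of retries for the given 3-bit value,
 * or -1 when the retries are unbounded. */
static int max_retries(enum retry_flow flow, uint8_t value)
{
    assert(value <= 7);
    if (flow == RETRY_FLOW_RNR && value == 7)
        return -1;      /* 7 means infinite only in the RNR flow */
    return value;       /* otherwise the value is a plain count */
}
```

So rnr_retry = 7 never gives up on a receiver that has no buffers posted,
while retry_cnt = 7 gives up after 7 retransmissions and reports
IBV_WC_RETRY_EXC_ERR.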
>
> I could not find much from earlier queries related to this error. It
> often occurs in the middle of the computation on large (>=1024)
> processor counts, when I try to have multiple outstanding send-recvs
> between a pair of processes (each pair of processes has a RC queue
> pair initialized). I do not have a small test case yet that can repeat
> this error.
>
How do you connect the two sides?
Maybe the sender sends messages to a QP that hasn't been transitioned to
(at least) the RTR state?
Dotan