[ofa-general] IBV_WC_RETRY_EXC_ERR causes
Dotan Barak
dotanba at gmail.com
Sat Jun 21 06:23:33 PDT 2008
Krishnamoorthy, Sriram wrote:
>> IBV_WC_RETRY_EXC_ERR means that there wasn't any ack by the receiver
>> after 4.096*(2 power 18) * 7 usec.
>>
> Does an ack from the receiver require the process/thread to be awake? I
> have been trying to get a small test case, and sleeping without posting
> enough recv-s seems to occasionally result in IBV_WC_RETRY_EXC_ERR
> (instead of IBV_WC_RNR_RETRY_EXC_ERR which occurs a lot more often, and
> of course with much smaller timeout, retry_count, and rnr_retry_count).
>
No, the ack is handled at the HCA level (unless the IB transport is
implemented in SW...).
If putting a sleep in your code causes IBV_WC_RETRY_EXC_ERR, I would
suspect SW bugs ...
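For reference, the figure quoted at the top, 4.096*(2 power 18) * 7 usec,
can be worked out with a few lines of Python. This is only a sketch of that
arithmetic: the function names are made up for illustration and are not part
of the verbs API, and it assumes the QP was configured with timeout = 18 and
retry_cnt = 7, as in the quoted formula. Real HCAs may wait somewhat longer
than the nominal value.

```python
def local_ack_timeout_usec(timeout: int) -> float:
    """Nominal wait for an ACK on one attempt: 4.096 usec * 2**timeout.

    'timeout' is the 5-bit QP attribute. The value 0 is excluded here
    because verbs treats it as "wait forever" rather than as 4.096 usec.
    """
    if not 1 <= timeout <= 31:
        raise ValueError("timeout must be in 1..31 (0 means infinite)")
    return 4.096 * (2 ** timeout)


def retry_exhausted_after_usec(timeout: int, retry_cnt: int) -> float:
    """Approximate time before IBV_WC_RETRY_EXC_ERR, using the formula
    quoted in this thread: per-attempt timeout times retry_cnt."""
    if not 0 <= retry_cnt <= 7:
        raise ValueError("retry_cnt is a 3-bit value (0..7)")
    return local_ack_timeout_usec(timeout) * retry_cnt


# Hypothetical values matching the quoted formula: timeout=18, retry_cnt=7.
total_usec = retry_exhausted_after_usec(18, 7)
print(f"{total_usec / 1e6:.2f} s")  # prints 7.52 s
```

So with these settings the sender gives up after roughly 7.5 seconds
without an ack, which is far longer than a short sleep in the receiver.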
>> It can happen for several reasons:
>> 1) bad QP attributes
>> 2) the remote side doesn't exist or is in a bad state
>> 3) rare, but congestion in the network can cause this too
>>
>
>
>> 7 means infinite retry only for the RNR flow; for the retry flow, 7 is
>> the number of retransmissions.
>>
>
>
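To make the difference between the two counters concrete, here is a small
Python sketch. The helper names are hypothetical and only encode the rule
stated above: both attributes are 3-bit values, but only rnr_retry treats
7 as "retry forever".

```python
def describe_retry_cnt(retry_cnt: int) -> str:
    """retry_cnt: number of retransmissions when no ack arrives in time."""
    if not 0 <= retry_cnt <= 7:
        raise ValueError("retry_cnt is a 3-bit value (0..7)")
    # 7 is an ordinary count here, not a special value.
    return f"{retry_cnt} retransmissions, then IBV_WC_RETRY_EXC_ERR"


def describe_rnr_retry(rnr_retry: int) -> str:
    """rnr_retry: number of retries after the receiver answers RNR NAK."""
    if not 0 <= rnr_retry <= 7:
        raise ValueError("rnr_retry is a 3-bit value (0..7)")
    if rnr_retry == 7:
        # Special case: the sender never gives up on RNR NAKs.
        return "retry forever on RNR NAK"
    return f"{rnr_retry} RNR retries, then IBV_WC_RNR_RETRY_EXC_ERR"


print(describe_retry_cnt(7))
print(describe_rnr_retry(7))
```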
>> How do you connect the two sides?
>> Maybe the sender sends messages to a QP that wasn't transitioned to
>> (at least) the RTR state?
>>
>
> All queue pairs are transitioned into RTS state before any
> communication. All queue pairs are transitioned to RTR state, then there
> is an MPI barrier (which could be using its own queue pairs or sockets),
> and then all queue pairs are transitioned into RTS state.
>
Good, I was afraid of a race where one side starts to send messages
while the other side isn't yet in RTR state.
> All error messages out of verbs API are checked. Is it possible for a
> queue pair to transition into an error state and it is identified first
> as an IBV_WC_RETRY_EXC_ERR and not as a local error?
>
Theoretically: no.
Is this the first message being passed on those QPs?
Can you check the QP state of the remote side when you get such an error?
> Thanks,
> Sriram.K
>
You are welcome
Dotan