[openib-general] ibv_poll_cq

john t johnt1johnt2 at gmail.com
Wed Aug 30 23:26:01 PDT 2006


Hi Dotan

Is there a way to know whether the two QPs (local and remote) are in sync, or
to wait until they are in sync before doing the data transfer?

I think in my case one QP sends the message while the other end (the
receiver) is not yet in the RTR state. Since the sender and receiver are
implemented as threads, maybe the receiver thread on the other machine is
being scheduled very late.

Is there a way to specify an infinite retry_count/timeout, or to find out
whether the remote QP is in the RTR state (or the error state), and only then
do the actual data transfer?

Regards,
John

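John's question about waiting until the remote QP is ready is usually handled with an out-of-band handshake, which is likely what "sync between the 2 sides" in the quoted reply below refers to. The sketch below assumes the two hosts already share a connected TCP (or other stream) socket; that socket, the `struct conn_info` layout, and the helpers `write_all`, `read_all`, `exchange_info` and `oob_barrier` are all hypothetical names for illustration, not part of libibverbs. Each side exchanges its QP number, starting PSN and LID, brings its QP up, posts its receive buffers, and only then enters a one-byte barrier, so neither side posts a send before the peer can accept it.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h> /* fd is assumed to be a connected SOCK_STREAM socket */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical out-of-band connection info. Sent as a raw struct, which
   assumes both hosts share byte order and struct layout. */
struct conn_info {
    uint32_t qpn; /* local QP number */
    uint32_t psn; /* our starting sq_psn; the peer uses it as its rq_psn */
    uint16_t lid; /* local port LID */
};

/* Write exactly len bytes, retrying on short writes. Returns 0 on success. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; fails on error or if the peer closes early. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send our info, then receive the peer's. Both sides may write first
   because the small message fits in the socket buffer. */
static int exchange_info(int fd, const struct conn_info *local,
                         struct conn_info *remote)
{
    if (write_all(fd, local, sizeof *local) != 0)
        return -1;
    return read_all(fd, remote, sizeof *remote);
}

/* Barrier: send one "ready" byte and block until the peer's arrives.
   Call it only after the local QP is in RTR (or RTS) and its receive
   buffers are posted. */
static int oob_barrier(int fd)
{
    char mine = 'R', theirs = 0;
    if (write_all(fd, &mine, 1) != 0 || read_all(fd, &theirs, 1) != 0)
        return -1;
    return theirs == 'R' ? 0 : -1;
}
```

A plausible order on each host: create the QP, call `exchange_info`, use `ibv_modify_qp` to move to INIT and then RTR with the peer's QP number, PSN and LID, post the receive work requests, move to RTS, call `oob_barrier`, and only then post sends. Calling `oob_barrier` again before `ibv_destroy_qp` would cover the "sync before stopping" advice as well.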

On 8/30/06, Dotan Barak <dotanb at dev.mellanox.co.il> wrote:
>
> Hi.
>
> john t wrote:
> > Hi,
> >
> > In one of my multi-threaded applications (a simple send/recv application
> > written using uverbs), I am repeatedly getting error code 12
> > (IB_WC_RETRY_EXC_ERR) from "ibv_poll_cq". I am not able to figure out
> > what is going wrong. Can someone please give a suggestion so that I
> > can investigate along those lines?
> >
> > Also, is there an error-handling mechanism in IB? For example, in the
> > above case, what should I do to correct the problem?
> This completion status means that the remote side of the QP did not send
> any response (ACK, NAK, or anything else) before the retries ran out.
> You can get this completion in any of the following scenarios:
> * a QP tries to send a message to a remote QP that is not ready (not yet
> in at least the RTR state)
> * a QP tries to send a message to a remote QP that is being closed (or is
> in the error state)
> * the QP parameters do not match the remote QP parameters (for example,
> if the PSNs are not configured with matching values, the messages may be
> silently dropped)
>
>
> I suggest that you:
>     synchronize the two sides before you start working with the QPs
>     synchronize the two sides before you stop working with the QPs
>     increase the retry_cnt / timeout attributes in the QP context
>
> You should also make sure that the timeout value is not 0 (zero).
>
>
> Dotan
>
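On the retry_cnt / timeout suggestion and the question about an infinite retry count: in libibverbs these are the `timeout`, `retry_cnt` and `rnr_retry` fields of `struct ibv_qp_attr`, applied with `ibv_modify_qp` (mask bits `IBV_QP_TIMEOUT`, `IBV_QP_RETRY_CNT`, `IBV_QP_RNR_RETRY`) when the QP moves to RTS. `timeout` is a 5-bit exponent with a nominal wait of 4.096 µs × 2^timeout, and `retry_cnt` is a 3-bit field, so 7 retries is the maximum and there is no "infinite" setting for it. Only `rnr_retry` = 7 means "retry indefinitely", and that governs RNR NAKs (the receiver had no posted buffer), not the missing-response case behind IBV_WC_RETRY_EXC_ERR. The helpers below (`ack_timeout_usec`, `approx_retry_window_usec`, `retry_attrs_ok`) are hypothetical; they are a rough sketch for estimating how long a sender keeps trying, and real hardware may wait longer than the nominal timeout.

```c
#include <stdint.h>

/* Nominal local ACK timeout for a 5-bit exponent: 4.096 us * 2^timeout.
   Returns -1.0 for values that do not fit in the 5-bit field. */
static double ack_timeout_usec(uint8_t timeout)
{
    if (timeout > 31)
        return -1.0;
    return 4.096 * (double)(1ULL << timeout);
}

/* Rough estimate of how long an unanswered send keeps being retried
   before IBV_WC_RETRY_EXC_ERR: the first attempt plus retry_cnt
   retransmissions, each waiting one nominal timeout period. */
static double approx_retry_window_usec(uint8_t timeout, uint8_t retry_cnt)
{
    double t = ack_timeout_usec(timeout);
    if (t < 0.0 || retry_cnt > 7)
        return -1.0;
    return t * (double)(retry_cnt + 1);
}

/* Check values before storing them in struct ibv_qp_attr. Rejects
   timeout == 0, following the advice in this thread. */
static int retry_attrs_ok(uint8_t timeout, uint8_t retry_cnt)
{
    return timeout >= 1 && timeout <= 31 && retry_cnt <= 7;
}
```

For example, timeout = 14 and retry_cnt = 7 give a nominal 67.1 ms per attempt and roughly 0.54 s in total before the error is reported. That may hide a peer thread that is scheduled late, but it only widens the window; synchronizing the two sides is likely the more reliable fix.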