[ofa-general] missed cq event
Steve Wise
swise at opengridcomputing.com
Tue Jun 10 08:21:40 PDT 2008
Philip Frey1 wrote:
>
> Steve, thanks for your advice.
>
> Is it possible that there is a bug in OFED 1.3 with regard to
> non-signaled send work requests?
> I noticed that when I post send work requests onto my send queue, it
> eventually fills up until I
> cannot post sends anymore.
> This happens with the Chelsio T3 RNIC and OFED 1.3 whenever I post
> send WR's that have
> their flags set to 0. It does not happen though when I post sends with
> IBV_SEND_SIGNALED.
> The CQ is empty in the case of non-signaled WR's (as expected) but
> they somehow seem to
> be stuck on the send queue.
>
> I use the following code:
>     static struct ibv_send_wr tx_wr, *bad_wr;
>
>     /* create send work request */
>     tx_wr.wr_id      = tx_wr_id++;
>     tx_wr.next       = NULL;
>     tx_wr.sg_list    = sg_list;
>     tx_wr.num_sge    = num_sge;
>     tx_wr.opcode     = IBV_WR_SEND;
>     tx_wr.send_flags = 0;
>
>     /* post send work request */
>     ret = ibv_post_send(qp, &tx_wr, &bad_wr);
>     if (ret) {
>         // error
>     }
>
> I learned that it might be necessary to post a signaled send WR after
> posting a number of non-signaled
> ones in order to clean up the SQ. Is that the case and is there no way
> to post non-signaled WR's that
> do not get stuck on the SQ?
>
Yes, that is the case.
You must post at least one signaled WR every SQ-depth's worth of posts
to force the SQ to be cleaned up. The driver cannot tell whether an
unsignaled WR completed successfully until a subsequent signaled work
request completes. This boils down to a requirement that you post at
least one signaled WR before filling up the SQ.
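
A minimal sketch of one way to meet that requirement, assuming the
application knows the SQ depth it asked for at QP creation. The
signal_policy struct and its two functions are hypothetical helpers, not
part of libibverbs, and signaling halfway through the queue is an
arbitrary choice; any interval no larger than the SQ depth satisfies the
rule above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a libibverbs type): counts the sends posted
 * since the last signaled one. */
struct signal_policy {
    uint32_t interval;   /* request a signaled WR every `interval` posts */
    uint32_t unsignaled; /* posts counted since the last signaled WR */
};

static void signal_policy_init(struct signal_policy *p, uint32_t sq_depth)
{
    assert(sq_depth > 0);
    /* Assumption: signal at half the SQ depth so the queue still has room
     * while the signaled completion is outstanding. */
    p->interval = sq_depth > 1 ? sq_depth / 2 : 1;
    p->unsignaled = 0;
}

/* Returns true when the WR about to be posted should be signaled. */
static bool signal_policy_next(struct signal_policy *p)
{
    if (++p->unsignaled >= p->interval) {
        p->unsignaled = 0;
        return true;
    }
    return false;
}
```

In the posting code quoted above, this would replace the constant 0 with
something like
tx_wr.send_flags = signal_policy_next(&policy) ? IBV_SEND_SIGNALED : 0;
The signaled completion presumably still has to be reaped from the CQ
with ibv_poll_cq() before the driver can reclaim the SQ entries.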
Steve.