[ofa-general] missed cq event

Steve Wise swise at opengridcomputing.com
Tue Jun 10 08:21:40 PDT 2008


Philip Frey1 wrote:
>
> Steve, thanks for your advice.
>
> Is it possible that there is a bug in OFED 1.3 with regard to 
> non-signaled send work requests?
> I noticed that when I post send work requests onto my send queue, it 
> eventually fills up until I
> cannot post sends anymore.
> This happens with the Chelsio T3 RNIC and OFED 1.3 whenever I post 
> send WR's that have
> their flags set to 0. It does not happen though when I post sends with 
> IBV_SEND_SIGNALED.
> The CQ is empty in the case of non-signaled WR's (as expected) but 
> they somehow seem to
> be stuck on the send queue.
>
> I use the following code:
> static struct ibv_send_wr tx_wr, *bad_wr;
>
> /* create send work request */
> tx_wr.wr_id = tx_wr_id++;
> tx_wr.next = NULL;
> tx_wr.sg_list = sg_list;
> tx_wr.num_sge = num_sge;
> tx_wr.opcode = IBV_WR_SEND;
> tx_wr.send_flags = 0;
>        
> /* post send work request */
> ret = ibv_post_send(qp, &tx_wr, &bad_wr);
> if (ret) {
>         /* error: ibv_post_send() returns an errno value on failure */
> }
>
> I learned that it might be necessary to post a signaled send WR after 
> posting a number of non-signaled
> ones in order to clean up the SQ. Is that the case and is there no way 
> to post non-signaled WR's that
> do not get stuck on the SQ?
>

That is the case.

You must post at least one signaled WR within every SQ-depth's worth of 
posts to force the SQ to be cleaned up.  From the driver's perspective, it 
cannot tell whether an unsignaled WR completed successfully until a 
subsequent signaled work request completes, so the SQ slots held by 
unsignaled WRs are not reclaimed before then.  In other words, you must 
post at least one signaled WR before the SQ fills up.
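
As a rough illustration of that rule (a sketch only, not code from the 
driver or from libibverbs): one way to guarantee a signaled WR before the 
SQ fills up is to count posts and signal every Nth one, with N no larger 
than the SQ depth.  The names signal_policy and next_wr_is_signaled below 
are hypothetical helpers invented for this example.

```c
#include <stdint.h>

/* Hypothetical bookkeeping, not part of libibverbs. */
struct signal_policy {
    uint32_t interval;   /* signal every 'interval' posts; keep <= SQ depth */
    uint32_t unsignaled; /* unsignaled WRs posted since the last signaled one */
};

/*
 * Returns 1 if the next send WR should be posted with IBV_SEND_SIGNALED,
 * 0 if it can be posted unsignaled.  Every 'interval'-th call returns 1.
 */
static int next_wr_is_signaled(struct signal_policy *p)
{
    if (p->unsignaled + 1 >= p->interval) {
        p->unsignaled = 0;   /* this WR is signaled; start counting again */
        return 1;
    }
    p->unsignaled++;
    return 0;
}
```

In the posting code quoted above this would presumably be used as 
tx_wr.send_flags = next_wr_is_signaled(&policy) ? IBV_SEND_SIGNALED : 0; 
with policy.interval chosen at or below the max_send_wr requested when the 
QP was created.  The completion of each signaled WR still has to be reaped 
from the CQ with ibv_poll_cq(), since that is typically when the provider 
releases the SQ entries.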


Steve.



