[openib-general] [PATCH] CQ rearm race in IPoIB

ralphc at pathscale.com
Fri Mar 17 14:50:44 PST 2006


I'm trying to fix a problem I'm seeing with IPoIB somehow delaying
processing of packets and sometimes timing out for fragment
reassembly.  I was looking at this bit of code and remembering
that the IB spec says the rearm isn't reliable unless you
call poll afterwards and verify that no new entries have been
added.  I haven't verified that this fixes my problem; the
change comes only from visual inspection against the spec as I
was reading the code.
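
To make the pattern I have in mind concrete, here is a minimal
sketch of "drain, rearm, then poll again".  It does not use the
real verbs API: struct mock_cq, mock_poll_cq(), mock_req_notify_cq()
and drain_and_rearm() are all hypothetical names, and the "late"
field is only an assumption used to simulate completions that land
just before the rearm takes effect.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a completion queue; not the real ib_cq. */
struct mock_cq {
	int  pending;	/* completions waiting to be polled */
	bool armed;	/* set once a notification has been requested */
	int  late;	/* completions that slip in just before the rearm */
};

/* Remove up to max entries and return how many were taken. */
static int mock_poll_cq(struct mock_cq *cq, int max)
{
	int n = cq->pending < max ? cq->pending : max;

	cq->pending -= n;
	return n;
}

/*
 * Request a notification.  To simulate the window being discussed,
 * any "late" completions are queued just before arming, so this
 * mock assumes no event will ever be raised for them.
 */
static void mock_req_notify_cq(struct mock_cq *cq)
{
	cq->pending += cq->late;
	cq->late = 0;
	cq->armed = true;
}

/*
 * Drain the queue, rearm, then poll once more.  If the extra poll
 * finds entries, handle them and go around again, so the loop only
 * exits after a rearm that was followed by an empty poll.
 * Returns the total number of completions handled.
 */
static int drain_and_rearm(struct mock_cq *cq, int batch)
{
	int total = 0;
	int n;

	for (;;) {
		while ((n = mock_poll_cq(cq, batch)) > 0)
			total += n;

		mock_req_notify_cq(cq);

		n = mock_poll_cq(cq, batch);
		if (n == 0)
			break;
		total += n;
	}
	return total;
}
```

With 5 entries pending, 3 arriving late and a batch size of 4, the
loop handles all 8 and leaves the queue armed and empty.  The cost
is at least one extra poll per pass; whether that is actually needed
for IPoIB is exactly the question below.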


>     ralphc> There is a race in IPoIB where it rearms the CQ by calling
>     ralphc> ib_req_notify_cq() followed by ib_poll_cq().  The loop can
>     ralphc> call ib_poll_cq() multiple times if new packets arrive but
>     ralphc> then fail to rearm the CQ properly.
>
> I don't understand the race you're trying to fix.  Right now, the CQ
> polling code is:
>
> 	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
> 	do {
> 		n = ib_poll_cq(cq, IPOIB_NUM_WC, priv->ibwc);
> 		for (i = 0; i < n; ++i)
> 			ipoib_ib_handle_wc(dev, priv->ibwc + i);
> 	} while (n == IPOIB_NUM_WC);
>
> The call to ib_req_notify_cq() will result in an event if any entries
> are added to the CQ after the call, so I don't see any way that IPoIB
> can miss a CQ event.  New entries can indeed arrive and be polled
> during the do { } while() loop, but the effect of that is just an
> extra CQ event, which might lead to an extra interrupt and polling an
> empty CQ once.
>
> Can you give a little more detail about the problem you're fixing?
>
>  - R.
>




