[ofa-general] IPoIB-UD post_send failures (OFED 1.3)
akepner at sgi.com
Sat May 10 12:07:21 PDT 2008
On Thu, May 08, 2008 at 10:50:11AM -0700, Roland Dreier wrote:
> ...
> It might be useful to track the value of tx_outstanding... from a quick
> look at the code I can't see how the transmit queue could be awake when
> the UD send queue is full.
>
I haven't been able to get any new debug data (the only way we
know to reproduce this one is to use a pretty large system - a
scarce resource), but it does look like there's a hole here,
since ipoib_cm.c:ipoib_cm_send() and ipoib_ib.c:ipoib_send()
test for different values (off by one) to detect a full
queue.
ipoib_cm.c:ipoib_cm_send() does:

        if (++priv->tx_outstanding == ipoib_sendq_size)
                netif_stop_queue(dev);

but ipoib_ib.c:ipoib_send() does:

        if (++priv->tx_outstanding == (ipoib_sendq_size - 1)) {
                netif_stop_queue(dev);

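So a call to ipoib_cm_send() with tx_outstanding = (ipoib_sendq_size - 2)
bumps it to (ipoib_sendq_size - 1) without stopping the queue, and a
following call to ipoib_send() bumps it to ipoib_sendq_size, stepping
past the (ipoib_sendq_size - 1) value it tests for. That would get to
a situation where the queue was full, but not stopped.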
I'm not saying this is what's happening for us (just dunno yet)
but it looks possible.
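
To make the sequence concrete, here is a small userspace sketch of
the two checks. It is not driver code: struct model, model_cm_send(),
model_ud_send() and hole_reached() are hypothetical names made up for
illustration, and the only thing taken from the driver is the pair of
comparisons quoted above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant driver state; this is NOT
 * the real ipoib private structure. */
struct model {
	unsigned int tx_outstanding;	/* sends posted but not completed */
	unsigned int sendq_size;	/* plays the role of ipoib_sendq_size */
	bool queue_stopped;		/* set where netif_stop_queue() would run */
};

/* Mirrors the comparison quoted from ipoib_cm_send(). */
static void model_cm_send(struct model *m)
{
	if (++m->tx_outstanding == m->sendq_size)
		m->queue_stopped = true;
}

/* Mirrors the comparison quoted from ipoib_send(). */
static void model_ud_send(struct model *m)
{
	if (++m->tx_outstanding == (m->sendq_size - 1))
		m->queue_stopped = true;
}

/* Starts two slots short of full, does one CM send followed by one
 * UD send, and reports whether the queue ended up full but not
 * stopped. Assumes sendq_size >= 2. */
static bool hole_reached(unsigned int sendq_size)
{
	struct model m = { sendq_size - 2, sendq_size, false };

	model_cm_send(&m);	/* now sendq_size - 1, CM check does not fire */
	model_ud_send(&m);	/* now sendq_size, UD check does not fire */

	return m.tx_outstanding == m.sendq_size && !m.queue_stopped;
}
```

With any queue size of 2 or more (64 here is just an arbitrary
example), hole_reached(64) returns true: the counter reaches the
queue size while neither comparison ever matches.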
--
Arthur