[ofa-general] IPoIB-UD post_send failures (OFED 1.3)

akepner at sgi.com
Sat May 10 12:07:21 PDT 2008


On Thu, May 08, 2008 at 10:50:11AM -0700, Roland Dreier wrote:
> ...
> It might be useful to track the value of tx_outstanding... from a quick
> look at the code I can't see how the transmit queue could be awake when
> the UD send queue is full.
> 

I haven't been able to get any new debug data (the only way we 
know to reproduce this one is on a pretty large system, which is a 
scarce resource), but it does look like there's a hole here: 
ipoib_cm.c:ipoib_cm_send() and ipoib_ib.c:ipoib_send() use 
different (off-by-one) conditions to detect a full queue.

ipoib_cm.c:ipoib_cm_send() does:
        if (++priv->tx_outstanding == ipoib_sendq_size)
                netif_stop_queue(dev);

but ipoib_ib.c:ipoib_send() does:
        if (++priv->tx_outstanding == (ipoib_sendq_size - 1)) {
                netif_stop_queue(dev);
                ...
        }

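So a call to ipoib_cm_send() with tx_outstanding = (ipoib_sendq_size - 2), 
followed by a call to ipoib_send(), would leave the queue full but 
not stopped: the first call bumps tx_outstanding to 
(ipoib_sendq_size - 1), which ipoib_cm_send() doesn't treat as full, 
and the second bumps it to ipoib_sendq_size, which ipoib_send() 
doesn't treat as full either. Since both checks test for equality, 
further sends just push tx_outstanding past both thresholds without 
ever stopping the queue.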

I'm not saying this is what's happening for us (just dunno yet) 
but it looks possible. 

-- 
Arthur
