[openib-general] Re: LLTX and netif_stop_queue

David S. Miller davem at davemloft.net
Fri Dec 17 21:44:32 PST 2004


On Fri, 17 Dec 2004 13:57:40 -0800
Roland Dreier <roland at topspin.com> wrote:

> While testing my IP-over-InfiniBand driver, I discovered that if a net
> device sets NETIF_F_LLTX, it seems the device's hard_start_xmit method
> can be called even after a netif_stop_queue().
> 
> This is because in the LLTX case, qdisc_restart() holds no locks while
> calling hard_start_xmit, so something like the following can happen:

 ...
>  	if (TX_BUFFS_AVAIL(gp) <= (skb_shinfo(skb)->nr_frags + 1)) {
>  		netif_stop_queue(dev);
>  		spin_unlock_irqrestore(&gp->tx_lock, flags);
> -		printk(KERN_ERR PFX "%s: BUG! Tx Ring full when queue awake!\n",
> -		       dev->name);
>  		return NETDEV_TX_BUSY;
>  	}

I understand the bug, but we're not going to fix it this way.
This is a crucial invariant that we need to check for because it
indicates a pretty serious state error except in this bug case
you've discovered.

Perhaps one way to fix this is to add a pointer to a spinlock to
the netdev struct, and have hold that the upper level grab that
when NETIF_F_LLTX when doing queue state checks.  Actually, that
could end up being racy too.

If we can't find a good fix for this, besides removing the necessary
debugging message, we might have to pull NETIF_F_LLTX out or disable it
temporarily until we figure out a way.



More information about the general mailing list