[openib-general] Re: LLTX and netif_stop_queue
Roland Dreier
roland at topspin.com
Sat Dec 18 07:35:49 PST 2004
David> I understand the bug, but we're not going to fix it this
David> way. This is a crucial invariant that we need to check for
David> because it indicates a pretty serious state error except in
David> this bug case you've discovered.
OK, that's fair enough. I was pretty bummed myself when I realized
hard_start_xmit was getting called even after I stopped the queue.
Thanks for confirming my analysis.
David> Perhaps one way to fix this is to add a pointer to a
David> spinlock to the netdev struct, and have hold that the upper
David> level grab that when NETIF_F_LLTX when doing queue state
David> checks. Actually, that could end up being racy too.
I may be missing something, but it seems to me that we get all of the
benefits of LLTX by just documenting that device drivers can use the
xmit_lock member of struct net_device to synchronize other parts of
the driver against hard_start_xmit. I guess the driver also should
set xmit_lock_owner to -1 after it acquires xmit_lock.
This would mean that NETIF_F_LLTX can go away, and LLTX drivers would
just replace their use of their own private tx_lock with xmit_lock
(except in hard_start_xmit, where the upper layer already holds xmit_lock).
By the way, am I correct in thinking that the use of xmit_lock_owner
in qdisc_restart() is racy?
if (!spin_trylock(&dev->xmit_lock)) {
/* get the lock */
if (!spin_trylock(&dev->xmit_lock)) {
/* fail */
if (dev->xmit_lock_owner == smp_processor_id()) {
/* test the wrong value */
/* set the value too late */
dev->xmit_lock_owner = smp_processor_id();
I don't see a simple way to fix this unfortunately.
Thanks,
Roland
More information about the general
mailing list