[openib-general] Re: LLTX and netif_stop_queue

Sat Dec 18 07:35:49 PST 2004

    David> I understand the bug, but we're not going to fix it this
    David> way.  This is a crucial invariant that we need to check for
    David> because it indicates a pretty serious state error except in
    David> this bug case you've discovered.

OK, that's fair enough.  I was pretty bummed myself when I realized
hard_start_xmit was getting called even after I stopped the queue.
Thanks for confirming my analysis.

    David> Perhaps one way to fix this is to add a pointer to a
    David> spinlock to the netdev struct, and have hold that the upper
    David> level grab that when NETIF_F_LLTX when doing queue state
    David> checks.  Actually, that could end up being racy too.

I may be missing something, but it seems to me that we get all of the
benefits of LLTX by just documenting that device drivers can use the
xmit_lock member of struct net_device to synchronize other parts of
the driver against hard_start_xmit.  I guess the driver also should
set xmit_lock_owner to -1 after it acquires xmit_lock.

This would mean that NETIF_F_LLTX can go away, and LLTX drivers would
just replace their use of their own private tx_lock with xmit_lock
(except in hard_start_xmit, where the upper layer already holds xmit_lock).

By the way, am I correct in thinking that the use of xmit_lock_owner
in qdisc_restart() is racy?

    if (!spin_trylock(&dev->xmit_lock)) {
    /* get the lock */

                                            if (!spin_trylock(&dev->xmit_lock)) {
                                            /* fail */
                                                if (dev->xmit_lock_owner == smp_processor_id()) {
                                                /* test the wrong value */

    /* set the value too late */
    dev->xmit_lock_owner = smp_processor_id();

I don't see a simple way to fix this unfortunately.

Thanks,
  Roland