[openib-general] Re: Re: [PATCH] ipoib: pkt_queue

Michael S. Tsirkin mst at mellanox.co.il
Tue Jan 17 10:10:48 PST 2006


Quoting r. Michael S. Tsirkin <mst at mellanox.co.il>:
> Subject: Re: Re: [PATCH] ipoib: pkt_queue
> 
> Quoting r. Roland Dreier <rdreier at cisco.com>:
> > Subject: [openib-general] Re: [PATCH] ipoib: pkt_queue
> > 
> >     Michael> We seem to be doing operations that cant be called under
> >     Michael> tx_lock a few lines below.
> > 
> > Do you mean dev_queue_xmit()?  Can that call directly back into our
> > xmit function (I honestly don't know the locking rules here)?
> 
> Yes, exactly.

Here it is from net/core/dev.c

/**
 *	dev_queue_xmit - transmit a buffer
 *	@skb: buffer to transmit
 *
 *	Queue a buffer for transmission to a network device. The caller must
 *	have set the device and priority and built the buffer before calling
 *	this function. The function can be called from an interrupt.
 *
 *	A negative errno code is returned on a failure. A success does not
 *	guarantee the frame will be transmitted as it may be dropped due
 *	to congestion or traffic shaping.
 *
 * -----------------------------------------------------------------------------------
 *      I notice this method can also return errors from the queue disciplines,
 *      including NET_XMIT_DROP, which is a positive value.  So, errors can also
 *      be positive.
 *
 *      Regardless of the return value, the skb is consumed, so it is currently
 *      difficult to retry a send to this method.  (You can bump the ref count
 *      before sending to hold a reference for retry if you are careful.)
 *
 *      When calling this method, interrupts MUST be enabled.  This is because
 *      the BH enable code must have IRQs enabled so that it will not deadlock.
 *          --BLG
 */

int dev_queue_xmit(struct sk_buff *skb)
{
	struct net_device *dev = skb->dev;
	struct Qdisc *q;
	int rc = -ENOMEM;

	if (skb_shinfo(skb)->frag_list &&
	    !(dev->features & NETIF_F_FRAGLIST) &&
	    __skb_linearize(skb, GFP_ATOMIC))
		goto out_kfree_skb;

	/* Fragmented skb is linearized if device does not support SG,
	 * or if at least one of fragments is in highmem and device
	 * does not support DMA from it.
	 */
	if (skb_shinfo(skb)->nr_frags &&
	    (!(dev->features & NETIF_F_SG) || illegal_highdma(dev, skb)) &&
	    __skb_linearize(skb, GFP_ATOMIC))
		goto out_kfree_skb;

	/* If packet is not checksummed and device does not support
	 * checksumming for this protocol, complete checksumming here.
	 */
	if (skb->ip_summed == CHECKSUM_HW &&
	    (!(dev->features & (NETIF_F_HW_CSUM | NETIF_F_NO_CSUM)) &&
	     (!(dev->features & NETIF_F_IP_CSUM) ||
	      skb->protocol != htons(ETH_P_IP))))
	      	if (skb_checksum_help(skb, 0))
	      		goto out_kfree_skb;

	spin_lock_prefetch(&dev->queue_lock);

	/* Disable soft irqs for various locks below. Also 
	 * stops preemption for RCU. 
	 */
	local_bh_disable(); 

	/* Updates of qdisc are serialized by queue_lock. 
	 * The struct Qdisc which is pointed to by qdisc is now a 
	 * rcu structure - it may be accessed without acquiring 
	 * a lock (but the structure may be stale.) The freeing of the
	 * qdisc will be deferred until it's known that there are no 
	 * more references to it.
	 * 
	 * If the qdisc has an enqueue function, we still need to 
	 * hold the queue_lock before calling it, since queue_lock
	 * also serializes access to the device queue.
	 */

	q = rcu_dereference(dev->qdisc);
#ifdef CONFIG_NET_CLS_ACT
	skb->tc_verd = SET_TC_AT(skb->tc_verd,AT_EGRESS);
#endif
	if (q->enqueue) {
		/* Grab device queue */
		spin_lock(&dev->queue_lock);

		rc = q->enqueue(skb, q);

		qdisc_run(dev);

		spin_unlock(&dev->queue_lock);
		rc = rc == NET_XMIT_BYPASS ? NET_XMIT_SUCCESS : rc;
		goto out;
	}

	/* The device has no queue. Common case for software devices:
	   loopback, all the sorts of tunnels...

	   Really, it is unlikely that xmit_lock protection is necessary here.
	   (f.e. loopback and IP tunnels are clean ignoring statistics
	   counters.)
	   However, it is possible, that they rely on protection
	   made by us here.

	   Check this and shot the lock. It is not prone from deadlocks.
	   Either shot noqueue qdisc, it is even simpler 8)
	 */
	if (dev->flags & IFF_UP) {
		int cpu = smp_processor_id(); /* ok because BHs are off */

		if (dev->xmit_lock_owner != cpu) {

			HARD_TX_LOCK(dev, cpu);

			if (!netif_queue_stopped(dev)) {
				if (netdev_nit)
					dev_queue_xmit_nit(skb, dev);

				rc = 0;
				if (!dev->hard_start_xmit(skb, dev)) {
					HARD_TX_UNLOCK(dev);
					goto out;
				}
			}
			HARD_TX_UNLOCK(dev);
			if (net_ratelimit())
				printk(KERN_CRIT "Virtual device %s asks to "
				       "queue packet!\n", dev->name);
		} else {
			/* Recursion is detected! It is possible,
			 * unfortunately */
			if (net_ratelimit())
				printk(KERN_CRIT "Dead loop on virtual device "
				       "%s, fix it urgently!\n", dev->name);
		}
	}

	rc = -ENETDOWN;
	local_bh_enable();

out_kfree_skb:
	kfree_skb(skb);
	return rc;
out:
	local_bh_enable();
	return rc;
}

-- 
MST



More information about the general mailing list