[ofa-general] NetEffect, iw_nes and kernel warning

Roland Dreier rdreier at cisco.com
Tue Jan 27 15:53:16 PST 2009


Interesting... looks like an unfortunate interaction with unclear
locking rules.  See below for full explanation.

BTW, what workload are you running to hit this?

I assume you have CONFIG_HIGHMEM set?

 > WARNING: at kernel/softirq.c:136 local_bh_enable+0x9b/0xa0()

I assume this is

	WARN_ON_ONCE(in_irq() || irqs_disabled());

The interesting parts of the stack trace seem to be (reversing the order
so the story makes sense):

 [<e8e3f815>] nes_netdev_start_xmit+0x815/0x8a0 [iw_nes]

nes_netdev_start_xmit() calls skb_linearize() for nonlinear skbs it
can't handle, which calls __pskb_pull_tail():

 [<c048982c>] __pskb_pull_tail+0x5c/0x2e0

__pskb_pull_tail() calls skb_copy_bits():

 [<c0489c05>] skb_copy_bits+0x155/0x290

At least in some cases, skb_copy_bits() calls kmap_skb_frag() and, more
to the point, kunmap_skb_frag(), which looks like:

	static inline void kunmap_skb_frag(void *vaddr)
	{
		kunmap_atomic(vaddr, KM_SKB_DATA_SOFTIRQ);
	#ifdef CONFIG_HIGHMEM
		local_bh_enable();
	#endif
	}

which leads to:

 [<c012a79b>] local_bh_enable+0x9b/0xa0

which hits the irqs_disabled() warning because iw_nes is using LLTX, and
nes_netdev_start_xmit() does:

	local_irq_save(flags);
	if (!spin_trylock(&nesnic->sq_lock)) {

at the very beginning.
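
To make the sequence above concrete, here is a minimal userspace sketch
of it. This is a toy model, not kernel code: every model_* name is a
hypothetical stand-in, in_irq() is left out of the check, and the
assumption that the map side disables BHs (to pair with the
local_bh_enable() in kunmap_skb_frag()) is mine, since only the unmap
side is quoted above.

```c
/* Userspace toy model of the warning path; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool irqs_off;       /* models irqs_disabled() */
static int  bh_disable_cnt; /* models the BH-disable nesting count */
static int  warnings;       /* how many times the modelled warning fired */
static bool warned_once;    /* WARN_ON_ONCE fires at most once */

static void model_local_irq_save(bool *flags)
{
	*flags = irqs_off;
	irqs_off = true;
}

static void model_local_irq_restore(bool flags)
{
	irqs_off = flags;
}

static void model_local_bh_disable(void)
{
	bh_disable_cnt++;
}

static void model_local_bh_enable(void)
{
	/* models WARN_ON_ONCE(in_irq() || irqs_disabled()), minus in_irq() */
	if (irqs_off && !warned_once) {
		warned_once = true;
		warnings++;
		fprintf(stderr, "WARNING: local_bh_enable() with IRQs off\n");
	}
	assert(bh_disable_cnt > 0); /* enable must pair with a disable */
	bh_disable_cnt--;
}

/* models kmap_skb_frag() + kunmap_skb_frag() with CONFIG_HIGHMEM set */
static void model_map_unmap_frag(void)
{
	model_local_bh_disable(); /* assumption: map side disables BHs */
	/* ... fragment data would be copied here ... */
	model_local_bh_enable();
}

/* models the transmit routine; returns the number of warnings raised */
static int model_xmit(bool disable_irqs)
{
	bool flags = false;

	warnings = 0;
	warned_once = false;
	if (disable_irqs)
		model_local_irq_save(&flags); /* the LLTX path quoted above */
	model_map_unmap_frag(); /* stands in for skb_linearize() -> ... */
	if (disable_irqs)
		model_local_irq_restore(flags);
	return warnings;
}
```

In this model, model_xmit(true) returns 1 and model_xmit(false) returns
0. The only difference between the two is the local_irq_save() at the
top of the transmit routine; the second call is there purely for
contrast and says nothing about whether that path is otherwise safe.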

The best solution is probably for iw_nes to stop using LLTX and use the
main netdev lock... but actually I still don't see how it's safe for a
net driver to call skb_linearize() from its transmit routine, since
there's a chance that it will unconditionally enable BHs (via
kunmap_skb_frag() above)?

 - R.


