[ofa-general] Re: got scheduling while atomic in ipoib (was net/bonding: announce fail-over for the active-backup mode)

Or Gerlitz ogerlitz at voltaire.com
Sun May 25 05:27:34 PDT 2008


> Enhance bonding to announce fail-over for the active-backup mode through
> the netdev events notifier chain mechanism. Such an event can be of use
> for the RDMA CM (communication manager) to let native RDMA ULPs (e.g.
> NFS-RDMA, iSER) always use the same links as the IP stack does.

> --- linux-2.6.26-rc2.orig/drivers/net/bonding/bond_main.c	2008-05-13 10:02:22.000000000 +0300
> +++ linux-2.6.26-rc2/drivers/net/bonding/bond_main.c	2008-05-15 12:29:44.000000000 +0300
> @@ -1117,6 +1117,7 @@ void bond_change_active_slave(struct bon
>  			bond->send_grat_arp = 1;
>  		} else
>  			bond_send_gratuitous_arp(bond);
> +		netdev_bonding_change(bond->dev);
>  	}
>  }

> --- linux-2.6.26-rc2.orig/net/core/dev.c	2008-05-13 10:02:31.000000000 +0300
> +++ linux-2.6.26-rc2/net/core/dev.c	2008-05-13 11:50:49.000000000 +0300
> @@ -956,6 +956,12 @@ void netdev_state_change(struct net_devi
>  	}
>  }
>
> +void netdev_bonding_change(struct net_device *dev)
> +{
> +	call_netdevice_notifiers(NETDEV_BONDING_FAILOVER, dev);
> +}
> +EXPORT_SYMBOL(netdev_bonding_change);
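
For readers unfamiliar with the mechanism the patch relies on, below is a minimal user-space model of a notifier chain. It is a sketch, not the kernel code. It assumes the kernel's ordering and stop rules: blocks are kept in descending priority order, and the walk stops when a callback sets NOTIFY_STOP_MASK. Every name ending in _model is a hypothetical stand-in, not a kernel or RDMA CM symbol. This includes the event value and the rdma_cm_model_event() consumer.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes modeled on the kernel's notifier values. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

/* Arbitrary event code for this model only (hypothetical). */
#define NETDEV_BONDING_FAILOVER_MODEL 0x000C

struct net_device_model { const char *name; };

struct notifier_block_model {
    int (*notifier_call)(struct notifier_block_model *nb,
                         unsigned long event, void *data);
    struct notifier_block_model *next;
    int priority;
};

static struct notifier_block_model *netdev_chain_model;

/* Insert in descending priority order; equal priorities keep FIFO order. */
static void register_notifier_model(struct notifier_block_model *nb)
{
    struct notifier_block_model **p = &netdev_chain_model;

    assert(nb != NULL && nb->notifier_call != NULL);
    while (*p && (*p)->priority >= nb->priority)
        p = &(*p)->next;
    nb->next = *p;
    *p = nb;
}

/* Call each block in turn until one sets NOTIFY_STOP_MASK. */
static int call_notifiers_model(unsigned long event, void *data)
{
    int ret = NOTIFY_DONE;

    for (struct notifier_block_model *nb = netdev_chain_model; nb; nb = nb->next) {
        ret = nb->notifier_call(nb, event, data);
        if (ret & NOTIFY_STOP_MASK)
            break;
    }
    return ret;
}

/* Hypothetical consumer standing in for the RDMA CM. */
static int failover_count;
static const char *last_failover_dev;

static int rdma_cm_model_event(struct notifier_block_model *nb,
                               unsigned long event, void *data)
{
    (void)nb;
    if (event != NETDEV_BONDING_FAILOVER_MODEL)
        return NOTIFY_DONE;   /* not interested in other events */
    failover_count++;
    last_failover_dev = ((struct net_device_model *)data)->name;
    return NOTIFY_OK;
}

static struct notifier_block_model rdma_cm_model_nb = {
    .notifier_call = rdma_cm_model_event,
};

/* Model counterpart of netdev_bonding_change() from the patch. */
static void netdev_bonding_change_model(struct net_device_model *dev)
{
    call_notifiers_model(NETDEV_BONDING_FAILOVER_MODEL, dev);
}
```

A consumer registers once, e.g. register_notifier_model(&rdma_cm_model_nb). Each netdev_bonding_change_model(&dev) call then fans the failover event out to every registered block. In the patch, netdev_bonding_change() likewise hands NETDEV_BONDING_FAILOVER to call_netdevice_notifiers().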

Hi Roland,

I have enhanced the bonding driver to deliver an event through the netdev
notifier chain, and I am getting this "scheduling while atomic" warning.

The function __bond_mii_monitor does spin_lock_bh before calling bond_select_active_slave(),
which calls bond_change_active_slave(). So maybe it's not a good idea to deliver the event under
these atomic conditions. Still, I want to make sure I didn't step on some problem in
ipoib (see the :ib_ipoib:ipoib_start_xmit+0x445/0x459 line in the trace). Any idea?
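
The locking problem can be reproduced in a small user-space model. The model assumes only what the description and trace show:
- The failover notification runs while __bond_mii_monitor holds a spin_lock_bh section.
- A notifier consumer may sleep (rtnetlink_event -> rtmsg_ifinfo -> __alloc_skb -> _cond_resched in the trace).

An atomic_depth counter stands in for preempt_count, and might_sleep_model() counts would-be "scheduling while atomic" hits. The deferred variant notes the failover under the lock and notifies after unlocking. It is one hypothetical way around the problem, not a tested bonding change. All _model names are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Nesting depth of "atomic" sections; stands in for preempt_count. */
static int atomic_depth;
/* How often a possibly-sleeping call ran inside an atomic section. */
static int sleep_in_atomic_bugs;

static void spin_lock_bh_model(void)   { atomic_depth++; }
static void spin_unlock_bh_model(void) { assert(atomic_depth > 0); atomic_depth--; }

/* Models might_sleep(): sleeping is only legal outside atomic sections. */
static void might_sleep_model(void)
{
    if (atomic_depth > 0)
        sleep_in_atomic_bugs++;   /* would be "BUG: scheduling while atomic" */
}

/* Models a consumer such as rtnetlink_event(), whose skb allocation may sleep. */
static void failover_notifier_model(void)
{
    might_sleep_model();
}

/* Variant 1: notify while the lock is held, as in the posted patch. */
static void mii_monitor_notify_under_lock(bool slave_changed)
{
    spin_lock_bh_model();
    /* ... select and switch the active slave ... */
    if (slave_changed)
        failover_notifier_model();
    spin_unlock_bh_model();
}

/* Variant 2 (hypothetical): remember the failover, notify after unlocking. */
static void mii_monitor_notify_deferred(bool slave_changed)
{
    bool failover_pending;

    spin_lock_bh_model();
    /* ... select and switch the active slave ... */
    failover_pending = slave_changed;
    spin_unlock_bh_model();

    if (failover_pending)
        failover_notifier_model();
}
```

With slave_changed set, the first variant records one violation and the second records none. The model only captures the lock nesting. Whether dropping the lock around the notifier is safe for the real bonding state is a separate question.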

bonding: bond0: link status definitely down for interface ib0, disabling it
bonding: bond0: making interface ib1 the new active one.
BUG: scheduling while atomic: bond0/14237/0x10000100
Pid: 14237, comm: bond0 Not tainted 2.6.26-rc3 #4

Call Trace:
 [<ffffffff804777d7>] schedule+0x98/0x57b
 [<ffffffff80277836>] dbg_redzone1+0x16/0x1f
 [<ffffffffa0106f22>] :ib_ipoib:ipoib_start_xmit+0x445/0x459
 [<ffffffff802799c2>] kmem_cache_alloc_node+0x147/0x177
 [<ffffffff8040a939>] __alloc_skb+0x35/0x12b
 [<ffffffff8022c99b>] __cond_resched+0x1c/0x43
 [<ffffffff80477e11>] _cond_resched+0x2d/0x38
 [<ffffffff802798a0>] kmem_cache_alloc_node+0x25/0x177
 [<ffffffff8040a939>] __alloc_skb+0x35/0x12b
 [<ffffffff8041825e>] rtmsg_ifinfo+0x3a/0xd4
 [<ffffffff80418335>] rtnetlink_event+0x3d/0x41
 [<ffffffff8047b925>] notifier_call_chain+0x30/0x54
 [<ffffffffa00a3d4b>] :bonding:bond_select_active_slave+0xb9/0xe8
 [<ffffffffa00a495e>] :bonding:__bond_mii_monitor+0x43a/0x464
 [<ffffffffa00a49e6>] :bonding:bond_mii_monitor+0x5e/0xaa
 [<ffffffffa00a4988>] :bonding:bond_mii_monitor+0x0/0xaa
 [<ffffffff8023d6fa>] run_workqueue+0x7f/0x107
 [<ffffffff8023d782>] worker_thread+0x0/0xef
 [<ffffffff8023d867>] worker_thread+0xe5/0xef
 [<ffffffff8024088f>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8024088f>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8024055a>] kthread+0x3d/0x63
 [<ffffffff8020c068>] child_rip+0xa/0x12
 [<ffffffff8024051d>] kthread+0x0/0x63
 [<ffffffff8020c05e>] child_rip+0x0/0x12

eth2: no IPv6 routers present
bond0: no IPv6 routers present
end_request: I/O error, dev fd0, sector 0

Or.


