[ofa-general] Re: IPoIB kernel Oops -- possible race condition identified.

Jack Morgenstein jackm at dev.mellanox.co.il
Tue Jan 27 01:07:02 PST 2009


On Monday 26 January 2009 19:00, Yossi Etigin wrote:
>  There's a patch of mine in OFED that's probably exposing a bug in ipoib.
> The bug is that priv->broadcast can be NULL-ified and join_task does not
> protect the check with the spinlock.
> The patch may expose the bug because it uses rtnl_lock().
>  However, in 2.6.28 kernel there's another version of this patch which does not
> take rtnl_lock, so the problem still exists but is probably much harder to reproduce.
> 
I'm using code IDENTICAL to the 2.6.28 code, except for ipoib_warn and ipoib_dbg_mcast
message formatting.  The 2.6.28 code is below. I've indicated the hole where priv->broadcast
may be set to NULL by another kernel thread. Below, at the end of this post, I have included
a suggested patch which fixes the problem.

- Jack
====================
2.6.28 snippet of ipoib_mcast_join_task:

	while (1) {identical 
		struct ipoib_mcast *mcast = NULL;

		spin_lock_irq(&priv->lock);
		list_for_each_entry(mcast, &priv->multicast_list, list) {
			if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)
			    && !test_bit(IPOIB_MCAST_FLAG_BUSY, &mcast->flags)
			    && !test_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
				/* Found the next unjoined group */
				break;
			}
		}
		spin_unlock_irq(&priv->lock);

		if (&mcast->list == &priv->multicast_list) {
			/* All done */
			break;
		}

		ipoib_mcast_join(dev, mcast, 1);
		return;
	}
==>***** priv->broadcast MAY BE NULLED OUT HERE! **************<======
	priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));

	if (!ipoib_cm_admin_enabled(dev)) {
		rtnl_lock();
		dev_set_mtu(dev, min(priv->mcast_mtu, priv->admin_mtu));
		rtnl_unlock();
	}

=================================
The following patch can fix the problem:

ipoib: fix unprotected use of priv->broadcast in ipoib_mcast_join_task.

There is a race whereby the ipoib broadcast pointer may be set to NULL by flush while the join
task is being started.  This protects the broadcast pointer access via a spinlock.  If the
pointer is indeed NULL, we set the mcast_mtu value to the current admin_mtu value -- since
it does not matter anyway, the I/F is going down.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2009-01-27 10:48:07.399491000 +0200
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2009-01-26 18:17:07.000000000 +0200
@@ -581,7 +593,12 @@ void ipoib_mcast_join_task(struct work_s
 		return;
 	}
 
-	priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
+	spin_lock_irq(&priv->lock);
+	if (priv->broadcast)
+		priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
+	else
+		priv->mcast_mtu = priv->admin_mtu;
+	spin_unlock_irq(&priv->lock);
 
 	if (!ipoib_cm_admin_enabled(dev)) {
 		rtnl_lock();




More information about the general mailing list