[ofa-general] [PATCH] IPoIB: Test for NULL broadcast object in ipoib_mcast_join_finish.

Jack Morgenstein jackm at dev.mellanox.co.il
Mon May 19 07:03:05 PDT 2008


IPoIB: "join finish" occurring just after device was flushed caused Oops.

ipoib_mcast_join_finish() processing can occur just after
ipoib_mcast_dev_flush() was invoked, in which case the broadcast
pointer is NULL and dereferencing it causes the Oops.  This patch
tests for a NULL priv->broadcast under priv->lock and returns -EAGAIN
in that case.

Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>

---

Roland,

We encountered this problem (a kernel Oops) in our regression testing
(bugzilla bug 1040). The test randomly causes the HCA physical port to go
down and then up, which produces a situation where a "flush" can occur
while IPoIB mcast initialization is still in progress.
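
For anyone who wants to see the pattern outside the driver, here is a
minimal userspace sketch of the check the patch adds. It is only an
illustration under stated assumptions: the demo_* names and structs are
hypothetical stand-ins (not the real IPoIB types), and a pthread mutex
is swapped in for spin_lock_irq(&priv->lock).

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the multicast group object. */
struct demo_group {
	uint32_t qkey;
};

/* Hypothetical stand-in for the per-device private data. */
struct demo_priv {
	pthread_mutex_t lock;          /* plays the role of priv->lock */
	struct demo_group *broadcast;  /* set to NULL by a "flush" */
	uint32_t qkey;                 /* cached Q_Key */
};

/* Simulates the flush path: drop the broadcast group under the lock. */
static void demo_flush(struct demo_priv *priv)
{
	pthread_mutex_lock(&priv->lock);
	priv->broadcast = NULL;
	pthread_mutex_unlock(&priv->lock);
}

/*
 * Simulates the join-finish path: test the pointer and read through it
 * under the same lock, so a concurrent flush cannot slip in between.
 * Returns 0 on success, -EAGAIN if the broadcast group is gone.
 */
static int demo_cache_qkey(struct demo_priv *priv)
{
	pthread_mutex_lock(&priv->lock);
	if (!priv->broadcast) {
		pthread_mutex_unlock(&priv->lock);
		return -EAGAIN;
	}
	priv->qkey = priv->broadcast->qkey;
	pthread_mutex_unlock(&priv->lock);
	return 0;
}
```

Calling demo_cache_qkey() after demo_flush() returns -EAGAIN and leaves
the cached qkey untouched, instead of dereferencing a NULL pointer.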

Index: ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- ofed_kernel.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2008-05-19 15:48:17.000000000 +0300
+++ ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2008-05-19 16:07:52.723294000 +0300
@@ -194,7 +194,13 @@ static int ipoib_mcast_join_finish(struc
 	/* Set the cached Q_Key before we attach if it's the broadcast group */
 	if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
 		    sizeof (union ib_gid))) {
+		spin_lock_irq(&priv->lock);
+		if (!priv->broadcast) {
+			spin_unlock_irq(&priv->lock);
+			return -EAGAIN;
+		}
 		priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
+		spin_unlock_irq(&priv->lock);
 		priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
 	}
 

