[ofa-general] [PATCH] IPoIB: Test for NULL broadcast object in ipoib_mcast_join_finish.
Jack Morgenstein
jackm at dev.mellanox.co.il
Mon May 19 07:03:05 PDT 2008
IPoIB: a "join finish" occurring just after the device was flushed caused an Oops.
ipoib_mcast_join_finish() can run just after ipoib_mcast_dev_flush()
was invoked, in which case the broadcast pointer is NULL and
dereferencing it causes the Oops. This patch tests for that case under
priv->lock and returns -EAGAIN instead of dereferencing the pointer.
Signed-off-by: Jack Morgenstein <jackm at dev.mellanox.co.il>
---
Roland,
We encountered this problem (a kernel Oops) in our regression testing
(bugzilla bug 1040). The test randomly takes the HCA physical port down
and then up again, so a "flush" can occur while IPoIB mcast
initialization is still in progress.
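
For readers who want to see the idea outside the driver, here is a minimal
userspace sketch of the check-under-lock pattern the patch applies. It is an
illustration under stated assumptions, not the driver code: struct
mcast_group, struct dev_priv, dev_flush() and cache_broadcast_qkey() are
hypothetical stand-ins, a pthread mutex replaces spin_lock_irq(), and the
be32_to_cpu() conversion is omitted.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver structures (not the real IPoIB types). */
struct mcast_group {
	uint32_t qkey;
};

struct dev_priv {
	pthread_mutex_t lock;          /* stands in for priv->lock */
	struct mcast_group *broadcast; /* becomes NULL once a flush has run */
	uint32_t qkey;                 /* cached Q_Key */
};

/* Models the flush path: detach the broadcast group while holding the lock. */
struct mcast_group *dev_flush(struct dev_priv *priv)
{
	struct mcast_group *old;

	pthread_mutex_lock(&priv->lock);
	old = priv->broadcast;
	priv->broadcast = NULL;
	pthread_mutex_unlock(&priv->lock);
	return old;
}

/*
 * Models the join-finish path: test the pointer and read through it inside
 * one critical section, so a concurrent flush cannot clear it in between.
 * Returns 0 on success, or -EAGAIN if the broadcast group is already gone.
 */
int cache_broadcast_qkey(struct dev_priv *priv)
{
	pthread_mutex_lock(&priv->lock);
	if (!priv->broadcast) {
		pthread_mutex_unlock(&priv->lock);
		return -EAGAIN;
	}
	priv->qkey = priv->broadcast->qkey;
	pthread_mutex_unlock(&priv->lock);
	return 0;
}
```

The point of the sketch is that the NULL test and the dereference share the
same lock as the code that clears the pointer; testing the pointer without
the lock would still leave a window for the flush to run between the test and
the read.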
Index: ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- ofed_kernel.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-05-19 15:48:17.000000000 +0300
+++ ofed_kernel/drivers/infiniband/ulp/ipoib/ipoib_multicast.c 2008-05-19 16:07:52.723294000 +0300
@@ -194,7 +194,13 @@ static int ipoib_mcast_join_finish(struc
 	/* Set the cached Q_Key before we attach if it's the broadcast group */
 	if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
 		    sizeof (union ib_gid))) {
+		spin_lock_irq(&priv->lock);
+		if (!priv->broadcast) {
+			spin_unlock_irq(&priv->lock);
+			return -EAGAIN;
+		}
 		priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
+		spin_unlock_irq(&priv->lock);
 		priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
 	}