[openib-general] [PATCH] IB/ipoib: fix crash on mcast join finish

Or Gerlitz ogerlitz at voltaire.com
Mon Jul 24 00:00:01 PDT 2006


Roland,

This crash reproduces every time ipoib_debug_mcast is set. The patch below
fixes it by setting mcast->ah before the debug code attempts to access it.
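
To illustrate the ordering problem, here is a minimal userspace sketch, not the driver code. The struct names, field layout and helper functions (model_ah, model_mcast, model_debug_print, join_finish_old, join_finish_new) are hypothetical stand-ins, and the sketch assumes the debug print reads a field through mcast->ah. That assumption is at least consistent with the oops below: CR2 is 8, RAX is 0, and the faulting bytes "ff 70 08" decode to a push of the quadword at 8(%rax).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the address handle wrapper; only the layout matters. */
struct model_ah {
	void *dev;	/* offset 0 */
	void *ah;	/* offset 8 on an LP64 target (assumed layout) */
};

/* Hypothetical stand-in for the multicast group entry. */
struct model_mcast {
	struct model_ah *ah;	/* stays NULL until join finish publishes it */
};

/*
 * Models the debug print, which reads through mcast->ah.
 * Returns -1 where the kernel would oops, so the sketch can run safely.
 */
static int model_debug_print(const struct model_mcast *mcast,
			     char *buf, size_t len)
{
	if (!mcast->ah)
		return -1;	/* NULL dereference in the real code path */
	return snprintf(buf, len, "AV %p", mcast->ah->ah);
}

/* Ordering before the patch: print first, publish the handle afterwards. */
static int join_finish_old(struct model_mcast *mcast, struct model_ah *ah)
{
	char buf[64];
	int ret = model_debug_print(mcast, buf, sizeof(buf));

	mcast->ah = ah;
	return ret;
}

/* Ordering after the patch: publish the handle, then print. */
static int join_finish_new(struct model_mcast *mcast, struct model_ah *ah)
{
	char buf[64];

	assert(ah != NULL);	/* the patched branch only runs when ah was created */
	mcast->ah = ah;
	return model_debug_print(mcast, buf, sizeof(buf));
}
```

Calling join_finish_old() on a fresh model_mcast reports the failure case, while join_finish_new() formats the message successfully. The real code also takes priv->lock around the assignment, which this sketch leaves out.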

This is 2.6.18 material, correct? Also, since the crash makes it impossible
to debug mcast, I guess the fix needs to go into OFED 1.1 as well.

Or.

ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
ib0: Created ah ffff81002cdb1c00
Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
 [<ffffffff880adbf5>] :ib_ipoib:ipoib_mcast_join_finish+0x273/0x3af
PGD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: ib_ipoib ib_sa autofs nfs lockd sunrpc sg st sd_mod sr_mod scsi_mod ib_mthca ib_mad ib_core i2c_amd8111 i2c_amd756 i2c_core e100
Pid: 3919, comm: ib_mad1 Not tainted 2.6.18-rc1 #6
RIP: 0010:[<ffffffff880adbf5>]  [<ffffffff880adbf5>] :ib_ipoib:ipoib_mcast_join_finish+0x273/0x3af
RSP: 0018:ffff81003eba9ca0  EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff81000ddcce80 RCX: 0000000000000012
RDX: 00000000000000ff RSI: ffff810031f63000 RDI: ffffffff880b087b
RBP: ffff8100050d61c0 R08: 0000000000000040 R09: 000000000000001b
R10: 0000000000000000 R11: 0000000000000000 R12: ffff810031f63500
R13: ffff810031f63000 R14: ffff81003eba9d58 R15: ffff81001a947000
FS:  00002aad0e23a0a0(0000) GS:ffff81003f8b8f40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000000201000 CR4: 00000000000006e0
Process ib_mad1 (pid: 3919, threadinfo ffff81003eba8000, task ffff81001e204a70)
Stack:  000000000000c000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 000101030000c000 0000ffff1b4012ff
 ffffffff00000000 0000000000000000 000101030000c000 00000000000000ff
Call Trace:
 [<ffffffff880ae622>] :ib_ipoib:ipoib_mcast_join_complete+0x9d/0x24c
 [<ffffffff880a4aad>] :ib_sa:ib_sa_mcmember_rec_callback+0x40/0x49
 [<ffffffff880a43b3>] :ib_sa:recv_handler+0x3a/0x43
 [<ffffffff8802f48b>] :ib_mad:ib_mad_completion_handler+0x3ac/0x592
 [<ffffffff8023fc47>] run_workqueue+0x9a/0xea
 [<ffffffff8023fdfd>] worker_thread+0x108/0x13a
 [<ffffffff80243244>] kthread+0xc9/0xf2
 [<ffffffff8020a592>] child_rip+0x8/0x12

Code: ff 70 08 0f b6 43 0f 50 0f b6 43 0e 50 0f b6 43 0d 50 0f b6
RIP  [<ffffffff880adbf5>] :ib_ipoib:ipoib_mcast_join_finish+0x273/0x3af
 RSP <ffff81003eba9ca0>
CR2: 0000000000000008

----------------------------------------------------------------------

Set mcast->ah before the ipoib mcast debug code attempts to access it.

Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>

--- linux-2.6.18-rc1-orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2006-07-18 12:53:33.000000000 +0300
+++ linux-2.6.18-rc1/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2006-07-24 09:50:37.000000000 +0300
@@ -264,6 +264,9 @@ static int ipoib_mcast_join_finish(struc
 		if (!ah) {
 			ipoib_warn(priv, "ib_address_create failed\n");
 		} else {
+			spin_lock_irq(&priv->lock);
+			mcast->ah = ah;
+			spin_unlock_irq(&priv->lock);
 			ipoib_dbg_mcast(priv, "MGID " IPOIB_GID_FMT
 					" AV %p, LID 0x%04x, SL %d\n",
 					IPOIB_GID_ARG(mcast->mcmember.mgid),
@@ -271,10 +274,6 @@ static int ipoib_mcast_join_finish(struc
 					be16_to_cpu(mcast->mcmember.mlid),
 					mcast->mcmember.sl);
 		}
-
-		spin_lock_irq(&priv->lock);
-		mcast->ah = ah;
-		spin_unlock_irq(&priv->lock);
 	}

 	/* actually send any queued packets */



