<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2654.45">
<TITLE>oops on module teardown (was Re: recursion depth exceeded in ipoib_workqueue )</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2 FACE="Courier New">I tested your recursion patch on SVN 3487, and it works. However, while testing it, I got the kernel Oops shown below while unloading the driver. It looks like a race condition (note that this happens in the send-timeout flow).</FONT></P>
<P><FONT SIZE=2 FACE="Courier New">From the disassembly of ib_ipoib.ko (no line-debug info, unfortunately), the failure is at address 5360:</FONT>
<BR><FONT SIZE=2 FACE="Courier New"> 534c: 48 89 95 b0 00 00 00 mov %rdx,0xb0(%rbp)</FONT>
<BR><FONT SIZE=2 FACE="Courier New"> 5353: f0 ff 0d 00 00 00 00 lock decl 0(%rip) # 535a <ipoib_mcast_join_complete+0x1fa></FONT>
<BR><FONT SIZE=2 FACE="Courier New"> 535a: 0f 88 d9 03 00 00 js 5739 <.text.lock.ipoib_multicast+0x50></FONT>
<BR><FONT SIZE=2 FACE="Courier New"> 5360: 41 8b 45 10 mov 0x10(%r13),%eax</FONT>
<BR><FONT SIZE=2 FACE="Courier New"> 5364: a8 20 test $0x20,%al</FONT>
</P>
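<P><FONT SIZE=2 FACE="Courier New">The disassembly can be cross-checked against the Oops log at the end of this message. The sketch below is an illustration only: the constants are copied from the disassembly above and from the register dump, and the helper names are made up for this example. It checks that address 5360 is offset 512 into ipoib_mcast_join_complete (the RIP offset reported in the Oops), and that the faulting load 0x10(%r13) with R13 = 0x380 touches 0x390, the address reported in CR2.</FONT></P>

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the disassembly and from the Oops register dump. */
#define FAULT_INSN_ADDR   0x5360u          /* mov 0x10(%r13),%eax */
#define LABEL_INSN_ADDR   0x535au          /* annotated as ipoib_mcast_join_complete+0x1fa */
#define LABEL_FUNC_OFFSET 0x1fau
#define OOPS_RIP_OFFSET   512u             /* ipoib_mcast_join_complete+512 */
#define OOPS_R13          UINT64_C(0x380)
#define OOPS_CR2          UINT64_C(0x390)
#define LOAD_DISPLACEMENT UINT64_C(0x10)

/* Offset of an instruction from the start of the function, using the
 * objdump annotation of a nearby instruction as the reference point. */
uint32_t offset_in_function(uint32_t insn_addr)
{
    uint32_t func_start = LABEL_INSN_ADDR - LABEL_FUNC_OFFSET;
    return insn_addr - func_start;
}

/* Effective address of a "disp(%reg)" memory operand. */
uint64_t effective_address(uint64_t base_reg, uint64_t displacement)
{
    return base_reg + displacement;
}

/* Aborts if the disassembly and the Oops log disagree with each other. */
void check_oops_against_disassembly(void)
{
    assert(offset_in_function(FAULT_INSN_ADDR) == OOPS_RIP_OFFSET);
    assert(effective_address(OOPS_R13, LOAD_DISPLACEMENT) == OOPS_CR2);
}
```

<P><FONT SIZE=2 FACE="Courier New">Calling check_oops_against_disassembly() from a small test program passes silently when both checks hold, which is what the numbers in this report give.</FONT></P>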
<P><FONT SIZE=2 FACE="Courier New">I traced this to ipoib_multicast.c:434 (in ipoib_mcast_join_complete):</FONT>
<BR> <FONT SIZE=2 FACE="Courier New">if (test_bit(IPOIB_MCAST_RUN, &priv->flags)) </FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">The failure occurs when dereferencing "priv->flags" (this is the load at address 5360).</FONT></P>
<P><FONT SIZE=2 FACE="Courier New">"priv" here is "netdev_priv(dev)", which implies that the pointer returned by "netdev_priv(dev)" is no longer valid by the time this callback runs. That invalid pointer (R13 = 0x380 in the Oops below) is what gets dereferenced.</FONT></P>
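<P><FONT SIZE=2 FACE="Courier New">One possible reading of the small value in R13 (0x380), offered here as an assumption and not confirmed anywhere in this report: if netdev_priv() is plain pointer arithmetic that skips past the net_device structure, then calling it on a NULL "dev" yields a small constant rather than a random address. The sketch below models that with integers. mock_netdev_priv, mock_field_address, the 32-byte alignment, and the 0x380 structure size are all hypothetical and chosen only to match the register dump; the real kernel definition may differ.</FONT></P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_NETDEV_ALIGN 32u   /* assumed alignment of the private area */

/* Hypothetical stand-in for netdev_priv(): the private area is assumed
 * to start at the device address plus the device structure size rounded
 * up to the alignment. Addresses are plain integers so that a NULL base
 * can be modelled without undefined behaviour. */
uintptr_t mock_netdev_priv(uintptr_t dev_addr, size_t netdev_size)
{
    size_t mask = MOCK_NETDEV_ALIGN - 1u;
    return dev_addr + ((netdev_size + mask) & ~mask);
}

/* Address touched when reading a field at field_offset inside priv. */
uintptr_t mock_field_address(uintptr_t priv_addr, size_t field_offset)
{
    return priv_addr + field_offset;
}

/* With dev == NULL and an assumed structure size of 0x380, the read of
 * a field at offset 0x10 lands on 0x390, the CR2 value in the Oops. */
void check_null_dev_hypothesis(void)
{
    uintptr_t priv = mock_netdev_priv(0, 0x380);
    assert(priv == 0x380);
    assert(mock_field_address(priv, 0x10) == 0x390);
}
```

<P><FONT SIZE=2 FACE="Courier New">This only shows that a NULL "dev" would be consistent with the reported addresses. It does not rule out other causes, such as the structure having been freed and reused.</FONT></P>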
<P><FONT SIZE=2 FACE="Courier New">Environment:</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Host 1 Port 1 connected back-to-back to Host 2 Port 1.</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">Host 1: while date; do /etc/init.d/openibd start ; /etc/init.d/openibd stop ; done</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Host 2: runs opensm.</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">Jack</FONT>
<BR><FONT SIZE=2 FACE="Courier New">================================================================================================================</FONT>
</P>
<P><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000390 RIP:</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: <ffffffff8807a360>{:ib_ipoib:ipoib_mcast_join_complete+512}</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: PGD 777d2067 PUD 773ca067 PMD 0</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: Oops: 0000 [1] SMP</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: CPU 0</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: Modules linked in: ib_ipoib ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core video1394 ohci1394 raw1394 ieee1394</FONT></P>
<P><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: Pid: 11302, comm: ib_mad2 Not tainted 2.6.13</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: RIP: 0010:[<ffffffff8807a360>] <ffffffff8807a360>{:ib_ipoib:ipoib_mcast_join_complete+512}</FONT></P>
<P><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: RSP: 0018:ffff810055bc1d38 EFLAGS: 00010247</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: RAX: 0000000000000000 RBX: ffffffff8807e000 RCX: ffffffff88070e10</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8807e000</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: RBP: ffff810053b10880 R08: ffff810055bc0000 R09: 0000000000000000</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: R10: 00000000ffffffff R11: ffffffff8055f320 R12: 00000000ffffff92</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: R13: 0000000000000380 R14: ffff81007e409a78 R15: ffffffff88042bd0</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: FS: 00002aaaab15db00(0000) GS:ffffffff805d4800(0000) knlGS:0000000056729bb0</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: CR2: 0000000000000390 CR3: 00000000777d3000 CR4: 00000000000006e0</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: Process ib_mad2 (pid: 11302, threadinfo ffff810055bc0000, task ffff810054734830)</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: Stack: ffff81007a8324c0 ffff810054734830 ffffffff805dffb0 ffffffff803f8855</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: ffff810055bc1e58 0000000000000296 ffff810054982f90 00000000ffffff92</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: ffff81007e409a10 ffffffff88070e5c</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: Call Trace:<ffffffff803f8855>{thread_return+0} <ffffffff88070e5c>{:ib_sa:ib_sa_mcmember_rec_callback+76}</FONT></P>
<P><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: <ffffffff8807060c>{:ib_sa:send_handler+156} <ffffffff88042d4e>{:ib_mad:timeout_sends+382}</FONT></P>
<P><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: <ffffffff80132ca3>{__wake_up+67} <ffffffff80147e7e>{worker_thread+478}</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: <ffffffff80132210>{default_wake_function+0} <ffffffff8012f793>{__wake_up_common+67}</FONT></P>
<P><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: <ffffffff80132210>{default_wake_function+0} <ffffffff8014c3d0>{keventd_create_kthread+0}</FONT></P>
<P><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: <ffffffff80147ca0>{worker_thread+0} <ffffffff8014c3d0>{keventd_create_kthread+0}</FONT></P>
<P><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: <ffffffff8014c529>{kthread+217} <ffffffff8010e50e>{child_rip+8}</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: <ffffffff8014c3d0>{keventd_create_kthread+0} <ffffffff8014c450>{kthread+0}</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: <ffffffff8010e506>{child_rip+0}</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel:</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: Code: 41 8b 45 10 a8 20 74 3e 41 83 fc 92 75 15 48 8b 3d cb 46 00</FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: RIP <ffffffff8807a360>{:ib_ipoib:ipoib_mcast_join_complete+512} RSP <ffff810055bc1d38></FONT>
<BR><FONT SIZE=2 FACE="Courier New">Sep 20 12:05:30 swlab163 kernel: CR2: 0000000000000390</FONT>
</P>
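<P><FONT SIZE=2 FACE="Courier New">The register dump also fits the send-timeout flow seen in the call trace (timeout_sends, send_handler, ib_sa_mcmember_rec_callback). R12 holds 0xffffff92 in its low 32 bits, and the Code bytes "41 83 fc 92" right after the faulting load decode as a compare of %r12d against that same value. The sketch below is an illustration with made-up helper names; it interprets the low half of R12 as a signed 32-bit integer and compares it with -110. That 110 is ETIMEDOUT on Linux is stated here as an assumption instead of being taken from the build machine's headers.</FONT></P>

```c
#include <assert.h>
#include <stdint.h>

#define LINUX_ETIMEDOUT 110                          /* assumed Linux errno value */
#define OOPS_R12        UINT64_C(0x00000000ffffff92) /* from the register dump */

/* Interpret the low 32 bits of a register as a two's-complement signed
 * value without relying on implementation-defined conversions. */
int32_t low32_as_signed(uint64_t reg)
{
    uint32_t low = (uint32_t)(reg & UINT64_C(0xffffffff));
    if (low >= UINT32_C(0x80000000))
        return -(int32_t)(~low) - 1;
    return (int32_t)low;
}

/* Returns 1 if R12 from the Oops looks like a negated timeout errno. */
int r12_is_timeout_status(void)
{
    return low32_as_signed(OOPS_R12) == -LINUX_ETIMEDOUT;
}

/* Aborts if the register value does not decode as expected. */
void check_timeout_status(void)
{
    assert(low32_as_signed(OOPS_R12) == -110);
    assert(r12_is_timeout_status() == 1);
}
```

<P><FONT SIZE=2 FACE="Courier New">If R12 is indeed the status argument of the completion callback, this matches a join request that timed out while the driver was being unloaded.</FONT></P>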
</BODY>
</HTML>