[ofa-general] madeye kernel oops

Ami Perlmutter amip at dev.mellanox.co.il
Thu Mar 29 00:32:34 PDT 2007


On Wed, 2007-03-28 at 14:15 -0500, Hal Rosenstock wrote:
> On Wed, 2007-03-28 at 02:15, Ami Perlmutter wrote:
> > On Tue, 2007-03-27 at 12:03 -0700, Sean Hefty wrote:
> > > How easily can you reproduce this?  I'm assuming that this is with OFED 1.2 on
> > > 2.6.20, correct?
> > yes
> > > Can you describe what you were doing when this crash occurred?
> > opensm was running on the other computer
> > running SDP programs
> 
> So the node which oops'd was only running madeye and some SDP data
> transfer ?
I was using madeye to debug MAD losses in SDP connect,
so other than CM MADs there was no data being sent by SDP.
> Can you be more specific about the failure scenario ? What was going on
> on the node which failed ? It looks like you were removing madeye. Was
> this the first time ? Anything else going on ?
When I tried to remove the module, the node was not running anything;
opensm was running on the other machine.
The oops happened when I tried to remove madeye in order to reset the
driver.
It has happened more than once, but not every time I removed the
module.
> Thanks.
> 
> -- Hal
> 
> > > Thanks,
> > > Sean
> > > 
> > > >Unable to handle kernel NULL pointer dereference at 0000000000000038
> > > >RIP:
> > > > [<ffffffff8801021f>] :ib_mad:ib_unregister_mad_agent+0x11/0x480
> > > >PGD 73387067 PUD 72844067 PMD 0
> > > >Oops: 0000 [1] SMP
> > > >CPU 0
> > > >Modules linked in: ib_madeye i2c_dev i2c_core ib_sdp rdma_cm iw_cm
> > > >ib_addr ib_local_sa ib_uverbs ib_umad ib_mthca ib_ipoib ib_cm ib_sa
> > > >ib_mad ib_core
> > > >Pid: 8917, comm: rmmod Not tainted 2.6.20 #1
> > > >RIP: 0010:[<ffffffff8801021f>]
> > > >[<ffffffff8801021f>] :ib_mad:ib_unregister_mad_agent+0x11/0x480
> > > >RSP: 0000:ffff810071ee1e08  EFLAGS: 00010292
> > > >RAX: 0000000000000000 RBX: 0000000000000020 RCX: 000000000000003f
> > > >RDX: ffff810077ebd6c0 RSI: 0000000000000202 RDI: 0000000000000000
> > > >RBP: 0000000000000000 R08: ffff810077ebd728 R09: 0000000000000003
> > > >R10: 0000000000000000 R11: 0000000000000000 R12: ffff8100766c33c0
> > > >R13: 0000000000000002 R14: 0000000000000880 R15: 0000000000503010
> > > >FS:  00002b3d6689fb00(0000) GS:ffffffff80702000(0000)
> > > >knlGS:0000000000000000
> > > >CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > >CR2: 0000000000000038 CR3: 0000000071086000 CR4: 00000000000006e0
> > > >Process rmmod (pid: 8917, threadinfo ffff810071ee0000, task
> > > >ffff8100781aeee0)
> > > >Stack:  ffff810071ee1e18 ffffffff8022b92f ffff810071ee1e28
> > > >ffffffff80538b43
> > > > ffff810071ee1ea8 ffffffff80538ea2 ffffffff80690880 ffff810071ee1e78
> > > > 000000000000000f 0000000000000020 0000000000000002 ffff8100766c33c0
> > > >Call Trace:
> > > > [<ffffffff8022b92f>] __cond_resched+0x1c/0x44
> > > > [<ffffffff80538b43>] cond_resched+0x2e/0x39
> > > > [<ffffffff80538ea2>] wait_for_completion+0x1a/0xd0
> > > > [<ffffffff88093cf2>] :ib_madeye:madeye_remove_one+0x56/0x88
> > > > [<ffffffff880041aa>] :ib_core:ib_unregister_client+0x40/0xe2
> > > > [<ffffffff8024ae86>] sys_delete_module+0x1b4/0x1e5
> > > > [<ffffffff80340065>] add_uevent_var+0x40/0xe3
> > > > [<ffffffff8026613f>] sys_munmap+0x4b/0x58
> > > > [<ffffffff8020959e>] system_call+0x7e/0x83
> > > >
> > > >
> > > >Code: 83 7f 38 00 0f 84 fd 03 00 00 48 8d 44 24 20 4c 8d 67 f0 48
> > > >RIP  [<ffffffff8801021f>] :ib_mad:ib_unregister_mad_agent+0x11/0x480
> > > > RSP <ffff810071ee1e08>
> > > >CR2: 0000000000000038
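
A rough reading of the trace above, as a sketch rather than a diagnosis: assuming the "Code:" bytes start at the faulting instruction, the first four bytes (83 7f 38 00) decode to a compare against memory at 0x38(%rdi). RDI is 0 in the register dump, and on x86-64 RDI normally carries the first function argument, so this would be consistent with ib_unregister_mad_agent being called with a NULL pointer (provided RDI was not changed in the first 0x11 bytes of the function). The small Python script below only checks that arithmetic against CR2; the decoder is a hypothetical helper that handles just this one instruction form.

```python
# Values copied from the oops above.
code = bytes.fromhex("837f38000f84fd030000")  # first bytes of the "Code:" line
rdi = 0x0                                     # RDI from the register dump
cr2 = 0x38                                    # faulting address (CR2)

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def decode_group1_imm8(insn):
    """Decode only the '83 /r ib' form with disp8 addressing and no SIB byte.

    Hypothetical helper for this one case, not a general x86 decoder.
    """
    if insn[0] != 0x83:
        raise ValueError("not an 0x83 group-1 instruction")
    modrm = insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only [reg + disp8] addressing is handled here")
    disp = int.from_bytes(insn[2:3], "little", signed=True)
    imm = int.from_bytes(insn[3:4], "little", signed=True)
    return GROUP1[reg], REG64[rm], disp, imm


op, base, disp, imm = decode_group1_imm8(code)
fault_addr = (rdi + disp) & 0xFFFFFFFFFFFFFFFF

print(f"{op}l ${imm:#x}, {disp:#x}(%{base})")
print(f"computed address {fault_addr:#x}, CR2 {cr2:#x}, match: {fault_addr == cr2}")
```

If that reading holds, the thing to look at would be which agent pointer madeye_remove_one hands to ib_unregister_mad_agent, and whether it can still be NULL at module removal.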