[openib-general] ipoib use of multicast module on trunk causes kernel oops on 2.6.16

Michael S. Tsirkin mst at mellanox.co.il
Wed May 24 06:37:28 PDT 2006


Hi!
Looks like moving ipoib to the new multicast module introduced a kernel oops
on 2.6.16 when the driver is restarted. See the forwarded report below.

----- Forwarded message from Ali Ayoub <ali at mellanox.co.il> -----

> -----Original Message-----
> From: Michael S. Tsirkin
> To: Ali Ayoub
> 
> Quoting r. Ali Ayoub <ali at mellanox.co.il>:
> > Subject: [gen2 trunk] kernel oops on 2.6.16
> >
> > The latest trunk build causes a kernel oops on 2.6.16 while restarting the driver.
> >
> > (The previous build, rev 7422, works fine.)
> >
> >
> >
> > May 24 16:00:40 sw037 kernel: Unable to handle kernel paging request at ffffffff8804bb17 RIP:
> > May 24 16:00:40 sw037 kernel: [<ffffffff8804bb17>]
> > May 24 16:00:40 sw037 kernel: PGD 103027 PUD 105027 PMD 17f69d067 PTE 0
> > May 24 16:00:40 sw037 kernel: Oops: 0000 [1] SMP
> > May 24 16:00:40 sw037 kernel: CPU 1
> > May 24 16:00:40 sw037 kernel: Modules linked in: ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core
> > May 24 16:00:40 sw037 kernel: Pid: 4355, comm: modprobe Not tainted 2.6.16 #9
> > May 24 16:00:40 sw037 kernel: RIP: 0010:[<ffffffff8804bb17>] [<ffffffff8804bb17>]
> > May 24 16:00:40 sw037 kernel: RSP: 0000:ffff810179ca7d40  EFLAGS: 00010246
> > May 24 16:00:40 sw037 kernel: RAX: 0000000000000005 RBX: ffff810179ca7df0 RCX: ffffffff88045b49
> > May 24 16:00:40 sw037 kernel: RDX: ffff81017c0f1760 RSI: 0000000000000000 RDI: 00000000fffffffc
> > May 24 16:00:40 sw037 kernel: RBP: ffff810179ca7da8 R08: ffff81017a83fb68 R09: ffff81017c54cc40
> > May 24 16:00:40 sw037 kernel: R10: ffff81017c8c4848 R11: 0000000000000020 R12: ffff81017e21a3b8
> > May 24 16:00:40 sw037 kernel: R13: 00000000fffffffc R14: 0000000000000000 R15: 0000000000000080
> > May 24 16:00:40 sw037 kernel: FS:  0000000000000000(0000) GS:ffff81017fc772a8(0000) knlGS:0000000000000000
> > May 24 16:00:40 sw037 ifdown: Interface not available and no configuration found.
> > May 24 16:00:40 sw037 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > May 24 16:00:40 sw037 kernel: CR2: ffffffff8804bb17 CR3: 000000017af3f000 CR4: 00000000000006e0
> > May 24 16:00:40 sw037 kernel: Process modprobe (pid: 4355, threadinfo ffff810179ca6000, task ffff81017f9209a0)
> > May 24 16:00:40 sw037 kernel: Stack: ffffffff88045b95 ffff81017c54cc38 ffff81017c54c000 ffff810179ca7d98
> > May 24 16:00:40 sw037 kernel:        ffffffff80173b26 ffff810179ca7d98 ffffffff801736ad 0000000000000000
> > May 24 16:00:40 sw037 kernel:        ffff81017fc00040 ffff81017c8c4840
> > May 24 16:00:40 sw037 kernel: Call Trace: <ffffffff88045b95>{:ib_sa:ib_sa_mcmember_rec_callback+76}
> > May 24 16:00:40 sw037 kernel:        <ffffffff80173b26>{cache_free_debugcheck+568} <ffffffff801736ad>{poison_obj+58}
> > May 24 16:00:40 sw037 kernel:        <ffffffff880454d3>{:ib_sa:send_handler+80} <ffffffff88012183>{:ib_mad:ib_unregister_mad_agent+359}
> > May 24 16:00:40 sw037 kernel:        <ffffffff880450af>{:ib_sa:free_sm_ah+0} <ffffffff88045050>{:ib_sa:ib_sa_remove_one+80}
> > May 24 16:00:40 sw037 kernel:        <ffffffff88004079>{:ib_core:ib_unregister_client+72}
> > May 24 16:00:40 sw037 kernel:        <ffffffff88045d4c>{:ib_sa:ib_sa_cleanup+16} <ffffffff8014e12c>{sys_delete_module+513}
> > May 24 16:00:40 sw037 kernel:        <ffffffff8024908c>{__up_write+293} <ffffffff8010a8fe>{system_call+126}
> > May 24 16:00:40 sw037 kernel:
> > May 24 16:00:40 sw037 kernel: Code:  Bad RIP value.
> > May 24 16:00:40 sw037 kernel: RIP [<ffffffff8804bb17>] RSP <ffff810179ca7d40>
> > May 24 16:00:40 sw037 kernel: CR2: ffffffff8804bb17
> > May 24 16:00:40 sw037 kernel:  BUG: modprobe/4355, lock held at task exit time!
> > May 24 16:00:40 sw037 kernel:  [ffffffff8800c9e0] {device_mutex}
> > May 24 16:00:40 sw037 kernel: .. held by:          modprobe: 4355 [ffff81017f9209a0, 118]
> > May 24 16:00:40 sw037 kernel: ... acquired at: ib_unregister_client+0x1a/0x108 [ib_core]
> > May 24 16:00:42 sw037 ifdown: Interface not available and no configuration found.
> > May 24 16:00:42 sw037 ifdown: Interface not available and no
> >
> >
> 
> Must be ipoib multicast change by Sean. Please try reverting r7401.

Right, reverting to 7401 solved the problem.

-- 
MST


