[openib-general] Kernel Oops related to IPoIB (multicast module?)
Jack Morgenstein
jackm at mellanox.co.il
Mon Jun 26 00:51:12 PDT 2006
Problem in main trunk (SVN 8189):
The following Oops occurred when unloading the openib driver. I unloaded the driver immediately after a reboot
(the driver had been loaded during the boot sequence). I did NOT run opensm before unloading the driver.
Evidently, ipoib still had an SA query (a multicast member record request) outstanding when the ipoib module was
unloaded (modprobe -r). After the ipoib module was unloaded (or at least rendered inaccessible), the ib_sa module
called "ib_sa_mcmember_rec_callback", which in turn invoked the completion callback that ipoib had registered. That
callback address belonged to the unloaded ipoib module, hence the Oops below.
The "modprobe" process in the trace below is "modprobe -r ib_sa" (after unloading ib_ipoib, we attempted to unload ib_sa).
Following the Oops, I've included info on the running environment.
Jack
===============================================
Jun 26 10:19:56 sw134 ifdown: ib0 device: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
Jun 26 10:19:58 sw134 kernel: Unable to handle kernel paging request at ffffffff883219dd RIP:
Jun 26 10:19:58 sw134 kernel: [<ffffffff883219dd>]
Jun 26 10:19:58 sw134 kernel: PGD 103027 PUD 105027 PMD 7bd53067 PTE 0
Jun 26 10:19:58 sw134 kernel: Oops: 0010 [1] SMP
Jun 26 10:19:58 sw134 kernel: last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Jun 26 10:19:58 sw134 kernel: CPU 2
Jun 26 10:19:58 sw134 kernel: Modules linked in: autofs4 ipv6 ib_sa ib_uverbs ib_umad nfs lockd nfs_acl sunrpc ib_mthca ib_mad ib_core af_packet button battery ac apparmor aamatch_pcre loop dm_mod hw_random shpchp ehci_hcd uhci_hcd i8xx_tco usbcore pci_hotplug e1000 i2c_i801 i2c_core ide_cd cdrom floppy ext3 jbd sg edd fan thermal processor ata_piix libata piix sd_mod scsi_mod ide_disk ide_core
Jun 26 10:19:58 sw134 kernel: Pid: 4457, comm: modprobe Tainted: G U 2.6.16.16-1.6-smp #1
Jun 26 10:19:58 sw134 kernel: RIP: 0010:[<ffffffff883219dd>] [<ffffffff883219dd>]
Jun 26 10:19:58 sw134 kernel: RSP: 0018:ffff81007163dd90 EFLAGS: 00010246
Jun 26 10:19:58 sw134 kernel: RAX: 0000000000000005 RBX: ffff81007d78be00 RCX: ffffffff8831747f
Jun 26 10:19:58 sw134 kernel: RDX: ffff81007dec3000 RSI: 0000000000000000 RDI: 00000000fffffffc
Jun 26 10:19:58 sw134 kernel: RBP: ffff810079960fd0 R08: 0000000000000206 R09: 0000000000000002
Jun 26 10:19:58 sw134 kernel: R10: ffff810001029400 R11: 0000000000000000 R12: 00000000fffffffc
Jun 26 10:19:58 sw134 kernel: R13: 0000000000000000 R14: 00000000005182a8 R15: 0000000000000000
Jun 26 10:19:58 sw134 kernel: FS: 00002ba7037ef6d0(0000) GS:ffff81007e3ab340(0000) knlGS:0000000000000000
Jun 26 10:19:58 sw134 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 26 10:19:58 sw134 kernel: CR2: ffffffff883219dd CR3: 0000000072da0000 CR4: 00000000000006e0
Jun 26 10:19:58 sw134 ifdown: ib0
Jun 26 10:19:58 sw134 kernel: Process modprobe (pid: 4457, threadinfo ffff81007163c000, task ffff81006fcb7040)
Jun 26 10:19:58 sw134 ifdown: Interface not available and no configuration found.
Jun 26 10:19:58 sw134 kernel: Stack: ffffffff883174bf 0000000000000bd4 000000027163de78 ffff81007163de80
Jun 26 10:19:58 sw134 kernel: ffff81007163de78 ffff81007d810790 ffff81007163de68 0000000000000001
Jun 26 10:19:59 sw134 kernel: 0000000000000000 ffff81007d78be00
Jun 26 10:19:59 sw134 kernel: Call Trace: <ffffffff883174bf>{:ib_sa:ib_sa_mcmember_rec_callback+64}
Jun 26 10:19:59 sw134 kernel: <ffffffff883172ae>{:ib_sa:send_handler+72} <ffffffff8824e387>{:ib_mad:ib_unregister_mad_agent+345}
Jun 26 10:19:59 sw134 kernel: <ffffffff802cdb65>{wait_for_completion+155} <ffffffff801e86af>{find_next_bit+85}
Jun 26 10:19:59 sw134 kernel: <ffffffff8831703a>{:ib_sa:ib_sa_remove_one+58} <ffffffff8823b2b9>{:ib_core:ib_unregister_client+47}
Jun 26 10:19:59 sw134 kernel: <ffffffff88317df8>{:ib_sa:ib_sa_cleanup+16} <ffffffff8014a9d8>{sys_delete_module+540}
Jun 26 10:19:59 sw134 kernel: <ffffffff80167ccc>{do_munmap+619} <ffffffff801e6fe3>{__up_write+33}
Jun 26 10:19:59 sw134 kernel: <ffffffff8010a7be>{system_call+126}
Jun 26 10:19:59 sw134 kernel:
Jun 26 10:19:59 sw134 kernel: Code: Bad RIP value.
Jun 26 10:19:59 sw134 kernel: RIP [<ffffffff883219dd>] RSP <ffff81007163dd90>
Jun 26 10:19:59 sw134 kernel: CR2: ffffffff883219dd
Jun 26 10:20:01 sw134 /usr/sbin/cron[4615]: (root) CMD (/mswg/projects/test_suite2/etc/check_daemon.csh >/dev/null)
===================================
Host information given below:
*************************************************************
Host Architecture : x86_64
Linux Distribution: SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10
Kernel Version : 2.6.16.16-1.6-smp
Memory size : 2060956 kB
Driver Version : openib_gen2-20060625-1800 (REV=8189)
HCA ID(s) : mthca0
HCA model(s) : 25204
FW version(s) : 1.0.800
Board(s) : MT_0230000001
*************************************************************