[openib-general] Kernel Oops related to IPoIB (multicast module?)

Jack Morgenstein jackm at mellanox.co.il
Mon Jun 26 00:51:12 PDT 2006


Problem in main trunk (SVN 8189):

The following Oops occurred upon unloading the openib driver.  I unloaded the driver immediately following a reboot
(the driver had been loaded during the boot sequence).  I did NOT run opensm before unloading the driver.

Evidently, ipoib was still attempting to contact an SA when the ipoib module was unloaded (modprobe -r).
After the ipoib module was unloaded (or at least rendered inaccessible), the ib_sa module, from within
"ib_sa_mcmember_rec_callback", attempted to invoke a callback whose address was part of the unloaded ipoib module.
Hence the Oops below.
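
To illustrate the ordering problem described above, here is a minimal userspace sketch in C. It is not the ib_sa or ipoib code: every name in it (submit_query, cancel_query, flush_queries, consumer_callback, demo_safe_unload) is hypothetical and only models a provider that holds callback pointers owned by a consumer. It assumes the fix direction is for the consumer to drain its outstanding queries before it goes away, so that the provider's own teardown finds nothing left to call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are the ib_sa API. */
typedef void (*query_cb)(int status, void *context);

struct pending_query {
    query_cb callback;   /* points into the consumer's code */
    void *context;
    int active;
};

#define MAX_QUERIES 4
static struct pending_query queries[MAX_QUERIES];

/* Provider side: remember a callback for an outstanding request. */
static int submit_query(query_cb cb, void *context)
{
    for (int i = 0; i < MAX_QUERIES; i++) {
        if (!queries[i].active) {
            queries[i].callback = cb;
            queries[i].context = context;
            queries[i].active = 1;
            return i;
        }
    }
    return -1;
}

/* Provider side: complete one query while its owner is still present. */
static void cancel_query(int id)
{
    if (id < 0 || id >= MAX_QUERIES || !queries[id].active)
        return;
    queries[id].active = 0;
    queries[id].callback(-1, queries[id].context);   /* -1 stands for "cancelled" */
}

/* Provider teardown: complete whatever is left; returns callbacks invoked. */
static int flush_queries(void)
{
    int fired = 0;
    for (int i = 0; i < MAX_QUERIES; i++) {
        if (queries[i].active) {
            queries[i].active = 0;
            queries[i].callback(-1, queries[i].context);
            fired++;
        }
    }
    return fired;
}

/* Consumer side: counts completions it has seen. */
static int consumer_completions;

static void consumer_callback(int status, void *context)
{
    (void)status;
    (void)context;
    consumer_completions++;
}

/* Returns how many callbacks the provider still had to run after the
 * consumer's teardown. Zero means no call into "unloaded" code. */
static int demo_safe_unload(void)
{
    int id = submit_query(consumer_callback, NULL);
    assert(id >= 0);
    cancel_query(id);        /* consumer teardown: drain before "unload" */
    return flush_queries();  /* provider teardown: nothing left to call */
}
```

In this model, skipping the cancel_query() step would leave the entry active, and flush_queries() would then call through a pointer that, in the real case, refers to module text that no longer exists.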

The "modprobe" process in the trace below is "modprobe -r ib_sa" (after unloading ib_ipoib, we attempt to unload ib_sa).
Following the Oops, I've included info on the running environment.

Jack

===============================================

Jun 26 10:19:56 sw134 ifdown:     ib0       device: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
Jun 26 10:19:58 sw134 kernel: Unable to handle kernel paging request at ffffffff883219dd RIP:
Jun 26 10:19:58 sw134 kernel: [<ffffffff883219dd>]
Jun 26 10:19:58 sw134 kernel: PGD 103027 PUD 105027 PMD 7bd53067 PTE 0
Jun 26 10:19:58 sw134 kernel: Oops: 0010 [1] SMP
Jun 26 10:19:58 sw134 kernel: last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
Jun 26 10:19:58 sw134 kernel: CPU 2
Jun 26 10:19:58 sw134 kernel: Modules linked in: autofs4 ipv6 ib_sa ib_uverbs ib_umad nfs lockd nfs_acl sunrpc ib_mthca ib_mad ib_core
af_packet button battery ac apparmor aamatch_pcre loop dm_mod hw_random shpchp ehci_hcd uhci_hcd i8xx_tco usbcore pci_hotplug e1000 i2c_i801
i2c_core ide_cd cdrom floppy ext3 jbd sg edd fan thermal processor ata_piix libata piix sd_mod scsi_mod ide_disk ide_core
Jun 26 10:19:58 sw134 kernel: Pid: 4457, comm: modprobe Tainted: G     U 2.6.16.16-1.6-smp #1
Jun 26 10:19:58 sw134 kernel: RIP: 0010:[<ffffffff883219dd>] [<ffffffff883219dd>]
Jun 26 10:19:58 sw134 kernel: RSP: 0018:ffff81007163dd90  EFLAGS: 00010246
Jun 26 10:19:58 sw134 kernel: RAX: 0000000000000005 RBX: ffff81007d78be00 RCX: ffffffff8831747f
Jun 26 10:19:58 sw134 kernel: RDX: ffff81007dec3000 RSI: 0000000000000000 RDI: 00000000fffffffc
Jun 26 10:19:58 sw134 kernel: RBP: ffff810079960fd0 R08: 0000000000000206 R09: 0000000000000002
Jun 26 10:19:58 sw134 kernel: R10: ffff810001029400 R11: 0000000000000000 R12: 00000000fffffffc
Jun 26 10:19:58 sw134 kernel: R13: 0000000000000000 R14: 00000000005182a8 R15: 0000000000000000
Jun 26 10:19:58 sw134 kernel: FS:  00002ba7037ef6d0(0000) GS:ffff81007e3ab340(0000) knlGS:0000000000000000
Jun 26 10:19:58 sw134 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jun 26 10:19:58 sw134 kernel: CR2: ffffffff883219dd CR3: 0000000072da0000 CR4: 00000000000006e0
Jun 26 10:19:58 sw134 ifdown:     ib0
Jun 26 10:19:58 sw134 kernel: Process modprobe (pid: 4457, threadinfo ffff81007163c000, task ffff81006fcb7040)
Jun 26 10:19:58 sw134 ifdown: Interface not available and no configuration found.
Jun 26 10:19:58 sw134 kernel: Stack: ffffffff883174bf 0000000000000bd4 000000027163de78 ffff81007163de80
Jun 26 10:19:58 sw134 kernel:        ffff81007163de78 ffff81007d810790 ffff81007163de68 0000000000000001
Jun 26 10:19:59 sw134 kernel:        0000000000000000 ffff81007d78be00
Jun 26 10:19:59 sw134 kernel: Call Trace: <ffffffff883174bf>{:ib_sa:ib_sa_mcmember_rec_callback+64}
Jun 26 10:19:59 sw134 kernel:        <ffffffff883172ae>{:ib_sa:send_handler+72} <ffffffff8824e387>{:ib_mad:ib_unregister_mad_agent+345}
Jun 26 10:19:59 sw134 kernel:        <ffffffff802cdb65>{wait_for_completion+155} <ffffffff801e86af>{find_next_bit+85}
Jun 26 10:19:59 sw134 kernel:        <ffffffff8831703a>{:ib_sa:ib_sa_remove_one+58} <ffffffff8823b2b9>{:ib_core:ib_unregister_client+47}
Jun 26 10:19:59 sw134 kernel:        <ffffffff88317df8>{:ib_sa:ib_sa_cleanup+16} <ffffffff8014a9d8>{sys_delete_module+540}
Jun 26 10:19:59 sw134 kernel:        <ffffffff80167ccc>{do_munmap+619} <ffffffff801e6fe3>{__up_write+33}
Jun 26 10:19:59 sw134 kernel:        <ffffffff8010a7be>{system_call+126}
Jun 26 10:19:59 sw134 kernel:
Jun 26 10:19:59 sw134 kernel: Code:  Bad RIP value.
Jun 26 10:19:59 sw134 kernel: RIP [<ffffffff883219dd>] RSP <ffff81007163dd90>
Jun 26 10:19:59 sw134 kernel: CR2: ffffffff883219dd
Jun 26 10:20:01 sw134 /usr/sbin/cron[4615]: (root) CMD (/mswg/projects/test_suite2/etc/check_daemon.csh >/dev/null)
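
For readers decoding the "Oops: 0010" line in the trace above, here is a small C sketch that interprets the value as an x86 page-fault error code, using the architectural bit layout (bit 0 protection, bit 1 write, bit 2 user, bit 3 reserved bit, bit 4 instruction fetch). The enum names and the describe_pf_error helper are hypothetical, chosen for this example. For 0x10 the result is an instruction fetch from a non-present page in kernel mode, which is consistent with CR2 being equal to RIP and with "PTE 0" in the log.

```c
#include <stdio.h>
#include <string.h>

/* x86 page-fault error code bits (names chosen for this sketch). */
enum {
    PF_PROT  = 1 << 0,  /* 0: page not present, 1: protection violation */
    PF_WRITE = 1 << 1,  /* 0: read access,      1: write access */
    PF_USER  = 1 << 2,  /* 0: kernel mode,      1: user mode */
    PF_RSVD  = 1 << 3,  /* 1: reserved bit set in a paging entry */
    PF_INSTR = 1 << 4   /* 1: fault was an instruction fetch */
};

/* Write a human-readable description of the error code into buf. */
static void describe_pf_error(unsigned long code, char *buf, size_t len)
{
    const char *access = (code & PF_INSTR) ? "instruction fetch"
                       : (code & PF_WRITE) ? "write"
                                           : "read";

    snprintf(buf, len, "%s, %s, %s mode%s",
             (code & PF_PROT) ? "protection violation" : "page not present",
             access,
             (code & PF_USER) ? "user" : "kernel",
             (code & PF_RSVD) ? ", reserved bit set" : "");
}
```

Calling describe_pf_error(0x10, buf, sizeof buf) fills buf with "page not present, instruction fetch, kernel mode".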

===================================
Host information given below:
*************************************************************
Host Architecture : x86_64
Linux Distribution: SUSE Linux Enterprise Server 10 (x86_64) VERSION = 10
Kernel Version    : 2.6.16.16-1.6-smp
Memory size       : 2060956 kB
Driver Version    : openib_gen2-20060625-1800 (REV=8189)
HCA ID(s)         : mthca0
HCA model(s)      : 25204
FW version(s)     : 1.0.800
Board(s)          : MT_0230000001
*************************************************************



