[openib-general] [Bug 272] New: IPoIB: kernel Oops as a result of interface Up/Down

bugzilla-daemon at openib.org
Sun Oct 8 02:58:18 PDT 2006


http://openib.org/bugzilla/show_bug.cgi?id=272

           Summary: IPoIB: kernel Oops as a result of interface Up/Down
           Product: OpenFabrics Linux
           Version: 1.1rc7
          Platform: X86-64
        OS/Version: SLES 10
            Status: NEW
          Severity: normal
          Priority: P2
         Component: IPoIB
        AssignedTo: bugzilla at openib.org
        ReportedBy: vlad at mellanox.co.il


Setup:
Two nodes (node1 and node2), each connected to the IB switch through both of its IB ports.

To reproduce:
The IPoIB High Availability service is active on node2.
Run '/etc/init.d/opensmd restart' on node1 in an infinite loop.
After ~6 hours, the following kernel Oops was received on node2:


kernel: ib0: dev_queue_xmit failed to requeue packet
kernel: NMI Watchdog detected LOCKUP on CPU 0
kernel: CPU 0
kernel: Modules linked in: mst_pciconf mst_pci rdma_ucm rdma_cm ib_addr ib_cm
ib_ipoib ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core autofs4 ipv6 nfs lockd
nfs_acl sunrpc af_packet button battery ac apparmor aamatch_pcre loop
lug uhci_hcd ehci_hcd i2c_i801 i2c_core hw_random i8xx_tco tg3 usbcore ext3 jbd
edd fan thermal processor mptspi mptscsih mptbase scsi_transport_spi sg sr_mod
cdrom ata_piix libata sd_mod scsi_mod
kernel: Pid: 7307, comm: ib_mad2 Tainted: GU 2.6.16.21-0.8-smp #1
kernel: RIP: 0010:[<ffffffff802d011e>] <ffffffff802d011e>{.text.lock.spinlock+34}
kernel: RSP: 0018:ffff81011683dbf0  EFLAGS: 00000086
kernel: RAX: 0000000000000092 RBX: ffff8100c8ede148 RCX: ffffffff883544ee
kernel: RDX: ffff8100c8ede0c0 RSI: 0000000000000000 RDI: ffff8100c8ede150
kernel: RBP: ffff81011683dc18 R08: ffff810113ade640 R09: ffff810113d16460
kernel: R10: 000000004523d9d2 R11: 000000000000206d R12: ffff8100c8ede150
kernel: R13: ffff81011683dc78 R14: ffff810121fce500 R15: 0000000000000286
kernel: FS:  0000000000000000(0000) GS:ffffffff80445000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
kernel: CR2: 00000000005e6288 CR3: 00000000ce84a000 CR4: 00000000000006e0
kernel: Process ib_mad2 (pid: 7307, threadinfo ffff81011683c000, task ffff81012217e7d0)
kernel: Stack: ffffffff80129a15 ffff810121fce000 0000000000000000 ffff8100c8ede0c0
kernel:        ffff81011683dc78 00000000fffffffc ffffffff8835e593 ffff810121fce000
kernel:        ffff810121fce000 0000000000000000
kernel: Call Trace: <ffffffff80129a15>{complete+28} <ffffffff8835e593>{:ib_ipoib:path_rec_completion+764}
kernel:        <ffffffff8027753e>{dev_queue_xmit+545} <ffffffff8835e5e7>{:ib_ipoib:path_rec_completion+848}
kernel:        <ffffffff8835452e>{:ib_sa:ib_sa_path_rec_callback+64}
kernel:        <ffffffff80138f17>{lock_timer_base+27} <ffffffff80138f89>{try_to_del_timer_sync+81}
kernel:        <ffffffff883542b3>{:ib_sa:send_handler+72} <ffffffff8822562f>{:ib_mad:ib_mad_complete_send_wr+421}
kernel:        <ffffffff88225f00>{:ib_mad:ib_mad_completion_handler+947}
kernel:        <ffffffff88225b4d>{:ib_mad:ib_mad_completion_handler+0}
kernel:        <ffffffff80140177>{run_workqueue+153} <ffffffff8014081e>{worker_thread+0}
kernel:        <ffffffff801437e5>{keventd_create_kthread+0} <ffffffff80140927>{worker_thread+265}
kernel:        <ffffffff8012787f>{__wake_up_common+62} <ffffffff8012905a>{default_wake_function+0}
kernel:        <ffffffff801437e5>{keventd_create_kthread+0} <ffffffff80143aca>{kthread+236}
kernel:        <ffffffff8010b60a>{child_rip+8} <ffffffff801437e5>{keventd_create_kthread+0}
kernel:        <ffffffff801439de>{kthread+0} <ffffffff8010b602>{child_rip+0}
kernel:
kernel: Code: 83 3f 00 7e f9 e9 c2 fe ff ff f3 90 83 3f 00 7e f9 e9 c1 fe
kernel: console shuts up ...
kernel:  NMI Watchdog detected LOCKUP on CPU 2
kernel: CPU 2
kernel: Modules linked in: mst_pciconf mst_pci rdma_ucm rdma_cm ib_addr ib_cm
ib_ipoib ib_sa ib_uverbs ib_umad ib_mthca ib_mad ib_core autofs4 ipv6 nfs lockd
nfs_acl sunrpc af_packet button battery ac apparmor aamatch_pcre loop
lug uhci_hcd ehci_hcd i2c_i801 i2c_core hw_random i8xx_tco tg3 usbcore ext3 jbd
edd fan thermal processor mptspi mptscsih mptbase scsi_transport_spi sg sr_mod
cdrom ata_piix libata sd_mod scsi_mod
kernel: Pid: 7336, comm: ipoib Tainted: G     U 2.6.16.21-0.8-smp #1
kernel: RIP: 0010:[<ffffffff802d012d>] <ffffffff802d012d>{.text.lock.spinlock+49}
kernel: RSP: 0018:ffff810113acfd80  EFLAGS: 00000086
kernel: RAX: 0000000000000000 RBX: ffff810121fce000 RCX: 0000000000000000
kernel: RDX: 0000000000000002 RSI: ffff810121fce5e0 RDI: ffff810121fce500
kernel: RBP: ffff810121fce500 R08: 0000000000000000 R09: ffff810121fce000
kernel: R10: ffff8101235add9f R11: 0000000000000286 R12: ffff8100c8ede0c0
kernel: R13: 0000000000000000 R14: ffff810121fce500 R15: ffff810121fce000
kernel: FS:  0000000000000000(0000) GS:ffff810123e2b3c0(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
kernel: CR2: 0000000000591cf0 CR3: 00000000c0e01000 CR4: 00000000000006e0
kernel: Process ipoib (pid: 7336, threadinfo ffff810113ace000, task ffff810123042040)
kernel: Stack: ffffffff88361212 ffff810100e18780 ffff810113acfd60 ffff810113acfd60
kernel:        ffffffff88361807 0000000000000000 0000000000000246 0000ffff1b4012ff
kernel:        0100000000000000 ffff810113acfdc8
kernel: Call Trace: <ffffffff88361212>{:ib_ipoib:ipoib_mcast_start_thread+109}
kernel:        <ffffffff88361807>{:ib_ipoib:ipoib_mcast_restart_task+965}
kernel:        <ffffffff8835f982>{:ib_ipoib:ipoib_ib_dev_flush+0} <ffffffff8835fa20>{:ib_ipoib:ipoib_ib_dev_flush+158}
kernel:        <ffffffff80140177>{run_workqueue+153} <ffffffff8014081e>{worker_thread+0}
kernel:        <ffffffff801437e5>{keventd_create_kthread+0} <ffffffff80140927>{worker_thread+265}
kernel:        <ffffffff8012787f>{__wake_up_common+62} <ffffffff8012905a>{default_wake_function+0}
kernel:        <ffffffff801437e5>{keventd_create_kthread+0} <ffffffff801437e5>{keventd_create_kthread+0}
kernel:        <ffffffff80143aca>{kthread+236} <ffffffff8010b60a>{child_rip+8}
kernel:        <ffffffff801437e5>{keventd_create_kthread+0} <ffffffff801439de>{kthread+0}
kernel:        <ffffffff8010b602>{child_rip+0}
kernel:
kernel: Code: 7e f9 e9 c1 fe ff ff f3 90 83 3f 00 7e f9 e9 d2 fe ff ff e8
kernel: console shuts up ...



