[openib-general] [Bug 263] New: OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop

bugzilla-daemon at openib.org
Tue Oct 3 22:47:54 PDT 2006


http://openib.org/bugzilla/show_bug.cgi?id=263

           Summary: OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop
           Product: OpenFabrics Linux
           Version: 1.1rc6
          Platform: X86-64
        OS/Version: SLES 10
            Status: NEW
          Severity: major
          Priority: P2
         Component: IPoIB
        AssignedTo: bugzilla at openib.org
        ReportedBy: sweitzen at cisco.com


SLES10 x86_64 with dual-port LionCub HCA.

I am running a script in a loop that turns IB ports on a Cisco IB switch off
and back on, so that IPoIB fails over on a host every 30 seconds. IPoIB traffic
is running on that host at the same time.
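The report describes the test loop only in prose and does not say how the switch ports are toggled. Below is a minimal Python sketch of the schedule. It assumes a hypothetical `set_port_state(port, enabled)` hook that stands in for whatever switch CLI or management interface the real script uses. It also assumes, as its own design choice, that each cycle brings the idle port up before taking the active one down, so one path is always available and IPoIB fails over once per interval.

```python
import time
from typing import Callable, List, Sequence, Tuple


def failover_loop(
    set_port_state: Callable[[str, bool], None],  # HYPOTHETICAL switch hook
    ports: Sequence[str],                          # the two switch ports feeding ib0/ib1
    interval_s: float = 30.0,
    cycles: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, bool]]:
    """Alternate which of two switch ports is down, forcing one failover per interval.

    Assumption: the idle port is brought up before the active one is taken
    down, so the host never loses both paths at once.
    """
    if len(ports) != 2:
        raise ValueError("expected exactly two ports (one per HCA port)")
    actions: List[Tuple[str, bool]] = []
    for i in range(cycles):
        down = ports[i % 2]        # port taken down this cycle
        up = ports[(i + 1) % 2]    # the other port is (re)enabled first
        set_port_state(up, True)
        set_port_state(down, False)
        actions.append((up, True))
        actions.append((down, False))
        sleep(interval_s)          # let traffic run on the surviving path
    return actions
```

Passing `sleep` in as a parameter lets the schedule be exercised without actually waiting 30 seconds per cycle. The port names used with it are placeholders, not taken from the report.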

If I fail back and forth between ib0 and ib1 every 30 seconds or so for several
hours while IPoIB traffic is running, the host gets an Oops and IPoIB stops
working:

ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
general protection fault: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 7
Modules linked in: af_packet ib_sdp rdma_ucm rdma_cm ib_addr ib_cm ib_ipoib ib_sa
ib_uverbs ib_umad ib_mthca ib_mad ib_core nls_utf8 st ipv6 nfs lockd nfs_acl sunrpc
button battery ac apparmor aamatch_pcre loop usbhid dm_mod hw_random ide_cd
ehci_hcd uhci_hcd cdrom i8xx_tco ide_floppy usbcore shpchp e1000 pci_hotplug floppy
reiserfs edd fan thermal processor siimage sg mptspi mptscsih mptbase
scsi_transport_spi piix sd_mod scsi_mod ide_disk ide_core
Pid: 23541, comm: ib_mad1 Tainted: G     U 2.6.16.21-0.8-smp #1
RIP: 0010:[<ffffffff802cffea>] <ffffffff802cffea>{_spin_lock_irqsave+3}
RSP: 0018:ffff810132a4fc20  EFLAGS: 00010086
RAX: 0000000000000286 RBX: 0000000000000000 RCX: ffffffff883324ee
RDX: ffff810128d5e380 RSI: 0000000000000000 RDI: 0000ffff1b6017ff
RBP: 00000000fffffffc R08: ffffffff803d3260 R09: ffff810140333800
R10: ffff81000107d400 R11: 0000000000000292 R12: ffff810128d5e380
R13: ffff810132a4fc78 R14: 0000ffff1b6017ff R15: 0000000000000003
FS:  0000000000000000(0000) GS:ffff810142d19740(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b0b5e6ae180 CR3: 0000000128cbc000 CR4: 00000000000006e0
Process ib_mad1 (pid: 23541, threadinfo ffff810132a4e000, task ffff810142b56100)
Stack: ffffffff8833c5f5 ffff8101302b3000 0000ffff1b6012ff 0000000000000002
       0000000000000296 ffff8101302b3500 ffffffff8027753e ffff810128d5e3a0
       ffff81012bce1680 ffff810128d5e380
Call Trace: <ffffffff8833c5f5>{:ib_ipoib:path_rec_completion+862}
       <ffffffff8027753e>{dev_queue_xmit+545} <ffffffff8833c5b2>{:ib_ipoib:path_rec_completion+795}
       <ffffffff8833252e>{:ib_sa:ib_sa_path_rec_callback+64}
       <ffffffff80138f17>{lock_timer_base+27} <ffffffff80138f89>{try_to_del_timer_sync+81}
       <ffffffff883322b3>{:ib_sa:send_handler+72} <ffffffff8826762f>{:ib_mad:ib_mad_complete_send_wr+421}
       <ffffffff88267f00>{:ib_mad:ib_mad_completion_handler+947}
       <ffffffff88267b4d>{:ib_mad:ib_mad_completion_handler+0}
       <ffffffff80140177>{run_workqueue+153} <ffffffff8014081e>{worker_thread+0}
       <ffffffff801437e5>{keventd_create_kthread+0} <ffffffff80140927>{worker_thread+265}
       <ffffffff8012787f>{__wake_up_common+62} <ffffffff8012905a>{default_wake_function+0}
       <ffffffff801437e5>{keventd_create_kthread+0} <ffffffff80143aca>{kthread+236}
       <ffffffff8010b60a>{child_rip+8} <ffffffff801437e5>{keventd_create_kthread+0}
       <ffffffff801439de>{kthread+0} <ffffffff8010b602>{child_rip+0}

Code: f0 ff 0f 0f 88 29 01 00 00 c3 fa f0 ff 0f 0f 88 2a 01 00 00
RIP <ffffffff802cffea>{_spin_lock_irqsave+3} RSP <ffff810132a4fc20>
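The fault is in `_spin_lock_irqsave+3`. This reading assumes the `Code:` bytes start at RIP. On that basis, `f0 ff 0f` decodes as a LOCK prefix, opcode `ff` with ModRM `0f` (mod=0, reg=/1 = DEC, rm=7 = RDI), that is, `lock decl (%rdi)`. RDI holds `0000ffff1b6017ff`. The Python sketch below checks that value against the x86-64 48-bit canonical-address rule (bits 63..47 must all be equal). Dereferencing a non-canonical address raises #GP rather than a page fault, which is consistent with the "general protection fault" line. This suggests a bad lock pointer reached `_spin_lock_irqsave` from the `path_rec_completion` path, but that is an inference from the register dump, not something the report establishes.

```python
from typing import Tuple


def decode_modrm(b: int) -> Tuple[int, int, int]:
    """Split an x86 ModRM byte into (mod, reg, rm) fields."""
    return (b >> 6) & 0b11, (b >> 3) & 0b111, b & 0b111


def is_canonical_48(addr: int) -> bool:
    """True if addr is canonical under 48-bit virtual addressing on x86-64.

    Bits 63..47 (17 bits) must be all zeros (lower half) or all ones
    (upper/kernel half), i.e. a sign extension of bit 47.
    """
    top = (addr >> 47) & ((1 << 17) - 1)
    return top == 0 or top == (1 << 17) - 1


# Values copied from the register dump above.
RDI = 0x0000FFFF1B6017FF   # operand of the faulting lock decl (%rdi)
RDX = 0xFFFF810128D5E380   # an ordinary kernel pointer, for comparison
CR2 = 0x00002B0B5E6AE180   # a user-space address, for comparison

for name, value in (("RDI", RDI), ("RDX", RDX), ("CR2", CR2)):
    print(f"{name}={value:016x} canonical={is_canonical_48(value)}")
```

RDI's bit 47 is set while bits 63..48 are clear, so it falls in the non-canonical hole between the user and kernel halves. The other two values are included only as canonical reference points.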
