[openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop

Wed Oct 11 10:10:09 PDT 2006

http://openib.org/bugzilla/show_bug.cgi?id=263

sweitzen at cisco.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rolandd at cisco.com
         OS/Version|SLES 10                     |All

------- Comment #1 from sweitzen at cisco.com  2006-10-11 10:10 -------
I tried OFED 1.1 rc7 on RHEL4 U3 x86_64, using two hosts, each with dual-port
HCAs. I am running a script in a loop that turns IB ports on a Cisco IB switch
off and back on, so that IPoIB fails over every 20 seconds on one of the
hosts. I ran ping and netserver on host 1, and netperf on host 2. After a few
hours, host 1 gets an Oops:

ib1: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
general protection fault: 0000 [1] SMP
CPU 1
Modules linked in: ib_sdp(U) rdma_ucm(U) rdma_cm(U) ib_addr(U) ib_ipoib<7>Losing some ticks... checking if CPU frequency changed.
(U) ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U)
ib_core(U) md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd
nfs_acl sunrpc ds yenta_socket pcmcia_core dm_mirror dm_multipath dm_mod button
battery ac uhci_hcd ehci_hcd hw_random shpchp e1000 floppy sg ext3 jbd aic79xx
sd_mod scsi_mod
Pid: 7155, comm: ib_mad1 Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<ffffffff8030596b>] <ffffffff8030596b>{_spin_lock_irqsave+12}<4>warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip mwait_idle+0x56/0x7c

RSP: 0018:00000101bccd1c58  EFLAGS: 00010086
RAX: 00000101bccd1cb8 RBX: 0000ffff1b60167f RCX: ffffffffa00e547d
RDX: dead4ead00000001 RSI: 0000000000000000 RDI: 0000ffff1b60167f
RBP: 00000101b9c0f480 R08: 0000000000000003 R09: 00000101b9c0f4a0
R10: ffffffff8040a900 R11: ffffffff8040a900 R12: 0000ffff1b60167f
R13: 00000000fffffffc R14: 0000000000000000 R15: 0000ffff1b6012ff
FS:  0000000000000000(0000) GS:ffffffff804d7b80(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000003e2678f4b0 CR3: 00000000bff28000 CR4: 00000000000006e0
Process ib_mad1 (pid: 7155, threadinfo 00000101bccd0000, task 00000101b94cd030)
Stack: 0000000000000000 0000000000000286 0000000000000000 ffffffffa011195b
       00000101beca7000 0000000000000002 00000101beca7380 0000000000000246
       0000000000000246 ffffffff802ab017
Call Trace:<ffffffffa011195b>{:ib_ipoib:path_rec_completion+450}
       <ffffffff802ab017>{dev_queue_xmit+525}
       <ffffffffa00e54bd>{:ib_sa:ib_sa_path_rec_callback+64}
       <ffffffffa00e5a56>{:ib_sa:send_handler+74}
       <ffffffffa00db763>{:ib_mad:ib_mad_complete_send_wr+418}
       <ffffffffa00dbce5>{:ib_mad:ib_mad_completion_handler+979}
       <ffffffffa00db912>{:ib_mad:ib_mad_completion_handler+0}
       <ffffffff80146e1e>{worker_thread+419}
       <ffffffff801333c8>{default_wake_function+0}
       <ffffffff801333c8>{default_wake_function+0}
       <ffffffff8014aabc>{keventd_create_kthread+0}
       <ffffffff80146c7b>{worker_thread+0}
       <ffffffff8014aabc>{keventd_create_kthread+0}
       <ffffffff8014aa93>{kthread+200}
       <ffffffff80110e17>{child_rip+8}
       <ffffffff8014aabc>{keventd_create_kthread+0}
       <ffffffff8014a9cb>{kthread+0}
       <ffffffff80110e0f>{child_rip+0}

Code: 81 7f 04 ad 4e ad de 74 1f 48 8b 74 24 18 48 c7 c7 ed f2 31
RIP <ffffffff8030596b>{_spin_lock_irqsave+12} RSP <00000101bccd1c58>
 <0>Kernel panic - not syncing: Oops
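
One possible reading of the register dump, offered as an editor's sketch and not as a diagnosis from the report: RIP is in _spin_lock_irqsave, called from path_rec_completion, and RBX, RDI and R12 all hold 0x0000ffff1b60167f. The first bytes of the Code: line (81 7f 04 ad 4e ad de) encode cmpl $0xdead4ead,0x4(%rdi), which looks like the debug-spinlock magic check reading the lock through RDI. That pointer is not a canonical x86-64 address (bits 63..47 are not all equal), and touching a non-canonical address raises a general protection fault rather than a page fault, which would match the "general protection fault" line. The helper names below are hypothetical, written only for this note; the sketch assumes 48-bit virtual addresses and decodes only the single instruction form seen here.

```python
"""Sketch: sanity-check two values from the ib_mad1 oops register dump.

Assumes 48-bit x86-64 virtual addresses; the decoder is NOT a general
x86 disassembler and only handles the one instruction form in the Code: line.
"""

REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits-1) of addr are all 0 or all 1 (sign extension)."""
    top = addr >> (va_bits - 1)                  # the 65 - va_bits high bits
    return top == 0 or top == (1 << (65 - va_bits)) - 1


def decode_cmp_imm32_disp8(code: bytes) -> str:
    """Decode only '81 /7 id' with ModRM mod=01 (disp8), no REX and no SIB.

    That form is CMP r/m32, imm32, i.e. AT&T 'cmpl $imm32,disp8(%reg)'.
    Any other encoding raises ValueError.
    """
    if len(code) < 7 or code[0] != 0x81:
        raise ValueError("not an 0x81 group-1 instruction")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    # reg field 7 selects CMP; rm == 4 would mean a SIB byte follows
    if mod != 0b01 or reg != 7 or rm == 4:
        raise ValueError("unsupported ModRM form")
    disp = code[2] - 0x100 if code[2] >= 0x80 else code[2]  # signed disp8
    imm = int.from_bytes(code[3:7], "little")               # imm32, little-endian
    sign = "-" if disp < 0 else ""
    return f"cmpl $0x{imm:x},{sign}0x{abs(disp):x}(%{REGS64[rm]})"


if __name__ == "__main__":
    rdi = 0x0000FFFF1B60167F   # RDI (also RBX and R12) from the oops
    rip = 0xFFFFFFFF8030596B   # RIP: _spin_lock_irqsave+12
    print(hex(rdi), "canonical:", is_canonical(rdi))
    print(hex(rip), "canonical:", is_canonical(rip))
    # First seven bytes of the "Code:" line, starting at RIP
    print(decode_cmp_imm32_disp8(bytes.fromhex("817f04ad4eadde")))
```

Run as a script, it prints `0xffff1b60167f canonical: False`, then `0xffffffff8030596b canonical: True`, then `cmpl $0xdead4ead,0x4(%rdi)`. Whether RDI held a stale or corrupted lock pointer from an IPoIB path entry is not established by the log alone.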
