[openib-general] [Bug 263] OFED 1.1 rc6: IPoIB Oops during IPoIB failover loop

bugzilla-daemon at openib.org bugzilla-daemon at openib.org
Wed Oct 18 09:56:23 PDT 2006


http://openib.org/bugzilla/show_bug.cgi?id=263





------- Comment #11 from sweitzen at cisco.com  2006-10-18 09:56 -------
Roland, I enabled debug_level=1 with OFED 1.1 rc7 on RHEL4 U3 x86_64 and got
the same crash (on the netserver machine).

I could only see the debug_level=1 info by running dmesg in a loop, and the
info did not get saved into any /var/log files.  Is there some extra
configuration needed for syslog?  Shouldn't IPoIB debug_level=1 info go into a
syslog file by default?
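
One way to keep the output of such a dmesg loop across a crash is sketched below. It is only an illustration, not part of OFED or of the test setup described here: the function names, the log path and the one-second interval are all assumptions. The script polls dmesg, works out which lines are new since the previous snapshot, and appends them to a file with an fsync after every pass so that the last lines before the oops have a chance of reaching disk.

```python
import os
import subprocess
import time


def new_lines(previous, current):
    """Return the lines in `current` that were not in `previous`.

    Both arguments are lists of lines from successive dmesg snapshots.
    The kernel ring buffer drops old lines from the front, so the new
    snapshot normally starts with a tail of the old one. We look for the
    longest such overlap. This is a heuristic: without timestamps, runs of
    identical lines can make the overlap ambiguous.
    """
    for k in range(min(len(previous), len(current)), 0, -1):
        if previous[-k:] == current[:k]:
            return current[k:]
    # No overlap found (first pass, or the buffer wrapped completely).
    return list(current)


def capture_loop(log_path, interval=1.0):
    """Poll dmesg forever and append new lines to `log_path` (hypothetical path)."""
    previous = []
    with open(log_path, "a") as log:
        while True:
            result = subprocess.run(
                ["dmesg"], capture_output=True, text=True, check=True
            )
            current = result.stdout.splitlines()
            for line in new_lines(previous, current):
                log.write(line + "\n")
            # Push the data to disk so it survives a panic shortly after.
            log.flush()
            os.fsync(log.fileno())
            previous = current
            time.sleep(interval)


# To start capturing, call for example:
#   capture_loop("/var/tmp/ipoib-debug.log")
```

Even with the fsync, a panic can still lose the final second of messages, so a serial console remains the more reliable capture for the oops itself.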

Here's what I saw from the dmesg loop right before the crash.

ib1: Port state change event
ib0: Port state change event
ib1: Port state change event
ib0: flushing
ib0: downing ib_dev
ib1: flushing
ib1: downing ib_dev
ib0: Created ah 00000101beffa800
ib1: Created ah 00000101be636800
ib0: Created ah 00000101be5724c0
ib1: Created ah 00000101be9c8a80
ib0: Created ah 00000101bfc57100
ib1: Created ah 00000101be49f700
ib0: Created ah 00000101beffa3c0
ib1: Created ah 00000101beffae80
ib0: Created ah 00000101be636b40
ib1: Created ah 000001019dfecd40
ib0: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:0861 MTU > 1024
ib0: PathRec LID 0x0006 for GID fe80:0000:0000:0000:0005:ad00:0020:0861
ib0: Created ah 000001019dfec600
ib0: created address handle 000001019dfecac0 for LID 0x0006, SL 0
ib0: Port state change event
ib1: Port state change event
ib0: flushing
ib0: downing ib_dev
ib1: flushing
ib1: downing ib_dev
ib0: Start path record lookup for fe80:0000:0000:0000:0005:ad00:0020:0861 MTU > 1024
ib0: PathRec LID 0x0006 for GID fe80:0000:0000:0000:0005:ad00:0020:0861
ib0: Created ah 00000101beffa300
ib0: created address handle 000001019dfec1c0 for LID 0x0006, SL 0
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: Created ah 00000101bfc55e80
ib0: Created ah 00000101bfc4cc80
ib0: Created ah 000001019dfec480
ib0: Created ah 000001019dfec3c0
ib0: Created ah 000001019dfec100
Tue Oct 17 01:05:42 PDT 2006

Message from syslogd at svbu-qa-pcie-1 at Tue Oct 17 01:05:43 2006 ...
svbu-qa-pcie-1 kernel: general protection fault: 0000 [1] SMP


Here's the serial console output from the netserver machine.

ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
ib0: dev_queue_xmit failed to requeue packet
general protection fault: 0000 [1] SMP
CPU 0
Modules linked in: rdma_ucm(U) rdma_cm(U) ib_addr(U) ib_ipoib(U) ib_mthca(U)
ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) md5
ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core nfs lockd nfs_acl sunrpc ds
yenta_socket pcmcia_core dm_mirror dm_multipath dm_mod button battery ac
uhci_hcd ehci_hcd hw_random shpchp e1000 floppy sg ext3 jbd aic79xx sd_mod
scsi_mod
<7>Losing some ticks... checking if CPU frequency changed.
Pid: 7838, comm: ib_mad1 Not tainted 2.6.9-34.ELsmp
RIP: 0010:[<ffffffffa01c384b>] <ffffffffa01c384b>{:ib_ipoib:path_rec_completion+178}
RSP: 0018:00000101a756bc70  EFLAGS: 00010202
warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip mwait_idle+0x56/0x7c
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000101bbeffc80 RSI: 0000000000000000 RDI: 00000000fffffffc
RBP: 00000101bbeffc80 R08: 0000000000000003 R09: 00000101bbeffca0
R10: ffffffff8011dfe0 R11: ffffffff8011dfe0 R12: 0000ffff1b60167f
R13: 00000000fffffffc R14: 0000000000000000 R15: 0000ffff1b6012ff
FS:  0000000000000000(0000) GS:ffffffff804d7b00(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006cf5e8 CR3: 0000000000101000 CR4: 00000000000006e0
Process ib_mad1 (pid: 7838, threadinfo 00000101a756a000, task 00000101bdc3b030)
Stack: ffffffffa00e547d 00000101afda5000 0000000000000002 00000101afda5380
       0000000000000246 0000000000000246 ffffffff802ab017 00000101bc16a500
       00000101bbeffca0 00000101bbeffc80
Call Trace:<ffffffffa00e547d>{:ib_sa:ib_sa_path_rec_callback+0}
       <ffffffff802ab017>{dev_queue_xmit+525}
       <ffffffffa01c3b0e>{:ib_ipoib:path_rec_completion+885}
       <ffffffffa00e54bd>{:ib_sa:ib_sa_path_rec_callback+64}
       <ffffffffa00e5a56>{:ib_sa:send_handler+74}
       <ffffffffa00db763>{:ib_mad:ib_mad_complete_send_wr+418}
       <ffffffffa00dbce5>{:ib_mad:ib_mad_completion_handler+979}
       <ffffffffa00db912>{:ib_mad:ib_mad_completion_handler+0}
       <ffffffff80146e1e>{worker_thread+419}
       <ffffffff801333c8>{default_wake_function+0}
       <ffffffff801333c8>{default_wake_function+0}
       <ffffffff8014aabc>{keventd_create_kthread+0}
       <ffffffff80146c7b>{worker_thread+0}
       <ffffffff8014aabc>{keventd_create_kthread+0}
       <ffffffff8014aa93>{kthread+200}
       <ffffffff80110e17>{child_rip+8}
       <ffffffff8014aabc>{keventd_create_kthread+0}
       <ffffffff8014a9cb>{kthread+0}
       <ffffffff80110e0f>{child_rip+0}

Code: 49 8b 74 24 08 50 0f b6 42 16 50 0f b6 42 15 50 0f b6 42 14
RIP <ffffffffa01c384b>{:ib_ipoib:path_rec_completion+178} RSP <00000101a756bc70>
 <0>Kernel panic - not syncing: Oops




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


