[ofa-general] Next set of mthca issues
Pradeep Satyanarayana
pradeep at us.ibm.com
Mon Apr 16 17:18:36 PDT 2007
Here is the stack trace that I see after upgrading to the latest FW version
(3.5). The FW version is no longer displayed in /var/log/messages. Is that
because the FW version is now at the "expected level"?
In any case, /sys/class/infiniband/mthca0/fw_ver does report 3.5.
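For anyone who wants to repeat the sysfs check in a script, here is a minimal sketch. The helper name read_fw_ver and its sysfs_root parameter are only illustrative. The only thing taken from this report is the path /sys/class/infiniband/mthca0/fw_ver.

```python
from pathlib import Path


def read_fw_ver(device="mthca0", sysfs_root="/sys/class/infiniband"):
    """Return the firmware version string exposed in sysfs, or None.

    read_fw_ver and sysfs_root are illustrative names, not part of any
    driver or library API.
    """
    path = Path(sysfs_root) / device / "fw_ver"
    try:
        # The attribute is a short text file; strip the trailing newline.
        return path.read_text().strip()
    except OSError:
        # Device absent, driver not loaded, or attribute unreadable.
        return None


if __name__ == "__main__":
    print(read_fw_ver())
```

The function returns None rather than raising, so the script can also run on a machine where the driver is not loaded.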
ping seems to work fine, but I run into problems with netperf, especially
when it is the receiver (i.e. running netserver).
I am running these tests on a ppc64 machine.
Pradeep
pradeep at us.ibm.com
Apr 16 19:37:49 elm3b37 kernel: ib_mthca: Mellanox InfiniBand HCA driver v0.08 (February 14, 2006)
Apr 16 19:37:49 elm3b37 kernel: ib_mthca: Initializing 0002:d9:00.0
Apr 16 19:37:53 elm3b37 kernel: ADDRCONF(NETDEV_UP): ib1: link is not ready
Apr 16 19:38:02 elm3b37 kernel: ib0: enabling connected mode will cause multicast packet drops
Apr 16 19:38:05 elm3b37 kernel: ib0: mtu > 2044 will cause multicast packet drops.
Apr 16 19:46:25 elm3b37 kernel: Call Trace:
Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3BB0] [C00000000000F884] .show_stack+0x54/0x1f0 (unreliable)
Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3C60] [C0000000000415EC] .eeh_dn_check_failure+0x2bc/0x320
Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3D10] [C0000000000416E4] .eeh_check_failure+0x94/0x170
Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3D90] [D00000000025ACEC] .mthca_tavor_interrupt+0x1cc/0x1e0 [ib_mthca]
Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3E50] [C00000000008C180] .handle_IRQ_event+0x70/0x100
Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3EF0] [C00000000008EAB0] .handle_fasteoi_irq+0xd0/0x200
Apr 16 19:46:25 elm3b37 kernel: [C00000000FFF3F90] [C000000000028638] .call_handle_irq+0x1c/0x2c
Apr 16 19:46:25 elm3b37 kernel: [C0000000EB57FA50] [C00000000000CCA0] .do_IRQ+0xc0/0x1e0
Apr 16 19:46:25 elm3b37 kernel: [C0000000EB57FAE0] [C000000000004270] hardware_interrupt_entry+0x18/0x28
Apr 16 19:46:25 elm3b37 kernel: --- Exception: 501 at .pseries_dedicated_idle_sleep+0xd4/0x1a0
Apr 16 19:46:25 elm3b37 kernel: LR = .pseries_dedicated_idle_sleep+0xd0/0x1a0
Apr 16 19:46:25 elm3b37 kernel: [C0000000EB57FDD0] [0000000000000000] .__start+0x4000000000000000/0x8 (unreliable)
Apr 16 19:46:25 elm3b37 kernel: [C0000000EB57FE70] [C00000000001200C] .cpu_idle+0x13c/0x250
Apr 16 19:46:25 elm3b37 kernel: [C0000000EB57FF00] [C00000000002B16C] .start_secondary+0x14c/0x190
Apr 16 19:46:25 elm3b37 kernel: [C0000000EB57FF90] [C000000000008364] .start_secondary_prolog+0xc/0x10
Apr 16 19:46:25 elm3b37 kernel: EEH: Detected PCI bus error on device 0002:d9:00.0
Apr 16 19:46:25 elm3b37 kernel: EEH: This PCI device has failed 1 times since last reboot: location=U7879.001.DQD1EKZ-P1-C2 driver=ib_mthca pci addr=0002:d9:00.0
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: Catastrophic error detected: unknown error
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[00]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[01]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[02]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[03]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[04]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[05]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[06]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[07]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[08]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[09]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[0a]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[0b]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[0c]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[0d]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[0e]: ffffffff
Apr 16 19:46:30 elm3b37 kernel: ib_mthca 0002:d9:00.0: buf[0f]: ffffffff
Apr 16 19:46:35 elm3b37 kernel: ib_mthca 0002:d9:00.0: HW2SW_MPT failed (-11)
Apr 16 19:47:05 elm3b37 last message repeated 3 times
Apr 16 19:47:05 elm3b37 last message repeated 3 times
Apr 16 19:47:15 elm3b37 kernel: ib0: ib_detach_mcast failed (result = -11)
Apr 16 19:47:15 elm3b37 kernel: ib0: ipoib_mcast_detach failed (result = -11)
Apr 16 19:47:25 elm3b37 kernel: ib0: ib_detach_mcast failed (result = -11)
Apr 16 19:47:25 elm3b37 kernel: ib0: ipoib_mcast_detach failed (result = -11)
Apr 16 19:47:35 elm3b37 kernel: ib0: ib_detach_mcast failed (result = -11)
Apr 16 19:47:35 elm3b37 kernel: ib0: ipoib_mcast_detach failed (result = -11)
Apr 16 19:47:45 elm3b37 kernel: ib0: ib_detach_mcast failed (result = -11)
Apr 16 19:47:45 elm3b37 kernel: ib0: ipoib_mcast_detach failed (result = -11)
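
Two things in the log above can be checked mechanically. Every word of the catastrophic error buffer reads ffffffff, which is what reads from a PCI device that has stopped responding normally return, and which fits the EEH error reported just before it. The failing commands also all return -11, which is EAGAIN in Linux errno numbering. The sketch below is only a rough aid for scanning such logs. The function names and regular expressions are illustrative and tolerate the line wrapping seen in mail archives.

```python
import errno
import re

# Matches lines such as "buf[0a]: ffffffff" from the mthca error dump.
BUF_RE = re.compile(r"buf\[[0-9a-f]{2}\]:\s*([0-9a-f]{8})")
# Matches "(-11)" and "(result = -11)", even when wrapped across lines.
ERR_RE = re.compile(r"\((?:result\s*=\s*)?-(\d+)\)")


def buffer_all_ones(log_text):
    """True if a buf[] dump is present and every word is ffffffff."""
    words = BUF_RE.findall(log_text)
    return bool(words) and all(w == "ffffffff" for w in words)


def error_codes(log_text):
    """Sorted list of distinct positive errno values found in the log."""
    return sorted({int(code) for code in ERR_RE.findall(log_text)})


if __name__ == "__main__":
    sample = (
        "ib_mthca 0002:d9:00.0: buf[00]: ffffffff\n"
        "ib_mthca 0002:d9:00.0: buf[01]: ffffffff\n"
        "ib_mthca 0002:d9:00.0: HW2SW_MPT failed\n(-11)\n"
        "ib0: ib_detach_mcast failed (result = -11)\n"
    )
    print("error buffer all ones:", buffer_all_ones(sample))
    for code in error_codes(sample):
        # errno names are platform dependent; on Linux 11 is EAGAIN.
        print(code, errno.errorcode.get(code, "unknown"))
```

An all-ones buffer means the dump carries no real diagnostic data from the HCA, so the driver's "unknown error" classification is expected in that case.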