[openib-general] Problem with 2.4.24 and gen1

Ken MacInnis kcm at psc.edu
Mon Nov 1 04:44:07 PST 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Right.  I've had this machine (with the OS in a much more vanilla
configuration) and HCA running the OpenIB and MTI stacks just fine in the
past.  Dual Opteron, 8GB RAM, PCI-X MT23108.  The same problem happens
with this kernel on fairly different hardware we're using, too.

It is Fedora Core 1 with a vanilla 2.4.24-based kernel plus Lustre 1.2.6
patches/mods.  Almost nothing in the kernel is modular; it is either
off or compiled in.  In fact, ACPI is turned off.  Perhaps enabling it
would be beneficial?  I have attached the config file if that helps.
Perhaps there is something critical I have unknowingly disabled.
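
For anyone checking the same thing, here is a minimal Python sketch that
reports how a few interrupt-related options are set in a kernel .config.
The option names listed are an assumption about which ones are worth a
look; nothing in this thread establishes which setting, if any, is
responsible.

```python
import sys

# Assumed list of options worth checking; adjust for your own tree.
INTERRUPT_OPTIONS = (
    "CONFIG_ACPI",
    "CONFIG_X86_LOCAL_APIC",
    "CONFIG_X86_IO_APIC",
    "CONFIG_SMP",
)


def option_states(config_text, options=INTERRUPT_OPTIONS):
    """Map each option to its value ('y', 'm', ...), 'n' if the file says
    '# CONFIG_FOO is not set', or 'absent' if it is not mentioned at all."""
    states = {name: "absent" for name in options}
    suffix = " is not set"
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and line.endswith(suffix):
            name = line[2:-len(suffix)]
            if name in states:
                states[name] = "n"
        elif "=" in line and not line.startswith("#"):
            name, value = line.split("=", 1)
            if name in states:
                states[name] = value
    return states


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_config.py /path/to/.config
    with open(sys.argv[1]) as config_file:
        for name, state in option_states(config_file.read()).items():
            print(f"{name}: {state}")
```

Run it against the .config the kernel was built from and compare the
result with a configuration on which the HCA is known to work.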

Another, fairly naive question -- at what point are the
Lion Cub (PCI Express) cards supported in the OpenIB stack?  I seem to
remember the Tavor code supporting them inherently, but
inefficiently if native code wasn't used.

Ken

Tziporet Koren wrote:

| The problem is that the driver does not get the interrupt for the command
| completion,
| and thus you get the error: "Command not completed after timeout".
|
| It is related to the OS & system you are using. What is the distribution
| you are using? We once saw such problems with older versions of SuSE.
|
| Try adding append="acpi=off" to your lilo configuration, or also add
| disableapic to the same append line.
|
|
| Tziporet
|
|
| -----Original Message-----
| From: Ken MacInnis [mailto:kcm at psc.edu]
| Sent: Sunday, October 31, 2004 8:20 PM
| To: openib-general at openib.org
| Subject: [openib-general] Problem with 2.4.24 and gen1

| I've got a fairly modified kernel here that I'm trying to get an OpenIB
| stack running on.  It's a vanilla 2.4.24 kernel with Lustre and other
| patches in it, but I'm seeing this when I modprobe ib_tavor:
|
| Oct 31 13:13:05 samwise kernel:  THH(1): cmdif.c[1190]: Command not completed after timeout: cmd=TAVOR_IF_CMD_MAD_IFC (0x24), token=0x1400, pid=0x8E1, go=0
| Oct 31 13:13:05 samwise kernel:  THH(1): CMD ERROR DUMP. opcode=0x24, opc_mod = 0x1, exec_time_micro=300000000
| .
| .
| Oct 31 13:13:06 samwise kernel:  THH(1): cmdif.c[842]: Failed command 0x24 (TAVOR_IF_CMD_MAD_IFC): status=0x103 (0x0103 - unexpected error - fatal)
| Oct 31 13:13:06 samwise kernel:
| Oct 31 13:13:06 samwise kernel:  THH(1): thh_hob.c[2790]: THH_hob_query_port_prop: cmdif returned FATAL
| Oct 31 13:13:06 samwise kernel:  VIPKL(1): qpm.c[278]: QPM_new: HOBKL_query_port_prop returned with error: -254 = VAPI_EFATAL
| Oct 31 13:13:06 samwise kernel:  VIPKL(1): qpm.c[302]: QPM_new: returned with error: -254 = VAPI_EFATAL
| Oct 31 13:13:06 samwise kernel:  THH(1): thh_hob.c[3474]: THH_hob_fatal_err_thread: RECEIVED FATAL ERROR WAKEUP
| Oct 31 13:13:06 samwise kernel:  THH(1): thh_hob.c[4490]: THH_hob_halt_hca: HALT HCA returned 0x103
| Oct 31 13:13:06 samwise kernel:  THH(1): thh_hob.c[1620]: THH_hob_destroy: FATAL ERROR
| Oct 31 13:13:06 samwise kernel:  THH(1): thh_hob.c[1627]: THH_hob_destroy: PERFORMING SW RESET. pa=0xFE9F0010 va=0xF8A01010
| Oct 31 13:13:06 samwise kernel:
| Oct 31 13:13:06 samwise kernel: Mellanox Tavor Device Driver is creating device "InfiniHost0" (bus=04, devfn=00)
| Oct 31 13:13:06 samwise kernel:
| Oct 31 13:13:06 samwise kernel: [KERNEL_IB][_tsIbTavorInitOne][tavor_main.c:86]InfiniHost0: VAPI_open_hca failed, status -254 (Fatal error (Local Catastrophic Error))
| Oct 31 13:13:06 samwise kernel: [SRPTP][srp_host_init][srp_host.c:1495]SRP Host using indirect addressing
|
|
| This occurs with an older openib rev (200-ish) as well as one up-to-date
| as of today.
|
| Everything else (modules.conf, etc.) is set up as it was when I was
| messing with 2.4 kernels and OpenIB a few months ago, so I don't
| think that's the cause.
|
| Any ideas?  Yes, I know it's 2.4, and a fairly old 2.4 at that, but I
| have no choice here. :)  lspci -vvv bits follow.
|
| 03:01.0 PCI bridge: Mellanox Technology: Unknown device 5a46 (rev a1) (prog-if 00 [Normal decode])
|          Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
|          Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
|          Latency: 64, cache line size 10
|          Bus: primary=03, secondary=04, subordinate=04, sec-latency=64
|          I/O behind bridge: 0000f000-00000fff
|          Memory behind bridge: fe700000-fe9fffff
|          Prefetchable memory behind bridge: 00000000eb200000-00000000fc200000
|          BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
|          Capabilities: [70] PCI-X non-bridge device.
|                  Command: DPERE+ ERO+ RBC=0 OST=4
|                  Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
| 04:00.0 InfiniBand: Mellanox Technology: Unknown device 5a44 (rev a1)
|          Subsystem: Mellanox Technology: Unknown device 5a44
|          Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
|          Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
|          Latency: 64, cache line size 10
|          Interrupt: pin A routed to IRQ 25
|          Region 0: Memory at fe900000 (64-bit, non-prefetchable) [size=1M]
|          Region 2: Memory at fb800000 (64-bit, prefetchable) [size=8M]
|          Region 4: Memory at f0000000 (64-bit, prefetchable) [size=128M]
|          Capabilities: [40] #11 [001f]
|          Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
|                  Address: 0000000000000000  Data: 0000
|          Capabilities: [70] PCI-X non-bridge device.
|                  Command: DPERE- ERO- RBC=3 OST=1
|                  Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
|
|
| Ken
|
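
One rough way to test the missing-interrupt explanation quoted above is to
read the counter for the HCA's IRQ from /proc/interrupts before and after
loading the driver.  The Python sketch below assumes the usual
/proc/interrupts layout (an "N:" label followed by one count per CPU) and
takes the IRQ number on the command line; 25 is the value lspci reports
above.  Matching on the IRQ number rather than a device label is a
deliberate choice, since the label the driver registers is not shown in
this thread.

```python
import sys


def irq_count(interrupts_text, irq):
    """Return the interrupt count for `irq` summed over all CPUs, or None
    if /proc/interrupts has no line for that IRQ."""
    label = f"{irq}:"
    for line in interrupts_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != label:
            continue
        total = 0
        for field in fields[1:]:
            if not field.isdigit():
                break  # reached the controller/device columns
            total += int(field)
        return total
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python irq_count.py 25
    with open("/proc/interrupts") as interrupts_file:
        print(irq_count(interrupts_file.read(), int(sys.argv[1])))
```

If the count does not move while the driver is waiting on a command, that
would be consistent with the interrupt never reaching the driver, though
it does not by itself say why.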


- --
Ken MacInnis - Systems Engineer, PSC - http://www.psc.edu/~kcm/
kcm at psc dot edu - +1 412 268 9833 (w) - +1 412 268 5832 (f)
Pittsburgh Supercomputing Center - 4400 Fifth Ave - Pittsburgh, PA 15213
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)

iD8DBQFBhi9mnT0C17PQhv4RAvckAKComYvuQ8dZ+B3tZBuBvkH6q+MDSgCfe3Bz
DtsqzV39ekgtfzWIGx6vNzk=
=zkFD
-----END PGP SIGNATURE-----