[openib-general] Problem with 2.4.24 and gen1

Tziporet Koren tziporet at mellanox.co.il
Sun Oct 31 23:51:10 PST 2004


Hi,

The problem is that the driver does not receive the interrupt for the command
completion, so you get the error: "Command not completed after timeout".
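
One rough way to check this is to look at the interrupt counter for the HCA's
IRQ before and after loading the driver. The sketch below assumes IRQ 25, taken
from the lspci output in the quoted message; the helper name irq_line is made
up for illustration, and the line may only be listed while the driver has the
IRQ registered.

```shell
# Hypothetical helper: print the /proc/interrupts line for one IRQ number.
#   $1 = IRQ number
#   $2 = file to read (optional, defaults to /proc/interrupts)
irq_line() {
    # Anchor at line start so that "25:" does not also match "125:".
    grep -E "^ *$1:" "${2:-/proc/interrupts}"
}

# Example use (run as root; IRQ 25 is an assumption from the lspci output):
#   irq_line 25; modprobe ib_tavor; irq_line 25
```

If the count for that IRQ stays the same across the modprobe, that would be
consistent with the interrupt never reaching the driver.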

It is related to the OS and system you are using. Which distribution are you
running? We once saw such problems with older versions of SuSE.

Try adding append="acpi=off" to your LILO configuration, or additionally put
disableapic in the same append line.
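
As an illustration only, the lilo.conf stanza might look like the fragment
below. The image path, label, and root device are placeholders and not taken
from your system; only the append line reflects the suggestion above.

```
# Placeholder kernel image, label, and root device; adjust to your setup.
image=/boot/vmlinuz-2.4.24
    label=linux-ib
    root=/dev/sda1
    # First try acpi=off alone; if that does not help, use both options.
    append="acpi=off disableapic"
```

Remember to rerun /sbin/lilo after editing the file so the change is written
to the boot loader.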


Tziporet


-----Original Message-----
From: Ken MacInnis [mailto:kcm at psc.edu]
Sent: Sunday, October 31, 2004 8:20 PM
To: openib-general at openib.org
Subject: [openib-general] Problem with 2.4.24 and gen1


Hi,

I've got a fairly modified kernel here I'm trying to get an OpenIB stack
running on.  It's a vanilla 2.4.24 kernel with Lustre and other patches
in it, but I'm seeing this when I modprobe ib_tavor:

Oct 31 13:13:05 samwise kernel:  THH(1): cmdif.c[1190]: Command not completed after timeout: cmd=TAVOR_IF_CMD_MAD_IFC (0x24), token=0x1400, pid=0x8E1, go=0
Oct 31 13:13:05 samwise kernel:  THH(1): CMD ERROR DUMP. opcode=0x24, opc_mod = 0x1, exec_time_micro=300000000
.
.
Oct 31 13:13:06 samwise kernel:  THH(1): cmdif.c[842]: Failed command 0x24 (TAVOR_IF_CMD_MAD_IFC): status=0x103 (0x0103 - unexpected error - fatal)
Oct 31 13:13:06 samwise kernel:
Oct 31 13:13:06 samwise kernel:  THH(1): thh_hob.c[2790]: THH_hob_query_port_prop: cmdif returned FATAL
Oct 31 13:13:06 samwise kernel:  VIPKL(1): qpm.c[278]: QPM_new: HOBKL_query_port_prop returned with error: -254 = VAPI_EFATAL
Oct 31 13:13:06 samwise kernel:  VIPKL(1): qpm.c[302]: QPM_new: returned with error: -254 = VAPI_EFATAL
Oct 31 13:13:06 samwise kernel:  THH(1): thh_hob.c[3474]: THH_hob_fatal_err_thread: RECEIVED FATAL ERROR WAKEUP
Oct 31 13:13:06 samwise kernel:  THH(1): thh_hob.c[4490]: THH_hob_halt_hca: HALT HCA returned 0x103
Oct 31 13:13:06 samwise kernel:  THH(1): thh_hob.c[1620]: THH_hob_destroy: FATAL ERROR
Oct 31 13:13:06 samwise kernel:  THH(1): thh_hob.c[1627]: THH_hob_destroy: PERFORMING SW RESET. pa=0xFE9F0010 va=0xF8A01010
Oct 31 13:13:06 samwise kernel:
Oct 31 13:13:06 samwise kernel: Mellanox Tavor Device Driver is creating device "InfiniHost0" (bus=04, devfn=00)
Oct 31 13:13:06 samwise kernel:
Oct 31 13:13:06 samwise kernel: [KERNEL_IB][_tsIbTavorInitOne][tavor_main.c:86]InfiniHost0: VAPI_open_hca failed, status -254 (Fatal error (Local Catastrophic Error))
Oct 31 13:13:06 samwise kernel: [SRPTP][srp_host_init][srp_host.c:1495]SRP Host using indirect addressing


This occurs with an older openib rev (200-ish) as well as one up-to-date
as of today.

Everything else (modules.conf, etc.) is set up as it has been when I was
messing with 2.4 kernels and OpenIB a few months ago, so I'm not
thinking it's related to such.

Any ideas?  Yes, I know it's 2.4 as well as a fairly older 2.4, but I
have no choice here. :)  lspci -vvv bits follow.

03:01.0 PCI bridge: Mellanox Technology: Unknown device 5a46 (rev a1) (prog-if 00 [Normal decode])
         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
         Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 64, cache line size 10
         Bus: primary=03, secondary=04, subordinate=04, sec-latency=64
         I/O behind bridge: 0000f000-00000fff
         Memory behind bridge: fe700000-fe9fffff
         Prefetchable memory behind bridge: 00000000eb200000-00000000fc200000
         BridgeCtl: Parity+ SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
         Capabilities: [70] PCI-X non-bridge device.
                 Command: DPERE+ ERO+ RBC=0 OST=4
                 Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-
04:00.0 InfiniBand: Mellanox Technology: Unknown device 5a44 (rev a1)
         Subsystem: Mellanox Technology: Unknown device 5a44
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
         Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
         Latency: 64, cache line size 10
         Interrupt: pin A routed to IRQ 25
         Region 0: Memory at fe900000 (64-bit, non-prefetchable) [size=1M]
         Region 2: Memory at fb800000 (64-bit, prefetchable) [size=8M]
         Region 4: Memory at f0000000 (64-bit, prefetchable) [size=128M]
         Capabilities: [40] #11 [001f]
         Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/5 Enable-
                 Address: 0000000000000000  Data: 0000
         Capabilities: [70] PCI-X non-bridge device.
                 Command: DPERE- ERO- RBC=3 OST=1
                 Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple, DMMRBC=0, DMOST=0, DMCRS=0, RSCEM-


Ken

-- 
Ken MacInnis - Systems Engineer, PSC - http://www.psc.edu/~kcm/
kcm at psc dot edu - +1 412 268 9833 (w) - +1 412 268 5832 (f)
Pittsburgh Supercomputing Center - 4400 Fifth Ave - Pittsburgh, PA 15213