[ofw] crash in mlx4 driver - or maybe it's an ipoib issue?

Sean Hefty sean.hefty at intel.com
Fri Mar 13 12:38:21 PDT 2009


This is either more random details or a completely separate problem.  This time
I'm running with a debugger attached during the test.

After running dtest2 using libibverbs and the socket CM, the test completes
successfully, but then this occurs in the kernel:

CQ overrun on CQN 00008f

Detected catastrophic error on mdev FFFFFADF9C409000

~1:[MLX4_BUS] pci_get_msi_info() :MSI-X Capability: Enabled - 0, Function Masked 0, Vectors Supported 256, Addr_Offset(BIR) 0(4), Pend_Offset(BIR) 0x1000(4)
~1:[MLX4_BUS] pci_get_msi_info() :MSI-X Vectors: Allocated 0 vectors
~1:[MLX4_BUS] pci_hca_reset() :
Resetting HCA ... 

~1:[MLX4_BUS] pci_get_msi_info() :MSI-X Capability: Enabled - 0, Function Masked 0, Vectors Supported 256, Addr_Offset(BIR) 0(4), Pend_Offset(BIR) 0x1000(4)
~1:[MLX4_BUS] pci_get_msi_info() :MSI-X Vectors: Allocated 0 vectors
~1:[MLX4_BUS] pci_hca_reset() :HCA has been reset ! 
Internal error detected:

{snip - a bunch of null buffers}

~1:[MLX4_HCA] mlnx_query_ca() :***ERROR***  ib_query_device failed (-14)
~1:[MLX4_HCA] mlnx_query_ca() :***ERROR***  completes with ERROR status 2b
~1:[MLX4_HCA] mlnx_post_send() :***ERROR***  post_send failed with status 2b
[IPoIB]:ipoib_port_send() !ERROR!: ib_post_send returned IB_ERROR
[IPoIB]:NdisMSendCompleteX() !ERROR!: Sending status other than Success to NDIS
~1:[MLX4_HCA] mlnx_post_send() :***ERROR***  post_send failed with status 2b
[IPoIB]:ipoib_port_send() !ERROR!: ib_post_send returned IB_ERROR
[IPoIB]:NdisMSendCompleteX() !ERROR!: Sending status other than Success to NDIS
~0:[MLX4_HCA] mlnx_post_send() :***ERROR***  post_send failed with status 2b
~0:[MLX4_HCA] mlnx_post_send() :***ERROR***  post_send failed with status 2b
~0:[MLX4_HCA] mlnx_modify_qp() :***ERROR***  ibv_modify_qp failed (-14)
~0:[MLX4_HCA] mlnx_modify_qp() :***ERROR***  completes with ERROR status 2b
[IPoIB]:ipoib_port_down() !ERROR!: ib_modify_qp to error state returned IB_ERROR.
~0:[MLX4_HCA] mlnx_post_send() :***ERROR***  post_send failed with status 2b
~0:[MLX4_HCA] mlnx_post_send() :***ERROR***  post_send failed with status 2b
~0:[MLX4_HCA] mlnx_post_send() :***ERROR***  post_send failed with status 2b

*** Assertion failed: !p_obj->ref_cnt
***   Source File: c:\mshefty\scm\winof\branches\winverbs\core\complib\cl_obj.c, line 701

ipoib!__destroy_obj
ipoib!cl_obj_destroy
ipoib!ipoib_port_destroy
ipoib!__ipoib_adapter_reset

p_adapter->state is set to 0x1002  (which I believe is add port)
p_port->obj.ref_cnt is set to 0x203
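
For readers unfamiliar with this kind of assertion, here is a minimal sketch of the pattern it appears to guard: an object may only be torn down once its reference count has dropped to zero. Everything below (demo_obj_t and the demo_obj_* helpers) is hypothetical and written only for illustration; it is not the complib cl_obj code. A count of 0x203 (515 decimal) at destroy time would correspond to the "blocked" branch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a reference-counted object.
 * This is NOT the complib cl_obj_t; it only mirrors the ref_cnt idea. */
typedef struct demo_obj {
    int ref_cnt;    /* number of outstanding references */
} demo_obj_t;

/* Take a reference on the object. */
static void demo_obj_ref(demo_obj_t *obj)
{
    obj->ref_cnt++;
}

/* Release a reference; releasing more than were taken is a caller bug. */
static void demo_obj_deref(demo_obj_t *obj)
{
    assert(obj->ref_cnt > 0);
    obj->ref_cnt--;
}

/* Returns 1 if the object may be destroyed, 0 if references are still
 * outstanding. The 0 case is the situation in which a check such as
 * "!p_obj->ref_cnt" would fail. */
static int demo_obj_can_destroy(const demo_obj_t *obj)
{
    if (obj->ref_cnt != 0) {
        fprintf(stderr, "destroy blocked: ref_cnt = 0x%x\n",
                (unsigned int)obj->ref_cnt);
        return 0;
    }
    return 1;
}
```

In this sketch, a reference that is taken but never released (for example, one held for a send that never completes) keeps demo_obj_can_destroy() returning 0 indefinitely. Whether that is what happens in the ipoib port object here is an assumption, not something the trace proves.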

If I ignore the assertion, it repeats itself roughly every 10 seconds.  See my
other reply regarding a bug in the error handling in the mlx4 code.

- Sean



