[Openib-windows] [BUG] Errors with multiple HCAs in a single system
Fab Tillier
ftillier at silverstorm.com
Mon Oct 17 12:16:02 PDT 2005
I am trying to run both a PCI-X and a PCI-Express HCA in a single server to
compare performance, and cannot enable both devices simultaneously. Whichever
HCA starts second fails with the following output to the debugger:
<1> MDT(1): prepare_intr_resources: Failed to MOSAL_ISR_set MOSAL ret=-1
<1> MDT(4): THH_eventp_create: Cannot set interrupt resources.
<1> MDT(1): THH_hob_open_hca: could not create eventp (-255)
<1> MDT(1): THH_hob_close_hca: Device already closed
It looks like the device lookup in MOSAL_ISR_set uses a generic name
"InfiniHost" and thus matches the first entry in the table.
Why is this doing a lookup? Doesn't the code know exactly which device is being
initialized? Why put the device in the MOSAL_dev_db and then search for it?
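To illustrate the suspected failure mode, here is a minimal sketch in C. It assumes a table in which both HCAs are registered under the same generic name. The struct layout, field names, and both lookup functions are hypothetical stand-ins and are not taken from the MOSAL sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device table entry; NOT the real MOSAL_dev_db layout. */
struct dev_entry {
    const char *name; /* generic product name, identical for both HCAs */
    int bus;          /* PCI bus number (assumed unique key, with slot) */
    int slot;         /* PCI slot number */
    int irq;          /* interrupt line assigned to this adapter */
};

/* Name-only lookup: returns the first entry whose name matches, so a
 * second adapter with the same name can never be found. */
static const struct dev_entry *find_by_name(const struct dev_entry *tbl,
                                            size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].name, name) == 0)
            return &tbl[i];
    }
    return NULL;
}

/* Lookup keyed on bus and slot: distinguishes adapters that share a name. */
static const struct dev_entry *find_by_location(const struct dev_entry *tbl,
                                                size_t n, int bus, int slot)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].bus == bus && tbl[i].slot == slot)
            return &tbl[i];
    }
    return NULL;
}
```

With two entries that are both named "InfiniHost", find_by_name returns the first entry no matter which adapter is being initialized, while find_by_location returns the entry for the requested adapter. Passing the entry (or a handle to it) directly to the interrupt setup code would avoid the search altogether.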
I set a breakpoint and manually helped the code find the right entry, and things
moved forward from there but quickly failed again when creating a special QP:
~1:mlnx_query_qp() !ERROR!: completes with ERROR status IB_INVALID_QP_HANDLE
~1:mlnx_create_spl_qp() !ERROR!: completes with ERROR status IB_INVALID_QP_HANDLE
~1:al:create_spl_qp_svc() !ERROR!: ib_get_spl_qp failed, IB_INVALID_QP_HANDLE
~1:al:create_spl_qp_svc() ]
Now, the fact that create_spl_qp can return IB_INVALID_QP_HANDLE seems flawed to
me. To make things worse, the HCA driver doesn't clean up the QP it
successfully created internally, and that resource is thus leaked.
Apparently, the same issue happens with normal QP creation, judging by the
output from IPoIB trying to load:
~0:mlnx_query_qp() !ERROR!: completes with ERROR status IB_INVALID_QP_HANDLE
~0:mlnx_create_qp() !ERROR!: completes with ERROR status IB_INVALID_QP_HANDLE
~0:__ib_mgr_init() !ERROR!: ib_create_qp returned IB_INVALID_QP_HANDLE
~0:__ib_mgr_init() ]
~0:__port_init() !ERROR!: __ib_mgr_init returned IB_INVALID_QP_HANDLE
~0:__port_init() ]
~0:ipoib_create_port() !ERROR!: ipoib_port_init returned IB_INVALID_QP_HANDLE.
~0:ipoib_create_port() ]
IPoIB then tries to tear things down and the code asserts because PD destruction
failed:
~0:mlnx_deallocate_pd() !ERROR!: completes with ERROR status IB_RESOURCE_BUSY
The resource is busy due to the leak of the QP when mlnx_create_qp failed.
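The leak and the resulting busy PD can be sketched as follows. This is a simplified model in C, not the driver code: the status codes, the pd structure with its reference counter, and the hw_* helpers are all hypothetical. It shows the cleanup expected when the query step fails after the QP has already been created.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes, for illustration only. */
enum status { ST_OK = 0, ST_INVALID_HANDLE = 1 };

/* Hypothetical PD model: counts the QPs that still reference it. */
struct pd {
    int qp_refs;
};

/* Stand-ins for the internal create, destroy, and query steps. */
static enum status hw_create_qp(struct pd *pd) { pd->qp_refs++; return ST_OK; }
static void hw_destroy_qp(struct pd *pd) { pd->qp_refs--; }
static enum status hw_query_qp(bool fail)
{
    return fail ? ST_INVALID_HANDLE : ST_OK;
}

/* Create a QP and then query it. If the query fails, destroy the QP that
 * was just created so the PD is not left holding a stale reference. */
static enum status create_qp(struct pd *pd, bool query_fails)
{
    enum status st = hw_create_qp(pd);
    if (st != ST_OK)
        return st;

    st = hw_query_qp(query_fails);
    if (st != ST_OK) {
        hw_destroy_qp(pd); /* undo the creation on the error path */
        return st;
    }
    return ST_OK;
}

/* A PD can only be deallocated once no QP references it. */
static bool pd_can_be_destroyed(const struct pd *pd)
{
    return pd->qp_refs == 0;
}
```

In this model, a failed create_qp leaves qp_refs at zero, so the later PD teardown succeeds. Without the hw_destroy_qp call on the error path, the counter would stay at one and the PD would remain busy, which matches the IB_RESOURCE_BUSY seen above.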
- Fab