[ofw] ibbus disable on HCA0 erroneously removes all IPoIB instances; including IPoIB ports on HCA1 ?

Smith, Stan stan.smith at intel.com
Fri Feb 6 17:04:52 PST 2009


Hello,
  Recently I discovered some bad HCA disable behavior in a case that used to work correctly.

Has the disable behavior for HCA0 been changed recently such that all existing IPoIB instances for all HCAs are removed?

Details:

The system is x86, running svn.1932 mthca.sys & ibbus.sys with two Mx MT23108 HCAs (one port active and one port disconnected per HCA). There is no WSD or WinOF install, just bare mthca, ibbus & IPoIB.

When both HCAs are enabled there are 4 IPoIB instances.

When the 1st HCA as seen by PNP (HCA0 for discussion purposes) is disabled, all 4 IPoIB instances are removed from the device manager view, and HCA0 is shown as disabled, as expected.
The 2nd HCA (HCA1) is still enabled with no IPoIB instances shown by the device manager.

The expected behavior when disabling HCA0 is that the 1st two IPoIB instances [0 & 1] are removed from the device manager view, while the 2nd two IPoIB instances [3 & 4] remain.
This is also the case that exposes the ibbus bug where vstat no longer works: \Devices\ibal is removed because it is bound to the 1st HCA seen by PNP, which is now disabled.

If you reverse the disable order, so that HCA1 is disabled while HCA0 remains enabled, the expected IPoIB instances [3 & 4] are removed, while instances [0 & 1] remain.
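
To make the expected behavior concrete, here is a minimal sketch of a removal pass that filters on the CA GUID, so that disabling one HCA only touches that HCA's IPoIB ports. This is not the ibbus code: the ipoib_port_t structure, its fields and remove_ports_for_ca() are hypothetical names invented for illustration, and only the ca_guid / port_num / present terms are borrowed from the trace output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one IPoIB port instance; NOT the real ibbus type. */
typedef struct {
    uint64_t ca_guid;   /* GUID of the HCA that owns this port */
    uint8_t  port_num;  /* port number on that HCA */
    int      present;   /* nonzero while the instance should remain visible */
} ipoib_port_t;

/*
 * Mark for removal only the ports owned by the HCA being disabled.
 * Ports that belong to any other CA GUID are left untouched.
 * Returns the number of ports marked.
 */
static size_t remove_ports_for_ca(ipoib_port_t *ports, size_t count,
                                  uint64_t disabled_ca_guid)
{
    size_t removed = 0;

    assert(ports != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (ports[i].present && ports[i].ca_guid == disabled_ca_guid) {
            ports[i].present = 0;
            removed++;
        }
    }
    return removed;
}
```

With the two CA GUIDs from the trace (0x8025000002c90200 for bfi-0 and 0xa425000002c90200 for bfi-1), a call for the first GUID would mark two ports and leave the other two present, which is the behavior described above for disabling HCA0.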

The problem occurs when cl_pnp() calls ibbus::port_mgr_pnp_cb() to remove the IPoIB instances for HCA1; the previous call to ibbus::port_mgr_pnp_cb() for HCA0 is correct.

fdo_query_remove() [
IRP_MN_QUERY_REMOVE_DEVICE IB Bus @ FDO FAB160E8 refs(CI 0 AL 0)
   bfi-0 CA 8025000002c90200
fdo_query_remove() ]
__query_remove() ]
cl_pnp(): IrpSkip/IrpIgnore: skipping down to PDO 81DDD420, ext FAB160E8, status 0
cl_pnp(): returned with status 0
cl_pnp() ]
port_mgr_pnp_cb() [
port_mgr_pnp_cb() ]
port_mgr_pnp_cb() [
port_mgr_port_remove() [
bfi-0 ca_guid 0x8025000002c90200 port_num 1 port_mgr 81D46008
port_mgr_port_remove(): Mark removing IPoIB: PDO 81DA3BD8, ext 81DA3C90, present 0, missing 0
port_mgr_port_remove() ]
port_mgr_pnp_cb() ]
port_mgr_pnp_cb() [
port_mgr_port_remove() [
bfi-0 ca_guid 0x8025000002c90200 port_num 2 port_mgr 81D46008
port_mgr_port_remove(): Mark removing IPoIB: PDO 81DA3350, ext 81DA3408, present 0, missing 0
port_mgr_port_remove() ]
port_mgr_pnp_cb() ]
iou_mgr_pnp_cb() [
iou_mgr_iou_remove() [
bfi-0 ca_guid 0x8025000002c90200 iou_mgr FED74310
iou_mgr_iou_remove(): bfi-0 IB IOU: ext FF58B7B8, present 0, missing 1 .
iou_mgr_iou_remove() ]
iou_mgr_pnp_cb() ]

XXX - this PNP call for HCA1 should not have occurred when disabling HCA0.

port_mgr_port_remove() [
bfi-1 ca_guid 0xa425000002c90200 port_num 1 port_mgr 82030F40
port_mgr_port_remove(): Mark removing IPoIB: PDO FED75DD8, ext FED75E90, present 0, missing 0
port_mgr_port_remove() ]
port_mgr_pnp_cb() ]
port_mgr_pnp_cb() [
port_mgr_port_remove() [
bfi-1 ca_guid 0xa425000002c90200 port_num 2 port_mgr 82030F40
port_mgr_port_remove(): Mark removing IPoIB: PDO FF881DD8, ext FF881E90, present 0, missing 0
port_mgr_port_remove() ]
port_mgr_pnp_cb() ]
iou_mgr_pnp_cb() [
iou_mgr_iou_remove() [
bfi-1 ca_guid 0xa425000002c90200 iou_mgr 821A8A80
iou_mgr_iou_remove(): bfi-1 IB IOU: ext FAC24620, present 0, missing 1 .
iou_mgr_iou_remove() ]
iou_mgr_pnp_cb() ]

XXX end of badness...

Any ideas why the 2nd set of port_mgr_port_remove() calls (the ones for HCA1) was invoked?
Is there some binding between HCA1 IPoIB ports and HCA0?

Thanks,

Stan.


