[Users] IPoIB not working on Windows 2008 r2 - need help

Hal Rosenstock hal.rosenstock at gmail.com
Fri Jun 7 17:55:26 PDT 2013


On Fri, Jun 7, 2013 at 6:52 PM, Orion Poplawski <orion at cora.nwra.com> wrote:

> On 06/07/2013 02:23 PM, Hal Rosenstock wrote:
>
>> Also, if you turn on log verbosity on OpenSM temporarily and send me the
>> log for that run, I could see what is going on in terms of trying to set
>> the non-default subnet prefix with the Windows node. Given the log you
>> sent, I can only imagine that the SMA on the Windows node is ack'ing the
>> PortInfo set which sets the subnet prefix but not really acting on it
>> properly.
>> -- Hal
>>
>
> Full log is at http://sw.cora.nwra.com/test/opensm.debug.log.gz
>
>
Looking at that log, I didn't see _any_ MC joins from that port (GUID
0x5ad00000c5ced) so this is a different scenario than before :-(
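
One way to repeat that check is to tally which routines mention the port GUID. The sketch below assumes each log entry starts with the timestamp/thread/level header seen in the excerpts quoted further down, and that indented lines continue the previous entry. The message OpenSM logs for a multicast join is not shown in this thread, so the sketch does not look for it by name; it only lists routine names so that a missing one stands out. `entries` and `routines_mentioning` are hypothetical helper names.

```python
import re
from collections import Counter

# Header layout taken from the quoted excerpts:
# "Jun 07 14:56:58 088499 [193D0700] 0x04 -> routine: message"
HEADER = re.compile(
    r"^\w{3} \d{2} \d{2}:\d{2}:\d{2} \d{6} \[[0-9A-Fa-f]+\] 0x[0-9A-Fa-f]+ -> (.*)$"
)

def entries(lines):
    """Group raw log lines into entries; non-header lines extend the previous entry."""
    current = None
    for line in lines:
        m = HEADER.match(line.rstrip("\n"))
        if m:
            if current is not None:
                yield current
            current = m.group(1)
        elif current is not None:
            current += " " + line.strip()
    if current is not None:
        yield current

def routines_mentioning(lines, guid):
    """Count entries per routine name (text before the first colon) that contain guid."""
    counts = Counter()
    for text in entries(lines):
        if guid in text:
            counts[text.split(":", 1)[0]] += 1
    return counts

# Sample entries copied from the excerpts quoted in this thread.
sample = [
    "Jun 07 14:56:58 088499 [193D0700] 0x04 -> osm_pi_rcv_process: Discovered "
    "port num 1 with GUID 0x5ad00000c5ced for parent node GUID 0x5ad00000c5cec, TID 0x130e",
    "Jun 07 14:56:58 091301 [12FC6700] 0x10 -> osm_lid_mgr_process_subnet: ]",
    "Jun 07 14:56:58 091313 [161CB700] 0x08 -> pi_rcv_process_set: Received "
    "logical SetResp() for GUID 0x5ad00000c5ced, port num 1",
    "                                for parent node GUID 0x5ad00000c5cec TID 0x1311",
]
counts = routines_mentioning(sample, "5ad00000c5ced")
for routine, n in sorted(counts.items()):
    print(routine, n)
```

For the full log, the same function could be fed the lines of the decompressed file, for example via `gzip.open(path, "rt")`.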

Also, the previous confusion with:

# saquery -m 0xc000
PortGid.................fe80::1:5:ad00:c:5c3d (Topspin DDR-HCAe LX x8)

PortGid.................fe80::1:19:bbff:ff00:5851 (saga mthca0)
PortGid.................fe80::1:19:bbff:ff00:3899 (sfcomp1 mthca0)

PortGid.................fe80::1:1a:4bff:ff0c:20c9 (HP Lion Cub 128MB)
PortGid.................fe80::5:ad00:c:5ced (MT25204 InfiniHostLx Mellanox Technologies)
PortGid.................fe80::1:17:8ff:ffd0:9df9 (alexandria2 HCA-1)

For the MT25204 entry, the GUID is 5:ad00:c:5ced and the prefix is fe80::, so
it's either missing a digit like 1 (fe80::1 like the others) or, if it's a 0,
I would expect it to have a 3rd colon (fe80:::). So I'm not sure what's going
on there either.
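
One way to check how such a string splits is to parse it as ordinary IPv6 text and separate the upper and lower 64 bits. This is only a sketch: it assumes saquery prints GIDs in standard IPv6 notation, where `::` stands for as many zero groups as are needed to reach eight, and that the upper 64 bits are the subnet prefix and the lower 64 bits the port GUID. `split_gid` is a hypothetical helper name.

```python
import ipaddress

def split_gid(gid_text):
    """Split a GID written as IPv6 text into (subnet prefix, port GUID)."""
    value = int(ipaddress.IPv6Address(gid_text))
    prefix = value >> 64                 # upper 64 bits
    guid = value & 0xFFFFFFFFFFFFFFFF    # lower 64 bits
    return prefix, guid

# Two of the PortGid values from the saquery output above.
for gid in ("fe80::1:5:ad00:c:5c3d", "fe80::5:ad00:c:5ced"):
    prefix, guid = split_gid(gid)
    print(f"{gid:24} prefix=0x{prefix:016x} guid=0x{guid:016x}")
```

Under those assumptions, the entries with `fe80::1:` carry prefix 0xfe80000000000001, while `fe80::5:ad00:c:5ced` parses with prefix 0xfe80000000000000, because the `::` there covers three zero groups rather than two.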




> I had fontdb shut down when I started opensm, then booted it up.
>


> This seems to be when it first comes up (lid 0, prefix 0xfe80::0)
>
> Jun 07 14:56:58 088453 [193D0700] 0x10 -> osm_pi_rcv_process: [
> Jun 07 14:56:58 088465 [193D0700] 0x08 -> PortInfo dump:
>                                 port number..............1
>                                 node_guid................0x0005ad00000c5cec
>                                 port_guid................0x0005ad00000c5ced
>                                 m_key....................0x0000000000000000
>                                 subnet_prefix............0xfe80000000000000
>                                 base_lid.................0
>                                 master_sm_base_lid.......0
>                                 capability_mask..........0x2500A68
>                                 diag_code................0x0
>                                 m_key_lease_period.......0x0
>                                 local_port_num...........1
>                                 link_width_enabled.......0x3
>                                 link_width_supported.....0x3
>                                 link_width_active........0x2
>                                 link_speed_supported.....0x3
>                                 port_state...............INIT
>                                 state_info2..............0x52
>                                 m_key_protect_bits.......0x0
>                                 lmc......................0x0
>                                 link_speed...............0x13
>                                 mtu_smsl.................0x20
>                                 vl_cap_init_type.........0x30
>                                 vl_high_limit............0x0
>                                 vl_arb_high_cap..........0x8
>                                 vl_arb_low_cap...........0x8
>                                 init_rep_mtu_cap.........0x4
>                                 vl_stall_life............0xFF
>                                 vl_enforce...............0x30
>                                 m_key_violations.........0x0
>                                 p_key_violations.........0x0
>                                 q_key_violations.........0x0
>                                 guid_cap.................0x20
>                                 client_reregister........0x0
>                                 mcast_pkey_trap_suppr....0x0
>                                 subnet_timeout...........0x0
>                                 resp_time_value..........0x10
>                                 error_threshold..........0xF0
>                                 max_credit_hint..........0x0
>                                 link_round_trip_latency..0x0
>                                 capability_mask2.........0x0
>                                 link_speed_ext_active....0x0
>                                 link_speed_ext_supported.0x0
>                                 link_speed_ext_enabled...0x0
> Jun 07 14:56:58 088495 [193D0700] 0x08 -> Capability Mask:
>                                 IB_PORT_CAP_HAS_TRAP
>                                 IB_PORT_CAP_HAS_AUTO_MIG
>                                 IB_PORT_CAP_HAS_SL_MAP
>                                 IB_PORT_CAP_HAS_LED_INFO
>                                 IB_PORT_CAP_HAS_SYS_IMG_GUID
>                                 IB_PORT_CAP_HAS_VEND_CLS
>                                 IB_PORT_CAP_HAS_CAP_NTC
>                                 IB_PORT_CAP_HAS_CLIENT_REREG
> Jun 07 14:56:58 088499 [193D0700] 0x04 -> osm_pi_rcv_process: Discovered
> port num 1 with GUID 0x5ad00000c5ced for parent node GUID 0x5ad00000c5cec,
> TID 0x130e
>
>
> Then later, sm seems to have assigned a lid.
>
> Jun 07 14:56:58 090679 [161CB700] 0x08 -> PortInfo dump:
>                                 port number..............1
>                                 node_guid................0x0005ad00000c5cec
>                                 port_guid................0x0005ad00000c5ced
>                                 m_key....................0x0000000000000000
>                                 subnet_prefix............0xfe80000000000001
>                                 base_lid.................16
>                                 master_sm_base_lid.......1
>                                 capability_mask..........0x2500A68
>                                 diag_code................0x0
>                                 m_key_lease_period.......0x0
>                                 local_port_num...........1
>                                 link_width_enabled.......0x3
>                                 link_width_supported.....0x3
>                                 link_width_active........0x2
>                                 link_speed_supported.....0x3
>                                 port_state...............INIT
>                                 state_info2..............0x52
>                                 m_key_protect_bits.......0x0
>                                 lmc......................0x0
>                                 link_speed...............0x13
>                                 mtu_smsl.................0x40
>                                 vl_cap_init_type.........0x30
>                                 vl_high_limit............0x0
>                                 vl_arb_high_cap..........0x8
>                                 vl_arb_low_cap...........0x8
>                                 init_rep_mtu_cap.........0x4
>                                 vl_stall_life............0xFF
>                                 vl_enforce...............0x30
>                                 m_key_violations.........0x0
>                                 p_key_violations.........0x0
>                                 q_key_violations.........0x0
>                                 guid_cap.................0x20
>                                 client_reregister........0x1
>                                 mcast_pkey_trap_suppr....0x0
>                                 subnet_timeout...........0x12
>                                 resp_time_value..........0x10
>                                 error_threshold..........0x88
>                                 max_credit_hint..........0x0
>                                 link_round_trip_latency..0x0
>                                 capability_mask2.........0x0
>                                 link_speed_ext_active....0x0
>                                 link_speed_ext_supported.0x0
>                                 link_speed_ext_enabled...0x0
> Jun 07 14:56:58 090709 [161CB700] 0x08 -> Capability Mask:
>                                 IB_PORT_CAP_HAS_TRAP
>                                 IB_PORT_CAP_HAS_AUTO_MIG
>                                 IB_PORT_CAP_HAS_SL_MAP
>                                 IB_PORT_CAP_HAS_LED_INFO
>                                 IB_PORT_CAP_HAS_SYS_IMG_GUID
>                                 IB_PORT_CAP_HAS_VEND_CLS
>                                 IB_PORT_CAP_HAS_CAP_NTC
>                                 IB_PORT_CAP_HAS_CLIENT_REREG
> Jun 07 14:56:58 090713 [161CB700] 0x08 -> osm_pi_rcv_process: Client
> reregister received on response
> Jun 07 14:56:58 091294 [12FC6700] 0x10 -> osm_db_store: ]
> Jun 07 14:56:58 091301 [12FC6700] 0x10 -> osm_lid_mgr_process_subnet: ]
> Jun 07 14:56:58 091308 [161CB700] 0x10 -> pi_rcv_process_set: [
> Jun 07 14:56:58 091313 [161CB700] 0x08 -> pi_rcv_process_set: Received
> logical SetResp() for GUID 0x5ad00000c5ced, port num 1
>                                 for parent node GUID 0x5ad00000c5cec TID
> 0x1311
> Jun 07 14:56:58 091320 [161CB700] 0x08 -> osm_db_update:
> Key:0x0005ad00000c5ced previously exists in:/var/cache/opensm/guid2mkey
> with value:0x0000000000000000
> Jun 07 14:56:58 091324 [161CB700] 0x10 -> pi_rcv_process_set: ]
> Jun 07 14:56:58 091327 [161CB700] 0x10 -> osm_pi_rcv_process: ]
>
> But I'm not really sure what I'm looking for.
>
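
One thing that can help here is to diff the two PortInfo dumps field by field, so only the values that changed between the first discovery and the later dump are left. The sketch below assumes the dump lines have been copied out as `name....value` on one line each (quote markers are tolerated). The sample values are an excerpt of the two dumps quoted above, and `parse_portinfo` and `changed_fields` are hypothetical helper names.

```python
import re

# A field line is a name, a run of dots, then the value; leading "> " is allowed.
FIELD = re.compile(r"[>\s]*([\w ]+?)\.+(\S+)\s*")

def parse_portinfo(dump):
    """Return {field name: value string} for every name....value line in dump."""
    fields = {}
    for line in dump.splitlines():
        m = FIELD.fullmatch(line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields

def changed_fields(before, after):
    """Return {name: (old, new)} for fields present in both dumps with different values."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }

first = """
> subnet_prefix............0xfe80000000000000
> base_lid.................0
> master_sm_base_lid.......0
> port_state...............INIT
> mtu_smsl.................0x20
> client_reregister........0x0
> subnet_timeout...........0x0
> error_threshold..........0xF0
"""
second = """
> subnet_prefix............0xfe80000000000001
> base_lid.................16
> master_sm_base_lid.......1
> port_state...............INIT
> mtu_smsl.................0x40
> client_reregister........0x1
> subnet_timeout...........0x12
> error_threshold..........0x88
"""
diff = changed_fields(parse_portinfo(first), parse_portinfo(second))
for name, (old, new) in diff.items():
    print(f"{name}: {old} -> {new}")
```

In this excerpt every field except port_state differs, including subnet_prefix, which is the value in question in this thread.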
>
> --
> Orion Poplawski
> Technical Manager                     303-415-9701 x222
> NWRA, Boulder/CoRA Office             FAX: 303-415-9702
> 3380 Mitchell Lane                       orion at nwra.com
> Boulder, CO 80301                   http://www.nwra.com
>