[Users] IPoIB not working on Windows 2008 r2 - need help
Hal Rosenstock
hal.rosenstock at gmail.com
Fri Jun 7 17:55:26 PDT 2013
On Fri, Jun 7, 2013 at 6:52 PM, Orion Poplawski <orion at cora.nwra.com> wrote:
> On 06/07/2013 02:23 PM, Hal Rosenstock wrote:
>
>> Also, if you turn on log verbosity on OpenSM temporarily and send me the log
>> for that run, I could see what is going on in terms of trying to set the
>> non default subnet prefix with the Windows node. Given the log you sent, I can
>> only imagine that the SMA on the Windows node is ack'ing the PortInfo set
>> which sets the subnet prefix but not really acting on it properly.
>> -- Hal
>>
>
> Full log is at http://sw.cora.nwra.com/test/opensm.debug.log.gz
>
>
Looking at that log, I didn't see _any_ MC joins from that port (GUID
0x5ad00000c5ced) so this is a different scenario than before :-(
Also, the previous confusion with:
# saquery -m 0xc000
PortGid.................fe80::1:5:ad00:c:5c3d (Topspin DDR-HCAe LX x8)
PortGid.................fe80::1:19:bbff:ff00:5851 (saga mthca0)
PortGid.................fe80::1:19:bbff:ff00:3899 (sfcomp1 mthca0)
PortGid.................fe80::1:1a:4bff:ff0c:20c9 (HP Lion Cub 128MB)
PortGid.................fe80::5:ad00:c:5ced (MT25204 InfiniHostLx Mellanox Technologies)
PortGid.................fe80::1:17:8ff:ffd0:9df9 (alexandria2 HCA-1)
GUID is 5:ad00:c:5ced and prefix is fe80:: so it's either missing a digit
like 1 (fe80::1 like the others) or if it's a 0 it would have a 3rd colon
(fe80:::). So I'm not sure what's going on there either.
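One possible reading, for what it's worth: a port GID is the 64-bit subnet prefix followed by the 64-bit port GUID. The sketch below assumes saquery prints that 128-bit value with standard IPv6 zero compression (an assumption on my part, and `gid_text` is just an illustrative helper name). Under that assumption, a port still on the default prefix 0xfe80000000000000 would show no digit after fe80::, while ports on 0xfe80000000000001 show the 1.

```python
import ipaddress

def gid_text(subnet_prefix: int, port_guid: int) -> str:
    """Render a GID (prefix in the high 64 bits, GUID in the low 64 bits).

    Assumption: the tool formats the GID like a compressed IPv6 address.
    """
    gid = (subnet_prefix << 64) | port_guid
    return str(ipaddress.IPv6Address(gid))

# Topspin port, non-default prefix
print(gid_text(0xfe80000000000001, 0x0005ad00000c5c3d))  # fe80::1:5:ad00:c:5c3d
# Windows node port, default prefix
print(gid_text(0xfe80000000000000, 0x0005ad00000c5ced))  # fe80::5:ad00:c:5ced
```

If that assumption holds, the odd-looking entry is simply the one port whose GID still carries the default subnet prefix.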
> I had fontdb shutdown when I started opensm - then booted it up.
>
> This seems to be when it first comes up (lid 0, prefix 0xfe80::0)
>
> Jun 07 14:56:58 088453 [193D0700] 0x10 -> osm_pi_rcv_process: [
> Jun 07 14:56:58 088465 [193D0700] 0x08 -> PortInfo dump:
> port number..............1
> node_guid................0x0005ad00000c5cec
> port_guid................0x0005ad00000c5ced
> m_key....................0x0000000000000000
> subnet_prefix............0xfe80000000000000
> base_lid.................0
> master_sm_base_lid.......0
> capability_mask..........0x2500A68
> diag_code................0x0
> m_key_lease_period.......0x0
> local_port_num...........1
> link_width_enabled.......0x3
> link_width_supported.....0x3
> link_width_active........0x2
> link_speed_supported.....0x3
> port_state...............INIT
> state_info2..............0x52
> m_key_protect_bits.......0x0
> lmc......................0x0
> link_speed...............0x13
> mtu_smsl.................0x20
> vl_cap_init_type.........0x30
> vl_high_limit............0x0
> vl_arb_high_cap..........0x8
> vl_arb_low_cap...........0x8
> init_rep_mtu_cap.........0x4
> vl_stall_life............0xFF
> vl_enforce...............0x30
> m_key_violations.........0x0
> p_key_violations.........0x0
> q_key_violations.........0x0
> guid_cap.................0x20
> client_reregister........0x0
> mcast_pkey_trap_suppr....0x0
> subnet_timeout...........0x0
> resp_time_value..........0x10
> error_threshold..........0xF0
> max_credit_hint..........0x0
> link_round_trip_latency..0x0
> capability_mask2.........0x0
> link_speed_ext_active....0x0
> link_speed_ext_supported.0x0
> link_speed_ext_enabled...0x0
> Jun 07 14:56:58 088495 [193D0700] 0x08 -> Capability Mask:
> IB_PORT_CAP_HAS_TRAP
> IB_PORT_CAP_HAS_AUTO_MIG
> IB_PORT_CAP_HAS_SL_MAP
> IB_PORT_CAP_HAS_LED_INFO
> IB_PORT_CAP_HAS_SYS_IMG_GUID
> IB_PORT_CAP_HAS_VEND_CLS
> IB_PORT_CAP_HAS_CAP_NTC
> IB_PORT_CAP_HAS_CLIENT_REREG
> Jun 07 14:56:58 088499 [193D0700] 0x04 -> osm_pi_rcv_process: Discovered
> port num 1 with GUID 0x5ad00000c5ced for parent node GUID 0x5ad00000c5cec,
> TID 0x130e
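
The Capability Mask names in the quoted log can be cross-checked against the raw capability_mask value. This is a rough sketch: the flag names are the ones printed in the log, the bit positions are my reading of the PortInfo CapabilityMask layout, the table covers only the bits relevant here, and `decode_capability_mask` is an illustrative helper, not an OpenSM function.

```python
# Bit position -> name as printed in the log (partial table, assumed positions).
CAP_BITS = {
    3: "IB_PORT_CAP_HAS_TRAP",
    5: "IB_PORT_CAP_HAS_AUTO_MIG",
    6: "IB_PORT_CAP_HAS_SL_MAP",
    9: "IB_PORT_CAP_HAS_LED_INFO",
    11: "IB_PORT_CAP_HAS_SYS_IMG_GUID",
    20: "IB_PORT_CAP_HAS_VEND_CLS",
    22: "IB_PORT_CAP_HAS_CAP_NTC",
    25: "IB_PORT_CAP_HAS_CLIENT_REREG",
}

def decode_capability_mask(mask):
    """Return (names of known bits set, leftover bits not in the table)."""
    names = [name for bit, name in sorted(CAP_BITS.items()) if mask & (1 << bit)]
    known = sum(1 << bit for bit in CAP_BITS)
    return names, mask & ~known

names, leftover = decode_capability_mask(0x2500A68)
for name in names:
    print(name)
print("unrecognized bits:", hex(leftover))
```

With 0x2500A68 every set bit maps to one of the eight names in the log, so the dump and the mask agree.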
>
>
> Then later, sm seems to have assigned a lid.
>
> Jun 07 14:56:58 090679 [161CB700] 0x08 -> PortInfo dump:
> port number..............1
> node_guid................0x0005ad00000c5cec
> port_guid................0x0005ad00000c5ced
> m_key....................0x0000000000000000
> subnet_prefix............0xfe80000000000001
> base_lid.................16
> master_sm_base_lid.......1
> capability_mask..........0x2500A68
> diag_code................0x0
> m_key_lease_period.......0x0
> local_port_num...........1
> link_width_enabled.......0x3
> link_width_supported.....0x3
> link_width_active........0x2
> link_speed_supported.....0x3
> port_state...............INIT
> state_info2..............0x52
> m_key_protect_bits.......0x0
> lmc......................0x0
> link_speed...............0x13
> mtu_smsl.................0x40
> vl_cap_init_type.........0x30
> vl_high_limit............0x0
> vl_arb_high_cap..........0x8
> vl_arb_low_cap...........0x8
> init_rep_mtu_cap.........0x4
> vl_stall_life............0xFF
> vl_enforce...............0x30
> m_key_violations.........0x0
> p_key_violations.........0x0
> q_key_violations.........0x0
> guid_cap.................0x20
> client_reregister........0x1
> mcast_pkey_trap_suppr....0x0
> subnet_timeout...........0x12
> resp_time_value..........0x10
> error_threshold..........0x88
> max_credit_hint..........0x0
> link_round_trip_latency..0x0
> capability_mask2.........0x0
> link_speed_ext_active....0x0
> link_speed_ext_supported.0x0
> link_speed_ext_enabled...0x0
> Jun 07 14:56:58 090709 [161CB700] 0x08 -> Capability Mask:
> IB_PORT_CAP_HAS_TRAP
> IB_PORT_CAP_HAS_AUTO_MIG
> IB_PORT_CAP_HAS_SL_MAP
> IB_PORT_CAP_HAS_LED_INFO
> IB_PORT_CAP_HAS_SYS_IMG_GUID
> IB_PORT_CAP_HAS_VEND_CLS
> IB_PORT_CAP_HAS_CAP_NTC
> IB_PORT_CAP_HAS_CLIENT_REREG
> Jun 07 14:56:58 090713 [161CB700] 0x08 -> osm_pi_rcv_process: Client
> reregister received on response
> Jun 07 14:56:58 091294 [12FC6700] 0x10 -> osm_db_store: ]
> Jun 07 14:56:58 091301 [12FC6700] 0x10 -> osm_lid_mgr_process_subnet: ]
> Jun 07 14:56:58 091308 [161CB700] 0x10 -> pi_rcv_process_set: [
> Jun 07 14:56:58 091313 [161CB700] 0x08 -> pi_rcv_process_set: Received
> logical SetResp() for GUID 0x5ad00000c5ced, port num 1
> for parent node GUID 0x5ad00000c5cec TID
> 0x1311
> Jun 07 14:56:58 091320 [161CB700] 0x08 -> osm_db_update:
> Key:0x0005ad00000c5ced previously exists in:/var/cache/opensm/guid2mkey
> with value:0x0000000000000000
> Jun 07 14:56:58 091324 [161CB700] 0x10 -> pi_rcv_process_set: ]
> Jun 07 14:56:58 091327 [161CB700] 0x10 -> osm_pi_rcv_process: ]
>
> But I'm not really sure what I'm looking for.
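
One way to narrow down what to look for is to compare the two quoted PortInfo dumps field by field. The sketch below uses a hand-copied subset of the values from those dumps; the dictionaries and the `changed_fields` helper are illustrative only and not part of OpenSM.

```python
# Subset of fields copied from the first (discovery) and second dumps above.
first_dump = {
    "subnet_prefix": 0xfe80000000000000,
    "base_lid": 0,
    "master_sm_base_lid": 0,
    "port_state": "INIT",
    "lmc": 0x0,
    "mtu_smsl": 0x20,
    "client_reregister": 0x0,
    "subnet_timeout": 0x0,
    "error_threshold": 0xF0,
}
second_dump = {
    "subnet_prefix": 0xfe80000000000001,
    "base_lid": 16,
    "master_sm_base_lid": 1,
    "port_state": "INIT",
    "lmc": 0x0,
    "mtu_smsl": 0x40,
    "client_reregister": 0x1,
    "subnet_timeout": 0x12,
    "error_threshold": 0x88,
}

def changed_fields(before, after):
    """Map each field whose value differs to its (before, after) pair."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}

for name, (old, new) in changed_fields(first_dump, second_dump).items():
    print(f"{name}: {old!r} -> {new!r}")
```

The subnet_prefix line is the interesting one: the second dump echoes the non-default prefix, which would fit the earlier guess that the node acknowledges the PortInfo set without really acting on it.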
>
>
> --
> Orion Poplawski
> Technical Manager 303-415-9701 x222
> NWRA, Boulder/CoRA Office FAX: 303-415-9702
> 3380 Mitchell Lane orion at nwra.com
> Boulder, CO 80301 http://www.nwra.com
>