[Users] IPoIB not working on Windows 2008 r2 - need help

Hal Rosenstock hal.rosenstock at gmail.com
Fri Jun 7 18:09:08 PDT 2013


On Fri, Jun 7, 2013 at 8:55 PM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:

>
>
> On Fri, Jun 7, 2013 at 6:52 PM, Orion Poplawski <orion at cora.nwra.com> wrote:
>
>> On 06/07/2013 02:23 PM, Hal Rosenstock wrote:
>>
>>> Also, if you turn on log verbosity on OpenSM temporarily and send me the
>>> log for that run, I could see what is going on in terms of trying to set
>>> the non-default subnet prefix on the Windows node. Given the log you sent,
>>> I can only imagine that the SMA on the Windows node is ack'ing the PortInfo
>>> set which sets the subnet prefix but not really acting on it properly.
>>> -- Hal
>>>
>>
>> Full log is at http://sw.cora.nwra.com/test/opensm.debug.log.gz
>>
>>
> Looking at that log, I didn't see _any_ MC joins from that port (GUID
> 0x5ad00000c5ced) so this is a different scenario than before :-(
>
> Also, the previous confusion with:
>
> # saquery -m 0xc000
> PortGid.................fe80::1:5:ad00:c:5c3d (Topspin DDR-HCAe LX x8)
> PortGid.................fe80::1:19:bbff:ff00:5851 (saga mthca0)
> PortGid.................fe80::1:19:bbff:ff00:3899 (sfcomp1 mthca0)
> PortGid.................fe80::1:1a:4bff:ff0c:20c9 (HP Lion Cub 128MB)
> PortGid.................fe80::5:ad00:c:5ced (MT25204 InfiniHostLx Mellanox Technologies)
> PortGid.................fe80::1:17:8ff:ffd0:9df9 (alexandria2 HCA-1)
>
> GUID is 5:ad00:c:5ced and prefix is fe80:: so it's either missing a
> digit like 1 (fe80::1 like the others) or if it's a 0 it would have a 3rd
> colon (fe80:::). So I'm not sure what's going on there either.
>
>
I did find a half-world PR query from that node, though, and its GID looks
similar, so I must be mistaken about the extra colon. The bottom line is the
same: the prefix is set in the SMA but is not being used in the SA queries
(PR and MCM) issued by the Windows node.
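
To make the prefix point concrete, here is a minimal Python sketch of how a
GID is formed from a 64-bit subnet prefix and a 64-bit port GUID. The helper
names (make_gid, split_gid) are made up for illustration and are not part of
OpenSM or any IB tool; the two prefix values are the ones that appear in the
PortInfo dumps from this log.

```python
import ipaddress

def make_gid(subnet_prefix, port_guid):
    """Concatenate a 64-bit subnet prefix and a 64-bit port GUID into a GID."""
    return ipaddress.IPv6Address((subnet_prefix << 64) | port_guid)

def split_gid(text):
    """Split a textual GID back into (subnet_prefix, port_guid)."""
    value = int(ipaddress.IPv6Address(text))
    return value >> 64, value & 0xFFFFFFFFFFFFFFFF

# Port GUID of the Windows node, as reported in the log.
port_guid = 0x0005ad00000c5ced

# GID built with the default prefix versus the configured non-default prefix.
default_gid = make_gid(0xfe80000000000000, port_guid)
configured_gid = make_gid(0xfe80000000000001, port_guid)

print(default_gid)     # fe80::5:ad00:c:5ced
print(configured_gid)  # fe80::1:5:ad00:c:5ced
```

So fe80::5:ad00:c:5ced is a well-formed GID with no missing digit or extra
colon: the "::" simply compresses the zero groups of the default prefix.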


Jun 07 14:56:58 353610 [11BC4700] 0x20 -> SA MAD dump:
                     base_ver................0x1
                     mgmt_class..............0x3
                     class_ver...............0x2
                     method..................0x12 (SubnAdmGetTable)
                     status..................0x0
                     resv....................0x0
                     trans_id................0x100000002
                     attr_id.................0x35 (PathRecord)
                     resv1...................0x0
                     attr_mod................0x0
                     rmpp_version............0x0
                     rmpp_type...............0x0
                     rmpp_flags..............0x0
                     rmpp_status.............0x0
                     seg_num.................0x0
                     payload_len/new_win.....0x0
                     sm_key..................0x0000000000000000
                     attr_offset.............0x8
                     resv2...................0x0
                     comp_mask...............0x0000000000003008

Jun 07 14:56:58 353743 [1A7D2700] 0x08 -> PathRecord dump:
                     service_id..............0x0000000000000000
                     dgid....................::
                     sgid....................fe80::5:ad00:c:5ced
                     dlid....................0
                     slid....................0
                     hop_flow_raw............0x0
                     tclass..................0x0
                     num_path_revers.........0xFF
                     pkey....................0xFFFF
                     qos_class...............0x0
                     sl......................0x0
                     mtu.....................0x0
                     rate....................0x0
                     pkt_life................0x0
                     preference..............0x0
                     resv2...................0x000000000000
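
For reading the comp_mask in the query above, a rough Python sketch that
decodes which PathRecord components are selected. The bit positions are an
assumption on my part, written down as I recall them from the
IB_PR_COMPMASK_* defines in OpenSM's ib_types.h, so please check them against
the header before relying on this. The function name is made up for
illustration.

```python
# Assumed bit positions for the low PathRecord component mask bits
# (from memory of IB_PR_COMPMASK_* in ib_types.h; verify before use).
PR_COMPMASK_BITS = {
    0: "SERVICEID_MSB",
    1: "SERVICEID_LSB",
    2: "DGID",
    3: "SGID",
    4: "DLID",
    5: "SLID",
    6: "RAWTRAFFIC",
    7: "RESV0",
    8: "FLOWLABEL",
    9: "HOPLIMIT",
    10: "TCLASS",
    11: "REVERSIBLE",
    12: "NUMBPATH",
    13: "PKEY",
}

def decode_pr_comp_mask(mask):
    """Return the names of the components whose bits are set in mask."""
    return [
        PR_COMPMASK_BITS.get(bit, "bit%d" % bit)
        for bit in range(64)
        if mask & (1 << bit)
    ]

print(decode_pr_comp_mask(0x3008))  # ['SGID', 'NUMBPATH', 'PKEY']
```

Under that assumption, only the SGID, the number of paths and the P_Key are
specified and the DGID is left out, which is what makes this a half-world
query.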


>
>> I had fontdb shut down when I started opensm - then booted it up.
>>
>
>
>> This seems to be when it first comes up (lid 0, prefix 0xfe80::0)
>>
>> Jun 07 14:56:58 088453 [193D0700] 0x10 -> osm_pi_rcv_process: [
>> Jun 07 14:56:58 088465 [193D0700] 0x08 -> PortInfo dump:
>>                                 port number..............1
>>                                 node_guid................0x0005ad00000c5cec
>>                                 port_guid................0x0005ad00000c5ced
>>                                 m_key....................0x0000000000000000
>>                                 subnet_prefix............0xfe80000000000000
>>                                 base_lid.................0
>>                                 master_sm_base_lid.......0
>>                                 capability_mask..........0x2500A68
>>                                 diag_code................0x0
>>                                 m_key_lease_period.......0x0
>>                                 local_port_num...........1
>>                                 link_width_enabled.......0x3
>>                                 link_width_supported.....0x3
>>                                 link_width_active........0x2
>>                                 link_speed_supported.....0x3
>>                                 port_state...............INIT
>>                                 state_info2..............0x52
>>                                 m_key_protect_bits.......0x0
>>                                 lmc......................0x0
>>                                 link_speed...............0x13
>>                                 mtu_smsl.................0x20
>>                                 vl_cap_init_type.........0x30
>>                                 vl_high_limit............0x0
>>                                 vl_arb_high_cap..........0x8
>>                                 vl_arb_low_cap...........0x8
>>                                 init_rep_mtu_cap.........0x4
>>                                 vl_stall_life............0xFF
>>                                 vl_enforce...............0x30
>>                                 m_key_violations.........0x0
>>                                 p_key_violations.........0x0
>>                                 q_key_violations.........0x0
>>                                 guid_cap.................0x20
>>                                 client_reregister........0x0
>>                                 mcast_pkey_trap_suppr....0x0
>>                                 subnet_timeout...........0x0
>>                                 resp_time_value..........0x10
>>                                 error_threshold..........0xF0
>>                                 max_credit_hint..........0x0
>>                                 link_round_trip_latency..0x0
>>                                 capability_mask2.........0x0
>>                                 link_speed_ext_active....0x0
>>                                 link_speed_ext_supported.0x0
>>                                 link_speed_ext_enabled...0x0
>> Jun 07 14:56:58 088495 [193D0700] 0x08 -> Capability Mask:
>>                                 IB_PORT_CAP_HAS_TRAP
>>                                 IB_PORT_CAP_HAS_AUTO_MIG
>>                                 IB_PORT_CAP_HAS_SL_MAP
>>                                 IB_PORT_CAP_HAS_LED_INFO
>>                                 IB_PORT_CAP_HAS_SYS_IMG_GUID
>>                                 IB_PORT_CAP_HAS_VEND_CLS
>>                                 IB_PORT_CAP_HAS_CAP_NTC
>>                                 IB_PORT_CAP_HAS_CLIENT_REREG
>> Jun 07 14:56:58 088499 [193D0700] 0x04 -> osm_pi_rcv_process: Discovered
>> port num 1 with GUID 0x5ad00000c5ced for parent node GUID 0x5ad00000c5cec,
>> TID 0x130e
>>
>>
>> Then later, sm seems to have assigned a lid.
>>
>> Jun 07 14:56:58 090679 [161CB700] 0x08 -> PortInfo dump:
>>                                 port number..............1
>>                                 node_guid................0x0005ad00000c5cec
>>                                 port_guid................0x0005ad00000c5ced
>>                                 m_key....................0x0000000000000000
>>                                 subnet_prefix............0xfe80000000000001
>>                                 base_lid.................16
>>                                 master_sm_base_lid.......1
>>                                 capability_mask..........0x2500A68
>>                                 diag_code................0x0
>>                                 m_key_lease_period.......0x0
>>                                 local_port_num...........1
>>                                 link_width_enabled.......0x3
>>                                 link_width_supported.....0x3
>>                                 link_width_active........0x2
>>                                 link_speed_supported.....0x3
>>                                 port_state...............INIT
>>                                 state_info2..............0x52
>>                                 m_key_protect_bits.......0x0
>>                                 lmc......................0x0
>>                                 link_speed...............0x13
>>                                 mtu_smsl.................0x40
>>                                 vl_cap_init_type.........0x30
>>                                 vl_high_limit............0x0
>>                                 vl_arb_high_cap..........0x8
>>                                 vl_arb_low_cap...........0x8
>>                                 init_rep_mtu_cap.........0x4
>>                                 vl_stall_life............0xFF
>>                                 vl_enforce...............0x30
>>                                 m_key_violations.........0x0
>>                                 p_key_violations.........0x0
>>                                 q_key_violations.........0x0
>>                                 guid_cap.................0x20
>>                                 client_reregister........0x1
>>                                 mcast_pkey_trap_suppr....0x0
>>                                 subnet_timeout...........0x12
>>                                 resp_time_value..........0x10
>>                                 error_threshold..........0x88
>>                                 max_credit_hint..........0x0
>>                                 link_round_trip_latency..0x0
>>                                 capability_mask2.........0x0
>>                                 link_speed_ext_active....0x0
>>                                 link_speed_ext_supported.0x0
>>                                 link_speed_ext_enabled...0x0
>> Jun 07 14:56:58 090709 [161CB700] 0x08 -> Capability Mask:
>>                                 IB_PORT_CAP_HAS_TRAP
>>                                 IB_PORT_CAP_HAS_AUTO_MIG
>>                                 IB_PORT_CAP_HAS_SL_MAP
>>                                 IB_PORT_CAP_HAS_LED_INFO
>>                                 IB_PORT_CAP_HAS_SYS_IMG_GUID
>>                                 IB_PORT_CAP_HAS_VEND_CLS
>>                                 IB_PORT_CAP_HAS_CAP_NTC
>>                                 IB_PORT_CAP_HAS_CLIENT_REREG
>> Jun 07 14:56:58 090713 [161CB700] 0x08 -> osm_pi_rcv_process: Client
>> reregister received on response
>> Jun 07 14:56:58 091294 [12FC6700] 0x10 -> osm_db_store: ]
>> Jun 07 14:56:58 091301 [12FC6700] 0x10 -> osm_lid_mgr_process_subnet: ]
>> Jun 07 14:56:58 091308 [161CB700] 0x10 -> pi_rcv_process_set: [
>> Jun 07 14:56:58 091313 [161CB700] 0x08 -> pi_rcv_process_set: Received
>> logical SetResp() for GUID 0x5ad00000c5ced, port num 1
>>                                 for parent node GUID 0x5ad00000c5cec TID
>> 0x1311
>> Jun 07 14:56:58 091320 [161CB700] 0x08 -> osm_db_update:
>> Key:0x0005ad00000c5ced previously exists in:/var/cache/opensm/guid2mkey
>> with value:0x0000000000000000
>> Jun 07 14:56:58 091324 [161CB700] 0x10 -> pi_rcv_process_set: ]
>> Jun 07 14:56:58 091327 [161CB700] 0x10 -> osm_pi_rcv_process: ]
>>
>> But I'm not really sure what I'm looking for.
>>
>>
>> --
>> Orion Poplawski
>> Technical Manager                     303-415-9701 x222
>> NWRA, Boulder/CoRA Office             FAX: 303-415-9702
>> 3380 Mitchell Lane                       orion at nwra.com
>> Boulder, CO 80301                   http://www.nwra.com
>>
>
>