[Users] IPoIB not working on Windows 2008 r2 - need help
Hal Rosenstock
hal.rosenstock at gmail.com
Fri Jun 7 18:09:08 PDT 2013
On Fri, Jun 7, 2013 at 8:55 PM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
>
>
> On Fri, Jun 7, 2013 at 6:52 PM, Orion Poplawski <orion at cora.nwra.com> wrote:
>
>> On 06/07/2013 02:23 PM, Hal Rosenstock wrote:
>>
>>> Also, if you temporarily turn up log verbosity on OpenSM and send me the
>>> log for that run, I could see what is going on in terms of trying to set
>>> the non-default subnet prefix on the Windows node. Given the log you sent,
>>> I can only imagine that the SMA on the Windows node is ack'ing the PortInfo
>>> Set that sets the subnet prefix but not really acting on it properly.
>>> -- Hal
>>>
>>
>> Full log is at http://sw.cora.nwra.com/test/opensm.debug.log.gz
>>
>>
> Looking at that log, I didn't see _any_ MC joins from that port (GUID
> 0x5ad00000c5ced) so this is a different scenario than before :-(
>
> Also, the previous confusion with:
>
> # saquery -m 0xc000
> PortGid.................fe80::1:5:ad00:c:5c3d (Topspin DDR-HCAe LX x8)
>
> PortGid.................fe80::1:19:bbff:ff00:5851 (saga mthca0)
> PortGid.................fe80::1:19:bbff:ff00:3899 (sfcomp1 mthca0)
>
> PortGid.................fe80::1:1a:4bff:ff0c:20c9 (HP Lion Cub 128MB)
> PortGid.................fe80::5:ad00:c:5ced (MT25204 InfiniHostLx Mellanox Technologies)
> PortGid.................fe80::1:17:8ff:ffd0:9df9 (alexandria2 HCA-1)
> GUID is 5:ad00:c:5ced and the prefix is fe80::, so it's either missing a
> digit such as 1 (fe80::1, like the others) or, if it's 0, it would have a
> third colon (fe80:::). So I'm not sure what's going on there either.
>
>
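For reference, a GID is the 64-bit subnet prefix followed by the 64-bit port GUID, printed in IPv6 text form, so a run of all-zero 16-bit groups collapses into a single "::" and a zero prefix does not produce an extra colon. A minimal Python sketch of that split, applied to the PortGid values from the saquery output above (the helpers split_gid and make_gid are illustrative names, not part of any IB tool):

```python
import ipaddress

# PortGid values copied from the "saquery -m 0xc000" output quoted above.
PORT_GIDS = {
    "Topspin DDR-HCAe LX x8": "fe80::1:5:ad00:c:5c3d",
    "saga mthca0": "fe80::1:19:bbff:ff00:5851",
    "sfcomp1 mthca0": "fe80::1:19:bbff:ff00:3899",
    "HP Lion Cub 128MB": "fe80::1:1a:4bff:ff0c:20c9",
    "MT25204 InfiniHostLx": "fe80::5:ad00:c:5ced",
    "alexandria2 HCA-1": "fe80::1:17:8ff:ffd0:9df9",
}


def split_gid(gid: str) -> tuple[int, int]:
    """Split a GID into (64-bit subnet prefix, 64-bit port GUID)."""
    value = int(ipaddress.IPv6Address(gid))
    return value >> 64, value & ((1 << 64) - 1)


def make_gid(prefix: int, guid: int) -> str:
    """Join a subnet prefix and port GUID into a compressed IPv6-style GID."""
    return str(ipaddress.IPv6Address((prefix << 64) | guid))


if __name__ == "__main__":
    for name, gid in PORT_GIDS.items():
        prefix, guid = split_gid(gid)
        print(f"{name:24} prefix=0x{prefix:016x} guid=0x{guid:016x}")
    # The same port GUID rendered under the default and the non-default prefix.
    guid = 0x0005AD00000C5CED
    print(make_gid(0xFE80000000000000, guid))
    print(make_gid(0xFE80000000000001, guid))
```

With these inputs every port except the MT25204 splits to prefix 0xfe80000000000001, while fe80::5:ad00:c:5ced splits to the default 0xfe80000000000000 with the same GUID the SM discovered (0x0005ad00000c5ced).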
I did find a half-world PR query from that node, though, and its GID looks
similar, so I must be mistaken about the extra colon. The bottom line is the
same: the prefix is set in the SMA but is not being used in the SA queries (PR
and MCM) issued by the Windows node.
Jun 07 14:56:58 353610 [11BC4700] 0x20 -> SA MAD dump:
base_ver................0x1
mgmt_class..............0x3
class_ver...............0x2
method..................0x12 (SubnAdmGetTable)
status..................0x0
resv....................0x0
trans_id................0x100000002
attr_id.................0x35 (PathRecord)
resv1...................0x0
attr_mod................0x0
rmpp_version............0x0
rmpp_type...............0x0
rmpp_flags..............0x0
rmpp_status.............0x0
seg_num.................0x0
payload_len/new_win.....0x0
sm_key..................0x0000000000000000
attr_offset.............0x8
resv2...................0x0
comp_mask...............0x0000000000003008
Jun 07 14:56:58 353743 [1A7D2700] 0x08 -> PathRecord dump:
service_id..............0x0000000000000000
dgid....................::
sgid....................fe80::5:ad00:c:5ced
dlid....................0
slid....................0
hop_flow_raw............0x0
tclass..................0x0
num_path_revers.........0xFF
pkey....................0xFFFF
qos_class...............0x0
sl......................0x0
mtu.....................0x0
rate....................0x0
pkt_life................0x0
preference..............0x0
resv2...................0x000000000000
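To make the dump above easier to read: comp_mask tells the SA which PathRecord fields the requester filled in, and the sgid's upper 64 bits are the prefix the Windows node is actually using. A minimal Python sketch that decodes both, assuming the component-mask bit order follows PathRecord field order as in the IBA spec and OpenSM's IB_PR_COMPMASK_* definitions (worth double-checking against ib_types.h); the helper names and constants are hypothetical, and CONFIGURED_PREFIX is the subnet_prefix from the second PortInfo dump quoted further down:

```python
import ipaddress

# Field name for each PathRecord component-mask bit, bit 0 first.
# Assumption: order mirrors the IBA PathRecord layout / OpenSM IB_PR_COMPMASK_*.
PR_COMPMASK_BITS = [
    "ServiceID(MSB)", "ServiceID(LSB)", "DGID", "SGID", "DLID", "SLID",
    "RawTraffic", "Reserved", "FlowLabel", "HopLimit", "TClass",
    "Reversible", "NumbPath", "PKey", "QoSClass", "SL",
    "MTUSelector", "MTU", "RateSelector", "Rate",
    "PacketLifeTimeSelector", "PacketLifeTime", "Preference",
]

# Values copied from the SA MAD / PathRecord dump above and from the
# PortInfo subnet_prefix the SM later set on port 0x0005ad00000c5ced.
QUERY_COMP_MASK = 0x3008
QUERY_SGID = "fe80::5:ad00:c:5ced"
CONFIGURED_PREFIX = 0xFE80000000000001


def decode_pr_comp_mask(mask: int) -> list[str]:
    """Return the PathRecord fields that a component mask marks as valid."""
    return [name for bit, name in enumerate(PR_COMPMASK_BITS) if mask & (1 << bit)]


def gid_prefix(gid: str) -> int:
    """Return the subnet prefix (upper 64 bits) of a GID in IPv6 text form."""
    return int(ipaddress.IPv6Address(gid)) >> 64


if __name__ == "__main__":
    print("fields set:", decode_pr_comp_mask(QUERY_COMP_MASK))
    prefix = gid_prefix(QUERY_SGID)
    print(f"SGID prefix 0x{prefix:016x}, PortInfo prefix 0x{CONFIGURED_PREFIX:016x}")
    print("prefix matches PortInfo:", prefix == CONFIGURED_PREFIX)
```

On these values it reports SGID, NumbPath and PKey as the only selected fields (an SGID-only, DGID-wildcard "half-world" query, matching dgid "::" in the dump), and a prefix mismatch: 0xfe80000000000000 in the query versus 0xfe80000000000001 in PortInfo.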
>
>> I had fontdb shut down when I started OpenSM, then booted it up.
>>
>
>
>> This seems to be when it first comes up (lid 0, prefix 0xfe80::0)
>>
>> Jun 07 14:56:58 088453 [193D0700] 0x10 -> osm_pi_rcv_process: [
>> Jun 07 14:56:58 088465 [193D0700] 0x08 -> PortInfo dump:
>> port number..............1
>> node_guid................0x0005ad00000c5cec
>> port_guid................0x0005ad00000c5ced
>> m_key....................0x0000000000000000
>> subnet_prefix............0xfe80000000000000
>> base_lid.................0
>> master_sm_base_lid.......0
>> capability_mask..........0x2500A68
>> diag_code................0x0
>> m_key_lease_period.......0x0
>> local_port_num...........1
>> link_width_enabled.......0x3
>> link_width_supported.....0x3
>> link_width_active........0x2
>> link_speed_supported.....0x3
>> port_state...............INIT
>> state_info2..............0x52
>> m_key_protect_bits.......0x0
>> lmc......................0x0
>> link_speed...............0x13
>> mtu_smsl.................0x20
>> vl_cap_init_type.........0x30
>> vl_high_limit............0x0
>> vl_arb_high_cap..........0x8
>> vl_arb_low_cap...........0x8
>> init_rep_mtu_cap.........0x4
>> vl_stall_life............0xFF
>> vl_enforce...............0x30
>> m_key_violations.........0x0
>> p_key_violations.........0x0
>> q_key_violations.........0x0
>> guid_cap.................0x20
>> client_reregister........0x0
>> mcast_pkey_trap_suppr....0x0
>> subnet_timeout...........0x0
>> resp_time_value..........0x10
>> error_threshold..........0xF0
>> max_credit_hint..........0x0
>> link_round_trip_latency..0x0
>> capability_mask2.........0x0
>> link_speed_ext_active....0x0
>> link_speed_ext_supported.0x0
>> link_speed_ext_enabled...0x0
>> Jun 07 14:56:58 088495 [193D0700] 0x08 -> Capability Mask:
>> IB_PORT_CAP_HAS_TRAP
>> IB_PORT_CAP_HAS_AUTO_MIG
>> IB_PORT_CAP_HAS_SL_MAP
>> IB_PORT_CAP_HAS_LED_INFO
>> IB_PORT_CAP_HAS_SYS_IMG_GUID
>> IB_PORT_CAP_HAS_VEND_CLS
>> IB_PORT_CAP_HAS_CAP_NTC
>> IB_PORT_CAP_HAS_CLIENT_REREG
>> Jun 07 14:56:58 088499 [193D0700] 0x04 -> osm_pi_rcv_process: Discovered
>> port num 1 with GUID 0x5ad00000c5ced for parent node GUID 0x5ad00000c5cec,
>> TID 0x130e
>>
>>
>> Then later, sm seems to have assigned a lid.
>>
>> Jun 07 14:56:58 090679 [161CB700] 0x08 -> PortInfo dump:
>> port number..............1
>> node_guid................0x0005ad00000c5cec
>> port_guid................0x0005ad00000c5ced
>> m_key....................0x0000000000000000
>> subnet_prefix............0xfe80000000000001
>> base_lid.................16
>> master_sm_base_lid.......1
>> capability_mask..........0x2500A68
>> diag_code................0x0
>> m_key_lease_period.......0x0
>> local_port_num...........1
>> link_width_enabled.......0x3
>> link_width_supported.....0x3
>> link_width_active........0x2
>> link_speed_supported.....0x3
>> port_state...............INIT
>> state_info2..............0x52
>> m_key_protect_bits.......0x0
>> lmc......................0x0
>> link_speed...............0x13
>> mtu_smsl.................0x40
>> vl_cap_init_type.........0x30
>> vl_high_limit............0x0
>> vl_arb_high_cap..........0x8
>> vl_arb_low_cap...........0x8
>> init_rep_mtu_cap.........0x4
>> vl_stall_life............0xFF
>> vl_enforce...............0x30
>> m_key_violations.........0x0
>> p_key_violations.........0x0
>> q_key_violations.........0x0
>> guid_cap.................0x20
>> client_reregister........0x1
>> mcast_pkey_trap_suppr....0x0
>> subnet_timeout...........0x12
>> resp_time_value..........0x10
>> error_threshold..........0x88
>> max_credit_hint..........0x0
>> link_round_trip_latency..0x0
>> capability_mask2.........0x0
>> link_speed_ext_active....0x0
>> link_speed_ext_supported.0x0
>> link_speed_ext_enabled...0x0
>> Jun 07 14:56:58 090709 [161CB700] 0x08 -> Capability Mask:
>> IB_PORT_CAP_HAS_TRAP
>> IB_PORT_CAP_HAS_AUTO_MIG
>> IB_PORT_CAP_HAS_SL_MAP
>> IB_PORT_CAP_HAS_LED_INFO
>> IB_PORT_CAP_HAS_SYS_IMG_GUID
>> IB_PORT_CAP_HAS_VEND_CLS
>> IB_PORT_CAP_HAS_CAP_NTC
>> IB_PORT_CAP_HAS_CLIENT_REREG
>> Jun 07 14:56:58 090713 [161CB700] 0x08 -> osm_pi_rcv_process: Client
>> reregister received on response
>> Jun 07 14:56:58 091294 [12FC6700] 0x10 -> osm_db_store: ]
>> Jun 07 14:56:58 091301 [12FC6700] 0x10 -> osm_lid_mgr_process_subnet: ]
>> Jun 07 14:56:58 091308 [161CB700] 0x10 -> pi_rcv_process_set: [
>> Jun 07 14:56:58 091313 [161CB700] 0x08 -> pi_rcv_process_set: Received
>> logical SetResp() for GUID 0x5ad00000c5ced, port num 1
>> for parent node GUID 0x5ad00000c5cec TID
>> 0x1311
>> Jun 07 14:56:58 091320 [161CB700] 0x08 -> osm_db_update:
>> Key:0x0005ad00000c5ced previously exists in:/var/cache/opensm/guid2mkey
>> with value:0x0000000000000000
>> Jun 07 14:56:58 091324 [161CB700] 0x10 -> pi_rcv_process_set: ]
>> Jun 07 14:56:58 091327 [161CB700] 0x10 -> osm_pi_rcv_process: ]
>>
>> But I'm not really sure what I'm looking for.
>>
>>
>> --
>> Orion Poplawski
>> Technical Manager 303-415-9701 x222
>> NWRA, Boulder/CoRA Office FAX: 303-415-9702
>> 3380 Mitchell Lane orion at nwra.com
>> Boulder, CO 80301 http://www.nwra.com
>>
>
>