[Users] IPoIB not working on Windows 2008 r2 - need help
Orion Poplawski
orion at cora.nwra.com
Fri Jun 7 12:38:44 PDT 2013
On 06/07/2013 12:56 PM, Hal Rosenstock wrote:
> Would you send me the output of an ibnetdiscover for your subnet ?
>
>
> Which is SM host ?
saga is the SM host.
>
> #
> # Topology file: generated on Fri Jun 7 10:43:36 2013
> #
> # Initiated from node 0019bbffff005850 port 0019bbffff005851
>
> vendid=0x66a
> devid=0xb924
> sysimgguid=0x66a00d8000242
> switchguid=0x66a00d8000242(66a00d8000242)
> Switch 24 "S-00066a00d8000242" # "InfinIO 9024 Switch " enhanced
> port 0 lid 2 lmc 0
> [1] "H-0005ad00000c5c3c"[1](5ad00000c5c3d) # "andrew
> mthca0" lid 15 4xSDR
> [6] "H-001708ffffd09df8"[1](1708ffffd09df9) #
> "alexandria2 HCA-1" lid 4 4xSDR
> [8] "H-001708ffffd09df8"[2](1708ffffd09dfa) #
> "alexandria2 HCA-1" lid 5 4xSDR
> [10] "H-0019bbffff005850"[1](19bbffff005851) # "saga
> mthca0" lid 1 4xSDR
> [11] "H-0019bbffff003898"[2](19bbffff00389a) #
> "sfcomp1 mthca0" lid 9 4xSDR
> [12] "H-001a4bffff0c20c8"[1](1a4bffff0c20c9) # "earth
> mthca0" lid 13 4xSDR
> [20] "H-0005ad00000c5cec"[1](5ad00000c5ced) # "MT25204
> InfiniHostLx Mellanox Technologies" lid 16 4xSDR
> [23] "H-0019bbffff003898"[1](19bbffff003899) #
> "sfcomp1 mthca0" lid 8 4xSDR
>
> vendid=0x2c9
> devid=0x6274
> sysimgguid=0x5ad00000c5cef
> caguid=0x5ad00000c5cec
> Ca 1 "H-0005ad00000c5cec" # "MT25204 InfiniHostLx Mellanox
> Technologies"
> [1](5ad00000c5ced) "S-00066a00d8000242"[20] # lid 16
> lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
>
> vendid=0x1708
> devid=0x6278
> sysimgguid=0x1a4bffff0c20cb
> caguid=0x1a4bffff0c20c8
> Ca 2 "H-001a4bffff0c20c8" # "earth mthca0"
> [1](1a4bffff0c20c9) "S-00066a00d8000242"[12] # lid 13
> lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
>
> vendid=0x1708
> devid=0x6278
> sysimgguid=0x19bbffff00389b
> caguid=0x19bbffff003898
> Ca 2 "H-0019bbffff003898" # "sfcomp1 mthca0"
> [1](19bbffff003899) "S-00066a00d8000242"[23] # lid 8
> lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
> [2](19bbffff00389a) "S-00066a00d8000242"[11] # lid 9
> lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
>
> vendid=0x5ad
> devid=0x6274
> sysimgguid=0x5ad00000c5c3f
> caguid=0x5ad00000c5c3c
> Ca 1 "H-0005ad00000c5c3c" # "andrew mthca0"
> [1](5ad00000c5c3d) "S-00066a00d8000242"[1] # lid 15 lmc 0
> "InfinIO 9024 Switch " lid 2 4xSDR
>
> vendid=0x1708
> devid=0x6278
> sysimgguid=0x1708ffffd09dfb
> caguid=0x1708ffffd09df8
> Ca 2 "H-001708ffffd09df8" # "alexandria2 HCA-1"
> [1](1708ffffd09df9) "S-00066a00d8000242"[6] # lid 4 lmc 0
> "InfinIO 9024 Switch " lid 2 4xSDR
> [2](1708ffffd09dfa) "S-00066a00d8000242"[8] # lid 5 lmc 0
> "InfinIO 9024 Switch " lid 2 4xSDR
>
> vendid=0x1708
> devid=0x6278
> sysimgguid=0x19bbffff005853
> caguid=0x19bbffff005850
> Ca 2 "H-0019bbffff005850" # "saga mthca0"
> [1](19bbffff005851) "S-00066a00d8000242"[10] # lid 1
> lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
>
>
>
>
>
> # saquery -m 0xc000
> PortGid.................fe80::1:5:ad00:c:5c3d (Topspin
> DDR-HCAe LX x8)
>
> PortGid.................fe80::1:19:bbff:ff00:5851 (saga
> mthca0)
> PortGid.................fe80::1:19:bbff:ff00:3899
> (sfcomp1 mthca0)
>
> PortGid.................fe80::1:1a:4bff:ff0c:20c9 (HP
> Lion Cub 128MB)
> PortGid.................fe80::5:ad00:c:5ced (MT25204
> InfiniHostLx Mellanox Technologies)
> PortGid.................fe80::1:17:8ff:ffd0:9df9
> (alexandria2 HCA-1)
>
>
> Seems like I may have two entries for the 5:ad00:c:5ced device?
>
> Looks different to me than 5:ad00:c:5c3d which is Topspin one
Ah, didn't catch that. The Topspin then is andrew.
>
> Perhaps updating the firmware led to that (now it is MT25204 instead of
> Topspin).
>
> Looks like your 2 subnets are "interconnected", so they're not really 2
> disjoint subnets! Is your other subnet 0xfe80::5? Looking at your
> ibnetdiscover file, there's only 1 switch, so are you running 2 SMs (one for
> each subnet) over the same topology? If so, that doesn't work.
I should only have 2 subnets, and we should only be seeing the 0xfe80::1
subnet here (there is a 0xfe80::2 subnet that consists of only two machines,
amos and andrew, directly connected together). On the MT25204 Windows
machine, I believe 5:ad00:c:5ced is the port GUID, so it looks like it may
have a prefix of 0xfe80::0. I confirmed that the SM service on the Windows
machine (fontdb) is disabled and stopped, so I have no idea why it isn't
getting a prefix of 0xfe80::1.
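
To show how I am reading those PortGid values: as I understand it, a GID is
a 64-bit subnet prefix followed by the 64-bit port GUID, so the two halves
can be separated mechanically. Below is a minimal Python sketch of that
split. The name split_gid is something I made up for illustration, and I am
assuming the GIDs parse as ordinary IPv6 text (fe80::1:... and fe80::5:...
with no other characters in them).

```python
import ipaddress


def split_gid(gid: str) -> tuple[str, str]:
    """Split a port GID into (subnet_prefix, port_guid) as 16-digit hex.

    Assumption: the GID is written in standard IPv6 text notation.
    """
    value = int(ipaddress.IPv6Address(gid))
    prefix = value >> 64                # upper 64 bits: subnet prefix
    guid = value & ((1 << 64) - 1)      # lower 64 bits: port GUID
    return f"{prefix:016x}", f"{guid:016x}"


if __name__ == "__main__":
    # Three of the PortGid values from the saquery output quoted above.
    gids = [
        "fe80::1:5:ad00:c:5c3d",      # Topspin (andrew)
        "fe80::1:19:bbff:ff00:5851",  # saga mthca0
        "fe80::5:ad00:c:5ced",        # MT25204 (the Windows machine)
    ]
    for gid in gids:
        prefix, guid = split_gid(gid)
        print(f"{gid:28} prefix={prefix} guid={guid}")
```

If that reading is right, the MT25204 port comes out with prefix
fe80000000000000 and GUID 0005ad00000c5ced, while the other ports have
prefix fe80000000000001.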
--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane orion at nwra.com
Boulder, CO 80301 http://www.nwra.com