[Users] IPoIB not working on Windows 2008 r2 - need help

Hal Rosenstock hal.rosenstock at gmail.com
Fri Jun 7 13:23:51 PDT 2013


On Fri, Jun 7, 2013 at 3:38 PM, Orion Poplawski <orion at cora.nwra.com> wrote:

> On 06/07/2013 12:56 PM, Hal Rosenstock wrote:
>
>          Would you send me the output of an ibnetdiscover for your subnet ?
>>
>>
>> Which is SM host ?
>>
>
> saga is the SM host.
>
>
>>     #
>>     # Topology file: generated on Fri Jun  7 10:43:36 2013
>>     #
>>     # Initiated from node 0019bbffff005850 port 0019bbffff005851
>>
>>     vendid=0x66a
>>     devid=0xb924
>>     sysimgguid=0x66a00d8000242
>>     switchguid=0x66a00d8000242(66a00d8000242)
>>     Switch  24 "S-00066a00d8000242"         # "InfinIO 9024 Switch " enhanced port 0 lid 2 lmc 0
>>     [1]     "H-0005ad00000c5c3c"[1](5ad00000c5c3d)          # "andrew mthca0" lid 15 4xSDR
>>     [6]     "H-001708ffffd09df8"[1](1708ffffd09df9)         # "alexandria2 HCA-1" lid 4 4xSDR
>>     [8]     "H-001708ffffd09df8"[2](1708ffffd09dfa)         # "alexandria2 HCA-1" lid 5 4xSDR
>>     [10]    "H-0019bbffff005850"[1](19bbffff005851)         # "saga mthca0" lid 1 4xSDR
>>     [11]    "H-0019bbffff003898"[2](19bbffff00389a)         # "sfcomp1 mthca0" lid 9 4xSDR
>>     [12]    "H-001a4bffff0c20c8"[1](1a4bffff0c20c9)         # "earth mthca0" lid 13 4xSDR
>>     [20]    "H-0005ad00000c5cec"[1](5ad00000c5ced)          # "MT25204 InfiniHostLx Mellanox Technologies" lid 16 4xSDR
>>     [23]    "H-0019bbffff003898"[1](19bbffff003899)         # "sfcomp1 mthca0" lid 8 4xSDR
>>
>>     vendid=0x2c9
>>     devid=0x6274
>>     sysimgguid=0x5ad00000c5cef
>>     caguid=0x5ad00000c5cec
>>     Ca      1 "H-0005ad00000c5cec"          # "MT25204 InfiniHostLx Mellanox Technologies"
>>     [1](5ad00000c5ced)      "S-00066a00d8000242"[20]        # lid 16 lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
>>
>>     vendid=0x1708
>>     devid=0x6278
>>     sysimgguid=0x1a4bffff0c20cb
>>     caguid=0x1a4bffff0c20c8
>>     Ca      2 "H-001a4bffff0c20c8"          # "earth mthca0"
>>     [1](1a4bffff0c20c9)     "S-00066a00d8000242"[12]        # lid 13 lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
>>
>>     vendid=0x1708
>>     devid=0x6278
>>     sysimgguid=0x19bbffff00389b
>>     caguid=0x19bbffff003898
>>     Ca      2 "H-0019bbffff003898"          # "sfcomp1 mthca0"
>>     [1](19bbffff003899)     "S-00066a00d8000242"[23]        # lid 8 lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
>>     [2](19bbffff00389a)     "S-00066a00d8000242"[11]        # lid 9 lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
>>
>>     vendid=0x5ad
>>     devid=0x6274
>>     sysimgguid=0x5ad00000c5c3f
>>     caguid=0x5ad00000c5c3c
>>     Ca      1 "H-0005ad00000c5c3c"          # "andrew mthca0"
>>     [1](5ad00000c5c3d)      "S-00066a00d8000242"[1]         # lid 15 lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
>>
>>     vendid=0x1708
>>     devid=0x6278
>>     sysimgguid=0x1708ffffd09dfb
>>     caguid=0x1708ffffd09df8
>>     Ca      2 "H-001708ffffd09df8"          # "alexandria2 HCA-1"
>>     [1](1708ffffd09df9)     "S-00066a00d8000242"[6]         # lid 4 lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
>>     [2](1708ffffd09dfa)     "S-00066a00d8000242"[8]         # lid 5 lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
>>
>>     vendid=0x1708
>>     devid=0x6278
>>     sysimgguid=0x19bbffff005853
>>     caguid=0x19bbffff005850
>>     Ca      2 "H-0019bbffff005850"          # "saga mthca0"
>>     [1](19bbffff005851)     "S-00066a00d8000242"[10]        # lid 1 lmc 0 "InfinIO 9024 Switch " lid 2 4xSDR
>>
>>
>>
>>
>>
>
>
>>     # saquery -m 0xc000
>>         PortGid.................fe80::1:5:ad00:c:5c3d (Topspin DDR-HCAe LX x8)
>>         PortGid.................fe80::1:19:bbff:ff00:5851 (saga mthca0)
>>         PortGid.................fe80::1:19:bbff:ff00:3899 (sfcomp1 mthca0)
>>         PortGid.................fe80::1:1a:4bff:ff0c:20c9 (HP Lion Cub 128MB)
>>         PortGid.................fe80::5:ad00:c:5ced (MT25204 InfiniHostLx Mellanox Technologies)
>>         PortGid.................fe80::1:17:8ff:ffd0:9df9 (alexandria2 HCA-1)
>>
>>
>>     Seems like I may have two entries for the 5:ad00:c:5ced device?
>>
>> Looks different to me than  5:ad00:c:5c3d which is Topspin one
>>
>
> Ah, didn't catch that.  The Topspin then is andrew.
>
>
>
>>     Perhaps updating the firmware led to that (now it is MT25204 instead
>>     of Topspin).
>>
>> Looks like your 2 subnets are "interconnected" so they're not really 2
>> disjoint subnets! Is your other subnet 0xfe80::5 ? Looking at your
>> ibnetdiscover file, there's only 1 switch, so are you running 2 SMs (one
>> for each subnet) over the same topology? If so, that doesn't work.
>>
>
> I should only have 2 subnets, and we should only be seeing the 0xfe80::1
> subnet here (there is a 0xfe80::2 subnet that consists only of two machines,
> amos and andrew, directly connected together).  With the MT25204 Windows
> machine, 5:ad00:c:5ced is the GUID I believe, so it looks like it may have
> a prefix of 0xfe80::0 ?  I confirmed that the SM service on the Windows
> machine (fontdb) is disabled and stopped.  So I have no idea why it isn't
> getting a prefix of 0xfe80::1.
>
>
>
Yes, I see now. It does have the default subnet prefix rather than the one
you configured in the SM. That matches what you asked about earlier, which
is probably why you asked. I don't know whether non-default subnet prefixes
work on Windows. Is there any reason you want to run with something other
than the default subnet prefix? If not, can you try the default and see if
things work? While it is legal to have different IB subnets on the same
IPoIB subnet, that requires an IB router and isn't your intent anyway.
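
To make the prefix mismatch concrete: a port GID is the 64-bit subnet prefix
followed by the 64-bit port GUID, so the saquery entries above can be split
mechanically. The following is only a sketch using Python's standard
ipaddress module; the helper name split_gid is made up for illustration and
is not part of any IB tool.

```python
import ipaddress
from typing import Tuple

DEFAULT_PREFIX = 0xFE80000000000000  # default IB subnet prefix (fe80::/64)

def split_gid(gid: str) -> Tuple[int, int]:
    """Split a GID written in IPv6 notation into (subnet_prefix, port_guid).

    Hypothetical helper: the upper 64 bits are the subnet prefix and the
    lower 64 bits are the port GUID.
    """
    value = int(ipaddress.IPv6Address(gid))
    return value >> 64, value & 0xFFFFFFFFFFFFFFFF

# Two of the PortGid values from the saquery output quoted above.
for gid in ("fe80::1:5:ad00:c:5c3d", "fe80::5:ad00:c:5ced"):
    prefix, guid = split_gid(gid)
    kind = "default" if prefix == DEFAULT_PREFIX else "non-default"
    print(f"{gid}: prefix=0x{prefix:016x} guid=0x{guid:016x} ({kind} prefix)")
```

Read this way, the MT25204 port still carries the default prefix, while the
other ports carry the prefix ending in 1.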

Also, if you temporarily turn up the log verbosity on OpenSM and send me the
log for that run, I could see what is going on when it tries to set the
non-default subnet prefix on the Windows node. Given the log you sent, I can
only imagine that the SMA on the Windows node is ack'ing the PortInfo Set
that carries the subnet prefix but not really acting on it properly.
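
A verbose log can be large, so it may help to cut it down to the lines that
mention the Windows node's port GUID before sending it. This is a rough
sketch only: it assumes nothing about the OpenSM log format beyond the GUID
appearing somewhere in hex, the function name is made up, and the sample
lines are placeholders rather than real OpenSM output.

```python
import re
from typing import Iterable, List

def lines_mentioning_guid(lines: Iterable[str], guid: int) -> List[str]:
    """Return the lines that contain `guid` in hex.

    Hypothetical helper: accepts an optional 0x prefix and optional
    zero padding, and ignores letter case.
    """
    digits = f"{guid:x}"
    pattern = re.compile(
        rf"(?<![0-9a-f])(?:0x)?0*{digits}(?![0-9a-f])", re.IGNORECASE
    )
    return [line.rstrip("\n") for line in lines if pattern.search(line)]

# Placeholder text, NOT real OpenSM log lines.
sample = [
    "placeholder line about port 0x0005ad00000c5ced\n",
    "placeholder line about port 0x0019bbffff005851\n",
    "placeholder line, short form 5AD00000C5CED\n",
]
for line in lines_mentioning_guid(sample, 0x0005AD00000C5CED):
    print(line)
```

For a real run, pass an open file object for the log in place of the sample
list.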

-- Hal