[Users] IPoIB not working on Windows 2008 r2 - need help

Hal Rosenstock hal.rosenstock at gmail.com
Fri Jun 7 15:05:49 PDT 2013


On Fri, Jun 7, 2013 at 4:42 PM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:

>
>
> On Fri, Jun 7, 2013 at 4:35 PM, Orion Poplawski <orion at cora.nwra.com> wrote:
>
>> On 06/07/2013 02:23 PM, Hal Rosenstock wrote:
>>
>>>         Looks like your 2 subnets are "interconnected" so they're not
>>>         really 2 disjoint subnets! Is your other subnet 0xfe80::5 ?
>>>         Looking at your ibnetdiscover file, there's only 1 switch so
>>>         are you running 2 SMs (one for each subnet) over the same
>>>         topology? If so, that doesn't work.
>>>
>>>
>>>     I should only have 2 subnets, and we should only be seeing the
>>>     0xfe80::1 subnet here (there is a 0xfe80::2 subnet that consists
>>>     only of two machines (amos and andrew) directly connected
>>>     together).  With the MT25204 windows machine, 5:ad00:c:5ced is the
>>>     GUID I believe, so it looks like it may have a prefix of
>>>     0xfe80::0 ?  I confirmed that the SM service on the windows
>>>     machine (fontdb) is disabled and stopped.  So I have no idea why
>>>     it isn't getting a prefix of 0xfe80::1.
>>>
>>> Yes, I see now. It does have the default subnet prefix rather than the
>>> one you configured in the SM. This is evidence of what you asked about
>>> before, which is probably why you asked. I don't know whether or not
>>> non-default subnet prefixes work on Windows. Is there any reason you
>>> want to run this with other than the default subnet prefix ? If not,
>>> can you try that and see if things work ? While it is legal to have
>>> different IB subnets on the same IPoIB subnet, that requires an IB
>>> router and isn't your intent anyway.
>>>
>>
>>
>> This is one reason I'm running with a non-default subnet ID:
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
>>
>> and I do have some multi-homed machines (amos and andrew above) and may
>> add some more.
>>
>>
>>
> Can you run the default prefix on the problematic subnet and a non-default
> one on the other, back-to-back subnet (at least to see if it works or
> not) ?
>
>

One other experiment to try:

On the SM node (saga),
first try
smpquery -D nd 0,1,1
and you should see andrew mthca0

and if so, try
smpquery -D pi 0,1,1 1
and look at LID (should be 15) and  GidPrefix. I wonder if it'll be
0xfe80000000000000 or 0xfe80000000000001.

We'll then be able to see whether it's an SMA issue (and maybe more) or an
IPoIB issue.
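
For reading the GidPrefix value: a port GID is the 64-bit subnet prefix
followed by the 64-bit port GUID, so the two cases can be told apart by
looking at the upper 64 bits. Here is a minimal Python sketch of that split.
It is only an illustration, not part of infiniband-diags; the helper name
split_gid is made up, the first GID assumes the GUID 5:ad00:c:5ced quoted
earlier in this thread, and the second GID is hypothetical (what the same
port would report if the configured prefix were applied).

```python
import ipaddress
from typing import Tuple

DEFAULT_PREFIX = 0xFE80000000000000     # IB default subnet prefix
CONFIGURED_PREFIX = 0xFE80000000000001  # the non-default prefix discussed here


def split_gid(gid: str) -> Tuple[int, int]:
    """Split a 128-bit GID (written like an IPv6 address) into
    (subnet_prefix, port_guid), each a 64-bit integer."""
    value = int(ipaddress.IPv6Address(gid))
    return value >> 64, value & 0xFFFFFFFFFFFFFFFF


# First GID: built from the GUID quoted in the thread, default prefix.
# Second GID: hypothetical, same GUID under the configured prefix.
for gid in ("fe80::5:ad00:c:5ced", "fe80:0:0:1:5:ad00:c:5ced"):
    prefix, guid = split_gid(gid)
    if prefix == DEFAULT_PREFIX:
        label = "default prefix"
    elif prefix == CONFIGURED_PREFIX:
        label = "configured prefix"
    else:
        label = "some other prefix"
    print(f"{gid}: prefix={prefix:#018x} guid={guid:#018x} ({label})")
```

If GidPrefix comes back as the default value even though the SM sets the
other one, that would point at the SMA side rather than at IPoIB.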

-- Hal


>
>
>
>
>>> Also, if you turn on log verbosity on OpenSM temporarily and send me the
>>> log for that run, I could see what is going on in terms of trying to
>>> set the non-default subnet prefix with the Windows node. Given the log
>>> you sent, I can only imagine that the SMA on the Windows node is ack'ing
>>> the PortInfo set which sets the subnet prefix but not really acting on
>>> it properly.
>>> -- Hal
>>>
>>
>> There are a lot of different levels for verbosity.  What would be useful
>> (but perhaps not too much)?
>>
>
> I would just go for 0xFF and get the too much version for now for the
> purposes of debug and then switch it back.
>
> -- Hal
>
>
>>
>> Thanks!
>>
>>
>>
>>
>> --
>> Orion Poplawski
>> Technical Manager                     303-415-9701 x222
>> NWRA, Boulder/CoRA Office             FAX: 303-415-9702
>> 3380 Mitchell Lane                       orion at nwra.com
>> Boulder, CO 80301                   http://www.nwra.com
>>
>
>