[Users] IPoIB not working on Windows 2008 r2 - need help
Hal Rosenstock
hal.rosenstock at gmail.com
Fri Jun 7 15:05:49 PDT 2013
On Fri, Jun 7, 2013 at 4:42 PM, Hal Rosenstock <hal.rosenstock at gmail.com> wrote:
>
>
> On Fri, Jun 7, 2013 at 4:35 PM, Orion Poplawski <orion at cora.nwra.com> wrote:
>
>> On 06/07/2013 02:23 PM, Hal Rosenstock wrote:
>>
>>> Looks like your 2 subnets are "interconnected" so they're not really 2
>>> disjoint subnets! Is your other subnet 0xfe80::5 ? Looking at your
>>> ibnetdiscover file, there's only 1 switch so are you running 2 SMs (one
>>> for each subnet) over the same topology. If so, that doesn't work.
>>>
>>>
>>> I should only have 2 subnets, and we should only be seeing the 0xfe80::1
>>> subnet here (there is a 0xfe80::2 subnet that consists only of two
>>> machines (amos and andrew) directly connected together). With the MT25204
>>> Windows machine, 5:ad00:c:5ced is the GUID I believe, so it looks like it
>>> may have a prefix of 0xfe80::0 ? I confirmed that the SM service on the
>>> Windows machine (fontdb) is disabled and stopped. So I have no idea why it
>>> isn't getting a prefix of 0xfe80::1.
>>>
>>> Yes, I see now. It does have the default subnet prefix rather than the
>>> one you configured in the SM. This is evidence of what you asked about
>>> before, which is probably why you asked. I don't know whether or not
>>> non-default subnet prefixes work on Windows. Is there any reason you want
>>> to run this with other than the default subnet prefix ? If not, can you
>>> try that and see if things work ? While it is legal to have different IB
>>> subnets on the same IPoIB subnet, that requires an IB router and isn't
>>> your intent anyway.
>>>
>>
>>
>> This is one reason I'm running with a non-default subnet ID:
>>
>> http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid
>>
>> and I do have some multi-homed machines (amos and andrew above) and may
>> add some more.
>>
>>
>>
> Can you run the default prefix on the problematic subnet and a non-default
> one on the other, back-to-back, subnet (at least to see whether it works or
> not) ?
>
>
One other experiment to try:
On the SM node (saga), first try
smpquery -D nd 0,1,1
and you should see andrew mthca0. If so, try
smpquery -D pi 0,1,1 1
and look at LID (should be 15) and GidPrefix. I wonder if it'll be
0xfe80000000000000 or 0xfe80000000000001.
We'll then be able to see if it's an SMA issue (and maybe more) or an IPoIB
issue.
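
A rough sketch of how the check above could be scripted, in case it helps.
It assumes smpquery prints PortInfo fields in the usual "Name:.....value"
layout of infiniband-diags; the helper names portinfo_field and
classify_prefix are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Sketch only: pull one field out of `smpquery pi` output read from stdin.
# Assumes the "Name:.....value" line layout; adjust if your output differs.
portinfo_field() {
    # $1 = field name, e.g. GidPrefix or Lid
    sed -n "s/^$1:\.*//p" | head -n 1
}

# Label a GidPrefix value against the two prefixes discussed in this thread.
classify_prefix() {
    case "$1" in
        0xfe80000000000000) echo "default prefix" ;;
        0xfe80000000000001) echo "SM-configured prefix (0xfe80::1)" ;;
        *) echo "unexpected prefix: $1" ;;
    esac
}

# Usage on the SM node (saga), with the same direct route as above:
#   out=$(smpquery -D pi 0,1,1 1)
#   printf '%s\n' "$out" | portinfo_field Lid
#   classify_prefix "$(printf '%s\n' "$out" | portinfo_field GidPrefix)"
```

If the port still reports the default prefix, that would point at the SMA
side; if it reports the configured one, IPoIB becomes the more likely
suspect.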
-- Hal
>
>
>
>
>>> Also, if you turn on log verbosity on OpenSM temporarily and send me the
>>> log for that run, I could see what is going on in terms of trying to set
>>> the non-default subnet prefix with the Windows node. Given the log you
>>> sent, I can only imagine that the SMA on the Windows node is ack'ing the
>>> PortInfo set which sets the subnet prefix but not really acting on it
>>> properly.
>>> -- Hal
>>>
>>
>> There are a lot of different levels for verbosity. What would be useful
>> (but perhaps not too much)?
>>
>
> I would just go for 0xFF and get the too much version for now for the
> purposes of debug and then switch it back.
>
> -- Hal
>
>
>>
>> Thanks!
>>
>>
>>
>>
>> --
>> Orion Poplawski
>> Technical Manager 303-415-9701 x222
>> NWRA, Boulder/CoRA Office FAX: 303-415-9702
>> 3380 Mitchell Lane orion at nwra.com
>> Boulder, CO 80301 http://www.nwra.com
>>
>
>