[Users] IPoIB not working on Windows 2008 r2 - need help

Hal Rosenstock hal.rosenstock at gmail.com
Fri Jun 7 09:09:25 PDT 2013


On Fri, Jun 7, 2013 at 11:14 AM, Orion Poplawski <orion at cora.nwra.com> wrote:

> On 06/07/2013 04:31 AM, Hal Rosenstock wrote:
>
>>
>>
>> On Thu, Jun 6, 2013 at 3:32 PM, Orion Poplawski <orion at cora.nwra.com> wrote:
>>
>>     I'm trying for the first time to get IPoIB working on one of our Windows
>>     servers.  The network is working fine between some Linux machines.
>>     Details:
>>
>>     InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA]
>>     Windows Server 2008 r2
>>     MLNX_WinOF_VPI_2_1_2_win7_x64.msi (as recommended by the Mellanox
>>     download page for InfiniHost III adapters)
>>
>>     I don't notice any errors, the adapter shows up fine, and I can configure
>>     it with a static IP address.  After configuring it (or after boot) I can
>>     ping it from another machine for about 10 seconds before it stops
>>     responding.  When I ping out from the machine at this point, the ICMP
>>     packets are being sent out the main Ethernet interface (which is a
>>     different IP network) and I can see them get to our router.  ibdiagnet
>>     does not report any errors.  ipconfig and netstat -r seem fine.
>>
>>     I see the following in my opensm log:
>>
>>
>> IPoIB in Windows deletes from the IPoIB broadcast IB multicast group before
>> joining, so if that port isn't a member of that MC group you will see this.
>> These aren't necessarily "bad" from an SM perspective.
>>
>
> I thought that might be the case, thanks.
>
>
>> How is your partition file for OpenSM set up? You should have the ipoib
>> flag on for the default partition.
>> Which OpenSM are you using here? A Windows or Linux node? Which version?
>>
>
> IPoIB is working fine among my Linux machines, I'm just trying to add
> Windows to the mix.  I'm running opensm 3.3.15 on SL 6.
>

What's SL 6?


>
> /etc/rdma/partitions:
> Default=0x7fff, ipoib : ALL=full ;
>
>
>
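The partition line quoted above can be checked mechanically for the ipoib flag. Here is a minimal Python sketch, assuming the usual OpenSM partition syntax of "name=pkey, flags : members ;". The function name parse_partition_line is hypothetical and not part of OpenSM, and the parser only handles simple single-line definitions like the one in this thread.

```python
def parse_partition_line(line):
    """Split one simple partition definition into name, pkey, flags, members.

    Assumes the form 'Name=PKey, flag, ... : member, ... ;' on a single line.
    """
    head, _, members = line.strip().rstrip(";").partition(":")
    fields = [f.strip() for f in head.split(",")]
    name, _, pkey = fields[0].partition("=")
    return {
        "name": name.strip(),
        # int(..., 16) accepts an optional 0x prefix
        "pkey": int(pkey, 16) if pkey.strip() else None,
        "flags": [f for f in fields[1:] if f],
        "members": [m.strip() for m in members.split(",") if m.strip()],
    }


# The line from /etc/rdma/partitions quoted in this thread
default = parse_partition_line("Default=0x7fff, ipoib : ALL=full ;")
has_ipoib = "ipoib" in default["flags"]
```

With the line from this thread, has_ipoib is true, so the default partition does carry the ipoib flag.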
I have a theory as to what is going on. I think the IB port on Windows may be
too slow (its rate, i.e. link speed * width) to join the already formed group,
although I thought you wrote that the Windows IPoIB driver/net device was
indicating link.
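
To make the rate theory concrete: a port's IB rate is its per-lane speed times its link width. The Python sketch below assumes that the SM only admits a port whose rate is at least the rate of the existing group (the common "at least" rate selector); that rule is an assumption here, not something read from this subnet. The lane speeds are the nominal signaling rates, the function names are hypothetical, and the widths and speeds in the example are illustrative only.

```python
# Nominal per-lane signaling rates in Gb/s
LANE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}


def link_rate_gbps(speed, width):
    """IB link rate = per-lane speed * number of lanes (1, 4, 8 or 12)."""
    return LANE_GBPS[speed] * width


def can_join(port_rate, group_rate):
    """Assumed rule: the joining port must be at least as fast as the group."""
    return port_rate >= group_rate


# Illustrative values: a group created at 4x DDR, and two candidate ports
group_rate = link_rate_gbps("DDR", 4)   # 20.0 Gb/s
slow_port = link_rate_gbps("SDR", 4)    # 10.0 Gb/s
fast_port = link_rate_gbps("DDR", 4)    # 20.0 Gb/s

slow_ok = can_join(slow_port, group_rate)   # False under the assumed rule
fast_ok = can_join(fast_port, group_rate)   # True
```

If the Windows port negotiated a lower speed or width than the Linux ports that formed the group, it would fall into the first case.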

Would you send me the output of ibnetdiscover for your subnet?


>
>> You should check what saquery -m 0xc000 says after looking at saquery -g
>> to make sure that the IPoIB broadcast group (ff12:401b:ffff::ffff:ffff)
>> says MLID 0xc000.
>>
>
> That is fine on my opensm machine.  I don't seem to have saquery on the
> Windows machine.


You can run it from the opensm machine to see if the Windows machine is part
of the IPoIB broadcast group. I want to see both the group parameters for MLID
0xc000 and whether the Windows port GUID (0x0005ad00000c5ced) is listed as
joined in that group.
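
The membership check can be scripted as well. The Python sketch below assumes that saquery prints member ports as GIDs in IPv6 notation, so it compares the low 64 bits of every IPv6-looking token with the port GUID rather than relying on particular field names. The helper names are hypothetical, and the sample text in the example is made up for illustration; it is not output from this subnet.

```python
import ipaddress
import re
import subprocess


def gids_in(text):
    """Return every token in text that parses as an IPv6-style GID."""
    gids = []
    for token in re.findall(r"[0-9A-Fa-f:]+", text):
        if token.count(":") < 2:
            continue
        try:
            gids.append(ipaddress.IPv6Address(token))
        except ValueError:
            continue
    return gids


def guid_is_listed(text, port_guid):
    """True if any GID in text has port_guid as its low 64 bits."""
    mask = (1 << 64) - 1
    return any(int(gid) & mask == port_guid for gid in gids_in(text))


def query_members(mlid="0xc000"):
    """Run 'saquery -m <mlid>' on the SM node and return its output."""
    result = subprocess.run(["saquery", "-m", mlid],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Made-up sample text, used only to exercise the helpers
sample = "MGID ff12:401b:ffff::ffff:ffff\nPortGid fe80::5:ad00:c:5ced\n"
joined = guid_is_listed(sample, 0x0005AD00000C5CED)
```

On the opensm machine, guid_is_listed(query_members(), 0x0005ad00000c5ced) would then report whether the Windows port shows up as a member.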

I suspect it may not be joined, because this opensm log entry is worrisome:

Jun 06 12:06:29 959894 [295C0700] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B13:
validate_modify failed from port 0x0005ad00000c5ced (Topspin DDR-HCAe LX
x8), sending IB_SA_MAD_STATUS_REQ_INVALID


>  I'm going to try switching from MLNX_WinOF_VPI to winOFED 3.1.
>
>
>> Also, is the HCA really 8x DDR, as the NodeDescription suggests (Topspin
>> DDR-HCAe LX x8)?
>>
>
> Yes. pcie x8, DDR.
>
>> -- Hal
>>
>
> Thanks.
>
>
>
> --
> Orion Poplawski
> Technical Manager                     303-415-9701 x222
> NWRA, Boulder/CoRA Office             FAX: 303-415-9702
> 3380 Mitchell Lane                       orion at nwra.com
> Boulder, CO 80301                   http://www.nwra.com
>