<br><br><div class="gmail_quote">On Fri, Jun 7, 2013 at 11:14 AM, Orion Poplawski <span dir="ltr"><<a href="mailto:orion@cora.nwra.com" target="_blank">orion@cora.nwra.com</a>></span> wrote:<br><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
<div class="im">On 06/07/2013 04:31 AM, Hal Rosenstock wrote:<br>
</div><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote"><div class="im">
<br>
<br>
On Thu, Jun 6, 2013 at 3:32 PM, Orion Poplawski <<a href="mailto:orion@cora.nwra.com" target="_blank">orion@cora.nwra.com</a>> wrote:<br></div><div class="im">
<br>
I'm trying for the first time to get IPoIB working on one of our Windows<br>
servers. The network is working fine between some Linux machines. Details:<br>
<br>
InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA]<br>
Windows Server 2008 R2<br></div>
MLNX_WinOF_VPI_2_1_2_win7_x64.msi (as recommended by the Mellanox<div class="im"><br>
download page for InfiniHost III adapters)<br>
<br>
I don't notice any errors; the adapter shows up fine and I can configure<br>
it with a static IP address. After configuring it (or after boot) I can<br>
ping it from another machine for about 10 seconds before it stops<br>
responding. When I ping out from the machine at that point, the ICMP<br>
packets are sent out the main Ethernet interface (which is on a<br>
different IP network), and I can see them reach our router. ibdiagnet<br>
does not report any errors. ipconfig and netstat -r seem fine.<br>
<br>
I see the following in my opensm log:<br>
</div></blockquote>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
IPoIB in Windows deletes its port from the IPoIB broadcast IB multicast group before<div class="im"><br>
joining, so if that port isn't a member of that MC group you will see these errors;<br>
they aren't necessarily "bad" from an SM perspective.<br>
</div></blockquote>
<br>
I thought that might be the case, thanks.<div class="im"><br>
<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
How is your partition file for OpenSM set up? You should have the ipoib flag<br>
on for the default partition.<br>
Which OpenSM are you using here: on a Windows or a Linux node? Which version?<br>
</blockquote>
<br></div>
IPoIB is working fine among my Linux machines; I'm just trying to add Windows to the mix. I'm running opensm 3.3.15 on SL 6.<br></blockquote><div> </div><div>What's SL 6?</div><div> </div><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
<br>
/etc/rdma/partitions:<br>
Default=0x7fff, ipoib : ALL=full ;<div class="im"><br>
<br></div></blockquote><div> </div><div>I have a theory as to what is going on: I think the IB port on the Windows machine is too slow (IB rate * width) to join the already-formed group. However, I thought you wrote something about the IPoIB link being indicated by the Windows IPoIB driver/net device.</div>
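Hal's theory turns on link rate. An InfiniBand link's signaling rate is the per-lane rate for its speed (SDR 2.5, DDR 5, QDR 10 Gb/s) multiplied by its width in lanes; SDR through QDR use 8b/10b encoding, so the usable data rate is 80% of that. The Python sketch below shows the arithmetic and a simplified join check. The `can_join` rule (a port slower than the group's rate is rejected) is an assumption standing in for OpenSM's real validation, and the 4x DDR group rate in the usage lines is hypothetical, since the rates of the Linux hosts aren't stated.

```python
# Per-lane signaling rates in Gb/s for the InfiniBand speeds of this HCA generation.
LANE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}
# Link width = number of lanes.
WIDTH_LANES = {"1x": 1, "4x": 4, "8x": 8, "12x": 12}


def signaling_rate_gbps(width: str, speed: str) -> float:
    """Link signaling rate: number of lanes times the per-lane rate."""
    return WIDTH_LANES[width] * LANE_GBPS[speed]


def data_rate_gbps(width: str, speed: str) -> float:
    """Usable data rate for SDR/DDR/QDR, which carry 8 data bits per 10 line bits (8b/10b)."""
    return signaling_rate_gbps(width, speed) * 8 / 10


def can_join(width: str, speed: str, group_rate_gbps: float) -> bool:
    """Hypothetical, simplified rule: reject a port whose rate is below the group's rate."""
    return signaling_rate_gbps(width, speed) >= group_rate_gbps


if __name__ == "__main__":
    # Hypothetical: assume the group was created by 4x DDR ports (20 Gb/s signaling).
    group_rate = signaling_rate_gbps("4x", "DDR")
    for width, speed in [("4x", "DDR"), ("4x", "SDR"), ("1x", "DDR")]:
        rate = signaling_rate_gbps(width, speed)
        print(width, speed, rate, "Gb/s, can join:", can_join(width, speed, group_rate))
```

If ibnetdiscover shows the Windows port at a lower width or speed than the ports that formed the group, that would be consistent with the theory.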
<div> </div><div>Would you send me the output of ibnetdiscover for your subnet?</div><div> </div><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
<div class="im">
<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
You should check what saquery -m 0xc000 says after looking at saquery -g, to<br>
make sure that the IPoIB broadcast group (ff12:401b:ffff::ffff:ffff) has<br>
MLID 0xc000.<br>
</blockquote>
<br></div>
That is fine on my opensm machine. I don't seem to have saquery on the Windows machine.</blockquote><div> </div><div>You can run it from the opensm machine to see whether the Windows machine is part of the IPoIB broadcast group. I want to see both the group parameters for MLID 0xc000 and whether the Windows port GUID (0x0005ad00000c5ced) is listed as joined in that group.</div>
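Hal's checks hinge on two GIDs. Per RFC 4391, the IPoIB broadcast group's MGID is FF1x:401B:&lt;P_Key&gt;:0000:0000:0000:FFFF:FFFF, where x is the scope (2 = link-local) and the P_Key carries the full-membership bit, which is why the 0x7fff default partition from the partitions file appears as ffff. A port's GID is the 64-bit subnet prefix (fe80:: by default) followed by its 64-bit port GUID. The Python sketch below derives both and checks whether a port GUID appears among a group's member GIDs; the `members` list is a hypothetical input you would fill in by hand from the `saquery -m 0xc000` output, whose exact format is not parsed here.

```python
import ipaddress
from typing import Iterable

FULL_MEMBER_BIT = 0x8000
DEFAULT_SUBNET_PREFIX = 0xFE80_0000_0000_0000  # default link-local subnet prefix


def ipoib_broadcast_mgid(pkey: int, scope: int = 0x2) -> ipaddress.IPv6Address:
    """Build the IPoIB broadcast MGID FF1x:401B:<P_Key>:0:0:0:FFFF:FFFF (RFC 4391)."""
    full_pkey = pkey | FULL_MEMBER_BIT
    value = ((0xFF10 | scope) << 112) | (0x401B << 96) | (full_pkey << 80) | 0xFFFF_FFFF
    return ipaddress.IPv6Address(value)


def port_gid(guid: int, subnet_prefix: int = DEFAULT_SUBNET_PREFIX) -> ipaddress.IPv6Address:
    """A port GID is the 64-bit subnet prefix followed by the 64-bit port GUID."""
    return ipaddress.IPv6Address((subnet_prefix << 64) | guid)


def is_member(member_gids: Iterable[str], guid: int) -> bool:
    """Compare as addresses so expanded and compressed GID spellings both match."""
    target = port_gid(guid)
    return any(ipaddress.IPv6Address(gid) == target for gid in member_gids)


if __name__ == "__main__":
    print(ipoib_broadcast_mgid(0x7FFF))  # default partition from /etc/rdma/partitions
    print(port_gid(0x0005AD00000C5CED))  # the Windows port GUID
    # Hypothetical member list; fill it in from the saquery -m 0xc000 output.
    members = ["fe80:0000:0000:0000:0005:ad00:000c:5ced"]
    print(is_member(members, 0x0005AD00000C5CED))
```

Comparing `IPv6Address` objects rather than strings avoids false mismatches between the compressed (`fe80::5:ad00:c:5ced`) and fully expanded spellings of the same GID.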
<div> </div><div>I suspect that port is not joined; this opensm log entry is worrisome:</div><div> </div><div>Jun 06 12:06:29 959894 [295C0700] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B13: validate_modify failed from port 0x0005ad00000c5ced (Topspin DDR-HCAe LX x8), sending IB_SA_MAD_STATUS_REQ_INVALID</div>
<div> </div><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote"> I'm going to try switching from MLNX_WinOF_VPI to winOFED 3.1.<div class="im">
<br>
<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
Also, is the HCA really 8x DDR, as the NodeDescription (Topspin<br>
DDR-HCAe LX x8) suggests?<br>
</blockquote>
<br></div>
Yes: PCIe x8, DDR.<br>
<br>
<blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
-- Hal<br>
</blockquote>
<br>
Thanks.<div class="HOEnZb"><div class="h5"><br>
<br>
<br>
-- <br>
Orion Poplawski<br>
Technical Manager <a href="tel:303-415-9701%20x222" target="_blank" value="+13034159701">303-415-9701 x222</a><br>
NWRA, Boulder/CoRA Office FAX: <a href="tel:303-415-9702" target="_blank" value="+13034159702">303-415-9702</a><br>
3380 Mitchell Lane <a href="mailto:orion@nwra.com" target="_blank">orion@nwra.com</a><br>
Boulder, CO 80301 <a href="http://www.nwra.com" target="_blank">http://www.nwra.com</a><br>
</div></div></blockquote></div><br>