<br><br><div class="gmail_quote">On Thu, Jun 6, 2013 at 3:32 PM, Orion Poplawski <span dir="ltr"><<a href="mailto:orion@cora.nwra.com" target="_blank">orion@cora.nwra.com</a>></span> wrote:<br><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
I'm trying for the first time to get IPoIB working on one of our Windows servers. The network is working fine between some Linux machines. Details:<br>
<br>
InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA]<br>
Windows Server 2008 R2<br>
MLNX_WinOF_VPI_2_1_2_win7_x64.msi (as recommended by the Mellanox download page for InfiniHost III adapters)<br>
<br>
I don't notice any errors: the adapter shows up fine and I can configure it with a static IP address. After configuring it (or after boot) I can ping it from another machine for about 10 seconds before it stops responding. When I ping out from the machine at this point, the ICMP packets are sent out the main Ethernet interface (which is on a different IP network) and I can see them reach our router. ibdiagnet does not report any errors. ipconfig and netstat -r seem fine.<br>
<br>
I see the following in my opensm log:<br>
<br>
Jun 06 11:51:50 600282 [29FC1700] 0x02 -> log_notice: Reporting Generic Notice type:1 num:128 (Link state change) from LID:2 GID:fe80::1:6:6a00:d800:242<br>
Jun 06 11:51:51 011771 [1F5B0700] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches<br>
Jun 06 11:51:51 016889 [1F5B0700] 0x02 -> log_notice: Reporting Generic Notice type:3 num:64 (GID in service) from LID:1 GID:fe80::1:5:ad00:c:5ced<br>
Jun 06 11:51:51 016899 [1F5B0700] 0x02 -> state_mgr_report_new_ports: Discovered new port with GUID:0x0005ad00000c5ced LID range [16,16] of node: Topspin DDR-HCAe LX x8<br>
Jun 06 11:51:51 027491 [1F5B0700] 0x02 -> SUBNET UP<br>
Jun 06 11:51:56 333829 [213B3700] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:1 GID:ff12:1405:ffff::3333:0:1<br>
Jun 06 11:51:56 333875 [209B2700] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:1 GID:ff12:1405:ffff::3333:ff76:9ac6<br>
Jun 06 11:51:56 603980 [295C0700] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:1 GID:ff12:1405:ffff::3333:0:2<br>
Jun 06 11:51:56 604270 [245B8700] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:1 GID:ff12:1405:ffff::3333:0:16<br>
Jun 06 11:52:15 854497 [263BB700] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:1 GID:ff12:1405:ffff::3333:1:2<br>
Jun 06 11:52:15 857261 [213B3700] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:1 GID:ff12:1405:ffff::3333:1:3<br>
Jun 06 11:52:15 857968 [209B2700] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:1 GID:ff12:401b:ffff::16<br>
Jun 06 11:52:15 963577 [21DB4700] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:1 GID:ff12:401b:ffff::fc<br>
Jun 06 12:04:56 535293 [26DBC700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:1405:ffff::3333:0:1 for PortGID: fe80::5:ad00:c:5ced<br>
Jun 06 12:04:56 535870 [277BD700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:1405:ffff::3333:1:3 for PortGID: fe80::5:ad00:c:5ced<br>
Jun 06 12:04:56 535908 [259BA700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:1405:ffff::3333:1:2 for PortGID: fe80::5:ad00:c:5ced<br>
Jun 06 12:04:56 535942 [23BB7700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:401b:ffff::fc for PortGID: fe80::5:ad00:c:5ced<br>
Jun 06 12:04:56 535970 [281BE700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:401b:ffff::16 for PortGID: fe80::5:ad00:c:5ced<br>
Jun 06 12:04:56 536014 [277BD700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:1405:ffff::3333:ff76:9ac6 for PortGID: fe80::5:ad00:c:5ced<br>
Jun 06 12:04:56 536042 [209B2700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:1405:ffff::3333:0:16 for PortGID: fe80::5:ad00:c:5ced<br>
Jun 06 12:04:56 536634 [227B5700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:1405:ffff::3333:0:2 for PortGID: fe80::5:ad00:c:5ced<br>
Jun 06 12:06:29 959894 [295C0700] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B13: validate_modify failed from port 0x0005ad00000c5ced (Topspin DDR-HCAe LX x8), sending IB_SA_MAD_STATUS_REQ_INVALID<br>
Jun 06 12:06:29 960518 [231B6700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:1405:ffff::3333:1:2 for PortGID: fe80::5:ad00:c:5ced<br>
Jun 06 12:06:36 629355 [26DBC700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:401b:ffff::1 for PortGID: fe80::5:ad00:c:5ced<br>
Jun 06 12:06:36 629416 [259BA700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:401b:ffff::ffff:ffff for PortGID: fe80::5:ad00:c:5ced<br>
Jun 06 12:06:36 638659 [21DB4700] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B13: validate_modify failed from port 0x0005ad00000c5ced (Topspin DDR-HCAe LX x8), sending IB_SA_MAD_STATUS_REQ_INVALID<br>
Jun 06 12:06:36 638853 [245B8700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:401b:ffff::ffff:ffff for PortGID: fe80::5:ad00:c:5ced<br>
<br>
This last message repeats quite a bit within that second, and then stops.<br></blockquote><div> </div><div>IPoIB on Windows sends a delete (leave) request for the IPoIB broadcast multicast group before joining it. If the port isn't already a member of that group, OpenSM logs this error, so these messages aren't necessarily "bad" from the SM's perspective.</div>
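<div> </div><div>To see which groups and ports those rejected delete requests involve, the ERR 1B25 lines can be tallied per PortGID and MGID. This is only a minimal sketch, assuming the log format quoted above; count_rejected_leaves and the sample list are illustrative names, not part of OpenSM.</div>

```python
import re
from collections import Counter

# Matches OpenSM "ERR 1B25" lines in the format shown in the quoted log.
LEAVE_ERR = re.compile(
    r"ERR 1B25: Received an invalid delete request for MGID: (\S+) for PortGID: (\S+)"
)


def count_rejected_leaves(log_lines):
    """Count ERR 1B25 entries per (PortGID, MGID) pair."""
    counts = Counter()
    for line in log_lines:
        match = LEAVE_ERR.search(line)
        if match:
            mgid, port_gid = match.groups()
            counts[(port_gid, mgid)] += 1
    return counts


# Three lines copied from the quoted log; the middle one is a join error
# (ERR 1B13) and is deliberately not counted.
sample = [
    "Jun 06 12:06:36 629416 [259BA700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:401b:ffff::ffff:ffff for PortGID: fe80::5:ad00:c:5ced",
    "Jun 06 12:06:36 638659 [21DB4700] 0x01 -> mcmr_rcv_join_mgrp: ERR 1B13: validate_modify failed from port 0x0005ad00000c5ced (Topspin DDR-HCAe LX x8), sending IB_SA_MAD_STATUS_REQ_INVALID",
    "Jun 06 12:06:36 638853 [245B8700] 0x01 -> mcmr_rcv_leave_mgrp: ERR 1B25: Received an invalid delete request for MGID: ff12:401b:ffff::ffff:ffff for PortGID: fe80::5:ad00:c:5ced",
]

for (port_gid, mgid), n in count_rejected_leaves(sample).items():
    print(f"{port_gid} {mgid} {n}")
```

<div>If one PortGID accounts for all of the entries, as in the sample, that fits the picture of a single host retrying its leave and join sequence.</div>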
<div> </div><div>How is your OpenSM partition file set up? The default partition should have the ipoib flag set.</div><div> </div><div>Which OpenSM are you using here? Is it running on a Windows or a Linux node, and which version?</div>
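<div> </div><div>As a rough way to check for that flag, the sketch below scans partition definitions for ipoib on the Default partition. It assumes the partition file syntax described in the OpenSM partition configuration documentation (name=pkey, flags : members;). ipoib_enabled is a hypothetical helper, and both sample lines are made up for illustration rather than taken from the poster's configuration.</div>

```python
def ipoib_enabled(partition_conf, partition="Default"):
    """Return True if the named partition definition carries the ipoib flag."""
    # Drop '#' comments, then split the text into ';'-terminated definitions.
    text = "\n".join(line.split("#", 1)[0] for line in partition_conf.splitlines())
    for definition in text.split(";"):
        # Everything before the first ':' is "name[=pkey][, flag[=value]]...".
        head = definition.split(":", 1)[0]
        fields = [field.strip() for field in head.split(",")]
        name = fields[0].split("=", 1)[0].strip()
        flags = [field.split("=", 1)[0].strip() for field in fields[1:]]
        if name == partition and "ipoib" in flags:
            return True
    return False


# Illustrative definitions only (assumed syntax, not the poster's file).
with_flag = "Default=0x7fff, ipoib : ALL=full;"
without_flag = "Default=0x7fff : ALL=full;"

print(ipoib_enabled(with_flag), ipoib_enabled(without_flag))
```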
<div> </div><div><div>Run saquery -g first and make sure that the IPoIB broadcast group ( ff12:401b:ffff::ffff:ffff ) reports MLID 0xc000, then check what saquery -m 0xc000 says.</div><div> </div><div>Also, is the HCA really 8x DDR, as the NodeDescription (Topspin DDR-HCAe LX x8) suggests?</div>
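<div> </div><div>When reading the saquery output, it can help to split an MGID into its fields. The sketch below assumes the IPoIB multicast GID layout from RFC 4391 (scope in the low nibble of the second byte, then a 16-bit signature, then the 16-bit P_Key). decode_mgid is a hypothetical helper, and signatures it does not recognize are returned as hex rather than interpreted.</div>

```python
import ipaddress

# IPoIB signatures defined in RFC 4391; anything else is left as raw hex.
IPOIB_SIGNATURES = {0x401B: "IPv4", 0x601B: "IPv6"}


def decode_mgid(mgid):
    """Split a multicast GID into scope, signature, and P_Key fields."""
    raw = ipaddress.IPv6Address(mgid).packed
    signature = int.from_bytes(raw[2:4], "big")
    return {
        "scope": raw[1] & 0x0F,
        "signature": IPOIB_SIGNATURES.get(signature, hex(signature)),
        "pkey": hex(int.from_bytes(raw[4:6], "big")),
    }


print(decode_mgid("ff12:401b:ffff::ffff:ffff"))
```

<div>For the broadcast group named above this gives scope 2 (link-local), the IPv4 signature, and P_Key 0xffff.</div>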
<div> </div><div>-- Hal</div></div><div> </div><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
<br>
Any ideas?<span class="HOEnZb"><font color="#888888"><br>
<br>
-- <br>
Orion Poplawski<br>
Technical Manager <a href="tel:303-415-9701%20x222" target="_blank" value="+13034159701">303-415-9701 x222</a><br>
NWRA, Boulder/CoRA Office FAX: <a href="tel:303-415-9702" target="_blank" value="+13034159702">303-415-9702</a><br>
3380 Mitchell Lane <a href="mailto:orion@nwra.com" target="_blank">orion@nwra.com</a><br>
Boulder, CO 80301 <a href="http://www.nwra.com" target="_blank">http://www.nwra.com</a><br>
_______________________________________________<br>
Users mailing list<br>
<a href="mailto:Users@lists.openfabrics.org" target="_blank">Users@lists.openfabrics.org</a><br>
<a href="http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users" target="_blank">http://lists.openfabrics.org/cgi-bin/mailman/listinfo/users</a><br>
</font></span></blockquote></div><br>