<br><br><div class="gmail_quote">On Fri, Jun 7, 2013 at 12:29 PM, Orion Poplawski <span dir="ltr"><<a href="mailto:orion@cora.nwra.com" target="_blank">orion@cora.nwra.com</a>></span> wrote:<br><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">

<div>On 06/07/2013 04:31 AM, Hal Rosenstock wrote:<br>
</div><div><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
You should check what saquery -m 0xc000 says after looking at saquery -g to<br>
make sure that the IPoIB broadcast group ( ff12:401b:ffff::ffff:ffff ) says<br>
MLID 0xc000.<br>
</blockquote>
<br></div>
I switched to winOFED 3.1 but I'm seeing the same behavior.  saquery reports:<br>
<br>
                PortGid.................fe80::1:5:ad00:c:5c3d (andrew mthca0)<br>
                PortGid.................fe80::1:19:bbff:ff00:5851 (saga mthca0)<br>
                PortGid.................fe80::1:19:bbff:ff00:3899 (HP Lion Cub 128MB)<br>
                PortGid.................fe80::1:1a:4bff:ff0c:20c9 (HP Lion Cub 128MB)<br>
                PortGid.................fe80::5:ad00:c:5ced (MT25204 InfiniHostLx Mellanox Technologies)<br>
                PortGid.................fe80::1:17:8ff:ffd0:9df9 (alexandria2 HCA-1)<br>
<br>
The windows machine is fe80::5:ad00:c:5ced.<br></blockquote><div> </div><div>OK; then it's fine at the SM layer, as Susan thought earlier.</div><div> </div><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">


<br>
Running wireshark on the windows machine again after it stops responding, I see the ping requests coming in and arp requests going out for the ip address of the remote ping sender, but no arp responses coming in.<br></blockquote>
<div> </div><div>Would you send me the wireshark file? I'd like to look at more details in the packets. I think there may be some incompatibility between Windows and Linux IPoIB. In particular, I recall something "funny" about MAC address emulation in Windows.</div>
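<div> </div><div>As an aid for reading the captures: an IPoIB hardware address is 20 bytes, laid out as one flags/reserved octet, a 24-bit QPN, and then the 16-byte port GID (RFC 4391). Below is a minimal Python sketch for splitting one apart. The function name parse_ipoib_hwaddr is made up for illustration, and the sample address is the one from the ARP replies quoted in this thread.</div>
<pre>
```python
import ipaddress


def parse_ipoib_hwaddr(text):
    """Split a 20-byte IPoIB link-layer address into flags, QPN and GID.

    Hypothetical helper for illustration only; not part of any OFED tool.
    """
    # Accept both tcpdump style (colon separated) and bare hex strings.
    raw = bytes.fromhex(text.replace(":", ""))
    if len(raw) != 20:
        raise ValueError("expected 20 bytes, got %d" % len(raw))
    flags = raw[0]                          # first octet (reserved in RFC 4391)
    qpn = int.from_bytes(raw[1:4], "big")   # 24-bit queue pair number
    gid = ipaddress.IPv6Address(raw[4:])    # 128-bit port GID
    return flags, qpn, gid


# Sample address copied from the ARP replies quoted in this thread.
sample = "80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9"
flags, qpn, gid = parse_ipoib_hwaddr(sample)
print(hex(flags), hex(qpn), gid)
```
</pre>
<div>For the sample address this gives QPN 0x404 and GID fe80::1:1a:4bff:ff0c:20c9, which can then be compared against the PortGid entries from saquery to see which port is actually answering.</div>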
<div> </div><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
<br>
Running wireshark on the linux ping sender I see lots of:<br>
<br>
 24.162827              ->              ARP 192.168.2.5 is at 80000404FE80000000000001001A4BFFFF0C20C9<br>
 24.162833              ->              ARP 192.168.2.5 is at 80000404FE80000000000001001A4BFFFF0C20C9<br>
<br>
But they don't seem to be making it back.<br>
<br>
opensm is going crazy on my subnet master, sending a flood of pings to the windows machine:<br></blockquote><div>To be clear, you mean the machine running opensm, not opensm itself.</div><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">

<br>
10:11:29.395397 IP <a href="http://saga.cora.nwra.com" target="_blank">saga.cora.nwra.com</a> > <a href="http://fontdbib.cora.nwra.com" target="_blank">fontdbib.cora.nwra.com</a>: ICMP echo request, id 55637, seq 4, length 64<br>


10:11:29.395459 ARP, Request who-has <a href="http://fontdbib.cora.nwra.com" target="_blank">fontdbib.cora.nwra.com</a> tell <a href="http://saga.cora.nwra.com" target="_blank">saga.cora.nwra.com</a>, length 56<br>
10:11:29.395477 ARP, Request who-has <a href="http://fontdbib.cora.nwra.com" target="_blank">fontdbib.cora.nwra.com</a> tell <a href="http://saga.cora.nwra.com" target="_blank">saga.cora.nwra.com</a>, length 56<br>
10:11:29.395484 ARP, Request who-has <a href="http://fontdbib.cora.nwra.com" target="_blank">fontdbib.cora.nwra.com</a> tell <a href="http://saga.cora.nwra.com" target="_blank">saga.cora.nwra.com</a>, length 56<br>
10:11:29.395491 IP <a href="http://saga.cora.nwra.com" target="_blank">saga.cora.nwra.com</a> > <a href="http://fontdbib.cora.nwra.com" target="_blank">fontdbib.cora.nwra.com</a>: ICMP echo request, id 55637, seq 4, length 64<br>


10:11:29.395609 ARP, Request who-has <a href="http://fontdbib.cora.nwra.com" target="_blank">fontdbib.cora.nwra.com</a> tell <a href="http://saga.cora.nwra.com" target="_blank">saga.cora.nwra.com</a>, length 56<br>
10:11:29.395624 ARP, Request who-has <a href="http://fontdbib.cora.nwra.com" target="_blank">fontdbib.cora.nwra.com</a> tell <a href="http://saga.cora.nwra.com" target="_blank">saga.cora.nwra.com</a>, length 56<br>
10:11:29.395631 ARP, Request who-has <a href="http://fontdbib.cora.nwra.com" target="_blank">fontdbib.cora.nwra.com</a> tell <a href="http://saga.cora.nwra.com" target="_blank">saga.cora.nwra.com</a>, length 56<br>
<br>
so I restarted it.<br>
<br>
So it is really causing havok with the ib network.<br>
<br>
I shutdown the ipoib interface on the windows machine.<br>
<br>
I'm still seeing lots of arp traffic on the linux box I was pinging from though, which seems crazy:<br>
<br>
10:28:35.836691 ARP, Request who-has <a href="http://fontdbib.cora.nwra.com" target="_blank">fontdbib.cora.nwra.com</a> tell <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a>, length 56<br>


10:28:35.836696 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836700 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836704 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836708 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836713 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836717 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836721 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836726 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836730 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836734 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836739 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836743 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836747 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836752 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836756 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836760 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836765 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836769 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836795 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836804 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836808 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836813 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836817 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836822 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836827 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836832 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836836 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
10:28:35.836850 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
^C10:28:35.836855 ARP, Reply <a href="http://earthib.cora.nwra.com" target="_blank">earthib.cora.nwra.com</a> is-at 80:00:04:04:fe:80:00:00:00:00:<u></u>00:01:00:1a:4b:ff:ff:0c:20:c9, length 56<br>
<br>
<br>
why so bloody many arp replies?<br></blockquote><div>It might help if you would give the hostnames of the various machines, particularly the ones running OpenSM and the Windows machine.</div><blockquote style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid" class="gmail_quote">
<p>
<br>
There is an over active ib_mad1 process on the machine as well.</p><p> </p></blockquote><div> </div></div><div>What is your Linux machine's configuration? Is it a dual-port configuration? What is running on each port? Is there an SM on the second port?<br>

</div><div>-- Hal</div><div> </div>