[Users] IPoIB not working on Windows 2008 r2 - need help

Orion Poplawski orion at cora.nwra.com
Fri Jun 7 09:29:41 PDT 2013


On 06/07/2013 04:31 AM, Hal Rosenstock wrote:
> You should check what saquery -m 0xc000 says after looking at saquery -g to
> make sure that the IPoIB broadcast group ( ff12:401b:ffff::ffff:ffff ) says
> MLID 0xc000.

I switched to winOFED 3.1 but I'm seeing the same behavior.  saquery reports:

                 PortGid.................fe80::1:5:ad00:c:5c3d (andrew mthca0)
                 PortGid.................fe80::1:19:bbff:ff00:5851 (saga mthca0)
                 PortGid.................fe80::1:19:bbff:ff00:3899 (HP Lion Cub 128MB)
                 PortGid.................fe80::1:1a:4bff:ff0c:20c9 (HP Lion Cub 128MB)
                 PortGid.................fe80::5:ad00:c:5ced (MT25204 InfiniHostLx Mellanox Technologies)
                 PortGid.................fe80::1:17:8ff:ffd0:9df9 (alexandria2 HCA-1)

The Windows machine is fe80::5:ad00:c:5ced.

Running Wireshark on the Windows machine again after it stops responding, I 
see the ping requests coming in and ARP requests going out for the IP address 
of the remote ping sender, but no ARP responses coming in.

Running Wireshark on the Linux ping sender, I see lots of:

  24.162827              ->              ARP 192.168.2.5 is at 80000404FE80000000000001001A4BFFFF0C20C9
  24.162833              ->              ARP 192.168.2.5 is at 80000404FE80000000000001001A4BFFFF0C20C9

But they don't seem to be making it back.
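
For reference, the 20-byte address in those ARP replies can be split into its parts. This is only a sketch, assuming the usual IPoIB link-layer layout (1 byte of flags, a 3-byte queue pair number, then the 16-byte port GID); the function name is made up for illustration.

```python
import ipaddress

def decode_ipoib_hwaddr(hexstr):
    """Split a 20-byte IPoIB hardware address into (flags, qpn, gid).

    Assumed layout: 1 byte flags, 3 bytes QPN, 16 bytes port GID.
    Accepts plain hex or the colon-separated form tcpdump prints.
    """
    raw = bytes.fromhex(hexstr.replace(":", ""))
    if len(raw) != 20:
        raise ValueError("expected 20 bytes, got %d" % len(raw))
    flags = raw[0]
    qpn = int.from_bytes(raw[1:4], "big")
    gid = ipaddress.IPv6Address(raw[4:20])  # GIDs print like IPv6 addresses
    return flags, qpn, gid

flags, qpn, gid = decode_ipoib_hwaddr("80000404FE80000000000001001A4BFFFF0C20C9")
print(hex(flags), hex(qpn), gid)
```

The GID part comes out as fe80::1:1a:4bff:ff0c:20c9, which is one of the "HP Lion Cub 128MB" ports in the saquery listing, so the replies are at least advertising a port the SA knows about.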

opensm is going crazy on my subnet master, sending a flood of pings to the 
Windows machine:

10:11:29.395397 IP saga.cora.nwra.com > fontdbib.cora.nwra.com: ICMP echo request, id 55637, seq 4, length 64
10:11:29.395459 ARP, Request who-has fontdbib.cora.nwra.com tell saga.cora.nwra.com, length 56
10:11:29.395477 ARP, Request who-has fontdbib.cora.nwra.com tell saga.cora.nwra.com, length 56
10:11:29.395484 ARP, Request who-has fontdbib.cora.nwra.com tell saga.cora.nwra.com, length 56
10:11:29.395491 IP saga.cora.nwra.com > fontdbib.cora.nwra.com: ICMP echo request, id 55637, seq 4, length 64
10:11:29.395609 ARP, Request who-has fontdbib.cora.nwra.com tell saga.cora.nwra.com, length 56
10:11:29.395624 ARP, Request who-has fontdbib.cora.nwra.com tell saga.cora.nwra.com, length 56
10:11:29.395631 ARP, Request who-has fontdbib.cora.nwra.com tell saga.cora.nwra.com, length 56

so I restarted opensm.

So this is really causing havoc on the IB network.

I shut down the IPoIB interface on the Windows machine.

I'm still seeing lots of ARP traffic on the Linux box I was pinging from, 
though, which seems crazy:

10:28:35.836691 ARP, Request who-has fontdbib.cora.nwra.com tell earthib.cora.nwra.com, length 56
10:28:35.836696 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836700 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836704 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836708 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836713 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836717 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836721 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836726 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836730 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836734 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836739 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836743 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836747 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836752 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836756 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836760 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836765 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836769 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836795 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836804 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836808 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836813 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836817 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836822 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836827 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836832 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836836 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836850 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
^C10:28:35.836855 ARP, Reply earthib.cora.nwra.com is-at 80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56


Why so bloody many ARP replies?
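
To put a number on the burst, something like the following could be run over saved tcpdump text. It is a rough sketch only: the function name is hypothetical, and it assumes the timestamp and "ARP, Reply" appear on the same line, as they do in the capture above. The sample lines are shortened copies of the first few entries.

```python
import re
from datetime import datetime, timedelta

# A tcpdump timestamp followed by an ARP reply, e.g.
# "10:28:35.836696 ARP, Reply ..."; search() tolerates a leading "^C".
REPLY = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{6}) ARP, Reply")

def arp_reply_burst(lines):
    """Return (count, span) for the ARP replies found in tcpdump text lines."""
    stamps = [
        datetime.strptime(m.group(1), "%H:%M:%S.%f")
        for m in map(REPLY.search, lines)
        if m  # skip requests, continuation lines, and anything else
    ]
    if not stamps:
        return 0, timedelta(0)
    return len(stamps), max(stamps) - min(stamps)

sample = [
    "10:28:35.836691 ARP, Request who-has fontdbib.cora.nwra.com tell",
    "10:28:35.836696 ARP, Reply earthib.cora.nwra.com is-at",
    "10:28:35.836700 ARP, Reply earthib.cora.nwra.com is-at",
    "10:28:35.836704 ARP, Reply earthib.cora.nwra.com is-at",
]
count, span = arp_reply_burst(sample)
print(count, "replies within", span.total_seconds(), "seconds")
```

On the capture above the replies land only a few microseconds apart, all answering what looks like a single request.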

There is an overactive ib_mad1 process on the machine as well.

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion at nwra.com
Boulder, CO 80301                   http://www.nwra.com
