[Users] IPoIB not working on Windows 2008 r2 - need help
Orion Poplawski
orion at cora.nwra.com
Fri Jun 7 09:29:41 PDT 2013
On 06/07/2013 04:31 AM, Hal Rosenstock wrote:
> You should check what saquery -m 0xc000 says after looking at saquery -g to
> make sure that the IPoIB broadcast group ( ff12:401b:ffff::ffff:ffff ) says
> MLID 0xc000.
I switched to winOFED 3.1 but I'm seeing the same behavior. saquery reports:
PortGid.................fe80::1:5:ad00:c:5c3d (andrew mthca0)
PortGid.................fe80::1:19:bbff:ff00:5851 (saga mthca0)
PortGid.................fe80::1:19:bbff:ff00:3899 (HP Lion Cub 128MB)
PortGid.................fe80::1:1a:4bff:ff0c:20c9 (HP Lion Cub 128MB)
PortGid.................fe80::5:ad00:c:5ced (MT25204 InfiniHostLx Mellanox Technologies)
PortGid.................fe80::1:17:8ff:ffd0:9df9 (alexandria2 HCA-1)
The Windows machine is fe80::5:ad00:c:5ced.
Running Wireshark on the Windows machine again after it stops responding, I
see the ping requests coming in and ARP requests going out for the IP address
of the remote ping sender, but no ARP responses coming in.
Running Wireshark on the Linux ping sender, I see lots of:
24.162827 -> ARP 192.168.2.5 is at 80000404FE80000000000001001A4BFFFF0C20C9
24.162833 -> ARP 192.168.2.5 is at 80000404FE80000000000001001A4BFFFF0C20C9
But they don't seem to be making it back.
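For reference, the 20-byte hardware address in those ARP replies can be split into its parts. This is only a minimal sketch, assuming the standard IPoIB link-layer layout from RFC 4391 (one flags/reserved byte, a 3-byte queue pair number, then the 16-byte port GID); the function name decode_ipoib_hwaddr is made up for illustration and is not part of any OFED tool.

```python
import ipaddress

def decode_ipoib_hwaddr(hexstr):
    """Split a 20-byte IPoIB link-layer address into (flags, qpn, gid).

    Assumed layout (RFC 4391): 1 flags/reserved byte, 3-byte QPN,
    16-byte port GID. Accepts plain hex or colon-separated hex.
    """
    raw = bytes.fromhex(hexstr.replace(":", ""))
    if len(raw) != 20:
        raise ValueError("expected 20 bytes, got %d" % len(raw))
    flags = raw[0]
    qpn = int.from_bytes(raw[1:4], "big")
    # The GID is formatted like an IPv6 address, so ipaddress can print it.
    gid = str(ipaddress.IPv6Address(raw[4:]))
    return flags, qpn, gid

if __name__ == "__main__":
    flags, qpn, gid = decode_ipoib_hwaddr(
        "80000404FE80000000000001001A4BFFFF0C20C9")
    print(hex(flags), hex(qpn), gid)
```

The GID that comes out can be compared against the PortGid column of the saquery listing to confirm which port is actually sending the replies.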
opensm is going crazy on my subnet master, sending a flood of pings to the
Windows machine:
10:11:29.395397 IP saga.cora.nwra.com > fontdbib.cora.nwra.com: ICMP echo
request, id 55637, seq 4, length 64
10:11:29.395459 ARP, Request who-has fontdbib.cora.nwra.com tell
saga.cora.nwra.com, length 56
10:11:29.395477 ARP, Request who-has fontdbib.cora.nwra.com tell
saga.cora.nwra.com, length 56
10:11:29.395484 ARP, Request who-has fontdbib.cora.nwra.com tell
saga.cora.nwra.com, length 56
10:11:29.395491 IP saga.cora.nwra.com > fontdbib.cora.nwra.com: ICMP echo
request, id 55637, seq 4, length 64
10:11:29.395609 ARP, Request who-has fontdbib.cora.nwra.com tell
saga.cora.nwra.com, length 56
10:11:29.395624 ARP, Request who-has fontdbib.cora.nwra.com tell
saga.cora.nwra.com, length 56
10:11:29.395631 ARP, Request who-has fontdbib.cora.nwra.com tell
saga.cora.nwra.com, length 56
so I restarted it.
So it is really causing havoc with the IB network.
I shut down the IPoIB interface on the Windows machine.
I'm still seeing lots of ARP traffic on the Linux box I was pinging from
though, which seems crazy:
10:28:35.836691 ARP, Request who-has fontdbib.cora.nwra.com tell
earthib.cora.nwra.com, length 56
10:28:35.836696 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836700 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836704 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836708 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836713 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836717 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836721 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836726 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836730 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836734 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836739 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836743 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836747 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836752 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836756 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836760 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836765 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836769 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836795 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836804 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836808 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836813 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836817 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836822 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836827 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836832 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836836 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
10:28:35.836850 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
^C10:28:35.836855 ARP, Reply earthib.cora.nwra.com is-at
80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9, length 56
Why so bloody many ARP replies?
There is an overactive ib_mad1 process on the machine as well.
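To put a number on the flood of ARP replies above, a saved capture can be summarized per host. This is a rough sketch only: it assumes tcpdump text output with one record per line (not wrapped as in this mail) and the "ARP, Reply <host> is-at" wording shown above; the function name arp_reply_stats is hypothetical.

```python
import re
from datetime import datetime

# Matches e.g. "10:28:35.836696 ARP, Reply earthib.cora.nwra.com is-at ..."
LINE_RE = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{6}) ARP, Reply (\S+) is-at")

def arp_reply_stats(lines):
    """Return {host: (reply_count, seconds_between_first_and_last_reply)}."""
    seen = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip requests, ICMP, and anything else
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        seen.setdefault(m.group(2), []).append(ts)
    return {host: (len(ts), (ts[-1] - ts[0]).total_seconds())
            for host, ts in seen.items()}

if __name__ == "__main__":
    hw = "80:00:04:04:fe:80:00:00:00:00:00:01:00:1a:4b:ff:ff:0c:20:c9"
    sample = [
        "10:28:35.836691 ARP, Request who-has fontdbib.cora.nwra.com tell "
        "earthib.cora.nwra.com, length 56",
        "10:28:35.836696 ARP, Reply earthib.cora.nwra.com is-at %s, length 56" % hw,
        "10:28:35.836700 ARP, Reply earthib.cora.nwra.com is-at %s, length 56" % hw,
        "10:28:35.836704 ARP, Reply earthib.cora.nwra.com is-at %s, length 56" % hw,
    ]
    for host, (count, span) in arp_reply_stats(sample).items():
        print(host, count, "replies in", span, "seconds")
```

Dividing the count by the span gives a replies-per-second figure that can be compared before and after shutting down an interface.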
--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA, Boulder/CoRA Office FAX: 303-415-9702
3380 Mitchell Lane orion at nwra.com
Boulder, CO 80301 http://www.nwra.com