[openib-general] ip_ipoib works on IA64! (woohoo! :^)
Grant Grundler
iod00d at hp.com
Fri Dec 10 13:22:19 PST 2004
/me bounces!
With openib-1321, ib_ipoib seems to be working!
I couldn't reproduce the problem with ping not working sometimes.
Current issue is misaligned accesses in the kernel.
Here's a "cleaner" set of data.
ionize:/usr/src/linux-ia64-release-2.6.10# modprobe ib_mthca
ib_mthca: Mellanox InfiniBand HCA driver v0.06-pre (November 8, 2004)
ib_mthca: Initializing Mellanox Technology MT23108 InfiniHost (0000:81:00.0)
GSI 60 (level, low) -> CPU 0 (0x0000) vector 67
ACPI: PCI interrupt 0000:81:00.0[A] -> GSI 60 (level, low) -> IRQ 67
ionize:/usr/src/linux-ia64-release-2.6.10# elilo -v --efiboot
ionize:/usr/src/linux-ia64-release-2.6.10# modprobe ib_ipoib
ionize:/usr/src/linux-ia64-release-2.6.10# cat /sys/class/infiniband/mthca0/ports/?/state
4: ACTIVE
4: ACTIVE
ionize:/usr/src/linux-ia64-release-2.6.10# ifconfig ib0 10.0.0.2 netmask 255.255.255.0 broadcast 10.0.0.255
ionize:/usr/src/linux-ia64-release-2.6.10# ifconfig ib1 10.0.1.2 netmask 255.255.255.0 broadcast 10.0.1.255
ionize:/usr/src/linux-ia64-release-2.6.10# ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
kernel unaligned access to 0xe0000002ff5fe05c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fe05c, ip=0xa0000002001be010
kernel unaligned access to 0xe0000002ff5fe05c, ip=0xa0000002001bef10
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=14.4 ms
kernel unaligned access to 0xe0000002ff5fe05c, ip=0xa0000002001bef10
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.571 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.069 ms
64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=0.067 ms
64 bytes from 10.0.0.1: icmp_seq=5 ttl=64 time=0.069 ms
64 bytes from 10.0.0.1: icmp_seq=6 ttl=64 time=0.068 ms
--- 10.0.0.1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5001ms
rtt min/avg/max/mdev = 0.067/2.551/14.463/5.330 ms
ionize:/usr/src/linux-ia64-release-2.6.10#
ionize:/usr/src/linux-ia64-release-2.6.10# cd /opt/netperf/
ionize:/opt/netperf# ls
netperf snapshot_script tcp_rr_script udp_rr_script
netserver tcp_range_script tcp_stream_script udp_stream_script
ionize:/opt/netperf# ./snapshot_script 10.0.1.1
Netperf snapshot script started at Fri Dec 10 13:00:02 PST 2004
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
...
Misaligned access reports are rate limited by the kernel.
The above is just the tip of the iceberg.
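Since the reports are rate limited, any count taken from the log is only a lower bound, but grouping the lines by faulting ip still shows whether one instruction or several are involved. A minimal sketch, assuming the log lines have exactly the format shown above (the helper name tally_unaligned is made up for illustration):

```python
import re
from collections import Counter

# Matches lines such as:
#   kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
LINE_RE = re.compile(
    r"kernel unaligned access to (0x[0-9a-f]+), ip=(0x[0-9a-f]+)"
)


def tally_unaligned(log_text):
    """Count unaligned-access reports per faulting instruction pointer.

    Hypothetical helper: the counts are a lower bound because the
    kernel rate limits these messages.
    """
    counts = Counter()
    for match in LINE_RE.finditer(log_text):
        ip = int(match.group(2), 16)
        counts[ip] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10\n"
        "kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10\n"
    )
    for ip, n in tally_unaligned(sample).items():
        print(f"ip={ip:#x}: at least {n} reports")
```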
a0000002001bee60 t ipoib_start_xmit [ib_ipoib]
a0000002001bf880 t ipoib_get_stats [ib_ipoib]
The "netserver" (rx4640) is getting the following:
kernel unaligned access to 0xe0000001008b0f5c, ip=0xa000000200152f10
a000000200152e60 t ipoib_start_xmit [ib_ipoib]
a000000200153880 t ipoib_get_stats [ib_ipoib]
Based on the IP and the offset (0x5c), I'd guess this is the same problem
on both sides. Still looking at it.
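The comparison can be checked with a little arithmetic on the addresses quoted above. This is only a sketch: it assumes the two symbol lines shown for each machine are adjacent entries in the module's symbol table (so nothing else starts between them), and the resolve helper is made up for illustration.

```python
import bisect

# Symbol start addresses copied from the listings above.
LOCAL_SYMBOLS = [
    (0xA0000002001BEE60, "ipoib_start_xmit"),
    (0xA0000002001BF880, "ipoib_get_stats"),
]
REMOTE_SYMBOLS = [
    (0xA000000200152E60, "ipoib_start_xmit"),
    (0xA000000200153880, "ipoib_get_stats"),
]


def resolve(ip, symbols):
    """Return (name, offset) for the last symbol starting at or before ip.

    Hypothetical helper; assumes `symbols` lists every symbol in the
    address range of interest.
    """
    symbols = sorted(symbols)
    starts = [addr for addr, _ in symbols]
    i = bisect.bisect_right(starts, ip) - 1
    if i < 0:
        raise ValueError("ip is below the first known symbol")
    addr, name = symbols[i]
    return name, ip - addr


if __name__ == "__main__":
    cases = [
        ("local", 0xA0000002001BEF10, 0xE0000002FF5FE05C, LOCAL_SYMBOLS),
        ("remote", 0xA000000200152F10, 0xE0000001008B0F5C, REMOTE_SYMBOLS),
    ]
    for label, ip, data_addr, symbols in cases:
        name, offset = resolve(ip, symbols)
        # data_addr % 8 == 4 means 4-byte aligned but not 8-byte aligned.
        print(f"{label}: {name}+{offset:#x}, data address % 8 = {data_addr % 8}")
```

Both faulting ips come out at the same offset into ipoib_start_xmit, and both data addresses are 4-byte aligned but not 8-byte aligned, which is consistent with the guess that it is the same access on both machines.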
FYA,
Starting 32x4 TCP_STREAM tests at Fri Dec 10 13:09:36 PST 2004
------------------------------------
Testing with the following command line:
/opt/netperf/netperf -t TCP_STREAM -l 60 -H 10.0.1.1 -i 10,3 -I 99,5 -- -s 32768 -S 32768 -m 4096
...
Recv    Send    Send
Socket  Socket  Message  Elapsed
Size    Size    Size     Time     Throughput
bytes   bytes   bytes    secs.    10^6bits/sec
262142  262142  4096     60.00    1164.65
Fixing the alignment issue should help here.
Then I can start drilling a bit deeper on bottlenecks.
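For a sense of scale, the netperf figure above converts to bytes per second as follows. A small sketch; the function name is made up for illustration, and it uses decimal megabytes (10^6 bytes).

```python
def mbits_to_mbytes_per_sec(throughput_mbits):
    """Convert netperf's 10^6 bits/sec figure to 10^6 bytes/sec."""
    return throughput_mbits / 8.0


if __name__ == "__main__":
    reported = 1164.65  # 10^6 bits/sec, from the TCP_STREAM result above
    print(f"{reported} Mbit/s is about {mbits_to_mbytes_per_sec(reported):.2f} MB/s")
```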
hth,
grant