[openib-general] ip_ipoib works on IA64! (woohoo! :^)

Grant Grundler iod00d at hp.com
Fri Dec 10 13:22:19 PST 2004


/me bounces!
With openib-1321, ib_ipoib seems to be working!

I couldn't reproduce the problem where ping sometimes doesn't work.
The current issue is misaligned accesses in the kernel.
Here's a "cleaner" set of data.

ionize:/usr/src/linux-ia64-release-2.6.10# modprobe ib_mthca
ib_mthca: Mellanox InfiniBand HCA driver v0.06-pre (November 8, 2004)
ib_mthca: Initializing Mellanox Technology MT23108 InfiniHost (0000:81:00.0)
GSI 60 (level, low) -> CPU 0 (0x0000) vector 67
ACPI: PCI interrupt 0000:81:00.0[A] -> GSI 60 (level, low) -> IRQ 67
ionize:/usr/src/linux-ia64-release-2.6.10# elilo -v --efiboot
ionize:/usr/src/linux-ia64-release-2.6.10# modprobe ib_ipoib
ionize:/usr/src/linux-ia64-release-2.6.10# cat /sys/class/infiniband/mthca0/ports/?/state
4: ACTIVE
4: ACTIVE
ionize:/usr/src/linux-ia64-release-2.6.10# ifconfig ib0 10.0.0.2 netmask 255.255.255.0 broadcast 10.0.0.255
ionize:/usr/src/linux-ia64-release-2.6.10# ifconfig ib1 10.0.1.2 netmask 255.255.255.0 broadcast 10.0.1.255
ionize:/usr/src/linux-ia64-release-2.6.10# ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
kernel unaligned access to 0xe0000002ff5fe05c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fe05c, ip=0xa0000002001be010
kernel unaligned access to 0xe0000002ff5fe05c, ip=0xa0000002001bef10
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=14.4 ms
kernel unaligned access to 0xe0000002ff5fe05c, ip=0xa0000002001bef10
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.571 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.069 ms
64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=0.067 ms
64 bytes from 10.0.0.1: icmp_seq=5 ttl=64 time=0.069 ms
64 bytes from 10.0.0.1: icmp_seq=6 ttl=64 time=0.068 ms

--- 10.0.0.1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5001ms
rtt min/avg/max/mdev = 0.067/2.551/14.463/5.330 ms
ionize:/usr/src/linux-ia64-release-2.6.10#
ionize:/usr/src/linux-ia64-release-2.6.10# cd /opt/netperf/
ionize:/opt/netperf# ls
netperf    snapshot_script   tcp_rr_script      udp_rr_script
netserver  tcp_range_script  tcp_stream_script  udp_stream_script
ionize:/opt/netperf# ./snapshot_script 10.0.1.1
Netperf snapshot script started at Fri Dec 10 13:00:02 PST 2004
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
...


Misaligned access reports are rate-limited by the kernel,
so the above is just the tip of the iceberg.

a0000002001bee60 t ipoib_start_xmit     [ib_ipoib]
a0000002001bf880 t ipoib_get_stats      [ib_ipoib]


The "netserver" (rx4640) is getting the following:
kernel unaligned access to 0xe0000001008b0f5c, ip=0xa000000200152f10

a000000200152e60 t ipoib_start_xmit     [ib_ipoib]
a000000200153880 t ipoib_get_stats      [ib_ipoib]

Based on the IP and the address offset (0x5c), I'd guess this is the
same problem on both sides.  Still looking at it.
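
For anyone who wants to check the arithmetic behind that guess, here is a
minimal sketch in Python. It uses only the addresses quoted above; the helper
names (resolve, misalignment) are made up for illustration. It assumes the two
symbols listed per machine are adjacent, i.e. that no other function sits
between ipoib_start_xmit and ipoib_get_stats, which the excerpts don't prove.

```python
import bisect

# Symbol addresses copied from the two listings above.
SENDER_SYMS = [
    (0xa0000002001bee60, "ipoib_start_xmit"),
    (0xa0000002001bf880, "ipoib_get_stats"),
]
NETSERVER_SYMS = [
    (0xa000000200152e60, "ipoib_start_xmit"),
    (0xa000000200153880, "ipoib_get_stats"),
]


def resolve(ip, syms):
    """Return (name, offset) for the last symbol at or below ip.

    Hypothetical helper: it only knows the symbols it is given, so the
    answer is a guess if the symbol list is incomplete.
    """
    syms = sorted(syms)
    addrs = [addr for addr, _ in syms]
    i = bisect.bisect_right(addrs, ip) - 1
    if i < 0:
        raise ValueError("ip is below the first known symbol")
    base, name = syms[i]
    return name, ip - base


def misalignment(addr, size):
    """Bytes by which addr misses natural alignment for a size-byte access."""
    return addr % size


for label, ip, syms in [
    ("sender", 0xa0000002001bef10, SENDER_SYMS),
    ("netserver", 0xa000000200152f10, NETSERVER_SYMS),
]:
    name, off = resolve(ip, syms)
    print(f"{label}: {name}+{off:#x}")

for addr in (0xe0000002ff5fe05c, 0xe0000002ff5fd85c, 0xe0000001008b0f5c):
    print(f"{addr:#x}: mod 4 = {misalignment(addr, 4)}, mod 8 = {misalignment(addr, 8)}")
```

Both IPs land at the same offset into ipoib_start_xmit, and all three data
addresses are 4-byte aligned but not 8-byte aligned. That would fit an 8-byte
access to a 4-byte-aligned field, but that part is only a guess until the
instruction at that offset has been looked at.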


FYA,
Starting 32x4 TCP_STREAM tests at Fri Dec 10 13:09:36 PST 2004

------------------------------------
Testing with the following command line:
/opt/netperf/netperf -t TCP_STREAM -l 60 -H 10.0.1.1 -i 10,3 -I 99,5 -- -s 32768 -S 32768 -m 4096
...
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

262142 262142   4096    60.00    1164.65   

Fixing the alignment issue should help here.
Then I can start drilling a bit deeper on bottlenecks.
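
To put the netperf number above in other units, here is a small sketch that
converts the reported throughput (10^6 bits/sec) into bytes/sec and 4096-byte
sends/sec. The function name is made up for illustration, and the sends/sec
figure assumes every send carries a full 4096-byte message, which netperf's
summary line does not itself confirm.

```python
def throughput_summary(mbits_per_sec, msg_bytes):
    """Convert a netperf throughput figure (10^6 bits/sec) to
    (bytes/sec, messages/sec), assuming full-sized messages."""
    bytes_per_sec = mbits_per_sec * 1e6 / 8
    msgs_per_sec = bytes_per_sec / msg_bytes
    return bytes_per_sec, msgs_per_sec


# Figures taken from the TCP_STREAM result above.
bps, mps = throughput_summary(1164.65, 4096)
print(f"{bps / 1e6:.2f} MB/s, about {mps:.0f} sends/s")
```

That works out to roughly 145.6 MB/s (decimal megabytes).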

hth,
grant


