[openib-general] ip_ipoib works on IA64! (woohoo! :^)

Grant Grundler iod00d at hp.com
Fri Dec 10 16:42:09 PST 2004


On Fri, Dec 10, 2004 at 02:18:24PM -0800, Roland Dreier wrote:
>     Grant> /me bounces!  With openib-1321, ip_ipoib seems to be working!
> 
> Cool.
> 
> (I was wondering whether the earlier problems were because each system
> had two interfaces on the same broadcast domain and hence maybe
> responding to ARPs from the wrong interface.

Yes, you are, as usual :^), probably right. The initial broadcast
address was 10.255.255.255 because I didn't specify one when I ran
ifconfig (with params) for the first time.
I checked what I had done by running ifconfig (no params) and realized
the error. I ran ifconfig (with params) again for both ib0/1 and this
time specified the broadcast address and netmask.
It's likely things never recovered from that first, wrong configuration.

But I expect this issue is not ia64 specific and anyone should
be able to reproduce it.

> I wonder whether
> /proc/sys/net/ipv4/conf/ibX/arp_filter might have helped...)

Sorry, I don't understand networking protocols well enough to know
what you are alluding to here. But if you are already aware of
the issue and fixing it...

>     Grant> Current issue is misaligned accesses in the kernel.
> 
> Hmm... what's offsetof(struct neighbour, ha) on ia64?

By hand, I counted 68. It should be in the asm I posted earlier.

> (I'll check for
> myself a little later but I think the problem may be stashing a
> pointer at neigh->ha + 24)

yes, I'm pretty sure it is. 

> My current tree has a lot of local changes so it's a little hard for
> me to generate a patch, but does changing the body of to_ipoib_neigh()
> to the following help?
> 
> static inline struct ipoib_neigh **to_ipoib_neigh(struct neighbour *neigh)
> {
> 	return (struct ipoib_neigh **) (neigh->ha + 24 -
> 					(offsetof(struct neighbour, ha) & 4));
> }

Sorry, I don't have this function in my tree...that's probably part
of the changes you want to commit.  I think it's called to_ipoib_path()
in the existing tree:

static inline struct ipoib_path **to_ipoib_path(struct neighbour *neigh)
{
        return (struct ipoib_path **) (neigh->ha + 24);
}

I've added the same adjustment to it that you have above,
and that does avoid the misaligned access.
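
To spell out why the adjustment helps, here is a minimal sketch of the
offset arithmetic. It assumes offsetof(struct neighbour, ha) is 68 (my
hand count above), that pointers need 8-byte alignment on ia64, and that
struct neighbour itself starts on an 8-byte boundary. The helper names
are made up for illustration and are not in any tree.

```c
#include <stddef.h>

/* Hypothetical helpers, not part of the IPoIB driver. */

/* Offset (from the start of struct neighbour) of the stashed pointer
 * when it is placed at neigh->ha + 24 with no adjustment. */
size_t stash_offset_plain(size_t ha_offset)
{
	return ha_offset + 24;
}

/* Same offset with the "- (offsetof(struct neighbour, ha) & 4)" term:
 * if ha sits on an odd multiple of 4, back up by 4 bytes. */
size_t stash_offset_adjusted(size_t ha_offset)
{
	return ha_offset + 24 - (ha_offset & 4);
}

/* Nonzero if the offset is a multiple of 8 (assumed pointer alignment). */
int is_8byte_aligned(size_t offset)
{
	return (offset % 8) == 0;
}
```

With ha at offset 68, the plain version lands on offset 92 (only 4-byte
aligned) and the adjusted version lands on offset 88 (8-byte aligned).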

Trying to unload the module didn't go smoothly either:
ionize:/opt/netperf# ifconfig ib0 down
ionize:/opt/netperf# ifconfig ib1 down
ionize:/opt/netperf# rmmod ib_ipoib
ib1: ib_dealloc_pd failed

Not sure what caused that hiccup but I was able to unload everything else
and reload the new modules just fine.

In another email you commented:
> I'd be curious how much this boosts your performance (it would be at
> least one unaligned trap per packet, so it's probably a big deal).

That's what I expected too.
But not for this particular test:
Starting 56x4 TCP_STREAM tests at Fri Dec 10 15:58:12 PST 2004
/opt/netperf/netperf -t TCP_STREAM -l 60 -H 10.0.1.1 -i 10,3 -I 99,5 -- -s 57344 -S 57344 -m 4096

TCP STREAM TEST to 10.0.1.1 : +/-2.5% @ 99% conf.
!!! WARNING
!!! Desired confidence was not achieved within the specified iterations.
!!! This implies that there was variability in the test environment that
!!! must be investigated before going further.
!!! Confidence intervals: Throughput      :  5.6%
!!!                       Local CPU util  :  0.0%
!!!                       Remote CPU util :  0.0%

Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

262142 262142   4096    60.00    1232.26   


That's roughly a 10% improvement.
I ran the same test again but limited it to three passes *AND*
under control of pfmon...about the same result (1221.12 Mbps)
CPU0                    16948574 UC_LOADS_RETIRED
CPU1                       13211 UC_LOADS_RETIRED

*sigh*...forgot to grab IRQ counts.

Again, but with one iteration (60 seconds):

 67:  149648661          0  IO-SAPIC-level  ib_mthca
pfmon -e uc_loads_retired -k --system-wide -- /opt/netperf/netperf -t TCP_STREAM -l 60 -H 10.0.1.1 -i 1,1 -- -s 57344 -S 57344 -m 4096
...
 114688 114688   4096    60.00    1170.88   
CPU0                     5656170 UC_LOADS_RETIRED
CPU1                        4466 UC_LOADS_RETIRED
ionize:/opt/netperf# cat /proc/interrupts  | fgrep mthca
 67:  152474464          0  IO-SAPIC-level  ib_mthca

5656170/(152474464-149648661) ~= 2

Two uncached reads per interrupt.
That's what the e1000 driver is doing today.
No wonder we aren't much faster.
I was expecting zero uncached reads from IB in the interrupt path.
Fix that and we'll get back ~8 seconds of CPU time for a 60 second test.

47096 interrupts/second.
We should be able to do 2x that on this box at least.
(1.5GHz Madison)
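
For anyone who wants to redo the arithmetic on their own numbers, a
small sketch follows. It assumes the two /proc/interrupts samples
bracket the 60 second run and that only the CPU0 UC_LOADS_RETIRED count
matters (CPU1's is negligible). The function names are made up for
illustration.

```c
#include <stdint.h>

/* Hypothetical helpers for the back-of-the-envelope numbers above. */

/* Interrupts taken during the run: difference between two samples
 * of the ib_mthca line in /proc/interrupts. */
uint64_t irq_delta(uint64_t before, uint64_t after)
{
	return after - before;
}

/* Average number of events (e.g. uncached loads) per interrupt. */
double events_per_irq(uint64_t events, uint64_t irqs)
{
	return (double)events / (double)irqs;
}

/* Whole interrupts per second over the length of the test. */
uint64_t irqs_per_second(uint64_t irqs, uint64_t seconds)
{
	return irqs / seconds;
}
```

Plugging in 149648661 and 152474464 gives 2825803 interrupts, which
works out to just over 2 uncached loads per interrupt and 47096
interrupts/second for the 60 second run.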

I'll dig up the other trivial things with pfmon.

BTW, if anyone has another favorite trivial test or netperf
parameters, I'd be happy to collect pfmon, q-syscollect, prospect,
or oprofile output.

thanks,
grant


