[ofa-general] Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
Krishna Kumar2
krkumar2 at in.ibm.com
Tue Aug 28 20:23:30 PDT 2007
Hi Dave,
I am scp'ing from 192.168.1.1 to 192.168.1.2 and captured packets at the
send side.
192.168.1.1.37201 > 192.168.1.2.ssh: P 837178092:837178596(504) ack 1976304527 win 79 <nop,nop,timestamp 36791208 123919397>
192.168.1.1.37201 > 192.168.1.2.ssh: . 837178596:837181492(2896) ack 1976304527 win 79 <nop,nop,timestamp 36791208 123919397>
192.168.1.1.37201 > 192.168.1.2.ssh: . 837181492:837184388(2896) ack 1976304527 win 79 <nop,nop,timestamp 36791208 123919397>
192.168.1.1.37201 > 192.168.1.2.ssh: . 837184388:837188732(4344) ack 1976304527 win 79 <nop,nop,timestamp 36791208 123919397>
192.168.1.1.37201 > 192.168.1.2.ssh: . 837188732:837193076(4344) ack 1976304527 win 79 <nop,nop,timestamp 36791208 123919397>
192.168.1.1.37201 > 192.168.1.2.ssh: . 837193076:837194524(1448) ack 1976304527 win 79 <nop,nop,timestamp 36791208 123919397>
192.168.1.2.ssh > 192.168.1.1.37201: . ack 837165060 win 3338 <nop,nop,timestamp 123919397 36791208>
Data in pipeline: 837194524 - 837165060 = 29464 bytes.
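This is just the calculation Dave suggests in the quoted mail below: bytes
in flight is roughly the last sequence number sent minus the latest
cumulative ACK, computed modulo 2^32 so sequence-number wraparound is
handled. A throwaway sketch (not part of the patches) with the numbers
from the capture above:

    /* Approximate bytes in flight from a tcpdump trace: subtract the
     * latest cumulative ACK from the last sequence number sent.
     * Unsigned 32-bit arithmetic handles sequence-number wraparound.
     */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t bytes_in_flight(uint32_t last_seq_sent, uint32_t last_ack_seen)
    {
            return last_seq_sent - last_ack_seen;
    }

    int main(void)
    {
            /* Numbers taken from the capture above. */
            printf("%u\n", bytes_in_flight(837194524u, 837165060u)); /* 29464 */
            return 0;
    }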
In most cases I am getting 7K, 8K, or 13K, and rarely anything close to 16K.
I ran iperf with 4K, 16K and 32K buffer sizes (since it lets me run multiple
threads instead of a single process). Results are for the E1000 82547GI
chipset, BW in KB/s:
Test                 Org BW    New BW    %
Size:4096  Procs:1   114612    114644    .02
Size:16394 Procs:1   114634    114644    0
Size:32768 Procs:1   114645    114643    0
And for multiple threads:
Test                 Org BW    New BW    %
Size:4096  Procs:8   114632    114637    0
Size:4096  Procs:16  114639    114637    0
Size:4096  Procs:64  114893    114800    -.08
Size:16394 Procs:8   114641    114642    0
Size:16394 Procs:16  114642    114643    0
Size:16394 Procs:64  114911    114781    -.11
Size:32768 Procs:8   114638    114639    0
Size:32768 Procs:16  114642    114645    0
Size:32768 Procs:64  114932    114777    -.13
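For reference, a run like the multi-threaded cases above can be driven with
something along the lines of the command below (an illustrative iperf 2
invocation; the duration and exact options here are placeholders, not the
precise command line used for these numbers):

    # 8 parallel streams, 4096-byte application writes, 3-minute run
    iperf -c 192.168.1.2 -l 4096 -P 8 -t 180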
I will run netperf and report CPU utilization too.
Thanks,
- KK
David Miller <davem at davemloft.net> wrote on 08/21/2007 12:48:24 PM:
> From: Krishna Kumar2 <krkumar2 at in.ibm.com>
> Date: Fri, 17 Aug 2007 11:36:03 +0530
>
> > > I ran 3 iterations of 45 sec tests (total 1 hour 16 min, but I will
> > > run a longer one tonight). The results are (results in KB/s, and %):
> >
> > I ran a 8.5 hours run with no batching + another 8.5 hours run with
> > batching (Buffer sizes: "32 128 512 4096 16384", Threads: "1 8 32",
> > Each test run time: 3 minutes, Iterations to average: 5). TCP seems
> > to get a small improvement.
>
> Using a 16K buffer size really isn't going to keep the pipe full enough
> for TSO. And realistically applications queue much more data at a
> time. Also, smaller buffer sizes can have negative effects on the
> dynamic receive and send buffer growth algorithm the kernel uses;
> it might consider the connection application-limited for too long.
>
> I would really prefer to see numbers that use buffer sizes more in
> line with the amount of data that is typically inflight on a 1G
> connection on a local network.
>
> Do a tcpdump during the height of the transfer to see about what this
> value is. When an ACK comes in, compare the sequence number it's
> ACK'ing with the sequence number of the most recently sent frame.
> The difference is approximately the pipe size at maximum congestion
> window assuming a loss free local network.