[ofa-general] Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
Krishna Kumar2
krkumar2 at in.ibm.com
Tue Aug 28 20:23:30 PDT 2007
Hi Dave,
I am scp'ing from 192.168.1.1 to 192.168.1.2 and captured packets at the
send side.
192.168.1.1.37201 > 192.168.1.2.ssh: P 837178092:837178596(504) ack 1976304527 win 79 <nop,nop,timestamp 36791208 123919397>
192.168.1.1.37201 > 192.168.1.2.ssh: . 837178596:837181492(2896) ack 1976304527 win 79 <nop,nop,timestamp 36791208 123919397>
192.168.1.1.37201 > 192.168.1.2.ssh: . 837181492:837184388(2896) ack 1976304527 win 79 <nop,nop,timestamp 36791208 123919397>
192.168.1.1.37201 > 192.168.1.2.ssh: . 837184388:837188732(4344) ack 1976304527 win 79 <nop,nop,timestamp 36791208 123919397>
192.168.1.1.37201 > 192.168.1.2.ssh: . 837188732:837193076(4344) ack 1976304527 win 79 <nop,nop,timestamp 36791208 123919397>
192.168.1.1.37201 > 192.168.1.2.ssh: . 837193076:837194524(1448) ack 1976304527 win 79 <nop,nop,timestamp 36791208 123919397>
192.168.1.2.ssh > 192.168.1.1.37201: . ack 837165060 win 3338 <nop,nop,timestamp 123919397 36791208>
Data in pipeline: 837194524 - 837165060 = 29464 bytes.
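This is just the calculation Dave suggests in the quoted mail below: bytes
in flight is roughly the last sequence number sent minus the latest
cumulative ACK, computed modulo 2^32 so sequence-number wraparound is
handled. A throwaway sketch (not part of the patches) with the numbers
from the capture above:

    /* Approximate bytes in flight from a tcpdump trace: subtract the
     * latest cumulative ACK from the last sequence number sent.
     * Unsigned 32-bit arithmetic handles sequence-number wraparound.
     */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t bytes_in_flight(uint32_t last_seq_sent, uint32_t last_ack_seen)
    {
            return last_seq_sent - last_ack_seen;
    }

    int main(void)
    {
            /* Numbers taken from the capture above. */
            printf("%u\n", bytes_in_flight(837194524u, 837165060u)); /* 29464 */
            return 0;
    }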
In most cases I am getting 7K, 8K, or 13K, and rarely anything close to 16K.
I ran iperf with 4K, 16K and 32K buffer sizes (since it lets me run multiple
threads instead of a single process). Results are for the E1000 82547GI
chipset, BW in KB/s:
Test                 Org BW    New BW    %
Size:4096  Procs:1   114612    114644    .02
Size:16394 Procs:1   114634    114644    0
Size:32768 Procs:1   114645    114643    0
And for multiple threads:
Test                 Org BW    New BW    %
Size:4096  Procs:8   114632    114637    0
Size:4096  Procs:16  114639    114637    0
Size:4096  Procs:64  114893    114800    -.08
Size:16394 Procs:8   114641    114642    0
Size:16394 Procs:16  114642    114643    0
Size:16394 Procs:64  114911    114781    -.11
Size:32768 Procs:8   114638    114639    0
Size:32768 Procs:16  114642    114645    0
Size:32768 Procs:64  114932    114777    -.13
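For reference, a run like the multi-threaded cases above can be driven with
something along the lines of the command below (an illustrative iperf 2
invocation; the duration and exact options here are placeholders, not the
precise command line used for these numbers):

    # 8 parallel streams, 4096-byte application writes, 3-minute run
    iperf -c 192.168.1.2 -l 4096 -P 8 -t 180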
I will run netperf and report CPU utilization too.
Thanks,
- KK
David Miller <davem at davemloft.net> wrote on 08/21/2007 12:48:24 PM:
> From: Krishna Kumar2 <krkumar2 at in.ibm.com>
> Date: Fri, 17 Aug 2007 11:36:03 +0530
>
> > > I ran 3 iterations of 45 sec tests (total 1 hour 16 min, but I will
> > > run a longer one tonight). The results are (results in KB/s, and %):
> >
> > I ran a 8.5 hours run with no batching + another 8.5 hours run with
> > batching (Buffer sizes: "32 128 512 4096 16384", Threads: "1 8 32",
> > Each test run time: 3 minutes, Iterations to average: 5). TCP seems
> > to get a small improvement.
>
> Using a 16K buffer size really isn't going to keep the pipe full enough
> for TSO. And realistically applications queue much more data at a
> time. Also, smaller buffer sizes can have negative effects on the
> dynamic receive and send buffer growth algorithm the kernel uses;
> it might consider the connection application-limited for too long.
>
> I would really prefer to see numbers that use buffer sizes more in
> line with the amount of data that is typically inflight on a 1G
> connection on a local network.
>
> Do a tcpdump during the height of the transfer to see about what this
> value is. When an ACK comes in, compare the sequence number it's
> ACK'ing with the sequence number of the most recently sent frame.
> The difference is approximately the pipe size at maximum congestion
> window assuming a loss free local network.