[ofa-general] Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
Bill Fink
billfink at mindspring.com
Thu Aug 23 20:18:20 PDT 2007
On Thu, 23 Aug 2007, Rick Jones wrote:
> jamal wrote:
> > [TSO already passed - iirc, it has been
> > demonstrated to really not add much to throughput (can't improve much
> > over closeness to wire speed) but improve CPU utilization].
>
> In the one gig space sure, but in the 10 Gig space, TSO on/off does make a
> difference for throughput.
Not too much.
TSO enabled:
[root@lang2 ~]# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
[root@lang2 ~]# nuttcp -w10m 192.168.88.16
11813.4375 MB / 10.00 sec = 9906.1644 Mbps 99 %TX 80 %RX
TSO disabled:
[root@lang2 ~]# ethtool -K eth2 tso off
[root@lang2 ~]# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
[root@lang2 ~]# nuttcp -w10m 192.168.88.16
11818.2500 MB / 10.00 sec = 9910.0176 Mbps 100 %TX 78 %RX
Pretty negligible difference it seems (about 9906 versus 9910 Mbps,
well under 0.1%).
This is with a 2.6.20.7 kernel, Myricom 10-GigE NICs, and 9000 byte
jumbo frames, in a LAN environment.
For grins, I also did a couple of tests with an MSS of 1460 to
emulate a standard 1500 byte Ethernet MTU.
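As a rough aside on why the smaller MSS matters: 1460 is the 1500 byte MTU minus 20 bytes of IPv4 header and 20 bytes of TCP header, and for a given payload rate the number of segments per second scales inversely with the MSS. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, the jumbo-frame MSS of 8960 is an assumption (no TCP options; timestamps would take another 12 bytes per segment), and the 10 Gbit/s payload rate is just a round illustrative number:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options (timestamps would add 12)

def mss_for_mtu(mtu):
    """Largest TCP payload per segment for a given IPv4 MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

def segments_per_second(payload_mbps, mss):
    """Segments/s needed to carry payload_mbps (10^6 bits/s) of TCP payload."""
    return payload_mbps * 1e6 / (mss * 8)

for mtu in (1500, 9000):
    mss = mss_for_mtu(mtu)
    rate = segments_per_second(10000, mss)
    print(f"MTU {mtu}: MSS {mss}, ~{rate:,.0f} segments/s at 10 Gbit/s of payload")
```

Under these assumptions the 1500 byte MTU case needs about six times as many segments per second as the jumbo-frame case for the same payload rate, which is where per-packet CPU cost starts to dominate.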
TSO enabled:
[root@lang2 ~]# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
[root@lang2 ~]# nuttcp -M1460 -w10m 192.168.88.16
5102.8503 MB / 10.06 sec = 4253.9124 Mbps 39 %TX 99 %RX
TSO disabled:
[root@lang2 ~]# ethtool -K eth2 tso off
[root@lang2 ~]# ethtool -k eth2
Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
[root@lang2 ~]# nuttcp -M1460 -w10m 192.168.88.16
5399.5625 MB / 10.00 sec = 4527.9070 Mbps 99 %TX 76 %RX
Here you can see there is a major difference in the TX CPU utilization
(99% with TSO disabled versus only 39% with TSO enabled), although
the TSO disabled case was able to squeeze out a little extra throughput
(about 4528 Mbps versus 4254 Mbps, roughly 6% more) from its extra CPU
utilization. Interestingly, with TSO enabled, the receiver actually
consumed more CPU than with TSO disabled (99% versus 76%), so I guess
receiver CPU saturation was what restricted throughput somewhat in the
TSO enabled case (this was consistent across a few test runs).
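To put the four runs side by side, here is a small Python sketch that parses the nuttcp result lines quoted above and computes throughput per percent of transmit CPU. The "Mbps per %TX" figure is only an illustrative metric of my own, not something nuttcp reports, and the CPU percentages are coarse, so treat the output as a rough comparison. The regular expression assumes the exact line layout shown in this message:

```python
import re

# Result lines copied verbatim from the four runs above.
RUNS = {
    "jumbo, TSO on": "11813.4375 MB / 10.00 sec = 9906.1644 Mbps 99 %TX 80 %RX",
    "jumbo, TSO off": "11818.2500 MB / 10.00 sec = 9910.0176 Mbps 100 %TX 78 %RX",
    "MSS 1460, TSO on": "5102.8503 MB / 10.06 sec = 4253.9124 Mbps 39 %TX 99 %RX",
    "MSS 1460, TSO off": "5399.5625 MB / 10.00 sec = 4527.9070 Mbps 99 %TX 76 %RX",
}

LINE_RE = re.compile(
    r"(?P<mb>[\d.]+) MB / (?P<sec>[\d.]+) sec = (?P<mbps>[\d.]+) Mbps "
    r"(?P<tx>\d+) %TX (?P<rx>\d+) %RX"
)

def parse(line):
    """Turn one nuttcp summary line into a dict of floats."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized nuttcp line: {line!r}")
    return {key: float(value) for key, value in m.groupdict().items()}

def summarize(runs):
    """Parse every run and add throughput per percent of TX CPU."""
    rows = {}
    for name, line in runs.items():
        row = parse(line)
        row["mbps_per_tx_pct"] = row["mbps"] / row["tx"]
        rows[name] = row
    return rows

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

rows = summarize(RUNS)
for name, row in rows.items():
    print(f"{name:18s} {row['mbps']:9.1f} Mbps  "
          f"{row['tx']:3.0f} %TX  {row['rx']:3.0f} %RX  "
          f"{row['mbps_per_tx_pct']:6.1f} Mbps per %TX")

gain = pct_change(rows["MSS 1460, TSO on"]["mbps"], rows["MSS 1460, TSO off"]["mbps"])
print(f"MSS 1460: TSO off is {gain:.1f}% faster than TSO on")
```

With jumbo frames the per-CPU figure is nearly identical either way; with an MSS of 1460 the TSO enabled run moves more than twice as much data per percent of transmit CPU, even though its absolute throughput is slightly lower.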
-Bill