[ofa-general] Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB

John Heffner jheffner at psc.edu
Sun Aug 26 18:32:26 PDT 2007


Bill Fink wrote:
> Here's the before/after delta of the receiver's "netstat -s"
> statistics for the TSO enabled case:
> 
> Ip:
>     3659898 total packets received
>     3659898 incoming packets delivered
>     80050 requests sent out
> Tcp:
>     2 passive connection openings
>     3659897 segments received
>     80050 segments send out
> TcpExt:
>     33 packets directly queued to recvmsg prequeue.
>     104956 packets directly received from backlog
>     705528 packets directly received from prequeue
>     3654842 packets header predicted
>     193 packets header predicted and directly queued to user
>     4 acknowledgments not containing data received
>     6 predicted acknowledgments
> 
> And here it is for the TSO disabled case (GSO also disabled):
> 
> Ip:
>     4107083 total packets received
>     4107083 incoming packets delivered
>     1401376 requests sent out
> Tcp:
>     2 passive connection openings
>     4107083 segments received
>     1401376 segments send out
> TcpExt:
>     2 TCP sockets finished time wait in fast timer
>     48486 packets directly queued to recvmsg prequeue.
>     1056111048 packets directly received from backlog
>     2273357712 packets directly received from prequeue
>     1819317 packets header predicted
>     2287497 packets header predicted and directly queued to user
>     4 acknowledgments not containing data received
>     10 predicted acknowledgments
> 
> For the TSO disabled case, there is a much larger number of TCP
> segments sent out (1401376 versus 80050), which I assume are ACKs,
> and which could possibly contribute to the higher throughput of the
> TSO disabled case due to faster feedback, but that doesn't explain
> the lower CPU utilization.  There are also many more packets directly
> queued to recvmsg prequeue (48486 versus 33).  The numbers for packets
> directly received from backlog and prequeue in the TSO disabled case
> seem bogus to me, so I don't know how to interpret them.  There are
> only about half as many packets header predicted (1819317 versus
> 3654842), but many more packets header predicted and directly queued
> to user (2287497 versus 193).  I'll leave the analysis of all this to
> those who might actually know what it all means.
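
(As an aside, a before/after delta like the one quoted above can be
produced by capturing "netstat -s" before and after the run and
subtracting matching counters.  The Python 3 sketch below is only one
rough way to do it; the snapshot file names are placeholders, and it
only parses the common indented "<count> <description>" lines,
skipping anything else:)

#!/usr/bin/env python3
# Rough sketch: subtract two "netstat -s" snapshots taken before and
# after a test.  Usage: netstat_delta.py before.txt after.txt
import re
import sys

COUNTER = re.compile(r'^\s+(\d+)\s+(.*\S)\s*$')

def parse(path):
    """Return {(section, description): count} for one snapshot."""
    counters = {}
    section = ''
    for line in open(path):
        if not line[:1].isspace() and line.rstrip().endswith(':'):
            section = line.strip()          # e.g. "Ip:", "Tcp:", "TcpExt:"
            continue
        m = COUNTER.match(line)
        if m:
            counters[(section, m.group(2))] = int(m.group(1))
    return counters

before, after = parse(sys.argv[1]), parse(sys.argv[2])

current = None
for (section, desc), count in after.items():   # snapshot order (Python 3.7+)
    delta = count - before.get((section, desc), 0)
    if delta == 0:
        continue
    if section != current:
        current = section
        print(section)
    print("    %d %s" % (delta, desc))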

There are a few interesting things here.  For one, the bursts caused by 
TSO seem to be causing the receiver to do stretch acks.  This may have a 
negative impact on flow performance, but it's hard to say for sure how 
much.  Interestingly, it will even further reduce the CPU load on the 
sender, since it has to process fewer acks.
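
A quick sanity check with the quoted counters bears this out: dividing
segments received by segments sent out on the receiver gives roughly
one ACK per 46 data segments with TSO versus about one per 3 without,
where plain delayed acking would be close to one per 2.  A minimal
sketch of the arithmetic, using only the numbers quoted above:

# Receiver-side ratio of data segments received to segments sent back
# (which are mostly ACKs), from the quoted "netstat -s" deltas.
cases = {
    'TSO enabled':  (3659897, 80050),     # segments received, segments sent out
    'TSO disabled': (4107083, 1401376),
}

for name, (received, sent) in cases.items():
    print("%-13s ~%.1f segments received per segment sent"
          % (name, received / float(sent)))

# ~45.7 with TSO vs ~2.9 without; delayed ACKs alone would give about 2,
# so the TSO case shows heavy stretch acking.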

As I suspected, in the non-TSO case the receiver gets lots of packets 
directly queued to user.  This should result in somewhat lower CPU 
utilization on the receiver.  I don't know if it can account for all the 
difference you see.

The backlog and prequeue values are probably correct, but netstat's 
description is wrong.  A quick look at the code reveals these values are 
in units of bytes, not packets.
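
To illustrate, treating those counters as bytes turns the seemingly
bogus non-TSO numbers into plausible packet counts.  A small sketch,
assuming a 1448-byte MSS (a guess for a 1500-byte MTU with TCP
timestamps; a jumbo-frame path would scale the result accordingly):

# Interpret the "directly received from backlog/prequeue" counters as
# bytes and convert to approximate packet counts.  ASSUMED_MSS is an
# assumption, not something reported by netstat.
ASSUMED_MSS = 1448

counters_bytes = {
    'backlog  (TSO disabled)': 1056111048,
    'prequeue (TSO disabled)': 2273357712,
}

for name, nbytes in counters_bytes.items():
    print("%s: %.2f GB, ~%d packets at %d-byte MSS"
          % (name, nbytes / 1e9, nbytes // ASSUMED_MSS, ASSUMED_MSS))

# Roughly 0.73M and 1.57M packets respectively, which is in line with
# the ~4.1M total segments received rather than the billions the
# "packets" wording suggests.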

   -John


