[ofa-general] Re: [PATCH 0/10 REV5] Implement skb batching and support in IPoIB/E1000

David Miller davem at davemloft.net
Sun Sep 16 20:13:18 PDT 2007


From: jamal <hadi at cyberus.ca>
Date: Sun, 16 Sep 2007 23:01:43 -0400

> I think GSO is still useful on top of this.
> In my patches anything with gso gets put into the batch list and shot
> down the driver. I've never considered checking whether the nic is TSO
> capable, that may be something worth checking into. The netiron allows
> you to shove up to 128 skbs utilizing one tx descriptor, which makes for
> interesting possibilities.

We're talking past each other, but I'm happy to hear that your
code definitely does the right thing :-)

Right now only TSO capable hardware sets the TSO capable bit,
except perhaps for the XEN netfront driver.

What Herbert and I want to do is basically turn on TSO for
devices that can't do it in hardware, and rely upon the GSO
framework to do the segmenting in software right before we
hit the device.

This only makes sense for devices which can 1) scatter-gather
and 2) checksum on transmit.  Otherwise we make too many
copies and/or passes over the data.
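
A minimal sketch of that eligibility test, in plain C rather than
kernel code. The flag names (FEAT_SG, FEAT_TX_CSUM, FEAT_HW_TSO) and
the helper want_software_tso() are made up for illustration; they are
not the kernel's netdev feature bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, stand-ins for a device's capability flags. */
enum {
    FEAT_SG      = 1u << 0,  /* device can scatter-gather */
    FEAT_TX_CSUM = 1u << 1,  /* device can checksum on transmit */
    FEAT_HW_TSO  = 1u << 2,  /* device segments TCP in hardware */
};

/*
 * Software TSO is worth turning on only when the device lacks hardware
 * TSO but can both scatter-gather and checksum on transmit. Missing
 * either one means extra copies and/or passes over the data.
 */
static bool want_software_tso(uint32_t features)
{
    if (features & FEAT_HW_TSO)
        return false;  /* hardware already does the segmenting */
    return (features & FEAT_SG) && (features & FEAT_TX_CSUM);
}
```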

And we can only get the full benefit if we can pass all the
sub-segments down to the driver in one ->hard_start_xmit()
call.
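
To make the "one call" point concrete, here is a toy user-space model,
not driver code. struct seg, segment_payload() and xmit_batch() are all
hypothetical names; xmit_batch() merely stands in for a transmit entry
point that accepts a whole list of sub-segments at once.

```c
#include <stddef.h>

/* Hypothetical segment descriptor: a slice of the original payload. */
struct seg {
    size_t off;
    size_t len;
};

/*
 * Split payload_len bytes into slices of at most mss bytes.
 * Returns the number of slices written, capped at max_segs.
 */
static size_t segment_payload(size_t payload_len, size_t mss,
                              struct seg *segs, size_t max_segs)
{
    size_t n = 0, off = 0;

    if (mss == 0)
        return 0;
    while (off < payload_len && n < max_segs) {
        size_t len = payload_len - off;

        if (len > mss)
            len = mss;
        segs[n].off = off;
        segs[n].len = len;
        off += len;
        n++;
    }
    return n;
}

/*
 * Hypothetical batch transmit: counts one driver entry per invocation
 * in *calls and returns the total number of bytes queued.
 */
static size_t xmit_batch(const struct seg *segs, size_t n, unsigned *calls)
{
    size_t bytes = 0;

    (*calls)++;
    for (size_t i = 0; i < n; i++)
        bytes += segs[i].len;
    return bytes;
}
```

With a 65536 byte payload and a 1448 byte MSS this produces 46 slices,
and the batch interface enters the driver once instead of 46 times.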

> On a side note: My observation is that with large packets on a very busy
> system; bulk transfer type app, one approaches wire speed; with or
> without batching, the apps are mostly idling (I've seen up to 90% idle
> time polling at the socket level for write to complete with a really
> busy system). This is the case with or without batching. cpu seems a
> little better with batching. As the aggregation of the apps gets more
> aggressive (achievable by reducing their packet sizes), one can achieve
> improved throughput and reduced cpu utilization. This all with UDP; I am
> still studying tcp.

UDP apps spraying data tend to naturally batch well and load balance
amongst themselves because each socket fills up to its socket send
buffer limit, then sleeps, and we then get a stream from the next UDP
socket up to its limit, and so on and so forth.

UDP is too easy a test case in fact :-)


