[ofa-general] Re: [PATCH 2/3][NET_BATCH] net core use batching
David Miller
davem at davemloft.net
Wed Oct 10 02:25:50 PDT 2007
From: Andi Kleen <andi at firstfloor.org>
Date: Wed, 10 Oct 2007 11:16:44 +0200
> > A 256 entry TX hw queue fills up trivially on 1GB and 10GB, but if you
>
> With TSO really?
Yes.
> > increase the size much more performance starts to go down due to L2
> > cache thrashing.
>
> Another possibility would be to consider using cache avoidance
> instructions while updating the TX ring (e.g. write combining
> on x86)
The chip I was working with at the time (UltraSPARC-IIi) compressed
all the linear stores into 64-byte full cacheline transactions via
the store buffer.
It's true that it would allocate in the L2 cache on a miss, which
is different from your suggestion.
In fact, such a thing might not pan out well, because most of the time
you write a single descriptor or two, and that isn't a full cacheline,
which means a read/modify/write is the only coherent way to make such
a write to RAM.
Sure you could batch, but I'd rather give the chip work to do unless
I unequivocally knew I'd have enough pending to fill a cacheline's
worth of descriptors. And since you suggest we shouldn't queue in
software... :-)