[openib-general] How about ib_send_page() ?
Roland Dreier
roland at topspin.com
Tue May 17 19:48:32 PDT 2005
First of all, let me say that to me, IPoIB performance tuning isn't
really that interesting. IPoIB is very easy to set up and there's a
wide variety of tools that spit out all sorts of numbers, so it's
definitely a very accessible area of research, but in the end there
are probably better ways to use IB hardware.
With that said, I should emphasize that I don't want to discourage
anyone from working on whatever strikes their fancy, and I'd certainly
be happy to merge patches that improve our performance.
The most interesting optimization available is implementing the IPoIB
connected mode draft, although I don't think it's as easy as Vivek
indicated -- for example, I'm not sure how to deal with having
different MTUs depending on the destination.
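To make that wrinkle concrete, here's a rough sketch in C (purely
hypothetical -- struct ipoib_cm_path and these helpers are invented
names, not anything in the tree): with connected mode the usable MTU
becomes a property of the path rather than of the device, while the
kernel only gives us a single dev->mtu.

    /* Hypothetical sketch -- none of these names exist in the tree. */
    struct ipoib_cm_path {
            struct ib_qp *qp;      /* RC QP to this destination */
            unsigned int  mtu;     /* MTU negotiated for this connection */
    };

    static unsigned int ipoib_dest_mtu(struct net_device *dev,
                                       struct ipoib_cm_path *path)
    {
            /* Connected destinations can use the big negotiated MTU;
             * anyone we can only reach through the UD QP is stuck at
             * the link MTU.  Since net_device has a single dev->mtu,
             * something -- per-neighbour overrides, fragmentation --
             * has to reconcile the two. */
            return path ? path->mtu : dev->mtu;
    }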
Jeff> Hey, is there a way to not generate so many interrupts?
Jeff> That's handled by NAPI.  OK, that looks interesting.  But we
Jeff> can't do NAPI because we can't just disable interrupts.
Jeff> Darn.  But wait, why can't we just not generate interrupts
Jeff> in the first place then?
NAPI isn't really a throughput optimization. It pretty much only
helps with small-packet workloads. Also, it helps the RX path much
more than the TX path, which means it doesn't help much in a symmetric
test like netperf or NPtcp. If you look at the table at the beginning
of Documentation/networking/NAPI_HOWTO.txt, you can see that for e1000
with 1024-byte packets, the NIC still generates 872K RX interrupts for
1M RX packets -- NAPI doesn't fully kick in until the packet size is
down to 256 bytes.
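For reference, the interrupt half of the usual NAPI pattern looks
something like this (a generic sketch against the current 2.6 API;
my_disable_rx_irq() and friends are stand-ins for real driver code):

    static irqreturn_t my_intr(int irq, void *dev_id, struct pt_regs *regs)
    {
            struct net_device *dev = dev_id;

            /* Mask RX interrupts at the NIC and hand the work to the
             * poll routine, which re-enables them once the ring is
             * empty.  This is the step that's awkward for us: an IB
             * CQ is re-armed per event rather than masked like a
             * NIC's RX interrupt. */
            my_disable_rx_irq(dev);
            netif_rx_schedule(dev);

            return IRQ_HANDLED;
    }

The poll routine then reaps packets against a budget and calls
netif_rx_complete() once the ring drains.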
It is possible, although somewhat ugly, to implement NAPI for IPoIB,
and I actually hacked up such a patch a while ago. However, it didn't
end up helping, so I never pursued it.
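To give a flavor of the ugliness, a poll routine for IPoIB ends up
looking something like this (a sketch, not the actual patch;
ipoib_handle_wc() stands in for the existing completion handling, and
the budget accounting is simplified):

    static int ipoib_poll(struct net_device *dev, int *budget)
    {
            struct ipoib_dev_priv *priv = netdev_priv(dev);
            int max = min(*budget, dev->quota);
            int done = 0;
            struct ib_wc wc;

            while (done < max && ib_poll_cq(priv->cq, 1, &wc) > 0) {
                    ipoib_handle_wc(dev, &wc);
                    ++done;
            }

            *budget    -= done;
            dev->quota -= done;

            if (done == max)
                    return 1;       /* not done; stay on the poll list */

            netif_rx_complete(dev);
            ib_req_notify_cq(priv->cq, IB_CQ_NEXT_COMP);

            /* Here's the ugly part: a completion can slip in between
             * the last ib_poll_cq() above and the re-arm, and the CQ
             * won't generate an event for it.  So poll once more and
             * get back on the poll list if we raced.  (Closing this
             * race completely takes more care than shown here.) */
            if (ib_poll_cq(priv->cq, 1, &wc) > 0) {
                    ipoib_handle_wc(dev, &wc);
                    if (netif_rx_reschedule(dev, 0))
                            return 1;
            }

            return 0;
    }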
When I profile a system running IPoIB throughput tests, half or more
of the CPU time goes to skb_copy_and_csum() and other parts of the
core kernel's network stack, so our ability to optimize IPoIB itself
is somewhat limited. I've already dealt with all of the easy
optimization targets that profiling turned up in IPoIB and mthca --
with MSI-X we do zero MMIO reads, and Michael Tsirkin has heavily
tuned our locking.
I'd be curious to know what kind of TCP throughput a modern 10 gigE
NIC gets with an MTU of 2044 (the IPoIB MTU: a 2048-byte IB MTU less
the 4-byte encapsulation header). In some sense that is probably an
upper bound for IPoIB throughput.
- R.