[openib-general] How about ib_send_page() ?

Felix Marti felix at chelsio.com
Tue May 17 21:29:32 PDT 2005



-----Original Message-----
From: openib-general-bounces at openib.org
[mailto:openib-general-bounces at openib.org] On Behalf Of Roland Dreier
Sent: Tuesday, May 17, 2005 7:49 PM
To: Jeff Carr
Cc: openib-general at openib.org
Subject: Re: [openib-general] How about ib_send_page() ?

First of all, let me say that to me, IPoIB performance tuning isn't
really that interesting.  IPoIB is very easy to set up and there's a
wide variety of tools that spit out all sorts of numbers, so it's
definitely a very accessible area of research, but in the end there
are probably better ways to use IB hardware.

With that said, I should emphasize that I don't want to discourage
anyone from working on whatever strikes their fancy, and I'd certainly
be happy to merge patches that improve our performance.

The most interesting optimization available is implementing the IPoIB
connected mode draft, although I don't think it's as easy as Vivek
indicated -- for example, I'm not sure how to deal with having
different MTUs depending on the destination.
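
To make that concern concrete, here is a rough, purely hypothetical C sketch
(the struct and function names are mine, not anything in the IPoIB code): in
connected mode a neighbour reached over an RC connection could use a much
larger MTU than one reached in unreliable datagram mode, yet the stack only
sees a single dev->mtu.

/* Hypothetical illustration only -- not the IPoIB implementation.
 * The usable MTU depends on whether an RC connection to this
 * particular destination has been established. */
struct ipoib_dest {
	int has_rc_conn;	/* RC connection established? */
	unsigned int rc_mtu;	/* MTU negotiated for the RC connection */
};

static unsigned int ipoib_dest_mtu(const struct ipoib_dest *dest,
				   unsigned int ud_mtu)
{
	/* Fall back to the datagram-mode MTU (IB link MTU minus the
	 * 4-byte IPoIB header, e.g. 2044) unless this destination has
	 * its own RC connection. */
	return dest->has_rc_conn ? dest->rc_mtu : ud_mtu;
}

The open question is how to reconcile a per-destination value like this with
the single MTU the rest of the network stack expects from the device.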

    Jeff> Hey, is there a way to not generate so many interrupts?
    Jeff> That's handled by NAPI.  OK, that looks interesting.  But we
    Jeff> can't do NAPI because we can't just disable interrupts.
    Jeff> Darn.  But wait, why can't we just not generate interrupts
    Jeff> in the first place then?

NAPI isn't really a throughput optimization.  It pretty much only
helps with small packet workloads.  Also, it helps the RX path much
more than the TX path, which means it doesn't help much in a symmetric
test like netperf or NPtcp.  If you look at the table at the beginning
of Documentation/networking/NAPI_HOWTO.txt, you can see that for e1000
with 1024-byte packets the NIC still generates 872K RX interrupts for
1M RX packets -- NAPI doesn't fully kick in until the packet size is
down to 256 bytes.
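
For reference, the old dev->poll pattern described in NAPI_HOWTO.txt looks
roughly like the sketch below.  The my_* helpers stand in for driver-specific
ring-cleaning and interrupt-masking code, so treat this as an illustration of
the pattern rather than a drop-in implementation.

#include <linux/kernel.h>
#include <linux/netdevice.h>

/* Hypothetical driver-specific helpers (not real kernel functions). */
static int my_clean_rx_ring(struct net_device *dev, int limit);
static void my_enable_rx_irq(struct net_device *dev);

/* Sketch of a 2.6-era dev->poll callback, per NAPI_HOWTO.txt. */
static int my_poll(struct net_device *dev, int *budget)
{
	int limit = min(*budget, dev->quota);
	int done  = my_clean_rx_ring(dev, limit);  /* process up to 'limit' packets */

	*budget    -= done;
	dev->quota -= done;

	if (done < limit) {
		/* RX ring drained: leave polling mode and re-enable
		 * RX interrupts. */
		netif_rx_complete(dev);
		my_enable_rx_irq(dev);
		return 0;
	}
	return 1;	/* more work pending, stay on the poll list */
}

The win is that while the ring keeps filling, packets are drained from the
poll loop without further RX interrupts -- which is exactly why it matters
most for small-packet RX workloads and not for a bulk-transfer test.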

It is possible, although somewhat ugly, to implement NAPI for IPoIB,
and I actually hacked up such a patch a while ago.  However, it didn't
end up helping, so I never pursued it.

When I profile a system running IPoIB throughput tests, half or more
of the CPU time is going to skb_copy_and_csum() and other parts of the
core kernel's network stack, so our ability to optimize IPoIB is
somewhat limited.  I've already dealt with all of the easy
optimization targets from profiling in IPoIB and mthca -- with MSI-X,
we do zero MMIO reads, and Michael Tsirkin has heavily tuned our
locking.
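
To illustrate why the copy-and-checksum path costs so much, here is a small
stand-alone C sketch (my own simplification, not the kernel's
skb_copy_and_csum code): every received byte has to be touched once to be
copied into the destination buffer while being folded into the 16-bit
ones'-complement Internet checksum, so the cost scales directly with
throughput.

#include <stddef.h>
#include <stdint.h>

/* Simplified illustration of a combined copy + Internet checksum pass
 * (not the kernel's implementation). */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2) {
		dst[i]     = src[i];
		dst[i + 1] = src[i + 1];
		sum += (uint32_t)((src[i] << 8) | src[i + 1]);
	}
	if (i < len) {			/* odd trailing byte */
		dst[i] = src[i];
		sum += (uint32_t)src[i] << 8;
	}
	while (sum >> 16)		/* fold the carries */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

At multi-gigabit rates this per-byte work, and the cache misses behind it,
is what shows up in the profile no matter how efficient the IPoIB driver
itself is.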

I'd be curious to know what kind of TCP throughput a modern 10 gigE
NIC gets with an MTU of 2044.  In some sense that is probably an upper
bound for IPoIB throughput.

>>>>
I get just above 5Gb/s on RX (goodput, as reported by netperf) on a single
Opteron 248 (at 100% CPU utilization) using the standard Ethernet MTU (1500).

TX performance is higher (close to 7Gb/s), but it is probably not the kind
of comparison you're interested in, since TSO removes the dependency on the
wire MTU.

Similarly, the TOE completely shields the host from the wire MTU (in
addition to removing ACK traffic), and I'm getting 7.5Gb/s RX and TX with
about 50% CPU utilization (the user/kernel space memory copy!).

These numbers are without NAPI. 

felix
<<<   

 - R.
_______________________________________________
openib-general mailing list
openib-general at openib.org
http://openib.org/mailman/listinfo/openib-general
