[ofa-general] performance and Kernel support

Rick Jones rick.jones2 at hp.com
Tue Sep 11 10:17:05 PDT 2007


H. N. HARAKE wrote:

> The second question is regarding performance parameters. Using netperf,
> I reach 4 Gbit/s between two nodes using OFED version 1.2.51 and
> 3 Gbit/s using OFED version 1.1 (10 Gig Mellanox cards). Are there any
> parameters to apply for improving the performance, or is there any
> document around?

What is the CPU util being reported by netperf (-c and -C options for 
local and remote respectively) and how many cores are there in the system?

Here are some numbers I get with a pair of rx2660's connected via an HP 
4x IB switch:

		RedHat Enterprise Linux 5 2.6.18-8.el5
		    Peak Single-Stream Performance

                                    Bulk Transfer                   "Latency"
                             Unidir                Bidir
  Card    Proto   Ver  Mbit/s  SDx    SDr    Mbit/s  SDx   SDr   Tran/s  SDx    SDr
-------------------------------------------------------------------------------------
  AD313A  IPoIB   1.1    2970  4.418  4.544    3530  3.59  3.95   19290  n/a    n/a
  AD313A  SDP     1.1    7810  0.453  1.048   12820  0.69  0.68   38030  26.29  26.29
  AD313A  SDP p0         7810  0.346  0.527   12670  0.42  0.43   19380  n/a    n/a
  AD313A  IPoIB   1.2    5510  0.426  1.593    5730  n/a   n/a    18990  n/a    n/a
  AD313A  SDP     1.2    7820  0.409  1.047   12890  0.64  0.68   41988  25.89  26.32
  AD313A  SDP p0  1.2    7820  0.309  0.517   12760  0.36  0.36   19800  15.47  15.72

The big change between 1.1 and 1.2 was, IIRC, the increase in the default 
IP MTU from 2044 to 65520 (?) bytes.  The limitation in the 1.1 case, at 
least, was CPU saturation (although I don't show the CPU utils in the 
table above, just the service demands).  Notice the very significant 
change in IPoIB service demand (microseconds of CPU consumed per KB 
transferred) between 1.1 and 1.2.  I suspect the receive side would go 
down even further with CKO (checksum offload) support, but alas I've 
none of those sorts of cards at my disposal...
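
To relate the service demands to CPU saturation, here is a rough sketch 
that turns a throughput and a service demand from the table into "cores 
kept busy".  It is only an illustration under assumptions that are not 
stated in the table: that Mbit/s means 10^6 bits per second, that a KB is 
1024 bytes, and that SDx/SDr are the send-side and receive-side service 
demands.  The function name cpu_cores_busy is made up for this example.

```python
# Rough conversion from (throughput, service demand) to CPU consumption.
# Assumptions (not stated in the table): Mbit/s = 10**6 bits per second,
# KB = 1024 bytes, SDx/SDr = send/receive service demand in usec per KB.

KB = 1024  # assumed size of a "KB" in bytes


def cpu_cores_busy(mbit_per_s, sd_us_per_kb):
    """Return CPU-seconds consumed per wall-clock second (cores kept busy)."""
    kb_per_s = mbit_per_s * 1e6 / 8 / KB      # payload KB moved per second
    return kb_per_s * sd_us_per_kb / 1e6      # usec of CPU per second -> cores


# (Mbit/s, SDx, SDr) copied from the unidirectional IPoIB rows of the table.
rows = {
    "IPoIB 1.1": (2970, 4.418, 4.544),
    "IPoIB 1.2": (5510, 0.426, 1.593),
}

if __name__ == "__main__":
    for name, (mbit, sdx, sdr) in rows.items():
        print(f"{name}: send ~{cpu_cores_busy(mbit, sdx):.2f} cores, "
              f"recv ~{cpu_cores_busy(mbit, sdr):.2f} cores")
```

Under those assumptions the 1.1 IPoIB rows work out to well over one core 
on each side, while the 1.2 send side needs only a fraction of a core.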

For those tests I was likely using -s 1M -S 1M -m 64K on the Unidir, and 
-s 1M -S 1M -r 64K -b 12 on the Bidir (a TCP_RR test, with netperf 
./configured with --enable-burst).  The latency figures are from the 
"standard" :) single-byte TCP_RR test.
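
Put together with the -c and -C options mentioned earlier, the command 
lines would look roughly like what the sketch below prints.  This is a 
sketch, not a record of what was actually run: the host name remote-node 
is a placeholder, TCP_STREAM for the unidirectional test is an assumption 
(only TCP_RR is named above), and the helper netperf_cmd is made up for 
this example.  The -b option requires a netperf built with 
--enable-burst.

```python
import shlex

HOST = "remote-node"  # placeholder host name, not from the original post


def netperf_cmd(test, test_opts, host=HOST):
    """Build a netperf argv list.

    -c / -C ask for local / remote CPU utilization (and hence service
    demand); everything after "--" is a test-specific option.
    """
    return ["netperf", "-H", host, "-t", test, "-c", "-C", "--"] + test_opts


# Unidirectional bulk transfer (TCP_STREAM is an assumption here).
UNIDIR = netperf_cmd("TCP_STREAM", ["-s", "1M", "-S", "1M", "-m", "64K"])

# Bidirectional bulk transfer: burst-mode TCP_RR with 64K requests/responses.
BIDIR = netperf_cmd("TCP_RR", ["-s", "1M", "-S", "1M", "-r", "64K", "-b", "12"])

# "Latency": single-byte request/response TCP_RR.
LATENCY = netperf_cmd("TCP_RR", ["-r", "1"])

if __name__ == "__main__":
    for cmd in (UNIDIR, BIDIR, LATENCY):
        print(shlex.join(cmd))
```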

p0 means the SDP stuff was configured to sleep rather than sit and spin.

happy benchmarking

rick jones


