[ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh

Jim Mott jimmott at austin.rr.com
Fri Jan 25 06:23:32 PST 2008


I had to rebuild netperf (netperf-2.4.4.tar.gz) to enable demo mode, so I do not know how these results relate to what I have seen
before, due to differences in the test program.  Running from the AMD machine against a 2-socket/quad-core Intel machine running
SLES10-SP1-RealTime (sorry; I am testing other things...) with sdp_zcopy_thresh=64K and a fresh default netperf-2.4.4 build gives
me:

[root at dirk unit_tests]# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.125 -D -l 30 -- -r 1000000
Interim result: 2945.95 10^6bits/s over 1.00 seconds
Interim result: 6440.04 10^6bits/s over 1.00 seconds
Interim result: 7974.69 10^6bits/s over 1.00 seconds
Interim result: 8007.55 10^6bits/s over 1.00 seconds
Interim result: 7807.40 10^6bits/s over 1.03 seconds
Interim result: 7947.53 10^6bits/s over 1.01 seconds
Interim result: 7857.82 10^6bits/s over 1.01 seconds
Interim result: 7551.87 10^6bits/s over 1.04 seconds
Interim result: 6939.62 10^6bits/s over 1.09 seconds
Interim result: 7270.93 10^6bits/s over 1.00 seconds
Interim result: 7978.83 10^6bits/s over 1.00 seconds
Interim result: 7948.33 10^6bits/s over 1.00 seconds
Interim result: 7964.85 10^6bits/s over 1.00 seconds
Interim result: 7784.14 10^6bits/s over 1.02 seconds
Interim result: 8126.98 10^6bits/s over 1.01 seconds
Interim result: 7999.08 10^6bits/s over 1.02 seconds
Interim result: 7535.65 10^6bits/s over 1.06 seconds
Interim result: 7934.04 10^6bits/s over 1.00 seconds
Interim result: 7846.96 10^6bits/s over 1.01 seconds
Interim result: 7813.65 10^6bits/s over 1.00 seconds
Interim result: 7768.61 10^6bits/s over 1.01 seconds
Interim result: 7905.22 10^6bits/s over 1.00 seconds
Interim result: 8038.71 10^6bits/s over 1.00 seconds
Interim result: 7233.60 10^6bits/s over 1.11 seconds
Interim result: 7884.16 10^6bits/s over 1.00 seconds
Interim result: 7711.27 10^6bits/s over 1.02 seconds
Interim result: 7708.41 10^6bits/s over 1.00 seconds
Interim result: 7150.47 10^6bits/s over 1.08 seconds
Interim result: 7736.27 10^6bits/s over 1.00 seconds
8388608  16384  16384    30.00      7545.98   52.81    12.84    1.147   1.115  

After turning off bzcopy:

[root at dirk unit_tests]# echo 0 > /sys/module/ib_sdp/parameters/sdp_zcopy_thresh 
[root at dirk unit_tests]# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.125 -D -l 30 -- -r 1000000
Interim result: 2198.71 10^6bits/s over 1.39 seconds
Interim result: 3198.83 10^6bits/s over 1.00 seconds
Interim result: 3547.62 10^6bits/s over 1.00 seconds
Interim result: 7633.09 10^6bits/s over 1.00 seconds
Interim result: 7647.69 10^6bits/s over 1.00 seconds
Interim result: 7932.82 10^6bits/s over 1.00 seconds
Interim result: 8027.73 10^6bits/s over 1.00 seconds
Interim result: 7642.15 10^6bits/s over 1.05 seconds
Interim result: 7304.84 10^6bits/s over 1.05 seconds
Interim result: 7800.30 10^6bits/s over 1.02 seconds
Interim result: 7808.03 10^6bits/s over 1.00 seconds
Interim result: 7875.02 10^6bits/s over 1.00 seconds
Interim result: 8031.41 10^6bits/s over 1.00 seconds
Interim result: 7963.28 10^6bits/s over 1.01 seconds
Interim result: 7842.06 10^6bits/s over 1.02 seconds
Interim result: 7783.37 10^6bits/s over 1.01 seconds
Interim result: 7994.32 10^6bits/s over 1.00 seconds
Interim result: 7281.62 10^6bits/s over 1.10 seconds
Interim result: 7478.55 10^6bits/s over 1.01 seconds
Interim result: 7993.52 10^6bits/s over 1.00 seconds
Interim result: 7914.57 10^6bits/s over 1.01 seconds
Interim result: 7809.30 10^6bits/s over 1.01 seconds
Interim result: 7996.93 10^6bits/s over 1.00 seconds
Interim result: 7867.03 10^6bits/s over 1.02 seconds
Interim result: 7880.61 10^6bits/s over 1.01 seconds
Interim result: 8095.24 10^6bits/s over 1.00 seconds
Interim result: 7942.04 10^6bits/s over 1.02 seconds
Interim result: 7818.13 10^6bits/s over 1.02 seconds
Interim result: 7301.62 10^6bits/s over 1.07 seconds
8388608  16384  16384    30.01      7228.62   52.21    13.03    1.183   1.181  

So I see your results (sort of).  I have been using the netperf that ships with the OS (Rhat4u4 and Rhat5 mostly) or one built with
default options.  Maybe that is the difference.

The target in this case is SLES10 SP1 Real Time.  I notice far less variance in my results than in what you have shown.  We might have a
Linux kernel scheduling issue that accounts for the variance you see (and I have seen).  That would be wonderful news, because the
variance has been driving me crazy.
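
One experiment that might separate a scheduler problem from an SDP problem (just a thought; the core numbers are arbitrary, and
taskset may or may not be on your images) is pinning both ends to fixed cores and rerunning the same stream test:

  # target side: pin netserver to one core
  taskset -c 2 netserver
  # sending side: pin netperf to one core
  taskset -c 2 netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.125 -D -l 30 -- -m 1000000

If the interim numbers settle down once pinned, that would point at scheduling rather than the SDP path itself.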

=======

There is something very different between the netperf 2.4.4 I built to enable demo mode and the ones I have been using.  Even after I
rebuilt it without demo mode, using the 2.4.4 netserver as a target resulted in much lower numbers than I have seen before.  There
was also a problem with IPoIB tests reporting no bandwidth (0.01 Mb/sec), possibly due to MTU problems that were being ignored by
earlier versions.  Using the rebuilt (non-demo mode) 2.4.4 netperf code against a Rhat5 native netserver more or less replicates my
previous results for everything except bzcopy bandwidth.  The bzcopy bandwidth reported is really bad compared to everything else.
Also, ifconfig on the IB devices shows errors and dropped packets.
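
For reference, the checks I mean are just the standard ones (ib0 here stands for whichever IPoIB interface is under test; datagram
mode normally reports an MTU of 2044 and connected mode can go up to 65520):

  cat /sys/class/net/ib0/mode
  cat /sys/class/net/ib0/mtu
  # per-interface error and drop counters
  ifconfig ib0 | grep -E 'errors|dropped'
  ip -s link show ib0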

I will look into the test code next chance I get.  For now, it appears to me that the differences we are seeing in performance
relate to netperf differences.  I suspect that later versions of netperf are trying to optimize around transport geometries (device
reported MTUs, transport reported buffer sizes, etc.) and SDP is probably not putting its best buffer forward.
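
One way to take those heuristics out of the picture (this is my guess about what 2.4.4 is doing; I have not confirmed it in the
netperf source) is to pin the message and socket buffer sizes on the command line instead of letting the transport report them:

  # -m sets the send message size; -s/-S request the local/remote socket buffer sizes
  netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.125 -l 30 -- -m 1000000 -s 1000000 -S 1000000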

If you could possibly rerun some of your tests with the OS-provided netperf (fresh OS install, fresh OFED install, simple command
line test), I would be interested to know if it shows different behavior from what you have been reporting.
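
For completeness, the comparison I am asking about boils down to something like this on the sending side (193.168.10.125 is just my
target, and the sysfs path is where sdp_zcopy_thresh lives on my kernels; as noted in my earlier mail below, it moves around between
Rhat4u4 and Rhat5):

  export LD_PRELOAD=libsdp.so
  export LIBSDP_CONFIG_FILE=/etc/infiniband/libsdp.conf

  # bzcopy off
  echo 0 > /sys/module/ib_sdp/parameters/sdp_zcopy_thresh
  netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.125 -l 30 -- -m 1000000

  # bzcopy on for sends of 64K and larger
  echo 65536 > /sys/module/ib_sdp/parameters/sdp_zcopy_thresh
  netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.125 -l 30 -- -m 1000000

with netserver started the same way (LD_PRELOAD and LIBSDP_CONFIG_FILE exported) on the target.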



-----Original Message-----
From: general-bounces at lists.openfabrics.org [mailto:general-bounces at lists.openfabrics.org] On Behalf Of Scott Weitzenkamp (sweitzen)
Sent: Thursday, January 24, 2008 3:58 PM
To: Jim Mott; Weikuan Yu
Cc: general at lists.openfabrics.org
Subject: RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes inOFED 1.3 beta, and I get Oops when enabling
sdp_zcopy_thresh

Jim,

Like I've said before, I don't see any change in throughput with SDP
zcopy, plus the throughput bounces around.  When you run netperf with
-D, do you see variations in throughput?

Here's an example from a dual-socket Xeon 5355 quad-core RHEL5 x86_64
system with ConnectX 2.3.0 firmware; the interim throughput
bounces around between 4 and 7 Gbps.

[releng at svbu-qaclus-98 ~]$ cat /sys/module/ib_sdp/parameters/sdp_zcopy_thresh
16384
[releng at svbu-qaclus-98 ~]$ LD_PRELOAD=libsdp.so netperf241 -C -c -P 0 -t TCP_STREAM -H 192.168.1.127 -D -l 60 -- -m 1000000
Interim result: 6676.50 10^6bits/s over 1.00 seconds
Interim result: 6674.47 10^6bits/s over 1.00 seconds
Interim result: 6687.89 10^6bits/s over 1.00 seconds
Interim result: 7075.40 10^6bits/s over 1.00 seconds
Interim result: 7065.08 10^6bits/s over 1.00 seconds
Interim result: 7074.69 10^6bits/s over 1.00 seconds
Interim result: 6667.10 10^6bits/s over 1.06 seconds
Interim result: 4492.29 10^6bits/s over 1.48 seconds
Interim result: 4503.65 10^6bits/s over 1.00 seconds
Interim result: 4481.25 10^6bits/s over 1.01 seconds
Interim result: 4495.91 10^6bits/s over 1.00 seconds
Interim result: 4521.51 10^6bits/s over 1.00 seconds
Interim result: 4466.58 10^6bits/s over 1.01 seconds
Interim result: 4482.09 10^6bits/s over 1.00 seconds
Interim result: 4480.21 10^6bits/s over 1.00 seconds
Interim result: 4490.07 10^6bits/s over 1.00 seconds
Interim result: 4479.47 10^6bits/s over 1.00 seconds
Interim result: 4480.30 10^6bits/s over 1.00 seconds
Interim result: 4489.14 10^6bits/s over 1.00 seconds
Interim result: 4484.38 10^6bits/s over 1.00 seconds
Interim result: 4473.64 10^6bits/s over 1.00 seconds
Interim result: 4479.71 10^6bits/s over 1.00 seconds
Interim result: 4486.54 10^6bits/s over 1.00 seconds
Interim result: 4456.65 10^6bits/s over 1.01 seconds
Interim result: 4483.70 10^6bits/s over 1.00 seconds
Interim result: 4486.41 10^6bits/s over 1.00 seconds
Interim result: 4489.58 10^6bits/s over 1.00 seconds
Interim result: 4478.15 10^6bits/s over 1.00 seconds
Interim result: 4476.67 10^6bits/s over 1.00 seconds
Interim result: 4496.49 10^6bits/s over 1.00 seconds
Interim result: 4489.26 10^6bits/s over 1.00 seconds
Interim result: 4479.86 10^6bits/s over 1.00 seconds
Interim result: 4500.97 10^6bits/s over 1.00 seconds
Interim result: 4473.96 10^6bits/s over 1.00 seconds
Interim result: 7346.56 10^6bits/s over 1.00 seconds
Interim result: 7524.94 10^6bits/s over 1.00 seconds
Interim result: 7540.16 10^6bits/s over 1.00 seconds
Interim result: 7553.53 10^6bits/s over 1.00 seconds
Interim result: 7552.08 10^6bits/s over 1.00 seconds
Interim result: 7550.08 10^6bits/s over 1.00 seconds
Interim result: 7554.35 10^6bits/s over 1.00 seconds
Interim result: 7550.85 10^6bits/s over 1.00 seconds
Interim result: 7557.27 10^6bits/s over 1.00 seconds
Interim result: 7568.28 10^6bits/s over 1.00 seconds
Interim result: 7497.24 10^6bits/s over 1.01 seconds
Interim result: 7436.44 10^6bits/s over 1.01 seconds
Interim result: 6098.26 10^6bits/s over 1.22 seconds
Interim result: 5644.82 10^6bits/s over 1.08 seconds
Interim result: 5639.07 10^6bits/s over 1.00 seconds
Interim result: 5636.32 10^6bits/s over 1.00 seconds
Interim result: 5640.45 10^6bits/s over 1.00 seconds
Interim result: 6319.06 10^6bits/s over 1.00 seconds
Interim result: 7324.10 10^6bits/s over 1.00 seconds
Interim result: 7323.53 10^6bits/s over 1.00 seconds
Interim result: 7333.88 10^6bits/s over 1.00 seconds
Interim result: 7172.70 10^6bits/s over 1.02 seconds
Interim result: 4488.97 10^6bits/s over 1.60 seconds
Interim result: 4492.37 10^6bits/s over 1.00 seconds
 87380  16384 1000000    60.00      5701.15   17.41    16.26    2.001    1.870

Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems


 

> -----Original Message-----
> From: Jim Mott [mailto:jim at mellanox.com] 
> Sent: Thursday, January 24, 2008 9:47 AM
> To: Scott Weitzenkamp (sweitzen); Weikuan Yu
> Cc: general at lists.openfabrics.org
> Subject: RE: [ofa-general] RE: [ewg] Not seeing any SDP 
> performance changes inOFED 1.3 beta, and I get Oops when 
> enabling sdp_zcopy_thresh
> 
> I am really puzzled.  The majority of my testing has been between
> Rhat4U4 and Rhat5.  Using netperf command lines of the form:
>   netperf -C -c -P 0 -t TCP_RR -H 193.168.10.143 -l 60 -- -r 64
>   netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -r 1000000
> and a process of:
>   - set sdp_zcopy_thresh=0, run bandwidth test
>   - set sdp_zcopy_thresh=size, run bandwidth test
> I repeatedly get results that look like this:
>      size     SDP     Bzcopy
>     65536   7375.00   7515.98
>    131072   7465.70   8105.58
>   1000000   6541.87   9948.76
> 
> These numbers are from high end (2-socket, quad-core) machines.  When you
> use smaller machines, like the AMD dual-core shown below, the differences
> between SDP with and without bzcopy are more striking.
> 
> The process to start the netserver is:
>   export LD_LIBRARY_PATH=/usr/local/ofed/lib64:/usr/local/ofed/lib
>   export LD_PRELOAD=libsdp.so
>   export LIBSDP_CONFIG_FILE=/etc/infiniband/libsdp.conf
>   netserver 
> 
> The process to start the netperf is similar:
>   export LD_LIBRARY_PATH=/usr/local/ofed/lib64:/usr/local/ofed/lib
>   export LD_PRELOAD=libsdp.so
>   export LIBSDP_CONFIG_FILE=/etc/infiniband/libsdp.conf
>   netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -r 1000000
> 
> You can unload and reload ib_sdp between tests, but I just echo 0 and
> echo size into sdp_zcopy_thresh on the sending side.  Note that it is
> in a different place on Rhat4u4 and Rhat5.
> 
> My libsdp.conf is the default that ships with OFED.  Stripping the
> comments (grep -v), it is just:
>   log min-level 9 destination file libsdp.log
>   use both server * *:*
>   use both client * *:*
> Note that if you build locally:
>   cd /tmp/openib_gen2/xxxx/ofa_1_3_dev_kernel
>   make install
> the libsdp.conf file seems to get lost.  You must restore it by
> hand.
> 
> I have a shell script that automates this testing for a
> wide range of message sizes:
>   64 128 512 1024 2048 4096 8192 16000 32768 65536 131072 1000000
> on multiple transports:
>   IP		both	"echo datagram > /sys/class/net/ib0/mode"
>   IP-CM	both  "echo connected > /sys/class/net/ib0/mode"
>   SDP		both
>   Bzcopy	TCP_STREAM
> Where both is TCP_RR and TCP_STREAM testing.
> 
> The variance in SDP bandwidth results can be 10%-15% between runs.  The
> difference between Bzcopy and non-Bzcopy is always very visible for 128K
> and up tests though.
> 
> Could some other people please try to run some of these
> tests?  If only to help me know if I am crazy?
> 
> Thanks,
> Jim
> 
> Jim Mott
> Mellanox Technologies Ltd.
> mail: jim at mellanox.com
> Phone: 512-294-5481
> 
> 
> -----Original Message-----
> From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com] 
> Sent: Thursday, January 24, 2008 11:17 AM
> To: Jim Mott; Weikuan Yu
> Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org
> Subject: RE: [ofa-general] RE: [ewg] Not seeing any SDP performance
> changes inOFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
> 
> I've tested on RHEL4 and RHEL5, and see no sdp_zcopy_thresh improvement
> for any message size, as measured with netperf, for any Arbel or
> ConnectX HCA.
> 
> Scott
> 
>  
> > -----Original Message-----
> > From: Jim Mott [mailto:jim at mellanox.com] 
> > Sent: Thursday, January 24, 2008 7:57 AM
> > To: Weikuan Yu; Scott Weitzenkamp (sweitzen)
> > Cc: ewg at lists.openfabrics.org; general at lists.openfabrics.org
> > Subject: RE: [ofa-general] RE: [ewg] Not seeing any SDP 
> > performance changes inOFED 1.3 beta, and I get Oops when 
> > enabling sdp_zcopy_thresh
> > 
> > Hi,
> >   64K is borderline for seeing the bzcopy effect.  Using an AMD
> > 6000+ (3 GHz dual-core) in an Asus M2A-VM motherboard with ConnectX
> > running 2.3 firmware and the OFED 1.3-rc3 stack on a 2.6.23.8
> > kernel.org kernel, I ran the test for 128K:
> >   5546 Mbit/s   sdp_zcopy_thresh=0 (off)
> >   8709 Mbit/s   sdp_zcopy_thresh=65536
> > 
> > For these tests, I just have LD_PRELOAD set in my environment.
> > 
> > =======================
> > 
> > I see that TCP_MAXSEG is not being handled by libsdp and will look into
> > it.
> > 
> > 
> > [root at dirk ~]# modprobe ib_sdp
> > [root at dirk ~]# netperf -v2 -4 -H 193.168.10.198 -l 30 -t TCP_STREAM -c -C -- -m 128K
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 193.168.10.198 (193.168.10.198) port 0 AF_INET
> > netperf: get_tcp_info: getsockopt TCP_MAXSEG: errno 92
> > Recv   Send    Send                          Utilization       Service Demand
> > Socket Socket  Message  Elapsed              Send     Recv     Send     Recv
> > Size   Size    Size     Time     Throughput  local    remote   local    remote
> > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB    us/KB
> > 
> >  87380  16384 131072    30.01      5545.69   51.47    14.43    1.521    1.706
> > 
> > Alignment      Offset         Bytes    Bytes       Sends   Bytes    Recvs
> > Local  Remote  Local  Remote  Xfered   Per                 Per
> > Send   Recv    Send   Recv             Send (avg)          Recv (avg)
> >     8       8      0       0 2.08e+10  131072.00    158690   33135.60    627718
> > 
> > Maximum
> > Segment
> > Size (bytes)
> >     -1
> > [root at dirk ~]# echo 65536 >/sys/module/ib_sdp/parameters/sdp_zcopy_thresh
> > [root at dirk ~]# netperf -v2 -4 -H 193.168.10.198 -l 30 -t TCP_STREAM -c -C -- -m 128K
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 193.168.10.198 (193.168.10.198) port 0 AF_INET
> > netperf: get_tcp_info: getsockopt TCP_MAXSEG: errno 92
> > Recv   Send    Send                          Utilization       Service Demand
> > Socket Socket  Message  Elapsed              Send     Recv     Send     Recv
> > Size   Size    Size     Time     Throughput  local    remote   local    remote
> > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB    us/KB
> > 
> >  87380  16384 131072    30.01      8708.58   50.63    14.55    0.953    1.095
> > 
> > Alignment      Offset         Bytes    Bytes       Sends   Bytes    Recvs
> > Local  Remote  Local  Remote  Xfered   Per                 Per
> > Send   Recv    Send   Recv             Send (avg)          Recv (avg)
> >     8       8      0       0 3.267e+10  131072.00    249228   26348.30    1239807
> > 
> > Maximum
> > Segment
> > Size (bytes)
> >     -1
> > 
> > Thanks,
> > Jim
> > 
> > Jim Mott
> > Mellanox Technologies Ltd.
> > mail: jim at mellanox.com
> > Phone: 512-294-5481
> > 
> > 
> > -----Original Message-----
> > From: Weikuan Yu [mailto:weikuan.yu at gmail.com] 
> > Sent: Thursday, January 24, 2008 9:09 AM
> > To: Scott Weitzenkamp (sweitzen)
> > Cc: Jim Mott; ewg at lists.openfabrics.org; 
> general at lists.openfabrics.org
> > Subject: Re: [ofa-general] RE: [ewg] Not seeing any SDP performance
> > changes inOFED 1.3 beta, and I get Oops when enabling 
> sdp_zcopy_thresh
> > 
> > Hi, Scott,
> > 
> > I have been running SDP tests across two Woodcrest nodes with 4x DDR
> > cards using OFED-1.2.5.4.  The card/firmware info is below.
> > 
> > CA 'mthca0'
> >          CA type: MT25208
> >          Number of ports: 2
> >          Firmware version: 5.1.400
> >          Hardware version: a0
> >          Node GUID: 0x0002c90200228e0c
> >          System image GUID: 0x0002c90200228e0f
> > 
> > I could not get a bandwidth of more than 5 Gbps like you have shown
> > here.  Do I need to upgrade to the latest software or firmware?  Any
> > suggestions?
> > 
> > Thanks,
> > --Weikuan
> > 
> > 
> > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.225.77 (192.168.225.77) port 0 AF_INET
> > Recv   Send    Send                          Utilization       Service Demand
> > Socket Socket  Message  Elapsed              Send     Recv     Send     Recv
> > Size   Size    Size     Time     Throughput  local    remote   local    remote
> > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB    us/KB
> > 
> > 131072 131072 131072    10.00      4918.95   21.29    24.99    1.418    1.665
> > 
> > 
> > Scott Weitzenkamp (sweitzen) wrote:
> > > Jim,
> > > 
> > > I am trying OFED-1.3-20071231-0600 and RHEL4 x86_64 on a dual CPU
> > > (single core each CPU) Xeon system.  I do not see any performance
> > > improvement (either throughput or CPU utilization) using netperf when I
> > > set /sys/module/ib_sdp/sdp_zcopy_thresh to 16384.  Can you elaborate
> > > on your HCA type, and the performance improvement you see?
> > > 
> > > Here's an example netperf command line when using a Cheetah DDR HCA and
> > > 1.2.917 firmware (I have also tried ConnectX and 2.3.000 firmware too):
> > > 
> > > [releng at svbu-qa1850-2 ~]$ LD_PRELOAD=libsdp.so netperf241 -v2 -4 -H 192.168.1.201 -l 30 -t TCP_STREAM -c -C -- -m 65536
> > > TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.201 (192.168.1.201) port 0 AF_INET : histogram : demo
> > > 
> > > Recv   Send    Send                          Utilization       Service Demand
> > > Socket Socket  Message  Elapsed              Send     Recv     Send     Recv
> > > Size   Size    Size     Time     Throughput  local    remote   local    remote
> > > bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB    us/KB
> > > 
> > >  87380  16384  65536    30.01      7267.70   55.06    61.27    1.241    1.381
> > > 
> > > Alignment      Offset         Bytes    Bytes       Sends   Bytes    Recvs
> > > Local  Remote  Local  Remote  Xfered   Per                 Per
> > > Send   Recv    Send   Recv             Send (avg)          Recv (avg)
> > >     8       8      0       0 2.726e+10  65536.00    415942   48106.01    566648
> > > 
> > 
> 