[ofa-general] RE: [ewg] Not seeing any SDP performance changes
Jim Mott
jim at mellanox.com
Mon Feb 4 10:34:41 PST 2008
Hi,
I am back in the office and have installed a fresh Rhat4U4 system on a
test machine that was running Rhat5. The only non-default options I
used were:
- No firewall
- Disable SELinux
Then I built Netperf 2.4.3 on the new system. (./configure; make; make
install)
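For completeness (the tarball name here is an assumption on my part; the
point is simply that nothing beyond the defaults was configured in):

# tar xzf netperf-2.4.3.tar.gz
# cd netperf-2.4.3
# ./configure
# make
# make install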
Then I downloaded and installed today's OFED 1.3 release.
At this point I have two identical hardware platforms running Rhat4U4
(2.6.9-42.ELsmp) kernel right off the install media. They are both
running Netperf 2.4.3 and today's OFED stack. Both are using ConnectX
cards with 2.3 firmware.
Running as root on both sides (netserver and netperf), my little shell
script pulled out the following throughput numbers, in 10^6 bits/sec (a
rough sketch of the script follows the two tables below):
             64K       128K         1M
SDP      8215.17    6429.09    6862.66
BZCOPY   8748.00    9997.07    9847.76
Looking at CPU service demand, in microseconds per KB transferred (local
and remote), we see:

              64K              128K               1M
           LCL     RMT      LCL     RMT      LCL     RMT
SDP      1.025   1.243    1.391   1.493    1.274   1.407
BZCOPY   0.966   1.148    0.838   1.014    0.603   0.984
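For reference, the script is just a thin wrapper around netperf. The real
script is not reproduced here, so the loop structure and the awk parsing
below are only a sketch of what it does:

#!/bin/sh
# Sketch of the test wrapper (illustrative; the server address and the
# output parsing are assumptions, not the script actually used).
SERVER=193.168.10.143
export LD_PRELOAD=libsdp.so

for thresh in 0 1; do            # 0 = plain SDP, 1 = BZCOPY on every send
    echo $thresh > /sys/module/ib_sdp/sdp_zcopy_thresh
    for size in 64K 128K 1M; do
        # With -P 0 -c -C, the single output line has throughput in field 5
        bw=`netperf -C -c -P 0 -t TCP_STREAM -H $SERVER -l 60 -- -m $size | awk '{print $5}'`
        echo "thresh=$thresh size=$size throughput=$bw"
    done
done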
The output of "lspci -vv" for the HCA is:
0a:00.0 InfiniBand: Mellanox Technologies: Unknown device 634a (rev a0)
Subsystem: Mellanox Technologies: Unknown device 634a
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size 10
Interrupt: pin A routed to IRQ 169
Region 0: Memory at b9300000 (64-bit, non-prefetchable) [size=1M]
Region 2: Memory at b8000000 (64-bit, prefetchable) [size=8M]
Region 4: Memory at b9400000 (64-bit, non-prefetchable) [size=8K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Vital Product Data
Capabilities: [9c] MSI-X: Enable+ Mask- TabSize=256
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00001000
Capabilities: [60] Express Endpoint IRQ 0
Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag+
Device: Latency L0s <64ns, L1 unlimited
Device: AtnBtn- AtnInd- PwrInd-
Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 8
Link: Latency L0s unlimited, L1 unlimited
Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
Link: Speed 2.5Gb/s, Width x8
So I know it is an 8x PCIe (gen1) slot, and I am running MaxReadReq=512.
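(Back-of-the-envelope, so treat the exact ceiling as my own arithmetic
rather than a spec number: x8 gen1 is 8 lanes x 2.5 Gbit/s = 20 Gbit/s raw,
about 16 Gbit/s of data after 8b/10b encoding, and with a 128-byte
MaxPayload the TLP header overhead brings that down to very roughly 12-13
Gbit/s usable. The ~10 Gbit/s BZCOPY numbers above are therefore not far
below what this slot can move.)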
To verify that it is not something strange in my script, I executed the
bandwidth commands by hand:
# echo $LD_PRELOAD
libsdp.so
# echo 0 > /sys/module/ib_sdp/sdp_zcopy_thresh
# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -r 64K
87380 16384 16384 60.00 7106.72 13.32 14.87 1.228 1.370
# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -r 128K
87380 16384 16384 60.00 6906.18 14.02 15.18 1.330 1.441
# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -r 1M
87380 16384 16384 60.00 7030.98 13.97 15.13 1.303 1.410
# echo 1 > /sys/module/ib_sdp/sdp_zcopy_thresh
# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -r 64K
87380 16384 16384 60.00 6491.93 13.83 14.90 1.396 1.504
# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -r 128K
87380 16384 16384 60.00 6536.61 14.19 14.80 1.423 1.484
# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -r 1M
87380 16384 16384 60.00 6623.94 13.68 14.82 1.353 1.466
Now these numbers look like what you report. The problem is that with "-r"
the send message size stays at the netperf default of 16384 bytes (the
third column above), so we are handing SDP data in 16K chunks, and the
overhead of pinning a 16K buffer, sending it, and unpinning it is too high
to give us any benefit over the copy path.
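(A side note on the knob itself, and this is my reading of the module
parameter rather than anything documented: sdp_zcopy_thresh appears to be a
byte threshold, where 0 disables the zero-copy send path and any non-zero
value is the minimum send size that uses it, so the "echo 1" above enables
it for essentially every send. To use zcopy only where the pin/unpin cost
can be amortized, something like

# echo 65536 > /sys/module/ib_sdp/sdp_zcopy_thresh

should restrict it to 64 KB sends and larger, assuming I have the semantics
right.)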
Rerunning the whole test with -m instead of -r gives me the numbers that
I keep reporting:
# echo $LD_PRELOAD
libsdp.so
# echo 0 > /sys/module/ib_sdp/sdp_zcopy_thresh
# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -m 64K
87380 16384 65536 60.00 8323.20 12.96 13.64 1.020 1.074
# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -m 128K
87380 16384 131072 60.00 6661.77 13.74 13.41 1.352 1.320
# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -m 1M
87380 16384 1048576 60.00 6691.83 13.39 13.58 1.312 1.330
# echo 1 > /sys/module/ib_sdp/sdp_zcopy_thresh
# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -m 64K
87380 16384 65536 60.00 9052.22 12.88 14.18 0.932 1.027
# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -m 128K
87380 16384 131072 60.00 10294.87 12.70 13.44 0.808 0.855
# netperf -C -c -P 0 -t TCP_STREAM -H 193.168.10.143 -l 60 -- -m 1M
87380 16384 1048576 60.00 10254.89 7.74 13.07 0.495 0.835
Maybe this is the problem? My tests are giving sdp_sendmsg() enough data to
sink its teeth into: when you send one big buffer instead of lots of little
ones, you can see the benefit.
Could you guys try "-m size" instead of "-r size" and see if that works
better?
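Concretely, that is the same command lines as above with just the final
option changed (substitute your own server address for mine):

# export LD_PRELOAD=libsdp.so
# echo 1 > /sys/module/ib_sdp/sdp_zcopy_thresh
# netperf -C -c -P 0 -t TCP_STREAM -H <server> -l 60 -- -m 64K
# netperf -C -c -P 0 -t TCP_STREAM -H <server> -l 60 -- -m 128K
# netperf -C -c -P 0 -t TCP_STREAM -H <server> -l 60 -- -m 1M

and then the same three runs again with sdp_zcopy_thresh set back to 0 for
the plain SDP baseline.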
Thanks,
Jim
Jim Mott
Mellanox Technologies Ltd.
mail: jim at mellanox.com
Phone: 512-294-5481
-----Original Message-----
From: Jim Mott
Sent: Friday, January 25, 2008 4:07 PM
To: 'Scott Weitzenkamp (sweitzen)'; Weikuan Yu
Cc: general at lists.openfabrics.org
Subject: RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
Not today, but I will give it a shot next time I get a free machine. I
have tested between Rhat4u4 MLX4 and Rhat4u4 mthca and seen the same
trend though.
Thanks,
Jim
Jim Mott
Mellanox Technologies Ltd.
mail: jim at mellanox.com
Phone: 512-294-5481
-----Original Message-----
From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com]
Sent: Friday, January 25, 2008 4:03 PM
To: Jim Mott; Weikuan Yu
Cc: general at lists.openfabrics.org
Subject: RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
Is there any way you can make sender and receiver the same RHEL kernel?
> -----Original Message-----
> From: Jim Mott [mailto:jim at mellanox.com]
> Sent: Friday, January 25, 2008 1:58 PM
> To: Scott Weitzenkamp (sweitzen); Weikuan Yu
> Cc: general at lists.openfabrics.org
> Subject: RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
>
> Receive side:
> - 2.6.23.8 kernel.org kernel on Rhat5 distro
> - HCA is MLX4 with 2.3.914
>   I get the same numbers with the released 2.3 firmware.
>
> Send side:
> - 2.6.9-42.ELsmp x86_64 (Rhat4u4)
> - HCA is MLX4 with 2.3.914
>
> I get the same trends (SDP < BZCOPY if message_size > 64K) on unmodified
> Rhat5, Rhat4u4, and SLES10-SP1-RT distros. I also see it on kernel.org
> kernels 2.6.23.12, 2.6.24-rc2, 2.6.23, and 2.6.22.9. I am in the midst
> of testing some things, so I do not have all the machines available
> right now to repeat most of the tests though.
>
>
> Thanks,
> Jim
>
> Jim Mott
> Mellanox Technologies Ltd.
> mail: jim at mellanox.com
> Phone: 512-294-5481
>
>
> -----Original Message-----
> From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com]
> Sent: Friday, January 25, 2008 3:39 PM
> To: Jim Mott; Weikuan Yu
> Cc: general at lists.openfabrics.org
> Subject: RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
>
> Jim, what kernel and HCA are these numbers for?
>
> Scott
>
>
>
> > -----Original Message-----
> > From: Jim Mott [mailto:jim at mellanox.com]
> > Sent: Friday, January 25, 2008 11:09 AM
> > To: Scott Weitzenkamp (sweitzen); Weikuan Yu
> > Cc: general at lists.openfabrics.org
> > Subject: RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
> >
> > Right you are (as usual).
> >
> > Hunting around these systems shows that I have been using netperf-2.4.3
> > for testing. No configuration options; just ./configure; make; make install.
> >
> > To try and understand version differences, I installed 2.4.1 (your
> > version?), 2.4.3, and 2.4.4. Built them with default options and ran
> > the tests using each.
> >
> > Using netperf-2.4.1, I reran "netperf -v2 -4 -H 193.168.10.143 -l 30
> > -t TCP_STREAM -c -C -- -m size" with an AMD box as the target and an
> > 8-processor Intel box as the driver:
> >
> >              64K       128K         1M
> > SDP      7749.66    6925.68    6281.17
> > BZCOPY   8492.85    9867.06   11105.50
> >
> > I tried running these tests a few times and saw a lot of variance in
> > the reported results. Reloading 2.4.3 and running the same tests:
> >
> >              64K       128K         1M
> > SDP      7553.77    6747.58    5986.42
> > BZCOPY   8839.46    9572.49   10654.52
> >
> > and finally, I tried 2.4.4 and running the same tests:
> >
> >              64K       128K         1M
> > SDP      7935.97    6325.69    7682.65
> > BZCOPY   8905.94    9935.45   10615.03
> >
> > At this point, I am confused. The difference between SDP with and
> > without BZCOPY is obvious in all three sets of numbers. I cannot
> > explain why you see something different.
> >
> > If you could try a vanilla netperf build, it would be interesting to
> > see if you get any different results.
> >
> > Thanks,
> > Jim
> >
> > Jim Mott
> > Mellanox Technologies Ltd.
> > mail: jim at mellanox.com
> > Phone: 512-294-5481
> >
> >
> > -----Original Message-----
> > From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com]
> > Sent: Friday, January 25, 2008 10:36 AM
> > To: Jim Mott; Jim Mott; Weikuan Yu
> > Cc: general at lists.openfabrics.org
> > Subject: RE: [ofa-general] RE: [ewg] Not seeing any SDP performance changes in OFED 1.3 beta, and I get Oops when enabling sdp_zcopy_thresh
> >
> > > So I see your results (sort of). I have been using the netperf that
> > > ships with the OS (Rhat4u4 and Rhat5 mostly) or is built with default
> > > options. Maybe that is the difference.
> >
> > Jim, AFAIK Red Hat does not ship netperf with RHEL.
> >
> > Scott
> >
>