[ofa-general] why is CPU util/service demand so much higher with SDP than TCP?
Rick Jones
rick.jones2 at hp.com
Thu Apr 26 18:21:54 PDT 2007
So, while playing around with my new netperf SDP_RR test I've noticed that a
single-byte _RR test over SDP achieves a much higher transaction rate (i.e.
lower latency) than TCP over the same HCA, but the CPU utilization is _very_
much higher, and so is the service demand (CPU consumed per transaction). CPU
util being higher makes sense given the higher transaction rate, but the
increased service demand does not - at least not in my experience thus far.
[root@hpcpc106 ~]# for i in SDP_RR TCP_RR; do netperf -t $i -l 60 -c -C -H 192.168.0.107; done
SDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.107 (192.168.0.107) port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.    CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate      local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec   % S    % S    us/Tr   us/Tr

126976 126976 1       1      60.00   37868.61  28.02  27.65  29.598  29.210
126976 126976

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.107 (192.168.0.107) port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.    CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate      local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec   % S    % S    us/Tr   us/Tr

87380  87380  1       1      60.00   19281.49  3.40   3.90   7.049   8.089
87380  87380
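
As a sanity check on those service demand figures, here is the back-of-the-envelope
arithmetic I'm using (my own sketch, assuming netperf reports S.dem as CPU-seconds
consumed across all four cores divided by total transactions):

# rough check of the 29.598 us/Tr local service demand for the SDP run
awk 'BEGIN {
    util  = 28.02 / 100    # local CPU util reported for SDP_RR
    cores = 4              # four real cores, no HW threads
    secs  = 60             # elapsed time
    rate  = 37868.61       # transactions per second
    printf "%.3f us/Tr\n", (util * cores * secs * 1e6) / (rate * secs)
}'
# prints ~29.6, in line with the 29.598 netperf reported; the same arithmetic on
# the TCP numbers gives ~7.05, so the S.dem columns are self-consistent and SDP
# really is burning roughly 4x the CPU per transaction here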
The systems here are running RHEL5:
[root@hpcpc106 ~]# uname -a
Linux hpcpc106.cup.hp.com 2.6.18-8.el5 #1 SMP Fri Jan 26 14:16:09 EST 2007 ia64
ia64 ia64 GNU/Linux
and whatever bits come with that (these are not OFED 1.2 rc bits - I still
don't know how to remove enough of what ships with RHEL5 to put all of OFED 1.2,
or at least the modules I want, on there without conflict). I'm not sure how to
check the versions - normally I'd use ethtool, but that doesn't work against an
ibN device. Someone elsewhere suggested that the bits in RHEL5 might be OFED 1.1.
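
For what it's worth, the sort of thing I can poke at in lieu of ethtool is below
(guesses on my part - I'm not convinced any of these actually pin down the OFED
level):

modinfo ib_sdp | grep -i version       # module version strings, if any
rpm -qa | grep -i -e openib -e ofed    # whatever OpenFabrics packages RHEL5 installed
cat /sys/class/infiniband/*/fw_ver     # HCA firmware version via sysfs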
These systems have four real cores, and no HW threads enabled, so 25% CPU util
means that the equivalent of an entire CPU core is being consumed.
Before I start hitting the system with a profiler I thought I would ask whether
this is expected with SDP. Normally a single-instance, single-byte _RR test
between otherwise identical systems consumes at most ~50% of a core (a bit of
handwaving, but that has been my experience thus far).
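
When I say profiler, what I have in mind is roughly the following - a sketch
assuming oprofile is on the box; the exact invocation may need tweaking:

opcontrol --no-vmlinux    # skip kernel symbol resolution for a first pass
opcontrol --start
netperf -t SDP_RR -l 60 -c -C -H 192.168.0.107
opcontrol --shutdown
opreport --symbols | head -30    # see where the extra CPU per transaction goes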
rick jones