[ofa-general] SDP performance with bzcopy testing help needed
Craig Prescott
prescott at hpc.ufl.edu
Wed Feb 13 21:32:03 PST 2008
Scott Weitzenkamp (sweitzen) wrote:
>> But the effect is still clear.
>>
>> throughput:
>>
>>           64K       128K      1M
>> SDP      7602.40   7560.57   5791.56
>> BZCOPY   5454.20   6378.48   7316.28
>>
>
> Looks unclear to me. Sometimes BZCOPY does better, sometimes worse.
>
>
Fair enough.
While measuring a broader range of message sizes, I noticed a
large variation in SDP throughput and send service demand
depending on which core/CPU netperf ran on, and in particular on
which CPU netperf ran on relative to the CPU handling the
ib_mthca interrupts.
Netperf's -T option binds the local and remote processes to
specific CPUs, so I used it to force the client and server to run
on CPU 0. I also mapped all ib_mthca interrupts to CPU 1 (irqbalance
was already disabled). This appears to have reduced the variation
between netperf runs to negligible levels. I'll do more runs to
verify this and to check the other permutations, but this is what
has come out so far.
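The interrupt pinning described above can be sketched in Python. This is a minimal sketch, assuming a Linux host where an IRQ's affinity is set by writing a hexadecimal CPU bitmask to /proc/irq/<n>/smp_affinity (root required). The helper names and the sample /proc/interrupts excerpt are hypothetical; binding the netperf/netserver processes themselves is left to netperf's -T option.

```python
import re


def affinity_mask(cpus):
    """Build the hex CPU bitmask string written to /proc/irq/<n>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # bit N set means "CPU N may service this IRQ"
    return format(mask, "x")


def find_irqs(proc_interrupts_text, driver="ib_mthca"):
    """Return the IRQ numbers whose /proc/interrupts line mentions `driver`."""
    irqs = []
    for line in proc_interrupts_text.splitlines():
        m = re.match(r"\s*(\d+):", line)  # numeric IRQ lines only (skips header, NMI, ...)
        if m and driver in line:
            irqs.append(int(m.group(1)))
    return irqs


def pin_irqs(irqs, cpus):
    """Write the affinity mask for each IRQ (Linux only, needs root)."""
    mask = affinity_mask(cpus)
    for irq in irqs:
        with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
            f.write(mask)


if __name__ == "__main__":
    # Hypothetical /proc/interrupts excerpt, for illustration only.
    sample = (
        "           CPU0       CPU1\n"
        " 16:      12345          0   IO-APIC-fasteoi   ib_mthca\n"
        " 17:        678          0   IO-APIC-fasteoi   eth0\n"
    )
    print(find_irqs(sample), affinity_mask([1]))
    # On a real host (as root) one might run:
    # pin_irqs(find_irqs(open("/proc/interrupts").read()), [1])
```

Mapping the interrupts to CPU 1 corresponds to the mask "2"; the netperf side would then be bound with something like "-T 0,0".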
TPUT = throughput (Mbits/sec)
LCL = send service demand (usec/KB)
RMT = recv service demand (usec/KB)
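Throughput and service demand together give the CPU time the transfer actually consumes. A minimal sketch, assuming netperf's conventions that "Mbits" means 10^6 bits and that the KB in usec/KB is 1024 bytes; the function name is illustrative.

```python
def cpus_busy(tput_mbit_s, demand_us_per_kb, kb_bytes=1024):
    """Return CPU-seconds consumed per wall-clock second (CPUs' worth of work)."""
    # Assumption: 1 Mbit = 10**6 bits and 1 KB = kb_bytes bytes.
    kb_per_s = tput_mbit_s * 1e6 / 8 / kb_bytes
    cpu_us_per_s = demand_us_per_kb * kb_per_s  # CPU microseconds spent each second
    return cpu_us_per_s / 1e6


if __name__ == "__main__":
    # Send side with "-T 0,0": SDP at 64K vs. BZCOPY at 4M.
    print(round(cpus_busy(7581.14, 0.746), 2))
    print(round(cpus_busy(7322.63, 0.397), 2))
```

Under these assumptions the SDP sender at 64K keeps about 0.69 of a CPU busy, while BZCOPY at 4M uses about 0.35 at similar throughput.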
"-T 0,0" option given to netperf client:
          SDP                      BZCOPY
          ---------------------    ---------------------
MESGSIZE  TPUT     LCL    RMT      TPUT     LCL    RMT
--------  -------  -----  -----    -------  -----  -----
64K       7581.14  0.746  1.105    5547.66  1.491  1.495
128K      7478.37  0.871  1.116    6429.84  1.282  1.291
256K      7427.38  0.946  1.115    6917.20  1.197  1.201
512K      7310.14  1.122  1.129    7229.13  1.145  1.150
1M        7251.29  1.143  1.129    7457.95  0.996  1.109
2M        7249.27  1.146  1.133    7340.26  0.502  1.105
4M        7217.26  1.156  1.136    7322.63  0.397  1.096
In this case, BZCOPY's send service demand is much lower at the
largest message sizes (0.397 vs. 1.156 usec/KB at 4M), though
throughput for large messages is not very different.
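The comparison in the "-T 0,0" table can be made explicit by taking BZCOPY's send service demand as a fraction of SDP's at each message size. This is a small sketch over the numbers copied from that table; the helper names are illustrative.

```python
# Send service demand (LCL, usec/KB) with "-T 0,0", copied from the table above.
SIZES = ["64K", "128K", "256K", "512K", "1M", "2M", "4M"]
SDP_LCL = [0.746, 0.871, 0.946, 1.122, 1.143, 1.146, 1.156]
BZCOPY_LCL = [1.491, 1.282, 1.197, 1.145, 0.996, 0.502, 0.397]


def demand_ratios(sdp, bz):
    """BZCOPY send CPU cost per KB as a fraction of SDP's (< 1.0 means BZCOPY is cheaper)."""
    return [b / s for s, b in zip(sdp, bz)]


def crossover(sizes, ratios):
    """First message size at which BZCOPY's send service demand drops below SDP's."""
    for size, r in zip(sizes, ratios):
        if r < 1.0:
            return size
    return None


if __name__ == "__main__":
    ratios = demand_ratios(SDP_LCL, BZCOPY_LCL)
    for size, r in zip(SIZES, ratios):
        print(f"{size:>5}: {r:.2f}")
    print("crossover:", crossover(SIZES, ratios))
```

With these numbers the crossover is at 1M, and at 4M BZCOPY needs roughly a third of SDP's send CPU time per KB.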
However, with "-T 2,2", the result looks like this:
          SDP                      BZCOPY
          ---------------------    ---------------------
MESGSIZE  TPUT     LCL    RMT      TPUT     LCL    RMT
--------  -------  -----  -----    -------  -----  -----
64K       7599.40  0.841  1.114    5493.56  1.510  1.585
128K      7556.53  1.039  1.121    6483.12  1.274  1.325
256K      7155.13  1.128  1.180    6996.30  1.180  1.220
512K      5984.26  1.357  1.277    7285.86  1.130  1.166
1M        5641.28  1.443  1.343    7250.43  0.811  1.141
2M        5657.98  1.439  1.387    7265.85  0.492  1.127
4M        5623.94  1.447  1.370    7274.43  0.385  1.112
For BZCOPY the results are largely similar, but for SDP the
service demands are much higher and throughput drops by about
22% at 1M and above relative to "-T 0,0".
In either case, though, BZCOPY has the lower send service demand
for large messages (1M and up with "-T 0,0", 512K and up with
"-T 2,2").
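The effect of moving from "-T 0,0" to "-T 2,2" can be quantified from the two tables: the percentage change in SDP throughput at each size, and the first size at which BZCOPY's send service demand falls below SDP's under "-T 2,2". This is a small sketch over the posted numbers; the helper names are illustrative.

```python
SIZES = ["64K", "128K", "256K", "512K", "1M", "2M", "4M"]
# SDP throughput (Mbit/s) under each CPU binding, copied from the two tables above.
SDP_TPUT_T00 = [7581.14, 7478.37, 7427.38, 7310.14, 7251.29, 7249.27, 7217.26]
SDP_TPUT_T22 = [7599.40, 7556.53, 7155.13, 5984.26, 5641.28, 5657.98, 5623.94]
# Send service demand (LCL, usec/KB) under "-T 2,2".
SDP_LCL_T22 = [0.841, 1.039, 1.128, 1.357, 1.443, 1.439, 1.447]
BZCOPY_LCL_T22 = [1.510, 1.274, 1.180, 1.130, 0.811, 0.492, 0.385]


def pct_change(before, after):
    """Percentage change from `before` to `after`, element by element."""
    return [100.0 * (a / b - 1.0) for b, a in zip(before, after)]


def first_cheaper(sizes, baseline, candidate):
    """First size at which `candidate` has a lower service demand than `baseline`."""
    for size, base, cand in zip(sizes, baseline, candidate):
        if cand < base:
            return size
    return None


if __name__ == "__main__":
    for size, change in zip(SIZES, pct_change(SDP_TPUT_T00, SDP_TPUT_T22)):
        print(f"SDP {size:>5}: {change:+.1f}%")
    print("BZCOPY cheaper from:", first_cheaper(SIZES, SDP_LCL_T22, BZCOPY_LCL_T22))
```

The SDP throughput change is within about 4% up to 256K, then roughly -18% at 512K and -22% from 1M up, while the BZCOPY crossover moves down to 512K.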
Cheers,
Craig