[ofa-general] SDP performance with bzcopy testing help needed

Craig Prescott prescott at hpc.ufl.edu
Wed Feb 13 21:32:03 PST 2008


Scott Weitzenkamp (sweitzen) wrote:
>> But the effect is still clear.
>>
>> throughput:
>>
>>                64K    128K      1M
>>    SDP      7602.40  7560.57  5791.56
>>    BZCOPY   5454.20  6378.48  7316.28
>>     
>
> Looks unclear to me.  Sometimes BZCOPY does better, sometimes worse.
>
>   
Fair enough.

While measuring a broader range of message sizes, I noticed a
big variation in throughput and send service demand for the SDP
case depending on which core/CPU netperf ran on, and in
particular on which CPU netperf ran on relative to the CPU
handling the interrupts for ib_mthca.

Netperf has an option (-T) for binding the local and remote
processes to CPUs.  So I used it to force the client and server to
run on CPU 0.  Further, I mapped all ib_mthca interrupts to CPU 1
(irqbalance was already disabled).  This appears to have reduced the
variation between netperf runs to a negligible amount.  I'll do more
runs to verify this and check out the other permutations, but this is
what has come out so far.
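
As a rough illustration of the interrupt mapping mentioned above: on
Linux, an IRQ is steered to particular CPUs by writing a hexadecimal
CPU bitmask to /proc/irq/<N>/smp_affinity, where bit n stands for
CPU n.  The sketch below only computes that mask.  The helper name
cpu_affinity_mask is made up for this example, it assumes CPU numbers
below 32, and it is not necessarily how the mapping was done for
these runs.

```python
def cpu_affinity_mask(cpus):
    """Return the hex bitmask string for a collection of CPU numbers.

    Bit n of the mask corresponds to CPU n.  Assumes CPU numbers in
    the range 0..31, so the result fits in a single 32-bit group.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError("this sketch only handles CPUs 0..31")
        mask |= 1 << cpu
    return format(mask, "x")


# CPU 1 only, as used for the ib_mthca interrupts in the runs above
print(cpu_affinity_mask([1]))
# CPUs 0 and 2 together
print(cpu_affinity_mask([0, 2]))
```

The resulting string is what would be written (as root) to the
smp_affinity file of each IRQ belonging to the device.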

TPUT = throughput (Mbits/sec)
LCL  = send service demand (usec/KB)
RMT  = recv service demand (usec/KB)

"-T 0,0" option given to netperf client:

                     SDP                   BZCOPY
           --------------------    --------------------
MESGSIZE     TPUT    LCL   RMT      TPUT     LCL   RMT
--------   -------  ----- -----    -------  ----- -----
64K        7581.14  0.746 1.105    5547.66  1.491 1.495
128K       7478.37  0.871 1.116    6429.84  1.282 1.291
256K       7427.38  0.946 1.115    6917.20  1.197 1.201
512K       7310.14  1.122 1.129    7229.13  1.145 1.150
1M         7251.29  1.143 1.129    7457.95  0.996 1.109
2M         7249.27  1.146 1.133    7340.26  0.502 1.105
4M         7217.26  1.156 1.136    7322.63  0.397 1.096

In this case, BZCOPY send service demand is less than half of
SDP's for the largest message sizes (2M and 4M), though the
throughput for messages of 512K and up differs by less than 3%.
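
To make the comparison concrete, here is a small sketch that works
out, for each message size in the "-T 0,0" table, the BZCOPY
throughput difference relative to SDP and the ratio of the send
service demands.  The numbers are copied from the table; the
function and variable names are only for illustration.

```python
# (message size, SDP TPUT, SDP LCL, BZCOPY TPUT, BZCOPY LCL)
# copied from the "-T 0,0" table
ROWS_T00 = [
    ("64K",  7581.14, 0.746, 5547.66, 1.491),
    ("128K", 7478.37, 0.871, 6429.84, 1.282),
    ("256K", 7427.38, 0.946, 6917.20, 1.197),
    ("512K", 7310.14, 1.122, 7229.13, 1.145),
    ("1M",   7251.29, 1.143, 7457.95, 0.996),
    ("2M",   7249.27, 1.146, 7340.26, 0.502),
    ("4M",   7217.26, 1.156, 7322.63, 0.397),
]


def compare(rows):
    """Return (size, throughput delta in %, LCL ratio) per row.

    The delta is BZCOPY relative to SDP; the ratio is BZCOPY send
    service demand divided by SDP send service demand.
    """
    result = []
    for size, sdp_tput, sdp_lcl, bz_tput, bz_lcl in rows:
        tput_delta_pct = 100.0 * (bz_tput - sdp_tput) / sdp_tput
        lcl_ratio = bz_lcl / sdp_lcl
        result.append((size, tput_delta_pct, lcl_ratio))
    return result


for size, delta, ratio in compare(ROWS_T00):
    print(f"{size:>5}  TPUT {delta:+6.1f}%  LCL ratio {ratio:4.2f}")
```

A ratio below 1.0 means BZCOPY needs less sender CPU time per KB
than SDP at that message size.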

However, with "-T 2,2", the result looks like this:

                     SDP                   BZCOPY
           --------------------    --------------------
MESGSIZE     TPUT    LCL   RMT      TPUT     LCL   RMT
--------   -------  ----- -----    -------  ----- -----
64K        7599.40  0.841 1.114    5493.56  1.510 1.585
128K       7556.53  1.039 1.121    6483.12  1.274 1.325
256K       7155.13  1.128 1.180    6996.30  1.180 1.220
512K       5984.26  1.357 1.277    7285.86  1.130 1.166
1M         5641.28  1.443 1.343    7250.43  0.811 1.141
2M         5657.98  1.439 1.387    7265.85  0.492 1.127
4M         5623.94  1.447 1.370    7274.43  0.385 1.112

For BZCOPY, the results are pretty similar to the "-T 0,0" run;
but for SDP, the service demands are higher, and the throughput
for messages of 512K and larger has dropped by roughly 18-22%
relative to "-T 0,0".
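
The size of the SDP drop can be checked with a few lines.  This is
just arithmetic on the SDP TPUT columns of the "-T 0,0" and
"-T 2,2" tables; the names are illustrative only.

```python
# SDP throughput (Mbits/sec) copied from the two tables
SDP_T00 = {"64K": 7581.14, "128K": 7478.37, "256K": 7427.38,
           "512K": 7310.14, "1M": 7251.29, "2M": 7249.27,
           "4M": 7217.26}
SDP_T22 = {"64K": 7599.40, "128K": 7556.53, "256K": 7155.13,
           "512K": 5984.26, "1M": 5641.28, "2M": 5657.98,
           "4M": 5623.94}


def pct_change(before, after):
    """Percentage change going from 'before' to 'after'."""
    return 100.0 * (after - before) / before


for size in SDP_T00:
    change = pct_change(SDP_T00[size], SDP_T22[size])
    print(f"{size:>5}  {change:+6.1f}%")
```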

In either case, though, BZCOPY is more efficient for
large messages.

Cheers,
Craig



