[openib-general] Re: [PATCH] rdma_lat-09 and results

Wed Jun 1 22:29:34 PDT 2005

Quoting r. Grant Grundler <iod00d at hp.com>:
> Subject: [PATCH] rdma_lat-09 and results
> 
> Michael,
> 
> Good news:
> 	My next cleanup of rdma_lat.c is working and patch is appended.
> 	Summary of changes below.
> 
> Bad News:
> 	perf is about ~15 cycles slower since the last time I tested.
> 	(Hrm...maybe it's time to cycle power on the TS90 switch again.)
> 
> 
> Here's with the new rdma_lat.c:
> grundler at gsyprf3:/usr/src/openib_gen2/src/userspace/perftest$ ./rdma_lat  -C
>    local address: LID 0x27 QPN 0x80406 PSN 0x9188f7 RKey 0x300434 VAddr 0x6000000000014001
>   remote address: LID 0x25 QPN 0x70406 PSN 0x5d4824 RKey 0x2a0434 VAddr 0x6000000000014001
> Latency typical: 7140 cycles
> Latency best   : 6915 cycles
> Latency worst  : 52915.5 cycles
> grundler at gsyprf3:/usr/src/openib_gen2/src/userspace/perftest$ 
> 
> And the "client" side:
> grundler at iota:/usr/src/openib_gen2/src/userspace/perftest$ ./rdma_lat -C 10.0.0.51
>    local address: LID 0x25 QPN 0x70406 PSN 0x5d4824 RKey 0x2a0434 VAddr 0x6000000000014001
>   remote address: LID 0x27 QPN 0x80406 PSN 0x9188f7 RKey 0x300434 VAddr 0x6000000000014001
> Latency typical: 7140 cycles
> Latency best   : 6907 cycles
> Latency worst  : 94920 cycles
> 
> 
> The previous set of rdma_lat results are here:
>     http://openib.org/pipermail/openib-general/2005-May/006721.html
> 
> I'll guess the previous SVN version was no older than r2229.
> 
> 
> I get 7140 to 7151 for the original rdma_lat.   Usually 7147.5.
> I get 7132 to 7155 with my version of rdma_lat. Usually 7140.
> No statistically significant differences.
> Both essentially agree on the higher result.
> Using "-n 10000" gave more consistent results *
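
The typical/best/worst lines can be read as order statistics over the per-iteration timestamp deltas. Below is a minimal sketch of that reading. It assumes that "typical" means the median and that each round-trip delta is halved to give a one-way figure (a half cycle such as 52915.5 is consistent with a division by two). `lat_summarize()` and `struct lat_report` are hypothetical names, not the actual rdma_lat.c code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical summary of one latency run; field names are illustrative. */
struct lat_report {
    double typical; /* median round-trip delta, halved (assumption) */
    double best;    /* smallest delta, halved */
    double worst;   /* largest delta, halved */
};

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/*
 * Turn n timestamps (one per iteration) into n-1 deltas, sort them,
 * and report median/min/max, each divided by 2.
 * Returns 0 on success, -1 if the delta array cannot be allocated.
 */
static int lat_summarize(const long *tstamp, size_t n, struct lat_report *out)
{
    assert(n >= 2);
    size_t nd = n - 1;
    long *delta = malloc(nd * sizeof *delta);
    if (!delta)
        return -1;
    for (size_t i = 0; i < nd; i++)
        delta[i] = tstamp[i + 1] - tstamp[i];
    qsort(delta, nd, sizeof *delta, cmp_long);
    out->typical = delta[nd / 2] / 2.0;
    out->best = delta[0] / 2.0;
    out->worst = delta[nd - 1] / 2.0;
    free(delta);
    return 0;
}
```

Sorting gives the median directly, and best and worst are simply the two ends of the sorted array; the O(n log n) cost is negligible next to the run itself.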

I changed the timestamping strategy. I used to:

post
tstamp
poll
post
tstamp
poll
post
tstamp
poll
post
tstamp
poll
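
A toy model makes the old ordering concrete. Everything below is hypothetical. A simulated cycle counter stands in for the real one. `sim_post()`, `sim_poll()` and `sim_tstamp()` stand in for posting the RDMA write, polling for the reply and reading the counter. The peer is assumed to answer exactly `rtt` cycles after each post.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not real verbs code. */
struct sim {
    long now;         /* simulated cycle counter */
    long reply_at;    /* cycle at which the pending reply lands */
    long rtt;         /* assumed fixed wire round-trip time, in cycles */
    long tstamp_cost; /* cycles consumed by one timestamp read */
};

static void sim_post(struct sim *s) { s->reply_at = s->now + s->rtt; }

static void sim_poll(struct sim *s)
{
    if (s->now < s->reply_at) /* busy-wait until the reply arrives */
        s->now = s->reply_at;
}

static long sim_tstamp(struct sim *s)
{
    long t = s->now;
    s->now += s->tstamp_cost; /* the read itself takes time */
    return t;
}

/* Old ordering: post, tstamp, poll.  The timestamp is read while the
 * message is already in flight, so its cost hides inside the poll wait. */
static void run_post_tstamp_poll(struct sim *s, long *tstamp, size_t iters)
{
    assert(tstamp != NULL);
    for (size_t i = 0; i < iters; i++) {
        sim_post(s);
        tstamp[i] = sim_tstamp(s);
        sim_poll(s);
    }
}
```

With `rtt = 7000` and `tstamp_cost = 15` (illustrative values only), every delta between consecutive timestamps comes out as exactly 7000. The timestamp read overlaps the wire time and never shows up in the measurement.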

This meant that the tstamp instruction was out of the data path:
it executed while the message was in flight, before we started polling.
On the negative side, although the average (and likely the median) delta
between tstamps was a reliable measurement of round-trip time
(since there was one tstamp per round trip),
the min/max values did not reliably measure anything: if I start polling
late, two tstamps can end up closer together than the wire allows.
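
The same kind of toy model shows the min/max artefact. The sketch assumes three things: a single stall (for example an interrupt) between the post and the timestamp in one iteration, a zero-cost timestamp, and a fixed wire round trip. All names are hypothetical. Because the deltas telescope, their sum is always the last timestamp minus the first, so the average is untouched. The late timestamp, however, lengthens one delta and shortens the next by the same amount.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: simulated clock, peer replies `rtt` cycles after a post. */
struct sim { long now, reply_at, rtt; };

/* Old ordering (post, tstamp, poll) with one hypothetical stall of
 * `stall` cycles between the post and the tstamp in iteration `late`. */
static void run_old_with_stall(struct sim *s, long *tstamp, size_t iters,
                               size_t late, long stall)
{
    for (size_t i = 0; i < iters; i++) {
        s->reply_at = s->now + s->rtt; /* post */
        if (i == late)
            s->now += stall;           /* stall before the tstamp */
        tstamp[i] = s->now;            /* tstamp (cost taken as zero) */
        if (s->now < s->reply_at)      /* poll */
            s->now = s->reply_at;
    }
}

/* Smallest delta between consecutive timestamps. */
static long min_delta(const long *t, size_t n)
{
    assert(n >= 2);
    long m = t[1] - t[0];
    for (size_t i = 2; i < n; i++)
        if (t[i] - t[i - 1] < m)
            m = t[i] - t[i - 1];
    return m;
}

/* Average delta: the sum telescopes to (last - first). */
static double mean_delta(const long *t, size_t n)
{
    assert(n >= 2);
    return (double)(t[n - 1] - t[0]) / (double)(n - 1);
}
```

With `rtt = 7000`, 8 iterations and a 3000-cycle stall in iteration 3, the mean stays at 7000 while the minimum drops to 4000, well below what the wire allows.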

So I changed that to:

tstamp
post
poll
tstamp
post
poll
tstamp
post
poll
tstamp
post
poll
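
Running the new ordering through the toy model shows the trade-off. The same stall is placed just before the timestamp. As before, the model and its names are hypothetical. `tstamp_cost` is an illustrative per-read cost, set to 15 in the usage below only to echo the figure discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: simulated clock, peer replies `rtt` cycles after a post. */
struct sim { long now, reply_at, rtt, tstamp_cost; };

/* New ordering (tstamp, post, poll).  A stall just before the tstamp in
 * iteration `late` now delays the post too, so it can only lengthen a
 * delta.  Each timestamp read costs tstamp_cost cycles on the data path. */
static void run_new_with_stall(struct sim *s, long *tstamp, size_t iters,
                               size_t late, long stall)
{
    for (size_t i = 0; i < iters; i++) {
        if (i == late)
            s->now += stall;           /* stall before the tstamp */
        tstamp[i] = s->now;            /* tstamp ... */
        s->now += s->tstamp_cost;      /* ... and its cost */
        s->reply_at = s->now + s->rtt; /* post */
        if (s->now < s->reply_at)      /* poll */
            s->now = s->reply_at;
    }
}

/* Smallest delta between consecutive timestamps. */
static long min_delta(const long *t, size_t n)
{
    assert(n >= 2);
    long m = t[1] - t[0];
    for (size_t i = 2; i < n; i++)
        if (t[i] - t[i - 1] < m)
            m = t[i] - t[i - 1];
    return m;
}
```

With `rtt = 7000`, `tstamp_cost = 15` and a 3000-cycle stall in iteration 3, the stalled delta grows to 10015 and no delta falls below 7015. The stall can only make the numbers worse, so the minimum stays a pessimistic bound, but every delta now includes the timestamp cost.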

And now, on the plus side, the min/max deltas are actually pessimistic about
round-trip times; on the minus side, we are calling tstamp on the datapath,
slowing it down. ~15 cycles is a bit high: of course tstamp
needs to prevent instructions from being reordered across it, so
it should take on the order of the pipeline depth to perform, but
maybe it's a microcode thing.
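
One way to check how much a timestamp read itself costs is to take two readings back to back and keep the smallest difference. The sketch below uses POSIX `clock_gettime(CLOCK_MONOTONIC)` as a portable stand-in. It therefore reports nanoseconds for that call, not cycles for the tstamp instruction discussed here, and should be treated only as an illustration of the method. `now_ns()` and `min_timestamp_overhead_ns()` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Read CLOCK_MONOTONIC as a single nanosecond count. */
static long long now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* silence unused warning when NDEBUG is set */
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Estimate the cost of one timestamp read from `samples` pairs of
 * back-to-back readings, keeping the smallest difference seen. */
static long long min_timestamp_overhead_ns(size_t samples)
{
    assert(samples > 0);
    long long best = -1;
    for (size_t i = 0; i < samples; i++) {
        long long a = now_ns();
        long long b = now_ns();
        if (best < 0 || b - a < best)
            best = b - a;
    }
    return best;
}
```

Keeping the minimum rather than the mean discards pairs that were interrupted, since a disturbance can only add time between the two reads.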

I'm not against going back to the previous measurement, but we'd have to
give up the min/max reporting, since it's an artefact.
What do you say?

-- 
MST