[openib-general] Re: [PATCH] rdma_lat-09 and results

Gleb Natapov glebn at voltaire.com
Thu Jun 2 00:01:43 PDT 2005


On Thu, Jun 02, 2005 at 08:29:34AM +0300, Michael S. Tsirkin wrote:
> Quoting r. Grant Grundler <iod00d at hp.com>:
> > Subject: [PATCH] rdma_lat-09 and results
> > 
> > Michael,
> > 
> > Good news:
> > 	My next cleanup of rdma_lat.c is working and patch is appended.
> > 	Summary of changes below.
> > 
> > Bad News:
> > 	perf is ~15 cycles slower than the last time I tested.
> > 	(Hrm...maybe it's time to cycle power on the TS90 switch again.)
> > 
> > 
> > Here's with the new rdma_lat.c:
> > grundler at gsyprf3:/usr/src/openib_gen2/src/userspace/perftest$ ./rdma_lat  -C
> >    local address: LID 0x27 QPN 0x80406 PSN 0x9188f7 RKey 0x300434 VAddr 0x6000000000014001
> >   remote address: LID 0x25 QPN 0x70406 PSN 0x5d4824 RKey 0x2a0434 VAddr 0x6000000000014001
> > Latency typical: 7140 cycles
> > Latency best   : 6915 cycles
> > Latency worst  : 52915.5 cycles
> > grundler at gsyprf3:/usr/src/openib_gen2/src/userspace/perftest$ 
> > 
> > And the "client" side:
> > grundler at iota:/usr/src/openib_gen2/src/userspace/perftest$ ./rdma_lat -C 10.0.0.51
> >    local address: LID 0x25 QPN 0x70406 PSN 0x5d4824 RKey 0x2a0434 VAddr 0x6000000000014001
> >   remote address: LID 0x27 QPN 0x80406 PSN 0x9188f7 RKey 0x300434 VAddr 0x6000000000014001
> > Latency typical: 7140 cycles
> > Latency best   : 6907 cycles
> > Latency worst  : 94920 cycles
> > 
> > 
> > The previous set of rdma_lat results are here:
> >     http://openib.org/pipermail/openib-general/2005-May/006721.html
> > 
> > I'd guess the previous SVN version was no older than r2229.
> > 
> > 
> > I get 7140 to 7151 for the original rdma_lat.   Usually 7147.5.
> > I get 7132 to 7155 with my version of rdma_lat. Usually 7140.
> > No statistically significant differences.
> > Both essentially agree on the higher result.
> > Using "-n 10000" gave more consistent results *
> 
> I changed the timestamping strategy. I used to:
> 
> post
> tstamp
> poll
> post
> tstamp
> poll
> post
> tstamp
> poll
> post
> tstamp
> poll
> 
> This meant that the tstamp instruction was off the data path:
> it ran while we were waiting in the poll.
> On the negative side, although the average (and likely median) delta
> between tstamps was a reliable measurement of round trip time
> (since there was one tstamp each roundtrip),
> min/max values were not measuring anything reliably: if I start polling
> late, two tstamps can be closer than what the wire allows for.
> 
> So I changed that to:
> 
> 
> tstamp
> post
> poll
> tstamp
> post
> poll
> tstamp
> post
> poll
> tstamp
> post
> poll
> 
> And now, on the plus side, the min/max deltas are actually pessimistic about
> roundtrip times; on the minus side, we are calling tstamp on the datapath,
> slowing it down. ~15 cycles is a bit high: of course tstamp
> needs to prevent instructions from being reordered across it, and so
> it should take on the order of the pipeline depth to perform, but then
> maybe it's a microcode thing.
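
The difference between the two orderings can be sketched with a small
deterministic model. This is an illustration only, not code from rdma_lat.c:
simulate(), struct delta_range, and the WIRE_RTT, POST_COST and TSTAMP_COST
values are all assumptions made up for the example (TSTAMP_COST is set to 15
only to mirror the figure discussed above). A per-iteration stall stands in
for anything that delays the CPU right after the post.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical costs in simulated cycles; none are measured values. */
enum { WIRE_RTT = 7000, POST_COST = 100, TSTAMP_COST = 15 };

/* Smallest and largest delta between consecutive timestamps. */
struct delta_range {
    long min;
    long max;
};

/*
 * Simulate n ping-pong iterations on a fake clock.
 * stall[i] is an extra CPU delay that hits right after the post.
 * tstamp_first == 0: post, tstamp, poll  (old ordering)
 * tstamp_first != 0: tstamp, post, poll  (new ordering)
 */
struct delta_range simulate(const long *stall, size_t n, int tstamp_first)
{
    struct delta_range r = { LONG_MAX, LONG_MIN };
    long now = 0, prev = 0;

    assert(n >= 2); /* need at least one delta */

    for (size_t i = 0; i < n; i++) {
        long t = 0;

        if (tstamp_first) {          /* tstamp sits on the data path */
            t = now;
            now += TSTAMP_COST;
        }

        now += POST_COST;            /* post */
        long done = now + WIRE_RTT;  /* when the reply can arrive at the earliest */
        now += stall[i];             /* CPU delayed after the post */

        if (!tstamp_first) {         /* tstamp overlaps with the wire time */
            t = now;
            now += TSTAMP_COST;
        }

        if (now < done)              /* poll: wait for the reply */
            now = done;

        if (i > 0) {                 /* delta against the previous tstamp */
            long delta = t - prev;
            if (delta < r.min) r.min = delta;
            if (delta > r.max) r.max = delta;
        }
        prev = t;
    }
    return r;
}
```

With stalls of {0, 0, 3000, 0, 0}, the post-first ordering reports a minimum
of 4100 and a maximum of 10100, so the minimum falls below the modeled
7000-cycle wire time. The tstamp-first ordering reports 7115 for every delta:
never below the wire time, but 15 cycles above the 7100 that the post-first
ordering sees on an undisturbed iteration.
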
This is what I recently saw on linux-kernel:

### begin quote of Andi Kleen

> RDTSC on older Intel CPUs takes something like 6 cycles. On P4's it
> takes much more, since it's decoded to a microcode MSR access.

It actually seems to flush the trace cache, because Intel figured
out that out of order RDTSC is probably not too useful (which is right) 
and the only way to ensure that on Netburst seems to be to stop the trace
cache in its tracks. That can be pretty slow, we're talking 1000+ cycles
here.
### end quote
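
One rough way to check how expensive a timestamp read is on a given machine
is to take two reads back to back and keep the smallest gap seen. The sketch
below assumes a POSIX system and uses clock_gettime(CLOCK_MONOTONIC) rather
than RDTSC, so it reports nanoseconds, not cycles, and it includes the cost of
the library call; now_ns() and tstamp_overhead_ns() are hypothetical helper
names, not part of rdma_lat.c.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <stdint.h>
#include <time.h>

/* Read the monotonic clock as a single nanosecond count. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Smallest gap observed between two back-to-back timestamp reads.
 * Taking the minimum filters out iterations disturbed by interrupts
 * or scheduling. Returns UINT64_MAX if iters <= 0.
 */
uint64_t tstamp_overhead_ns(int iters)
{
    uint64_t best = UINT64_MAX;

    for (int i = 0; i < iters; i++) {
        uint64_t a = now_ns();
        uint64_t b = now_ns();

        if (b - a < best)
            best = b - a;
    }
    return best;
}
```

Calling tstamp_overhead_ns(100000) gives a lower bound on what each
timestamp adds when it sits on the data path; a coarse clock may report 0.
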

> 
> I'm not against going back to the previous measurement, but we'd have to
> give up the min/max reporting since it's an artefact.
> What do you say?
> 
> -- 
> MST

--
			Gleb.


