[openib-general] Re: Re: [PATCH] rdma_lat-09 and results
Michael S. Tsirkin
mst at mellanox.co.il
Thu Jun 2 00:12:03 PDT 2005
Quoting r. Gleb Natapov <glebn at voltaire.com>:
> Subject: Re: Re: [PATCH] rdma_lat-09 and results
>
> On Thu, Jun 02, 2005 at 08:29:34AM +0300, Michael S. Tsirkin wrote:
> > Quoting r. Grant Grundler <iod00d at hp.com>:
> > > Subject: [PATCH] rdma_lat-09 and results
> > >
> > > Michael,
> > >
> > > Good news:
> > > My next cleanup of rdma_lat.c is working and patch is appended.
> > > Summary of changes below.
> > >
> > > Bad News:
> > > perf is about 15 cycles slower than the last time I tested.
> > > (Hrm...maybe it's time to cycle power on the TS90 switch again.)
> > >
> > >
> > > Here's with the new rdma_lat.c:
> > > grundler at gsyprf3:/usr/src/openib_gen2/src/userspace/perftest$ ./rdma_lat -C
> > > local address: LID 0x27 QPN 0x80406 PSN 0x9188f7 RKey 0x300434 VAddr 0x6000000000014001
> > > remote address: LID 0x25 QPN 0x70406 PSN 0x5d4824 RKey 0x2a0434 VAddr 0x6000000000014001
> > > Latency typical: 7140 cycles
> > > Latency best : 6915 cycles
> > > Latency worst : 52915.5 cycles
> > > grundler at gsyprf3:/usr/src/openib_gen2/src/userspace/perftest$
> > >
> > > And the "client" side:
> > > grundler at iota:/usr/src/openib_gen2/src/userspace/perftest$ ./rdma_lat -C 10.0.0.51
> > > local address: LID 0x25 QPN 0x70406 PSN 0x5d4824 RKey 0x2a0434 VAddr 0x6000000000014001
> > > remote address: LID 0x27 QPN 0x80406 PSN 0x9188f7 RKey 0x300434 VAddr 0x6000000000014001
> > > Latency typical: 7140 cycles
> > > Latency best : 6907 cycles
> > > Latency worst : 94920 cycles
> > >
> > >
> > > The previous set of rdma_lat results are here:
> > > http://openib.org/pipermail/openib-general/2005-May/006721.html
> > >
> > > I'll guess the previous SVN version was no older than r2229.
> > >
> > >
> > > I get 7140 to 7151 for the original rdma_lat. Usually 7147.5.
> > > I get 7132 to 7155 with my version of rdma_lat. Usually 7140.
> > > No statistically significant differences.
> > > Both essentially agree on the higher result.
> > > Using "-n 10000" gave more consistent results *
> >
> > I changed the timestamping strategy. I used to:
> >
> > post
> > tstamp
> > poll
> > post
> > tstamp
> > poll
> > post
> > tstamp
> > poll
> > post
> > tstamp
> > poll
> >
> > This meant that the tstamp instruction was out of the data path,
> > since it executed while we were polling.
> > On the negative side, although the average (and likely median) delta
> > between tstamps was a reliable measurement of round-trip time
> > (since there was one tstamp per round trip),
> > the min/max values did not measure anything reliably: if I started
> > polling late, two tstamps could end up closer together than the wire allows.
> >
> > So I changed that to:
> >
> >
> > tstamp
> > post
> > poll
> > tstamp
> > post
> > poll
> > tstamp
> > post
> > poll
> > tstamp
> > post
> > poll
> >
> > Now, on the plus side, the min/max deltas are actually pessimistic bounds
> > on the round-trip time; on the minus side, we are calling tstamp on the
> > data path, slowing it down. ~15 cycles seems a bit high: of course tstamp
> > needs to prevent instructions from being reordered across it, so
> > it should take on the order of the pipeline depth to execute, but then
> > maybe it's a microcode thing.
> >
> This is what I recently saw on linux-kernel:
>
> ### begin quote of Andi Kleen
>
> > RDTSC on older Intel CPUs takes something like 6 cycles. On P4's it
> > takes much more, since it's decoded to a microcode MSR access.
>
> It actually seems to flush the trace cache, because Intel figured
> out that out-of-order RDTSC is probably not too useful (which is right),
> and the only way to ensure that on Netburst seems to be to stop the trace
> cache in its tracks. That can be pretty slow; we're talking 1000+ cycles
> here.
> ### end quote
OK, but I imagine that's probably a worst case.
I certainly see nothing like 1 usec of overhead here, and my systems
are P4-based Xeons.
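
For reference, here is a minimal sketch of a serialized cycle-counter read
on x86; it is not code from rdma_lat.c, and the function name, CPUID leaf,
and inline-asm details are my own illustration. It shows why a tstamp that
forbids reordering costs at least a pipeline flush:

    #include <stdint.h>

    /* CPUID drains the pipeline, so the RDTSC that follows it cannot be
     * reordered around the code being timed. That serialization is what
     * costs on the order of the pipeline depth (and far more on Netburst,
     * per the quote above). */
    static inline uint64_t tstamp_serialized(void)
    {
            uint32_t lo, hi;

            asm volatile("cpuid\n\t"
                         "rdtsc"
                         : "=a" (lo), "=d" (hi)
                         : "a" (0)            /* CPUID leaf 0 */
                         : "ebx", "ecx");     /* CPUID clobbers these */
            return ((uint64_t)hi << 32) | lo;
    }
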
> > I'm not against going back to the previous measurement, but we'd have to
> > give up the min/max reporting, since with that ordering it's just an
> > artefact of when we start polling.
> > What do you say?
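
To make the tradeoff concrete, here is a minimal C sketch of the two
orderings. post_one(), poll_one(), and get_cycles() are placeholders for
the real verbs post/poll calls and cycle counter in rdma_lat.c, not the
actual function names:

    #include <stdint.h>

    extern void post_one(void);           /* post one send work request */
    extern void poll_one(void);           /* poll CQ until one completion */
    extern uint64_t get_cycles(void);     /* read the cycle counter */

    void measure(uint64_t *tstamp, int iters)
    {
            int i;

            /* Old ordering: tstamp taken after post, off the data path.
             * tstamp[i+1] - tstamp[i] is one round trip on average, but a
             * single delta can come out shorter than the wire allows if we
             * started polling late in the previous iteration. */
            for (i = 0; i < iters; ++i) {
                    post_one();
                    tstamp[i] = get_cycles();
                    poll_one();
            }

            /* New ordering: tstamp taken before post, on the data path.
             * Each delta now covers post + round trip + poll, so min/max
             * deltas are honest (pessimistic) bounds, at the cost of adding
             * the tstamp overhead to every measured iteration. */
            for (i = 0; i < iters; ++i) {
                    tstamp[i] = get_cycles();
                    post_one();
                    poll_one();
            }
    }
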
--
MST