[openib-general] Re: Re: Userspace testing results (many kernels, many svn trees)
Michael S. Tsirkin
mst at mellanox.co.il
Tue Jan 24 08:36:30 PST 2006
Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [openib-general] Re: Re: Userspace testing results (many kernels, many svn trees)
>
> Michael> Could the high/low bits be swapped? What happens if you
> Michael> change cycles_t from long long to long? Could you try
> Michael> running the clock_test utility?
>
> What seems to be happening is that mftb is giving the low 32 bits of
> the timebase (as expected on ppc32). Since your get_cycles() is
> returning a long long, those 32 bits get put in the most significant
> 32 bits of the return value, and the low 32 bits are garbage (ppc is
> big endian).
>
> If I compile clock_test for ppc32, I see that get_cycles() compiles to:
>
> 1000064c <get_cycles>:
> 1000064c: 7c 6c 42 e6 mftb r3
> 10000650: 4e 80 00 20 blr
>
> For comparison, a function like
>
> unsigned long long blah(void) { return 0x100000002ull; }
>
> compiles to
>
> 00000000 <blah>:
> 0: 38 60 00 01 li r3,1
> 4: 38 80 00 02 li r4,2
> 8: 4e 80 00 20 blr
>
> In other words the convention on ppc32 is that unsigned long long
> return values have the high 32 bits in r3 and the low 32 bits in r4.
>
> I think you want to use something like
>
> typedef unsigned long long cycles_t;
> static inline cycles_t get_cycles(void)
> {
> 	unsigned long low, hi, hi2;
>
> 	/* Re-read the upper half until it is stable, so that a carry
> 	 * from the lower half between the reads cannot be missed. */
> 	do {
> 		asm volatile ("mftbu %0" : "=r" (hi));
> 		asm volatile ("mftb %0" : "=r" (low));
> 		asm volatile ("mftbu %0" : "=r" (hi2));
> 	} while (hi != hi2);
>
> 	return ((unsigned long long) hi << 32) | low;
> }
>
> for ppc32.
I'm convinced; I moved it back to 32 bits.
> However, this is not quite enough to make things work on
> all powerpc systems, because the timebase does not necessarily run at
> the same speed as the CPU. For example, on an IBM JS20 blade,
> clock_test prints
>
> 1 sec = 6536.8 usec
> 1 sec = 6537.05 usec
>
> (both as a 32-bit and 64-bit executable) because, as /proc/cpuinfo shows:
>
> processor : 0
> cpu : PPC970FX, altivec supported
> clock : 2194.624509MHz
> revision : 3.0
>
> processor : 1
> cpu : PPC970FX, altivec supported
> clock : 2194.624509MHz
> revision : 3.0
>
> timebase : 14318000
> machine : CHRP IBM,8842-P2C
>
> the timebase runs at about 14.3 MHz, or approx 153 times slower than
> the CPU clock.
>
> I'm not sure how you want to fix this in perftest.
I just added some cycle calibration code to get_cpu_mhz().
Check it out (you can just run clock_test).
--
MST