[openib-general] Re: Re: Userspace testing results (many kernels, many svn trees)

Michael S. Tsirkin mst at mellanox.co.il
Tue Jan 24 08:36:30 PST 2006


Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [openib-general] Re: Re: Userspace testing results (many kernels, many svn trees)
> 
>     Michael> Could the high/low bits be swapped?  What happens if you
>     Michael> change cycles_t from long long to long?  Could you try
>     Michael> running the clock_test utility?
> 
> What seems to be happening is that mftb is giving the low 32 bits of
> the timebase (as expected on ppc32).  Since your get_cycles() is
> returning a long long, those 32 bits get put in the most significant
> 32 bits of the return value, and the low 32 bits are garbage (ppc is
> big endian).
> 
> If I compile clock_test for ppc32, I see that get_cycles() compiles to:
> 
> 	1000064c <get_cycles>:
> 	1000064c:	7c 6c 42 e6 	mftb    r3
> 	10000650:	4e 80 00 20 	blr
> 
> For comparison, a function like
> 
> 	unsigned long long blah(void) { return 0x100000002ull; }
> 
> compiles to
> 
> 	00000000 <blah>:
> 	   0:	38 60 00 01 	li      r3,1
> 	   4:	38 80 00 02 	li      r4,2
> 	   8:	4e 80 00 20 	blr
> 
> In other words the convention on ppc32 is that unsigned long long
> return values have the high 32 bits in r3 and the low 32 bits in r4.
> 
> I think you want to use something like
> 
> 	typedef unsigned long long cycles_t;
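> 	static inline cycles_t get_cycles(void)
> 	{
> 		unsigned long low, hi, hi2;
> 	
> 		/* Re-read the upper half until it is stable, so that a carry
> 		 * out of the lower half between reads cannot tear the value. */
> 		do {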
> 			asm volatile ("mftbu %0" : "=r" (hi));
> 			asm volatile ("mftb  %0" : "=r" (low));
> 			asm volatile ("mftbu %0" : "=r" (hi2));
> 		} while (hi != hi2);
> 	
> 		return ((unsigned long long) hi << 32) | low;
> 	}
> 
> for ppc32.

I'm convinced. I moved cycles_t back to 32 bits.
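
For anyone wondering why the loop above reads the upper half twice, here is a minimal simulation of the rollover case. It is only a sketch that runs on any machine: fake_tb, read_upper(), read_lower(), read_naive() and read_retry() are made-up names standing in for the timebase register and the mftbu/mftb instructions, and the assumption that the counter advances by one on every read is there purely to force a carry between reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 64-bit timebase register. */
static uint64_t fake_tb;

/* Stand-in for mftbu: return the upper 32 bits, then let time advance. */
static uint32_t read_upper(void)
{
	uint32_t v = (uint32_t) (fake_tb >> 32);
	fake_tb++;
	return v;
}

/* Stand-in for mftb on ppc32: return the lower 32 bits, then advance. */
static uint32_t read_lower(void)
{
	uint32_t v = (uint32_t) fake_tb;
	fake_tb++;
	return v;
}

/* Single pass: a carry between the two reads tears the value. */
static uint64_t read_naive(void)
{
	uint32_t hi = read_upper();
	uint32_t low = read_lower();
	return ((uint64_t) hi << 32) | low;
}

/* Retry until the upper half is the same before and after the lower read. */
static uint64_t read_retry(void)
{
	uint32_t low, hi, hi2;

	do {
		hi = read_upper();
		low = read_lower();
		hi2 = read_upper();
	} while (hi != hi2);

	return ((uint64_t) hi << 32) | low;
}

/* Start one tick before the lower half wraps and compare both readers. */
static void demo_rollover(void)
{
	fake_tb = 0xFFFFFFFFull;
	assert(read_naive() == 0);		/* torn: appears to jump backwards */

	fake_tb = 0xFFFFFFFFull;
	assert(read_retry() == 0x100000003ull);	/* consistent after one retry */
}
```

In the simulation the single-pass reader returns 0 even though the counter started at 0xFFFFFFFF, while the retry loop notices the changed upper half and reads again.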

> However, this is not quite enough to make things work on
> all powerpc systems, because the timebase does not necessarily run at
> the same speed as the CPU.  For example, on an IBM JS20 blade,
> clock_test prints
> 
> 	1 sec = 6536.8 usec
> 	1 sec = 6537.05 usec
> 
> (both as a 32-bit and 64-bit executable) because, as /proc/cpuinfo shows:
> 
> 	processor	: 0
> 	cpu		: PPC970FX, altivec supported
> 	clock		: 2194.624509MHz
> 	revision	: 3.0
> 	
> 	processor	: 1
> 	cpu		: PPC970FX, altivec supported
> 	clock		: 2194.624509MHz
> 	revision	: 3.0
> 	
> 	timebase	: 14318000
> 	machine		: CHRP IBM,8842-P2C
> 
> the timebase runs at about 14.3 MHz, or approx 153 times slower than
> the CPU clock.
> 
> I'm not sure how you want to fix this in perftest.

I just added some cycle calibration code to get_cpu_mhz().
Check it out (you can just run clock_test).
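
Roughly, the idea of calibrating is to count ticks over an interval whose length is known from the system clock, instead of trusting the CPU clock from /proc/cpuinfo. The following is a sketch of that idea only, not the code that went into get_cpu_mhz(). The names read_ticks(), now_usec(), estimate_ticks_per_usec() and ticks_to_usec() are invented for the example, and on anything other than 64-bit powerpc it assumes a nanosecond clock as a stand-in tick source so that it still builds and runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

typedef uint64_t cycles_t;

/* Hypothetical tick source: the timebase on 64-bit powerpc, otherwise a
 * nanosecond counter used only as a stand-in for this sketch. */
static cycles_t read_ticks(void)
{
#if defined(__powerpc64__)
	cycles_t t;
	__asm__ __volatile__ ("mftb %0" : "=r" (t));
	return t;
#else
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (cycles_t) ts.tv_sec * 1000000000ull + (cycles_t) ts.tv_nsec;
#endif
}

/* Monotonic wall time in microseconds. */
static double now_usec(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/* Sleep for about interval_usec and return the measured ticks per usec. */
static double estimate_ticks_per_usec(long interval_usec)
{
	struct timespec delay;
	double t0, t1;
	cycles_t c0, c1;

	assert(interval_usec > 0);
	delay.tv_sec = interval_usec / 1000000;
	delay.tv_nsec = (interval_usec % 1000000) * 1000;

	t0 = now_usec();
	c0 = read_ticks();
	nanosleep(&delay, NULL);
	c1 = read_ticks();
	t1 = now_usec();

	/* Divide by the time that actually elapsed, not the time requested. */
	return (double) (c1 - c0) / (t1 - t0);
}

/* Convert a tick count to microseconds using the calibrated rate. */
static double ticks_to_usec(cycles_t ticks, double ticks_per_usec)
{
	return (double) ticks / ticks_per_usec;
}
```

With the JS20 numbers quoted above, a timebase of 14318000 Hz is 14.318 ticks per usec, so ticks_to_usec(14318000, 14.318) comes out at about one million usec. A caller would run estimate_ticks_per_usec() once at startup, for example over 100000 usec, and reuse the result for every later conversion.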

-- 
MST


