[openib-general] Re: Re: Userspace testing results (many kernels, many svn trees)

Roland Dreier rdreier at cisco.com
Mon Jan 23 21:52:46 PST 2006


    Michael> Could the high/low bits be swapped?  What happens if you
    Michael> change cycles_t from long long to long?  Could you try
    Michael> running the clock_test utility?

What seems to be happening is that mftb is giving the low 32 bits of
the timebase (as expected on ppc32).  Since your get_cycles() is
returning a long long, those 32 bits end up in the most significant
32 bits of the return value, and the low 32 bits are garbage, namely
whatever happened to be left in r4 (ppc is big endian).

If I compile clock_test for ppc32, I see that get_cycles() compiles to:

	1000064c <get_cycles>:
	1000064c:	7c 6c 42 e6 	mftb    r3
	10000650:	4e 80 00 20 	blr

For comparison, a function like

	unsigned long long blah(void) { return 0x100000002ull; }

compiles to

	00000000 <blah>:
	   0:	38 60 00 01 	li      r3,1
	   4:	38 80 00 02 	li      r4,2
	   8:	4e 80 00 20 	blr

In other words the convention on ppc32 is that unsigned long long
return values have the high 32 bits in r3 and the low 32 bits in r4.

I think you want to use something like

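	typedef unsigned long long cycles_t;

	/* ppc32: read the 64-bit timebase as two 32-bit halves */
	static inline cycles_t get_cycles(void)
	{
		unsigned long low, hi, hi2;

		do {
			asm volatile ("mftbu %0" : "=r" (hi));   /* upper 32 bits */
			asm volatile ("mftb  %0" : "=r" (low));  /* lower 32 bits */
			asm volatile ("mftbu %0" : "=r" (hi2));  /* upper 32 bits again */
		} while (hi != hi2);  /* retry if the low word carried in between */

		return ((unsigned long long) hi << 32) | low;
	}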

for ppc32.  However, this is not quite enough to make things work on
all powerpc systems, because the timebase does not necessarily run at
the same speed as the CPU.  For example, on an IBM JS20 blade,
clock_test prints

	1 sec = 6536.8 usec
	1 sec = 6537.05 usec

(both as a 32-bit and 64-bit executable) because, as /proc/cpuinfo shows:

	processor	: 0
	cpu		: PPC970FX, altivec supported
	clock		: 2194.624509MHz
	revision	: 3.0
	
	processor	: 1
	cpu		: PPC970FX, altivec supported
	clock		: 2194.624509MHz
	revision	: 3.0
	
	timebase	: 14318000
	machine		: CHRP IBM,8842-P2C

the timebase runs at about 14.3 MHz, or approximately 153 times slower
than the CPU clock (2194.6 MHz / 14.318 MHz).

I'm not sure how you want to fix this in perftest.

 - R.


