[openib-general] user-mode verbs on Itanium

Grant Grundler iod00d at hp.com
Mon May 9 08:57:38 PDT 2005


On Sun, May 08, 2005 at 03:49:16PM +0300, Michael S. Tsirkin wrote:
> Besides the new flags, I noticed the following changes:
> - asm/timex.h usage to get cycles.
>   Unfortunately not portable to all platforms:
>   on x86_64 asm/timex.h includes linux/config so its not legal
>   for userspace to include. Please just implement get_cycles instead.

That's a bug in the x86_64 header file. x86_64 doesn't need config.h.
(as noted below, ppc doesn't either)

But i386 does need config.h since TSC wasn't implemented until pentium
boxes came around. Debian happens to do something that works:
"linux/config.h" and "linux/autoconf.h" are installed in /usr/include/
and based on some standard kernel they build. So it "works for me"
and feels like the right answer.

IMHO it's just obviously stupid to replicate get_cycles in a private file
when it's required (and available) in existing the arch specific header files.

> - get_cpu_mhz() instead of get_cpu_khz() - is that important?
>   Actually I planned to change get_cpu_khz() to cpu_khz to match
>   linux in-kernel interface. Does this make sense to you?

No. Only i386 defines cpu_khz. AFAICT, no other arch has ever defined it.
(I'm assuming x86_64 is copying i386 so it's compatible.)

I was just reflecting what the code actually was doing: reading Mhz
from /proc/cpuinfo. That makes alot more sense to me.

> > I'm consistently getting 4.74 usec median latency with 2.6.11 kernel +
> > svn r2229 on HP ZX1 platform (PCI-X) + 1.5Ghz/6M Madison processors.
> 
> Thats a bit higher latency than what I see with Intel, but at least
> in the right ballbark.


Ok. Not too surprising since the ZX1 chipset has two bridges
between PCI-X device and memory. What values did you see?
(which chipset/kernel/compiler too please)

...
> > > /*
> > >  * pp_get_local_lid() uses a pretty bogus method for finding the LID
> > >  * of a local port.  Please don't copy this into your app (or if you
> > >  * do, please rip it out soon).
> > >  */
> 
> ibv_pingpong used to have the same comment too.
> I'll go back and look at ibv_pingpong.

ok - thanks.

> > This is still outstanding. But I'd like to first see perftest-02
> > land in a sane place in openib.org Subversion tree.
> 
> I could just copy stuff to say userspace/perftests.
> Is that OK with everyone?

Sure.

> > I can then submit patches against some stuff:
> > 	o update the README with notes on how to use/interpret the data
> > 	o stop replicating code and make subroutines
> 
> I considered this but I'm afraid that adding another layer on top of
> libibverbs would deduct from readability, and cause people
> to copy it wholesale, something which I would like to avoid.

I think you are reading more into this than I intended.
I was thinking trivial things like the loop that dumps the values of delta[].

BTW, copying code wholesale is the whole point behind "Open Source".
One measure of success is if people in fact do copy the code.

> > 	o split up main() into bite-sized chunks so people
> > 	  know which part is "initialization" and "run time".
> 
> I guess adding more comments like /* Initialization */ is the way to go.

That might help, but it's not addressing the basic problem.
People retrofitting an existing application will need at least
three different peices of code: subsystem initialization, connection
setup/tear down, and runtime communication. Just making those three
code blocks subroutines is NOT creating a new layer.


> 
> > 	o stop pretending there is no global state and get major
> > 	  variables off the stack and into .bss or respective
> > 	  subroutines.
> 
> Why is that good? Passing flags in global parameters would
> make the code less readable.

I agree. But pulling out global state (relative to the program) 
is a first step to figuring out what needs to be a parameter
and which variables are just local to certain bits of code.
That's just my methodology for cleaning up stuff like this
and not the end point.

IMHO, the current code isn't readable either.
main() has 268 lines and  22 (in my version, more in the original)
local variables. Of those 22 variables, some are really local
and not part of the global state but we can't easily tell which.

...
> > Another clue about this failure: I'm only seeing this if I specify "-s".
> > 1,2,4,8,16,24,25,26,27,28 all worked.  All the values > 29 failed.
> > (I tried 29, 30, 31, 32, 33, 64, 65, 4096, 8192).
...
> I think thats because when I wrote the test qp attribute max_inline didnt
> exist. I'll update the code.

Ah good - I'm glad it's obvious to you since it sure wasnt' to me. :^)

> :)
> I'll merge the -U code, I see how it can be useful.

thanks!

...
> > That first sample on the server represents a tuning opportunity.
> > A ~10x difference between startup and runtime is significant
> > for short lived connections/regions.
> 
> On the other hand, it could be negligible compared to connection
> setup time.

hrm...true. We should measure and report that too.
A script could iterate the invocation with -n 1 to collect
a reasonable sample.

> OK, so I see 3 changes there:
> 
> 1. support for -U and -c
>    I'll merge that

thanks!

> 2. get_clock in assembly replaced with asm/timex.h:
>    Unfortunately asm/timex.h was never intended for userspace,

Exact copies from linux/include/asm* are not intended for userspace.
But arch specific kernel headers are required to interact with the
kernel on any given platform (e.g. syscalls). They just need to be
"adjusted" by distro's so they are suitable for userspace consumption.

>    so this trick doesnt work on all platforms: specifically its
>    broken on ppc,i386 and x86_64, so ppc64 and ia64 where it does
>    work are kind of in the minority.

x86_64 and ppc don't need to include config.h.
I'll submit patches to fix this.

i386 is the only one that has a reason to do so and I think one
right answer is for the distro to include config.h/autoconf.h.

>    And given that even these may be broken on some distros,
>    I suggest we simply add an assembly implementation for now,
>    rather than add this dependency.

I suggest define get_cycles() only for i386 and let everyone else use timex.h.

> Grant, you mentioned some other fixes - what are they?
> Maybe I'll notice them when I do the actual merge.

Sorry - I made alot of changes to certain chunks of code and don't
recall details.  Main ones were off-by-one errors and the failure to
initialize the first element of tstamp[].

thanks,
grant



More information about the general mailing list