[openib-general] Re: user-mode verbs on Itanium

Michael S. Tsirkin mst at mellanox.co.il
Mon May 9 09:50:22 PDT 2005


Hello, Grant!
Thanks for comments on the perftest. I think fixed all bug you found.

I just did

>svn mv https://openib.org/svn/trunk/contrib/mellanox/perftest https://openib.org/svn/gen2/trunk/src/userspace
Committed revision 2285.

so we'll be able to exchange patches in the regular way.

Quoting r. Grant Grundler <iod00d at hp.com>:
> Subject: Re: user-mode verbs on Itanium
> > > I can then submit patches against some stuff:
> > > 	o update the README with notes on how to use/interpret the data
> > > 	o stop replicating code and make subroutines
> > 
> > I considered this but I'm afraid that adding another layer on top of
> > libibverbs would deduct from readability, and cause people
> > to copy it wholesale, something which I would like to avoid.
> 
> I think you are reading more into this than I intended.
> I was thinking trivial things like the loop that dumps the values of delta[].
> 
> BTW, copying code wholesale is the whole point behind "Open Source".
> One measure of success is if people in fact do copy the code.

Right but why dont we put it in the core then?

> > > 	o split up main() into bite-sized chunks so people
> > > 	  know which part is "initialization" and "run time".
> > 
> > I guess adding more comments like /* Initialization */ is the way to go.
> 
> That might help, but it's not addressing the basic problem.
> People retrofitting an existing application will need at least
> three different peices of code: subsystem initialization, connection
> setup/tear down, and runtime communication. Just making those three
> code blocks subroutines is NOT creating a new layer.

But once these subroutines are moved into a library, it sure
begins to look like one.

> > > 	o stop pretending there is no global state and get major
> > > 	  variables off the stack and into .bss or respective
> > > 	  subroutines.
> > 
> > Why is that good? Passing flags in global parameters would
> > make the code less readable.
> 
> I agree. But pulling out global state (relative to the program) 
> is a first step to figuring out what needs to be a parameter
> and which variables are just local to certain bits of code.
> That's just my methodology for cleaning up stuff like this
> and not the end point.

I see how separating things to subroutines would help here,
but making things global will just hide who uses what, IMO.

> IMHO, the current code isn't readable either.
> main() has 268 lines and  22 (in my version, more in the original)
> local variables. Of those 22 variables, some are really local
> and not part of the global state but we can't easily tell which.

Its a work in progress :)

I'm not sure what do you call "global state". I am trying to think
of an application that has many connections, global state would
be something common to all of them?
The original idea was to keep the connection state stuff inside the context
structure. Maybe I departed from that.

> ...
> > > Another clue about this failure: I'm only seeing this if I specify "-s".
> > > 1,2,4,8,16,24,25,26,27,28 all worked.  All the values > 29 failed.
> > > (I tried 29, 30, 31, 32, 33, 64, 65, 4096, 8192).
> ...
> > I think thats because when I wrote the test qp attribute max_inline didnt
> > exist. I'll update the code.
> 
> Ah good - I'm glad it's obvious to you since it sure wasnt' to me. :^)
> 
> > :)
> > I'll merge the -U code, I see how it can be useful.
> 
> thanks!

You'll need my two patches I sent previously to make it work though.
As a work around, bump max_send_sge a bit.

> ...
> > > That first sample on the server represents a tuning opportunity.
> > > A ~10x difference between startup and runtime is significant
> > > for short lived connections/regions.
> > 
> > On the other hand, it could be negligible compared to connection
> > setup time.
> 
> hrm...true. We should measure and report that too.
> A script could iterate the invocation with -n 1 to collect
> a reasonable sample.

Problem is, connection setup involves the CM (or the socket kludge),
so its not trivial to define what the overhead is.

> > OK, so I see 3 changes there:
> > 
> > 1. support for -U and -c
> >    I'll merge that
> 
> thanks!

Ok, I checked that in: rev 2283. I also renamed --histogram to --report-all.

> > 2. get_clock in assembly replaced with asm/timex.h:
> >    Unfortunately asm/timex.h was never intended for userspace,
> 
> Exact copies from linux/include/asm* are not intended for userspace.
> But arch specific kernel headers are required to interact with the
> kernel on any given platform (e.g. syscalls). They just need to be
> "adjusted" by distro's so they are suitable for userspace consumption.

IMO get_cycles is never needed to interact with the kernel.
It could have been part of libc or something, but its not.

> >    so this trick doesnt work on all platforms: specifically its
> >    broken on ppc,i386 and x86_64, so ppc64 and ia64 where it does
> >    work are kind of in the minority.
> 
> x86_64 and ppc don't need to include config.h.
> I'll submit patches to fix this.
> 
> i386 is the only one that has a reason to do so and I think one
> right answer is for the distro to include config.h/autoconf.h.
>
> >    And given that even these may be broken on some distros,
> >    I suggest we simply add an assembly implementation for now,
> >    rather than add this dependency.
> 
> I suggest define get_cycles() only for i386 and let everyone else use timex.h.

Sure. Still, I think I'll keep the one-line assembly
till its fixed in most distros :) I added ia64 so everyone can make progress
meanwhile.

BTW, some architectures (notably sparc) define get_cycles to (0).
Need to google for the right answer there.

I also saw this errata for Itanium:

2. AR.ITC returns an incorrect value  Implication: The 64-bit Interval
Timer Counter (AR.ITC) register may return an incorrect value when the
lower 32-bits are all F's. In this case, the value returned in the upper
32-bits is incremented by 1. For
example, when the value returned is 0x1FFFFFFFF, the actual value should
be 0x0FFFFFFFF. Implication: Software that utilizes the AR.ITC register
will receive an incorrect value in this case. Workaround: The workaround
is for software to re-read the AR.ITC register when it detects all F's
in the lower 32-bits.


which linux seems to ignore (possibly because high value of get_cycles
does not matter there).

Does this bug still affect Itanium systems in use out there?
What about Itanium 2?


-- 
MST - Michael S. Tsirkin



More information about the general mailing list