[ofa-general] New proposal for memory management

Supalov, Alexander alexander.supalov at intel.com
Thu Apr 30 08:03:13 PDT 2009


Hi,

Memory registration caching directly affects application performance. If we put the caching into the kernel, can we guarantee that the algorithms used will be good for all applications? How will one control their parameters at runtime? Will one be able to change the algorithm if necessary?

Best regards.

Alexander 

-----Original Message-----
From: Jeff Squyres [mailto:jsquyres at cisco.com] 
Sent: Thursday, April 30, 2009 4:39 PM
To: Barrett, Brian W
Cc: Roland Dreier (rdreier); OpenFabrics General; Pavel Shamis; Hans Westgaard Ry; Terry Dontje; Lenny Verkhovsky; Håkon Bugge; Donald Kerr; Supalov, Alexander
Subject: Re: [ofa-general] New proposal for memory management

On Apr 29, 2009, at 4:45 PM, Barrett, Brian W wrote:

> If you think this sounds like a hassle, think about what it looks like from
> the point of view of the MPI implementer (or any other developer writing
> libraries which sit between user data and OFED, like GASNet).
>

If you don't care about what pain MPI implementors have to go through  
(and you probably don't ;-) ) -- consider that this is a major  
roadblock to most *anyone* who wants to write to user verbs.

<banging the same old drum>

I heard lots of variations of "Why isn't OFED more popular?" in Sonoma  
this year.  This is at least one big reason why: normal (i.e.,  
non-superhuman) programmers can't write verbs code (IMHO).  MPIs *have* to  
support OpenFabrics -- HPC customers demand it.  But non-HPC customers  
have a clear alternative: they'll just write sockets code.  And the  
price/performance for using sockets over IB/iWARP may or may not be  
attractive depending on the customer's buying capacity.  Hence -- they  
just buy gigE (10gigE, when the price drops low enough).

Doesn't OpenFabrics want to grow beyond MPI?  Woody said that verbs is  
designed to support a billion different things -- outside of MPI and a  
few storage protocols (none of which are widely adopted), how much is  
OFED used?

</banging the same old drum>

> Jeff and I talked for a while today, and we're pretty sure that as long as
> the byte set by the kernel notifier is written before the pages are returned
> to the unallocated list, there isn't actually a race condition.
> [snip]
>
> However, there's still the problem, with the notifier concept, of how the
> kernel reports which pages were given back to it.  It has to pass a
> (potentially very large) amount of data back to the user, so the memory
> ownership issues between kernel and user space are interesting.  It also has
> to somewhat atomically prepare the list and unset the notifier byte, which is
> also problematic.  But probably workable.
>


I feel compelled to amend this: this notifier concept *may be  
workable*, but it's still quite complex for the reasons Brian cited.   
The goal here is to *reduce* complexity, especially for applications/ 
ULPs using the verbs stack.

If we put the registration cache in the network stack, application/ULP  
complexity will be reduced significantly.  My $0.02 is that using a  
notifier solution is still fairly complex and introduces a new set of  
problems.
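A minimal single-threaded sketch of the notifier idea Brian describes, in C11, assuming a hypothetical page shared between kernel and user space (`struct notifier_page`, `MAX_RELEASED`, `sim_kernel_release()` and `user_check_notifier()` are illustrative names, not an existing OFED interface). The simulated kernel side records the released pages and sets the notifier byte *before* the pages would go back to the allocator; the user side pays one byte load on the fast path and, when the byte is set, drains the list and clears the byte. The fixed-size buffer with an overflow flag is one assumed way to cap the "potentially very large" report. The sketch deliberately ignores the hard part: if the kernel appends pages while user space is draining, `count` and the byte can disagree, so a real design would need a lock, a sequence counter, or a ring buffer.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RELEASED 64  /* hypothetical size of the shared report buffer */

/* Hypothetical shared page: the kernel writes it, user space reads it. */
struct notifier_page {
    atomic_uchar dirty;               /* the "notifier byte" */
    size_t count;                     /* valid entries in released[] */
    bool overflow;                    /* more pages than fit: flush everything */
    uintptr_t released[MAX_RELEASED]; /* addresses of pages given back */
};

/* Simulated kernel side: record the pages, then set the byte BEFORE the
 * pages would be returned to the unallocated list. */
static void sim_kernel_release(struct notifier_page *np,
                               const uintptr_t *pages, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (np->count < MAX_RELEASED)
            np->released[np->count++] = pages[i];
        else
            np->overflow = true;
    }
    atomic_store_explicit(&np->dirty, 1, memory_order_release);
    /* ...only now would the pages go back to the page allocator... */
}

typedef void (*invalidate_fn)(uintptr_t page, void *ctx);

/* User side, called before trusting a cached registration.
 * Returns the number of pages invalidated, 0 if nothing changed, or
 * SIZE_MAX if the list overflowed and the whole cache must be flushed.
 * NOT safe against a concurrent kernel writer (see lead-in). */
static size_t user_check_notifier(struct notifier_page *np,
                                  invalidate_fn inv, void *ctx) {
    if (!atomic_load_explicit(&np->dirty, memory_order_acquire))
        return 0; /* fast path: a single byte load */
    size_t n = np->count;
    bool flush_all = np->overflow;
    for (size_t i = 0; i < n && !flush_all; i++)
        inv(np->released[i], ctx);
    np->count = 0;
    np->overflow = false;
    atomic_store_explicit(&np->dirty, 0, memory_order_release);
    return flush_all ? SIZE_MAX : n;
}
```

Even this toy version shows where Brian's concerns land: who owns `released[]`, how big it may get, and how to clear it and the byte together.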

FWIW: Putting the registration cache in the userspace verbs stack  
means that verbs will now have to do the horrid malloc/mmap/etc.  
intercept tricks that MPI implementations currently do.  Take it from  
us -- this is not a business you want to be in.  Such intercepts  
breaks tools like valgrind and other memory-checking debuggers.  Even  
the best intercept hooks available today can still be subverted.  Open  
MPI (and MX!) has to insert a pre-main hook to setup these intercepts,  
and then check later to ensure that no one else subverted our hooks.   
Yuck.
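To make concrete what a registration cache is and why it drags in those intercepts, here is a minimal sketch in C, assuming a fixed-size table with round-robin eviction and a hypothetical `fake_register()` standing in for the real pin-and-register call (`ibv_reg_mr()` in libibverbs). A lookup that hits skips the expensive kernel call. `cache_invalidate()` must run on every `free()`/`munmap()` of a cached range, which is exactly the job the malloc/mmap intercepts exist to do: if a free slips past the hooks, a later lookup hands out a key for memory that no longer backs the buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 8 /* hypothetical capacity, fixed at compile time */

struct reg_entry {
    bool valid;
    uintptr_t start, end; /* registered range [start, end) */
    uint32_t key;         /* stand-in for the lkey/rkey a registration returns */
};

struct reg_cache {
    struct reg_entry slot[CACHE_SLOTS];
    uint32_t next_key;
    size_t registrations; /* how many times the expensive path ran */
    size_t next_victim;   /* round-robin eviction cursor */
};

/* Hypothetical stand-in for the expensive pin-and-register call. */
static uint32_t fake_register(struct reg_cache *c) {
    c->registrations++;
    return ++c->next_key;
}

/* Return a key covering [addr, addr + len), registering only on a miss. */
static uint32_t cache_lookup(struct reg_cache *c, uintptr_t addr, size_t len) {
    uintptr_t end = addr + len;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        const struct reg_entry *e = &c->slot[i];
        if (e->valid && e->start <= addr && end <= e->end)
            return e->key; /* hit: no kernel call */
    }
    /* Miss: use a free slot if there is one, else evict round-robin. */
    size_t victim = CACHE_SLOTS;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (!c->slot[i].valid) { victim = i; break; }
    }
    if (victim == CACHE_SLOTS) {
        victim = c->next_victim;
        c->next_victim = (c->next_victim + 1) % CACHE_SLOTS;
        /* a real cache would deregister the evicted region here */
    }
    uint32_t key = fake_register(c);
    c->slot[victim] = (struct reg_entry){ true, addr, end, key };
    return key;
}

/* Must be called whenever [addr, addr + len) is freed or unmapped.
 * Returns the number of cached entries dropped. */
static size_t cache_invalidate(struct reg_cache *c, uintptr_t addr, size_t len) {
    uintptr_t end = addr + len;
    size_t dropped = 0;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct reg_entry *e = &c->slot[i];
        if (e->valid && e->start < end && addr < e->end) { /* ranges overlap */
            e->valid = false; /* a real cache would deregister here */
            dropped++;
        }
    }
    return dropped;
}
```

`CACHE_SLOTS` and the round-robin policy are exactly the kind of knobs Alexander asks about: wherever the cache lives, someone has to choose them, and in this sketch they are fixed at compile time.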

It's memory management.  And that belongs in the kernel.

-- 
Jeff Squyres
Cisco Systems




