[ofa-general] Memory registration redux

Roland Dreier rdreier at cisco.com
Mon May 18 14:15:11 PDT 2009


 > When our memory hooks tell us that memory is about to be removed from
 > the process, we unregister all pages in the relevant region and remove
 > those entries from the cache.  So the next time you look in the cache
 > for 0x3000-0x3fff, it won't be there -- it'll be treated as
 > cache-cold.

So you want the registration cache to be reference counted per-page?
That seems like potentially a lot of overhead -- if someone registers a
million pages, then checking for a cache hit could mean checking on the
order of a million reference counts.
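
To make the overhead concern concrete, here is a rough sketch in Python
of a per-page reference-counted cache.  It is purely illustrative: the
class name, method names and the 4096-byte page size are assumptions of
mine, not anything taken from Open MPI or libibverbs.  The point is only
that a lookup has to touch one entry per page in the requested range.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class PerPageCache:
    """Hypothetical registration cache that tracks one refcount per page."""

    def __init__(self):
        self.refcount = {}  # page number -> reference count

    def _pages(self, addr, length):
        # Page numbers covered by the byte range [addr, addr + length).
        first = addr // PAGE_SIZE
        last = (addr + length - 1) // PAGE_SIZE
        return range(first, last + 1)

    def register(self, addr, length):
        # Take one reference on every page in the range.
        for page in self._pages(addr, length):
            self.refcount[page] = self.refcount.get(page, 0) + 1

    def invalidate(self, addr, length):
        # What a memory hook would call when the range leaves the process:
        # forget every page in the range, whatever its count was.
        for page in self._pages(addr, length):
            self.refcount.pop(page, None)

    def lookup(self, addr, length):
        # A hit requires every page to be present, so the cost grows
        # with the number of pages.  Returns (hit, pages_checked).
        checked = 0
        for page in self._pages(addr, length):
            checked += 1
            if page not in self.refcount:
                return False, checked
        return True, checked
```

With this sketch, registering 0x1000-0x3fff and then invalidating
0x3000-0x3fff makes a later lookup of 0x3000-0x3fff miss, while a lookup
of a fully cached million-page range inspects a million entries.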

 > > How does 0x1000 to 0x3fff get registered as a single Memory Region?
 > > If it is legitimate to free() 0x3000..0x3fff then how can there ever
 > > be a legitimate reference to 0x1000..0x3fff? If there is no such
 > > single reference, I don't see how a Memory Region is ever created
 > > covering that range.
 > >
 > > If the user creates the Memory Region, then they are responsible for
 > > not free()ing a portion of it.
 > >
 > 
 > Agreed.  If an application does that, it deserves what it gets.

Hang on.  The whole point of MR caching is exactly that you don't
unregister a memory region, even after you're done using the memory it
covers, in the hope that you'll want to reuse that registration.  And
the whole point of this thread is that an application can then free()
some of the memory that is still registered in the cache.
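
As a sketch of that scenario (again Python, again hypothetical: the
names below are made up, and evicting the whole overlapping region on
free() is just one possible policy, not a description of what any MPI
actually does):

```python
class RegionCache:
    """Hypothetical MR cache that keeps registrations after the app is done."""

    def __init__(self):
        self.entries = []  # cached registrations as half-open (start, end)

    def register(self, start, end):
        # Reuse a cached registration that fully covers [start, end).
        for s, e in self.entries:
            if s <= start and end <= e:
                return (s, e), True   # cache hit
        self.entries.append((start, end))
        return (start, end), False    # cache miss, new registration

    def on_free(self, start, end):
        # Memory hook: drop every cached region overlapping [start, end).
        self.entries = [
            (s, e) for s, e in self.entries if e <= start or end <= s
        ]
```

If the application registers 0x1000-0x3fff, finishes with it, and then
free()s 0x3000-0x3fff, the entry is still sitting in the cache.  Without
the on_free() call a later register(0x3000, 0x4000) is reported as a hit
on a stale registration; with it, the entry is gone and the same call
misses.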

 > Per my prior mail, Open MPI registers chunks at a time.  Each chunk is
 > potentially a multiple of pages.  So yes, you could end up having a
 > single registration that spans the buffers used in multiple, distinct
 > MPI sends.  We reference count by page to ensure that deregistrations
 > do not occur prematurely.

Hmm, I'm worried that the exact semantics of the memory cache seem to be
tied into how the MPI implementation is registering memory.  Open MPI
happens to work in small chunks (I guess) and so your cache is tailored
for that use case.  I know the original proposal was an attempt to come
up with something that all the MPIs can agree on, but it didn't cover
the full semantics, at least not for cases like the overlapping
sub-registrations that we're discussing here.  Is there still one set of
semantics everyone can agree on?

 - R.


