[ofa-general] Memory registration redux

Tue May 19 07:57:30 PDT 2009

On May 18, 2009, at 5:15 PM, Roland Dreier (rdreier) wrote:

> So you want the registration cache to be reference counted per-page?
> Seems like potentially a lot of overhead -- if someone registers a
> million pages, then to check for a cache hit, you have to potentially
> check millions of reference counts.
>

Our caches are hash tables of balanced red-black trees, so in  
practice a lookup won't trawl through anywhere near a million  
reference counts to find a hit.
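Here is a minimal sketch of that kind of lookup structure. It assumes cached registrations never overlap. It is a fixed hash table keyed on coarse address regions, where each bucket is a POSIX tsearch(3) tree. glibc implements these trees as red-black trees, but POSIX itself does not require balancing. The bucket count, the 1 MiB region size, and the names reg_entry, cache_insert and cache_lookup are hypothetical and are not Open MPI's actual code. A hit costs one hash plus one O(log n) tree search.

```c
#define _XOPEN_SOURCE 700  /* expose tsearch()/tfind() from <search.h> */
#include <assert.h>
#include <search.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: one registered address range [start, end).
   Entries are assumed never to overlap each other. */
struct reg_entry {
    uintptr_t start;
    uintptr_t end;
};

#define NBUCKETS 64
#define BUCKET_SHIFT 20          /* hypothetical: hash on 1 MiB regions */

static void *buckets[NBUCKETS];  /* each bucket is a tsearch() tree root */

static size_t bucket_of(uintptr_t addr) {
    return (size_t)((addr >> BUCKET_SHIFT) % NBUCKETS);
}

/* Order disjoint ranges by address; overlapping ranges compare equal,
   which lets tfind() locate the entry overlapping a query range. */
static int cmp_range(const void *pa, const void *pb) {
    const struct reg_entry *a = pa, *b = pb;
    if (a->end <= b->start) return -1;
    if (b->end <= a->start) return 1;
    return 0;
}

/* Insert e into every bucket whose address region it overlaps, so that a
   lookup only needs the bucket of the first byte it asks about.
   Returns 0 on success, -1 if tsearch() could not allocate a node. */
static int cache_insert(struct reg_entry *e) {
    uintptr_t first = e->start >> BUCKET_SHIFT;
    uintptr_t last = (e->end - 1) >> BUCKET_SHIFT;
    for (uintptr_t r = first; r <= last && r - first < NBUCKETS; r++)
        if (tsearch(e, &buckets[r % NBUCKETS], cmp_range) == NULL)
            return -1;
    return 0;
}

/* Return the cached entry that fully covers [addr, addr + len), or NULL.
   Because entries are disjoint, a covering entry is the only one that can
   overlap the query, so the single tfind() result is enough to decide. */
static struct reg_entry *cache_lookup(uintptr_t addr, size_t len) {
    assert(len > 0);
    struct reg_entry key = { addr, addr + len };
    void *node = tfind(&key, &buckets[bucket_of(addr)], cmp_range);
    if (node == NULL)
        return NULL;
    struct reg_entry *e = *(struct reg_entry **)node;
    return (e->start <= addr && addr + len <= e->end) ? e : NULL;
}
```

A request that runs past the end of a cached registration misses. The caller would then register a new (possibly larger) range.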

> Hang on.  The whole point of MR caching is exactly that you don't
> unregister a memory region, even after you're done using the memory it
> covers, in the hope that you'll want to reuse that registration.  And
> the whole point of this thread is that an application can then free()
> some of the memory that is still registered in the cache.
>

Sorry -- the implication that I took from Caitlyn's text was that the  
memory was *used* after it was freed.  That is clearly erroneous.

What OMPI does (and apparently other MPIs do) is simply invalidate  
any registration for freed memory.  Additionally, we won't unregister  
memory while there is at least one use of it outstanding (that MPI  
knows about, such as a pending non-blocking communication).  We  
unregister lazily precisely for the case you're talking about (we  
might want to use the memory for verbs communication again later).
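The invalidate-on-free and lazy-deregistration rules above can be sketched as a small per-entry state machine. This is a hypothetical illustration, not Open MPI's code. The names cached_reg, reg_acquire, reg_release and reg_invalidate are made up. fake_dereg stands in for a real ibv_dereg_mr() call so the sketch runs without an HCA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int dereg_calls;   /* counts calls to the stand-in deregistration */

struct cached_reg {
    void *addr;
    size_t len;
    int users;        /* outstanding uses MPI knows about (pending sends) */
    bool invalid;     /* set once the covered memory has been freed */
    bool registered;  /* still registered with the HCA */
};

/* Stand-in for ibv_dereg_mr(); hypothetical, only records the call. */
static void fake_dereg(struct cached_reg *r) {
    r->registered = false;
    dereg_calls++;
}

/* Start a communication using r; freed (invalid) entries are never reused. */
static bool reg_acquire(struct cached_reg *r) {
    if (r->invalid || !r->registered)
        return false;
    r->users++;
    return true;
}

/* Finish a communication.  A still-valid entry stays registered in the
   cache for later reuse (lazy deregistration); a freed one is torn down
   once its last outstanding use completes. */
static void reg_release(struct cached_reg *r) {
    assert(r->users > 0);
    if (--r->users == 0 && r->invalid && r->registered)
        fake_dereg(r);
}

/* Called when the application frees memory covered by r.  Deregistration
   is deferred while any use is still outstanding. */
static void reg_invalidate(struct cached_reg *r) {
    r->invalid = true;
    if (r->users == 0 && r->registered)
        fake_dereg(r);
}
```

In this sketch, an entry whose last user releases it stays registered unless its memory has been freed. A freed entry is deregistered as soon as its outstanding uses drain.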

>  > Per my prior mail, Open MPI registers memory a chunk at a time.
>  > Each chunk is potentially a multiple of pages.  So yes, you could
>  > end up having a single registration that spans the buffers used in
>  > multiple, distinct MPI sends.  We reference count by page to ensure
>  > that deregistrations do not occur prematurely.
>
> Hmm, I'm worried that the exact semantics of the memory cache seem to be
> tied into how the MPI implementation is registering memory.  Open MPI
> happens to work in small chunks (I guess) and so your cache is tailored
> for that use case.  I know the original proposal was an attempt to come
> up with something that all the MPIs can agree on, but it didn't cover
> the full semantics, at least not for cases like the overlapping
> sub-registrations that we're discussing here.  Is there still one set of
> semantics everyone can agree on?
>

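The per-page reference counting for a chunk that backs several sends, described in the quoted text above, might look like the following sketch. The 4 KiB page size, the 16-page chunk, and all names are assumptions for illustration only. Real code would obtain the page size from sysconf(_SC_PAGESIZE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096u  /* hypothetical; query sysconf() in practice */
#define CHUNK_PAGES 16u          /* hypothetical chunk size in pages */

struct chunk {
    uintptr_t base;              /* page-aligned start of the registered chunk */
    unsigned refs[CHUNK_PAGES];  /* one reference count per page */
};

/* Compute the page indices [*first, *last] touched by [addr, addr + len). */
static void page_span(const struct chunk *c, uintptr_t addr, size_t len,
                      size_t *first, size_t *last) {
    assert(len > 0 && addr >= c->base &&
           addr + len <= c->base + CHUNK_PAGES * PAGE_SIZE_ASSUMED);
    *first = (addr - c->base) / PAGE_SIZE_ASSUMED;
    *last = (addr + len - 1 - c->base) / PAGE_SIZE_ASSUMED;
}

/* A send starts using [addr, addr + len): take a reference on each page. */
static void chunk_get(struct chunk *c, uintptr_t addr, size_t len) {
    size_t first, last;
    page_span(c, addr, len, &first, &last);
    for (size_t i = first; i <= last; i++)
        c->refs[i]++;
}

/* A send has finished with [addr, addr + len): drop its page references. */
static void chunk_put(struct chunk *c, uintptr_t addr, size_t len) {
    size_t first, last;
    page_span(c, addr, len, &first, &last);
    for (size_t i = first; i <= last; i++) {
        assert(c->refs[i] > 0);
        c->refs[i]--;
    }
}

/* The chunk may only be deregistered once no page is in use by any send. */
static bool chunk_deregisterable(const struct chunk *c) {
    for (size_t i = 0; i < CHUNK_PAGES; i++)
        if (c->refs[i] != 0)
            return false;
    return true;
}
```

Two sends that share a page keep that page, and therefore the whole chunk, registered until both have completed.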
So just to be clear -- let's separate the two issues that are evolving  
from this thread:

1. fix the hole where memory returned to the OS cannot be guaranteed  
to be caught by userspace (and therefore may stay registered and/or  
leave stale entries in userspace registration caches)

2. have libibverbs include some form of memory registration caching  
(potentially using the solution to #1 to know when to invalidate  
registration cache entries)
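Whatever mechanism comes out of issue #1 ultimately has to tell the cache which address range went back to the OS. For issue #2, the cache must then drop every entry that overlaps that range, even partially. The following is a hypothetical sketch of that invalidation step over a flat array. A real cache would walk its lookup structure instead. regcache_on_release and struct entry are invented names, not libibverbs API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat cache entry covering the registered range [start, end). */
struct entry {
    uintptr_t start, end;
    bool valid;
};

/* Hypothetical hook, invoked whenever [start, end) is returned to the OS.
   Any valid entry overlapping the range, even partially, becomes stale.
   Returns the number of entries invalidated. */
static size_t regcache_on_release(struct entry *cache, size_t n,
                                  uintptr_t start, uintptr_t end) {
    assert(start < end);
    size_t invalidated = 0;
    for (size_t i = 0; i < n; i++) {
        if (cache[i].valid && cache[i].start < end && start < cache[i].end) {
            cache[i].valid = false;
            invalidated++;
        }
    }
    return invalidated;
}
```

The half-open overlap test means a release that merely touches an entry's boundary leaves it valid.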

Personally, I would prioritize the issues in this order.  Has a  
solution for #1 been agreed upon?  I admit that I got lost in the  
kernel discussion between you, Jason, etc.

Agreeing on registration caching semantics may take a little more  
discussion (although, as someone pointed out earlier, if libibverbs'  
registration caching is optional, then a verbs-based app can choose  
to use it or its own scheme).

-- 
Jeff Squyres
Cisco Systems