[ofa-general] Memory registration redux
Jeff Squyres
jsquyres at cisco.com
Tue May 19 07:57:30 PDT 2009
On May 18, 2009, at 5:15 PM, Roland Dreier (rdreier) wrote:
> So you want the registration cache to be reference counted per-page?
> Seems like potentially a lot of overhead -- if someone registers a
> million pages, then to check for a cache hit, you have to potentially
> check millions of reference counts.
>
Our caches are hash tables of balanced red-black trees. So in
practice, we won't be trolling through anywhere near a million
reference counts to find a hit.
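To make the lookup cost concrete, here is a minimal sketch, not Open MPI's actual code. It assumes a hash table keyed by some context (the hypothetical `key` could be a protection domain) whose buckets are ordered by start address. A sorted list searched with `bisect` stands in for the red-black tree, and registrations within a bucket are assumed not to overlap. A hit costs one hash lookup plus one O(log n) search, regardless of how many pages each registration covers.

```python
import bisect


class RegistrationCache:
    """Toy cache: dict of address-ordered buckets (stand-in for hash of RB trees)."""

    def __init__(self):
        # key -> (sorted list of start addresses, parallel list of entries)
        self._buckets = {}

    def insert(self, key, start, length, handle):
        """Record a registration covering [start, start + length)."""
        starts, entries = self._buckets.setdefault(key, ([], []))
        i = bisect.bisect_left(starts, start)
        starts.insert(i, start)
        entries.insert(i, (start, start + length, handle))

    def find(self, key, addr, length):
        """Return the handle whose range covers [addr, addr + length), else None."""
        starts, entries = self._buckets.get(key, ([], []))
        # Only the entry with the largest start <= addr can cover addr,
        # because entries in a bucket are assumed not to overlap.
        i = bisect.bisect_right(starts, addr) - 1
        if i < 0:
            return None
        start, end, handle = entries[i]
        return handle if addr + length <= end else None
```

With this shape, registering a million pages as one region adds a single entry to one bucket, so the hit check never walks per-page state.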
> Hang on. The whole point of MR caching is exactly that you don't
> unregister a memory region, even after you're done using the memory it
> covers, in the hope that you'll want to reuse that registration. And
> the whole point of this thread is that an application can then free()
> some of the memory that is still registered in the cache.
>
Sorry -- the implication that I took from Caitlyn's text was that the
memory was *used* after it was freed. That is clearly erroneous.
What OMPI does (and apparently other MPIs do) is simply invalidate
any registration for freed memory. Additionally, we won't unregister
memory while there is at least one use of it outstanding (that MPI
knows about, such as a pending non-blocking communication). We
unregister lazily for exactly the case you're talking about (we might
want to use it for verbs communication again later).
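The policy described above can be sketched roughly as follows. This is an illustrative model under stated assumptions, not the Open MPI implementation: the names `Registration`, `LazyCache`, `acquire`, `release` and `on_free` are hypothetical, the `registered` flag stands in for holding a real memory region (where a real cache would call something like `ibv_reg_mr` / `ibv_dereg_mr`), and deferring deregistration of an invalidated entry until its last use completes is one possible reading of the rule.

```python
class Registration:
    """One cached registration covering [start, end)."""

    def __init__(self, start, length):
        self.start = start
        self.end = start + length
        self.in_use = 0         # outstanding uses the library knows about
        self.valid = True       # False once the memory underneath was freed
        self.registered = True  # stand-in for holding a real verbs MR


class LazyCache:
    def __init__(self):
        self.entries = []

    def acquire(self, addr, length):
        """Reuse a valid covering registration, or create one; mark it in use."""
        for reg in self.entries:
            if reg.valid and reg.start <= addr and addr + length <= reg.end:
                break
        else:
            reg = Registration(addr, length)  # a real cache would register here
            self.entries.append(reg)
        reg.in_use += 1
        return reg

    def release(self, reg):
        """Drop one use. A still-valid entry stays registered for later reuse."""
        reg.in_use -= 1
        if not reg.valid and reg.in_use == 0:
            self._deregister(reg)

    def on_free(self, addr, length):
        """Invalidate every entry overlapping the freed range."""
        for reg in list(self.entries):
            if reg.start < addr + length and addr < reg.end:
                reg.valid = False
                if reg.in_use == 0:
                    self._deregister(reg)

    def _deregister(self, reg):
        reg.registered = False  # a real cache would deregister the MR here
        self.entries.remove(reg)
```

A released entry that is still valid remains in the cache, which is the lazy part. An invalidated entry is never handed out again and is dropped as soon as nothing is using it.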
> > Per my prior mail, Open MPI registers chunks at a time. Each chunk is
> > potentially a multiple of pages. So yes, you could end up having a
> > single registration that spans the buffers used in multiple, distinct
> > MPI sends. We reference count by page to ensure that deregistrations
> > do not occur prematurely.
>
> Hmm, I'm worried that the exact semantics of the memory cache seem to be
> tied into how the MPI implementation is registering memory. Open MPI
> happens to work in small chunks (I guess) and so your cache is tailored
> for that use case. I know the original proposal was an attempt to come
> up with something that all the MPIs can agree on, but it didn't cover
> the full semantics, at least not for cases like the overlapping
> sub-registrations that we're discussing here. Is there still one set of
> semantics everyone can agree on?
>
So just to be clear -- let's separate the two issues that are evolving
from this thread:
1. fix the hole where memory returned to the OS cannot be guaranteed
   to be caught by userspace (and therefore may still stay registered
   and/or invalidate userspace registration cache entries)
2. have libibverbs include some form of memory registration caching
   (potentially using the solution to #1 to know when to invalidate
   registration cache entries)
Personally, I would prioritize the issues in this order. Did
a solution for #1 get agreed upon? I admit that I got lost in the
kernel discussion of issues between you, Jason, etc.
Agreeing on registration caching semantics may take a little more
discussion (although, as someone pointed out earlier, if libibverbs'
registration caching is optional, then a verbs-based app can choose to
use it or its own scheme).
--
Jeff Squyres
Cisco Systems