[ofa-general] Memory registration redux
Jason Gunthorpe
jgunthorpe at obsidianresearch.com
Tue May 26 16:51:58 PDT 2009
On Tue, May 26, 2009 at 04:13:08PM -0700, Roland Dreier wrote:
> > > > Or, ignore the overlapping problem, and use your original technique,
> > > > slightly modified:
> > > > - Userspace registers a counter with the kernel. Kernel pins the
> > > > page, sets up mmu notifiers and increments the counter when
> > > > invalidates intersect with registrations
> > > > - Kernel maintains a linked list of registrations that have been
> > > > invalidated via mmu notifiers using the registration structure
> > > > and a dirty bit
> > > > - Userspace checks the counter at every cache hit, if different it
> > > > calls into the kernel:
> > > > MR_Cookie *mrs[100];
> > > > int rc = ibv_get_invalid_mrs(mrs,100);
> > > > invalidate_cache(mrs,rc);
> > > > // Repeat until drained
> > > >
> > > > get_invalid_mrs traverses the linked list and returns an
> > > > identifying value to userspace, which looks it up in the cache,
> > > > calls unregister and removes it from the cache.
> > >
> > > What's the advantage of this? I have to do the get_invalid_mrs() call a
> > > bunch of times, rather than just reading which ones are invalid from the
> > > cache directly?
> >
> > This is a trade off, the above is a more normal kernel API and lets
> > the app get a list of changes it can scan. Having the kernel update
> > flags means if the app wants a list of changes it has to scan all
> > registrations.
>
> The more I thought about this, the more I liked the idea, until I liked
> it so much that I actually went ahead and prototyped this. A
> preliminary version is below -- *very* lightly tested, and no doubt
> there are obvious bugs that any real use or review will uncover. But I
> thought I'd throw it out and hope for comments and/or testing. I'm
> actually pretty happy with how small and simple this ended up being.

Seems reasonable to me. This doesn't catch all mmap cases, i.e. this
kind of stuff:

    t = mmap(NULL, 3 * page_size, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (umn_register(t, 3 * page_size, 123))
        return 1;

    t = mmap(t, page_size, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    // Event? Probably
    munmap(t, page_size);
    // Event? No, no MAP_POPULATE
    t = mmap(t, page_size, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    // Event? No

And I guess the use of MAP_POPULATE is deliberate, as that's how the mmu
notifier works.

So the use model for an MPI would be to call ibv_register/umn_register
and watch for events. Any event at all means the entire region is
toast and must be re-registered the next time someone calls with that
address. ibv_register does the same as MAP_POPULATE internally.

The MPI library uses the result of this to build a list of invalidated
regions. From time to time the MPI library should unregister those
regions.
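
To make that use model concrete, here is a rough sketch of what the MPI-side
cache could look like. Everything in it is an assumption made up for
illustration: the names (reg_entry, cache_register, cache_on_event,
cache_lookup, cache_reap) are hypothetical, and the real register/unregister
calls are replaced by stand-ins so that only the bookkeeping is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace registration cache. None of these names come
 * from the patch or from libibverbs. */
#define CACHE_SLOTS 8

struct reg_entry {
    uintptr_t addr;    /* start of the registered range */
    size_t    len;     /* length in bytes */
    uint64_t  cookie;  /* value handed to the kernel at registration */
    int       in_use;  /* slot holds a registration */
    int       invalid; /* an event was seen: the region is toast */
};

static struct reg_entry cache[CACHE_SLOTS];

/* Stand-in for the real register call; returns a slot index or -1. */
int cache_register(uintptr_t addr, size_t len, uint64_t cookie)
{
    assert(len > 0);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].in_use) {
            cache[i] = (struct reg_entry){ addr, len, cookie, 1, 0 };
            return i;
        }
    }
    return -1;
}

/* Any event at all for a cookie marks the whole region invalid. */
void cache_on_event(uint64_t cookie)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].in_use && cache[i].cookie == cookie)
            cache[i].invalid = 1;
}

/* A hit requires a still-valid entry that covers the whole range. */
int cache_lookup(uintptr_t addr, size_t len)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        const struct reg_entry *e = &cache[i];
        if (e->in_use && !e->invalid &&
            addr >= e->addr && len <= e->len &&
            addr - e->addr <= e->len - len)
            return i;
    }
    return -1;
}

/* Called from time to time: drop every entry marked invalid. A real
 * cache would call the unregister verb here; this only frees the slot. */
int cache_reap(void)
{
    int reaped = 0;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].in_use && cache[i].invalid) {
            cache[i].in_use = 0;
            reaped++;
        }
    }
    return reaped;
}
```

A lookup that misses because the entry was invalidated simply falls through
to a fresh registration, which is the "re-registered the next time someone
calls with that address" behaviour.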

If that is the use, then the kernel side should probably also be a
one-shot type of interface.
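
By one-shot I mean something like the following userspace model, where a
watch reports the first invalidate and then stays quiet until it is armed
again. This is only a sketch of the semantics; struct oneshot_watch,
watch_arm and watch_invalidate are hypothetical names and are not taken from
the prototype.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a one-shot watch, not the patch's data structure. */
struct oneshot_watch {
    uint64_t cookie; /* identifies the region to userspace */
    int      armed;  /* 1 until the first invalidate is reported */
};

/* Arm (or re-arm) the watch, e.g. when the region is registered. */
void watch_arm(struct oneshot_watch *w, uint64_t cookie)
{
    assert(w != NULL);
    w->cookie = cookie;
    w->armed = 1;
}

/* Returns 1 and stores the cookie if this invalidate should be reported,
 * or 0 if the watch has already fired and nothing new needs to be said. */
int watch_invalidate(struct oneshot_watch *w, uint64_t *cookie_out)
{
    assert(w != NULL && cookie_out != NULL);
    if (!w->armed)
        return 0;
    w->armed = 0;
    *cookie_out = w->cookie;
    return 1;
}
```

Since any event already means the whole region has to be re-registered,
later invalidates on the same region carry no extra information, so
reporting only the first one loses nothing.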

I'm also trying to think of a use case outside of RDMA and failing - if
the kernel hasn't pinned the pages being watched through some other
means, it seems useless as a general feature??

Jason