[ofa-general] Re: New proposal for memory management

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Fri May 1 11:18:31 PDT 2009


On Fri, May 01, 2009 at 07:56:48AM -0400, Jeff Squyres wrote:
> On Apr 30, 2009, at 6:22 PM, Jason Gunthorpe wrote:
> 
> >After reading all the postings, I think my idea to fix the verbs API
> >to not, essentially, corrupt an existing registration when the virtual
> >address space changes is the best bet. This slightly changes the
> >semantics of the verbs MR to refer to virtual address space within the
> >process, not the underlying object(s) that happen to be mapped there
> >when the registration is made.
 
> I'm not sure how this helps MPI -- our registration caches will still  
> become invalid if the MPI app free()'s registered memory...?

No, they don't. The only reason you have a problem today is because
the memory registration is tied to the underlying *object*, not the
virtual address. So when the app fiddles with things and changes the
virtual-address-to-object mapping, it wrecks your caching.

If instead the registration is tied to a virtual address, then it
doesn't matter what the app does, that virtual address range will
*always* point to the currently mapped objects.

If the app does free() and then malloc() without an intervening kernel
call then it doesn't matter: your cache of registered VM addresses
still says that the address is available.

If the app does free() resulting in munmap and then malloc() resulting
in mmap() and re-uses the same address then, again, it doesn't matter
to you because the VM address is still registered by the kernel and is
switched to the new mmap().
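
To make the difference concrete, here is a toy model of the two
semantics. It is only a sketch: AddressSpace, ObjectTiedMR and
AddressTiedMR are hypothetical names invented for illustration, not
part of the verbs API, and a bytearray stands in for the pages behind
a mapping.

```python
# Toy model only: none of these names exist in libibverbs.

class AddressSpace:
    """Maps a virtual address to whatever object is currently mapped there."""

    def __init__(self):
        self.mappings = {}  # virtual address -> backing object (bytearray)

    def mmap(self, addr, size):
        self.mappings[addr] = bytearray(size)
        return addr

    def munmap(self, addr):
        del self.mappings[addr]


class ObjectTiedMR:
    """Current behaviour: the registration captures the object at register time."""

    def __init__(self, space, addr):
        self.obj = space.mappings[addr]

    def dma_write(self, data):
        self.obj[:len(data)] = data  # still writes to the old object


class AddressTiedMR:
    """Proposed behaviour: the registration follows the virtual address."""

    def __init__(self, space, addr):
        self.space = space
        self.addr = addr

    def dma_write(self, data):
        # The target is resolved when the write happens, not at register time.
        self.space.mappings[self.addr][:len(data)] = data


def demo():
    space = AddressSpace()
    addr = space.mmap(0x1000, 8)
    object_tied = ObjectTiedMR(space, addr)
    address_tied = AddressTiedMR(space, addr)

    # free() resulting in munmap, then malloc() resulting in mmap
    # that re-uses the same address.
    space.munmap(addr)
    space.mmap(addr, 8)

    object_tied.dma_write(b"stale")
    after_object_tied = bytes(space.mappings[addr][:5])
    address_tied.dma_write(b"fresh")
    after_address_tied = bytes(space.mappings[addr][:5])
    return after_object_tied, after_address_tied
```

In this model the write through the object-tied registration never
reaches the memory the app now sees at that address, while the
address-tied one does.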

The only problem is that over time your cache will hold registrations
of VM that is not in use by the app, or that no longer has backing
objects. This is not a correctness problem, but it might be a
performance problem.
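
A minimal sketch of how such a cache accumulates unused entries,
assuming address-tied registrations. RegistrationCache and its methods
are hypothetical names, and register_calls merely counts where the
expensive registration call would happen.

```python
# Toy model only: a cache of registered virtual address ranges.

class RegistrationCache:
    def __init__(self):
        self.registered = set()  # (addr, length) ranges registered so far
        self.register_calls = 0  # how often the expensive path was taken

    def get(self, addr, length):
        key = (addr, length)
        if key not in self.registered:
            self.register_calls += 1  # stand-in for the real registration
            self.registered.add(key)
        return key

    def stale(self, live_ranges):
        """Registered ranges that the app no longer uses."""
        return self.registered - set(live_ranges)


def simulate():
    cache = RegistrationCache()
    a, b, c = (0x1000, 4096), (0x2000, 4096), (0x3000, 4096)

    cache.get(*a)
    cache.get(*b)
    cache.get(*a)  # cache hit, no new registration
    # The app frees b and allocates c somewhere else.
    cache.get(*c)

    return cache.register_calls, cache.stale([a, c])
```

Every lookup stays correct here; the entry for the freed range simply
lingers and keeps consuming resources until something prunes it.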

> MPI maintains a registration cache because registration is so  
> expensive.  Even if the registration cache becomes "safely" invalid  
> (e.g., you'll never get a scenario where one virtual address could  
> have previously pointed to a different hardware address within the  
> span of one process), it doesn't help.

How so? That would seem to close the data corruption hole
entirely. Sure, you still have to call registration functions, but one
step at a time :)

> Ok, I'll back off slightly: if you want verbs to go mainstream, there  
> will be many other ULPs / middleware libraries that have memory models  
> like MPI's (that the upper layer is responsible for allocating/freeing  
> message buffers).  Put differently: the TCP/sockets stack doesn't have  
> this restriction; it will be extremely difficult to convert legions of  
> sockets programmers to verbs if you effectively restrict large  
> messages to only be allocated/freed by the network layer (kinda  
> defeats the point of RDMA if you have to copy large messages, right?).

Fair enough - but the registration model is pretty much an inevitable
consequence of kernel bypass. If you really want to get rid of it then
you need to have an operating mode where the WRs are generated by the
kernel through syscalls, like all the other network stacks. I've not
seen any notion of how to separate the two ideas, at least.

Jason


