[ofa-general] Re: New proposal for memory management

Tom Talpey tmtalpey at gmail.com
Thu Apr 30 12:21:42 PDT 2009


At 02:36 PM 4/30/2009, Jason Gunthorpe wrote:
>On Thu, Apr 30, 2009 at 02:24:47PM -0400, Tom Talpey wrote:
>> At 06:11 PM 4/29/2009, Barrett, Brian W wrote:
>> >On 4/29/09 15:55 , "Jason Gunthorpe" <jgunthorpe at obsidianresearch.com>
>> >wrote:
>> >
>> >>> The problem is that MPI needs to be aware of the application doing
>> >>> the free() and unregister or flush its MR cache for that virtual
>> >>> address range. Of course it would be difficult for OpenMPI to have
>> >>> callbacks or hooks into every way memory could be allocated/freed
>> >>> that an application might use.
>> >> 
>> >> There are only three calls that affect the way VM memory maps to
>> >> physical and thus would invalidate the mr cache: mmap, munmap and brk.
>> >
>> >There's also System V shared memory, which at least one scientific code out
>> >there uses.
>> 
>> Don't forget fork, vfork, clone and exec, and also don't forget any
>> copy-on-write mappings that result. Oh, and those pesky stack pages.
>
>verbs just plain can't inherit QPs and MRs across fork and all the
>related calls. There is no way to split a QP and an MR across two processes
>and have things still make sense. One process gets it, so there isn't
>really a problem.
>
>Stack pages are mostly fixed in VM address space, particularly if you
>give up MAP_GROWSDOWN, which I think most threading libraries do these
>days?
>
>Anyway, it doesn't matter: if the kernel has a syscall to enable
>registering all VM in a process, then the kernel is perfectly able to
>capture all the weird cases and fix them up. The API to userspace is
>very simple and sane.
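
For concreteness, here is a minimal sketch of the kind of user-space MR cache the quoted messages describe. It assumes that some hook, not shown and purely hypothetical, is invoked whenever munmap or a shrinking brk releases an address range. The names mr_cache_insert, mr_cache_lookup and mr_cache_invalidate are illustrative only. A real cache would also hold the struct ibv_mr pointer and call ibv_dereg_mr() on eviction. The point it shows is that a cache hit is decided purely by virtual address, so a remapped range stays "registered" in the cache unless the hook drops it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MR cache entry: one registered virtual address range.
 * A real cache would also store the struct ibv_mr * returned by
 * ibv_reg_mr(); it is omitted so the sketch builds without libibverbs. */
struct mr_entry {
    uintptr_t start;
    size_t len;
    bool valid;
};

#define MR_CACHE_SLOTS 16

struct mr_cache {
    struct mr_entry slot[MR_CACHE_SLOTS];
};

/* Record a registration; returns false if the cache is full. */
static bool mr_cache_insert(struct mr_cache *c, uintptr_t start, size_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < MR_CACHE_SLOTS; i++) {
        if (!c->slot[i].valid) {
            c->slot[i] = (struct mr_entry){ start, len, true };
            return true;
        }
    }
    return false;
}

/* A hit requires a single valid entry covering [addr, addr + len). */
static bool mr_cache_lookup(const struct mr_cache *c, uintptr_t addr, size_t len)
{
    for (size_t i = 0; i < MR_CACHE_SLOTS; i++) {
        const struct mr_entry *e = &c->slot[i];
        if (e->valid && addr >= e->start && addr + len <= e->start + e->len)
            return true;
    }
    return false;
}

/* Called (hypothetically) from an munmap()/brk() hook: drop every entry
 * that overlaps the released range and return how many were dropped.
 * A real cache would call ibv_dereg_mr() on each dropped entry here. */
static size_t mr_cache_invalidate(struct mr_cache *c, uintptr_t addr, size_t len)
{
    size_t dropped = 0;
    for (size_t i = 0; i < MR_CACHE_SLOTS; i++) {
        struct mr_entry *e = &c->slot[i];
        if (e->valid && addr < e->start + e->len && e->start < addr + len) {
            e->valid = false;
            dropped++;
        }
    }
    return dropped;
}
```

Invalidation deliberately drops any entry that merely overlaps the released range, since a partially unmapped registration can no longer be trusted, while a lookup only hits when one entry covers the whole requested buffer.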

Not sure I agree with the "perfectly able" part. What if the process'
stack grows after the syscall? Do the extra pages magically become
registered? What if the adapter's page table requires remapping to
cover the extra pages, possibly changing the memory handle?

Also, does this get harder or easier if there's an IOMMU in the loop?
With direct access to I/O mapping, there are additional degrees
of freedom to rearrange pages. OTOH, there's an additional layer
to manage far beyond the page-wiring and mapping of userspace.

>
>> I think the point is that making any guarantees that memory remains fixed
>> and present will inevitably lead to nontransparent API requirements on the
>> applications. Been there, done that, got plenty of t-shirts. It's a hard road,
>> because APIs are forever.
>
>That's why we are here: the MPI spec says all memory in a process is
>valid to use with a network operation, and that is utterly incompatible
>with verbs' notion of memory registration.

Of course I agree. I am however questioning the goal of making this
100% transparent. To do so, both sides of the interface will need to
behave in specific ways (e.g. no forking), and that can be very, very
difficult to achieve.
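
As a rough illustration of what a "no forking" style requirement looks like in practice, here is a sketch assuming Linux and its madvise(MADV_DONTFORK) flag, which libibverbs applies to registered ranges when ibv_fork_init() is used. The helpers alloc_dontfork and child_sees_mapping are hypothetical. Marking the buffer DONTFORK keeps the parent's pages, and therefore its registration, from being shared copy-on-write with the child, but the child simply has no mapping at that address. That is exactly the kind of application-visible rule that is hard to hide.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Allocate a page-aligned anonymous buffer and mark it MADV_DONTFORK,
 * the same madvise() flag libibverbs uses for registered memory after
 * ibv_fork_init(). Returns NULL on failure. */
static void *alloc_dontfork(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}

/* Fork and report whether the child still has [buf, buf + len) mapped.
 * mincore() fails with ENOMEM when the range contains unmapped memory,
 * so the child exits 0 if the range is absent and 1 if it was inherited.
 * Returns that exit code, or -1 on error. len must be at most 64 pages. */
static int child_sees_mapping(void *buf, size_t len)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        unsigned char vec[64]; /* one byte per page, up to 64 pages */
        int rc = mincore(buf, len, vec);
        _exit(rc == -1 && errno == ENOMEM ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The parent keeps a stable mapping for the adapter to DMA into, but any child code that touches the buffer faults, so the "fix" is only transparent as long as the application never relies on that memory after fork.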

I'm not trying to argue against this, btw. But I do think it's prohibitively
hard without placing requirements on the applications, and those requirements
are in turn very difficult to change. Choose the requirements carefully, IOW.

Tom.
