[ofa-general] Re: New proposal for memory management

Tom Talpey tmtalpey at gmail.com
Thu Apr 30 11:24:47 PDT 2009


At 06:11 PM 4/29/2009, Barrett, Brian W wrote:
>On 4/29/09 15:55 , "Jason Gunthorpe" <jgunthorpe at obsidianresearch.com>
>wrote:
>
>>> The problem is that MPI needs to be aware of the application doing
>>> the free() and unregister or flush its MR cache for that virtual
>>> address range. Of course it would be difficult for OpenMPI to have
>>> callbacks or hooks into every way memory could be allocated/freed
>>> that an application might use.
>> 
>> There are only three calls that affect the way VM memory maps to
>> physical and thus would invalidate the mr cache: mmap, munmap and brk.
>
>There's also System V shared memory, which at least one scientific code out
>there uses.

Don't forget fork, vfork, clone and exec, and also don't forget any
copy-on-write mappings that result. Oh, and those pesky stack pages.
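
To make the copy-on-write point concrete, here is a minimal sketch in plain
POSIX C. It involves no verbs calls at all, and the function name
parent_view_after_child_write is made up for illustration. After fork(), the
same virtual address in parent and child stops referring to the same data as
soon as one side writes, which is exactly the kind of change a VA-keyed
registration cache cannot see. As far as I know, libibverbs addresses the
fork case with ibv_fork_init(), which relies on madvise(MADV_DONTFORK); this
sketch only shows the underlying behaviour, not that mechanism.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the byte the parent sees at buf[0] after a
 * forked child has overwritten the same virtual address, or -1 on error. */
int parent_view_after_child_write(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset(buf, 'P', len);          /* touch the page so it is really backed */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        buf[0] = 'C';               /* write fault: child gets a private copy */
        _exit(buf[0] == 'C' ? 0 : 1);
    }

    int status = 0;
    int seen = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        seen = buf[0];              /* parent still sees its own data */

    munmap(buf, len);
    return seen;
}
```

The parent gets back 'P' even though the child stored 'C' at the same VA.
With a pinned, registered page the adapter keeps using whichever physical
page it was given at registration time, regardless of which process ends up
with the copy.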

I think the point is that making any guarantees that memory remains fixed
and present will inevitably lead to nontransparent API requirements on the
applications. Been there, done that, got plenty of t-shirts. It's a hard road,
because APIs are forever.

Tom.


>
>> Specifically, what must be happening is that the app registers memory,
>> calls munmap on it, then gets the same VA back from mmap while the
>> kernel-level mr still points to the original mapping:
>> 
>>  foo = mmap(...);
>>  mr = ibv_reg_mr(pd, foo, len, access);
>>  munmap(foo, len);
>>  mmap(...) == foo; // By chance due to VA randomization
>>  // Oops, mr no longer matches /proc/self/maps
>> 
>> Actually, maybe that is the simple answer here - have the kernel fix up
>> the mr before returning from the 2nd mmap. Then the cache in user
>> space is still correct to assume that VA XX is registered and working.
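
The stale-cache scenario above can be sketched in user space without any
verbs hardware. Everything named toy_* below is hypothetical and stands in
for an MPI-style registration cache; no real ibv_reg_mr() call is made. One
deliberate deviation from the scenario: the second mmap() uses MAP_FIXED to
force the address reuse that the original describes as happening by chance,
so the result is repeatable.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define TOY_SLOTS 8

/* Hypothetical cache entry: a VA range that we believe is registered. */
struct toy_mr { uintptr_t start; size_t len; int in_use; };
static struct toy_mr toy_cache[TOY_SLOTS];

/* Return the entry that fully covers [addr, addr+len), or NULL. */
struct toy_mr *toy_lookup(const void *addr, size_t len)
{
    uintptr_t a = (uintptr_t)addr;
    for (int i = 0; i < TOY_SLOTS; i++) {
        struct toy_mr *e = &toy_cache[i];
        if (e->in_use && a >= e->start && a + len <= e->start + e->len)
            return e;
    }
    return NULL;
}

/* Record a pretend registration; returns NULL if the cache is full. */
struct toy_mr *toy_register(void *addr, size_t len)
{
    for (int i = 0; i < TOY_SLOTS; i++) {
        struct toy_mr *e = &toy_cache[i];
        if (!e->in_use) {
            e->start = (uintptr_t)addr;
            e->len = len;
            e->in_use = 1;
            return e;
        }
    }
    return NULL;
}

/* Drop every entry that overlaps [addr, addr+len). */
void toy_invalidate(const void *addr, size_t len)
{
    uintptr_t a = (uintptr_t)addr;
    for (int i = 0; i < TOY_SLOTS; i++) {
        struct toy_mr *e = &toy_cache[i];
        if (e->in_use && a < e->start + e->len && e->start < a + len)
            e->in_use = 0;
    }
}

/* Map, register, unmap, then map again at the same VA. With use_hook set,
 * the cache is told about the munmap first. Returns 1 if the cache still
 * claims the new mapping is registered (stale hit), 0 if not, -1 on error. */
int stale_hit_after_remap(int use_hook)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    void *foo = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (foo == MAP_FAILED)
        return -1;
    if (toy_register(foo, len) == NULL) {
        munmap(foo, len);
        return -1;
    }

    if (use_hook)
        toy_invalidate(foo, len);   /* what a munmap hook would do */
    if (munmap(foo, len) != 0)
        return -1;

    /* Force the same VA to come back; normally this is down to chance. */
    void *bar = mmap(foo, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (bar == MAP_FAILED)
        return -1;

    int stale = (toy_lookup(bar, len) != NULL);
    toy_invalidate(bar, len);       /* leave the cache clean for the caller */
    munmap(bar, len);
    return stale;
}
```

stale_hit_after_remap(0) reports a hit on a mapping that was never
registered, while stale_hit_after_remap(1) does not. The hard part discussed
in this thread is getting the hook called for every path that changes the
mapping, which this sketch simply assumes.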
>
>Yeah, although that could get really nasty as there's generally not one call
>to ibv_reg_mr per call to mmap.  It's usually a couple of calls to
>ibv_reg_mr for different segments of the same mmap buffer (think sending
>faces of a 3-d block of space to the nearest neighbors in a physics
>simulation).
>
>> Removing entries from the registration cache would have to be done in
>> some other way (age?).
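
One possible reading of "age" is sketched below, purely as an illustration:
each cache entry carries a logical timestamp, lookups refresh it, and a
periodic sweep drops entries that have not been used recently. The aged_*
names, the fixed slot count and the logical clock are all assumptions of
this sketch, not anything proposed in the thread, and a real cache would
also deregister the MR when an entry is dropped.

```c
#include <assert.h>
#include <stddef.h>

#define AGED_SLOTS 4

/* Hypothetical entry: a registered range plus the tick it was last used. */
struct aged_entry {
    const void *addr;
    size_t len;
    unsigned long last_used;
    int in_use;
};

struct aged_cache {
    struct aged_entry slot[AGED_SLOTS];
    unsigned long now;              /* logical clock, one tick per get/put */
};

/* Look up an exact (addr, len) match; a hit refreshes its timestamp. */
struct aged_entry *aged_get(struct aged_cache *c, const void *addr, size_t len)
{
    assert(c != NULL);
    c->now++;
    for (int i = 0; i < AGED_SLOTS; i++) {
        struct aged_entry *e = &c->slot[i];
        if (e->in_use && e->addr == addr && e->len == len) {
            e->last_used = c->now;
            return e;
        }
    }
    return NULL;
}

/* Insert an entry, reusing a free slot or else the least recently used one. */
struct aged_entry *aged_put(struct aged_cache *c, const void *addr, size_t len)
{
    assert(c != NULL);
    c->now++;
    struct aged_entry *victim = &c->slot[0];
    for (int i = 0; i < AGED_SLOTS; i++) {
        struct aged_entry *e = &c->slot[i];
        if (!e->in_use) {
            victim = e;
            break;
        }
        if (e->last_used < victim->last_used)
            victim = e;
    }
    /* A real cache would deregister the victim's MR here if it was in use. */
    victim->addr = addr;
    victim->len = len;
    victim->last_used = c->now;
    victim->in_use = 1;
    return victim;
}

/* Drop entries idle for more than max_age ticks; returns how many went. */
size_t aged_expire(struct aged_cache *c, unsigned long max_age)
{
    assert(c != NULL);
    size_t dropped = 0;
    for (int i = 0; i < AGED_SLOTS; i++) {
        struct aged_entry *e = &c->slot[i];
        if (e->in_use && c->now - e->last_used > max_age) {
            e->in_use = 0;
            dropped++;
        }
    }
    return dropped;
}
```

Aging only bounds how long an entry lingers. It does not by itself detect
that the mapping behind a cached VA has changed, so it would complement
rather than replace the kernel fix-up idea quoted above.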
>
>Brian
>
>--
>   Brian W. Barrett
>   Dept. 1423: Scalable System Software
>   Sandia National Laboratories
>



