[openib-general] Question about pinning memory

Jeff Squyres jsquyres at open-mpi.org
Mon Jul 25 08:06:11 PDT 2005


On Jul 25, 2005, at 10:31 AM, Roland Dreier wrote:

> [...clarifications snipped...]
>     Jeff> Is there any thought within the IB community to getting rid
>     Jeff> of this whole memory registration issue at the user level?
>     Jeff> Perhaps a la what Quadrics did (make the driver understand
>     Jeff> virtual memory) or what Va Tech did (put in their own
>     Jeff> kernel-level hooks to make malloc/free to do the Right
>     Jeff> Things)?  Handling all this memory registration stuff is
>     Jeff> certainly quite a big chunk of code that every MPI (and IB
>     Jeff> app) needs to handle.  Could this stuff be moved down into
>     Jeff> the realm of the IB stack itself?  From an abstraction /
>     Jeff> software architecture standpoint, it could make sense to
>     Jeff> unify this handling in one place rather than N (MPI
>     Jeff> implementations and other IB apps).
>
> There's definitely thinking about this, but the correct approach is by
> no means obvious.  The Quadrics hooks seem too invasive to merge into
> the kernel.  I'm not familiar with the VA Tech work.

I agree that the kernel patches for Quadrics were somewhat painful (and 
kernel-dependent, which was sometimes a problem).

I don't know the specifics of the VA Tech work either, but my [crude] 
understanding was that they somehow made the memory [de-]registration 
totally opaque to user level (including the MPI library).  I don't 
know if it was hidden in the IB stack or in the kernel itself.  Anyone 
here know more details on this?

> Unfortunately the IB and iWARP specs are written in terms of
> applications handling memory registration.  So we can't really expect
> all RDMA hardware to have the hooks required to improve the memory
> pinning interface.  As it becomes clearer what hardware capabilities
> are common, and what the userspace requirements are, we should be able
> to improve things.

I know nothing about how the kernel works, nor how typical IB stacks 
work down in the kernel, so pardon me if this is a silly question: why 
do hardware capabilities have anything to do with this?  Isn't this a 
kernel-level and/or software API issue?

More specifically, the IB stacks already do checks to see if memory is 
registered before attempting send/receive operations.

...after thinking about this a little, I realize that for maximum 
performance, they may not, so I guess I don't know that this is true.  
But if it is, couldn't all memory [de-]registration be handled by the 
IB stack?  If it can, this would eliminate lookups and memory 
management up in the MPI level, which would potentially help decrease a 
little latency (i.e., the redundant lookups in the MPI layer could be 
removed).
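
For readers unfamiliar with the bookkeeping in question, the following is a 
minimal sketch of the kind of registration cache an MPI layer ends up 
carrying.  It is not taken from any MPI implementation or IB stack: 
fake_register() is a hypothetical stand-in for whatever call actually pins 
and registers memory, and the fixed-size table and all names are 
assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64

/* One cached registration: a pinned address range and its handle. */
struct reg_entry {
    uintptr_t base;
    size_t    len;
    int       handle;
    int       in_use;
};

static struct reg_entry cache[CACHE_SLOTS];
static int next_handle = 1;
static int register_calls = 0;

/* Hypothetical stand-in for the real (expensive) pin/register call. */
static int fake_register(const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    register_calls++;
    return next_handle++;
}

/* Return a handle covering [buf, buf+len), registering only on a miss. */
int cache_lookup_or_register(const void *buf, size_t len)
{
    uintptr_t addr = (uintptr_t)buf;
    int free_slot = -1;

    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].in_use) {
            if (free_slot < 0)
                free_slot = i;
            continue;
        }
        /* Hit: the request lies entirely inside a cached range. */
        if (addr >= cache[i].base && len <= cache[i].len &&
            addr - cache[i].base <= cache[i].len - len)
            return cache[i].handle;
    }

    int handle = fake_register(buf, len);
    if (free_slot >= 0) {
        cache[free_slot].base = addr;
        cache[free_slot].len = len;
        cache[free_slot].handle = handle;
        cache[free_slot].in_use = 1;
    }
    return handle;
}

/* Drop every cached range overlapping [buf, buf+len).  Something must
 * call this when the memory goes back to the OS, or the cache goes stale. */
void cache_invalidate(const void *buf, size_t len)
{
    uintptr_t addr = (uintptr_t)buf;

    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].in_use &&
            cache[i].base < addr + len &&
            addr < cache[i].base + cache[i].len)
            cache[i].in_use = 0;
    }
}

/* How many times the expensive path was taken (for illustration only). */
int cache_register_calls(void)
{
    return register_calls;
}
```

The lookup on every send is the cost referred to above, and the need for 
something to call cache_invalidate() at the right moment is what makes the 
problem hard from user level.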

I realize that there's a bunch of if's in there, but it would certainly 
be nice to not have to handle all this stuff.  Indeed, the MPI problem 
hasn't really been solved yet -- there have been a few proposals / 
techniques used in the past, but none are really attractive (all have 
benefits and drawbacks, and there are scenarios that break each of 
them):

- use mallopt() to not let memory be returned to the OS
- include a memory manager (e.g., ptmalloc2) in the MPI to override 
  munmap/sbrk to catch when memory is returned to the OS
- use LD_PRELOAD to override munmap/sbrk
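
As a rough illustration of the first option, and assuming glibc's malloc 
(mallopt() and these parameters are glibc-specific), a sketch might look 
like the following.  The function name keep_heap_in_process() is 
hypothetical, and this is not a description of what any particular MPI 
actually does.

```c
#include <malloc.h>   /* glibc-specific: mallopt(), M_TRIM_THRESHOLD, M_MMAP_MAX */

/* Ask glibc's malloc to keep freed memory inside the process so that
 * previously registered pages are not handed back to the OS.
 * Returns 0 on success, -1 if either mallopt() call is rejected
 * (mallopt() itself returns 1 on success and 0 on error). */
int keep_heap_in_process(void)
{
    /* -1 disables trimming: the top of the heap is never shrunk via sbrk(). */
    if (mallopt(M_TRIM_THRESHOLD, -1) != 1)
        return -1;

    /* 0 disables mmap() for large requests, so free() never calls munmap(). */
    if (mallopt(M_MMAP_MAX, 0) != 1)
        return -1;

    return 0;
}
```

This would have to run before the application allocates anything that 
might later be registered.  It only affects glibc's own allocator: an 
application that calls mmap()/munmap() directly, or that links a different 
malloc, is one of the scenarios that breaks this technique.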

The method that Pete described in an earlier post could certainly be 
interesting, especially if it could be tweaked to not have to make a 
system call to check every memory reference (e.g., Gleb's suggestion of 
a signal / callback / whatever).  I would certainly take that over 
today's mechanisms (the 3 mentioned above), but would still like to 
push forward with the possibility of all the memory registration being 
handled at a level lower than MPI.

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



