[openib-general] Question about pinning memory
Jeff Squyres
jsquyres at open-mpi.org
Mon Jul 25 08:06:11 PDT 2005
On Jul 25, 2005, at 10:31 AM, Roland Dreier wrote:
> [...clarifications snipped...]
> Jeff> Is there any thought within the IB community to getting rid
> Jeff> of this whole memory registration issue at the user level?
> Jeff> Perhaps a la what Quadrics did (make the driver understand
> Jeff> virtual memory) or what Va Tech did (put in their own
> Jeff> kernel-level hooks to make malloc/free to do the Right
> Jeff> Things)? Handling all this memory registration stuff is
> Jeff> certainly quite a big chunk of code that every MPI (and IB
> Jeff> app) needs to handle. Could this stuff be moved down into
> Jeff> the realm of the IB stack itself? From an abstraction /
> Jeff> software architecture standpoint, it could make sense to
> Jeff> unify this handling in one place rather than N (MPI
> Jeff> implementations and other IB apps).
>
> There's definitely thinking about this, but the correct approach is by
> no means obvious. The Quadrics hooks seem too invasive to merge into
> the kernel. I'm not familiar with the VA Tech work.
I agree that the kernel patches for Quadrics were somewhat painful (and
kernel-dependent, which was sometimes a problem).
I don't know the specifics of the VA Tech work either, but my [crude]
understanding was that they somehow made the memory [de-]registration
totally opaque to user level (to include the MPI library). I don't
know if it was hidden in the IB stack or in the kernel itself. Anyone
here know more details on this?
> Unfortunately the IB and iWARP specs are written in terms of
> applications handling memory registration. So we can't really expect
> all RDMA hardware to have the hooks required to improve the memory
> pinning interface. As it becomes clearer what hardware capabilities
> are common, and what the userspace requirements are, we should be able
> to improve things.
I know nothing about how the kernel works, nor how typical IB stacks
work down in the kernel, so pardon me if this is a silly question: why
do hardware capabilities have anything to do with this? Isn't this a
kernel-level and/or software API issue?
More specifically, the IB stacks already do checks to see if memory is
registered before attempting send/receive operations.
...after thinking about this a little, I realize that for maximum
performance, they may not, so I guess I don't know that this is true.
But if it is, couldn't all memory [de-]registration be handled by the
IB stack? If it can, this would eliminate the lookups and memory
management up at the MPI level, which could shave off a little latency
(i.e., the redundant lookups in the MPI layer could be removed).
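
To make "lookups and memory management up at the MPI level" concrete,
here is a minimal sketch of the kind of registration cache an MPI
library ends up carrying. Everything in it is hypothetical and for
illustration only: the names (reg_cache_get, reg_cache_invalidate,
fake_register), the fixed-size table, and the integer handle are
assumptions, and fake_register merely stands in for whatever
pin-and-register call the IB stack actually provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REG_CACHE_SLOTS 64

/* One cached registration: an address range plus an opaque handle. */
struct reg_entry {
    uintptr_t start;   /* first byte of the registered range */
    size_t    len;     /* length in bytes; 0 marks an empty slot */
    int       handle;  /* hypothetical stand-in for the stack's handle */
};

static struct reg_entry reg_cache[REG_CACHE_SLOTS];
static int next_handle = 1;
static int register_calls = 0;

/* Hypothetical stand-in for the expensive pin-and-register operation. */
static int fake_register(const void *buf, size_t len)
{
    (void)buf;
    assert(len > 0);
    register_calls++;
    return next_handle++;
}

/* Number of times the expensive registration path was taken. */
int reg_cache_register_calls(void)
{
    return register_calls;
}

/* Return a handle covering [buf, buf+len), registering only on a miss.
 * Returns -1 for a zero-length request or when the table is full. */
int reg_cache_get(const void *buf, size_t len)
{
    uintptr_t start = (uintptr_t)buf;
    struct reg_entry *free_slot = NULL;

    if (len == 0)
        return -1;

    for (int i = 0; i < REG_CACHE_SLOTS; i++) {
        struct reg_entry *e = &reg_cache[i];
        if (e->len == 0) {
            if (free_slot == NULL)
                free_slot = e;          /* remember the first empty slot */
            continue;
        }
        if (start >= e->start && start + len <= e->start + e->len)
            return e->handle;           /* hit: range already registered */
    }

    if (free_slot == NULL)
        return -1;                      /* a real cache would evict here */

    free_slot->start  = start;
    free_slot->len    = len;
    free_slot->handle = fake_register(buf, len);
    return free_slot->handle;
}

/* Drop every entry overlapping [addr, addr+len). The library must call
 * this whenever that memory goes back to the OS, otherwise the cache
 * hands out stale registrations. Returns the number of entries dropped. */
int reg_cache_invalidate(const void *addr, size_t len)
{
    uintptr_t start = (uintptr_t)addr;
    int dropped = 0;

    for (int i = 0; i < REG_CACHE_SLOTS; i++) {
        struct reg_entry *e = &reg_cache[i];
        if (e->len != 0 &&
            start < e->start + e->len && e->start < start + len) {
            e->len = 0;
            dropped++;
        }
    }
    return dropped;
}
```

The lookup in reg_cache_get is the per-message cost in question, and
reg_cache_invalidate is the part that is hard to call reliably from
user level, since the library has to find out somehow that the memory
was returned to the OS.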
I realize that there's a bunch of if's in there, but it would certainly
be nice to not have to handle all this stuff. Indeed, the MPI problem
hasn't really been solved yet -- there have been a few proposals /
techniques used in the past, but none are really attractive (all have
benefits and drawbacks, and there are scenarios that break each of
them):
- use mallopt() to not let memory be returned to the OS
- include a memory manager (e.g., ptmalloc2) in the MPI to override
munmap/sbrk to catch when memory is returned to the OS
- use LD_PRELOAD to override munmap/sbrk
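
As an illustration of the first option, here is a minimal sketch that
assumes glibc's malloc. mallopt(), M_TRIM_THRESHOLD and M_MMAP_MAX are
glibc-specific; the wrapper name keep_freed_memory_in_process is made
up for this example. This is not necessarily how any particular MPI
does it.

```c
#include <malloc.h>   /* glibc-specific: mallopt, M_TRIM_THRESHOLD, M_MMAP_MAX */
#include <stdlib.h>

/* Ask glibc malloc not to hand freed memory back to the OS, so that
 * previously registered pages stay mapped in the process.
 * Returns 1 on success, 0 on failure (mallopt itself returns 1 on success). */
int keep_freed_memory_in_process(void)
{
    /* Disable trimming: free() never shrinks the heap via sbrk(). */
    if (mallopt(M_TRIM_THRESHOLD, -1) != 1)
        return 0;

    /* Never satisfy large requests with mmap(), so free() never
     * has a reason to call munmap() on them. */
    if (mallopt(M_MMAP_MAX, 0) != 1)
        return 0;

    return 1;
}
```

This would be called once, before the first large allocation. It
covers only memory obtained through malloc; an application that calls
mmap/munmap directly is one of the scenarios that still breaks it, and
the process never gives heap memory back.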
The method that Pete described in an earlier post could certainly be
interesting, especially if it could be tweaked to not have to make a
system call to check every memory reference (e.g., Gleb's suggestion of
a signal / callback / whatever). I would certainly take that over
today's mechanisms (the 3 mentioned above), but would still like to
push forward with the possibility of all the memory registration being
handled at a level lower than MPI.
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/