[openib-general] Question about pinning memory
Gleb Natapov
glebn at voltaire.com
Mon Jul 25 08:43:01 PDT 2005
On Mon, Jul 25, 2005 at 11:06:11AM -0400, Jeff Squyres wrote:
> On Jul 25, 2005, at 10:31 AM, Roland Dreier wrote:
> >Unfortunately the IB and iWARP specs are written in terms of
> >applications handling memory registration. So we can't really expect
> >all RDMA hardware to have the hooks required to improve the memory
> >pinning interface. As it becomes clearer what hardware capabilities
> >are common, and what the userspace requirements are, we should be able
> >to improve things.
>
> I know nothing about how the kernel works, nor how typical IB stacks
> work down in the kernel, so pardon me if this is a silly question: why
> do hardware capabilities have anything to do with this? Isn't this a
> kernel-level and/or software API issue?
>
> More specifically, the IB stacks already do checks to see if memory is
> registered before attempting send/receive operations.
>
Applications don't even enter the kernel to send/receive data over IB
after registration. The hardware does all the necessary checks.
> ...after thinking about this a little, I realize that for maximum
> performance, they may not, so I guess I don't know that this is true.
> But if it is, couldn't all memory [de-]registration be handled by the
> IB stack? If it can, this would eliminate lookups and memory
> management up in the MPI level, which would potentially help decrease a
> little latency (i.e., the redundant lookups in the MPI layer could be
> removed).
How would that eliminate lookups? You still need to check the cache
before registering memory, because you have to obtain the lkey/rkey
somehow. If the kernel deregisters memory automatically, your cache will
be out of sync with the hardware registration, and that will cause trouble.
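
A minimal sketch of the cache check described above, to show why the
lookup stays in the application: the lkey/rkey have to come from
somewhere before a work request can be posted. Everything here is an
assumption for illustration (struct reg_entry, reg_cache_get, and
stub_register are hypothetical names); stub_register stands in for a
real registration call such as ibv_reg_mr() so the sketch runs without
RDMA hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry: one registered region and its keys. */
struct reg_entry {
    uintptr_t start;
    size_t    len;
    uint32_t  lkey;
    uint32_t  rkey;
    int       valid;
};

#define CACHE_SLOTS 16
static struct reg_entry cache[CACHE_SLOTS];
static int register_calls;   /* counts trips to the stubbed slow path */

/* Stand-in for a real registration call; it only fabricates keys. */
int stub_register(uintptr_t start, size_t len,
                  uint32_t *lkey, uint32_t *rkey)
{
    (void)start;
    (void)len;
    register_calls++;
    *lkey = 0x1000u + (uint32_t)register_calls;
    *rkey = 0x2000u + (uint32_t)register_calls;
    return 0;
}

/* Return the entry covering [addr, addr + len), registering on a miss.
 * Returns NULL if the cache is full (eviction is omitted here). */
struct reg_entry *reg_cache_get(void *addr, size_t len)
{
    uintptr_t a = (uintptr_t)addr;
    struct reg_entry *free_slot = NULL;

    assert(len > 0);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct reg_entry *e = &cache[i];
        if (!e->valid) {
            if (free_slot == NULL)
                free_slot = e;      /* remember the first empty slot */
            continue;
        }
        if (a >= e->start && a + len <= e->start + e->len)
            return e;               /* hit: keys are already known */
    }
    if (free_slot == NULL)
        return NULL;
    if (stub_register(a, len, &free_slot->lkey, &free_slot->rkey) != 0)
        return NULL;
    free_slot->start = a;
    free_slot->len   = len;
    free_slot->valid = 1;
    return free_slot;
}
```

A second request that falls inside an already registered buffer returns
the same entry without calling the registration stub again. If something
else deregistered the region behind the application's back, this cache
would keep handing out stale keys, which is the out-of-sync problem
described above.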
>
> I realize that there's a bunch of if's in there, but it would certainly
> be nice to not have to handle all this stuff. Indeed, the MPI problem
> hasn't really been solved yet -- there have been a few proposals /
> techniques used in the past, but none are really attractive (all have
> benefits and drawbacks, and there are scenarios that break each of
> them):
>
> - use mallopt() to not let memory be returned to the OS
> - include a memory manager (e.g., ptmalloc2) in the MPI to override
> munmap/sbrk to catch when memory is returned to the OS
> - use LD_PRELOAD to override munmap/sbrk
And if some strange application calls directly into the kernel, none of
them will work.
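
For reference, the first workaround in the list above (mallopt) might
look roughly like this. It is a sketch that assumes glibc's malloc; the
function name keep_freed_memory_in_process is hypothetical. As noted
above, it does nothing for an application that calls munmap() or brk()
itself.

```c
#include <assert.h>
#include <malloc.h>   /* glibc: mallopt, M_TRIM_THRESHOLD, M_MMAP_MAX */

/* Ask glibc malloc not to hand freed memory back to the OS.
 * Returns 1 if both settings were accepted, 0 otherwise. */
int keep_freed_memory_in_process(void)
{
    /* -1 effectively disables trimming the top of the heap via sbrk(). */
    int ok_trim = mallopt(M_TRIM_THRESHOLD, -1);

    /* 0 stops malloc() from using mmap() for large blocks, so free()
     * never calls munmap() on them. */
    int ok_mmap = mallopt(M_MMAP_MAX, 0);

    return ok_trim == 1 && ok_mmap == 1;
}
```

This has to run before the buffers in question are allocated, and the
process then holds on to its peak heap size for good, which is one of
the drawbacks of this technique.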
>
> The method that Pete described in an earlier post could certainly be
> interesting, especially if it could be tweaked to not have to make a
> system call to check every memory reference (e.g., Gleb's suggestion of
> a signal / callback / whatever). I would certainly take that over
> today's mechanisms (the 3 mentioned above), but would still like to
> push forward with the possibility of all the memory registration being
> handled at a level lower than MPI.
>
I think Pete's approach, with a callback and the application maintaining
the registration cache, is the right one.
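
A rough sketch of the application side of that approach, assuming some
notification is delivered when an address range is unmapped. The
delivery mechanism is not specified in this thread, and all names here
(cached_region, region_add, on_range_unmapped, region_is_cached) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one region the application has registered. */
struct cached_region {
    uintptr_t start;
    size_t    len;
    int       valid;
};

#define MAX_REGIONS 8
static struct cached_region regions[MAX_REGIONS];

/* Remember a registered region. Returns 0 on success, -1 if full. */
int region_add(uintptr_t start, size_t len)
{
    for (int i = 0; i < MAX_REGIONS; i++) {
        if (!regions[i].valid) {
            regions[i].start = start;
            regions[i].len   = len;
            regions[i].valid = 1;
            return 0;
        }
    }
    return -1;
}

/* Callback for "range [start, start + len) was unmapped".
 * Drops every cached region that overlaps it and returns how many
 * were dropped. Real code would also deregister each one here. */
int on_range_unmapped(uintptr_t start, size_t len)
{
    int dropped = 0;

    for (int i = 0; i < MAX_REGIONS; i++) {
        struct cached_region *r = &regions[i];
        if (r->valid &&
            start < r->start + r->len &&
            r->start < start + len) {
            r->valid = 0;
            dropped++;
        }
    }
    return dropped;
}

/* Returns 1 if addr lies inside a still-valid cached region. */
int region_is_cached(uintptr_t addr)
{
    for (int i = 0; i < MAX_REGIONS; i++) {
        const struct cached_region *r = &regions[i];
        if (r->valid && addr >= r->start && addr < r->start + r->len)
            return 1;
    }
    return 0;
}
```

Because the application drops the entry itself, the cache and the
hardware registration stay in step, and no system call is needed on the
normal send/receive path.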
--
Gleb.