[openib-general] Question about pinning memory

Mon Jul 25 08:43:01 PDT 2005

On Mon, Jul 25, 2005 at 11:06:11AM -0400, Jeff Squyres wrote:
> On Jul 25, 2005, at 10:31 AM, Roland Dreier wrote:
> >Unfortunately the IB and iWARP specs are written in terms of
> >applications handling memory registration.  So we can't really expect
> >all RDMA hardware to have the hooks required to improve the memory
> >pinning interface.  As it becomes clearer what hardware capabilities
> >are common, and what the userspace requirements are, we should be able
> >to improve things.
> 
> I know nothing about how the kernel works, nor how typical IB stacks 
> work down in the kernel, so pardon me if this is a silly question: why 
> do hardware capabilities have anything to do with this?  Isn't this a 
> kernel-level and/or software API issue?
> 
> More specifically, the IB stacks already do checks to see if memory is 
> registered before attempting send/receive operations.
> 
Applications don't even enter the kernel to send/receive data over IB
after registration. The hardware does all the necessary checks.

> ...after thinking about this a little, I realize that for maximum 
> performance, they may not, so I guess I don't know that this is true.  
> But if it is, couldn't all memory [de-]registration be handled by the 
> IB stack?  If it can, this would eliminate lookups and memory 
> management up in the MPI level, which would potentially help decrease a 
> little latency (i.e., the redundant lookups in the MPI layer could be 
> removed).
How will it eliminate lookups? You still need to check the cache before
registering memory, and you have to obtain the lkey/rkey somehow. If the
kernel deregisters memory automatically, your cache will be out of sync
with the hardware registration, and that will cause trouble.
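
To make the lookup concrete, here is a minimal sketch of the kind of
registration cache I mean. It is only an illustration: fake_register()
is a hypothetical stand-in for the real pin-and-register call, the key
values are made up, and the fixed-size linear table is an assumption
chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 16

/* One cached registration. In real code the keys would come from the
 * verbs library; here they are made up for illustration. */
struct reg_entry {
    uintptr_t start;
    size_t    len;
    uint32_t  lkey;
    uint32_t  rkey;
    int       valid;
};

static struct reg_entry cache[CACHE_SLOTS];
static unsigned registrations;   /* how many times the slow path ran */

/* Hypothetical stand-in for the expensive pin-and-register call. */
static void fake_register(void *addr, size_t len, struct reg_entry *e)
{
    registrations++;
    e->start = (uintptr_t)addr;
    e->len   = len;
    e->lkey  = 0x1000u + registrations;
    e->rkey  = 0x2000u + registrations;
    e->valid = 1;
}

/* Return an entry covering [addr, addr + len), registering on a miss.
 * Returns NULL when the table is full (a real cache would evict). */
const struct reg_entry *cache_get(void *addr, size_t len)
{
    uintptr_t a = (uintptr_t)addr;
    int i;

    assert(len > 0);
    for (i = 0; i < CACHE_SLOTS; i++) {
        const struct reg_entry *e = &cache[i];
        if (e->valid && a >= e->start && a + len <= e->start + e->len)
            return e;                  /* hit: keys are already known */
    }
    for (i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].valid) {
            fake_register(addr, len, &cache[i]);
            return &cache[i];          /* miss: registered and cached */
        }
    }
    return NULL;
}
```

The point is that the sender needs the lkey from this table on every
operation, so the lookup stays no matter who does the registration.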

> 
> I realize that there's a bunch of if's in there, but it would certainly 
> be nice to not have to handle all this stuff.  Indeed, the MPI problem 
> hasn't really been solved yet -- there have been a few proposals / 
> techniques used in the past, but none are really attractive (all have 
> benefits and drawbacks, and there are scenarios that break each of 
> them):
> 
> - use mallopt() to not let memory be returned to the OS
> - include a memory manager (e.g., ptmalloc2) in the MPI to override 
> munmap/sbrk to catch when memory is returned to the OS
> - use LD_PRELOAD to override munmap/sbrk
And if some strange application calls directly into the kernel,
bypassing libc, none of them will work.
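
For reference, the first option usually looks something like the sketch
below. It assumes glibc's malloc (mallopt(), M_TRIM_THRESHOLD and
M_MMAP_MAX are glibc-specific), and the function name is made up for
this example. It only affects memory that goes through malloc/free.

```c
#include <malloc.h>   /* glibc-specific: mallopt(), M_TRIM_THRESHOLD, M_MMAP_MAX */
#include <stdlib.h>

/* Ask malloc to keep freed memory inside the process.
 * Returns 1 if both settings were accepted, 0 otherwise. */
int keep_freed_memory_in_process(void)
{
    /* Never shrink the heap with sbrk() after free(). */
    int ok_trim = mallopt(M_TRIM_THRESHOLD, -1);
    /* Never satisfy malloc() with mmap(), so free() never calls munmap(). */
    int ok_mmap = mallopt(M_MMAP_MAX, 0);

    return ok_trim && ok_mmap;
}
```

This would have to be called early, before the buffers of interest are
allocated, and it does nothing for memory the application maps and
unmaps itself.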

> 
> The method that Pete described in an earlier post could certainly be 
> interesting, especially if it could be tweaked to not have to make a 
> system call to check every memory reference (e.g., Gleb's suggestion of 
> a signal / callback / whatever).  I would certainly take that over 
> today's mechanisms (the 3 mentioned above), but would still like to 
> push forward with the possibility of all the memory registration being 
> handled at a level lower than MPI.
> 
I think Pete's approach, with a callback and the application maintaining
the registration cache, is the right one.
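
Roughly, the application side could look like the sketch below. How the
notification is delivered (signal, callback, whatever) is not settled in
this thread, so on_range_unmapped() is a hypothetical entry point that I
assume gets called with the range that went away. All names here are
made up, and a real version would also deregister the dropped entries
with the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 8

struct cached_reg {
    uintptr_t start;
    size_t    len;
    uint32_t  lkey;    /* 0 is treated as "no key" in this sketch */
    int       valid;
};

static struct cached_reg reg_cache[SLOTS];

/* Record a registration the application has already performed.
 * Returns 0 on success, -1 if the table is full. */
int cache_insert(void *addr, size_t len, uint32_t lkey)
{
    int i;

    assert(len > 0 && lkey != 0);
    for (i = 0; i < SLOTS; i++) {
        if (!reg_cache[i].valid) {
            reg_cache[i].start = (uintptr_t)addr;
            reg_cache[i].len   = len;
            reg_cache[i].lkey  = lkey;
            reg_cache[i].valid = 1;
            return 0;
        }
    }
    return -1;
}

/* Return the cached lkey covering [addr, addr + len), or 0 on a miss. */
uint32_t cache_lookup(void *addr, size_t len)
{
    uintptr_t a = (uintptr_t)addr;
    int i;

    for (i = 0; i < SLOTS; i++) {
        const struct cached_reg *e = &reg_cache[i];
        if (e->valid && a >= e->start && a + len <= e->start + e->len)
            return e->lkey;
    }
    return 0;
}

/* Hypothetical callback: the range [addr, addr + len) was unmapped.
 * Drop every overlapping entry; returns how many were dropped. */
int on_range_unmapped(void *addr, size_t len)
{
    uintptr_t a = (uintptr_t)addr;
    int i, dropped = 0;

    for (i = 0; i < SLOTS; i++) {
        struct cached_reg *e = &reg_cache[i];
        if (e->valid && e->start < a + len && a < e->start + e->len) {
            e->valid = 0;
            dropped++;
        }
    }
    return dropped;
}
```

This way the cache and the hardware registration are invalidated
together, and the fast path is still a purely userspace lookup.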

--
			Gleb.