[openib-general] Re: Question about pinning memory

Gleb Natapov glebn at voltaire.com
Sun Jul 24 00:53:36 PDT 2005


On Sun, Jul 24, 2005 at 12:08:25AM +0300, Michael S. Tsirkin wrote:
> Quoting Jeff Squyres <jsquyres at open-mpi.org>:
> > [...]
> >
> > Since MPI usually does not have visibility 
> > into memory that has been free'd (or sbrk'ed), the following scenario is 
> > possible (and in our experience, likely):
> > 
> > - user calls malloc() and gets virtual address A back
> > - user calls MPI_Send with A
> > - MPI pins address A
> > - MPI saves A in internal lookup tables
> > - MPI_Send returns
> > - user calls free(A)
> > - user calls malloc() and gets virtual address A back, but now A points 
> > to different physical memory than before
> > - user calls MPI_Send with A
> > - MPI thinks that A is pinned (because A is in MPI's internal tables), 
> > and therefore doesn't pin it
> > - MPI tries to send A with the IB API, and Bad Things occur (either an 
> > error, or worse, bad data is sent)
> 
> Jeff, I don't think the openib stack can currently help you with this problem.
> 
> I suggest using malloc_hook and friends to track memory allocations.
> Avoid caching memory for anything that isn't allocated with malloc,
> and you are guaranteed that your hook gets called when the memory is
> freed.
The only useful hook I see is __after_morecore_hook, and when I last
checked it doesn't get called when glibc uses mmap for allocations.
Also, I don't know how you can tell whether the user obtained memory via
malloc or via mmap of some file; the user can even call sbrk() directly.
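The invalidate-on-free idea discussed above can be sketched as follows. This is a minimal illustration, not MVAPICH or Open MPI code: `get_registration`, `tracked_free` and `fake_register` are hypothetical names, the pinning step is faked, and instead of glibc malloc hooks (since removed from the glibc API, in 2.34) the application is assumed to call an explicit wrapper that passes the block size. It only covers memory released through that wrapper; munmap() or a direct sbrk() would still bypass it, which is exactly the gap raised here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical registration-cache entry: [start, start+len) was pinned and
 * registered; `handle` stands in for whatever the verbs layer returns. */
struct reg_entry {
    uintptr_t start;
    size_t len;
    int handle;
    bool valid;
};

#define CACHE_SLOTS 64
static struct reg_entry cache[CACHE_SLOTS];
static int next_handle = 1;

/* Hypothetical stand-in for the expensive pin/register path (system call,
 * PCI writes, get_user_pages); here it only hands out a fresh handle. */
static int fake_register(void *addr, size_t len) {
    (void)addr;
    (void)len;
    return next_handle++;
}

/* Return a cached handle if [addr, addr+len) is already covered by a valid
 * entry; otherwise "register" it and remember the result if a slot is free. */
int get_registration(void *addr, size_t len) {
    uintptr_t a = (uintptr_t)addr;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct reg_entry *e = &cache[i];
        if (e->valid && a >= e->start && a + len <= e->start + e->len)
            return e->handle;            /* cache hit: no pinning */
    }
    int h = fake_register(addr, len);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].valid) {
            cache[i] = (struct reg_entry){ a, len, h, true };
            break;
        }
    }
    return h;
}

/* Hypothetical free wrapper: drop every cached entry overlapping the block
 * before releasing it, so a later allocation at the same virtual address
 * cannot hit a stale registration. A real library would also deregister. */
void tracked_free(void *p, size_t len) {
    uintptr_t a = (uintptr_t)p;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct reg_entry *e = &cache[i];
        if (e->valid && a < e->start + e->len && e->start < a + len)
            e->valid = false;
    }
    free(p);
}
```

Because lookups are keyed on virtual addresses, invalidating in the free path is what prevents the stale hit in Jeff's scenario; the difficulty in this thread is observing every such release.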

> 
> Having said that - do you have some benchmarks to prove that
> pinning is that expensive? Are you sure that all these internal table
> lookups aren't even slower?
> I can't promise, but given a microbenchmark, maybe something could be done to
> improve the speed of this operation.
> 
In my experience, a cache hit in MVAPICH improves registration performance
by a factor of two. An internal table lookup is a search in a balanced tree.
Hardware memory registration is a system call, plus writes on the PCI bus,
plus a search in a balanced tree in get_user_pages. How can you improve this?
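To make the cost comparison concrete: on a cache hit, the whole user-space path is one in-memory search. A minimal sketch, assuming cached registrations are kept sorted by start address and non-overlapping; it uses binary search over an array as a simple stand-in for the balanced tree mentioned above, and `struct region` / `lookup_region` are hypothetical names, not MVAPICH's code.

```c
#include <stddef.h>
#include <stdint.h>

/* One hypothetical cached registration covering [start, end). */
struct region {
    uintptr_t start;
    uintptr_t end;
    int handle;
};

/* Binary search over regions sorted by start and non-overlapping.
 * Returns the handle of the region that fully covers [addr, addr+len),
 * or -1 on a miss (the caller would then pay for a real registration). */
int lookup_region(const struct region *r, size_t n, uintptr_t addr, size_t len) {
    size_t lo = 0, hi = n;               /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (r[mid].start <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is the first region with start > addr; the candidate is lo - 1. */
    if (lo == 0)
        return -1;
    const struct region *c = &r[lo - 1];
    if (addr + len <= c->end)
        return c->handle;
    return -1;
}
```

Either way the hit path is O(log n) comparisons with no system call, which is the asymmetry being described.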

--
			Gleb.


