[openib-general] Re: Question about pinning memory
Gleb Natapov
glebn at voltaire.com
Sun Jul 24 00:53:36 PDT 2005
On Sun, Jul 24, 2005 at 12:08:25AM +0300, Michael S. Tsirkin wrote:
> Quoting Jeff Squyres <jsquyres at open-mpi.org>:
> > [...]
> >
> > Since MPI usually does not have visibility
> > for memory that has been free'd (or sbrk'ed), the following scenario is
> > possible (and in our experience, likely):
> >
> > - user calls malloc() and gets virtual address A back
> > - user calls MPI_Send with A
> > - MPI pins address A
> > - MPI saves A in internal lookup tables
> > - MPI_Send returns
> > - user calls free(A)
> > - user calls malloc() and gets virtual address A back, but now A points
> > to different physical memory than before
> > - user calls MPI_Send with A
> > - MPI thinks that A is pinned (because A is in MPI's internal tables),
> > and therefore doesn't pin it
> > - MPI tries to send A with the IB API, and Bad Things occur (either an
> > error, or worse, bad data is sent)
>
> Jeff, I don't think the openib stack can currently help you with this problem.
>
> I suggest using malloc_hook and friends to track memory allocations.
> Avoid caching memory for anything that isn't allocated with malloc,
> and you are guaranteed that your hook gets called when the memory is
> freed.
The only useful hook I see is __after_morecore_hook, and when I last
checked it doesn't get called when glibc uses mmap for allocations.
I also don't know how you can distinguish how the user obtained the
memory: via malloc, via mmap of some file, or even by calling sbrk()
directly.
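
To make the hook idea concrete, here is a minimal sketch in C of what a
free-time callback would have to do to the MPI-side lookup table. It is a
simulation only: no real pinning is done, and every name in it
(reg_cache_insert, reg_cache_lookup, reg_cache_invalidate, tracked_free,
CACHE_SLOTS) is hypothetical and not part of glibc, MPI or the openib stack.
It also assumes the caller knows the length being freed, which plain free()
does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_SLOTS 16   /* hypothetical fixed-size table, for brevity */

struct pinned_range {
    uintptr_t start;     /* virtual address recorded as "pinned" */
    size_t    len;
    int       in_use;
};

static struct pinned_range cache[CACHE_SLOTS];

/* Record a range as pinned. Returns 1 on success, 0 if the table is full. */
int reg_cache_insert(const void *buf, size_t len)
{
    assert(len > 0);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].in_use) {
            cache[i].start  = (uintptr_t)buf;
            cache[i].len    = len;
            cache[i].in_use = 1;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if [buf, buf+len) lies entirely inside one cached range. */
int reg_cache_lookup(const void *buf, size_t len)
{
    uintptr_t start = (uintptr_t)buf;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        const struct pinned_range *c = &cache[i];
        if (c->in_use && start >= c->start) {
            size_t off = (size_t)(start - c->start);
            if (off <= c->len && len <= c->len - off)
                return 1;
        }
    }
    return 0;
}

/* Drop every cached range that overlaps [buf, buf+len).
 * A real cache would also deregister the memory here. */
void reg_cache_invalidate(const void *buf, size_t len)
{
    uintptr_t start = (uintptr_t)buf;
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct pinned_range *c = &cache[i];
        if (c->in_use && start < c->start + c->len && c->start < start + len)
            c->in_use = 0;
    }
}

/* Hypothetical stand-in for a free hook: evict first, then release. */
void tracked_free(void *buf, size_t len)
{
    if (buf != NULL && len > 0)
        reg_cache_invalidate(buf, len);
    free(buf);
}
```

This only helps if every release of memory goes through tracked_free. Memory
that the user returns with munmap() or a negative sbrk() never reaches it,
so the stale entry from the scenario above would remain in the table.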
>
> Having said that - do you have some benchmarks to prove that
> pinning is that expensive? Are you sure that all these internal table
> lookups aren't even slower?
> I can't promise, but given a microbenchmark, maybe something could be done to
> improve the speed of this operation.
>
In my experience, a cache hit in MVAPICH improves registration by a factor
of two. An internal table lookup is a search in a balanced tree. Hardware
memory registration is a system call + writes on the PCI bus + a search in
a balanced tree in get_user_pages. How can you improve this?
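
A rough sketch in C of the two paths being compared: a hit is a single tree
search, a miss pays for the registration and then inserts the result. This
is not MVAPICH code. fake_register is a made-up placeholder for the real
registration (system call, PCI writes, get_user_pages), get_registration
and struct region are hypothetical names, and the tree is whatever POSIX
tsearch()/tfind() provide (a red-black tree in glibc, although POSIX does
not require the tree to be balanced).

```c
#include <assert.h>
#include <search.h>   /* POSIX tsearch(), tfind() */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct region {
    uintptr_t start;
    size_t    len;
    unsigned  handle;   /* made-up stand-in for a registration handle */
};

static void    *tree_root = NULL;
static unsigned slow_path_calls = 0;   /* how often we paid full price */

/* Order regions by start address. */
static int region_cmp(const void *a, const void *b)
{
    const struct region *ra = a, *rb = b;
    if (ra->start < rb->start) return -1;
    if (ra->start > rb->start) return 1;
    return 0;
}

/* Placeholder for the expensive path; returns a non-zero handle. */
static unsigned fake_register(uintptr_t start, size_t len)
{
    (void)start;
    (void)len;
    return ++slow_path_calls;
}

/* Returns a handle for buf, or 0 on allocation failure.
 * Simplification: a hit requires the same start address. */
unsigned get_registration(const void *buf, size_t len)
{
    struct region key = { (uintptr_t)buf, len, 0 };
    void *found = tfind(&key, &tree_root, region_cmp);

    if (found != NULL) {
        struct region *hit = *(struct region **)found;
        if (hit->len >= len)
            return hit->handle;            /* cache hit: tree search only */
        hit->len    = len;                 /* cached range too short */
        hit->handle = fake_register(hit->start, len);
        return hit->handle;
    }

    struct region *r = malloc(sizeof *r);
    if (r == NULL)
        return 0;
    r->start  = key.start;
    r->len    = len;
    r->handle = fake_register(r->start, len);   /* cache miss: slow path */
    if (tsearch(r, &tree_root, region_cmp) == NULL) {
        free(r);
        return 0;
    }
    return r->handle;
}
```

Calling get_registration twice on the same buffer runs fake_register only
once; the second call costs just the tfind() search.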
--
Gleb.