[ewg] [RFC] libibverbs: ibv_fork_init() and libhugetlbfs
Roland Dreier
rdreier at cisco.com
Thu May 6 13:55:31 PDT 2010
> When fork support is enabled in libibverbs, madvise() is called for every
> memory page that is registered as a memory region. Memory ranges that
> are passed to madvise() must be page aligned and the size must be a
> multiple of the page size. libibverbs uses sysconf(_SC_PAGESIZE) to find
> out the system page size and rounds all ranges passed to reg_mr() according
> to this page size. When memory from libhugetlbfs is passed to reg_mr(), this
> does not work as the page size for this memory range might be different
> (e.g. 16 MB). So libibverbs would have to use the huge page size to
> calculate a page aligned range for madvise.

Yes, Alex Vainman raised this same issue a while ago.

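To make the alignment issue concrete, here is a minimal sketch of the
rounding involved. The helper names align_down() and align_up() are
hypothetical and are not taken from libibverbs; the page size is assumed
to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only, not libibverbs code.
 * page_size must be a power of two.
 */

/* Round addr down to the nearest multiple of page_size. */
uintptr_t align_down(uintptr_t addr, size_t page_size)
{
    uintptr_t mask = (uintptr_t)page_size - 1;

    return addr & ~mask;
}

/* Round addr up to the nearest multiple of page_size. */
uintptr_t align_up(uintptr_t addr, size_t page_size)
{
    uintptr_t mask = (uintptr_t)page_size - 1;

    return (addr + mask) & ~mask;
}
```

For a buffer starting at 0x1001234, rounding with a 4 KB page size gives a
start of 0x1001000, while rounding with a 2 MB huge page size gives
0x1000000. Only the second start address is huge-page aligned, which is why
the range passed to madvise() has to be computed from the page size of the
mapping and not from sysconf(_SC_PAGESIZE).
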
> The patch below demonstrates a possible solution for this. It parses the
> /proc/PID/maps file when registering a memory region and decides if the
> memory that is to be registered is part of a libhugetlbfs range or not. If so,
> a page size of 16 MB is used to align the memory range passed to madvise().
>
> We see two problems with this: parsing the procfs file is not a very elegant
> solution, and the 16 MB page size is currently hardcoded. The latter point could be
> solved by calling gethugepagesize() from libhugetlbfs, which would add a new
> dependency to libibverbs.

I think that we cannot assume huge pages only come from libhugetlbfs --
we should support an application directly enabling huge pages (possibly
via another library too, so we can't assume that an application knows
the page size for a memory range it is about to register).

And also the 16 MB page size constant is of course not feasible -- with
all due respect, the x86 page size of 2 MB is much more likely in
practice :) (Although perhaps the much slower PowerPC TLB refill makes
users more likely to try and use hugetlb pages ;)

Alex suggested parsing the same files that libhugetlbfs does to get the
page size, rather than linking against it, and that seems to be the best
solution, since I don't think the libhugetlbfs license is compatible with
the BSD license for libibverbs.

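A minimal sketch of that kind of parsing, assuming the file in question is
/proc/meminfo and its "Hugepagesize:" line (the message does not name the
files, so this is an assumption). The function names are hypothetical.
Note that this only yields the system default huge page size, not the page
size of a particular memory range.

```c
#include <stdio.h>

/*
 * Parse one /proc/meminfo line. Returns the huge page size in bytes if
 * this is the "Hugepagesize:" line, otherwise -1.
 * Hypothetical helper, not part of libibverbs or libhugetlbfs.
 */
long parse_hugepagesize(const char *line)
{
    long kb;

    if (sscanf(line, "Hugepagesize: %ld kB", &kb) == 1 && kb > 0)
        return kb * 1024L;
    return -1;
}

/* Return the default huge page size in bytes, or -1 if it is not found. */
long default_huge_page_size(void)
{
    char line[128];
    long size = -1;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    /* Stop at the first line that parses as the Hugepagesize entry. */
    while (size < 0 && fgets(line, sizeof line, f))
        size = parse_hugepagesize(line);
    fclose(f);
    return size;
}
```
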
But your trick of using /proc/*/maps looks nice. Does that only work
for libhugetlbfs or can we recognize direct mmap of hugetlb pages?
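
As a sketch of how a per-range lookup could work without depending on
libhugetlbfs: this is not the approach in the patch. It assumes the
kernel reports a "KernelPageSize:" field per mapping in /proc/PID/smaps
(newer kernels do), and it scans smaps-formatted text instead of maps.
The function name smaps_page_size() is hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan smaps-formatted text and return the KernelPageSize (in bytes) of
 * the mapping that contains addr, or -1 if no such mapping or field is
 * found. Hypothetical helper, not part of libibverbs.
 */
long smaps_page_size(const char *text, uintptr_t addr)
{
    int in_range = 0;   /* nonzero while inside the mapping containing addr */

    while (*text) {
        uintmax_t start, end;
        long kb;
        const char *nl;

        /* Mapping header lines begin with "start-end" in hex. */
        if (sscanf(text, "%jx-%jx", &start, &end) == 2)
            in_range = (addr >= start && addr < end);
        else if (in_range &&
                 sscanf(text, "KernelPageSize: %ld kB", &kb) == 1)
            return kb * 1024L;

        /* Advance to the start of the next line. */
        nl = strchr(text, '\n');
        if (!nl)
            break;
        text = nl + 1;
    }
    return -1;
}
```

A caller would read /proc/self/smaps into a buffer and pass it in together
with the address about to be registered. Because the page size is read
from the mapping itself, this would not care whether the huge pages came
from libhugetlbfs or from a direct mmap.
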
- R.
--
Roland Dreier <rolandd at cisco.com> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html