[ofa-general] Problem with libibverbs and huge pages registration.
Gleb Natapov
glebn at voltaire.com
Tue Apr 22 04:14:13 PDT 2008
On Mon, Apr 21, 2008 at 02:53:51PM -0700, Roland Dreier wrote:
> > ibv_reg_mr() fails if I try to register a memory region that is backed by a
> > huge page but is not aligned to a huge page boundary. Digging deeper, I
> > see that libibverbs aligns the memory region to the regular page size and
> > calls madvise(), and that call fails. See the program below to reproduce.
> > The program assumes that hugetlbfs is mounted on /huge and that at
> > least one huge page is available. I am not sure it is possible to know
> > whether a memory buffer is backed by a huge page, which is what solving
> > the problem would require.
>
> Hmm, not sure off the top of my head how we should deal with this.
Me neither :(
>
> > Another issue with libibverbs is that after the first ibv_reg_mr() fails, a
> > second registration attempt on the same buffer succeeds, since
> > ibv_madvise_range() doesn't clean up after the madvise() failure and thinks
> > that the memory is already "madvised".
>
> I guess we shouldn't change the refcnt until after we know if madvise
> has succeeded or not. Does the patch below help? I'm not sure if this
> is a good enough fix -- we might have split up a node and want to
> remerge it if the madvise fails... rolling back is a little tricky... I
> think this will take a little more thought.
>
> - R.
>
> --- a/src/memory.c
> +++ b/src/memory.c
> @@ -506,8 +506,6 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
> __mm_add(tmp);
> }
>
> - node->refcnt += inc;
> -
I suppose the "if" below depends on the updated refcnt, so the update can't be
moved down without also changing the "if" condition.
> if ((inc == -1 && node->refcnt == 0) ||
> (inc == 1 && node->refcnt == 1)) {
> /*
> @@ -532,6 +530,8 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
> goto out;
> }
>
> + node->refcnt += inc;
> +
> node = __mm_next(node);
> }
>
--
Gleb.