[ofa-general] Problem with libibverbs and huge pages registration.
Roland Dreier
rdreier at cisco.com
Mon Apr 21 14:53:51 PDT 2008
> ibv_reg_mr() fails if I try to register a memory region backed by a
> huge page but not aligned to a huge page boundary. Digging deeper, I
> see that libibverbs aligns the memory region to the regular page size and
> calls madvise(), and the call fails. See the program below to reproduce.
> The program assumes that hugetlbfs is mounted on /huge and that there is at
> least one huge page available. I am not sure it is possible to know whether a
> memory buffer is backed by a huge page, which would be needed to solve the problem.
Hmm, not sure off the top of my head how we should deal with this.
> Another issue with libibverbs is that after the first ibv_reg_mr() fails, a
> second registration attempt of the same buffer succeeds, since
> ibv_madvise_range() doesn't clean up after the madvise() failure and thinks
> that the memory is already "madvised".
I guess we shouldn't change the refcnt until after we know whether madvise
has succeeded. Does the patch below help? I'm not sure if this
is a good enough fix -- we might have split up a node and want to
remerge it if the madvise fails... rolling back is a little tricky... I
think this will take a little more thought.
- R.
--- a/src/memory.c
+++ b/src/memory.c
@@ -506,8 +506,6 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
 			__mm_add(tmp);
 		}
 
-		node->refcnt += inc;
-
 		if ((inc == -1 && node->refcnt == 0) ||
 		    (inc == 1 && node->refcnt == 1)) {
 			/*
@@ -532,6 +530,8 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
 				goto out;
 		}
 
+		node->refcnt += inc;
+
 		node = __mm_next(node);
 	}
 