[ofa-general] Problem with libibverbs and huge pages registration.

Roland Dreier rdreier at cisco.com
Mon Apr 21 14:53:51 PDT 2008


 >    ibv_reg_mr() fails if I try to register a memory region that is
 > backed by a huge page but is not aligned to a huge page boundary.
 > Digging deeper, I see that libibverbs aligns the memory region to the
 > regular page size and calls madvise(), and that call fails. See the
 > program below to reproduce. The program assumes that hugetlbfs is
 > mounted on /huge and that there is at least one huge page available.
 > I am not sure it is possible to know whether a memory buffer is backed
 > by a huge page in order to solve the problem.

Hmm, not sure off the top of my head how we should deal with this.
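
For illustration, the alignment step described in the report can be
sketched as below. align_range() is a hypothetical helper, not a
libibverbs function; it only shows how the same buffer widens to a
different range depending on which page size is used. How the library
would learn the right page size for a given buffer is the open question
here, and this sketch does not answer it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libibverbs): widen the byte range
 * [base, base + size) outward to multiples of page_size.
 * page_size must be a power of two. *end is inclusive.
 */
void align_range(uintptr_t base, size_t size, size_t page_size,
                 uintptr_t *start, uintptr_t *end)
{
    uintptr_t mask = (uintptr_t)page_size - 1;

    assert(size > 0);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    *start = base & ~mask;
    *end   = ((base + size + mask) & ~mask) - 1;
}
```

With a 4 KB page size, a 4 KB buffer that starts 4 KB into a 2 MB huge
page stays a 4 KB range; with a 2 MB page size it widens to the whole
huge page.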

 > Another issue with libibverbs is that after the first ibv_reg_mr()
 > fails, a second attempt to register the same buffer succeeds, since
 > ibv_madvise_range() doesn't clean up after the madvise() failure and
 > thinks that the memory is already "madvised".

I guess we shouldn't change the refcnt until after we know if madvise
has succeeded or not.  Does the patch below help?  I'm not sure if this
is a good enough fix -- we might have split up a node and want to
remerge it if the madvise fails... rolling back is a little tricky... I
think this will take a little more thought.
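
To make the ordering and rollback concern concrete, here is a minimal
sketch over a simplified model. struct range_node, advise_fn,
update_refcnts() and the two stub callbacks are all hypothetical names
for illustration; the real code walks a tree of address ranges and may
split or merge nodes, which this does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tracked memory range. */
struct range_node {
    int refcnt;
};

/* Hypothetical callback standing in for madvise(); returns 0 on success. */
typedef int (*advise_fn)(size_t index);

/*
 * Apply inc (+1 or -1) to nodes[0..n). The advice call is made only on
 * a 0 -> 1 or 1 -> 0 transition, and refcnt changes only after that call
 * has succeeded. On failure, the reference counts changed so far are
 * restored and -1 is returned. Advice already applied to earlier nodes
 * is NOT undone here.
 */
int update_refcnts(struct range_node *nodes, size_t n, int inc,
                   advise_fn advise)
{
    size_t i;

    assert(inc == 1 || inc == -1);

    for (i = 0; i < n; i++) {
        int transition = (inc == 1 && nodes[i].refcnt == 0) ||
                         (inc == -1 && nodes[i].refcnt == 1);

        if (transition && advise(i) != 0) {
            /* Roll back the counts updated before the failure. */
            while (i-- > 0)
                nodes[i].refcnt -= inc;
            return -1;
        }
        nodes[i].refcnt += inc;
    }
    return 0;
}

/* Stub callbacks: one always succeeds, one fails on the third node. */
int advise_ok(size_t index) { (void)index; return 0; }
int advise_fail_third(size_t index) { return index == 2 ? -1 : 0; }
```

Calling update_refcnts() with advise_fail_third on four fresh nodes
leaves every refcnt at 0, so a later attempt starts from a clean state
instead of believing the range was already advised.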

 - R.

--- a/src/memory.c
+++ b/src/memory.c
@@ -506,8 +506,6 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
 			__mm_add(tmp);
 		}
 
-		node->refcnt += inc;
-
-		if ((inc == -1 && node->refcnt == 0) ||
-		    (inc ==  1 && node->refcnt == 1)) {
+		if ((inc == -1 && node->refcnt == 1) ||
+		    (inc ==  1 && node->refcnt == 0)) {
 			/*
@@ -532,6 +530,8 @@ static int ibv_madvise_range(void *base, size_t size, int advice)
 				goto out;
 		}
 
+		node->refcnt += inc;
+
 		node = __mm_next(node);
 	}
 


