[openib-general] RFC: [PATCH untested] IB/uverbs: optimize registration for huge pages

Michael S. Tsirkin mst at mellanox.co.il
Tue Aug 15 14:13:19 PDT 2006


Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: question: ib_umem page_size
> 
>     Michael> Roland, could you please clarify what the page_size
>     Michael> field in struct ib_umem does?
> 
> It gives the page size for the user memory described by the struct.
> The idea was that if/when someone tries to optimize for huge pages,
> then the low-level driver can know that a region is using huge pages
> without having to walk through the page list and search for the
> minimum physically contiguous size.

OK, so here's a patch (warning: untested) that attempts to do this. We have
customers who run out of resources when they register lots of huge pages,
and this should help.
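To put rough numbers on "run out of resources": if a low-level driver maps a region with one translation-table entry per page (MTT entries on mthca, for example), the entry count scales as length / page_size, so reporting 2 MiB instead of 4 KiB cuts it by a factor of 512. Below is a minimal sketch of that arithmetic. translation_entries() is a hypothetical helper, not an existing driver function, and it assumes one entry per page_size-aligned page touched by the region.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a real driver API): number of page_size-sized
 * translation entries needed to cover `length` bytes starting at `start`,
 * assuming one entry per page_size-aligned page the region touches.
 * page_size must be a nonzero power of two.
 */
uint64_t translation_entries(uint64_t start, uint64_t length,
			     uint64_t page_size)
{
	uint64_t first, last;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	assert(length > 0);

	/* Round the start down and the end up to page_size boundaries. */
	first = start & ~(page_size - 1);
	last  = (start + length + page_size - 1) & ~(page_size - 1);

	return (last - first) / page_size;
}
```

For a 1 GiB region starting on a 2 MiB boundary, this gives 262144 entries with 4 KiB pages and 512 entries with 2 MiB pages.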

How does this look?  Is this the intended usage?

 uverbs_mem.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletion(-)

--

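Optimize memory registration for huge pages: walk the DMA-mapped page
list, find the largest power of two to which every physically
discontiguous segment boundary is aligned, and record it in
mem->page_size so that low-level drivers can map the region with
larger pages.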

Signed-off-by: Michael S. Tsirkin <mst at mellanox.co.il>

diff --git a/drivers/infiniband/core/uverbs_mem.c b/drivers/infiniband/core/uverbs_mem.c
index efe147d..f750652 100644
--- a/drivers/infiniband/core/uverbs_mem.c
+++ b/drivers/infiniband/core/uverbs_mem.c
@@ -73,6 +73,8 @@ int ib_umem_get(struct ib_device *dev, s
 	unsigned long lock_limit;
 	unsigned long cur_base;
 	unsigned long npages;
+	dma_addr_t a, seg_end = 0;
+	u32 mask = 0;
 	int ret = 0;
 	int off;
 	int i;
@@ -87,7 +89,6 @@ int ib_umem_get(struct ib_device *dev, s
 	mem->user_base = (unsigned long) addr;
 	mem->length    = size;
 	mem->offset    = (unsigned long) addr & ~PAGE_MASK;
-	mem->page_size = PAGE_SIZE;
 	mem->writable  = write;
 
 	INIT_LIST_HEAD(&mem->chunk_list);
@@ -149,6 +150,15 @@ int ib_umem_get(struct ib_device *dev, s
 				goto out;
 			}
 
+			for (i = 0; i < chunk->nmap; ++i) {
+				a = sg_dma_address(&chunk->page_list[i]);
+				if ((i || !list_empty(&mem->chunk_list)) && a != seg_end) {
+					mask |= seg_end;
+					mask |= a;
+				}
+				seg_end = a + sg_dma_len(&chunk->page_list[i]);
+			}
+
 			ret -= chunk->nents;
 			off += chunk->nents;
 			list_add_tail(&chunk->list, &mem->chunk_list);
@@ -157,6 +167,8 @@ int ib_umem_get(struct ib_device *dev, s
 		ret = 0;
 	}
 
+	mem->page_size = ffs(mask) ? 1 << (ffs(mask) - 1) : (1 << 31);
+
 out:
 	if (ret < 0)
 		__ib_umem_release(dev, mem, 0);
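To sanity-check the mask trick in the loop and the final ffs() line, here is a userspace model of the same computation. It is a sketch, not kernel code: struct seg and umem_page_size() are hypothetical names, the scatterlist is replaced by a plain array of (bus address, length) pairs, the mask is truncated to 32 bits as the patch's u32 mask is, and the lowest set bit is isolated with mask & -mask instead of ffs().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped scatterlist entry. */
struct seg {
	uint64_t addr;	/* bus address, as sg_dma_address() would return */
	uint64_t len;	/* length in bytes, as sg_dma_len() would return */
};

/*
 * Model of the patch: wherever a segment does not start where the
 * previous one ended, OR both the old end and the new start into a
 * 32-bit mask.  The lowest set bit of the mask is the largest power of
 * two to which every discontiguity is aligned.
 */
uint32_t umem_page_size(const struct seg *s, int n)
{
	uint32_t mask = 0;
	uint64_t seg_end = 0;
	int i;

	assert(n == 0 || s != NULL);

	for (i = 0; i < n; ++i) {
		/* The first segment has no predecessor to compare against. */
		if (i > 0 && s[i].addr != seg_end) {
			mask |= (uint32_t) seg_end;
			mask |= (uint32_t) s[i].addr;
		}
		seg_end = s[i].addr + s[i].len;
	}

	/*
	 * mask & (0u - mask) isolates the lowest set bit, the same value as
	 * 1 << (ffs(mask) - 1).  With no discontiguity at all, fall back to
	 * 2^31 as the patch does.
	 */
	return mask ? (mask & (0u - mask)) : (1u << 31);
}
```

For two discontiguous 2 MiB huge pages at bus addresses 0x40000000 and 0x80200000, the mask is 0xC0200000 and the result is 2 MiB; two scattered 4 KiB pages bring it back down to 4 KiB.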

-- 
MST



