[ofa-general] Re: [PATCH 2/3] libmlx4 - Optimize memory allocation of QP buffers with 64K pages
Roland Dreier
rdreier at cisco.com
Tue May 19 15:01:13 PDT 2009
> QP buffers are allocated with mlx4_alloc_buf(), which rounds the buffer
> size up to the page size and then allocates page-aligned memory using
> posix_memalign().
>
> However, this allocation is quite wasteful on architectures using 64K pages
> (ia64, for example) because we then hit glibc's MMAP_THRESHOLD malloc
> parameter and chunks are allocated using mmap(). Thus we end up allocating:
>
> (requested size rounded to the page size) + (page size) + (malloc overhead)
>
> rounded internally to the page size.
>
> So for example, if we request a buffer of page_size bytes, we end up
> consuming 3 pages. In short, for each QP buffer we allocate, there is an
> overhead of 2 pages. This is quite visible, especially on large clusters,
> where the number of QPs can reach several thousand.
>
> This patch creates a new function mlx4_alloc_page() for use by
> mlx4_alloc_qp_buf() that does an mmap() instead of a posix_memalign() when
> the page size is 64K.
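(As I read it, the allocation path being proposed is roughly the
following; this is only a sketch, the function name and the
posix_memalign() fallback are my reading of the description above, not
the patch itself:)

/*
 * Sketch only: allocate a page-aligned QP buffer, using an anonymous
 * mapping when the system page size is 64K (no malloc bookkeeping, so
 * exactly the rounded size is consumed), and posix_memalign() otherwise.
 */
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static int alloc_qp_buf(void **buf, size_t size)
{
	size_t page_size = sysconf(_SC_PAGESIZE);

	/* Round the request up to a whole number of pages. */
	size = (size + page_size - 1) & ~(page_size - 1);

	if (page_size == 64 * 1024) {
		*buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (*buf == MAP_FAILED)
			return -1;
		/* Anonymous mappings are already zero-filled. */
	} else {
		if (posix_memalign(buf, page_size, size))
			return -1;
		memset(*buf, 0, size);
	}

	return 0;
}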
Makes sense, I guess. It would be nice if glibc were smart enough to
know that mmap(MAP_ANONYMOUS) is going to give something page-aligned
anyway, but it seems that the malloc overhead (required to make the memory
from posix_memalign() work with free()) is going to cost at least one
extra page, which, as you point out, is pretty bad with 64KB pages. (Of
course 64KB pages are a disaster for any workload that deals with small
objects of any kind, but that's another story.)
However, I wonder why we want to make this optimization only for 64KB
pages. It seems the code would be simpler if we just had our own
page-aligned allocator using mmap(MAP_ANONYMOUS) and used it
unconditionally everywhere. Or is it not actually better even on
sane-sized (i.e. 4KB) page systems? It seems we still have the malloc
overhead, which is going to cost us another page?
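(For concreteness, something like the sketch below is what I have in
mind; the names are made up. The one wrinkle is that munmap() needs the
mapping length back, unlike free(), but this assumes mlx4_buf already
keeps the rounded length around, so that shouldn't be a problem:)

/*
 * Sketch of the unconditional variant: always back the buffer with an
 * anonymous mapping and release it with munmap().
 */
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static void *page_aligned_alloc(size_t *size)
{
	size_t page_size = sysconf(_SC_PAGESIZE);
	void *buf;

	/* Round up to whole pages; return the rounded size to the caller
	 * so it can be passed back to page_aligned_free(). */
	*size = (*size + page_size - 1) & ~(page_size - 1);

	buf = mmap(NULL, *size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return buf == MAP_FAILED ? NULL : buf;
}

static void page_aligned_free(void *buf, size_t size)
{
	munmap(buf, size);
}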
- R.