[ewg] Re: [PATCH] mlx4_ib: Optimize hugetlb page support

Roland Dreier rdreier at cisco.com
Wed Jan 21 14:06:14 PST 2009


 > Since Linux does not merge adjacent pages into a single scatter entry
 > through calls to dma_map_sg(), we do this ourselves for huge pages,
 > which are guaranteed to consist of adjacent natural pages of size
 > PAGE_SIZE.  This significantly reduces the number of MTT segments
 > needed to register hugetlb memory regions.

Actually, on some platforms adjacent pages *are* merged into a single sg
entry (e.g. look at VMERGE under arch/powerpc), so your
mlx4_ib_umem_write_huge_mtt() function won't work in such cases.  In any
case, relying on such a fragile implementation detail without any
checking is bad for future maintainability.  So this needs to be
reimplemented to work no matter how the DMA mapping layer presents the
hugetlb region, and to fail gracefully if the bus addresses returned
from DMA mapping don't line up with the huge page size -- it would be
perfectly correct for the DMA mapping layer to map every PAGE_SIZE
chunk to an arbitrary bus address, e.g. if the IOMMU address space
becomes very fragmented.
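
To illustrate, a robust implementation would walk the DMA-mapped
scatterlist itself and verify that the bus addresses really form
huge-page-aligned runs, falling back to the ordinary per-page path
otherwise.  Roughly like the sketch below -- the function name is made
up, the pages array would be what gets handed to mlx4_write_mtt(), and
I'm glossing over the fact that ib_umem currently splits its
scatterlist across chunks:

	/*
	 * Collect one bus address per HPAGE_SIZE block from a DMA-mapped
	 * scatterlist, no matter how the DMA layer split or merged the
	 * underlying PAGE_SIZE pages.  Returns the number of huge pages
	 * found, or -EAGAIN if the addresses don't form huge-page-aligned
	 * runs, so the caller can fall back to the per-page path.
	 */
	static int collect_huge_pages(struct scatterlist *sgl, int nents,
				      u64 *pages, int max_pages)
	{
		struct scatterlist *sg;
		dma_addr_t base = 0;
		u64 cur = 0;
		int i, n = 0;

		for_each_sg(sgl, sg, nents, i) {
			dma_addr_t addr = sg_dma_address(sg);
			unsigned int len = sg_dma_len(sg);

			if (cur) {
				/* Must continue the current huge page. */
				if (addr != base + cur)
					return -EAGAIN;
			} else {
				/* A new run must start huge-page aligned. */
				if (addr & (HPAGE_SIZE - 1))
					return -EAGAIN;
				base = addr;
			}

			cur += len;
			while (cur >= HPAGE_SIZE) {
				if (n == max_pages)
					return -EAGAIN;
				pages[n++] = base;
				base += HPAGE_SIZE;
				cur -= HPAGE_SIZE;
			}
		}

		/* Tolerate a partial huge page at the end of the region. */
		if (cur) {
			if (n == max_pages)
				return -EAGAIN;
			pages[n++] = base;
		}
		return n;
	}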

 > @@ -142,15 +176,34 @@ struct ib_mr *mlx4_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 >  
 >  	n = ib_umem_page_count(mr->umem);
 >  	shift = ilog2(mr->umem->page_size);
 > -
 > -	err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
 > +	if (mr->umem->hugetlb) {
 > +		nhuge = ALIGN(n << shift, HPAGE_SIZE) >> HPAGE_SHIFT;
 > +		shift_huge = HPAGE_SHIFT;
 > +		err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
 > +				    convert_access(access_flags), nhuge, shift_huge, &mr->mmr);
 > +		if (err)
 > +			goto err_umem;
 > +
 > +		err = mlx4_ib_umem_write_huge_mtt(dev, &mr->mmr.mtt, mr->umem, nhuge);
 > +		if (err) {
 > +			if (err != -EAGAIN)
 > +				goto err_mr;
 > +			else {
 > +				mlx4_mr_free(to_mdev(pd->device)->dev, &mr->mmr);
 > +				goto regular_pages;
 > +			}
 > +		}
 > +	} else {
 > +regular_pages:
 > +		err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr, length,
 >  			    convert_access(access_flags), n, shift, &mr->mmr);
 > -	if (err)
 > -		goto err_umem;
 > +		if (err)
 > +			goto err_umem;
 >  
 > -	err = mlx4_ib_umem_write_mtt(dev, &mr->mmr.mtt, mr->umem);
 > -	if (err)
 > -		goto err_mr;
 > +		err = mlx4_ib_umem_write_mtt(dev, &mr->mmr.mtt, mr->umem);
 > +		if (err)
 > +			goto err_mr;
 > +	}
 >  
 >  	err = mlx4_mr_enable(dev->dev, &mr->mmr);
 >  	if (err)

Also, I think this could be made much clearer by putting the huge page
handling code into a separate function like mlx4_ib_reg_hugetlb_user_mr(),
so that you could just do something like

	if (!mr->umem->hugetlb || mlx4_ib_reg_hugetlb_user_mr(...)) {
		// usual code path
	}

rather than having a goto that jumps into the middle of another block.
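
Concretely, the registration path could then read something like this
(just a sketch: mlx4_ib_reg_hugetlb_user_mr() is the hypothetical
helper from above, and on failure it would have to free anything it
allocated so the fallback path starts clean):

	n = ib_umem_page_count(mr->umem);
	shift = ilog2(mr->umem->page_size);

	if (!mr->umem->hugetlb ||
	    mlx4_ib_reg_hugetlb_user_mr(dev, pd, mr, virt_addr,
					length, access_flags)) {
		/* Usual per-page path, also the hugetlb fallback. */
		err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, virt_addr,
				    length, convert_access(access_flags),
				    n, shift, &mr->mmr);
		if (err)
			goto err_umem;

		err = mlx4_ib_umem_write_mtt(dev, &mr->mmr.mtt, mr->umem);
		if (err)
			goto err_mr;
	}

	err = mlx4_mr_enable(dev->dev, &mr->mmr);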

 - R.


