[ofiwg] Hugepages usage in libfabric
jswaro at cray.com
Wed Apr 10 08:39:37 PDT 2019
From the ewg mailing list entry:
> Changes since v3:
> - determining the page size of a given memory range by watching madvise()
> fail has proven to be unreliable. So we introduce the RDMAV_HUGEPAGES_SAFE
> environment variable to let the user decide if the page size should be
> checked on every reg_mr() call or not. This requires the user to be aware
> if huge pages are used by the running application or not.
> I did not add an additional API call to enable this, as applications can use
> setenv() + ibv_fork_init() to enable checking for huge pages in the code.
It looks like a conscious decision was made to force applications to call ibv_fork_init if they wanted hugepages support, regardless of intent to fork.
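For reference, a minimal sketch of the setenv() + ibv_fork_init() sequence the changelog describes. The helper name enable_hugepage_safe_registration() and the USE_LIBIBVERBS build flag are hypothetical and only serve the illustration. The value "1" is also an assumption, since the thread only says the variable has to be set. Without the flag the sketch only sets the variable, so it builds on a machine without libibverbs.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() under strict -std modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#ifdef USE_LIBIBVERBS             /* hypothetical build flag, not from the thread */
#include <infiniband/verbs.h>
#endif

/*
 * Hypothetical helper: opt in to huge page checking before any memory
 * is registered. Returns 0 on success, nonzero on failure.
 */
static int enable_hugepage_safe_registration(void)
{
    /* The variable has to be in the environment before ibv_fork_init()
     * runs, because that is where it is read. */
    if (setenv("RDMAV_HUGEPAGES_SAFE", "1", 1) != 0) {
        perror("setenv");
        return -1;
    }

#ifdef USE_LIBIBVERBS
    /* Call this before registering any memory regions. */
    int rc = ibv_fork_init();
    if (rc != 0) {
        fprintf(stderr, "ibv_fork_init failed: %d\n", rc);
        return rc;
    }
#endif

    return 0;
}
```

An application would call enable_hugepage_safe_registration() once at startup, before its first memory registration, and link with -libverbs when building with -DUSE_LIBIBVERBS.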
On 4/10/19, 10:32 AM, "ofiwg on behalf of James Swaro" <ofiwg-bounces at lists.openfabrics.org on behalf of jswaro at cray.com> wrote:
I'm not certain what prompted performing the check in ibv_fork_init. However, it seems to me that hugepages registration support is orthogonal to forking. Given what the environment variable addresses and the fact that the variable is only read in ibv_fork_init, I would wonder if the application would crash if it did NOT call ibv_fork_init -- even if it wasn't planning to fork.
On 4/10/19, 10:03 AM, "Jason Gunthorpe" <jgg at ziepe.ca> wrote:
On Wed, Apr 10, 2019 at 02:36:53PM +0000, James Swaro wrote:
> I’d be interested in exposing an environment variable, flag, or general
> tunable in libfabric for applications to indicate whether they want to
> use huge pages. Using the verbs provider on an internal development
> system, I’ve run into an issue where use of huge pages makes the memory
> registration function fail unless RDMAV_HUGEPAGES_SAFE is set.
> See https://lists.openfabrics.org/pipermail/ewg/2010-July/015609.html
> for context.
Oh gross, this should just be fixed in verbs to not require the ugly
environment variable in the first place.
> general, if necessary. For verbs, use of the RDMAV_HUGEPAGES_SAFE
> variable tends to impact performance mostly after the rendezvous
> threshold. After reaching the rendezvous threshold, latency for data
> transfer operations appears to increase by a factor of ten.
If the app isn't forking it shouldn't even be calling ibv_fork_init in
the first place, which solves both problems.