[libfabric-users] OFI libfabric and forking
sean.hefty at intel.com
Tue Dec 14 09:53:19 PST 2021
> > Before I open an issue against libfabric, I've got a high level question.
> > If an application process initializes libfabric, and in particular, uses the
> > monitor infrastructure, should the application be able to fork additional
> > processes without problems?
> > Assume the forked processes are not utilizing libfabric in any way.
> With older Linux kernels, fork can cause issues because of copy-on-write. There
> are environment variable settings that can be used to help here, related to verbs fork
> support and memory registration.
> I believe very new kernels have a fix for this problem, which disables copy-on-write for
> pinned memory.
> Sorry for the zombie thread answer, but even this answer isn't quite correct, right?
> Libfabric itself makes no statements about fork behaviors, and providers will each have
> their own behaviors. TCP should work regardless of kernel version, for example, and
> verbs and efa both have different sets of corner cases and setups where things do/do
> not work. GNI is likely to have an entirely different set of challenges, given its
> kernel module design. Should we add fork behaviors to the provider man pages going
> forward?
I agree this is provider specific. The question was specifically asking about using the memory monitor infrastructure, so I'm assuming libibverbs memory registration (the verbs and EFA providers; I don't think GNI hooks into the memory monitor). There's some documentation for fork in the fabric.7, fi_verbs.7, and fi_efa.7 man pages. We could expand what's in fabric.7 to document that support is provider specific, and update the other provider man pages.
We've tried having fork support enabled as the default in the past, but the performance impact was too high, considering that most of the targeted apps didn't need it. We also looked at specifying fork support through the API, but there were cases where that came too late.
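To illustrate the timing point, here is a minimal sketch of opting in from outside the application, so the setting is already in the environment before any library initialization runs. It assumes a small Python launcher script; fork_safe_env is a hypothetical helper name, and the child process is only a stand-in for a real libfabric application. RDMAV_FORK_SAFE is the libibverbs variable (setting it has the same effect as calling ibv_fork_init()); provider-specific variables are described in the provider man pages and are not shown here.

```python
import os
import subprocess
import sys


def fork_safe_env(base=None):
    """Return a copy of the environment with verbs fork support requested.

    Hypothetical helper for illustration; not part of libfabric.
    """
    env = dict(os.environ if base is None else base)
    # libibverbs checks whether RDMAV_FORK_SAFE is set when it initializes.
    # setdefault leaves the variable alone if the user already set it.
    env.setdefault("RDMAV_FORK_SAFE", "1")
    return env


if __name__ == "__main__":
    # Stand-in child: it only prints the variable it inherited.
    # A real launcher would start the libfabric application here instead.
    child = [sys.executable, "-c",
             "import os; print(os.environ.get('RDMAV_FORK_SAFE'))"]
    subprocess.run(child, env=fork_safe_env(), check=True)
```

Setting the variable in the launcher, rather than inside the application, sidesteps the "too late" problem: the process never exists without it.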
Anyway, your point raises the issue that we don't have a standard template for provider man pages to follow. It may be time to craft an outline.