[libfabric-users] OFI libfabric and forking

Barrett, Brian bbarrett at amazon.com
Tue Dec 14 09:21:33 PST 2021


On 11/29/21, 9:09 AM, "Libfabric-users on behalf of Hefty, Sean" <libfabric-users-bounces at lists.openfabrics.org on behalf of sean.hefty at intel.com> wrote:

    > Before I open an issue against libfabric, I've got a high level question.
    >
    > If an application process initializes libfabric, and in particular, uses the memory
    > monitor infrastructure, should the application be able to fork additional processes
    > without problems?
    > Assume the forked processes are not utilizing libfabric in any way.

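    With older Linux kernels, fork() can cause problems because of copy-on-write: if the parent writes to a registered (pinned) page after the fork, it gets a new physical copy while the NIC keeps using the old one. There are environment variable settings that can help here, related to verbs fork support and memory registration.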

    I believe very new kernels have a fix for this problem, which disables copy-on-write for pinned memory.

Sorry for reviving a zombie thread, but even this answer isn't quite correct, right? Libfabric itself makes no statements about fork behavior; each provider has its own. TCP should work regardless of kernel version, for example, while verbs and efa each have their own corner cases and configurations in which fork does or does not work. GNI is likely to have an entirely different set of challenges, given its kernel module design. Should we document fork behavior in the provider man pages going forward?

Brian


