[ofiwg] fork support and MR cache
Jason Gunthorpe
jgg at ziepe.ca
Thu Oct 8 11:48:26 PDT 2020
On Thu, Oct 08, 2020 at 06:42:12PM +0000, Hefty, Sean wrote:
> > >>>>> There have been extensive discussions on github around the MR cache,
> > >>>>> deadlocks, libibverbs madvise tracking, and fork. The current
> > >>>>> direction is to only enable the MR cache when fork is disabled.
> > >>>>> This was done to work-around internal libibverbs tracking. But I
> > >>>>> suspect that bypassing that tracking (which is possible) can still
> > >>>>> lead to issues when registrations are made through the MR cache.
> > >>>>
> > >>>> MADV_DONTFORK will be obsolete starting in kernel v5.9
> > >>>>
> > >>>> If you can test and confirm that everything works without it then we
> > >>>> can detect and disable ibv_fork_init on new kernels.
> > >>>
> > >>> Interesting. What will the behavior be for registered regions when fork is called?
> > >>
> > >> They are copied into the fork.
> > >>
> > >>> My concern is that the registrations are made and maintained without
> > >>> the application being aware. Will cached registrations need to be
> > >>> released when fork is invoked, or is there some other mechanism
> > >>> coming into play now?
> > >>
> > >> MRs continue to reliably point to memory owned in the parent
> > >> process.
> > >>
> > >> The child process will be unable to use any MRs or verbs objects, just
> > >> like today.
> > >
> > > Thanks - I think this means that fork becomes a non-issue.
> >
> > We’ll have to figure out how to make some of these decisions at runtime. We need a fix
> > for today’s world (even if we’re ok with it being a little more hacky, since it has a
> > finite lifespan), and a way to know whether we need to run that hacky fix or not in the
> > future. I don’t think our current hack around trying ibv_fork_init() will work in a
> > world where rdma-core isn’t building the RB tree. So some way of exposing that new
> > behavior out of rdma-core would, unfortunately, be helpful to Libfabric.
>
> I think a reasonable hack could be to implement the behavior from my
> first email. That would involve bypassing the libibverbs calls and
> going directly to the verbs providers for reg/dereg mr.

If you don't want the MADV_DONTFORK calls, then just don't call
ibv_fork_init().

Please don't recode the kernel interface outside rdma-core; that stuff
is hard to do right.

Jason
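
A minimal sketch of the runtime decision discussed in this thread, assuming the running kernel's release string is an acceptable signal for whether MADV_DONTFORK tracking is still needed. The thread only says the tracking becomes obsolete starting in kernel v5.9. Checking uname() is an assumption made here for illustration, not something rdma-core is stated to do, and distribution kernels with backports may not fit the check. The helper names release_at_least() and need_fork_init() are hypothetical. The caller would invoke ibv_fork_init() only when need_fork_init() returns 1, and otherwise simply not call it.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: returns 1 if a kernel release string such as
 * "5.9.0-rc8" is at least major.minor, and 0 otherwise. A string that
 * does not start with "<int>.<int>" is treated as "not new enough".
 */
int release_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;

    assert(release != NULL);
    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return 0;
    return maj > major || (maj == major && min >= minor);
}

/*
 * Hypothetical helper: returns 1 if the application should still call
 * ibv_fork_init(), 0 if it can skip the call. The 5.9 threshold comes
 * from the thread; using uname() to detect it is an assumption.
 */
int need_fork_init(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 1; /* cannot tell, so stay on the conservative path */
    return !release_at_least(u.release, 5, 9);
}
```

This keeps the decision outside libibverbs internals: nothing here bypasses the library's reg/dereg paths, it only chooses whether to call ibv_fork_init() at startup.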