[openib-general] RHEL 4 U3 - lost completions
bhartner at austin.rr.com
Thu Oct 5 06:24:25 PDT 2006
Or Gerlitz wrote:
> Roland Dreier wrote:
> > Or> Roland - If indeed, does it make sense that the problem does
> > Or> not reproduce with single threaded runs?
> > Sorry, I can't parse the question. However, the problem here seems to
> > be that the CQ buffer pages end up being marked for copy-on-write, and
> > I don't know of any reason why that would happen other than a fork()
> > happening somewhere (possibly behind the scenes in a system() call or
> > something like that).
> My question was: assuming there is some fork() (e.g. behind the scenes of
> daemonize()) in the app, does it make sense that everything works as
> long as the app is single threaded, but when there are multiple threads
> things break (e.g. COW is applied to the page used to hold the CQ etc.)?
I found a fork() call in the app code that is made after the
completion queue is created. Thanks for keeping me pointed in the right
direction. I also modified the pthread test case by adding a fork()
call after the completion queue is created, and it now hangs after the
2nd RDMA, just as the app did.
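
For anyone who hits the same thing, here is a minimal sketch of the
mechanism involved. It is not the app and not the pthread test case, and
it makes no verbs calls: a plain mmap()ed page stands in for the CQ
buffer, and the function name child_inherits_buffer() is made up for
illustration. It assumes a Linux kernel that supports
madvise(MADV_DONTFORK) (2.6.16 or later, so probably not the stock
RHEL 4 U3 kernel). Without the madvise() call the child shares the page
after fork(), which is the situation where a later write in the parent
can be redirected to a copy; with it, the page is not mapped in the
child at all.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a child created by fork() can still
 * read the parent's buffer, 0 if the buffer is not mapped in the child,
 * and -1 on any error. The buffer is only a stand-in for CQ memory.
 */
int child_inherits_buffer(int use_dontfork)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset(buf, 0xab, len);          /* touch the page so it is populated */

    /* Ask the kernel not to give this range to children of fork(). */
    if (use_dontfork && madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }

    if (pid == 0) {
        /* Child: let the kernel read one byte from buf via write().
         * An unmapped buffer makes write() fail with EFAULT instead
         * of crashing the child. */
        int fds[2];
        if (pipe(fds) != 0)
            _exit(2);
        ssize_t n = write(fds[1], buf, 1);
        if (n == 1)
            _exit(0);                /* buffer is mapped in the child */
        _exit(errno == EFAULT ? 1 : 2);
    }

    /* Parent: translate the child's exit status into the result. */
    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
        if (WEXITSTATUS(status) == 0)
            result = 1;
        else if (WEXITSTATUS(status) == 1)
            result = 0;
    }
    munmap(buf, len);
    return result;
}
```

On a kernel with MADV_DONTFORK, I would expect child_inherits_buffer(0)
to report that the child sees the page and child_inherits_buffer(1) to
report that it does not. Newer libibverbs releases provide
ibv_fork_init(), which as far as I know uses the same madvise()
mechanism on registered memory; where neither is available, the
workaround is to do any fork() before the completion queue is created.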