[openib-general] RHEL 4 U3 - lost completions

Bill Hartner bhartner at austin.rr.com
Thu Oct 5 06:24:25 PDT 2006



Or Gerlitz wrote:
> 
> Roland Dreier wrote:
> >     Or> Roland - If indeed, does it make sense that the problem does
> >     Or> not reproduce with single threaded runs?
> >
> > Sorry, I can't parse the question.  However, the problem here seems to
> > be that the CQ buffer pages end up being marked for copy-on-write, and
> > I don't know of any reason why that would happen other than a fork()
> > happening somewhere (possibly behind the scenes in a system() call or
> > something like that).
> 
> My question was: assuming there is some fork() (e.g. behind the scenes of
> daemonize()) in the app, does it make sense that everything works as
> long as the app is single threaded, but when there are multiple threads
> things break (e.g. COW is applied on the page used to hold the CQ etc.)?

I found a fork() call in the app code that is made after the
completion queue is created - thanks for keeping me pointed in the right
direction.  I also modified the pthread test case by adding a fork()
call after the completion queue is created, and it now hangs after the
2nd RDMA, just as the app did.
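
For anyone who wants to see the copy-on-write split by itself, here is a
minimal sketch in plain POSIX C.  It makes no verbs calls and is not the
pthread test case mentioned above; the helper name child_write_visible()
is made up for illustration.  It maps one anonymous page, calls fork(),
lets the child write to the page, and reports whether the parent sees
that write.  With MAP_PRIVATE the page is copy-on-write after fork(), so
the two processes end up on different physical pages, which is the same
kind of split that would leave the HCA and the process looking at
different pages for the CQ buffer.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the app or from libibverbs).
 * Maps one anonymous page with the given sharing flag (MAP_PRIVATE or
 * MAP_SHARED), forks, and has the child write 1 into the first byte.
 * Returns 1 if the parent sees the child's write, 0 if it does not,
 * and -1 on error.
 */
int child_write_visible(int map_flags)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    volatile unsigned char *buf = mmap(NULL, (size_t)page,
                                       PROT_READ | PROT_WRITE,
                                       map_flags | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    buf[0] = 0;                  /* touch the page so it is populated before fork() */

    pid_t pid = fork();
    if (pid < 0) {
        munmap((void *)buf, (size_t)page);
        return -1;
    }
    if (pid == 0) {
        buf[0] = 1;              /* on a private mapping this write triggers the COW copy */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap((void *)buf, (size_t)page);
        return -1;
    }

    int seen = buf[0];           /* parent reads its own view of the page */
    munmap((void *)buf, (size_t)page);
    return seen;
}
```

Calling child_write_visible(MAP_PRIVATE) should return 0 (the child's
write landed on its own copy), while child_write_visible(MAP_SHARED)
should return 1 because a shared mapping is never split by fork().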

-Bill



