[ofa-general] QP connection health detection problem with fork/exec
Roland Dreier
rdreier at cisco.com
Thu Mar 27 21:22:34 PDT 2008
>
> However, in rank 1, fork() is called, the parent calls exit(), and the child sleeps for 5 minutes. But in rank 0,
> the heart-beat message always succeeds until I kill rank 2's child.
>
> Next, rank 1 calls fork() and exits, and the child calls
> execl("/bin/sleep", "sleep", "300", (char *)0);
>
> In rank 0, the heart-beat still succeeds until I kill the 'sleep' process.
>
> It is easy to understand that if only fork() is called, the child holds the parent's QP resources, so rank 0 can NOT detect
> anything wrong. But if the child calls exec, everything in rank 1 has been destroyed, so why can't rank 0 detect that the connection is broken?
I think the problem is that exec() does not clean everything up.
AFAIK, exec doesn't close open file descriptors, so there's no way for
the kernel uverbs module to know that it should clean up.
- R.