[openib-general] MPI error when using a "system" call in mpi job.
glebn at voltaire.com
Wed Jun 14 22:19:12 PDT 2006
On Wed, Jun 14, 2006 at 09:59:58AM -0700, Ira Weiny wrote:
> We are on a modified RedHat RHEL4 kernel. Roughly 2.6.9. :-(
>
This is a known problem with kernel 2.6.9.
> I am going to try a 2.6.16 kernel I have built to see if it changes.
>
This will work with system(), but not with fork().
> Ira
>
>
> On Wed, 14 Jun 2006 09:04:48 -0400
> Sayantan Sur <surs at cse.ohio-state.edu> wrote:
>
> > Hello Ira,
> >
> > I am running the program on 2.6.15 (EM64T machine) and 2.6.16 (IA32
> > machine). The program seems to be running fine. Can you tell us which
> > kernel you are using? We are using drivers pulled out of the trunk
> > about 3-4 weeks back.
> >
> > Thanks,
> > Sayantan.
> >
> > Ira Weiny wrote:
> >
> > >A co-worker here was seeing the following MPI error from his job:
> > >
> > >[1] Abort: [ldev2:1] Got completion with error, code=1
> > > at line 2148 in file viacheck.c
> > >
> > >After some tracking down he found that apparently if he used a
> > >"system" call [int system(const char *string)], the next MPI command
> > >would fail.
> > >
> > >I have been able to reproduce this with the attached simple "hello"
> > >program.
> > >
> > >Perhaps someone has seen this type of error? Here is the output
> > >from 2 runs:
> > >
> > >weiny2 at ldev0:~/ior-test
> > >17:04:04 > mpirun_rsh -rsh -hostfile hostfile -np 2 ./hello x
> > >ldev1
> > >[0] Abort: [ldev1:0] Got completion with error, code=1
> > > at line 2148 in file viacheck.c
> > >ldev2
> > >mpirun_rsh: Abort signaled from [0]
> > >done.
> > >weiny2 at ldev0:~/ior-test
> > >17:05:23 > mpirun_rsh -rsh -hostfile hostfile -np 2 ./hello
> > >now = 0.000000
> > >now = 0.000052
> > >now = 0.000094
> > >now = 0.000121
> > >now = 0.000151
> > >now = 0.001072
> > >now = 0.001102
> > >now = 0.001118
> > >now = 0.001141
> > >now = 0.001160
> > >done.
> > >
> > >We are running mvapich 0.9.7 and the openib trunk rev 6829.
> > >
> > >Thanks,
> > >Ira
> > >
> > >
> > >
> > >------------------------------------------------------------------------
> > >
> > >_______________________________________________
> > >openib-general mailing list
> > >openib-general at openib.org
> > >http://openib.org/mailman/listinfo/openib-general
> > >
> > >To unsubscribe, please visit
> > >http://openib.org/mailman/listinfo/openib-general
> > >
> >
> > --
> > http://www.cse.ohio-state.edu/~surs
> >
>
--
Gleb.
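Below is a minimal sketch of one plausible mechanism behind the remark that a 2.6.16 kernel copes with system() but not with fork(). The thread itself does not spell this out, so treat it as an assumption. Linux 2.6.16 added the madvise() flag MADV_DONTFORK, which leaves a memory range unmapped in any forked child. If the RDMA stack marks its registered buffers this way (assumed here, not stated for MVAPICH 0.9.7), the parent keeps its pinned pages. A system() child, which execs /bin/sh right away, never notices, but a plain fork() child that touches such a buffer faults. Older kernels such as 2.6.9 have no such flag, so fork() makes registered pages copy-on-write. The parent's next write can then land on a fresh physical page the HCA does not know about, which would be consistent with the "Got completion with error" abort above. The helpers alloc_dontfork_page() and child_can_read() are hypothetical names for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one anonymous page, fill it with 'x', and mark it MADV_DONTFORK.
 * It stands in for a buffer an RDMA library has registered with the HCA
 * (assumption for illustration). Returns NULL on failure. Hypothetical helper. */
static char *alloc_dontfork_page(size_t *len_out) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;
    size_t len = (size_t)page;
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;
    memset(buf, 'x', len);
    /* Needs Linux >= 2.6.16: the range will not be mapped in fork() children. */
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }
    *len_out = len;
    return buf;
}

/* Fork a child that reads buf[0]. Returns 1 if the child exited normally
 * with status 0, 0 otherwise (for a DONTFORK page the child is killed by
 * SIGSEGV), and -1 if fork() or waitpid() failed. Hypothetical helper. */
static int child_can_read(const char *buf) {
    assert(buf != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        volatile char c = buf[0]; /* faults: the page was not inherited */
        (void)c;
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```

Calling alloc_dontfork_page() and then child_can_read() on the result should report that the forked child could not read the page. The parent can still read it, and system("true") still succeeds because its child execs a new program image instead of touching the parent's memory.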