[openib-general] MPI error when using a "system" call in mpi job.

Ira Weiny weiny2 at llnl.gov
Wed Jun 14 09:59:58 PDT 2006


We are on a modified Red Hat (RHEL4) kernel, roughly 2.6.9.  :-(

I am going to try a 2.6.16 kernel I have built to see if the behavior changes.

Ira


On Wed, 14 Jun 2006 09:04:48 -0400
Sayantan Sur <surs at cse.ohio-state.edu> wrote:

> Hello Ira,
> 
> I am running the program on 2.6.15 (EM64T machine) and 2.6.16 (IA32 
> machine). The program seems to be running fine. Can you tell us which 
> kernel you are using? We are using drivers pulled out of the trunk
> about 3-4 weeks back.
> 
> Thanks,
> Sayantan.
> 
> Ira Weiny wrote:
> 
> >A co-worker here was seeing the following MPI error from his job:
> >
> >[1] Abort: [ldev2:1] Got completion with error, code=1
> > at line 2148 in file viacheck.c
> >
> >After some tracking down, he found that if he makes a "system" call
> >[int system(const char *string)], the next MPI call apparently
> >fails.
> >
> >I have been able to reproduce this with the attached simple "hello"
> >program.
> >
> >Perhaps someone has seen this type of error?  Here is the output
> >from 2 runs:
> >
> >weiny2 at ldev0:~/ior-test
> >17:04:04 > mpirun_rsh -rsh -hostfile hostfile -np 2 ./hello x
> >ldev1
> >[0] Abort: [ldev1:0] Got completion with error, code=1
> > at line 2148 in file viacheck.c
> >ldev2
> >mpirun_rsh: Abort signaled from [0]
> >done.
> >weiny2 at ldev0:~/ior-test
> >17:05:23 > mpirun_rsh -rsh -hostfile hostfile -np 2 ./hello
> >now = 0.000000
> >now = 0.000052
> >now = 0.000094
> >now = 0.000121
> >now = 0.000151
> >now = 0.001072
> >now = 0.001102
> >now = 0.001118
> >now = 0.001141
> >now = 0.001160
> >done.
> >
> >We are running mvapich 0.9.7 and the openib trunk rev 6829.
> >
> >Thanks,
> >Ira
> >
> 
> -- 
> http://www.cse.ohio-state.edu/~surs
> 
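
For readers unfamiliar with what the "system" call in the report above
involves: per POSIX, system() creates a child process and runs the
command through "sh -c", then waits for it to finish. The sketch below
shows roughly that sequence. The helper name run_command is
hypothetical and is not taken from the thread or from the attached
"hello" program, which is not included in this archive. It omits the
signal handling that a real system() performs, and it is only meant to
show that a fork happens; the thread does not establish whether the
fork is what triggers the MPI error.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (an assumption, not from the thread): roughly the
 * fork/exec/wait sequence that system(cmd) performs.
 * Returns the command's exit status, or -1 if the child could not be
 * created, could not be waited for, or did not exit normally. */
int run_command(const char *cmd)
{
    assert(cmd != NULL);

    pid_t pid = fork();              /* duplicate the calling process */
    if (pid < 0)
        return -1;                   /* fork failed */

    if (pid == 0) {
        /* Child: replace this process image with the shell. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                  /* reached only if execl fails */
    }

    /* Parent: wait for the child to finish, as system() does. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_command("hostname") from a program would fork once and run
hostname in the child, which is the same kind of process creation that
a system("hostname") call causes.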



