[openib-general] MPI error when using a "system" call in mpi job.

Sayantan Sur surs at cse.ohio-state.edu
Wed Jun 14 06:04:48 PDT 2006


Hello Ira,

I ran the program on 2.6.15 (an EM64T machine) and 2.6.16 (an IA32 
machine), and it runs fine on both. Which kernel are you using? We are 
using drivers pulled from the trunk about 3-4 weeks ago.

Thanks,
Sayantan.

Ira Weiny wrote:

>A co-worker here was seeing the following MPI error from his job:
>
>[1] Abort: [ldev2:1] Got completion with error, code=1
> at line 2148 in file viacheck.c
>
>After some tracking down he found that if he uses a "system" call
>[int system(const char *string)], the next MPI call fails.
>
>I have been able to reproduce this with the attached simple "hello" program.
>
>Perhaps someone has seen this type of error?  Here is the output from 2 runs:
>
>weiny2@ldev0:~/ior-test
>17:04:04 > mpirun_rsh -rsh -hostfile hostfile -np 2 ./hello x
>ldev1
>[0] Abort: [ldev1:0] Got completion with error, code=1
> at line 2148 in file viacheck.c
>ldev2
>mpirun_rsh: Abort signaled from [0]
>done.
>weiny2@ldev0:~/ior-test
>17:05:23 > mpirun_rsh -rsh -hostfile hostfile -np 2 ./hello
>now = 0.000000
>now = 0.000052
>now = 0.000094
>now = 0.000121
>now = 0.000151
>now = 0.001072
>now = 0.001102
>now = 0.001118
>now = 0.001141
>now = 0.001160
>done.
>
>We are running mvapich 0.9.7 and the openib trunk rev 6829.
>
>Thanks,
>Ira
>
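
For reference, the "system" call named in the quoted report runs a command
in a child shell process and returns that child's wait status. The sketch
below shows only that call in isolation, with no MPI involved. It is not the
attached "hello" program, which is not preserved here, and the helper name
run_shell_command is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper (not from the original attachment): run `cmd`
 * through system() and return the command's exit code, or -1 if the
 * child could not be created or did not exit normally. */
int run_shell_command(const char *cmd)
{
    int status;

    /* system(NULL) only asks whether a shell is available, so a real
     * command string is required here. */
    assert(cmd != NULL);

    /* system() starts a child process running "sh -c cmd" and waits
     * for it to finish. */
    status = system(cmd);

    if (status == -1)
        return -1;                /* child process could not be created */
    if (!WIFEXITED(status))
        return -1;                /* child did not exit normally */
    return WEXITSTATUS(status);   /* exit code of the command */
}
```

In the situation described in the report, a call like
run_shell_command("hostname") would sit between two MPI calls; the failure
is reported on the MPI call that follows it.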

-- 
http://www.cse.ohio-state.edu/~surs




