[ofa-general] Re: [mvapich-discuss] fork() failing in mvapich1 and mvapich2, using OFED 1.4

Matthew Koop koop at cse.ohio-state.edu
Wed Nov 12 09:13:00 PST 2008


Hi Mike,

In order to have fork support enabled you need to set an additional
environment variable. See Section 7.1.2 in the User Guide for more information:

http://mvapich.cse.ohio-state.edu/support/mvapich_user_guide.html#x1-350007.1.2
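To save a trip to the guide, here is a sketch of the invocation. I believe the variable in question is IBV_FORK_SAFE, the libibverbs switch that enables ibv_fork_init() at startup (double-check the exact name against Section 7.1.2 for your version):

```shell
# Assumption: Section 7.1.2 refers to IBV_FORK_SAFE, which makes
# libibverbs call ibv_fork_init() at startup so that registered
# (pinned) pages keep working in the parent across a fork().
# mpirun_rsh forwards VAR=value pairs placed before the program
# name to every rank, so reusing Mike's command line:
mpirun_rsh -np 2 panic homer IBV_FORK_SAFE=1 mpi_fork 128 1024
```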

Thanks,

Matt


On Wed, 12 Nov 2008, Mike Heinz wrote:

> I'm not sure when this stopped working, but I'm getting a complaint from
> our QA people that our fork() test program is failing with mvapich1 and
> mvapich2 when tested with OFED 1.4. When I tested with OFED 1.3.1, I got
> a similar result:
>
>
> [root@panic mpi_fork]$ mpirun_rsh -np 2 panic homer mpi_fork 128 1024
> Exit code -3 signaled from homer
> Abort signaled by rank 0: [panic:0] Got completion with error
> IBV_WC_LOC_LEN_ERR, code=1, dest rank=1
>
> Killing remote processes...MPI process terminated unexpectedly
> DONE
>
>
> This is the program that generates the failure:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <math.h>
> #include <assert.h>
> #include <unistd.h>
> #include <sys/wait.h>
> #include <mpi.h>
>
>
> #define MYBUFSIZE (4*1024*1028)
> #define MAX_REQ_NUM 100000
>
> char s_buf1[MYBUFSIZE];
> char r_buf1[MYBUFSIZE];
>
>
> MPI_Request request[MAX_REQ_NUM];
> MPI_Status my_stat[MAX_REQ_NUM];
>
> int main(int argc,char *argv[])
> {
>     int  myid, numprocs, i;
>     int size, loop, page_size;
>     char *s_buf, *r_buf;
>     double t_start=0.0, t_end=0.0, t=0.0;
>
>
>     MPI_Init(&argc,&argv);
>     MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
>     MPI_Comm_rank(MPI_COMM_WORLD,&myid);
>
>     if ( argc < 3 ) {
>        fprintf(stderr, "Usage: mpi_fork loop msg_size\n");
>        MPI_Finalize();
>        return 0;
>     }
>     size=atoi(argv[2]);
>     loop = atoi(argv[1]);
>
>     if(size > MYBUFSIZE){
>          fprintf(stderr, "Maximum message size is %d\n",MYBUFSIZE);
>          MPI_Finalize();
>          return 0;
>     }
>
>     if(loop > MAX_REQ_NUM){
>          fprintf(stderr, "Maximum message count is %d\n", MAX_REQ_NUM);
>          MPI_Finalize();
>          return 0;
>     }
>
>     page_size = getpagesize();
>
>     s_buf = (char*)(((unsigned long)s_buf1 + (page_size - 1)) / page_size * page_size);
>     r_buf = (char*)(((unsigned long)r_buf1 + (page_size - 1)) / page_size * page_size);
>
>     assert( (s_buf != NULL) && (r_buf != NULL) );
>
>     for ( i=0; i<size; i++ ){
>            s_buf[i]='a';
>            r_buf[i]='b';
>     }
>
>     /*warmup */
>     if (myid == 0)
>     {
>         for ( i=0; i< loop; i++ ) {
>             MPI_Isend(s_buf, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD, request+i);
>         }
>
>         MPI_Waitall(loop, request, my_stat);
>         MPI_Recv(r_buf, 4, MPI_CHAR, 1, 101, MPI_COMM_WORLD, &my_stat[0]);
>
>     }else{
>         for ( i=0; i< loop; i++ ) {
>             MPI_Irecv(r_buf, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, request+i);
>         }
>         MPI_Waitall(loop, request, my_stat);
>         MPI_Send(s_buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD);
>     }
>     // fork a child process and make sure it lives beyond the parent
>     // touching pages. If fork is not handled properly in the stack, the
>     // parent would get a copy of its registered/locked pages (such as
>     // QP WQEs) on first access, and problems such as Local Length Error
>     // would be reported by the HCA.
>     if (fork() == 0) {
>         // child exists but doesn't touch anything; parent still owns pages
>         sleep(10);
>         // exec another program
>         execlp("date", "date", NULL);
>         // just in case exec fails
>         exit(0);
>     }
>
>     MPI_Barrier(MPI_COMM_WORLD);
>
>     if (myid == 0)
>     {
>         t_start=MPI_Wtime();
>         for ( i=0; i< loop; i++ ) {
>             MPI_Isend(s_buf, size, MPI_CHAR, 1, 100, MPI_COMM_WORLD, request+i);
>         }
>
>         MPI_Waitall(loop, request, my_stat);
>         MPI_Recv(r_buf, 4, MPI_CHAR, 1, 101, MPI_COMM_WORLD, &my_stat[0]);
>
>         t_end=MPI_Wtime();
>         t = t_end - t_start;
>
>     }else{
>         for ( i=0; i< loop; i++ ) {
>             MPI_Irecv(r_buf, size, MPI_CHAR, 0, 100, MPI_COMM_WORLD, request+i);
>         }
>         MPI_Waitall(loop, request, my_stat);
>         MPI_Send(s_buf, 4, MPI_CHAR, 0, 101, MPI_COMM_WORLD);
>     }
>
>     if ( myid == 0 ) {
>        double tmp;
>        tmp = ((size*1.0)/1.0e6)*loop;
>        fprintf(stdout,"%d\t%f\n", size, tmp/t);
>     }
>     {
>         int status;
>         int ret;
>
>         ret = wait(&status);
>         if (ret == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
>         {
>            fprintf(stdout, "ERROR: child failure: ret=%d, status=0x%x, exit_status=%d\n",
>                    ret, status, WEXITSTATUS(status));
>         }
>     }
>
>     MPI_Barrier(MPI_COMM_WORLD);
>     MPI_Finalize();
>     return 0;
> }
>
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss
>



