[openib-general] Fork issues with simple MPI program

Arlin Davis arlin.r.davis at intel.com
Mon Feb 19 10:32:27 PST 2007


We are seeing fork issues with a simple MPI program (attached) running on 2.6.16+ kernels with
OFED 1.1. We have tried both Intel MPI and mvapich2, with the same results:

t_fork> mpiexec -n 2 t_system_fork
parent process
[0] started child process with pid=31552
send desc error
parent process
[0] Abort: [] Got completion with error 1, vendor code=69, dest rank=1
 at line 540 in file ibv_channel_manager.c
[1] I am child process with pid=25437
[1] started child process with pid=25437
[0] I am child process with pid=31552
child process
[1] finished pid=25437
child process
[0] finished pid=31552

rank 0 in job 2  svlmpicl400_32925   caused collective abort of all ranks
  exit status of rank 0: return code 252

If you run mvapich2 for uDAPL, it hangs before the second MPI_Barrier(), just like Intel MPI. If you
use the I_MPI_RDMA_USE_EVD_FALLBACK=1 option with Intel MPI, you get the following error, which is
similar to the mvapich2 one:

parent process
parent process
[0] I am child process with pid=9596
[0] started child process with pid=9596
[1] I am child process with pid=11477
[1] started child process with pid=11477
[0][rdma_iba.c:1007] Intel MPI fatal error: DTO operation completed with error. status=0x2. cookie=0x1
[1][rdma_iba.c:1007] Intel MPI fatal error: DTO operation completed with error. status=0x2. cookie=0x1
child process
[1] finished pid=11477
child process
[0] finished pid=9596
rank 0 in job 8  cst-19_54707   caused collective abort of all ranks
  exit status of rank 0: return code 255

Any insight would be greatly appreciated. Our assumption was that, with the fork fixes that went
into 2.6.16 and OFED 1.1, the parent process can continue to use IB resources after a fork. Is this
true?
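
For reference, the kernel-side piece of the 2.6.16 fork work is, as far as we understand it, the
MADV_DONTFORK flag to madvise(): a range marked this way is not mapped into the child, so fork()
does not turn the parent's pages copy-on-write and the parent keeps the same physical pages. User
space still has to mark the registered memory; libibverbs does so once ibv_fork_init() has been
called, and whether the OFED 1.1 libraries and the MPIs above do this is an assumption that needs
checking. The sketch below uses no IB verbs at all and only shows the mechanism on an ordinary
buffer. parent_buffer_survives_fork() is a hypothetical helper, not part of the attached
t_system_fork.c.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS and MADV_DONTFORK */
#endif
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: maps one page, marks it MADV_DONTFORK, forks, and
 * checks that the parent can still read and write the page afterwards.
 * Returns 0 on success, -1 on any failure.
 */
int parent_buffer_survives_fork(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page;

    /* mmap() returns page-aligned memory, which madvise() requires. */
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset(buf, 0xAB, len);

    /* Keep this range out of the child's address space. */
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        /* Child: buf is not mapped here, so it must not be touched. */
        _exit(0);
    }

    /* Parent: keep using the buffer while the child exists. */
    buf[0] = 0xCD;

    int status = 0;
    int ok = waitpid(pid, &status, 0) == pid
             && WIFEXITED(status) && WEXITSTATUS(status) == 0
             && buf[0] == 0xCD      /* the parent's write is visible */
             && buf[1] == 0xAB;     /* the rest of the page is unchanged */

    munmap(buf, len);
    return ok ? 0 : -1;
}
```

Calling parent_buffer_survives_fork() from main() and checking for a 0 return is enough to confirm
that the running kernel accepts MADV_DONTFORK. It says nothing about whether the HCA's registered
memory is actually being marked in the failing runs above.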

Thanks,

-arlin
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: t_system_fork.c
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20070219/96dabe76/attachment.c>

