[ofa-general] OpenIB-cma: DAT_INSUFFICIENT_RESOURCES
Kilian CAVALOTTI
kilian at stanford.edu
Tue Apr 3 11:22:52 PDT 2007
Hi all,
I'm not sure whether this is the right place to ask, but I'm running into issues
when trying to run Linpack on an InfiniBand cluster using OFED 1.1.
I'm using Intel MPI 3.0 to compile and run HPL, and upon execution I get
the following error messages:
*******************************************************************************
$ mpiexec -n 16 -env I_MPI_DEVICE rdssm ./xhpl
============================================================================
HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 40000
NB : 112
PMAP : Row-major process mapping
P : 4
Q : 4
PFACT : Left
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ring
DEPTH : 0
SWAP : Mix (threshold = 256)
L1 : no-transposed form
U : no-transposed form
EQUIL : no
ALIGN : 8 double precision words
----------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
register failed 196608 [0] error(0x30000): OpenIB-cma: DAT_INSUFFICIENT_RESOURCES:
register failed 196608 [4] error(0x30000): OpenIB-cma: DAT_INSUFFICIENT_RESOURCES:
register failed 196608 [12] error(0x30000): OpenIB-cma: DAT_INSUFFICIENT_RESOURCES:
register failed 196608 [8] error(0x30000): OpenIB-cma: DAT_INSUFFICIENT_RESOURCES:
rank 4 in job 11 node-9-1_42298 caused collective abort of all ranks
exit status of rank 4: killed by signal 9
*******************************************************************************
The same "register failed 196608 [0] error(0x30000): OpenIB-cma: DAT_INSUFFICIENT_RESOURCES:"
error occurs when using rdma as I_MPI_DEVICE. I don't know whether this is really an
OpenIB issue or rather an MPI issue, but the basic multi-node MPI tests I ran ('Hello
world' based) seemed to work fine.
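For scale, here is a rough sketch of how much memory this run asks for, using the N, P and Q values from the HPL output above. The 8 bytes per element assumes double-precision storage, and the even split across ranks is a simplification (HPL's block-cyclic layout and workspace are ignored); the function names are only illustrative.

```python
# Rough per-process memory estimate for the HPL run above.
# N, P and Q are taken from the HPL output; 8 bytes per element assumes
# double precision, and the even split across ranks is a simplification.

def hpl_matrix_bytes(n, bytes_per_element=8):
    """Total size in bytes of the dense N x N coefficient matrix."""
    return n * n * bytes_per_element


def per_process_bytes(n, p, q, bytes_per_element=8):
    """Approximate share of the matrix held by each of the P*Q ranks."""
    return hpl_matrix_bytes(n, bytes_per_element) / (p * q)


if __name__ == "__main__":
    n, p, q = 40000, 4, 4
    total = hpl_matrix_bytes(n)
    share = per_process_bytes(n, p, q)
    print(f"total matrix: {total / 2**30:.2f} GiB")   # 11.92 GiB
    print(f"per process : {share / 2**20:.1f} MiB")   # 762.9 MiB
```

So each rank holds on the order of 760 MiB of matrix data, far more than the 'Hello world' tests ever touch. Whether that has anything to do with the failed registration is a guess on my part.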
I would really appreciate any hint on this issue,
Thanks a lot,
--
Kilian