[ofa-general] Problems running many MPI concurrent processes
Ole Widar Saastad
o.w.saastad at usit.uio.no
Fri Nov 7 00:01:23 PST 2008
I am having problems running many MPI processes concurrently.
The processes that were started first run fine, while the others
hang or make very slow progress.
I have dual-socket, quad-core SUN 2200 nodes with Mellanox cards.
See below.
I have tried the OFED 1.2.5 stack and the OFED 1.4rc3 stack.
Any suggestions about settings, or about increasing buffers, tokens, etc.,
are welcome.
An example :
Barrier benchmark :
Barrier size 9 iterations 32768 [8 procs - Resolution 0.95us]
9 nodes 12186.93 us
A barrier using 9 nodes should not take 12 milliseconds.
One barrier normally takes 11.20 microseconds using 9 nodes.
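To put the two figures above side by side, here is a small arithmetic sketch. It uses only the numbers quoted in this post; the function name slowdown is made up for illustration and is not part of the benchmark.

```python
# Figures quoted in the post, in microseconds per barrier on 9 nodes.
observed_us = 12186.93  # measured while many MPI processes run concurrently
expected_us = 11.20     # normal time for one barrier


def slowdown(observed: float, expected: float) -> float:
    """Return how many times slower the observed time is than the expected one."""
    return observed / expected


factor = slowdown(observed_us, expected_us)
print(f"observed: {observed_us / 1000:.2f} ms per barrier")
print(f"slowdown: roughly {factor:.0f}x the normal barrier time")
```

In other words, the barrier is about three orders of magnitude slower than normal.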
Some background information :
Stack: OFED 1.4rc3
Card : InfiniBand: Mellanox Technologies: Unknown device 634a (rev a0)
Best regards,
Ole W. Saastad
--
Ole W. Saastad, dr. scient.
Scientific Computing Group, USIT, University of Oslo
http://hpc.uio.no