[ofw] Problems running many concurrent MPI processes

Ole Widar Saastad o.w.saastad at usit.uio.no
Thu Nov 6 05:50:16 PST 2008


I have experienced problems running many MPI processes concurrently.
Some of the MPI processes (the first ones started) run fine, while the
others hang or make very slow progress.

I have dual-socket quad-core SUN 2200 nodes with Mellanox cards.
See below.

I have tried the OFED 1.2.5 stack and the OFED 1.4rc3 stack.


Any suggestions about settings, or about increasing buffers, tokens,
etc., are welcome.

An example:
Barrier benchmark:
Barrier size  9 iterations 32768 [8 procs - Resolution 0.95us]
9 nodes 12186.93 us

A barrier across 9 nodes should not take 12 milliseconds. One barrier
on 9 nodes normally takes 11.20 microseconds, so this is more than
1000 times slower than normal.
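For anyone who wants to reproduce the comparison, here is a minimal
Python sketch of how a per-barrier latency figure like the one above is
commonly obtained: synchronize once, time many back-to-back barriers,
and divide by the iteration count. The function names are hypothetical,
and the benchmark quoted above may compute its figure differently.

```python
import time
from typing import Callable


def mean_barrier_latency_us(
    barrier: Callable[[], None],
    iterations: int,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Average wall-clock time of one barrier, in microseconds.

    `barrier` is any zero-argument callable that blocks until all
    processes arrive (hypothetical interface, e.g. an MPI barrier).
    `clock` returns the current time in seconds.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    barrier()  # one untimed barrier so every process starts together
    start = clock()
    for _ in range(iterations):
        barrier()
    elapsed_s = clock() - start
    return elapsed_s / iterations * 1e6


def slowdown(measured_us: float, baseline_us: float) -> float:
    """How many times slower the measured latency is than the baseline."""
    return measured_us / baseline_us


# Figures quoted in the post: 12186.93 us measured vs. 11.20 us normally.
print(f"slowdown: {slowdown(12186.93, 11.20):.0f}x")
```

With mpi4py, one could pass MPI.COMM_WORLD.Barrier as the barrier and
MPI.Wtime as the clock, e.g.
mean_barrier_latency_us(MPI.COMM_WORLD.Barrier, 32768, MPI.Wtime).
Taking a single long timing window keeps the timer resolution
(0.95 us here) small relative to the measured total.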


Some background information:

Stack: OFED 1.4rc3
Card: InfiniBand: Mellanox Technologies: Unknown device 634a (rev a0)




-- 
Ole W. Saastad, dr. scient.
Scientific Computing Group, USIT, University of Oslo
http://hpc.uio.no



