[ewg] mvapich2 over iwarp DOA - bug520
Steve Wise
swise at opengridcomputing.com
Wed Apr 4 08:57:38 PDT 2007
I just built and installed today's daily ofed-1.2 build and mvapich2
doesn't work at all over iwarp. The build is
OFED-1.2-20070404-0600.tgz.
I've opened bug 520 to track this.
If I run a 2-node cpi, it hangs and never completes.
If I run a 2-node IMB-MPI1, I get a crash:
(gdb) bt
#0  0x00000038f5b71f13 in memcpy () from /lib64/tls/libc.so.6
#1  0x00002b99e2e61f40 in MPIDI_CH3I_MRAIL_Fill_Request ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#2  0x00002b99e2e2353a in handle_read ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#3  0x00002b99e2e23a99 in MPIDI_CH3I_Progress ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#4  0x00002b99e2e5db07 in MPIC_Wait ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#5  0x00002b99e2e5e22c in MPIC_Recv ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#6  0x00002b99e2e196f3 in MPIR_Bcast ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#7  0x00002b99e2e19f82 in PMPI_Bcast ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#8  0x0000000000402444 in IMB_basic_input ()
#9  0x00000000004016b5 in main ()
(gdb)
This is mvapich2-0.9.8-9. I believe I tested a development build of
this code directly from OSU before they shipped it in OFED-1.2. I don't
know yet what's up.
dapltest and rping work across my cluster, so I think this is an mvapich2
issue.
Shaun/Sundeep: Have you all tested the ofed-1.2 build with 0.9.8-9 yet?
Steve.