[ewg] mvapich2 over iwarp DOA - bug520

Steve Wise swise at opengridcomputing.com
Wed Apr 4 08:57:38 PDT 2007


I just built and installed today's daily ofed-1.2 build
(OFED-1.2-20070404-0600.tgz), and mvapich2 doesn't work at all over
iwarp.

I've opened bug 520 to track this.

If I run a 2-node cpi, it hangs and never completes.

If I run a 2-node IMB-MPI1, I get a crash:

(gdb) bt
#0  0x00000038f5b71f13 in memcpy () from /lib64/tls/libc.so.6
#1  0x00002b99e2e61f40 in MPIDI_CH3I_MRAIL_Fill_Request ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#2  0x00002b99e2e2353a in handle_read ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#3  0x00002b99e2e23a99 in MPIDI_CH3I_Progress ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#4  0x00002b99e2e5db07 in MPIC_Wait ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#5  0x00002b99e2e5e22c in MPIC_Recv ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#6  0x00002b99e2e196f3 in MPIR_Bcast ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#7  0x00002b99e2e19f82 in PMPI_Bcast ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#8  0x0000000000402444 in IMB_basic_input ()
#9  0x00000000004016b5 in main ()
(gdb)

This is mvapich2-0.9.8-9.  I believe I tested a development build of
this code directly from OSU before they shipped it in OFED-1.2.  I don't
know yet what's up.

dapltest and rping work across my cluster, so I think this is an
mvapich2 issue.

Shaun/Sundeep:  Have you all tested the ofed-1.2 build with 0.9.8-9 yet?


Steve.