[ewg] mvapich2 over iwarp DOA - bug520
    Steve Wise 
    swise at opengridcomputing.com
       
    Wed Apr  4 08:57:38 PDT 2007
    
    
  
I just built and installed today's daily ofed-1.2 build
(OFED-1.2-20070404-0600.tgz), and mvapich2 doesn't work at all over
iwarp.
I've opened bug 520 to track this.
If I run a 2-node cpi, it hangs and never completes.
If I run a 2-node IMB-MPI1, I get a crash:
(gdb) bt
#0  0x00000038f5b71f13 in memcpy () from /lib64/tls/libc.so.6
#1  0x00002b99e2e61f40 in MPIDI_CH3I_MRAIL_Fill_Request ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#2  0x00002b99e2e2353a in handle_read ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#3  0x00002b99e2e23a99 in MPIDI_CH3I_Progress ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#4  0x00002b99e2e5db07 in MPIC_Wait ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#5  0x00002b99e2e5e22c in MPIC_Recv ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#6  0x00002b99e2e196f3 in MPIR_Bcast ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#7  0x00002b99e2e19f82 in PMPI_Bcast ()
   from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
#8  0x0000000000402444 in IMB_basic_input ()
#9  0x00000000004016b5 in main ()
(gdb)
This is mvapich2-0.9.8-9.  I believe I tested a development build of
this code directly from OSU before they shipped it in OFED-1.2.  I don't
know yet what's up.
dapltest and rping work across my cluster, so I think this is an mvapich2
issue.
Shaun/Sundeep:  Have you all tested the ofed-1.2 build with 0.9.8-9 yet?
Steve.
    
    