[ewg] Re: mvapich2 over iwarp DOA - bug520

Sundeep Narravula narravul at cse.ohio-state.edu
Wed Apr 4 11:23:45 PDT 2007


Steve,
  Thanks for forwarding the latest firmware (fw 3.3). We have verified that
IMB and cpi run fine on beta1. I will upgrade to the latest firmware and
drivers and look into this.

  --Sundeep.

On Wed, 4 Apr 2007, Steve Wise wrote:

> I just built and installed today's daily OFED-1.2 build, and mvapich2
> doesn't work at all over iWARP. The build is
> OFED-1.2-20070404-0600.tgz.
>
> I've opened bug 520 to track this.
>
> If I run a 2 node cpi, it hangs and never completes.
>
> If I run a 2 node IMB-MPI1 I get a crash:
>
> (gdb) bt
> #0  0x00000038f5b71f13 in memcpy () from /lib64/tls/libc.so.6
> #1  0x00002b99e2e61f40 in MPIDI_CH3I_MRAIL_Fill_Request ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #2  0x00002b99e2e2353a in handle_read ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #3  0x00002b99e2e23a99 in MPIDI_CH3I_Progress ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #4  0x00002b99e2e5db07 in MPIC_Wait ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #5  0x00002b99e2e5e22c in MPIC_Recv ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #6  0x00002b99e2e196f3 in MPIR_Bcast ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #7  0x00002b99e2e19f82 in PMPI_Bcast ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #8  0x0000000000402444 in IMB_basic_input ()
> #9  0x00000000004016b5 in main ()
> (gdb)
>
> This is mvapich2-0.9.8-9.  I believe I tested a development build of
> this code directly from OSU before they shipped it in OFED-1.2.  I don't
> know yet what's up.
>
> dapltest and rping work across my cluster, so I think this is an mvapich2
> issue.
>
> Shaun/Sundeep:  Have you all tested the ofed-1.2 build with 0.9.8-9 yet?
>
>
> Steve.