[ewg] Re: mvapich2 over iwarp DOA - bug520
Sundeep Narravula
narravul at cse.ohio-state.edu
Wed Apr 4 11:23:45 PDT 2007
Steve,
Thanks for forwarding the latest fw 3.3. We have verified that IMB and
cpi run fine on beta1. I will upgrade to the latest fw/drivers and look
into this.
--Sundeep.
On Wed, 4 Apr 2007, Steve Wise wrote:
> I just built and installed today's daily ofed-1.2 build and mvapich2
> doesn't work at all over iwarp. The build is
> OFED-1.2-20070404-0600.tgz.
>
> I've opened bug 520 to track this.
>
> If I run a 2 node cpi, it hangs and never completes.
>
> If I run a 2 node IMB-MPI1 I get a crash:
>
> (gdb) bt
> #0 0x00000038f5b71f13 in memcpy () from /lib64/tls/libc.so.6
> #1 0x00002b99e2e61f40 in MPIDI_CH3I_MRAIL_Fill_Request ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #2 0x00002b99e2e2353a in handle_read ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #3 0x00002b99e2e23a99 in MPIDI_CH3I_Progress ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #4 0x00002b99e2e5db07 in MPIC_Wait ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #5 0x00002b99e2e5e22c in MPIC_Recv ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #6 0x00002b99e2e196f3 in MPIR_Bcast ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #7 0x00002b99e2e19f82 in PMPI_Bcast ()
>    from /usr/mpi/gcc/mvapich2-0.9.8-9/lib/libmpich.so
> #8 0x0000000000402444 in IMB_basic_input ()
> #9 0x00000000004016b5 in main ()
> (gdb)
>
> This is mvapich2-0.9.8-9. I believe I tested a development build of
> this code directly from OSU before they shipped it in OFED-1.2. I don't
> know yet what's up.
>
> dapltest and rping work across my cluster, so I think this is an
> mvapich2 issue.
>
> Shaun/Sundeep: Have you all tested the ofed-1.2 build with 0.9.8-9 yet?
>
>
> Steve.
>
>
>