[ewg] possible mvapich2 problem

Steve Wise swise at opengridcomputing.com
Wed Mar 7 14:57:38 PST 2007


Hey Shaun,

I have an MPI test program that detects buffer corruption when run on
mvapich2-0.9.8-5. The same program works on mvapich2-0.9.8-4. The
corruption happens over both IB and iWARP, using the alpha libraries and a
recent set of kernel modules from OFA 1.2.

At this point in this (complicated) test, all ranks enter MPI_Bcast().
The root rank, which is sending the data, checksums part of the data
buffer before entering MPI_Bcast() and, if the call returns without
error, checksums it again afterwards to validate that the send buffer
was not corrupted. The checksum differs after the bcast, so the data in
the buffer was somehow altered, presumably by the MPI layer (though I
haven't confirmed that yet).

Have y'all seen this problem? Maybe it was fixed in -6? I'm going to
try to reduce this to a simple test, but I wanted to see if this is a
known mvapich2 problem with the 0.9.8-5 release.

Steve.
