[Users] A simple MPI application hangs with MXM
Mike Dubman
miked at mellanox.com
Fri Sep 21 08:58:25 PDT 2012
Hi Sébastien,
Thank you for your report. Could you please check that the latest mxm is indeed installed on all nodes (rpm -qi mxm)?
Also, please make sure that the mpiexec you use refers to the newly compiled ompi (please run it with a full path and add "--display-map").
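
A minimal dry-run sketch of these two checks, assuming passwordless ssh to each node. The host names and the Open MPI install prefix are hypothetical placeholders; only "rpm -qi mxm", "--display-map" and "-n 2" come from this thread. The script prints the commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with the real node names and install prefix.
HOSTS="node1 node2"
OMPI_PREFIX="/opt/openmpi-1.6.1"

# Print the command that queries the installed mxm package on one node.
mxm_query_cmd() {
    printf 'ssh %s rpm -qi mxm\n' "$1"
}

# Print an mpiexec invocation that uses the full path and shows the process map.
mpiexec_cmd() {
    printf '%s/bin/mpiexec --display-map -n 2 %s\n' "$OMPI_PREFIX" "$1"
}

for host in $HOSTS; do
    mxm_query_cmd "$host"
done
mpiexec_cmd "./latency_checker/latency_checker"
```

Comparing the package version reported by every node, and the binary path shown by the full-path mpiexec, should reveal a stale mxm or a mismatched Open MPI build if there is one.
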
Aleksey - please follow this report.
Thanks
M
> -----Original Message-----
> From: Sébastien Boisvert [mailto:Sebastien.Boisvert at clumeq.ca]
> Sent: 21 September, 2012 18:26
> To: OpenFabrics Software users; Mike Dubman; Todd Wilde; Florent Parent;
> Jean-Francois Le Fillatre; Susan Coulter; sebastien.boisvert.3 at ulaval.ca
> Subject: A simple MPI application hangs with MXM
>
> Hello everyone,
>
> We have an ofed cluster with a Mellanox interconnect.
>
> Mellanox Messaging Accelerator (MXM) is a Mellanox product and Open-MPI
> supports it too [ http://www.open-mpi.org//faq/?category=openfabrics#mxm ]
>
> It includes
>
> "enhancements that significantly increase the scalability and performance
> of message communications in the network, alleviating bottlenecks within
> the parallel communication libraries."
>
>
> Anyway, we at our supercomputer center installed mxm v1.1 (mxm_1.1.1341)
> [ http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar ] and
> Open-MPI v1.6.1
> [ http://www.open-mpi.org/software/ompi/v1.6/downloads/openmpi-1.6.1.tar.bz2 ].
> We compiled all of this with gcc 4.6.1.
>
> To test the installation, we then compiled latency_checker v0.0.1
> [ https://github.com/sebhtml/latency_checker/tarball/v0.0.1 ]
>
>
>
> Our interconnect is:
>
> Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR /
> 10GigE] (rev a0)
>
> It is the old a0 revision, not the newer b0 revision.
>
>
> The problem is that mxm hangs with only 2 processes (I pressed CONTROL +
> C to send the signal SIGINT to the processes after 5 minutes):
>
>
> $ mpiexec -n 2 -tag-output -mca mtl_mxm_np 0 --mca mtl_base_verbose 999999 \
>   --mca mtl_mxm_verbose 999999 \
>   ./latency_checker/latency_checker -exchanges 100000 -message-size 4000
> [1,0]<stddiag>:[colosse1:12814] mca: base: components_open: Looking for mtl components
> [1,1]<stddiag>:[colosse1:12815] mca: base: components_open: Looking for mtl components
> [1,0]<stddiag>:[colosse1:12814] mca: base: components_open: opening mtl components
> [1,0]<stddiag>:[colosse1:12814] mca: base: components_open: found loaded component mxm
> [1,0]<stddiag>:[colosse1:12814] mca: base: components_open: component mxm register function successful
> [1,1]<stddiag>:[colosse1:12815] mca: base: components_open: opening mtl components
> [1,1]<stddiag>:[colosse1:12815] mca: base: components_open: found loaded component mxm
> [1,1]<stddiag>:[colosse1:12815] mca: base: components_open: component mxm register function successful
> [1,0]<stddiag>:[colosse1:12814] ../../../../../source/ompi/mca/mtl/mxm/mtl_mxm_component.c:91 - ompi_mtl_mxm_component_open() mxm component open
> [1,0]<stddiag>:[colosse1:12814] mca: base: components_open: component mxm open function successful
> [1,1]<stddiag>:[colosse1:12815] ../../../../../source/ompi/mca/mtl/mxm/mtl_mxm_component.c:91 - ompi_mtl_mxm_component_open() mxm component open
> [1,1]<stddiag>:[colosse1:12815] mca: base: components_open: component mxm open function successful
> [1,0]<stddiag>:[colosse1:12814] select: initializing mtl component mxm
> [1,0]<stddiag>:[colosse1:12814] ../../../../../source/ompi/mca/mtl/mxm/mtl_mxm.c:194 - ompi_mtl_mxm_module_init() MXM support enabled
> [1,0]<stddiag>:[colosse1:12814] ../../../../../source/ompi/mca/mtl/mxm/mtl_mxm.c:141 - ompi_mtl_mxm_create_ep() MXM version is old, consider to upgrade
> [1,1]<stddiag>:[colosse1:12815] select: initializing mtl component mxm
> [1,1]<stddiag>:[colosse1:12815] ../../../../../source/ompi/mca/mtl/mxm/mtl_mxm.c:194 - ompi_mtl_mxm_module_init() MXM support enabled
> [1,1]<stddiag>:[colosse1:12815] ../../../../../source/ompi/mca/mtl/mxm/mtl_mxm.c:141 - ompi_mtl_mxm_create_ep() MXM version is old, consider to upgrade
> [1,0]<stddiag>:[colosse1:12814] select: init returned success
> [1,0]<stddiag>:[colosse1:12814] select: component mxm selected
> [1,1]<stddiag>:[colosse1:12815] select: init returned success
> [1,1]<stddiag>:[colosse1:12815] select: component mxm selected
> [1,0]<stdout>:Initialized process with rank 0 (size is 2)
> [1,0]<stdout>:rank 0 is starting the test
> [1,1]<stdout>:Initialized process with rank 1 (size is 2)
>
> mpiexec: killing job...
>
> [1,1]<stderr>:MXM: Got signal 2 (Interrupt)
>
> First, it says 'MXM version is old, consider to upgrade'. But we have the latest
> MXM version.
>
>
> It seems that the problem is in MXM since MXM got the signal 2, which
> corresponds to SIGINT according to 'kill -l'.
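
The signal mapping can be checked directly from a shell. A minimal sketch, relying on the POSIX behaviour that "kill -l" given a number prints the signal name without the SIG prefix; the helper name sig_name is hypothetical. Note that the "Got signal 2" line most likely only shows MXM's signal handler reporting the interrupt sent with CONTROL + C, so by itself it may not tell where the hang is.

```shell
#!/bin/sh
# Hypothetical helper: translate a signal number into its name.
sig_name() {
    kill -l "$1"
}

# Signal 2 is the one reported by MXM in the log above.
sig_name 2
```

On Linux this prints INT, which matches the "(Interrupt)" description in the MXM message.
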
>
>
> The test runs fine with 1 process, but 1 process sending messages to
> itself is silly and not scalable.
>
>
> Here, I presume that mxm uses shared memory because my 2 processes
> are on the same machine. But I don't know for sure.
>
> latency_checker works just fine with Open-MPI BTLs (self, sm, tcp, openib)
> implemented with Obi-Wan Kenobi (Open-MPI's PML ob1) on an array of
> machines.
>
> latency_checker also works just fine with the Open-MPI MTL psm from
> Intel/QLogic, implemented with Connor MacLeod (Open-MPI PML cm).
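
The report does not show the command lines used for these working runs. As a hedged sketch, the transports can be compared by selecting the PML and the BTL or MTL explicitly with Open MPI's "--mca pml", "--mca btl" and "--mca mtl" parameters. The helper name transport_cmd is hypothetical, the BTL list simply repeats the one named above, and the script only prints the command lines (dry run) instead of launching anything.

```shell
#!/bin/sh
# Application and arguments as given in the report.
APP="./latency_checker/latency_checker -exchanges 100000 -message-size 4000"

# Hypothetical helper: print the mpiexec command line for one transport.
transport_cmd() {
    case "$1" in
        ob1) flags="--mca pml ob1 --mca btl self,sm,tcp,openib" ;;
        psm) flags="--mca pml cm --mca mtl psm" ;;
        mxm) flags="--mca pml cm --mca mtl mxm" ;;
        *)   echo "unknown transport: $1" >&2; return 1 ;;
    esac
    echo "mpiexec -n 2 $flags $APP"
}

# Print one command line per transport so the runs can be compared.
for t in ob1 psm mxm; do
    transport_cmd "$t"
done
```

Running the same two-process test once per printed command line keeps everything constant except the transport, which is what isolates the mxm MTL as the failing piece.
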
>
> And latency_checker works fine with MPICH2 too.
>
> So clearly something is wrong with the way the Open-MPI MTL called mxm
> interacts with the Open-MPI PML cm, either in the proprietary code from
> Mellanox or in the mxm code that ships with Open-MPI.
>
> To investigate the problem further, I ran the same command with a patched
> latency_checker that prints additional debugging information to standard output.
>
>
> The log of the process with rank=0:
>
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Isend destination: 0 tag: MESSAGE_TAG_BEGIN_TEST (5)
> DEBUG MPI_Isend destination: 1 tag: MESSAGE_TAG_BEGIN_TEST (5)
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Recv source: 0 tag: MESSAGE_TAG_BEGIN_TEST (5)
> DEBUG MPI_Isend destination: 1 tag: MESSAGE_TAG_TEST_MESSAGE (0)
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> (followed by an endless string of MPI_Iprobe calls)
>
>
> => So rank 0 tells ranks 0 and 1 to begin the test
> (MESSAGE_TAG_BEGIN_TEST)
> => rank 0 receives the message that tells it to begin the test
> (MESSAGE_TAG_BEGIN_TEST)
> => rank 0 sends a test message to rank 1 (MESSAGE_TAG_TEST_MESSAGE)
> => rank 0 busy waits for a reply which never comes.
>
>
> The log of the process with rank=1:
>
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
> (followed by an endless string of MPI_Iprobe calls)
>
>
> => It seems that rank 1 is unable to probe anything at all.
>
>
> The bottom line is that rank 1 never detects a message with MPI_Iprobe,
> although two messages are waiting to be received:
>
> - one with tag MESSAGE_TAG_BEGIN_TEST from source 0 and
> - one with tag MESSAGE_TAG_TEST_MESSAGE from source 0.
>
>
>
>
> There is this thread on the Open-MPI mailing list describing a related issue:
>
> [OMPI users] application with mxm hangs on startup
> http://www.open-mpi.org/community/lists/users/2012/08/19980.php
>
>
>
> There are also a few hits about mxm and hangs in the Open-MPI tickets:
>
> https://svn.open-mpi.org/trac/ompi/search?q=mxm+hang
>
>
> In particular, the fix for this ticket will be included in Open-MPI v1.6.2:
>
> https://svn.open-mpi.org/trac/ompi/ticket/3273
>
>
> But it is about a hang after unloading the MXM component, so I think it is
> unrelated to what we see with mxm v1.1 (mxm_1.1.1341) and Open-MPI v1.6.1,
> because components are unloaded once MPI_Finalize is called, and our hang
> occurs before that.
>
>
> So any pointers as to where we can go from here regarding MXM?
>
>
>
> Cheers, Sébastien
>
> ***
>
> Sébastien Boisvert
> Ph.D. student in Physiology-endocrinology
> Université Laval
> http://boisvert.info