[Users] A simple MPI application hangs with MXM

Alina Sklarevich alinas at mellanox.com
Tue Oct 9 06:32:32 PDT 2012


Hello Sébastien,

We are looking into this issue and will update you soon.

Thanks,
Alina.

-----Original Message-----
From: Sébastien Boisvert [mailto:Sebastien.Boisvert at clumeq.ca] 
Sent: Tuesday, September 25, 2012 11:07 PM
To: Mike Dubman
Cc: OpenFabrics Software users; Todd Wilde; Florent Parent; Jean-Francois Le Fillatre; Susan Coulter; sebastien.boisvert.3 at ulaval.ca; Aleksey Senin; Alina Sklarevich
Subject: Re: A simple MPI application hangs with MXM

Hello,


Several changes are related to Mellanox MXM in
  http://svn.open-mpi.org/svn/ompi/branches/v1.6/NEWS:

1.6.2
-----

- Fix issue with MX MTL.  Thanks to Doug Eadline for raising the issue.  
- Fix singleton MPI_COMM_SPAWN when the result job spans multiple nodes.
- Fix MXM hang, and update for latest version of MXM.
- Update to support Mellanox FCA 2.5.
- Fix startup hang for large jobs.
- Ensure MPI_TESTANY / MPI_WAITANY properly set the empty status when
  count==0.
- Fix MPI_CART_SUB behavior of not copying periods to the new
  communicator properly.  Thanks to John Craske for the bug report.
- Add btl_openib_abort_not_enough_reg_mem MCA parameter to cause Open
  MPI to abort MPI jobs if there is not enough registered memory
  available on the system (vs. just printing a warning).  Thanks to
  Brock Palen for raising the issue.
- Minor fix to Fortran MPI_INFO_GET: only copy a value back to the
  user's buffer if the flag is .TRUE.
- Fix VampirTrace compilation issue with the PGI compiler suite.




With Open-MPI 1.6.2, which was released today, the simple code still hangs.

$ type mpiexec
mpiexec is hashed (/rap/clumeq/Seb-Boisvert/Open-MPI-1.6.2+MXM/build-mxm/bin/mpiexec)

$ mpiexec --mca mtl_mxm_np 0 -n 2 ./latency_checker
Initialized process with rank 0 (size is 2)
Initialized process with rank 1 (size is 2)
rank 0 is starting the test
MXM: Got signal 2 (Interrupt)
MXM: Got signal 2 (Interrupt)
mpiexec: killing job...


The code runs just fine with Open-MPI BTLs:

$ mpiexec -n 2 ./latency_checker 

(runs OK)


As I said in the other email, it seems that MPI_Iprobe never sees any incoming messages
when using MXM (the flag remains 0). Thus the hang.
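
For what it's worth, a stripped-down test along these lines captures the pattern
(this is only my sketch of what latency_checker does around the hang, not its
actual code): rank 0 posts a non-blocking send and rank 1 busy-waits on
MPI_Iprobe. Compiling it with mpicc and running it with -n 2 and
--mca mtl_mxm_np 0 should tell us whether the mxm MTL alone is enough to keep
the flag at 0:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int payload = 42;
        MPI_Request request;
        /* non-blocking send to rank 1, like latency_checker's test message */
        MPI_Isend(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
        MPI_Wait(&request, MPI_STATUS_IGNORE);
        printf("rank 0 sent the message\n");
    } else if (rank == 1) {
        int flag = 0;
        MPI_Status status;
        /* busy-wait with MPI_Iprobe; under mxm the flag seems to stay 0 forever */
        while (!flag)
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
        int payload = 0;
        MPI_Recv(&payload, 1, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank %d\n", payload, status.MPI_SOURCE);
    }

    MPI_Finalize();
    return 0;
}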


Let me know if there is something I can do to help.


Sébastien


On 21/09/12 11:58 AM, Mike Dubman wrote:
> Hi Sébastien,
> Thank you for your report. Could you please check that the latest mxm is indeed installed on all nodes (rpm -qi mxm)?
> Also, please make sure that the mpiexec you use refers to the newly compiled ompi (please run with a full path and add "--display-map").
> 
> Aleksey - please follow this report.
> 
> Thanks
> M
> 
>> -----Original Message-----
>> From: Sébastien Boisvert [mailto:Sebastien.Boisvert at clumeq.ca]
>> Sent: 21 September, 2012 18:26
>> To: OpenFabrics Software users; Mike Dubman; Todd Wilde; Florent Parent;
>> Jean-Francois Le Fillatre; Susan Coulter; sebastien.boisvert.3 at ulaval.ca
>> Subject: A simple MPI application hangs with MXM
>>
>> Hello everyone,
>>
>> We have an ofed cluster with a Mellanox interconnect.
>>
>> Mellanox Messaging Accelerator (MXM) is a Mellanox product and Open-MPI
>> supports it too [ http://www.open-mpi.org//faq/?category=openfabrics#mxm ]
>>
>> It includes
>>
>>    "enhancements that significantly increase the scalability and performance
>>    of message communications in the network, alleviating bottlenecks within
>>    the parallel communication libraries."
>>
>>
>> Anyway, we at our supercomputer center installed mxm v1.1 (mxm_1.1.1341)
>> [ http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar ] and
>> Open-MPI v1.6.1
>> [ http://www.open-mpi.org/software/ompi/v1.6/downloads/openmpi-1.6.1.tar.bz2 ].
>> We compiled all of this with gcc 4.6.1.
>>
>> To test the installation, we then compiled latency_checker v0.0.1
>>      [  https://github.com/sebhtml/latency_checker/tarball/v0.0.1  ]
>>
>>
>>
>> Our interconnect is:
>>
>>      Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR /
>> 10GigE] (rev a0)
>>
>> It is the old a0 revision, not the newer b0 revision.
>>
>>
>> The problem is that mxm hangs with only 2 processes (I pressed CONTROL + C
>> to send the signal SIGINT to the processes after 5 minutes):
>>
>>
>> $ mpiexec -n 2 -tag-output -mca mtl_mxm_np 0 --mca mtl_base_verbose 999999 \
>> --mca mtl_mxm_verbose 999999 \
>> ./latency_checker/latency_checker -exchanges 100000 -message-size 4000
>> [1,0]<stddiag>:[colosse1:12814] mca: base: components_open: Looking for mtl components
>> [1,1]<stddiag>:[colosse1:12815] mca: base: components_open: Looking for mtl components
>> [1,0]<stddiag>:[colosse1:12814] mca: base: components_open: opening mtl components
>> [1,0]<stddiag>:[colosse1:12814] mca: base: components_open: found loaded component mxm
>> [1,0]<stddiag>:[colosse1:12814] mca: base: components_open: component mxm register function successful
>> [1,1]<stddiag>:[colosse1:12815] mca: base: components_open: opening mtl components
>> [1,1]<stddiag>:[colosse1:12815] mca: base: components_open: found loaded component mxm
>> [1,1]<stddiag>:[colosse1:12815] mca: base: components_open: component mxm register function successful
>> [1,0]<stddiag>:[colosse1:12814] ../../../../../source/ompi/mca/mtl/mxm/mtl_mxm_component.c:91 - ompi_mtl_mxm_component_open() mxm component open
>> [1,0]<stddiag>:[colosse1:12814] mca: base: components_open: component mxm open function successful
>> [1,1]<stddiag>:[colosse1:12815] ../../../../../source/ompi/mca/mtl/mxm/mtl_mxm_component.c:91 - ompi_mtl_mxm_component_open() mxm component open
>> [1,1]<stddiag>:[colosse1:12815] mca: base: components_open: component mxm open function successful
>> [1,0]<stddiag>:[colosse1:12814] select: initializing mtl component mxm
>> [1,0]<stddiag>:[colosse1:12814] ../../../../../source/ompi/mca/mtl/mxm/mtl_mxm.c:194 - ompi_mtl_mxm_module_init() MXM support enabled
>> [1,0]<stddiag>:[colosse1:12814] ../../../../../source/ompi/mca/mtl/mxm/mtl_mxm.c:141 - ompi_mtl_mxm_create_ep() MXM version is old, consider to upgrade
>> [1,1]<stddiag>:[colosse1:12815] select: initializing mtl component mxm
>> [1,1]<stddiag>:[colosse1:12815] ../../../../../source/ompi/mca/mtl/mxm/mtl_mxm.c:194 - ompi_mtl_mxm_module_init() MXM support enabled
>> [1,1]<stddiag>:[colosse1:12815] ../../../../../source/ompi/mca/mtl/mxm/mtl_mxm.c:141 - ompi_mtl_mxm_create_ep() MXM version is old, consider to upgrade
>> [1,0]<stddiag>:[colosse1:12814] select: init returned success
>> [1,0]<stddiag>:[colosse1:12814] select: component mxm selected
>> [1,1]<stddiag>:[colosse1:12815] select: init returned success
>> [1,1]<stddiag>:[colosse1:12815] select: component mxm selected
>> [1,0]<stdout>:Initialized process with rank 0 (size is 2)
>> [1,0]<stdout>:rank 0 is starting the test
>> [1,1]<stdout>:Initialized process with rank 1 (size is 2)
>>
>> mpiexec: killing job...
>>
>> [1,1]<stderr>:MXM: Got signal 2 (Interrupt)
>>
>> First, it says 'MXM version is old, consider to upgrade'. But we have the latest
>> MXM version.
>>
>>
>> It seems that the problem is in MXM since MXM got the signal 2, which
>> corresponds to SIGINT according to 'kill -l'.
>>
>>
>> The thing runs fine with 1 process, but 1 process sending messages to itself
>> is silly and not scalable.
>>
>>
>> Here, I presume that mxm uses shared memory because my 2 processes
>> are on the same machine. But I don't know for sure.
>>
>> latency_checker works just fine with Open-MPI BTLs (self, sm, tcp, openib)
>> implemented with Obi-Wan Kenobi (Open-MPI's PML ob1) on an array of
>> machines.
>>
>> latency_checker also works just fine with the Open-MPI MTL psm from
>> Intel/QLogic, implemented with Connor MacLeod (Open-MPI PML cm).
>>
>> And latency_checker works just fine with MPICH2 too.
>>
>> So clearly something is wrong with the way the Open-MPI MTL called mxm
>> interacts with the Open-MPI PML cm, either in the proprietary code from
>> Mellanox or in the mxm code that ships with Open-MPI.
>>
>> To figure out the problem further, I ran the same command with a patched
>> latency_checker that prints additional debugging information to standard output.
>>
>>
>> The log of process with rank=0:
>>
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Isend destination: 0 tag: MESSAGE_TAG_BEGIN_TEST (5)
>> DEBUG MPI_Isend destination: 1 tag: MESSAGE_TAG_BEGIN_TEST (5)
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Recv source: 0 tag: MESSAGE_TAG_BEGIN_TEST (5)
>> DEBUG MPI_Isend destination: 1 tag: MESSAGE_TAG_TEST_MESSAGE (0)
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> (followed by an endless string of MPI_Iprobe calls)
>>
>>
>>     => So rank 0 tells ranks 0 and 1 to begin the test
>> (MESSAGE_TAG_BEGIN_TEST)
>>     => rank 0 receives the message that tells it to begin the test
>> (MESSAGE_TAG_BEGIN_TEST)
>>     => rank 0 sends a test message to rank 1 (MESSAGE_TAG_TEST_MESSAGE)
>>     => rank 0 busy waits for a reply which never comes.
>>
>>
>> The log of process with rank=1
>>
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> DEBUG MPI_Iprobe source: MPI_ANY_SOURCE tag: MPI_ANY_TAG
>> (followed by an endless string of MPI_Iprobe calls)
>>
>>
>>     => It seems that rank 1 is unable to probe anything at all.
>>
>>
>> The bottom line is that rank 1 is not probing any message although there are
>> two messages waiting to be received:
>>
>>  - one with tag MESSAGE_TAG_BEGIN_TEST from source 0 and
>>  - one with tag MESSAGE_TAG_TEST_MESSAGE from source 0.
>>
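>> One way to narrow this down further (a diagnostic idea only; I have not
>> tried it yet) would be to replace the rank-1 busy-wait in latency_checker
>> with a blocking MPI_Probe. If MPI_Probe returns under the mxm MTL while the
>> MPI_Iprobe loop spins forever, the problem is specific to the non-blocking
>> probe path; if it also blocks forever, the whole probe/matching path is
>> affected. As a drop-in replacement for the receive loop, something like:
>>
>> /* Diagnostic variant (sketch, untested): block instead of spinning. */
>> MPI_Status status;
>> MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
>> /* then MPI_Recv from status.MPI_SOURCE with status.MPI_TAG as usual */
>>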
>>
>>
>>
>> There is this thread on the Open-MPI mailing list describing a related issue:
>>
>> [OMPI users] application with mxm hangs on startup
>> http://www.open-mpi.org/community/lists/users/2012/08/19980.php
>>
>>
>>
>> There are also a few hits about mxm and hangs in the Open-MPI tickets:
>>
>>    https://svn.open-mpi.org/trac/ompi/search?q=mxm+hang
>>
>>
>> Namely, this patch will be included in Open-MPI v1.6.2:
>>
>>    https://svn.open-mpi.org/trac/ompi/ticket/3273
>>
>>
>> But that ticket is about a hang after unloading the MXM component, so it is
>> unrelated to what we see with mxm v1.1 (mxm_1.1.1341) and Open-MPI v1.6.1,
>> because components are unloaded once MPI_Finalize is called, I think.
>>
>>
>> So any pointers as to where we can go from here regarding MXM?
>>
>>
>>
>> Cheers, Sébastien
>>
>> ***
>>
>> Sébastien Boisvert
>> Ph.D. student in Physiology-endocrinology
>> Université Laval
>> http://boisvert.info
> 



