[libfabric-users] MPI_Barrier hang with sockets provider

Ilango, Arun arun.ilango at intel.com
Thu Mar 21 15:28:20 PDT 2019


Hi John,

Please let us know if you see the hang with "tcp;ofi_rxm" as well.

Yes, the sockets provider is feature complete, but the plan is to replace it with the "tcp;ofi_rxm" or "udp;ofi_rxd" stacks once they support the full feature set. Future performance improvements and feature additions will also target those two stacks. I believe issues in the sockets provider are fixed on a severity basis, but I'll let Sean comment on that.
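
For example, with your existing mpirun invocation (quoted further down in this thread) the stack can be selected through the same MCA parameter; this is just a sketch I haven't run myself, so adjust as needed:

$ mpirun -np 4 --map-by node --hostfile /nfs/mpi/etc/mpi-hostfile \
    --mca mtl_ofi_provider_include "tcp;ofi_rxm" \
    /nfs/software/proxy_apps/XSBench-14/src/XSBench -t 1 -s small

Alternatively, setting FI_PROVIDER="tcp;ofi_rxm" in the environment of every rank should similarly restrict libfabric to that stack.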

> What would you recommend as an example implementation
> for someone who wants to understand how to develop a provider?

What feature set are you looking to implement in your provider? Depending on that, you can look at one of the existing providers. For example, the tcp provider supports the FI_EP_MSG endpoint type and the udp provider supports the FI_EP_DGRAM endpoint type over sockets. They make use of the common utility code in prov/util/* to implement many libfabric features. The ofi_rxm and ofi_rxd "utility providers" build FI_EP_RDM endpoint support on top of any provider that implements FI_EP_MSG or FI_EP_DGRAM, respectively.
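
If it helps to see what each of these providers reports on a given system, the fi_info utility that ships with libfabric can filter by provider and endpoint type; the option spellings below are from memory, so double-check them against fi_info --help on your build:

$ fi_info -p tcp -t FI_EP_MSG             # core tcp provider, MSG endpoints
$ fi_info -p udp -t FI_EP_DGRAM           # core udp provider, DGRAM endpoints
$ fi_info -p "tcp;ofi_rxm" -t FI_EP_RDM   # rxm layered over tcp, RDM endpoints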

There's also a developer guide here: https://github.com/ofiwg/ofi-guide/blob/master/OFIGuide.md

Thanks,
Arun.

From: Wilkes, John [mailto:John.Wilkes at amd.com]
Sent: Thursday, March 21, 2019 2:01 PM
To: Ilango, Arun <arun.ilango at intel.com>; libfabric-users at lists.openfabrics.org
Cc: Hefty, Sean <sean.hefty at intel.com>; Franques, Antonio <Antonio.Franques at amd.com>; Wilkes, John <John.Wilkes at amd.com>
Subject: RE: MPI_Barrier hang with sockets provider

Hi Arun,

Thank you!  This was driving me crazy!  I'd started out with ompi-3.0 and libfabric-1.3 running on a 4-node cluster of QEMU VMs and had no problems, but when I moved to real hardware, it hung.  So I tried the latest ompi and libfabric and had the same problem.  OpenMPI-4.0.0 and libfabric-1.7.0 also hang on the QEMU virtual cluster.

My observation is that libfabric is very complex, with many options, almost like an operating system all by itself.  I have found lots of great documentation about libfabric internals and theory of operation, but it's difficult for the uninitiated to get started.

I was under the impression that the sockets provider implements most or all of libfabric's feature set and is therefore a good reference and starting point.  Maybe that was true in the past, but these days not so much?

What would you recommend as an example implementation for someone who wants to understand how to develop a provider?

Thanks again for your help!

John

From: Ilango, Arun <arun.ilango at intel.com>
Sent: Thursday, March 21, 2019 11:23 AM
To: Wilkes, John <John.Wilkes at amd.com>; libfabric-users at lists.openfabrics.org
Cc: Hefty, Sean <sean.hefty at intel.com>
Subject: RE: MPI_Barrier hang with sockets provider

Hi John,

I'm not sure what's going wrong with the sockets provider, but can you try your test with the "tcp;ofi_rxm" provider stack? The sockets provider is currently in maintenance mode, and "tcp;ofi_rxm" is the preferred way to run apps over TCP sockets. Please let me know if you face any issues.
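
If you also want to dig into what the sockets provider is doing while it hangs, libfabric's runtime logging can be turned up with environment variables; they need to reach every rank (e.g. via mpirun -x), and this is only a debugging aid, not a fix:

$ export FI_LOG_LEVEL=debug    # warn, info, or debug; debug is very verbose
$ export FI_LOG_PROV=sockets   # limit log output to the sockets provider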

Your configure and mpirun command lines seem fine to me.
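
One extra sanity check, in case you haven't already done it: the fi_info utility installed alongside libfabric can list the providers that were actually built (the exact flag is from memory, fi_info --help will confirm):

$ fi_info -l    # the list should include sockets; tcp and ofi_rxm should also show up if built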

> Is there a basic "how to" for configuring and running MPI with
> the libfabric sockets provider? Getting started with libfabric is not easy!

If it isn't available already, adding a README for running MPI with libfabric would be a good idea. Let me check. Are there any other issues you see when getting started with libfabric? There are various resources linked from the README on the libfabric GitHub page (https://github.com/ofiwg/libfabric).

Thanks,
Arun.

From: Libfabric-users [mailto:libfabric-users-bounces at lists.openfabrics.org] On Behalf Of Wilkes, John
Sent: Thursday, March 21, 2019 7:03 AM
To: libfabric-users at lists.openfabrics.org
Subject: Re: [libfabric-users] MPI_Barrier hang with sockets provider

I have hardly any experience with libfabric, and I would greatly appreciate some suggestions!

Am I not configuring libfabric properly? Am I missing something on the mpirun command line?

Is there a basic "how to" for configuring and running MPI with the libfabric sockets provider? Getting started with libfabric is not easy!

John

From: Wilkes, John
Sent: Wednesday, March 20, 2019 8:16 AM
To: libfabric-users at lists.openfabrics.org
Cc: Wilkes, John <John.Wilkes at amd.com>
Subject: MPI_Barrier hang with sockets provider

When I run the XSBench proxy app on 4 nodes, it finishes successfully, but when I run it with the libfabric sockets provider, it hangs.  After
the simulation is complete, there are calls to MPI_Barrier(), MPI_Reduce(), and MPI_Finalize().

command line:
$ mpirun -np 4 --map-by node --hostfile /nfs/mpi/etc/mpi-hostfile --mca mtl_ofi_provider_include sockets /nfs/software/proxy_apps/XSBench-14/src/XSBench -t 1 -s small

XSBench with the sockets provider runs to completion (does not hang) with -np 3.

OpenMPI-4.0.0
$ ./configure --prefix=/nfs/mpi --with-libfabric=/nfs/mpi --enable-orterun-prefix-by-default --disable-verbs-sshmem --without-verbs --enable-debug CFLAGS="-I/nfs/mpi/include -g -L/nfs/mpi/lib -ggdb -O0"

libfabric-1.7.0
$ ./configure --prefix=/nfs/mpi --enable-sockets=yes --enable-verbs=no --enable-debug=yes CFLAGS="-ggdb -O0"

A gdb stack trace on each node shows that node0 (where mpirun was run) is stuck in MPI_Reduce().  Node1 and node2 are in MPI_Finalize(), and node3 is in MPI_Barrier(). This is one example; the node that hangs in MPI_Barrier varies from run to run.

Node3 stack trace:

#0 fi_gettime_ms
#1 sock_cq_sreadfrom
#2 sock_cq_readfrom
#3 sock_cq_read
#4 fi_cq_read
#5 ompi_mtl_ofi_progress
#6 ompi_mtl_ofi_progress_no_inline
#7 opal_progress
#8 ompi_request_wait_completion
#9 ompi_request_default_wait
#10 ompi_coll_base_sendrecv_zero
#11 ompi_coll_base_barrier_intra_recursivedoubling
#12 ompi_coll_tuned_barrier_intra_dec_fixed
#13 PMPI_Barrier
#14 print_results
#15 main

Node0 stack trace:

#0 ??
#1 gettimeofday
#2 fi_gettime_ms
#3 sock_cq_sreadfrom
#4 sock_cq_readfrom
#5 sock_cq_read
#6 fi_cq_read
#7 ompi_mtl_ofi_progress
#8 ompi_mtl_ofi_progress_no_inline
#9 opal_progress
#10 ompi_request_wait_completion
#11 mca_pml_cm_recv
#12 ompi_coll_base_reduce_intra_basic_linear
#13 ompi_coll_tuned_reduce_intra_dec_fixed
#14 PMPI_Reduce
#15 print_results
#16 main

Node1 stack trace:
#0 __GI___nanosleep
#1 usleep
#2 ompi_mpi_finalize
#3 PMPI_Finalize
#4 main

Node2 stack trace:
#0 __GI___nanosleep
#1 usleep
#2 ompi_mpi_finalize
#3 PMPI_Finalize
#4 main

--
John Wilkes | AMD Research | john.wilkes at amd.com | office: +1 425.586.6412 (x26412)
