[libfabric-users] OFI in combination with MPI

Jeff Hammond jeff.science at gmail.com
Wed Oct 19 10:35:49 PDT 2016


On Wed, Oct 19, 2016 at 3:57 AM, Peter Boyle <paboyle at ph.ed.ac.uk> wrote:

>
> Hi,
>
> I’m interested in trying to minimally modify a scientific library
>
>         www.github.com/paboyle/Grid/
>
> to obtain good dual-rail OPA performance on KNL
> from a single process per node.
>
>
I will respond to you privately about platform-specific considerations.


> The code is naturally hybrid OpenMP + MPI, and one process per node
> minimises internal communication within the node, so that would be the
> preferred usage.
>
> The application currently has (compile-time selection) SHMEM and MPI
> transport layers, but it appears that the MPI implementations we have
> tried (Open MPI, MVAPICH2, Intel MPI) have a big lock and no real
> multi-core concurrency from a single process.
>
>
Indeed, fat locks are the status quo.  You probably know that Blue Gene/Q
supported fine-grain locking, and Cray MPI has something similar for KNL
(https://cug.org/proceedings/cug2016_proceedings/includes/files/pap140-file2.pdf,
https://cug.org/proceedings/cug2016_proceedings/includes/files/pap140.pdf),
but the Cray MPI implementation does not use OFI (it uses uGNI and DMAPP
instead).
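
For concreteness, here is the kind of pattern I assume you have in mind
(an illustrative sketch, not Grid's actual code): each OpenMP thread
drives its own message stream, which only pays off if the MPI library can
progress those streams concurrently rather than behind one lock.

  #include <mpi.h>
  #include <omp.h>

  /* Hypothetical halo exchange: thread t drives direction t.  Requires
   * MPI_THREAD_MULTIPLE; with a big lock inside the library, the
   * Irecv/Isend/Waitall calls from different threads serialize anyway. */
  void exchange(double *send[], double *recv[], int count,
                const int peer[], int ndir, MPI_Comm comm)
  {
    #pragma omp parallel for
    for (int t = 0; t < ndir; t++) {
      MPI_Request req[2];
      MPI_Irecv(recv[t], count, MPI_DOUBLE, peer[t], t, comm, &req[0]);
      MPI_Isend(send[t], count, MPI_DOUBLE, peer[t], t, comm, &req[1]);
      MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    }
  }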

Unfortunately, even fine-grain locking isn't sufficient to get ideal
concurrency, due to MPI semantics.  MPI-4 may be able to help here, but it
will be a while before that standard exists, and likely more time still
before endpoints and/or communicator info-assertions are implemented well
enough to support ideal concurrency on networks that can deliver it.
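
If you want to experiment with the info-assertion direction once an
implementation exposes it, the proposed usage is just communicator info
hints.  The key names below are taken from the proposal and may change
before standardization; today most MPIs will simply ignore them:

  #include <mpi.h>

  /* Proposed communicator info assertions: no wildcard receives and no
   * message-ordering requirement are the promises that would let an
   * implementation relax its locking on this communicator. */
  void set_assertions(MPI_Comm comm)
  {
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "mpi_assert_no_any_source", "true");
    MPI_Info_set(info, "mpi_assert_allow_overtaking", "true");
    MPI_Comm_set_info(comm, info);
    MPI_Info_free(&info);
  }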


> Is there any wisdom about either
>
> i) should SHMEM suffice to gain concurrency from a single multithreaded
> process, in a way that MPI does not
>
>
SHMEM is not burdened with the semantics of MPI send-recv, but OpenSHMEM
currently does not support threads (we are in the process of trying to
define that support).

As far as I know, SHMEM for OFI (https://github.com/Sandia-OpenSHMEM/SOS/)
is not going to give you thread-oriented concurrency.  If I'm wrong,
someone on this list will correct me.
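
For what it is worth, the interface under discussion looks roughly like
MPI's threading query.  The names below are provisional (they are not in
OpenSHMEM 1.3) and are only meant to show the shape:

  #include <shmem.h>

  int main(void)
  {
    /* Provisional API, modeled on MPI_Init_thread; not yet standardized. */
    int provided;
    shmem_init_thread(SHMEM_THREAD_MULTIPLE, &provided);
    if (provided != SHMEM_THREAD_MULTIPLE) {
      /* fall back to funneling all communication through one thread */
    }
    shmem_finalize();
    return 0;
  }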


> ii) would it be better to drop to OFI (using the useful SHMEM tutorial on
> GitHub as an example)
>
>
That is the right path forward, modulo some platform-specific details I'll
share out-of-band.
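
To make that concrete, below is a minimal sketch of bringing up a
libfabric RDM endpoint from a process launched by mpirun; it also shows
the address exchange you ask about in (iii), using MPI_Allgather to
distribute the fi_getname blobs and fi_av_insert to build the address
vector.  Error handling, provider-specific hints (e.g. steering to the
PSM2 provider for OPA), per-thread transmit contexts, and cleanup are all
omitted, so treat this as the shape of the code rather than something
tuned:

  #include <mpi.h>
  #include <stdlib.h>
  #include <rdma/fabric.h>
  #include <rdma/fi_domain.h>
  #include <rdma/fi_endpoint.h>
  #include <rdma/fi_cm.h>

  int main(int argc, char **argv)
  {
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Ask for a reliable-datagram endpoint with tagged messaging and RMA. */
    struct fi_info *hints = fi_allocinfo(), *info;
    hints->ep_attr->type = FI_EP_RDM;
    hints->caps = FI_TAGGED | FI_RMA;
    fi_getinfo(FI_VERSION(1, 4), NULL, NULL, 0, hints, &info);

    struct fid_fabric *fabric;
    struct fid_domain *domain;
    struct fid_ep *ep;
    struct fid_av *av;
    struct fid_cq *cq;
    fi_fabric(info->fabric_attr, &fabric, NULL);
    fi_domain(fabric, info, &domain, NULL);
    fi_endpoint(domain, info, &ep, NULL);

    struct fi_av_attr av_attr = { .type = FI_AV_TABLE };
    struct fi_cq_attr cq_attr = { .format = FI_CQ_FORMAT_TAGGED };
    fi_av_open(domain, &av_attr, &av, NULL);
    fi_cq_open(domain, &cq_attr, &cq, NULL);
    fi_ep_bind(ep, &av->fid, 0);
    fi_ep_bind(ep, &cq->fid, FI_TRANSMIT | FI_RECV);
    fi_enable(ep);

    /* Exchange endpoint names over MPI, then populate the address vector.
     * MPI remains usable alongside the OFI endpoints afterwards. */
    size_t addrlen = 0;
    fi_getname(&ep->fid, NULL, &addrlen);   /* returns -FI_ETOOSMALL and
                                               sets the required length  */
    char *myaddr  = malloc(addrlen);
    char *alladdr = malloc(addrlen * size);
    fi_getname(&ep->fid, myaddr, &addrlen);
    MPI_Allgather(myaddr, (int)addrlen, MPI_BYTE,
                  alladdr, (int)addrlen, MPI_BYTE, MPI_COMM_WORLD);

    fi_addr_t *peers = malloc(size * sizeof(fi_addr_t));
    fi_av_insert(av, alladdr, size, peers, 0, NULL);

    /* ... drive fi_tsend/fi_trecv or fi_write against peers[] here ... */

    MPI_Finalize();
    return 0;
  }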


> iii) Can job invocation rely on mpirun for job launch, and use MPI (or
> OpenSHMEM) for the exchange of network addresses, but then
>
>      a) safely fire up OFI endpoints and use them instead of MPI
>
>      b) safely fire up OFI endpoints and use them alongside MPI
>
> Advice appreciated.
>
>
You might find that the proposed OpenSHMEM extension for counting puts (
http://www.csm.ornl.gov/workshops/openshmem2013/documents/presentations_and_tutorials/Dinan_Reducing_Synchronization_Overhead_Through_Bundled_Communication.pdf
, http://rd.springer.com/chapter/10.1007/978-3-319-05215-1_12), which is
implemented in https://github.com/Sandia-OpenSHMEM/SOS, is a reasonable
approximation to the HW-based send-recv that you probably used on Blue
Gene/Q via SPI.  You might be able to build on that code base.
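
Roughly, counting puts give the receive side a counter it can wait on,
which is the piece of send-recv you actually need for halo exchange.  The
names below are from my reading of the SOS shmemx extension and may not
match the current header exactly, so check shmemx.h before relying on
them:

  #include <shmem.h>
  #include <shmemx.h>   /* Sandia extension header */

  /* Sketch: 'ct' was created collectively with shmemx_ct_create(&ct).
   * The sender deposits a halo buffer into symmetric memory on the
   * target; the counter tells the target how many puts have landed,
   * without a separate flag write or barrier. */
  void send_halo(shmemx_ct_t ct, void *dst, const void *src,
                 size_t bytes, int peer)
  {
    shmemx_putmem_ct(ct, dst, src, bytes, peer);
  }

  void wait_for_halos(shmemx_ct_t ct, long expected)
  {
    shmemx_ct_wait(ct, expected);   /* block until 'expected' puts arrive */
    shmemx_ct_set(ct, 0);           /* reset for the next exchange        */
  }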

Best,

Jeff


> Peter
>
>



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/