[libfabric-users] OFI in combination with MPI

Peter Boyle paboyle at ph.ed.ac.uk
Tue Oct 25 16:02:56 PDT 2016


Thanks Jeff — this email and out of band were a gold mine for me!
Peter

> On 19 Oct 2016, at 18:35, Jeff Hammond <jeff.science at gmail.com> wrote:
> 
> 
> 
> On Wed, Oct 19, 2016 at 3:57 AM, Peter Boyle <paboyle at ph.ed.ac.uk <mailto:paboyle at ph.ed.ac.uk>> wrote:
> 
> Hi,
> 
> I’m interested in trying to minimally modify a scientific library
> 
>         www.github.com/paboyle/Grid/ <http://www.github.com/paboyle/Grid/>
> 
> to tackle obtaining good dual rail OPA performance on KNL
> from a single process per node.
> 
> 
> I will respond to you privately about platform-specific considerations.
>  
> The code is naturally hybrid OpenMP + MPI and 1 process per node minimises internal
> communication in the node so would be the preferred us.
> 
> The application currently has (compile time selection) SHMEM and MPI transport layers, but it appears
> that the MPI versions we have tried (OpenMPI, MVAPICH2, Intel MPI) have a big lock and no
> real multi-core concurrency from a single process.
> 
> 
> Indeed, fat locks are the status quo.  You probably know Blue Gene/Q supported fine-grain locking, and Cray MPI has something similar for KNL (https://cug.org/proceedings/cug2016_proceedings/includes/files/pap140-file2.pdf <https://cug.org/proceedings/cug2016_proceedings/includes/files/pap140-file2.pdf>, https://cug.org/proceedings/cug2016_proceedings/includes/files/pap140.pdf <https://cug.org/proceedings/cug2016_proceedings/includes/files/pap140.pdf>) but the implementation in Cray MPI is not using OFI (rather uGNI and DMAPP).
> 
> Unfortunately, even fine-grain locking isn't sufficient to get ideal concurrency due to MPI semantics.  MPI-4 may be able to help here, but it will be a while before this standard exists and likely more time before endpoints and/or communicator info-assertions are implemented sufficient to support ideal concurrency on networks that can do it.
>  
> Is there any wisdom about either
> 
> i) should SHMEM suffice to gain concurrency from single multithreaded process, in a way that MPI does not
> 
> 
> SHMEM is not burdened with the semantics of MPI send-recv, but OpenSHMEM currently does not support threads (we are in the process of trying to define it).
> 
> As far as I know SHMEM for OFI (https://github.com/Sandia-OpenSHMEM/SOS/ <https://github.com/Sandia-OpenSHMEM/SOS/>) is not going to give you thread-oriented concurrency.  If I'm wrong, someone on this list will correct me.
>  
> ii) would it be better to drop to OFI (using the useful SHMEM tutorial on GitHub as an example)
> 
> 
> That is the right path forward, modulo some platform-specific details I'll share out-of-band.
>  
> iii) Can job invocation rely on mpirun for job load, and use MPI (or OpenSHMEM) for exchange of network address, but then
> 
>      a) safely fire up OFI endpoints and use them instead of MPI
> 
>      b) safely fire up OFI endpoints and use them alongside and as well as MPI
> 
> Advice appreciated.
> 
> 
> You might find that the proposed OpenSHMEM extension for counting puts (http://www.csm.ornl.gov/workshops/openshmem2013/documents/presentations_and_tutorials/Dinan_Reducing_Synchronization_Overhead_Through_Bundled_Communication.pdf <http://www.csm.ornl.gov/workshops/openshmem2013/documents/presentations_and_tutorials/Dinan_Reducing_Synchronization_Overhead_Through_Bundled_Communication.pdf>, http://rd.springer.com/chapter/10.1007/978-3-319-05215-1_12 <http://rd.springer.com/chapter/10.1007/978-3-319-05215-1_12>) that is implemented in https://github.com/Sandia-OpenSHMEM/SOS <https://github.com/Sandia-OpenSHMEM/SOS> is a reasonable approximation to the HW-based send-recv that you probably used on Blue Gene/Q via SPI.  You might be able to build off of that code base.
> 
> Best,
> 
> Jeff
>  
> Peter
> 
> 
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> _______________________________________________
> Libfabric-users mailing list
> Libfabric-users at lists.openfabrics.org <mailto:Libfabric-users at lists.openfabrics.org>
> http://lists.openfabrics.org/mailman/listinfo/libfabric-users <http://lists.openfabrics.org/mailman/listinfo/libfabric-users>
> 
> 
> 
> -- 
> Jeff Hammond
> jeff.science at gmail.com <mailto:jeff.science at gmail.com>
> http://jeffhammond.github.io/ <http://jeffhammond.github.io/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/libfabric-users/attachments/20161026/5858f594/attachment.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <http://lists.openfabrics.org/pipermail/libfabric-users/attachments/20161026/5858f594/attachment.ksh>


More information about the Libfabric-users mailing list