Thanks Jeff — this email and the out-of-band follow-up were a gold mine for me!

Peter

On 19 Oct 2016, at 18:35, Jeff Hammond <jeff.science@gmail.com> wrote:

On Wed, Oct 19, 2016 at 3:57 AM, Peter Boyle <paboyle@ph.ed.ac.uk> wrote:

> Hi,
>
> I’m interested in trying to minimally modify a scientific library
>
>     www.github.com/paboyle/Grid/
>
> to tackle obtaining good dual-rail OPA performance on KNL from a single
> process per node.
<br class=""></blockquote><div class=""><br class=""></div><div class="">I will respond to you privately about platform-specific considerations.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> The code is naturally hybrid OpenMP + MPI, and one process per node minimises
> internal communication within the node, so that would be the preferred use.
>
> The application currently has (compile-time selection) SHMEM and MPI transport
> layers, but it appears that the MPI implementations we have tried (Open MPI,
> MVAPICH2, Intel MPI) have a big lock and no real multi-core concurrency from a
> single process.
<br class=""></blockquote><div class=""><br class=""></div><div class="">Indeed, fat locks are the status quo.  You probably know Blue Gene/Q supported fine-grain locking, and Cray MPI has something similar for KNL (<a href="https://cug.org/proceedings/cug2016_proceedings/includes/files/pap140-file2.pdf" class="">https://cug.org/proceedings/cug2016_proceedings/includes/files/pap140-file2.pdf</a>, <a href="https://cug.org/proceedings/cug2016_proceedings/includes/files/pap140.pdf" class="">https://cug.org/proceedings/cug2016_proceedings/includes/files/pap140.pdf</a>) but the implementation in Cray MPI is not using OFI (rather uGNI and DMAPP).</div><div class=""><br class=""></div><div class="">Unfortunately, even fine-grain locking isn't sufficient to get ideal concurrency due to MPI semantics.  MPI-4 may be able to help here, but it will be a while before this standard exists and likely more time before endpoints and/or communicator info-assertions are implemented sufficient to support ideal concurrency on networks that can do it.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> Is there any wisdom about either
>
> i) should SHMEM suffice to gain concurrency from a single multithreaded
>    process, in a way that MPI does not
<br class=""></blockquote><div class=""><br class=""></div><div class="">SHMEM is not burdened with the semantics of MPI send-recv, but OpenSHMEM currently does not support threads (we are in the process of trying to define it).</div><div class=""><br class=""></div><div class="">As far as I know SHMEM for OFI (<a href="https://github.com/Sandia-OpenSHMEM/SOS/" class="">https://github.com/Sandia-OpenSHMEM/SOS/</a>) is not going to give you thread-oriented concurrency.  If I'm wrong, someone on this list will correct me.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> ii) would it be better to drop to OFI (using the useful SHMEM tutorial on
>     GitHub as an example)
<br class=""></blockquote><div class=""><br class=""></div><div class="">That is the right path forward, modulo some platform-specific details I'll share out-of-band.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> iii) Can job invocation rely on mpirun for job launch, and use MPI (or
>      OpenSHMEM) for the exchange of network addresses, but then
>
>      a) safely fire up OFI endpoints and use them instead of MPI
>
>      b) safely fire up OFI endpoints and use them alongside MPI
>
> Advice appreciated.
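
As a sketch of the bootstrap in (iii), and leaving aside the platform-specific
questions about coexistence with MPI that I'll cover out of band: let mpirun do
the launch, publish each rank's fi_getname() address with MPI_Allgather, and
insert the gathered addresses into the address vector.  The endpoint and
address vector here are assumed to come from the setup sketch above, and error
handling is again omitted.

    /* Sketch: use MPI only to exchange OFI endpoint addresses. */
    #include <mpi.h>
    #include <stdlib.h>
    #include <rdma/fabric.h>
    #include <rdma/fi_cm.h>      /* fi_getname   */
    #include <rdma/fi_domain.h>  /* fi_av_insert */

    extern struct fid_ep *ep;    /* from the setup sketch above */
    extern struct fid_av *av;

    fi_addr_t *exchange_addresses(MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        /* Get this endpoint's provider-specific source address. */
        char my_addr[64];
        size_t addrlen = sizeof(my_addr);
        fi_getname(&ep->fid, my_addr, &addrlen);

        /* Gather everyone's address over MPI ... */
        char *all_addrs = malloc((size_t)size * addrlen);
        MPI_Allgather(my_addr, (int)addrlen, MPI_BYTE,
                      all_addrs, (int)addrlen, MPI_BYTE, comm);

        /* ... and resolve them into fi_addr_t handles, which are what the
         * data-transfer calls (e.g. fi_tsend) take as the destination. */
        fi_addr_t *peers = malloc(size * sizeof(fi_addr_t));
        fi_av_insert(av, all_addrs, size, peers, 0, NULL);

        free(all_addrs);
        return peers;   /* peers[r] addresses MPI rank r over OFI */
    }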
<span class="gmail-HOEnZb"><font color="#888888" class=""><br class=""></font></span></blockquote><div class=""><br class=""></div><div class="">You might find that the proposed OpenSHMEM extension for counting puts (<a href="http://www.csm.ornl.gov/workshops/openshmem2013/documents/presentations_and_tutorials/Dinan_Reducing_Synchronization_Overhead_Through_Bundled_Communication.pdf" class="">http://www.csm.ornl.gov/workshops/openshmem2013/documents/presentations_and_tutorials/Dinan_Reducing_Synchronization_Overhead_Through_Bundled_Communication.pdf</a>, <a href="http://rd.springer.com/chapter/10.1007/978-3-319-05215-1_12" class="">http://rd.springer.com/chapter/10.1007/978-3-319-05215-1_12</a>) that is implemented in <a href="https://github.com/Sandia-OpenSHMEM/SOS" class="">https://github.com/Sandia-OpenSHMEM/SOS</a> is a reasonable approximation to the HW-based send-recv that you probably used on Blue Gene/Q via SPI.  You might be able to build off of that code base.</div><div class=""><br class=""></div><div class="">Best,</div><div class=""><br class=""></div><div class="">Jeff</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-HOEnZb"><font color="#888888" class="">

> Peter
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> Libfabric-users mailing list
> Libfabric-users@lists.openfabrics.org
> http://lists.openfabrics.org/mailman/listinfo/libfabric-users

--
Jeff Hammond
jeff.science@gmail.com
http://jeffhammond.github.io/