<div dir="ltr">Hi John,<div><br></div><div>Okay, I figured out the problem. I don't know whether this will matter for your HPX work.</div><div>Basically, SLURM at NERSC, and apparently at CSCS, is configured so that unless you request otherwise, each process launched by srun gets only 1/(total number of cores on the node) of the node's network resources (Aries FMA descriptors, etc.). The internal Cray systems apparently aren't configured this way. This is what causes the aborts you were seeing in the GNI unit tests.</div><div><br></div><div>A workaround is to add the following to the run_gnitest script:</div><div><br></div><div>
<p class="gmail-p1"><span class="gmail-s1"> </span><span class="gmail-s2">args</span><span class="gmail-s1">=</span><span class="gmail-s3">"</span><span class="gmail-s4">-N1 --exclusive --cpu_bind=none -t00:20:00 --ntasks=1 --cpus-per-task=X"</span></p><p class="gmail-p1"><span class="gmail-s4"><br></span></p><p class="gmail-p1"><span class="gmail-s4">where X is the number of cores per node on Piz Daint.</span></p><p class="gmail-p1">The failing tests use multiple FMA descriptors per process because they exercise support for scalable endpoints and shared TX contexts. So if HPX is going to use either of these libfabric constructs, you will need to remember this --cpus-per-task SLURM argument.</p><p class="gmail-p1">I'll update the wiki page on running the Criterion tests.</p><p class="gmail-p1">Thanks,</p><p class="gmail-p1">Howard</p><p class="gmail-p1"><br></p></div></div><div class="gmail_extra"><br><div class="gmail_quote">2017-02-16 14:29 GMT-07:00 Howard Pritchard <span dir="ltr"><<a href="mailto:hppritcha@gmail.com" target="_blank">hppritcha@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi John,<div><br></div><div>I'm seeing the same problem at NERSC/Edison. I'll use that system to debug it.</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>Howard</div><div><br></div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">2017-02-15 13:40 GMT-07:00 Biddiscombe, John A. <span dir="ltr"><<a href="mailto:biddisco@cscs.ch" target="_blank">biddisco@cscs.ch</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Sung<br>
<br>
Just FYI: I checked out the v2.2.0 branch of Criterion, recompiled it and libfabric, and got broadly the same results: a slightly different number of failures, but the same pattern.<br>
<br>
daint103:/scratch/snx3000/bidd<wbr>isco/src/libfabric-cray (master *=)$ ~/apps/libfabric/bin/run_gnite<wbr>st<br>
[----] Warning! The test `api_cq::msg_send_only` crashed during its setup or teardown.<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
[----] prov/gni/test/cm.c:203: Assertion failed: fi_endpoint<br>
[FAIL] cm_basic::srv_setup: (0.44s)<br>
<div><div class="m_1682423121270233479h5">[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::inject` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::inject_write` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::inject_write_r<wbr>etrans` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::inject_writeda<wbr>ta` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::inject_writeda<wbr>ta_retrans` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::read` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::read_alignment<wbr>` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::readmsg` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::readv` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::write` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::write_alignmen<wbr>t` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::write_alignmen<wbr>t_retrans` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::write_autoreg` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::write_autoreg_<wbr>uncached` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::write_error` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::write_fence` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::write_fence_re<wbr>trans` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::write_retrans` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::writedata` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::writedata_retr<wbr>ans` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::writemsg` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::writemsg_retra<wbr>ns` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::writev` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
[----] Warning! The test `dgram_rma_stx::writev_retrans<wbr>` crashed during its setup or teardown.<br>
[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</div></div>[----] Warning! The test `rdm_rma_stx::inject` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::inject_write` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::inject_write_ret<wbr>rans` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::inject_writedata<wbr>` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::inject_writedata<wbr>_retrans` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::read` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::read_alignment` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::read_alignment_r<wbr>etrans` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::read_error` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::read_retrans` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::readmsg` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::readmsg_retrans` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::readv` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::readv_retrans` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::trigger` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::write` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::write_alignment` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::write_alignment_<wbr>retrans` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::write_autoreg` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::write_autoreg_un<wbr>cached` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::write_error` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::write_fence` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::write_fence_retr<wbr>ans` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::write_retrans` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::writedata` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::writedata_retran<wbr>s` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::writemsg` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::writemsg_retrans<wbr>` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::writev` crashed during its setup or teardown.<br>
<span>[----] prov/gni/test/rdm_dgram_stx.c:<wbr>165: Assertion failed: fi_endpoint<br>
</span>[----] Warning! The test `rdm_rma_stx::writev_retrans` crashed during its setup or teardown.<br>
[----] prov/gni/test/sep.c:2343: Assertion failed: fi_scalable_ep<br>
[FAIL] scalable::av_insert: (0.46s)<br>
[----] prov/gni/test/sep.c:177: Assertion failed: fi_scalable_ep<br>
[----] Warning! The test `scalablem::all` crashed during its setup or teardown.<br>
[----] prov/gni/test/sep.c:177: Assertion failed: fi_scalable_ep<br>
[----] Warning! The test `scalablem::misc` crashed during its setup or teardown.<br>
[----] prov/gni/test/sep.c:177: Assertion failed: fi_scalable_ep<br>
[----] Warning! The test `scalablet::all` crashed during its setup or teardown.<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
[----] prov/gni/test/vc.c:271: Assertion failed: fi_endpoint<br>
[----] Warning! The test `vc_conn_ping_auto::ep_connect<wbr>_inter_cm` crashed during its setup or teardown.<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
[----] prov/gni/test/vc.c:271: Assertion failed: fi_endpoint<br>
[----] Warning! The test `vc_conn_ping_auto::ep_connect<wbr>_inter_cm_pp` crashed during its setup or teardown.<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
[----] prov/gni/test/vc.c:271: Assertion failed: fi_endpoint<br>
[----] Warning! The test `vc_conn_ping_auto::ep_connect<wbr>_intra_cm` crashed during its setup or teardown.<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
[----] prov/gni/test/vc.c:271: Assertion failed: fi_endpoint<br>
[----] Warning! The test `vc_conn_ping_auto::ep_connect<wbr>_intra_cm_pp` crashed during its setup or teardown.<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
[----] prov/gni/test/vc.c:271: Assertion failed: fi_endpoint<br>
[----] Warning! The test `vc_conn_ping_auto::ep_connect<wbr>_self` crashed during its setup or teardown.<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
[----] prov/gni/test/vc.c:271: Assertion failed: fi_endpoint<br>
[----] Warning! The test `vc_conn_ping_auto::ep_connect<wbr>_self_pp` crashed during its setup or teardown.<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
[----] prov/gni/test/vc.c:271: Assertion failed: fi_endpoint<br>
[----] Warning! The test `vc_conn_ping_manual::ep_conne<wbr>ct_inter_cm_pp` crashed during its setup or teardown.<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
[----] prov/gni/test/vc.c:271: Assertion failed: fi_endpoint<br>
[----] Warning! The test `vc_conn_ping_manual::ep_conne<wbr>ct_intra_cm` crashed during its setup or teardown.<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
[----] prov/gni/test/vc.c:271: Assertion failed: fi_endpoint<br>
[----] Warning! The test `vc_conn_ping_manual::ep_conne<wbr>ct_intra_cm_pp` crashed during its setup or teardown.<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
[----] prov/gni/test/vc.c:271: Assertion failed: fi_endpoint<br>
[----] Warning! The test `vc_conn_ping_manual::ep_conne<wbr>ct_self` crashed during its setup or teardown.<br>
Unidentified node: Error detected by libibgni.so. Subsequent operation may be unreliable. IAA did not recognize this as an MPI process<br>
[----] prov/gni/test/vc.c:271: Assertion failed: fi_endpoint<br>
[----] Warning! The test `vc_conn_ping_manual::ep_conne<wbr>ct_self_pp` crashed during its setup or teardown.<br>
[====] Synthesis: Tested: 631 | Passing: 561 | Failing: 70 | Crashing: 68<br>
<br>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div>
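<div>A minimal sketch of the srun workaround suggested at the top of this thread, written as a POSIX sh helper that a wrapper like run_gnitest could call. The function name gnitest_srun_args is hypothetical (the real run_gnitest script is not shown here). The per-node core count is passed in rather than hard-coded, since X depends on the compute nodes in use. The srun options themselves are exactly the ones from the thread.</div>
<pre>
```shell
#!/bin/sh
# Hypothetical helper (not part of the real run_gnitest script): print the
# srun options suggested in this thread, with X replaced by the caller's
# per-node core count.
gnitest_srun_args() {
    cores_per_node="$1"
    # Reject an empty or non-numeric core count instead of guessing one.
    case "$cores_per_node" in
        ''|*[!0-9]*)
            echo "usage: gnitest_srun_args CORES_PER_NODE" >&2
            return 1
            ;;
    esac
    # One exclusive node, one task, no CPU binding, and every core given to
    # that task, so (per the explanation in this thread) it is not limited
    # to a 1/(cores per node) share of the node's Aries network resources.
    echo "-N1 --exclusive --cpu_bind=none -t00:20:00 --ntasks=1 --cpus-per-task=${cores_per_node}"
}
```
</pre>
<div>Hypothetical usage: args=$(gnitest_srun_args "$CORES_PER_NODE") followed by srun $args ./gnitest, where CORES_PER_NODE and the ./gnitest path are assumptions about the local setup. $args is left unquoted on purpose so that the shell splits it into separate srun options.</div>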