<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Sorry Howard, this went into my spam folder and I missed it.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I have run the gni test and it creates rather a lot of output when debug is enabled.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I’ve put the output (800MB) here
<a href="ftp://ftp.cscs.ch/out/biddisco/cray/gnitestout.txt">ftp://ftp.cscs.ch/out/biddisco/cray/gnitestout.txt</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">The synopsis is
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">[====] Synthesis: Tested: 631 | Passing: 573 | Failing: 58 | Crashing: 57<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">with the majority of errors being of the form<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">[ 240] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE) returned error - Invalid argument<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">with occasional<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">[ 240] JOB: GNI_CdmAttach: ioctl(GNI_IOC_NIC_SETATTR) NIC[0] returned error - No space left on device<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">(but from what I read the gnitest only runs on one node, so it may not be much use).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Thanks for taking the time to investigate.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">PS. I forgot to ask - if the FI_EP_MSG or gni is due in 1.5.0 then what sort of timescale would one expect that to be in?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">JB<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Howard Pritchard [mailto:hppritcha@gmail.com]
<br>
<b>Sent:</b> 14 February 2017 00:08<br>
<b>To:</b> Biddiscombe, John A.<br>
<b>Cc:</b> libfabric-users@lists.openfabrics.org<br>
<b>Subject:</b> Re: [libfabric-users] FI_EP_MSG on cray<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Hi John,<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Could you try the <span style="font-size:9.5pt">run_gnitest script with this UGNI debug level set? I'd like to understand why that's failing for you.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">I cannot get fi_pingpong to work with FI_EP_MSG for GNI provider. It should work though. I filed an issue on the GNI downstream provider repo.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:9.5pt">Howard</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">2017-02-13 13:21 GMT-07:00 Biddiscombe, John A. <<a href="mailto:biddisco@cscs.ch" target="_blank">biddisco@cscs.ch</a>>:<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Howard, here’s some output …<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">The machine is the cray piz daint at CSCS,<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Allocation as follows<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">salloc -N 2 -C mc --time=02:00:00 –exclusive<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">daint102:/scratch/snx3000/biddisco/build$ export UGNI_DEBUG=10<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">daint102:/scratch/snx3000/biddisco/build$ ./frun.sh ~/apps/fabtests/bin/fi_msg<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">running /users/biddisco/apps/fabtests/bin/fi_msg on nid00[722,724]<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">nid00722 is 148.187.34.215<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Generated command is srun -n 2 --ntasks-per-node=1 -l --multi-prog ./scalable.conf<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">0 /users/biddisco/apps/fabtests/bin/fi_msg -p gni<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1 /users/biddisco/apps/fabtests/bin/fi_msg -p gni 148.187.34.215<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"> <o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">0: [ 44] GNII_DebugInit: GNII_debug_level: 10 GNII_subsys_debug: 0 GNII_debug_mask: 0x0 GNII_debug_inst_id: 44<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">0: [ 44] JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">0: [ 44] JOB: GNI_GetJobResInfo: job resource: FMA (6) used: 0 limit: 123<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">0: [ 44] JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">0: [ 44] JOB: GNI_GetJobResInfo: job resource: CQ (5) used: 0 limit: 509<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">0: fi_getinfo(): common/shared.c:454, ret=-61 (No data available)<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] GNII_DebugInit: GNII_debug_level: 10 GNII_subsys_debug: 0 GNII_debug_mask: 0x0 GNII_debug_inst_id: 36<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] JOB: GNI_GetJobResInfo: job resource: FMA (6) used: 0 limit: 123<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] JOB: GNI_GetJobResInfo: job resource: CQ (5) used: 0 limit: 509<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] JOB: GNII_GetKernelVersion: kgni version major = 0x0 minor 0x45 code 0xb9 built with major = 0x0 minor = 0x45 code 0x4e24<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] FMA: GNI_CdmAttach: FMA window size: 32768<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] FMA: GNI_CdmAttach: NOPRIV_ERR masked<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] JOB: GNI_CdmAttach: ptag = 36 inst_id = 13864961 fma_window = 0x0000000000000000 fma_ctrl = 0x0000000000000000<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: GNI_CqCreate: entry_count: 1361 reqs: 1361 adjusted entries: 1395 alloc_count: 1396<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: #1 cq created, kern_cq_descr = 1, mode = 2, rd_index_ptr = 0x2aaaaaad7ba0, queue = 0x2aaaaaad5000, intr_mask = (nil)<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: GNI_CqCreate: entry_count: 1361 reqs: 1361 adjusted entries: 1395 alloc_count: 1396<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: #1 cq created, kern_cq_descr = 394, mode = 20, rd_index_ptr = 0x2aaaaaadc000, queue = 0x2aaaaaad8000, intr_mask = (nil)<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] FLBTE: GNII_FlbteInit: FLBTE: tx_counter 0x2aaaaaace008, chan 2, max_len -1, total 511<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: GNI_CqCreate: entry_count: 2048 reqs: 2048 adjusted entries: 2559 alloc_count: 2560<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: #1 cq created, kern_cq_descr = 395, mode = 4, rd_index_ptr = 0x2aaaaaae6000, queue = 0x2aaaaaade000, intr_mask = (nil)<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: GNI_CqCreate: entry_count: 2048 reqs: 2048 adjusted entries: 2559 alloc_count: 2560<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: #1 cq created, kern_cq_descr = 396, mode = 5, rd_index_ptr = 0x2aaaaaaef000, queue = 0x2aaaaaae7000, intr_mask = 0x2aaaaaacf000<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: GNI_CqCreate: entry_count: 16384 reqs: 16384 adjusted entries: 16895 alloc_count: 16896<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE) returned error - Invalid argument<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: GNI_CqCreate: GNI_IOC_CQ_CREATE with PHYS_MEM failed trying without PHYS_MEM<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] MR: GNI_MemRegister: Mem reg of 135168 length at addr 0x2aaaaaaf0000<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: #1 cq created, kern_cq_descr = 397, mode = 0, rd_index_ptr = 0x2aaaaab11000, queue = 0x2aaaaaaf0000, intr_mask = (nil)<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: GNI_CqCreate: entry_count: 16384 reqs: 16384 adjusted entries: 16895 alloc_count: 16896<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE) returned error - Invalid argument<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: GNI_CqCreate: GNI_IOC_CQ_CREATE with PHYS_MEM failed trying without PHYS_MEM<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] MR: GNI_MemRegister: Mem reg of 135168 length at addr 0x2aaaaab23000<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: ioctl(GNI_IOC_CQ_CREATE)<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] CQ: cq_create: #1 cq created, kern_cq_descr = 398, mode = 1, rd_index_ptr = 0x2aaaaab12000, queue = 0x2aaaaab23000, intr_mask = 0x2aaaaaacf004<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">1: [ 36] MR: GNI_MemRegister: Mem reg of 136314880 length at addr 0x2aaaae400000<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">srun: error: nid00722: task 0: Exited with exit code 61<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">srun: Terminating job step 789872.11<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">srun: Job step aborted: Waiting up to 32 seconds for job step to finish.<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">srun: error: nid00724: task 1: Killed<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">daint102:/scratch/snx3000/biddisco/build$<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
_______________________________________________<br>
Libfabric-users mailing list<br>
<a href="mailto:Libfabric-users@lists.openfabrics.org">Libfabric-users@lists.openfabrics.org</a><br>
<a href="http://lists.openfabrics.org/mailman/listinfo/libfabric-users" target="_blank">http://lists.openfabrics.org/mailman/listinfo/libfabric-users</a><o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</body>
</html>