<html><head></head><body><div class="ydpa62c17c6yahoo-style-wrap" style="font-family:lucida console, sans-serif;font-size:16px;"><div></div>
<div>PS: when I set the FI_LOG_LEVEL to debug, the program runs successfully but a lot of error messages are printed, below is an excerpt. Are these bad?</div><div><span></span><div><br></div><div><span>libfabric:verbs:core:ofi_check_ep_type():628<info> Unsupported endpoint type<br>libfabric:verbs:core:ofi_check_ep_type():629<info> Supported: FI_EP_DGRAM<br>libfabric:verbs:core:ofi_check_ep_type():629<info> Requested: FI_EP_MSG<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such 
device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: Invalid argument(22)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: Invalid argument(22)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: Invalid argument(22)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: Invalid argument(22)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: No such device(19)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: Invalid argument(22)<br><br></span><span>libfabric:ofi_rxm:ep_ctrl:rxm_eq_sread():575<warn> fi_eq_readerr: err: 111, prov_err: Unknown error -28 (-28)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: Invalid argument(22)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: Invalid argument(22)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: Invalid argument(22)<br>libfabric:verbs:fabric:fi_ibv_create_ep():173<info> rdma_create_ep: Invalid argument(22)<br>libfabric:verbs:domain:fi_ibv_ep_close():169<info> EP 0x1d72e60 was closed<br>libfabric:ofi_rxm:ep_ctrl:rxm_eq_sread():575<warn> fi_eq_readerr: err: 111, prov_err: Unknown error -28 (-28)<br>libfabric:ofi_rxm:ep_ctrl:rxm_eq_sread():575<warn> fi_eq_readerr: err: 111, prov_err: Unknown error -28 (-28)<br>libfabric:ofi_rxm:ep_ctrl:rxm_eq_sread():575<warn> fi_eq_readerr: err: 111, prov_err: Unknown error -28 (-28)<br><br></span><div>Regards,</div><div>Mohammed<br></div></div><div><br></div><div><br></div></div><div><br></div>
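<div>A note on reading the excerpt above: every line from the verbs provider carries the &lt;info&gt; tag, and only the rxm_eq_sread lines carry &lt;warn&gt;, so a per-level tally is a quick first check before going through the messages one by one. Below is a minimal Python sketch, assuming the line layout seen in the excerpt (libfabric:provider:subsystem:function():line, then the level in angle brackets, then the message). The names LOG_LINE and tally are illustrative and not part of libfabric.</div>
<pre>
```python
import re
from collections import Counter

# Layout assumed from the excerpt in this thread:
#   libfabric:PROVIDER:SUBSYSTEM:FUNCTION():LINE, the level in angle
#   brackets, then the free-form message.
LOG_LINE = re.compile(r"^libfabric:([^:]+):([^:]+):(\w+)\(\):(\d+)<(\w+)>\s*(.*)$")


def tally(lines):
    """Count libfabric log lines per (provider, level) and per (level, message)."""
    by_level = Counter()
    by_message = Counter()
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue  # not a libfabric log line, e.g. application output
        provider, _subsystem, _function, _line, level, message = match.groups()
        by_level[(provider, level)] += 1
        by_message[(level, message)] += 1
    return by_level, by_message
```
</pre>
<div>Calling tally() on the lines of a saved log and printing by_message.most_common() collapses the repeated rdma_create_ep lines into a few distinct messages with counts. The tally only groups lines; it cannot decide whether the warnings matter.</div>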
</div><div id="yahoo_quoted_4494489678" class="yahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>
On Monday, December 3, 2018, 13:44:36 CET, Mohammed Shaheen via Libfabric-users <libfabric-users@lists.openfabrics.org> wrote:
</div>
<div><br></div>
<div><br></div>
<div><div id="yiv9344534459"><div><div class="yiv9344534459ydp8c91ef2cyahoo-style-wrap" style="font-family:lucida console, sans-serif;font-size:16px;"><div></div>
<div>Hi Dmitry,</div><div><br clear="none"></div><div>The problem appeared only once and has not recurred.</div><div><br clear="none"></div><div>Since you brought it up, do I have to set the FI_PROVIDER variable? Is it not enough to source the mpivars script?</div><div>If I source the mpivars script and set the debug level to 3, I get the right provider, verbs;ofi_rxm, as seen below.<br clear="none"></div><div><br clear="none"></div><div><br clear="none"></div><div><span>lce62:~ # mpirun -np 8 -perhost 4 --hostfile hosts ./test.e<br clear="none">[0] MPI startup(): libfabric version: 1.7.0a1-impi<br clear="none">[0] MPI startup(): libfabric provider: verbs;ofi_rxm<br clear="none">Hello world from process 7 of 8<br clear="none">Hello world from process 3 of 8<br clear="none">Hello world from process 5 of 8<br clear="none">Hello world from process 1 of 8<br clear="none">Hello world from process 4 of 8<br clear="none">Hello world from process 6 of 8<br clear="none">Hello world from process 2 of 8<br clear="none">[0] MPI startup(): Rank Pid Node name Pin cpu<br clear="none">[0] MPI startup(): 0 11242 lce63 {0,1,2,3,4,5,24,25,26,27,28,29}<br clear="none">[0] MPI startup(): 1 11243 lce63 {6,7,8,9,10,11,30,31,32,33,34,35}<br clear="none">[0] MPI startup(): 2 11244 lce63 {12,13,14,15,16,17,36,37,38,39,40,41}<br clear="none">[0] MPI startup(): 3 11245 lce63 {18,19,20,21,22,23,42,43,44,45,46,47}<br clear="none">[0] MPI startup(): 4 29238 lce62 {0,1,2,3,4,5,24,25,26,27,28,29}<br clear="none">[0] MPI startup(): 5 29239 lce62 {6,7,8,9,10,11,30,31,32,33,34,35}<br clear="none">[0] MPI startup(): 6 29240 lce62 {12,13,14,15,16,17,36,37,38,39,40,41}<br clear="none">[0] MPI startup(): 7 29241 lce62 {18,19,20,21,22,23,42,43,44,45,46,47}<br clear="none">Hello world from process 0 of 8<br clear="none"><br clear="none"></span><div><br clear="none"></div><div>Regards,</div><div>Mohammed Shaheen<br clear="none"></div><div><br clear="none"></div></div><div><br clear="none"></div>
</div><div class="yiv9344534459yahoo_quoted" id="yiv9344534459yahoo_quoted_4128958646">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>
On Monday, December 3, 2018, 13:37:20 CET, Gladkov, Dmitry <dmitry.gladkov@intel.com> wrote:
</div>
<div><br clear="none"></div>
<div><br clear="none"></div>
<div class="yiv9344534459yqt7617112408" id="yiv9344534459yqt22869"><div><div id="yiv9344534459"><style>#yiv9344534459 --
filtered {panose-1:2 4 5 3 5 4 6 3 2 4;}
#yiv9344534459 filtered {font-family:Calibri;panose-1:2 15 5 2 2 2 4 3 2 4;}
#yiv9344534459 filtered {panose-1:2 11 6 9 4 5 4 2 2 4;}
#yiv9344534459 filtered {panose-1:0 0 0 0 0 0 0 0 0 0;}
#yiv9344534459
p.yiv9344534459MsoNormal, #yiv9344534459 li.yiv9344534459MsoNormal, #yiv9344534459 div.yiv9344534459MsoNormal
{margin:0cm;margin-bottom:.0001pt;font-size:12.0pt;font-family:New serif;}
#yiv9344534459 a:link, #yiv9344534459 span.yiv9344534459MsoHyperlink
{color:#0563C1;text-decoration:underline;}
#yiv9344534459 a:visited, #yiv9344534459 span.yiv9344534459MsoHyperlinkFollowed
{color:#954F72;text-decoration:underline;}
#yiv9344534459 span.yiv9344534459EmailStyle17
{font-family:sans-serif;color:#1F497D;}
#yiv9344534459 .yiv9344534459MsoChpDefault
{font-size:10.0pt;}
#yiv9344534459 filtered {margin:72.0pt 72.0pt 72.0pt 72.0pt;}
#yiv9344534459 div.yiv9344534459WordSection1
{}
#yiv9344534459 </style><div>
<div class="yiv9344534459WordSection1">
<p class="yiv9344534459MsoNormal"><span style="font-size:11.0pt;" lang="EN-US">Hi Mohammed,</span></p>
<p class="yiv9344534459MsoNormal"><span style="font-size:11.0pt;" lang="EN-US"> </span></p>
<p class="yiv9344534459MsoNormal"><span style="font-size:11.0pt;" lang="EN-US">Could you please submit a ticket in IDZ (https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology)?</span></p>
<p class="yiv9344534459MsoNormal"><span style="font-size:11.0pt;" lang="EN-US">Did you use the verbs provider in the test? Please set FI_PROVIDER=verbs to make sure. Also, if possible, please send a log captured with FI_LOG_LEVEL=debug;
it can help with further investigation.</span></p>
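<p class="yiv9344534459MsoNormal"><span style="font-size:11.0pt;" lang="EN-US">One way to capture such a log is sketched below in Python. FI_PROVIDER, FI_LOG_LEVEL and the mpirun command line are taken from this thread; the helper name run_with_fi_debug and the file name fi_debug.log are hypothetical. Whether the launcher forwards these variables to ranks on other hosts depends on its configuration and is not verified here.</span></p>
<pre>
```python
import os
import subprocess


def run_with_fi_debug(cmd, log_path="fi_debug.log"):
    """Run cmd with FI_PROVIDER=verbs and FI_LOG_LEVEL=debug.

    stdout and stderr are written together to log_path (a hypothetical
    default name). Returns the exit code of the command.
    """
    # Copy the current environment and add the two libfabric variables.
    env = dict(os.environ, FI_PROVIDER="verbs", FI_LOG_LEVEL="debug")
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Example call, using the command line quoted in this thread:
# run_with_fi_debug(["mpirun", "-np", "8", "-perhost", "4",
#                    "--hostfile", "hosts", "./test.e"])
```
</pre>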
<p class="yiv9344534459MsoNormal"><span style="font-size:11.0pt;" lang="EN-US">--</span></p>
<p class="yiv9344534459MsoNormal"><span style="font-size:11.0pt;" lang="EN-US">Dmitry</span></p>
<p class="yiv9344534459MsoNormal"><a rel="nofollow" shape="rect" name="_MailEndCompose"><span style="" lang="EN-US"> </span></a></p>
<div class="yiv9344534459yqt0174794633" id="yiv9344534459yqt23983"><div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm;">
<p class="yiv9344534459MsoNormal"><a rel="nofollow" shape="rect" name="_____replyseparator"></a><b><span style="font-size:11.0pt;" lang="EN-US">From:</span></b><span style="font-size:11.0pt;" lang="EN-US"> Mohammed Shaheen [mailto:m_shaheen1984@yahoo.com]
<br clear="none">
<b>Sent:</b> Thursday, November 29, 2018 2:34 PM<br clear="none">
<b>To:</b> libfabric-users@lists.openfabrics.org; ofiwg@lists.openfabrics.org; Ilango, Arun <arun.ilango@intel.com>; Gladkov, Dmitry <dmitry.gladkov@intel.com>; Hefty, Sean <sean.hefty@intel.com><br clear="none">
<b>Subject:</b> Re: [libfabric-users] intel mpi with libfabric</span></p>
</div>
</div>
<p class="yiv9344534459MsoNormal"><span lang="EN-US"> </span></p>
<div>
<div>
<p class="yiv9344534459MsoNormal"><span style="" lang="EN-US">Hi,</span></p>
</div>
<div>
<p class="yiv9344534459MsoNormal"><span style="" lang="EN-US"> </span></p>
</div>
<div>
<p class="yiv9344534459MsoNormal"><span style="" lang="EN-US">Now I am trying to use Intel MPI U1 with the libfabric that ships with it. I get the following error:</span></p>
</div>
<div>
<p class="yiv9344534459MsoNormal" style="margin-bottom:12.0pt;"><span style="" lang="EN-US">lce62:~ # mpirun -np 8 -perhost 4 --hostfile hosts ./test.e<br clear="none">
Hello world from process 6 of 8<br clear="none">
Hello world from process 4 of 8<br clear="none">
Hello world from process 7 of 8<br clear="none">
Hello world from process 5 of 8<br clear="none">
Hello world from process 0 of 8<br clear="none">
Hello world from process 1 of 8<br clear="none">
Hello world from process 2 of 8<br clear="none">
Hello world from process 3 of 8<br clear="none">
Abort(809595151) on node 4 (rank 4 in comm 0): Fatal error in PMPI_Finalize: Other MPI error, error stack:<br clear="none">
PMPI_Finalize(356).............: MPI_Finalize failed<br clear="none">
PMPI_Finalize(266).............:<br clear="none">
MPID_Finalize(959).............:<br clear="none">
MPIDI_NM_mpi_init_hook(1299)...:<br clear="none">
MPIR_Reduce_intra_binomial(157):<br clear="none">
MPIC_Send(149).................:<br clear="none">
MPID_Send(256).................:<br clear="none">
MPIDI_OFI_send_normal(429).....:<br clear="none">
MPIDI_OFI_send_handler(733)....: OFI tagged send failed (ofi_impl.h:733:MPIDI_OFI_send_handler:Invalid argument)<br clear="none">
[cli_4]: readline failed</span></p>
<div>
<p class="yiv9344534459MsoNormal"><span style="" lang="EN-US"> </span></p>
</div>
<div>
<p class="yiv9344534459MsoNormal"><span style="" lang="EN-US">However, when I set I_MPI_DEBUG to any value, the error disappears and the program runs successfully. Any thoughts?</span></p>
</div>
<div>
<p class="yiv9344534459MsoNormal"><span style="" lang="EN-US"> </span></p>
</div>
<div>
<p class="yiv9344534459MsoNormal"><span style="" lang="EN-US">Regards</span></p>
</div>
<div>
<p class="yiv9344534459MsoNormal"><span style="" lang="EN-US">Mohammed Shaheen</span></p>
</div>
</div>
<div>
<p class="yiv9344534459MsoNormal"><span style="" lang="EN-US"> </span></p>
</div>
</div>
<div id="yiv9344534459yahoo_quoted_3815295816">
<div>
<div>
<p class="yiv9344534459MsoNormal"><span style="font-size:10.0pt;" lang="EN-US">On Monday, November 26, 2018, 18:10:58 CET, Hefty, Sean <</span><a rel="nofollow" shape="rect" ymailto="mailto:sean.hefty@intel.com" target="_blank" href="mailto:sean.hefty@intel.com"><span style="font-size:10.0pt;" lang="EN-US">sean.hefty@intel.com</span></a><span style="font-size:10.0pt;" lang="EN-US">>
wrote: </span></p>
</div>
<div>
<p class="yiv9344534459MsoNormal"><span style="font-size:10.0pt;" lang="EN-US"> </span></p>
</div>
<div>
<p class="yiv9344534459MsoNormal"><span style="font-size:10.0pt;" lang="EN-US"> </span></p>
</div>
<div>
<div>
<p class="yiv9344534459MsoNormal"><span style="font-size:10.0pt;" lang="EN-US">> I have tried with the development version from the master branch. I<br clear="none">
> get the following errors while building the library (make)<br clear="none">
> prov/verbs/src/verbs_ep.c: In function<br clear="none">
> 'fi_ibv_msg_xrc_ep_atomic_write':<br clear="none">
> prov/verbs/src/verbs_ep.c:1770: error: unknown field 'qp_type'<br clear="none">
> specified in initializer<br clear="none">
<br clear="none">
These errors are coming from having an older verbs.h file. The qp_type field was added as part of XRC support, about 5 years ago. I think this maps to v13.<br clear="none">
<br clear="none">
> prov/verbs/src/verbs_ep.c:1770: error: unknown field 'qp_type'<br clear="none">
> specified in initializer<br clear="none">
<br clear="none">
This is the same error, which suggests that it is still picking up an old verbs.h file. Maybe Mellanox ships a different verbs.h file than what's upstream, but I doubt it would remove fields.</span></p>
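<p class="yiv9344534459MsoNormal"><span style="font-size:10.0pt;" lang="EN-US">To see which verbs.h copies are installed and whether each one declares the member, a rough text scan can help. The Python sketch below is only a hint, not a substitute for checking the compiler's include path. The pattern assumes that qp_type is declared as the name of a nested union or struct member (a closing brace followed by qp_type and a semicolon); that layout, the helper names, and the second candidate path are assumptions, not taken from this thread.</span></p>
<pre>
```python
import re
from pathlib import Path

# Assumption: the member is declared as the name of a nested union/struct,
# i.e. a closing brace, then qp_type, then a semicolon. Adjust the pattern
# if the header on your system declares it differently.
MEMBER_PATTERN = re.compile(r"\}\s*qp_type\s*;")

# Candidate locations. The first is the usual system include path; the
# second is only a guess for a locally installed copy.
CANDIDATES = [
    "/usr/include/infiniband/verbs.h",
    "/usr/local/include/infiniband/verbs.h",
]


def declares_member(path, pattern=MEMBER_PATTERN):
    """Return True if the header text matches the member pattern."""
    text = Path(path).read_text(errors="replace")
    return pattern.search(text) is not None


def scan_headers(paths=CANDIDATES):
    """Map each existing header path to whether it matches the pattern."""
    return {str(p): declares_member(p) for p in map(Path, paths) if p.is_file()}
```
</pre>
<p class="yiv9344534459MsoNormal"><span style="font-size:10.0pt;" lang="EN-US">If scan_headers() reports more than one file, or reports False for the only file found, that would be consistent with the build picking up an older header.</span></p>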
<div id="yiv9344534459yqtfd50253">
<p class="yiv9344534459MsoNormal"><span style="font-size:10.0pt;" lang="EN-US"><br clear="none">
<br clear="none">
</span><span style="font-size:10.0pt;">- Sean</span></p>
</div>
</div>
</div>
</div>
</div></div>
</div>
</div></div></div></div>
</div>
</div></div></div><div class="yqt7617112408" id="yqt49642">_______________________________________________<br clear="none">Libfabric-users mailing list<br clear="none"><a shape="rect" ymailto="mailto:Libfabric-users@lists.openfabrics.org" href="mailto:Libfabric-users@lists.openfabrics.org">Libfabric-users@lists.openfabrics.org</a><br clear="none"><a shape="rect" href="https://lists.openfabrics.org/mailman/listinfo/libfabric-users" target="_blank">https://lists.openfabrics.org/mailman/listinfo/libfabric-users</a><br clear="none"></div></div>
</div>
</div></body></html>