Thanks Arun and Dmitry for your support.

Well, I am building my own libfabric, I export the right variables, and I source intel mpi with -ofi_internal=0. I have figured out where the problem is:

1. If libfabric is built with all providers, i.e. ./configure is run without explicitly including or excluding any provider, it builds ibverbs among others, but the mpi test program hangs during execution.
2. If libfabric is configured with only ibverbs enabled and all other providers disabled, i.e. ./configure --enable-verbs=yes --enable-rxm=no --enable-rxd=no --enable-sockets=no --enable-tcp=no --enable-udp=no, the mpi test program runs through.

Another observation: when I enable debug (--enable-debug), I get the aforementioned message (here it is again):

prov/verbs/src/ep_rdm/verbs_rdm_cm.c:337:
fi_ibv_rdm_process_addr_resolved: Assertion `id->verbs ==
ep->domain->verbs' failed.

and the mpi test program still runs through in case 2 above. I am not sure whether I should take this message seriously.

I did not see any difference in the test mpi program behaviour between building ibverbs as a DSO (--enable-verbs=dl) and building it into libfabric, which is the default (--enable-verbs=yes), except that in the DSO case FI_PROVIDER_PATH must be exported. However, one thing is worth mentioning as a probable bug: when ibverbs (or, I assume, any other provider) is built as a DSO, the libfabric folder into which the provider DSOs are installed gets the wrong permissions. This means that if you build libfabric as root with the default installation folder (/usr/local/lib), your mpi program will not run when launched as another user.

Regards,
Mohammed
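
P.S. In case it helps anyone reproducing this, here is roughly what I ran for the two builds. The install prefix is just an example:

    # Case 1: default build, all providers -- mpi test program hangs
    ./configure --prefix=/opt/libfabric
    make -j && make install

    # Case 2: verbs only -- mpi test program runs through
    ./configure --prefix=/opt/libfabric \
        --enable-verbs=yes --enable-rxm=no --enable-rxd=no \
        --enable-sockets=no --enable-tcp=no --enable-udp=no
    make -j && make install

    # adding --enable-debug to case 2 produces the assertion above,
    # but the program still completes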
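The environment setup looks roughly like this; <impi_install_dir> stands for wherever Intel MPI 2019 U1 is installed:

    # tell Intel MPI to use the external libfabric, not its bundled one
    source <impi_install_dir>/intel64/bin/mpivars.sh -ofi_internal=0
    export LD_LIBRARY_PATH=/opt/libfabric/lib:$LD_LIBRARY_PATH

    # DSO build (--enable-verbs=dl): point at the provider directory
    export FI_PROVIDER_PATH=/opt/libfabric/lib/libfabric

    # built-in build (--enable-verbs=yes): unset it (mpivars.sh sets it)
    unset FI_PROVIDER_PATH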
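For the permissions problem, loosening the provider directory mode by hand should work around it until it is fixed, something like:

    # as root: the directory is not readable/executable by other users
    ls -ld /usr/local/lib/libfabric
    chmod 755 /usr/local/lib/libfabric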
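For completeness, Arun's checklist below maps to roughly these commands; the interface and peer names are examples:

    fi_info --version        # libfabric should be the latest, v1.6.2
    rpm -q librdmacm         # librdmacm package must be installed
    ip addr show ib0         # IPoIB interface needs an IP address
    ping -c 3 node02-ib      # and must be pingable from the other nodes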
</div><div id="yahoo_quoted_3340555604" class="yahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>
On Wednesday, November 21, 2018, 19:42:24 CET, Ilango, Arun <arun.ilango@intel.com> wrote:
<div><div dir="ltr">Mohammed,<br clear="none"><br clear="none">Just to add what Dmitry said, if you're using your own libfabric, please make sure it's the latest (i.e. v1.6.2). You can check the version by running fi_info --version.<br clear="none"><br clear="none">Other things to check:<br clear="none">1. Make sure you have librdmacm package installed.<br clear="none">2. Check if the IPoIB interface of the node has been configured with an IP address and is pingable from other nodes in the cluster.<br clear="none"><br clear="none">Thanks,<br clear="none">Arun.<br clear="none"><div class="yqt7997348047" id="yqtfd54283"><br clear="none">-----Original Message-----<br clear="none">From: Gladkov, Dmitry <br clear="none">Sent: Wednesday, November 21, 2018 10:31 AM<br clear="none">To: Hefty, Sean <<a shape="rect" ymailto="mailto:sean.hefty@intel.com" href="mailto:sean.hefty@intel.com">sean.hefty@intel.com</a>>; Mohammed Shaheen <<a shape="rect" ymailto="mailto:m_shaheen1984@yahoo.com" href="mailto:m_shaheen1984@yahoo.com">m_shaheen1984@yahoo.com</a>>; <a shape="rect" ymailto="mailto:libfabric-users@lists.openfabrics.org" href="mailto:libfabric-users@lists.openfabrics.org">libfabric-users@lists.openfabrics.org</a>; <a shape="rect" ymailto="mailto:ofiwg@lists.openfabrics.org" href="mailto:ofiwg@lists.openfabrics.org">ofiwg@lists.openfabrics.org</a><br clear="none">Cc: Ilango, Arun <<a shape="rect" ymailto="mailto:arun.ilango@intel.com" href="mailto:arun.ilango@intel.com">arun.ilango@intel.com</a>><br clear="none">Subject: RE: [libfabric-users] intel mpi with libfabric<br clear="none"><br clear="none">Hi Mohammed,<br clear="none"><br clear="none">Do you use your own version of libfabirc?<br clear="none"><br clear="none">IMPI 2019 U1 uses its internal libfabric by default.<br clear="none">If you use your libfabric, please, specify LD_LIBRABRY_PATH to your library and FI_PROVIDER_PATH to path to OFI DL providers (<ofi_install_dir>/lib/libfabric) if you use DL provider, or unset this variable (mpivars.sh sets it).<br clear="none"><br clear="none">--<br clear="none">Dmitry<br clear="none"><br clear="none">-----Original Message-----<br clear="none">From: Hefty, Sean<br clear="none">Sent: Wednesday, November 21, 2018 8:52 PM<br clear="none">To: Mohammed Shaheen <<a shape="rect" ymailto="mailto:m_shaheen1984@yahoo.com" href="mailto:m_shaheen1984@yahoo.com">m_shaheen1984@yahoo.com</a>>; <a shape="rect" ymailto="mailto:libfabric-users@lists.openfabrics.org" href="mailto:libfabric-users@lists.openfabrics.org">libfabric-users@lists.openfabrics.org</a>; <a shape="rect" ymailto="mailto:ofiwg@lists.openfabrics.org" href="mailto:ofiwg@lists.openfabrics.org">ofiwg@lists.openfabrics.org</a><br clear="none">Cc: Ilango, Arun <<a shape="rect" ymailto="mailto:arun.ilango@intel.com" href="mailto:arun.ilango@intel.com">arun.ilango@intel.com</a>>; Gladkov, Dmitry <<a shape="rect" ymailto="mailto:dmitry.gladkov@intel.com" href="mailto:dmitry.gladkov@intel.com">dmitry.gladkov@intel.com</a>><br clear="none">Subject: RE: [libfabric-users] intel mpi with libfabric<br clear="none"><br clear="none">Copying ofiwg and key developers for this issue.<br clear="none"><br clear="none">- Sean<br clear="none"><br clear="none">> I get the following error running a small mpi test program using intel <br clear="none">> mpi 2019 from intel parallel studio cluster edition update 1 (the<br clear="none">> newest) on Mellanox FDR Cluster:<br clear="none">> <br clear="none">> <br clear="none">> <br clear="none">> test.e: 
prov/verbs/src/ep_rdm/verbs_rdm_cm.c:337:<br clear="none">> fi_ibv_rdm_process_addr_resolved: Assertion `id->verbs == ep->domain-<br clear="none">> >verbs' failed.<br clear="none">> <br clear="none">> <br clear="none">> <br clear="none">> The program hangs on this error message. I installed the newest <br clear="none">> release of libfabric and configured it with only ibverbs support. I <br clear="none">> used the inbox (sles 11 sp4 and sles 12 sp3) ibverbs and rdma <br clear="none">> libraries. I also tried with mellanox ofed to no avail.<br clear="none">> <br clear="none">> <br clear="none">> <br clear="none">> <br clear="none">> Any ideas how to go about it?<br clear="none">> <br clear="none">> <br clear="none">> <br clear="none">> <br clear="none">> <br clear="none">> Regards,<br clear="none">> <br clear="none">> Mohammed<br clear="none"><br clear="none"></div></div></div>