[libfabric-users] CXI libraries present but can not be compiled

Raffenetti, Ken raffenet at anl.gov
Tue Oct 8 08:40:33 PDT 2024


I believe that means that your Slingshot Host Software (SHS), i.e. libcxi, is not compatible with the version of the provider being built. Thomas’ branch reverts all the updates that require the upgraded SHS. With a build from his branch, you could try increasing the logging level to see whether there is any indication of why the provider isn’t returned from fi_getinfo.
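
A minimal sketch of raising the log level, assuming the fi_info built from that branch is first in PATH. FI_LOG_LEVEL and FI_LOG_PROV are standard libfabric environment variables; what the cxi provider actually reports on this system is not known here.

```shell
# Increase libfabric logging and limit it to the cxi provider.
export FI_LOG_LEVEL=debug
export FI_LOG_PROV=cxi

if command -v fi_info >/dev/null 2>&1; then
    # Log messages go to stderr; merge them into stdout so they can be
    # paged or saved together with the fi_getinfo result.
    fi_info -p cxi 2>&1 | head -n 100
else
    echo "fi_info not found in PATH" >&2
fi
```

The first provider-level warning in the output is usually the most useful line to report back to the list.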

Ken

From: Marc Caubet Serrabou <marc.caubet at psi.ch>
Date: Tuesday, October 8, 2024 at 3:35 AM
To: Raffenetti, Ken <raffenet at anl.gov>
Cc: libfabric-users at lists.openfabrics.org <libfabric-users at lists.openfabrics.org>
Subject: Re: [libfabric-users] CXI libraries present but can not be compiled

Hi Ken,

Thanks a lot for your answer. I just tried cherry-picking your commit onto the v1.22.0 tag, but then the build fails at link time for a different reason:

copying selected object files to avoid basename conflicts...
  CCLD     util/fi_strerror
  CCLD     util/fi_info
  CCLD     util/fi_pingpong
  CCLD     prov/cxi/test/multinode/test_frmwk
  CCLD     prov/cxi/test/multinode/test_zbcoll
  CCLD     prov/cxi/test/multinode/test_coll
  CCLD     prov/cxi/test/multinode/test_barrier
/usr/bin/ld: src/.libs/libfabric.so: undefined reference to `cxi_cq_empty'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:12746: util/fi_strerror] Error 1
make[1]: *** Waiting for unfinished jobs....
/usr/bin/ld: src/.libs/libfabric.so: undefined reference to `cxi_cq_empty'
collect2: error: ld returned 1 exit status
/usr/bin/ld: src/.libs/libfabric.so: undefined reference to `cxi_cq_empty'
make[1]: *** [Makefile:12740: util/fi_pingpong] Error 1
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:12734: util/fi_info] Error 1
/usr/bin/ld: src/.libs/libfabric.a(src_libfabric_la-cxip_dom.o): in function `cxip_domain_find_cmdq':
cxip_dom.c:(.text+0x436): undefined reference to `cxi_cq_empty'
/usr/bin/ld: src/.libs/libfabric.a(src_libfabric_la-cxip_dom.o): in function `cxip_domain_find_cmdq':
cxip_dom.c:(.text+0x436): undefined reference to `cxi_cq_empty'
/usr/bin/ld: src/.libs/libfabric.a(src_libfabric_la-cxip_dom.o): in function `cxip_domain_find_cmdq':
cxip_dom.c:(.text+0x436): undefined reference to `cxi_cq_empty'
/usr/bin/ld: src/.libs/libfabric.a(src_libfabric_la-cxip_dom.o): in function `cxip_domain_find_cmdq':
cxip_dom.c:(.text+0x436): undefined reference to `cxi_cq_empty'
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:12658: prov/cxi/test/multinode/test_zbcoll] Error 1
make[1]: *** [Makefile:12648: prov/cxi/test/multinode/test_frmwk] Error 1
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:12638: prov/cxi/test/multinode/test_coll] Error 1
make[1]: *** [Makefile:12628: prov/cxi/test/multinode/test_barrier] Error 1
make[1]: Leaving directory '/var/tmp/caubet_m/libfabric-1.22.0/src'
make: *** [Makefile:6816: all] Error 2
libfabric/1.22.0: compilation failed!

Any ideas?
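
One way to narrow down the undefined reference above is to check where `cxi_cq_empty` is visible on the build host. This is only a sketch: the thread does not say whether the function is meant to come from the installed headers or to be exported by libcxi, so both are checked. The paths are the ones mentioned in this thread; the helper function names are made up for illustration.

```shell
# Report where a symbol is visible: in header files and/or as a dynamic
# symbol exported by a shared library. Helper names are hypothetical.
symbol_in_headers() {
    # usage: symbol_in_headers SYMBOL PATH...
    _sym=$1; shift
    grep -rqw -- "$_sym" "$@" 2>/dev/null
}

symbol_in_library() {
    # usage: symbol_in_library SYMBOL LIBRARY
    nm -D --defined-only "$2" 2>/dev/null | grep -qw -- "$1"
}

sym=cxi_cq_empty
if symbol_in_headers "$sym" /usr/include/cxi_prov_hw.h /usr/include/libcxi; then
    echo "$sym: found in installed headers"
else
    echo "$sym: not found in installed headers"
fi
if symbol_in_library "$sym" /usr/lib64/libcxi.so.1; then
    echo "$sym: exported by libcxi.so.1"
else
    echo "$sym: not exported by libcxi.so.1"
fi
```

If the symbol shows up in neither place, the installed CXI packages are probably older than what this libfabric source expects.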

On the other hand, I also tried another proposed update from https://github.com/thomasgillis/libfabric/tree/dev-cxi. With that branch the build succeeds, but the cxi provider is not returned at runtime:

🔥 [caubet_m at login002:~/git/buildblocks/Libraries/libfabric(ofi_1.22.0)]# fi_info -p cxi
fi_getinfo: -61 (No data available)

🔥 [caubet_m at login002:~/git/buildblocks/Libraries/libfabric(ofi_1.22.0)]# ldd $(which fi_info) | grep cxi
        libcxi.so.1 => /usr/lib64/libcxi.so.1 (0x00007f122b977000)

🔥 [caubet_m at login002:~/git/buildblocks/Libraries/libfabric(ofi_1.22.0)]# ldd $(which fi_info)
        linux-vdso.so.1 (0x00007ffcf52f6000)
        libfabric.so.1 => /opt/psi/Libraries/libfabric/1.22.0/lib64/libfabric.so.1 (0x00007f255102a000)
        libcxi.so.1 => /usr/lib64/libcxi.so.1 (0x00007f2551004000)
        libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x00007f2550c00000)
        libcurl.so.4 => /usr/lib64/libcurl.so.4 (0x00007f2550f5a000)
        libjson-c.so.3 => /usr/lib64/libjson-c.so.3 (0x00007f2550800000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f2550ab4000)
        libuuid.so.1 => /usr/lib64/libuuid.so.1 (0x00007f2550f28000)
        libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f2550400000)
        libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x00007f2550f1e000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f2550f14000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2550eee000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f2550ee9000)
        libxpmem.so.0 => /usr/lib64/libxpmem.so.0 (0x00007f2550ee6000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f2550209000)
        libnghttp2.so.14 => /usr/lib64/libnghttp2.so.14 (0x00007f2550ebd000)
        libidn2.so.0 => /usr/lib64/libidn2.so.0 (0x00007f254fe00000)
        libssh.so.4 => /usr/lib64/libssh.so.4 (0x00007f2550e4c000)
        libpsl.so.5 => /usr/lib64/libpsl.so.5 (0x00007f254fa00000)
        libssl.so.1.1 => /usr/lib64/libssl.so.1.1 (0x00007f2550a15000)
        libcrypto.so.1.1 => /usr/lib64/libcrypto.so.1.1 (0x00007f254f6c1000)
        libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x00007f25507ae000)
        libldap_r-2.4.so.2 => /usr/lib64/libldap_r-2.4.so.2 (0x00007f2550759000)
        liblber-2.4.so.2 => /usr/lib64/liblber-2.4.so.2 (0x00007f2550e3a000)
        libzstd.so.1 => /usr/lib64/libzstd.so.1 (0x00007f2550628000)
        libbrotlidec.so.1 => /usr/lib64/libbrotlidec.so.1 (0x00007f254f400000)
        libz.so.1 => /usr/lib64/libz.so.1 (0x00007f255060f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2551354000)
        libunistring.so.2 => /usr/lib64/libunistring.so.2 (0x00007f254f000000)
        libjitterentropy.so.3 => /usr/lib64/libjitterentropy.so.3 (0x00007f2550e30000)
        libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x00007f255012f000)
        libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x00007f2550118000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f2550e2b000)
        libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x00007f2550109000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f25500f1000)
        libsasl2.so.3 => /usr/lib64/libsasl2.so.3 (0x00007f25500d3000)
        libbrotlicommon.so.1 => /usr/lib64/libbrotlicommon.so.1 (0x00007f254ec00000)
        libkeyutils.so.1 => /usr/lib64/libkeyutils.so.1 (0x00007f254e800000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f254e400000)
        libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007f254e000000)

We are running the following libcxi package:

🔥 [caubet_m at login002:~/git/buildblocks/Libraries/libfabric(ofi_1.22.0)]# rpm -qf /usr/lib64/libcxi.so.1
cray-libcxi-0.9-SSHOT2.1.3_20240529150829_3d1dc9246116.x86_64

We're testing libfabric 1.22.0. I'm open to compiling a newer (or older) version if that makes it work, but I'm not sure how far other Cray systems have been able to get.

Thanks a lot,

Marc

On 07.10.24 17:59, Raffenetti, Ken wrote:
Hi Marc,

I believe those headers are not necessary for compiling the provider. I proposed removing the checks from configure in https://github.com/ofiwg/libfabric/pull/9793. You could cherry-pick https://github.com/ofiwg/libfabric/pull/9793/commits/5793243aec20c4fee126aa3093ff07bb5889f154 and try again.
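
A rough sketch of that cherry-pick, assuming a git clone of libfabric that has the v1.22.0 tag (not a release tarball). The PR number and commit hash are the ones given above; the branch name and the LIBFABRIC_SRC variable are hypothetical. Regenerating configure with autogen.sh is assumed to be needed because the commit changes configure checks.

```shell
# PR number and commit are taken from the message above.
pr=9793
commit=5793243aec20c4fee126aa3093ff07bb5889f154
src=${LIBFABRIC_SRC:-.}   # hypothetical variable: path to a libfabric git clone

if git -C "$src" rev-parse -q --verify "refs/tags/v1.22.0" >/dev/null 2>&1; then
    cd "$src" || exit 1
    # Hypothetical branch name, created from the release tag.
    git checkout -b cxi-skip-header-checks v1.22.0
    # GitHub exposes a pull request's commits under refs/pull/<N>/head.
    git fetch https://github.com/ofiwg/libfabric.git "pull/$pr/head"
    git cherry-pick "$commit"
    # Assumption: the commit edits configure inputs, so regenerate configure.
    ./autogen.sh
else
    echo "no libfabric git checkout with tag v1.22.0 at $src; skipping" >&2
fi
```

After this, configure and make are run as before.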

Ken

From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> on behalf of Marc Caubet Serrabou <marc.caubet at psi.ch>
Date: Monday, October 7, 2024 at 6:03 AM
To: libfabric-users at lists.openfabrics.org <libfabric-users at lists.openfabrics.org>
Subject: [libfabric-users] CXI libraries present but can not be compiled

Hi,

I already opened a ticket with ofiwg at lists.openfabrics.org, but I am also asking here on the users list in case somebody has run into a similar issue and has an answer.

I am trying to compile libfabric 1.22.0 with CXI provider support. Although the expected CXI provider header files are present, as well as the CXI library, configure fails with the following errors:

configure: WARNING: The EFA provider requires rdma-core v31 or newer.
configure: efa provider: disabled
configure: *** Configuring cxi provider
checking cxi_prov_hw.h usability... no
checking cxi_prov_hw.h presence... yes
configure: WARNING: cxi_prov_hw.h: present but cannot be compiled
configure: WARNING: cxi_prov_hw.h:     check for missing prerequisite headers?
configure: WARNING: cxi_prov_hw.h: see the Autoconf documentation
configure: WARNING: cxi_prov_hw.h:     section "Present But Cannot Be Compiled"
configure: WARNING: cxi_prov_hw.h: proceeding with the compiler's result
configure: WARNING:     ## ------------------------------------------ ##
configure: WARNING:     ## Report this to ofiwg at lists.openfabrics.org ##
configure: WARNING:     ## ------------------------------------------ ##
checking for cxi_prov_hw.h... no
checking uapi/misc/cxi.h usability... no
checking uapi/misc/cxi.h presence... yes
configure: WARNING: uapi/misc/cxi.h: present but cannot be compiled
configure: WARNING: uapi/misc/cxi.h:     check for missing prerequisite headers?
configure: WARNING: uapi/misc/cxi.h: see the Autoconf documentation
configure: WARNING: uapi/misc/cxi.h:     section "Present But Cannot Be Compiled"
configure: WARNING: uapi/misc/cxi.h: proceeding with the compiler's result
configure: WARNING:     ## ------------------------------------------ ##
configure: WARNING:     ## Report this to ofiwg at lists.openfabrics.org ##
configure: WARNING:     ## ------------------------------------------ ##
checking for uapi/misc/cxi.h... no
checking libcxi/libcxi.h usability... yes
checking libcxi/libcxi.h presence... yes
checking for libcxi/libcxi.h... yes
configure: looking for library without search path
checking for cxil_open_device in -lcxi... yes
checking curl/curl.h usability... yes
checking curl/curl.h presence... yes
checking for curl/curl.h... yes
configure: looking for library without search path
checking for curl_global_init in -lcurl... yes
checking json-c/json.h usability... yes
checking json-c/json.h presence... yes
checking for json-c/json.h... yes
configure: looking for library without search path
checking for json_object_get_type in -ljson-c... yes
configure: cxi provider: disabled
configure: WARNING: cxi provider was requested, but cannot be compiled
configure: error: Cannot continue
libfabric/1.22.0: configure failed
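
For "present but cannot be compiled" warnings like the ones above, config.log in the build directory records the failing test program together with the compiler's error output. The same kind of check can also be reproduced by hand, roughly as sketched below. This is only similar in spirit to what configure does (configure also pulls in its default includes first), and `try_header` is a made-up helper name.

```shell
# Try to compile a one-line program that includes a single header and let
# the compiler print its diagnostics. Helper name is hypothetical.
try_header() {
    # usage: try_header HEADER [extra compiler flags...]
    _hdr=$1; shift
    printf '#include <%s>\n' "$_hdr" | ${CC:-cc} "$@" -fsyntax-only -x c -
}

# Headers named in the configure output above.
for hdr in cxi_prov_hw.h uapi/misc/cxi.h libcxi/libcxi.h; do
    if try_header "$hdr"; then
        echo "$hdr: compiles"
    else
        echo "$hdr: does not compile (see diagnostics above)"
    fi
done
```

The first compiler error printed for a failing header usually names the missing prerequisite header or undefined type.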

The headers and libraries come from the following Cray packages and are installed in the standard directories (/usr/include for header files, /usr/lib64 for libraries):

🔥 [caubet_m at login001:~/git/buildblocks/Libraries/libfabric(ofi_1.22.0)]# rpm -qf /usr/include/uapi/misc/cxi.h /usr/include/cxi_prov_hw.h  /usr/lib64/libcxi.so
warning: Found NDB Packages.db database while attempting bdb backend: using ndb backend.
cray-cxi-driver-devel-0.9-61.9__g3000a93.SSHOT2.1.3.x86_64
cray-cassini-headers-user-1.0-SSHOT2.1.3_20240326210855_321db6bd57af.noarch
cray-libcxi-0.9-SSHOT2.1.3_20240529150829_3d1dc9246116.x86_64

The configure options are minimal and explicitly request the CXI provider:

/var/tmp/caubet_m/libfabric-1.22.0/src/configure --prefix=/opt/psi/Libraries/libfabric/1.22.0/ --enable-cxi

What am I missing, and how shall I proceed? Is the compilation expecting a different set (or version) of CXI libraries?

Thanks a lot,

Marc

--
_________________________________________________________
Paul Scherrer Institut
High Performance Computing & Emerging Technologies
Marc Caubet Serrabou
Building/Room: OBBA/230
Forschungsstrasse, 111
5232 Villigen PSI
Switzerland

Telephone: +41 765 42 51 24 // +41 56 310 46 67
E-Mail: marc.caubet at psi.ch
