[libfabric-users] CXI libraries present but can not be compiled
Shehata, Amir
shehataa at ornl.gov
Tue Oct 8 08:43:56 PDT 2024
I had a similar issue. Here is the patch I created to get around this problem:
diff --git a/prov/cxi/include/cxip.h b/prov/cxi/include/cxip.h
index 53e20f720..71a195447 100644
--- a/prov/cxi/include/cxip.h
+++ b/prov/cxi/include/cxip.h
@@ -3020,7 +3020,10 @@ int cxip_cmdq_emit_c_state(struct cxip_cmdq *cmdq,
static inline bool cxip_cmdq_empty(struct cxip_cmdq *cmdq)
{
- return cxi_cq_empty(cmdq->dev_cmdq);
+ uint64_t wp = cmdq->dev_cmdq->wp32 / 2;
+
+ return wp == cmdq->dev_cmdq->status->rd_ptr;
+ //return cxi_cq_empty(cmdq->dev_cmdq);
}
static inline bool cxip_cmdq_match(struct cxip_cmdq *cmdq, uint16_t vni,
diff --git a/prov/cxi/src/cxip_coll.c b/prov/cxi/src/cxip_coll.c
index 40ef8a60f..37dddbca6 100644
--- a/prov/cxi/src/cxip_coll.c
+++ b/prov/cxi/src/cxip_coll.c
@@ -421,7 +421,10 @@ static inline int flt_op_to_opcode(int op)
{
if (op != FI_SUM)
return _flt_op_to_opcode[op];
-
+ return (_MM_GET_FLUSH_ZERO_MODE()) ?
+ COLL_OPCODE_FLT_SUM_FTZ_RND0 :
+ COLL_OPCODE_FLT_SUM_NOFTZ_RND0;
+/*
switch (fegetround()) {
case FE_TONEAREST:
return (_MM_GET_FLUSH_ZERO_MODE()) ?
@@ -440,6 +443,7 @@ static inline int flt_op_to_opcode(int op)
COLL_OPCODE_FLT_SUM_FTZ_RND3 :
COLL_OPCODE_FLT_SUM_NOFTZ_RND3;
}
+*/
return -FI_EOPNOTSUPP;
}
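For readers skimming the first hunk: it replaces the call to cxi_cq_empty() with a direct comparison of the software write pointer against the read pointer the device reports. Below is a minimal C sketch of that comparison using mock structures. The mock_* names are hypothetical stand-ins, and the assumption that wp32 advances in units half the size of rd_ptr's units is inferred from the "/ 2" in the patch, not taken from the libcxi headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the libcxi command-queue structures.
 * The real definitions live in the Cassini/libcxi headers and may differ. */
struct mock_cq_status {
	uint64_t rd_ptr;               /* read pointer reported by the device */
};

struct mock_cq {
	unsigned int wp32;             /* software write pointer (assumed: half-size units) */
	struct mock_cq_status *status; /* status block written by the device */
};

/* Same test as the patched cxip_cmdq_empty(): the queue is treated as
 * empty when the halved write pointer equals the read pointer. */
static bool mock_cq_empty(const struct mock_cq *cq)
{
	uint64_t wp = cq->wp32 / 2;

	return wp == cq->status->rd_ptr;
}
```

With rd_ptr set to 4, a queue whose wp32 is 8 counts as empty under this check, and one whose wp32 is 10 does not.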
________________________________
From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> on behalf of Raffenetti, Ken via Libfabric-users <libfabric-users at lists.openfabrics.org>
Sent: Tuesday, October 8, 2024 11:40 AM
To: Marc Caubet Serrabou <marc.caubet at psi.ch>
Cc: libfabric-users at lists.openfabrics.org <libfabric-users at lists.openfabrics.org>
Subject: [EXTERNAL] Re: [libfabric-users] CXI libraries present but can not be compiled
I believe that means that your Slingshot Host Software (SHS), i.e. libcxi, is not compatible with the version of the provider being built. Thomas’ branch reverts all the updates that require the upgraded SHS. When building his branch, you could try increasing the logging to see if there is any indication of why the provider isn’t returned from fi_getinfo.
Ken
From: Marc Caubet Serrabou <marc.caubet at psi.ch>
Date: Tuesday, October 8, 2024 at 3:35 AM
To: Raffenetti, Ken <raffenet at anl.gov>
Cc: libfabric-users at lists.openfabrics.org <libfabric-users at lists.openfabrics.org>
Subject: Re: [libfabric-users] CXI libraries present but can not be compiled
Hi Ken,
Thanks a lot for your answer. I just tried to cherry-pick your commit onto the v1.22.0 tag, but then the build fails at the link step for a different reason:
copying selected object files to avoid basename conflicts...
CCLD util/fi_strerror
CCLD util/fi_info
CCLD util/fi_pingpong
CCLD prov/cxi/test/multinode/test_frmwk
CCLD prov/cxi/test/multinode/test_zbcoll
CCLD prov/cxi/test/multinode/test_coll
CCLD prov/cxi/test/multinode/test_barrier
/usr/bin/ld: src/.libs/libfabric.so: undefined reference to `cxi_cq_empty'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:12746: util/fi_strerror] Error 1
make[1]: *** Waiting for unfinished jobs....
/usr/bin/ld: src/.libs/libfabric.so: undefined reference to `cxi_cq_empty'
collect2: error: ld returned 1 exit status
/usr/bin/ld: src/.libs/libfabric.so: undefined reference to `cxi_cq_empty'
make[1]: *** [Makefile:12740: util/fi_pingpong] Error 1
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:12734: util/fi_info] Error 1
/usr/bin/ld: src/.libs/libfabric.a(src_libfabric_la-cxip_dom.o): in function `cxip_domain_find_cmdq':
cxip_dom.c:(.text+0x436): undefined reference to `cxi_cq_empty'
/usr/bin/ld: src/.libs/libfabric.a(src_libfabric_la-cxip_dom.o): in function `cxip_domain_find_cmdq':
cxip_dom.c:(.text+0x436): undefined reference to `cxi_cq_empty'
/usr/bin/ld: src/.libs/libfabric.a(src_libfabric_la-cxip_dom.o): in function `cxip_domain_find_cmdq':
cxip_dom.c:(.text+0x436): undefined reference to `cxi_cq_empty'
/usr/bin/ld: src/.libs/libfabric.a(src_libfabric_la-cxip_dom.o): in function `cxip_domain_find_cmdq':
cxip_dom.c:(.text+0x436): undefined reference to `cxi_cq_empty'
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:12658: prov/cxi/test/multinode/test_zbcoll] Error 1
make[1]: *** [Makefile:12648: prov/cxi/test/multinode/test_frmwk] Error 1
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:12638: prov/cxi/test/multinode/test_coll] Error 1
make[1]: *** [Makefile:12628: prov/cxi/test/multinode/test_barrier] Error 1
make[1]: Leaving directory '/var/tmp/caubet_m/libfabric-1.22.0/src'
make: *** [Makefile:6816: all] Error 2
libfabric/1.22.0: compilation failed!
Any ideas?
On the other hand, I also tried another proposed update from https://github.com/thomasgillis/libfabric/tree/dev-cxi. With that branch the build succeeds, but the CXI provider is not found at run time:
🔥 [caubet_m at login002:~/git/buildblocks/Libraries/libfabric(ofi_1.22.0)]# fi_info -p cxi
fi_getinfo: -61 (No data available)
🔥 [caubet_m at login002:~/git/buildblocks/Libraries/libfabric(ofi_1.22.0)]# ldd $(which fi_info) | grep cxi
libcxi.so.1 => /usr/lib64/libcxi.so.1 (0x00007f122b977000)
🔥 [caubet_m at login002:~/git/buildblocks/Libraries/libfabric(ofi_1.22.0)]# ldd $(which fi_info)
linux-vdso.so.1 (0x00007ffcf52f6000)
libfabric.so.1 => /opt/psi/Libraries/libfabric/1.22.0/lib64/libfabric.so.1 (0x00007f255102a000)
libcxi.so.1 => /usr/lib64/libcxi.so.1 (0x00007f2551004000)
libnl-3.so.200 => /usr/lib64/libnl-3.so.200 (0x00007f2550c00000)
libcurl.so.4 => /usr/lib64/libcurl.so.4 (0x00007f2550f5a000)
libjson-c.so.3 => /usr/lib64/libjson-c.so.3 (0x00007f2550800000)
libm.so.6 => /lib64/libm.so.6 (0x00007f2550ab4000)
libuuid.so.1 => /usr/lib64/libuuid.so.1 (0x00007f2550f28000)
libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00007f2550400000)
libatomic.so.1 => /usr/lib64/libatomic.so.1 (0x00007f2550f1e000)
librt.so.1 => /lib64/librt.so.1 (0x00007f2550f14000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2550eee000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f2550ee9000)
libxpmem.so.0 => /usr/lib64/libxpmem.so.0 (0x00007f2550ee6000)
libc.so.6 => /lib64/libc.so.6 (0x00007f2550209000)
libnghttp2.so.14 => /usr/lib64/libnghttp2.so.14 (0x00007f2550ebd000)
libidn2.so.0 => /usr/lib64/libidn2.so.0 (0x00007f254fe00000)
libssh.so.4 => /usr/lib64/libssh.so.4 (0x00007f2550e4c000)
libpsl.so.5 => /usr/lib64/libpsl.so.5 (0x00007f254fa00000)
libssl.so.1.1 => /usr/lib64/libssl.so.1.1 (0x00007f2550a15000)
libcrypto.so.1.1 => /usr/lib64/libcrypto.so.1.1 (0x00007f254f6c1000)
libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x00007f25507ae000)
libldap_r-2.4.so.2 => /usr/lib64/libldap_r-2.4.so.2 (0x00007f2550759000)
liblber-2.4.so.2 => /usr/lib64/liblber-2.4.so.2 (0x00007f2550e3a000)
libzstd.so.1 => /usr/lib64/libzstd.so.1 (0x00007f2550628000)
libbrotlidec.so.1 => /usr/lib64/libbrotlidec.so.1 (0x00007f254f400000)
libz.so.1 => /usr/lib64/libz.so.1 (0x00007f255060f000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2551354000)
libunistring.so.2 => /usr/lib64/libunistring.so.2 (0x00007f254f000000)
libjitterentropy.so.3 => /usr/lib64/libjitterentropy.so.3 (0x00007f2550e30000)
libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x00007f255012f000)
libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x00007f2550118000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f2550e2b000)
libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x00007f2550109000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f25500f1000)
libsasl2.so.3 => /usr/lib64/libsasl2.so.3 (0x00007f25500d3000)
libbrotlicommon.so.1 => /usr/lib64/libbrotlicommon.so.1 (0x00007f254ec00000)
libkeyutils.so.1 => /usr/lib64/libkeyutils.so.1 (0x00007f254e800000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f254e400000)
libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00007f254e000000)
We are running
🔥 [caubet_m at login002:~/git/buildblocks/Libraries/libfabric(ofi_1.22.0)]# rpm -qf /usr/lib64/libcxi.so.1
cray-libcxi-0.9-SSHOT2.1.3_20240529150829_3d1dc9246116.x86_64
We're testing libfabric 1.22.0. I'm open to compiling a newer (or older) version if that makes it work, but I'm not sure how far other Cray systems have been able to get.
Thanks a lot,
Marc
On 07.10.24 17:59, Raffenetti, Ken wrote:
Hi Marc,
I believe those headers are not necessary for compiling the provider. I proposed removing the checks from configure in https://github.com/ofiwg/libfabric/pull/9793. You could cherry-pick https://github.com/ofiwg/libfabric/pull/9793/commits/5793243aec20c4fee126aa3093ff07bb5889f154 and try again.
Ken
From: Libfabric-users <libfabric-users-bounces at lists.openfabrics.org> on behalf of Marc Caubet Serrabou <marc.caubet at psi.ch>
Date: Monday, October 7, 2024 at 6:03 AM
To: libfabric-users at lists.openfabrics.org <libfabric-users at lists.openfabrics.org>
Subject: [libfabric-users] CXI libraries present but can not be compiled
Hi,
I already opened a ticket with ofiwg at lists.openfabrics.org, but I am also asking here on the users list in case somebody has hit a similar issue and has an answer.
I am trying to compile libfabric 1.22.0 with CXI provider support. Although the expected CXI provider header files and the CXI library are present, I get the following errors:
configure: WARNING: The EFA provider requires rdma-core v31 or newer.
configure: efa provider: disabled
configure: *** Configuring cxi provider
checking cxi_prov_hw.h usability... no
checking cxi_prov_hw.h presence... yes
configure: WARNING: cxi_prov_hw.h: present but cannot be compiled
configure: WARNING: cxi_prov_hw.h: check for missing prerequisite headers?
configure: WARNING: cxi_prov_hw.h: see the Autoconf documentation
configure: WARNING: cxi_prov_hw.h: section "Present But Cannot Be Compiled"
configure: WARNING: cxi_prov_hw.h: proceeding with the compiler's result
configure: WARNING: ## ------------------------------------------ ##
configure: WARNING: ## Report this to ofiwg at lists.openfabrics.org ##
configure: WARNING: ## ------------------------------------------ ##
checking for cxi_prov_hw.h... no
checking uapi/misc/cxi.h usability... no
checking uapi/misc/cxi.h presence... yes
configure: WARNING: uapi/misc/cxi.h: present but cannot be compiled
configure: WARNING: uapi/misc/cxi.h: check for missing prerequisite headers?
configure: WARNING: uapi/misc/cxi.h: see the Autoconf documentation
configure: WARNING: uapi/misc/cxi.h: section "Present But Cannot Be Compiled"
configure: WARNING: uapi/misc/cxi.h: proceeding with the compiler's result
configure: WARNING: ## ------------------------------------------ ##
configure: WARNING: ## Report this to ofiwg at lists.openfabrics.org ##
configure: WARNING: ## ------------------------------------------ ##
checking for uapi/misc/cxi.h... no
checking libcxi/libcxi.h usability... yes
checking libcxi/libcxi.h presence... yes
checking for libcxi/libcxi.h... yes
configure: looking for library without search path
checking for cxil_open_device in -lcxi... yes
checking curl/curl.h usability... yes
checking curl/curl.h presence... yes
checking for curl/curl.h... yes
configure: looking for library without search path
checking for curl_global_init in -lcurl... yes
checking json-c/json.h usability... yes
checking json-c/json.h presence... yes
checking for json-c/json.h... yes
configure: looking for library without search path
checking for json_object_get_type in -ljson-c... yes
configure: cxi provider: disabled
configure: WARNING: cxi provider was requested, but cannot be compiled
configure: error: Cannot continue
libfabric/1.22.0: configure failed
The libraries come from Cray and are installed in the standard directories (/usr/include for header files, /usr/lib64 for libraries):
🔥 [caubet_m at login001:~/git/buildblocks/Libraries/libfabric(ofi_1.22.0)]# rpm -qf /usr/include/uapi/misc/cxi.h /usr/include/cxi_prov_hw.h /usr/lib64/libcxi.so
warning: Found NDB Packages.db database while attempting bdb backend: using ndb backend.
cray-cxi-driver-devel-0.9-61.9__g3000a93.SSHOT2.1.3.x86_64
cray-cassini-headers-user-1.0-SSHOT2.1.3_20240326210855_321db6bd57af.noarch
cray-libcxi-0.9-SSHOT2.1.3_20240529150829_3d1dc9246116.x86_64
The configure options are the simplest ones, which should enforce CXI only:
/var/tmp/caubet_m/libfabric-1.22.0/src/configure --prefix=/opt/psi/Libraries/libfabric/1.22.0/ --enable-cxi
What am I missing, and how shall I proceed? Is the compilation expecting a different set (or version) of CXI libraries?
Thanks a lot,
Marc
--
_________________________________________________________
Paul Scherrer Institut
High Performance Computing & Emerging Technologies
Marc Caubet Serrabou
Building/Room: OBBA/230
Forschungsstrasse, 111
5232 Villigen PSI
Switzerland
Telephone: +41 765 42 51 24 // +41 56 310 46 67
E-Mail: marc.caubet at psi.ch