[libfabric-users] Error allocating domain

Howard Pritchard hppritcha at gmail.com
Tue Jun 23 15:33:55 PDT 2020


Hi John,

The gni provider only works for processes within a Cray/HPE construct called a
PAGG, i.e. the process has to be in the process tree descending from a
slurmstepd. That's the reason for the cryptic "could not find key in inuse"
message.
Could you rerun using either srun or aprun (depending on where you're
working) with
export UGNI_DEBUG=10
set in the environment?
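For anyone following along: a quick way to check whether a shell is inside a
PAGG-managed job step is to walk the process ancestry looking for slurmstepd.
This is just a sketch (it assumes the Linux /proc stat layout and space-free
process names), not a Cray or libfabric tool:

```shell
# Walk the ancestry of the current shell looking for slurmstepd.
# If it is found, this process tree descends from a job step (a PAGG);
# if not, the gni provider will fail in the way described above.
pid=$$
found=no
while [ "$pid" -gt 1 ] 2>/dev/null; do
    comm=$(cut -d' ' -f2 "/proc/$pid/stat")   # second field is "(name)"
    if [ "$comm" = "(slurmstepd)" ]; then
        found=yes
        break
    fi
    pid=$(cut -d' ' -f4 "/proc/$pid/stat")    # fourth field is the ppid
done
echo "inside PAGG: $found"
```

If it prints "no", the processes were started outside a job step, and
launching them via srun or aprun (with UGNI_DEBUG=10 exported, as above) is
the first thing to try.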



Am Di., 23. Juni 2020 um 13:22 Uhr schrieb Carns, Philip H. <
carns at mcs.anl.gov>:

> Hi John,
>
> You are quickly outpacing what little bit of knowledge I have here, but in
> our experience you do have to set up either protection domains or
> credentials to allow GNI RDMA between two processes if they are launched
> manually.  aprun and srun do this step automatically, so it's not something
> you usually have to think about for communication between the processes of
> a single MPI job.
>
> I have an example of how to do this with static protection domains and
> aprun, but this might not be what you need for your system:
>
>
> https://xgitlab.cels.anl.gov/sds/sds-tests/blob/master/perf-regression/theta/separate-ssg.qsub
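>
> For reference, the static protection domain workflow looks roughly like the
> following (a sketch from memory of Cray's apmgr/aprun documentation; the
> domain name and launch commands are placeholders, and none of this runs
> outside a Cray system):

```shell
# Create a named protection domain once; processes launched into it may
# perform GNI RDMA with each other even across separate aprun jobs.
# ("my_pdomain" is a placeholder name.)
apmgr pdomain -c my_pdomain

# Launch both sides into the same protection domain.
aprun -n 1 -p my_pdomain ./fi_pingpong &
aprun -n 1 -p my_pdomain ./fi_pingpong <server-addr>

# Release the protection domain when finished.
apmgr pdomain -r my_pdomain
```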
>
> A more recent variant of this would be Cray's DRC library.  I do not have
> an example for DRC, though.  I think to use DRC you need to make some
> explicit run-time calls outside of libfabric to set it up, while the older
> static protection domain system was mostly configured on the command line
> and then passed to the executable environment via aprun arguments.
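>
> From memory, the shape of those explicit DRC calls is roughly this (the
> drc_* names and DRC_SUCCESS are from Cray's rdmacred.h as best I recall;
> unverified, Cray-only, and illustrative rather than a tested example):

```c
/* Sketch: acquire a DRC credential and obtain the GNI cookie from it.
 * The drc_* names follow Cray's <rdmacred.h> from memory; this only
 * compiles on a Cray system and has not been tested. */
#include <stdio.h>
#include <stdint.h>
#include <rdmacred.h>   /* Cray DRC; link with -ldrc */

int main(void)
{
    uint32_t credential;
    drc_info_handle_t info;

    /* One job acquires the credential; the integer is then shared
     * out-of-band (file, sockets, ...) with the other job. */
    if (drc_acquire(&credential, 0) != DRC_SUCCESS)
        return 1;

    /* Each participating job accesses the credential to get the
     * cookie that feeds into its GNI/libfabric auth key setup. */
    if (drc_access(credential, 0, &info) != DRC_SUCCESS)
        return 1;

    printf("credential=%u cookie=0x%x\n",
           credential, drc_get_first_cookie(info));

    drc_release(credential, 0);
    return 0;
}
```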
>
> thanks,
> -Phil
> ------------------------------
> *From:* Biddiscombe, John A. <biddisco at cscs.ch>
> *Sent:* Tuesday, June 23, 2020 1:08 AM
> *To:* Carns, Philip H. <carns at mcs.anl.gov>; Howard Pritchard <
> hppritcha at gmail.com>
> *Cc:* libfabric-users at lists.openfabrics.org <
> libfabric-users at lists.openfabrics.org>
> *Subject:* Re: [libfabric-users] Error allocating domain
>
>
> I tried rebuilding libfabric with kdreg disabled and unloading the cray-mpich module
>
> ./configure --disable-verbs --disable-sockets --disable-usnic
> --disable-udp --disable-rxm --disable-rxd --disable-shm --disable-mrail
> --disable-tcp --disable-perf --disable-rstream --enable-gni
> --prefix=/apps/daint/UES/biddisco/gcc/8.3.0/libfabric --no-recursion
> --enable-debug --with-kdreg=n
>
>
> the binaries look sensible
>
>
> nid00023:/scratch/snx3000/biddisco/libfabric ((tags/v1.10.1^0))$ ldd
> /apps/daint/UES/biddisco/gcc/8.3.0/libfabric/bin/fi_pingpong
>         linux-vdso.so.1 (0x00002aaaaaad3000)
>
> /apps/daint/UES/xalt/xalt2/software/xalt/2.7.24/lib64/libxalt_init.so
> (0x00002aaaaacd3000)
>         libfabric.so.1 =>
> /apps/daint/UES/biddisco/gcc/8.3.0/libfabric/lib/libfabric.so.1
> (0x00002aaaaaee8000)
>         libxpmem.so.0 => /opt/cray/xpmem/default/lib64/libxpmem.so.0
> (0x00002aaaab259000)
>         libudreg.so.0 => /opt/cray/udreg/default/lib64/libudreg.so.0
> (0x00002aaaab45c000)
>         libalpsutil.so.0 => /opt/cray/alps/default/lib64/libalpsutil.so.0
> (0x00002aaaab666000)
>         libalpslli.so.0 => /opt/cray/alps/default/lib64/libalpslli.so.0
> (0x00002aaaab869000)
>         libugni.so.0 => /opt/cray/ugni/default/lib64/libugni.so.0
> (0x00002aaaaba6f000)
>         libatomic.so.1 => /opt/gcc/8.3.0/snos/lib64/libatomic.so.1
> (0x00002aaaabcf3000)
>         librt.so.1 => /lib64/librt.so.1 (0x00002aaaabefb000)
>         libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaac103000)
>         libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaac321000)
>         libc.so.6 => /lib64/libc.so.6 (0x00002aaaac525000)
>         /lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)
>         libxmlrpc-epi.so.0 => /usr/lib64/libxmlrpc-epi.so.0
> (0x00002aaaac8df000)
>         libexpat.so.1 => /usr/lib64/libexpat.so.1 (0x00002aaaacaf2000)
>         libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaacd24000)
>         libelf.so.1 => /usr/lib64/libelf.so.1 (0x00002aaaacf27000)
>         libodbc.so.2 => /usr/lib64/libodbc.so.2 (0x00002aaaad13f000)
>         libwlm_detect.so.0 =>
> /opt/cray/wlm_detect/default/lib64/libwlm_detect.so.0 (0x00002aaaad3af000)
>         libz.so.1 => /lib64/libz.so.1 (0x00002aaaad5b2000)
>         libltdl.so.7 => /usr/lib64/libltdl.so.7 (0x00002aaaad7c9000)
>
> but running pingpong on two compute nodes gives the same memory
> registration error. I don't understand what has changed on our system, or
> why what used to work doesn't any more. Is it OK to launch jobs by hand,
> or do they have to be part of an srun script? Here I am manually sshing
> into two compute nodes and executing
>
> pingpong from one and "pingpong addr" from the other. I'm suspecting some
> strange permission error because of the message
>
> libfabric:69550:gni:mr:__mr_cache_search_inuse():1205<debug> [69550:1]
> could not find key in inuse, key=2aaaadbd5000:c01000
>
>
> If anyone has any idea what might be wrong, please let me know. thanks.
>
>
> nid00023:/scratch/snx3000/biddisco/libfabric ((tags/v1.10.1^0))$
> /apps/daint/UES/biddisco/gcc/8.3.0/libfabric/bin/fi_pingpong
> libfabric:69550:core:core:fi_param_define_():231<debug> registered var
> perf_cntr
> libfabric:69550:core:core:fi_param_get_():280<info> variable
> perf_cntr=<not set>
> libfabric:69550:core:core:fi_param_define_():231<debug> registered var hook
> libfabric:69550:core:core:fi_param_get_():280<info> variable hook=<not set>
> libfabric:69550:core:core:fi_param_define_():231<debug> registered var
> mr_cache_max_size
> libfabric:69550:core:core:fi_param_define_():231<debug> registered var
> mr_cache_max_count
> libfabric:69550:core:core:fi_param_define_():231<debug> registered var
> mr_cache_monitor
> libfabric:69550:core:core:fi_param_get_():280<info> variable
> mr_cache_max_size=<not set>
> libfabric:69550:core:core:fi_param_get_():280<info> variable
> mr_cache_max_count=<not set>
> libfabric:69550:core:core:fi_param_get_():280<info> variable
> mr_cache_monitor=<not set>
> libfabric:69550:core:mr:ofi_default_cache_size():56<info> default cache
> size=468306659
> libfabric:69550:core:core:fi_param_define_():231<debug> registered var
> provider
> libfabric:69550:core:core:fi_param_define_():231<debug> registered var
> fork_unsafe
> libfabric:69550:core:core:fi_param_define_():231<debug> registered var
> universe_size
> libfabric:69550:core:core:fi_param_get_():280<info> variable provider=<not
> set>
> libfabric:69550:core:core:fi_param_define_():231<debug> registered var
> provider_path
> libfabric:69550:core:core:fi_param_get_():280<info> variable
> provider_path=<not set>
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:gni:fabric:__gnix_ccm_init():171<debug> [69550:1] Reading
> job info file /tmp/ccm_alps_info
> libfabric:69550:gni:fabric:__gnix_alps_init():284<warn> [69550:1] lli get
> response failed, alps_status=4(No such file or directory)
> libfabric:69550:gni:fabric:_gnix_nics_per_rank():672<warn> [69550:1]
> __gnix_app_init() failed, ret=-5(No such file or directory)
> libfabric:69550:gni:fabric:_gnix_nic_init():1414<warn> [69550:1]
> _gnix_nics_per_rank failed: -5
> libfabric:69550:core:core:ofi_register_provider():402<info> registering
> provider: gni (1.1)
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():396<debug> no provider
> structure or name
> libfabric:69550:core:core:ofi_register_provider():402<info> registering
> provider: ofi_hook_debug (110.10)
> libfabric:69550:core:core:ofi_register_provider():402<info> registering
> provider: ofi_hook_noop (110.10)
> libfabric:69550:gni:fabric:_gnix_ep_getinfo():457<trace> [69550:1]
> libfabric:69550:gni:fabric:_gnix_ep_getinfo():457<trace> [69550:1]
> libfabric:69550:gni:fabric:_gnix_ep_getinfo():507<debug> [69550:1] Passed
> EP attributes check
> libfabric:69550:gni:fabric:_gnix_ep_getinfo():522<debug> [69550:1] Passed
> mode check
> libfabric:69550:gni:fabric:_gnix_ep_getinfo():532<debug> [69550:1] Passed
> caps check gnix_info->caps = 0x0f1c000000313f1e
> libfabric:69550:gni:fabric:_gnix_ep_getinfo():547<debug> [69550:1] Passed
> TX attributes check
> libfabric:69550:gni:fabric:_gnix_ep_getinfo():565<debug> [69550:1] Passed
> fabric name check
> libfabric:69550:gni:fabric:__gnix_getinfo_resolve_node():417<info>
> [69550:1] node: (null) service: (null)
> libfabric:69550:gni:fabric:__gnix_getinfo_resolve_node():422<info>
> [69550:1] src_pe: 0x17 src_port: 0x0
> libfabric:69550:gni:fabric:_gnix_ep_getinfo():658<debug> [69550:1] Passed
> the domain attributes check
> libfabric:69550:gni:fabric:_gnix_ep_getinfo():677<debug> [69550:1]
> Returning EP type: FI_EP_DGRAM
> libfabric:69550:gni:fabric:_gnix_ep_getinfo():457<trace> [69550:1]
> libfabric:69550:core:core:fi_getinfo_():967<debug> fi_getinfo: provider
> gni returned success
> libfabric:69550:gni:core:_gnix_ref_init():254<debug> [69550:1] 0x6111b8
> refs 1
> libfabric:69550:core:core:fi_fabric_():1154<info> Opened fabric: gni
> libfabric:69550:gni:eq:gnix_eq_open():380<trace> [69550:1]
> libfabric:69550:gni:eq:gnix_verify_eq_attr():103<trace> [69550:1]
> libfabric:69550:gni:core:_gnix_ref_init():254<debug> [69550:1] 0x6165c8
> refs 1
> libfabric:69550:gni:core:gnix_eq_open():398<debug> [69550:1] 0x6111b8 refs
> 2
> libfabric:69550:gni:eq:gnix_eq_set_wait():76<trace> [69550:1]
> libfabric:69550:gni:eq:gnix_wait_open():536<trace> [69550:1]
> libfabric:69550:gni:eq:gnix_verify_wait_attr():367<trace> [69550:1]
> libfabric:69550:gni:eq:gnix_init_wait_obj():387<trace> [69550:1]
> libfabric:69550:gni:core:gnix_wait_open():564<debug> [69550:1] 0x6111b8
> refs 3
> libfabric:69550:gni:ep_ctrl:__gnix_wait_start_progress():175<trace>
> [69550:1]
> libfabric:69550:gni:ep_ctrl:__gnix_wait_start_progress():179<trace>
> [69550:1]
> libfabric:69550:gni:fabric:gnix_write_proc_job():528<warn> [69550:1]
> write(disable_affinity_apply) failed, errno=Invalid argument
> libfabric:69550:gni:eq:__gnix_wait_start_progress():185<warn> [69550:1]
> _gnix_job_disable call returned -22
> libfabric:69550:gni:ep_ctrl:__gnix_wait_nic_prog_thread_fn():72<trace>
> [69550:2]
> libfabric:69550:gni:domain:gnix_domain_open():579<trace> [69550:1]
> libfabric:69550:gni:fabric:gnix_domain_open():591<info> [69550:1] failed
> to find authorization key, creating new authorization key
> libfabric:69550:gni:fabric:__gnix_ccm_init():171<debug> [69550:1] Reading
> job info file /tmp/ccm_alps_info
> libfabric:69550:gni:fabric:__gnix_alps_init():284<warn> [69550:1] lli get
> response failed, alps_status=4(No such file or directory)
> libfabric:69550:gni:fabric:gnixu_get_rdma_credentials():437<warn>
> [69550:1] __gnix_app_init() failed, ret=-5(No such file or directory)
> libfabric:69550:gni:domain:_gnix_auth_key_enable():347<info> [69550:1]
> pkey=00002aaa ptag=171 key_partition_size=0 key_offset=0 enabled
> libfabric:69550:gni:domain:gnix_domain_open():597<info> [69550:1]
> authorization key=0x619870 ptag 171 cookie 0x2aaa
> libfabric:69550:gni:core:gnix_domain_open():652<debug> [69550:1] 0x6111b8
> refs 4
> libfabric:69550:gni:core:_gnix_ref_init():254<debug> [69550:1] 0x6199b0
> refs 1
> libfabric:69550:gni:mr:_gnix_auth_key_enable():354<debug> [69550:1]
> authorization key already enabled, auth_key=0x619870
> libfabric:69550:gni:mr:_gnix_mr_reg():222<trace> [69550:1]
> libfabric:69550:gni:mr:_gnix_mr_reg():224<info> [69550:1] reg:
> buf=0x2aaaadbd5000 len=12587008
> libfabric:69550:gni:mr:_gnix_mr_cache_init():998<trace> [69550:1]
> libfabric:69550:gni:mr:_gnix_mr_cache_init():998<trace> [69550:1]
> libfabric:69550:gni:mr:_gnix_mr_cache_register():1541<trace> [69550:1]
> libfabric:69550:gni:mr:__mr_cache_search_inuse():1205<debug> [69550:1]
> could not find key in inuse, key=2aaaadbd5000:c01000
> libfabric:69550:gni:mr:__gnix_register_region():692<debug> [69550:1] addr
> 0x2aaaadbd5000 len 12587008 flags 0x0
> libfabric:69550:gni:ep_ctrl:gnix_nic_alloc():954<trace> [69550:1]
> libfabric:69550:gni:ep_ctrl:gnix_nic_alloc():1059<warn> [69550:1]
> GNI_CdmAttach returned GNI_RC_INVALID_PARAM
> libfabric:69550:gni:fabric:_gnix_dump_gni_res():729<warn> [69550:1] Device
> Resources:
> dev res:       MDD, avail: 4089 res: 409 held: 0 total: 4095
> dev res:        CQ, avail: 2042 res: 10 held: 0 total: 2047
> dev res:       FMA, avail: 126 res: 4 held: 0 total: 127
> dev res:        CE, avail: 4 res: 0 held: 0 total: 4
> dev res:       DLA, avail: 16384 res: 1024 held: 0 total: 16384
> dev res:       TCR, avail: 64984 res: 0 held: 0 total: 16
> dev res:       DVA, avail: 4398046511104 res: 1099511627776 held: 0 total:
> 4398046511104
> dev res:      VMDH, avail: 4 res: 0 held: 0 total: 4
> libfabric:69550:gni:fabric:_gnix_dump_gni_res():745<warn> [69550:1] Job
> Resources:
> libfabric:69550:gni:mr:__gnix_generic_register():609<info> [69550:1] could
> not allocate nic to do mr_reg, ret=-22
> libfabric:69550:gni:mr:__mr_cache_create_registration():1465<info>
> [69550:1] failed to register memory with callback
> fi_mr_reg(): util/pingpong.c:1329, ret=-12 (Cannot allocate memory)
> libfabric:69550:gni:eq:gnix_eq_close():452<trace> [69550:1]
> libfabric:69550:gni:core:gnix_eq_close():459<debug> [69550:1] 0x6165c8
> refs 0
> libfabric:69550:gni:core:__eq_destruct():243<debug> [69550:1] 0x6111b8
> refs 3
> libfabric:69550:gni:eq:gnix_wait_close():505<trace> [69550:1]
> libfabric:69550:gni:core:gnix_wait_close():520<debug> [69550:1] 0x6111b8
> refs 2
> libfabric:69550:gni:ep_ctrl:__gnix_wait_stop_progress():201<trace>
> [69550:1]
> libfabric:69550:gni:domain:gnix_domain_close():218<trace> [69550:1]
> libfabric:69550:gni:mr:__mr_cache_flush():1109<trace> [69550:1]
> libfabric:69550:gni:mr:__mr_cache_flush():1111<debug> [69550:1] starting
> flush on memory registration cache
> libfabric:69550:gni:mr:__mr_cache_flush():1109<trace> [69550:1]
> libfabric:69550:gni:mr:__mr_cache_flush():1111<debug> [69550:1] starting
> flush on memory registration cache
> libfabric:69550:gni:core:gnix_domain_close():265<debug> [69550:1] 0x6199b0
> refs 0
> libfabric:69550:gni:domain:__domain_destruct():77<trace> [69550:1]
> libfabric:69550:gni:mr:_gnix_mr_cache_destroy():1071<trace> [69550:1]
> libfabric:69550:gni:mr:__mr_cache_flush():1109<trace> [69550:1]
> libfabric:69550:gni:mr:__mr_cache_flush():1111<debug> [69550:1] starting
> flush on memory registration cache
> libfabric:69550:gni:mr:_gnix_mr_cache_destroy():1071<trace> [69550:1]
> libfabric:69550:gni:mr:__mr_cache_flush():1109<trace> [69550:1]
> libfabric:69550:gni:mr:__mr_cache_flush():1111<debug> [69550:1] starting
> flush on memory registration cache
> libfabric:69550:gni:core:__domain_destruct():103<debug> [69550:1] 0x6111b8
> refs 1
> libfabric:69550:gni:domain:gnix_domain_close():274<info> [69550:1]
> gnix_domain_close invoked returning 0
> libfabric:69550:gni:core:gnix_fabric_close():194<debug> [69550:1] 0x6111b8
> refs 0
>
> JB
>
> ------------------------------
> *From:* Libfabric-users <libfabric-users-bounces at lists.openfabrics.org>
> on behalf of Biddiscombe, John A. <biddisco at cscs.ch>
> *Sent:* 18 June 2020 00:00:30
> *To:* Carns, Philip H.; Howard Pritchard
> *Cc:* libfabric-users at lists.openfabrics.org
> *Subject:* Re: [libfabric-users] Error allocating domain
>
>
> Phil - thanks for this info.  I will experiment with it. I'm actually away
> for a few days from tomorrow, so it'll be next week before I get a chance,
> but I'll report back if I have success (or more problems).
>
>
> JB
> ------------------------------
> *From:* Carns, Philip H. <carns at mcs.anl.gov>
> *Sent:* 17 June 2020 19:52:42
> *To:* Biddiscombe, John A.; Howard Pritchard
> *Cc:* libfabric-users at lists.openfabrics.org
> *Subject:* Re: [libfabric-users] Error allocating domain
>
> Hi John,
>
> I know your question is aimed at Howard, but I can offer another data
> point and an example of a software stack working around this.  I've never
> gotten kdreg to work in executables that are also using Cray's MPI; they
> conflict.  If you want to use udreg as an alternative, then you'll need to
> do two things:
>
> a) disable kdreg support in libfabric at build time (as in this spack
> package here:
> https://github.com/spack/spack/blob/develop/var/spack/repos/builtin/packages/libfabric/package.py#L94
> )
>
> b) explicitly enable and configure udreg outside of libfabric (as in the
> Mercury libfabric plugin here:
> https://github.com/mercury-hpc/mercury/blob/master/src/na/na_ofi.c#L1778)
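>
> The heart of (b), as Mercury does it, is a gni domain-ops call that switches
> the provider's memory registration cache over to udreg (a sketch against the
> fi_gni_ops_domain interface in rdma/fi_ext_gni.h; it only compiles against a
> gni-enabled libfabric build and is untested here):

```c
/* Sketch: after fi_domain() succeeds, switch the gni provider's memory
 * registration cache from the internal implementation to udreg.
 * Based on the fi_gni_ops_domain interface in <rdma/fi_ext_gni.h>;
 * illustrative only, not a tested example. */
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>
#include <rdma/fi_ext_gni.h>

static int use_udreg(struct fid_domain *domain)
{
    struct fi_gni_ops_domain *gni_ops;
    char *cache = "udreg";
    int ret;

    /* Obtain the gni provider's extension operations for this domain. */
    ret = fi_open_ops(&domain->fid, FI_GNI_DOMAIN_OPS_1, 0,
                      (void **)&gni_ops, NULL);
    if (ret)
        return ret;

    /* GNI_MR_CACHE accepts "internal", "udreg", or "none". */
    return gni_ops->set_val(&domain->fid, GNI_MR_CACHE, &cache);
}
```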
>
>
> This configuration is stable for us and works fine whether Cray MPI is
> present or not.  I'll defer to Howard about the technical implications,
> though 🙂
>
> thanks,
> -Phil
> ------------------------------
> *From:* Libfabric-users <libfabric-users-bounces at lists.openfabrics.org>
> on behalf of Biddiscombe, John A. <biddisco at cscs.ch>
> *Sent:* Wednesday, June 17, 2020 1:32 PM
> *To:* Howard Pritchard <hppritcha at gmail.com>
> *Cc:* libfabric-users at lists.openfabrics.org <
> libfabric-users at lists.openfabrics.org>
> *Subject:* Re: [libfabric-users] Error allocating domain
>
>
> Howard
>
>
> From the phrasing "You are hitting a limitation with the ancient
> kdreg device driver.  It may be best to not use it for your libfabric app."
> - is there anything I can do about it? I can see that there is a udreg
> directory in /opt/cray - is there anything I can replace the kdreg stuff
> with?
>
>
> Thanks
>
>
> JB
>
>
> ------------------------------
> *From:* Libfabric-users <libfabric-users-bounces at lists.openfabrics.org>
> on behalf of Biddiscombe, John A. <biddisco at cscs.ch>
> *Sent:* 17 June 2020 17:26:29
> *To:* Howard Pritchard
> *Cc:* libfabric-users at lists.openfabrics.org
> *Subject:* Re: [libfabric-users] Error allocating domain
>
>
> my config line has always been this (apart from the debug). It has worked
> for several years until a recent system maintenance change or something of
> that kind. (Nobody here claims to have changed anything significant.)
>
>
> ./configure --disable-verbs --disable-sockets --disable-usnic
> --disable-udp --disable-rxm --disable-rxd --disable-shm --disable-mrail
> --disable-tcp --disable-perf --disable-rstream --enable-gni
> --prefix=/apps/daint/UES/biddisco/gcc/8.3.0/libfabric
> CC=/opt/cray/pe/craype/default/bin/cc CFLAGS=-fPIC LDFLAGS=-ldl
> --no-recursion --enable-debug
>
>
> JB
> ------------------------------
> *From:* Howard Pritchard <hppritcha at gmail.com>
> *Sent:* 17 June 2020 17:20:21
> *To:* Biddiscombe, John A.
> *Cc:* libfabric-users at lists.openfabrics.org
> *Subject:* Re: [libfabric-users] Error allocating domain
>
> Hi John,
>
> You are hitting a limitation with the ancient kdreg device driver.  It may
> be best to not use it for your libfabric app.  What are the configure
> options you're using for building libfabric?
>
> Howard
>
>
> Am Di., 16. Juni 2020 um 10:34 Uhr schrieb Biddiscombe, John A. <
> biddisco at cscs.ch>:
>
> I've got this log when I dump out my own messages and also enable
> debugging in libfabric - can anyone tell what's wrong from the messages?
> Code that used to work seems to have stopped working. I upgraded to the
> libfabric 1.10.1 tag and rebuilt, but it didn't change anything.
>
> The only thing that springs to mind is that the application is also using
> MPI on the Cray at the same time, so when this code is called, mpi_init
> will already have been called, and perhaps somehow the NIC is inaccessible
> - hence the error. I'm sure it used to work - and if I use ranks = 1 it
> runs - so perhaps MPI detects just one rank and does no initialization, but
> when I use N>1 ranks, it dies. Any suggestions welcome. Thanks
>
> JB
>
>
> <DEB> 0000056511 0x2aaaaab2dec0 cpu 000 nid00219(0)   CONTROL Allocating
> domain
> libfabric:69061:gni:core:_gnix_ref_init():254<debug> [69061:1] 0x8579d8
> refs 1
> libfabric:69061:core:core:fi_fabric_():1154<info> Opened fabric: gni
> libfabric:69061:gni:domain:gnix_domain_open():579<trace> [69061:1]
> libfabric:69061:gni:fabric:gnix_domain_open():591<info> [69061:1] failed
> to find authorization key, creating new authorization key
> libfabric:69061:gni:domain:_gnix_auth_key_enable():347<info> [69061:1]
> pkey=dd920000 ptag=14 key_partition_size=0 key_offset=0 enabled
> libfabric:69061:gni:domain:gnix_domain_open():597<info> [69061:1]
> authorization key=0x857a10 ptag 14 cookie 0xdd920000
> libfabric:69061:gni:mr:_gnix_notifier_open():88<warn> [69061:1] kdreg
> device open failed: Device or resource busy
> <ERR> 0000056576 0x2aaaaab2dec0 cpu 000 nid00219(0)   ERROR__ fi_domain :
> Device or resource busy
>
> _______________________________________________
> Libfabric-users mailing list
> Libfabric-users at lists.openfabrics.org
> https://lists.openfabrics.org/mailman/listinfo/libfabric-users
>
>

