[libfabric-users] Error allocating domain

Biddiscombe, John A. biddisco at cscs.ch
Tue Jun 16 09:33:53 PDT 2020


I've got this log when I dump out my own messages, and also enable debugging in libfabric - can anyone tell what's wrong from the message. Code that used to work seems to have stopped. I upgraded to libfabric 1.10.1 tag and rebuilt, but it didn't change.

The only thing that springs to mind is that the application is also using MPI on the cray at the same time, so when this code is called, mpi_init would have already been called, and perhaps somehow the nic is inaccessible - hence the error. I'm sure it used to work - and if I use ranks = 1 - it runs - so perhaps mpi detects just one rank and does no initialization, but when I use N>1 ranks, it dies. Any suggestions welcome. Thanks

JB


<DEB> 0000056511 0x2aaaaab2dec0 cpu 000 nid00219(0)   CONTROL Allocating domain
libfabric:69061:gni:core:_gnix_ref_init():254<debug> [69061:1] 0x8579d8 refs 1
libfabric:69061:core:core:fi_fabric_():1154<info> Opened fabric: gni
libfabric:69061:gni:domain:gnix_domain_open():579<trace> [69061:1]
libfabric:69061:gni:fabric:gnix_domain_open():591<info> [69061:1] failed to find authorization key, creating new authorization key
libfabric:69061:gni:domain:_gnix_auth_key_enable():347<info> [69061:1] pkey=dd920000 ptag=14 key_partition_size=0 key_offset=0 enabled
libfabric:69061:gni:domain:gnix_domain_open():597<info> [69061:1] authorization key=0x857a10 ptag 14 cookie 0xdd920000
libfabric:69061:gni:mr:_gnix_notifier_open():88<warn> [69061:1] kdreg device open failed: Device or resource busy
<ERR> 0000056576 0x2aaaaab2dec0 cpu 000 nid00219(0)   ERROR__ fi_domain : Device or resource busy

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/libfabric-users/attachments/20200616/e37aa34a/attachment.htm>


More information about the Libfabric-users mailing list