[libfabric-users] assertion failure, perhaps in registration cache, in 1.10.1 with verbs; ofi_rxm

Hefty, Sean sean.hefty at intel.com
Thu Jul 30 14:27:34 PDT 2020


I would open a GitHub issue for this.

You can also try using the other cache monitor (MEMHOOKS). That will also capture non-malloc-based allocations (e.g. mmap), though you would need a custom build off master to pick up coverage for some allocations (sbrk, I think).
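For example, the monitor is selected with the standard FI_MR_CACHE_MONITOR environment variable (this assumes a build where the memhooks monitor is available):

```shell
# Select the memhooks cache monitor instead of userfaultfd.
# memhooks intercepts allocation/free calls directly, so it also sees
# memory that was not obtained through malloc().
export FI_MR_CACHE_MONITOR=memhooks
```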
 
> I'm using the verbs;ofi_rxm provider with libfabric 1.10.1, on an IB-based Cray CS
> system.  I saw this in fi_mr(3):
> 
> 	As a general rule, if hardware requires the FI_MR_LOCAL mode bit described above,
> but this is not supported by the application, a memory registration cache may be in
> use.
> 
> 
> 
> I thought to myself, "Let's try it!"  I set FI_MR_CACHE_MONITOR=userfaultfd, because my
> application doesn't necessarily allocate all its memory through malloc() etc.  I
> removed FI_MR_LOCAL from my hints, while retaining (FI_MR_VIRT_ADDR | FI_MR_PROV_KEY |
> FI_MR_ALLOCATED).   My only other FI_* env var was FI_LOG_LEVEL=Warn.  I verified that
> I still got the verbs;ofi_rxm provider, and that FI_MR_LOCAL was clear in the returned
> info.  My 2-node test case ran properly, but then failed with the following assertion
> on both nodes, in the call stack for fi_close(&ofi_domain->fid) (where ofi_domain is
> the result of the fi_domain() call):
> 
> 	a.out: .../prov/util/src/util_buf.c:220: ofi_bufpool_destroy: Assertion `(pool->attr.flags & OFI_BUFPOOL_NO_TRACK) || (buf_region->use_cnt == 0)' failed.
> 
> 
> 
> 
> In the resulting core file, I find that it's the second clause of the assertion
> (buf_region->use_cnt == 0) that's false.  That use_cnt is 1 (one).  No output
> seemed to result from my having set FI_LOG_LEVEL=Warn.
> 
> 
> What's going on?  Do I need to do some other setup to use the registration cache?  Have
> I failed to fi_close() something?  (I looked and nothing jumped out at me, plus this
> exact binary runs fine if I include FI_MR_LOCAL in the hints and don't change anything
> else.)
> 
> 
> thanks,
> greg
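
For reference, a minimal sketch of the hints setup described above (not the poster's actual code; the provider string and mr_mode bits come from the report, everything else is illustrative):

```c
#include <stdlib.h>
#include <string.h>
#include <rdma/fabric.h>

/* Sketch only: request verbs;ofi_rxm without FI_MR_LOCAL, keeping the
 * other registration mode bits, as described in the report above. */
static struct fi_info *query_info(void)
{
	struct fi_info *hints = fi_allocinfo();
	struct fi_info *info = NULL;

	hints->fabric_attr->prov_name = strdup("verbs;ofi_rxm");
	hints->domain_attr->mr_mode = FI_MR_VIRT_ADDR | FI_MR_PROV_KEY |
				      FI_MR_ALLOCATED; /* FI_MR_LOCAL removed */

	if (fi_getinfo(FI_VERSION(1, 10), NULL, NULL, 0, hints, &info))
		info = NULL;
	fi_freeinfo(hints);
	return info; /* caller verifies FI_MR_LOCAL is clear in the result */
}
```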
