[libfabric-users] assertion failure, perhaps in registration cache, in 1.10.1 with verbs;ofi_rxm
Hefty, Sean
sean.hefty at intel.com
Thu Jul 30 14:27:34 PDT 2020
I would open a GitHub issue for this.
You can also try using the other cache monitor (MEMHOOKS). That will also capture non-malloc-based allocations (e.g. mmap), though you would need to use a custom build off master to pick up coverage for some allocations (sbrk, I think).
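For illustration, a minimal C sketch of switching the monitor from inside the application rather than from the job environment. It assumes the variable is set before the first fi_getinfo() call, on the assumption that libfabric reads it during its one-time initialization; exporting FI_MR_CACHE_MONITOR in the launch environment is the simpler route. The helper name `select_mr_cache_monitor` is hypothetical, and it only accepts the two monitor names mentioned in this thread (other values may exist; check fi_mr(3) for your version).

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: choose the registration-cache monitor by setting
 * FI_MR_CACHE_MONITOR. Must run before the first fi_getinfo() call.
 * Returns 0 on success, -1 for an unrecognized name or setenv() failure. */
int select_mr_cache_monitor(const char *name)
{
    /* Only the monitors discussed in this thread. */
    static const char *const known[] = { "userfaultfd", "memhooks" };

    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strcmp(name, known[i]) == 0)
            return setenv("FI_MR_CACHE_MONITOR", name, 1) == 0 ? 0 : -1;
    }
    fprintf(stderr, "unknown cache monitor: %s\n", name);
    return -1;  /* leave any existing setting untouched */
}
```

Calling `select_mr_cache_monitor("memhooks")` at the top of `main()`, before any libfabric call, corresponds to running the job with `FI_MR_CACHE_MONITOR=memhooks`.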
> I'm using the verbs;ofi_rxm provider with libfabric 1.10.1, on an IB-based Cray CS
> system. I saw this in fi_mr(3):
>
> As a general rule, if hardware requires the FI_MR_LOCAL mode bit described above,
> but this is not supported by the application, a memory registration cache may be in
> use.
>
> I thought to myself, "Let's try it!" I set FI_MR_CACHE_MONITOR=userfaultfd, because my
> application doesn't necessarily allocate all its memory through malloc() etc. I
> removed FI_MR_LOCAL from my hints, while retaining (FI_MR_VIRT_ADDR | FI_MR_PROV_KEY |
> FI_MR_ALLOCATED). My only other FI_* env var was FI_LOG_LEVEL=Warn. I verified that
> I still got the verbs;ofi_rxm provider, and that FI_MR_LOCAL was clear in the returned
> info. My 2-node test case ran properly, but then failed with the following assertion
> on both nodes, in the call stack for fi_close(&ofi_domain->fid) (where ofi_domain is
> the result of the fi_domain() call):
>
> a.out: .../prov/util/src/util_buf.c:220: ofi_bufpool_destroy: Assertion `(pool->attr.flags & OFI_BUFPOOL_NO_TRACK) || (buf_region->use_cnt == 0)' failed.
>
> In the resulting core file, I find that it's the second clause (buf_region->use_cnt ==
> 0) of the assertion that's false. That use_cnt is 1 (one). No output seemed to result
> from my having set FI_LOG_LEVEL=Warn.
>
> What's going on? Do I need to do some other setup to use the registration cache? Have
> I failed to fi_close() something? (I looked and nothing jumped out at me, plus this
> exact binary runs fine if I include FI_MR_LOCAL in the hints and don't change anything
> else.)
>
> thanks,
> greg