[libfabric-users] what does -FI_EALREADY mean?
swelch at systemfabricworks.com
Mon Jun 29 14:37:48 PDT 2020
Hi Greg, Sean,
> On Jun 29, 2020, at 3:02 PM, Hefty, Sean <sean.hefty at intel.com> wrote:
>> In running a test case at moderately large scale (64 nodes, 128 tx endpoints per node)
>> on a Cray CS system with libfabric 1.10.1 and the verbs;ofi_rxm provider, we saw a
>> -FI_EALREADY ("Operation already in progress") return value from a fi_write() call. Can
>> anyone out there give me more information as to what that error code might indicate is
>> going wrong? The man pages don't really contain anything except that error text.
> Searching through the code, I only see FI_EALREADY in a few places, all of which should only be for internal error handling. For example, RXM uses this to detect if a connection is already in progress, but I don't see that the error code can be returned to the user. Similarly, verbs has a couple of assertions that FI_EALREADY isn't returned as an error when inserting items into rbtrees. A free (non-debug) build, where those assertions are compiled out, could return that value back to the user.
> It's possible this is coming from lower level code (e.g. verbs), but I'm skeptical of that.
> Can you run with a debug build to see if you're going through one of the assert paths? Do you know if you're using XRC for the underlying transport? The verbs FI_EALREADY asserts are in XRC code paths.
Like Sean said, the XRC asserts are detecting duplicate insertions into RB trees. The one in verbs_domain_xrc.c should never fire, since a find is done prior to the insert while holding the verbs EQ lock. The second, in verbs_eq.c, is in response to an accept of a shared XRC QP connection; possibly there is an error condition that allows a duplicate there. If you could run with the logging level set to warn and the subsystem set to ep_ctrl (e.g. FI_LOG_LEVEL=warn and FI_LOG_SUBSYS=ep_ctrl), you could catch the output indicating whether either of these occurs.

Also, would you mind letting me know what kind of connectivity is required by your application (e.g. all-to-all, many-to-one…)?
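To illustrate how an assert-guarded duplicate insert can leak an error code in a non-debug build, here is a minimal sketch. It is not the libfabric code: key_set, key_set_find, key_set_insert and track_connection are hypothetical names, a small array stands in for the RB tree, and plain EALREADY from errno.h is used on the assumption that FI_EALREADY maps to that errno value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an RB tree of connection keys. */
#define MAX_KEYS 16

struct key_set {
    int keys[MAX_KEYS];
    size_t count;
};

/* Returns 1 if the key is already present, 0 otherwise. */
static int key_set_find(const struct key_set *set, int key)
{
    for (size_t i = 0; i < set->count; i++) {
        if (set->keys[i] == key)
            return 1;
    }
    return 0;
}

/* Returns 0 on success, -EALREADY on a duplicate, -ENOMEM when full. */
static int key_set_insert(struct key_set *set, int key)
{
    if (key_set_find(set, key))
        return -EALREADY;
    if (set->count == MAX_KEYS)
        return -ENOMEM;
    set->keys[set->count++] = key;
    return 0;
}

/* Caller that treats a duplicate as "cannot happen". */
static int track_connection(struct key_set *set, int conn_id)
{
    int ret = key_set_insert(set, conn_id);

    /* Aborts in a debug build; compiled out when NDEBUG is defined. */
    assert(ret != -EALREADY);

    /* With NDEBUG, a duplicate falls through and -EALREADY is
     * returned to whoever called us. */
    return ret;
}
```

In a debug build, calling track_connection() twice with the same conn_id trips the assert. Built with -DNDEBUG, the second call simply returns -EALREADY, which is the kind of path that could surface the value to an application.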
> - Sean