[libfabric-users] Sockets is writing two errors to cq on disconnects
Hefty, Sean
sean.hefty at intel.com
Wed Apr 1 15:19:31 PDT 2020
> I am currently using sockets;rdm and I've noticed that when I try to send data to an
> already closed connection,
> two error completions are generated the first time the send is called.
Can you test with tcp;rxm and see if that works for your application?
> The first error completion is just a mostly empty one with no context and an error of
> FI_EIO.
> The second error is the expected one with proper context and buffer (err is also
> FI_EIO).
>
> Is this intended?
This sounds like a bug.
> Is there a way to catch the first error without just checking if the context is set?
I would need to understand why there are 2 error entries being written.
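
Until the cause is known, the context check mentioned in the question could look roughly like the sketch below. It is only an illustration: `struct err_entry` is a hypothetical stand-in for the few fields of libfabric's `struct fi_cq_err_entry` that matter here (real code would read the entry with `fi_cq_readerr`), and `classify_err` and the `err_action` values are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct fi_cq_err_entry used here.
 * Real code would include <rdma/fi_eq.h> and use the libfabric type. */
struct err_entry {
    void *op_context; /* context passed to the send call, NULL if unset */
    void *buf;        /* buffer of the failed operation, NULL if unset */
    int   err;        /* errno-style code; the thread reports FI_EIO */
};

enum err_action { ERR_IGNORE, ERR_FAIL_REQUEST };

/* Assumption: an entry without a context is the mostly empty one
 * described above and can be skipped; anything else fails the request. */
static enum err_action classify_err(const struct err_entry *e)
{
    assert(e != NULL);
    if (e->op_context == NULL)
        return ERR_IGNORE;
    return ERR_FAIL_REQUEST;
}
```

This only hides the extra entry from the application; it does not explain why the provider writes it.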
> My plan in general is to detect if there are stale requests and
> after a certain timeout I try to send a ping packet to the other peer to check if the
> connection is still working.
Hmm... We should look at exposing some sort of keep-alive option to apps. I'll think about this and see what could be done.
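
In the meantime, the application-side plan described above (track outstanding requests, ping the peer once one has been pending too long) could be sketched as follows. This is not a libfabric facility: `req_tracker`, `MAX_PENDING`, and the `tracker_*` functions are hypothetical names, timestamps are assumed to come from a monotonic clock in milliseconds, and sending the ping itself is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64 /* hypothetical fixed limit on outstanding requests */

struct pending_req {
    void    *context;   /* same pointer handed to the send call */
    uint64_t posted_ms; /* monotonic time at which the request was posted */
    bool     in_use;
};

struct req_tracker {
    struct pending_req slots[MAX_PENDING];
    uint64_t timeout_ms; /* age after which a request counts as stale */
};

static void tracker_init(struct req_tracker *t, uint64_t timeout_ms)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++)
        t->slots[i].in_use = false;
    t->timeout_ms = timeout_ms;
}

/* Record a posted request; returns false if the table is full. */
static bool tracker_add(struct req_tracker *t, void *context, uint64_t now_ms)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].context = context;
            t->slots[i].posted_ms = now_ms;
            t->slots[i].in_use = true;
            return true;
        }
    }
    return false;
}

/* Forget a request once its completion (success or error) arrives.
 * Returns false if the context was not being tracked. */
static bool tracker_complete(struct req_tracker *t, void *context)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->slots[i].in_use && t->slots[i].context == context) {
            t->slots[i].in_use = false;
            return true;
        }
    }
    return false;
}

/* True if any outstanding request has reached the timeout, i.e. it is
 * time to send a ping to the peer. */
static bool tracker_needs_ping(const struct req_tracker *t, uint64_t now_ms)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->slots[i].in_use &&
            now_ms - t->slots[i].posted_ms >= t->timeout_ms)
            return true;
    }
    return false;
}
```

The idea is to call `tracker_add` when posting a send, `tracker_complete` for every completion read from the cq, and `tracker_needs_ping` from the progress loop.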
- Sean