[libfabric-users] Sockets is writing two errors to cq on disconnects

Hefty, Sean sean.hefty at intel.com
Wed Apr 1 15:19:31 PDT 2020


> I am currently using sockets;rdm and I've noticed that when I try to send data to an
> already closed connection,
> two error completions are generated on the first attempt.

Can you test with tcp;rxm and see if that works for your application?

> The first error completion is just a mostly empty one with no context and an error of
> FI_EIO.
> The second error is the expected one with proper context and buffer (err is also
> FI_EIO).
> 
> Is this intended?

This sounds like a bug.

> Is there a way to catch the first error without just checking if the context is set?

I would need to understand why there are 2 error entries being written.
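
Until the cause is known, the context check mentioned in the question can be isolated in one place. The following is only a minimal sketch of that workaround, not a statement of how the sockets provider behaves. `struct err_entry_view` and `classify_error` are hypothetical names; the struct stands in for the `op_context` and `err` fields of `struct fi_cq_err_entry` as filled in by `fi_cq_readerr()`, so that the example compiles without libfabric. It also assumes FI_EIO has the same value as the system EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields of struct fi_cq_err_entry
 * that this sketch looks at. A real application would pass the entry
 * returned by fi_cq_readerr() instead. */
struct err_entry_view {
    void *op_context; /* context given when the operation was posted, or NULL */
    int   err;        /* positive error code; FI_EIO assumed equal to EIO */
};

enum err_action {
    ERR_IGNORE,       /* entry cannot be tied to a request; log and drop it */
    ERR_FAIL_REQUEST  /* entry belongs to a posted request; fail that request */
};

/* Hypothetical helper: decide what to do with one error completion.
 * An entry without a context cannot be matched to any posted operation,
 * so it is ignored rather than dereferenced. */
static enum err_action classify_error(const struct err_entry_view *e)
{
    assert(e != NULL);
    return e->op_context != NULL ? ERR_FAIL_REQUEST : ERR_IGNORE;
}
```

Keeping the check in a single helper makes it easy to remove once the duplicate entry is fixed in the provider.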

> My plan in general is to detect whether there are stale requests and,
> after a certain timeout, send a ping packet to the other peer to check if the
> connection is still working.

Hmm... We should look at exposing some sort of keep-alive option to apps.  I'll think about this and see what could be done.
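
In the meantime, the application-level plan described in the question could be sketched roughly as follows. This is an illustration under stated assumptions, not a libfabric feature: `struct peer_watch` and the `watch_*` functions are hypothetical, timestamps are caller-supplied milliseconds from any monotonic clock, and actually sending the ping is left to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-peer bookkeeping for stale-request detection. */
struct peer_watch {
    uint64_t last_progress_ms; /* time of the last completion, or of the
                                  first post after the peer was idle */
    uint32_t pending;          /* requests posted but not yet completed */
    bool     ping_in_flight;   /* a ping was requested and is unanswered */
};

/* Call when a request to this peer is posted. */
static void watch_on_post(struct peer_watch *w, uint64_t now_ms)
{
    if (w->pending++ == 0)
        w->last_progress_ms = now_ms; /* start the clock when leaving idle */
}

/* Call for every completion from this peer (success or error). */
static void watch_on_completion(struct peer_watch *w, uint64_t now_ms)
{
    assert(w->pending > 0);
    w->pending--;
    w->last_progress_ms = now_ms; /* any completion counts as progress */
    w->ping_in_flight = false;
}

/* Returns true once when requests have been pending for at least
 * timeout_ms without progress; the caller should then send its ping. */
static bool watch_ping_due(struct peer_watch *w, uint64_t now_ms,
                           uint64_t timeout_ms)
{
    if (w->pending == 0 || w->ping_in_flight)
        return false;
    if (now_ms - w->last_progress_ms < timeout_ms)
        return false;
    w->ping_in_flight = true;
    return true;
}
```

Treating any completion as progress is a design choice made here for simplicity; tracking a timestamp per request would detect a single stuck operation sooner at the cost of more state.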

- Sean

