[libfabric-users] Sockets is writing two errors to cq on disconnects

Carsten Patzke carsten.patzke at desy.de
Thu Apr 2 04:57:27 PDT 2020


When using tcp;ofi_rxm, fi_send itself reports "No route to host", and no entry is written to the completion queue.

I guess that is perfectly acceptable.


The reason I use sockets instead of tcp;ofi_rxm is that I ran into RMA issues with tcp;ofi_rxm.
I used the same hints as for verbs;ofi_rxm and sockets:

hints->domain_attr->mr_mode = FI_MR_ALLOCATED | FI_MR_VIRT_ADDR | FI_MR_PROV_KEY

libfabric:18695:tcp:ep_data:tcpx_validate_rx_rma_data():387<warn> invalid rma iov received
libfabric:18695:tcp:domain:tcpx_get_rx_entry_op_write():544<warn> invalid rma data

I will probably have to debug the internal code again to find where the error lies.
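One thing worth checking while debugging is how the initiator computes the remote address. With FI_MR_VIRT_ADDR in mr_mode, peers address a registered region by its virtual address rather than by a 0-based offset, and with FI_MR_PROV_KEY the key must be the one the provider returned (fi_mr_key()). The sketch below models that rule and the kind of bounds/key check a target might apply to an incoming RMA iov. The struct and helper names are hypothetical, and the real tcpx_validate_rx_rma_data() may check different or additional conditions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a registered region as seen by the target. */
struct mr_region {
    uintptr_t base;   /* virtual address of the registered buffer */
    size_t size;      /* length of the registration in bytes */
    uint64_t key;     /* provider-assigned key (FI_MR_PROV_KEY) */
};

/* Address an initiator would pass to fi_write()/fi_read() for byte `offset`
 * of the region: the buffer's virtual address when FI_MR_VIRT_ADDR is in
 * effect, otherwise a 0-based offset into the registration. */
static uint64_t rma_target_addr(const struct mr_region *mr, bool virt_addr,
                                size_t offset)
{
    return virt_addr ? (uint64_t)(mr->base + offset) : (uint64_t)offset;
}

/* Hypothetical target-side check: the key must match and [addr, addr + len)
 * must lie inside the registration, interpreted according to virt_addr. */
static bool rma_iov_valid(const struct mr_region *mr, bool virt_addr,
                          uint64_t addr, size_t len, uint64_t key)
{
    if (key != mr->key)
        return false;
    uint64_t start = virt_addr ? (uint64_t)mr->base : 0;
    if (addr < start)
        return false;
    uint64_t off = addr - start;
    /* Written this way so off + len cannot overflow. */
    return off <= mr->size && len <= mr->size - off;
}
```

In this model, sending an offset while the target expects a virtual address (or the reverse), or using a locally chosen key instead of the provider's, fails validation in the same way the "invalid rma iov received" warning suggests.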


----- Original Message -----
From: "Sean Hefty" <sean.hefty at intel.com>
To: "Carsten Patzke" <carsten.patzke at desy.de>, "libfabric-users" <libfabric-users at lists.openfabrics.org>
Sent: Thursday, April 2, 2020 12:19:31 AM
Subject: RE: Sockets is writing two errors to cq on disconnects

> I am currently using sockets;rdm and I've noticed that when I try to send data over an
> already closed connection,
> two error completions are generated on the first call.

Can you test with tcp;rxm and see if that works for your application?

> The first error completion is a mostly empty one with no context and an error of
> FI_EIO.
> The second error is the expected one, with the proper context and buffer (err is also
> FI_EIO).
> 
> Is this intended?

This sounds like a bug.

> Is there a way to catch the first error without just checking if the context is set?

I would need to understand why two error entries are being written.

> My plan in general is to detect if there are stale requests and
> after a certain timeout I try to send a ping package to the other peer to check if the
> connection is still working.

Hmm... We should look at exposing some sort of keep-alive option to apps.  I'll think about this and see what could be done.

- Sean
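Until the duplicate completion reported above is resolved, an application can at least keep the context-less entry from being matched to a request. The sketch below assumes the error entries have already been drained (in a real program, with fi_cq_readerr() after fi_cq_read() returns -FI_EAVAIL); a plain array stands in for them. `struct err_entry` is a hypothetical mirror of the `op_context` and `err` fields of libfabric's `struct fi_cq_err_entry`, and `collect_request_errors()` is an illustrative helper, not a libfabric API. It is exactly the "check whether the context is set" workaround the question hoped to avoid, so treat it as a stopgap.

```c
#include <stddef.h>

/* Hypothetical mirror of two fields of libfabric's struct fi_cq_err_entry. */
struct err_entry {
    void *op_context;  /* context given to fi_send(); NULL in the spurious entry */
    int err;           /* positive error code, e.g. FI_EIO */
};

/* Stores pointers to the entries that can be matched to a posted request in
 * out[] and returns their count. Entries without a context are only counted
 * in *n_orphans so they can still be logged. */
static size_t collect_request_errors(const struct err_entry *in, size_t n,
                                     const struct err_entry **out,
                                     size_t *n_orphans)
{
    size_t kept = 0, orphans = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].op_context == NULL)
            orphans++;               /* cannot be tied to any request */
        else
            out[kept++] = &in[i];    /* belongs to a request we posted */
    }
    if (n_orphans != NULL)
        *n_orphans = orphans;
    return kept;
}
```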

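The application-level plan from the quoted message (detect stale requests, then ping the peer after a timeout) could be organized roughly as below. This is a minimal sketch under stated assumptions: the `pending_req` bookkeeping, the millisecond clock, and the three-way `peer_action` decision are all hypothetical, and actually sending the ping and matching its reply would still go through fi_send and the completion queue. It is not a libfabric keep-alive mechanism; the reply only says one may be considered.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one outstanding send or RMA request. */
struct pending_req {
    uint64_t posted_ms;  /* time the request was posted */
    bool completed;      /* set once its completion (or error) is reaped */
};

enum peer_action { PEER_OK, PEER_SEND_PING, PEER_DECLARE_DEAD };

/* True if any request has been outstanding longer than timeout_ms. */
static bool has_stale_request(const struct pending_req *reqs, size_t n,
                              uint64_t now_ms, uint64_t timeout_ms)
{
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].completed)
            continue;
        if (now_ms >= reqs[i].posted_ms &&
            now_ms - reqs[i].posted_ms > timeout_ms)
            return true;
    }
    return false;
}

/* Decide what to do about one peer: ping it when a request looks stale, and
 * give up if an in-flight ping has itself gone unanswered past timeout_ms. */
static enum peer_action decide_peer_action(const struct pending_req *reqs,
                                           size_t n, uint64_t now_ms,
                                           uint64_t timeout_ms,
                                           bool ping_in_flight,
                                           uint64_t ping_sent_ms)
{
    if (ping_in_flight)
        return (now_ms >= ping_sent_ms && now_ms - ping_sent_ms > timeout_ms)
                   ? PEER_DECLARE_DEAD : PEER_OK;
    return has_stale_request(reqs, n, now_ms, timeout_ms) ? PEER_SEND_PING
                                                          : PEER_OK;
}
```

Reusing one timeout for both stages keeps the sketch short; a real application would likely use a shorter timeout for the ping reply than for ordinary requests.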
