[libfabric-users] subtle difference between sockets and tcp; ofi_rxm

Hefty, Sean sean.hefty at intel.com
Fri Apr 30 09:24:58 PDT 2021


> When I use tcp;ofi_rxm if I start the client node first and send a message to the
> master, the message fails with (ret == -FI_EAGAIN) and unfortunately, if I keep
> retrying whilst starting the master node, the message does not ever complete.

This is a bug.  It's supposed to work.
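
For context, the application-side pattern described above (retry the send
while it returns -FI_EAGAIN) can be sketched roughly as below. This is only
an illustration under assumptions, not a workaround for the bug. The callback
types, send_with_retry, and the fake_peer stand-in are hypothetical names. In
a real libfabric program try_send would wrap fi_send() and progress would wrap
a fi_cq_read() on the endpoint's completion queue. EAGAIN from <errno.h>
stands in for FI_EAGAIN so the sketch builds without libfabric headers.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback types. In a libfabric program, try_send would wrap
 * fi_send() and progress would wrap fi_cq_read() on the endpoint's CQ. */
typedef int (*try_send_fn)(void *ctx);
typedef void (*progress_fn)(void *ctx);

/* Retry a send while it reports "try again", driving progress between
 * attempts. Returns 0 on success, any other negative error code as soon as
 * it is seen, or -EAGAIN if max_attempts attempts were all refused. */
static int send_with_retry(try_send_fn try_send, progress_fn progress,
                           void *ctx, size_t max_attempts)
{
    for (size_t i = 0; i < max_attempts; i++) {
        int ret = try_send(ctx);
        if (ret != -EAGAIN)
            return ret;      /* success or a hard error */
        progress(ctx);       /* give the provider a chance to make progress */
    }
    return -EAGAIN;
}

/* Stand-in peer for illustration only: sends are accepted once progress
 * has been driven ready_after times (e.g. the master has come up). */
struct fake_peer {
    size_t ready_after;
    size_t progress_calls;
};

static int fake_try_send(void *ctx)
{
    struct fake_peer *p = ctx;
    return p->progress_calls >= p->ready_after ? 0 : -EAGAIN;
}

static void fake_progress(void *ctx)
{
    struct fake_peer *p = ctx;
    p->progress_calls++;
}
```

The loop is bounded so that a peer that never comes up shows as an error
rather than an endless spin. The expected behavior is that such a loop
eventually succeeds once the master is running.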

> Is there anything I can do to make the tcp version behave the same way as the sockets
> one?

We need to figure out what the problem is and why the connection isn't retried.

- Sean

