[libfabric-users] Address not reachable
Isaac Nuñez
isaacnez at outlook.com
Fri May 15 16:46:22 PDT 2020
> On 16.05.2020 at 01:16, Hefty, Sean <sean.hefty at intel.com> wrote:
>
>
>>
>> Is there a reason why an address would not be reachable? This follows the case I
>> described previously. One application runs on node x and is started by srun (from
>> another node). If I then connect that with a single application (my test case was
>> fi_rdm_rma_simple), it connects, BUT if I connect fi_rdm_rma_simple to my process
>> running on another node, it reports error -61. Does anyone have any idea why that
>> would happen? Keep in mind that my application started by srun is just one process.
>
> It's hard to say. If the server wasn’t running by the time the client tried to connect, you might get this failure. -61 is connection refused. You can use fi_strerror() to convert errno into strings.
The server was running. The failure only happened when the server was launched using srun. If the server was launched as ./server and the client with srun, it would connect. I tested both my application and fi_rdm_rma_simple, and they showed the same behavior.
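
For reference, a minimal sketch of turning a return code such as -61 into text. Libfabric calls report failure as a negative errno-style value, so the sign is flipped before the lookup. This sketch uses the standard C strerror() as a stand-in so that it builds without libfabric; with libfabric installed, fi_strerror() from <rdma/fi_errno.h> would take its place and also covers libfabric-specific codes. The helper names describe_error and report_error are hypothetical. Note that the text for a given number is platform-specific: 61 is ENODATA on Linux but ECONNREFUSED on macOS and the BSDs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: accept either the negative return value of a
 * call or a positive errno, and return the matching message.
 * Assumption: strerror() stands in for libfabric's fi_strerror(). */
static const char *describe_error(int ret)
{
    int errnum = ret < 0 ? -ret : ret;
    return strerror(errnum);
}

/* Hypothetical helper: print the raw code next to its meaning so the
 * number and the text can be compared across machines. */
static void report_error(const char *what, int ret)
{
    fprintf(stderr, "%s failed: %d (%s)\n", what, ret, describe_error(ret));
}
```

Calling report_error("connect", -61) on both the srun-launched node and the client node would show whether the two sides interpret the code the same way.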