[libfabric-users] fi_read questions

Arne Struck arnestruck at astruck.de
Mon Oct 19 13:57:46 PDT 2020


On 19.10.2020 at 22:15, Hefty, Sean wrote:
>> It seems I can get a libfabric version with debugging enabled sometime
>> next week.
>>
>> I have set the FI_LOG_LEVEL to info, which generated this output just
>> before (or during?) the fi_read-call:
>>
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():594<warn> Connect
>> error, retrying - Operation now in progress - -1
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():596<warn> Retry
>> connect to peer : fi_sockaddr_in://127.0.0.1:33607
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():594<warn> Connect
>> error, retrying - Operation now in progress - -1
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():596<warn> Retry
>> connect to peer : fi_sockaddr_in://127.0.0.1:33607
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():594<warn> Connect
>> error, retrying - Operation now in progress - -1
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():596<warn> Retry
>> connect to peer : fi_sockaddr_in://127.0.0.1:33607
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():594<warn> Connect
>> error, retrying - Operation now in progress - -1
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():596<warn> Retry
>> connect to peer : fi_sockaddr_in://127.0.0.1:33607
>> libfabric:30377:sockets:ep_ctrl:sock_ep_get_conn():1879<warn> Unable to
>> find connection entry. Error in connecting: Resource temporarily unavailable
>> libfabric:30377:sockets:ep_ctrl:sock_ep_get_conn():1882<warn> Unable to
>> connect to: fi_sockaddr_in://127.0.0.1:33607
>>
>>
>> To me this looks like the socket endpoint was connected to localhost
>> (which should not be my endpoint, since that runs on 10.0.10.61:4711)
>> on an unused port. Is that part of libfabric's inner workings, or
>> what am I reading here?
> Did you specify source and destination addresses to use?  If not, then it's possible that the loopback address was picked by default.  On the client, you would usually pass in the server's IP address as the node parameter to fi_getinfo.
>
> - Sean

This happens on the server side (that is where the fi_read call is made).
The endpoint is created with the fi_info from the client's connection
request. In my understanding this should hold the information about the
client's endpoint (modes, caps, provider, etc.) and the necessary
address parameters.

During the creation process an FI_CONNECTED event should arrive at both
ends, which in my understanding signals that a connection was
established.

I will double-check tomorrow whether the fi_info received by the server
contains the necessary data (port, IP, caps, etc.) and whether an
FI_CONNECTED event is received.
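
A minimal sketch of the address part of that check, assuming the fi_info
uses addr_format FI_SOCKADDR_IN, so that src_addr and dest_addr point to a
struct sockaddr_in. The helpers format_sockaddr_in and is_loopback_in are
hypothetical names, not libfabric functions; they use only POSIX calls.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: formats an IPv4 socket address as "ip:port".
 * Returns 0 on success, -1 if the address is missing, not AF_INET,
 * or does not fit into the output buffer. */
static int format_sockaddr_in(const void *addr, char *out, size_t outlen)
{
    struct sockaddr_in sin;
    char ip[INET_ADDRSTRLEN];

    assert(out != NULL && outlen > 0);
    if (addr == NULL)
        return -1;
    memcpy(&sin, addr, sizeof sin);   /* avoids alignment assumptions */
    if (sin.sin_family != AF_INET)
        return -1;
    if (inet_ntop(AF_INET, &sin.sin_addr, ip, sizeof ip) == NULL)
        return -1;

    int n = snprintf(out, outlen, "%s:%u", ip, (unsigned)ntohs(sin.sin_port));
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Hypothetical helper: returns 1 if the address lies in 127.0.0.0/8. */
static int is_loopback_in(const void *addr)
{
    struct sockaddr_in sin;

    memcpy(&sin, addr, sizeof sin);
    return sin.sin_family == AF_INET &&
           (ntohl(sin.sin_addr.s_addr) >> 24) == 127;
}
```

In the server these would be applied to info->src_addr and info->dest_addr
of the fi_info delivered with the connection request; a loopback address
there would match the 127.0.0.1 peer in the log. fi_tostr(info,
FI_TYPE_INFO) prints the whole structure, if a full dump is preferred.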


Greetings, Arne.


