[libfabric-users] fi_read questions

Arne arnestruck at astruck.de
Tue Oct 20 10:37:17 PDT 2020


OK, I am confused. Totally, to be honest.

First of all, I think I have to explain something. In the program I build 
one connection for message-based communication and another for RMA-based 
communication, basically so that I can use a different provider for each 
communication type later (both allocate their own fabric, domain, EQ and 
CQs, so there are no complications there).


Both endpoints have 127.0.0.1:RandomPort as their destination address.

Since I am using the sockets provider for testing at the moment, and 
message-based communication worked on that endpoint, I tried issuing the 
fi_read call on the message endpoint instead.

Long story short: not everything is fine (there seems to be a problem 
with the transmitted data), but most of the data transfer seems to work.


Do you have any idea why this is? Is it not possible to build such an 
infrastructure?


A question regarding the verbs provider: the man pages mention that it 
does not support FI_SOURCE for FI_EP_MSG. Does that mean the capability 
or the flag for fi_getinfo? I assume it is the capability. If it is not, 
how do I give a passive endpoint the address to listen on?


Greetings, Arne.



On 19.10.20 at 22:15, Hefty, Sean wrote:
>> It seems I can get a libfabric version with debugging enabled sometime
>> next week.
>>
>> I have set the FI_LOG_LEVEL to info, which generated this output just
>> before (or during?) the fi_read-call:
>>
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():594<warn> Connect
>> error, retrying - Operation now in progress - -1
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():596<warn> Retry
>> connect to peer : fi_sockaddr_in://127.0.0.1:33607
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():594<warn> Connect
>> error, retrying - Operation now in progress - -1
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():596<warn> Retry
>> connect to peer : fi_sockaddr_in://127.0.0.1:33607
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():594<warn> Connect
>> error, retrying - Operation now in progress - -1
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():596<warn> Retry
>> connect to peer : fi_sockaddr_in://127.0.0.1:33607
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():594<warn> Connect
>> error, retrying - Operation now in progress - -1
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():596<warn> Retry
>> connect to peer : fi_sockaddr_in://127.0.0.1:33607
>> libfabric:30377:sockets:ep_ctrl:sock_ep_get_conn():1879<warn> Unable to
>> find connection entry. Error in connecting: Resource temporarily unavailable
>> libfabric:30377:sockets:ep_ctrl:sock_ep_get_conn():1882<warn> Unable to
>> connect to: fi_sockaddr_in://127.0.0.1:33607
>>
>>
>> To me this looks like the socket endpoint was connected to localhost
>> (which should not be my endpoint, since that runs on 10.0.10.61:4711)
>> on an unused port. Is that part of the inner workings of libfabric,
>> or what am I reading here?
> Did you specify source and destination addresses to use?  If not, then it's possible that the loopback address was picked by default.  On the client, you would usually pass in the server's IP address as the node parameter to fi_getinfo.
>
> - Sean
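
As an illustration of how a loopback address can end up being chosen by
default, here is a minimal sketch using plain POSIX getaddrinfo. This is
not libfabric code, and whether the sockets provider resolves addresses
this way internally is an assumption. The helper name resolve_ipv4 is
hypothetical. It shows that a NULL node without AI_PASSIVE resolves to the
loopback address, while an explicit node (here the 10.0.10.61 address
mentioned above) resolves to that address.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not part of libfabric): resolve node/service to an
 * IPv4 address and write the dotted-quad form of the first result to out.
 * Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *node, const char *service, int flags,
                 char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    const struct sockaddr_in *sin;
    const char *text;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only, to keep output simple */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags;

    if (getaddrinfo(node, service, &hints, &res) != 0 || res == NULL)
        return -1;

    sin = (const struct sockaddr_in *)res->ai_addr;
    text = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen);
    freeaddrinfo(res);
    return text ? 0 : -1;
}

/* Behaviour defined by POSIX getaddrinfo:
 *   node == NULL, flags == 0          -> loopback address (connect side)
 *   node == NULL, flags == AI_PASSIVE -> wildcard address (bind side)
 *   node == "10.0.10.61"              -> that address
 */
```

In the terms of this thread, passing the server's IP address as the node
parameter to fi_getinfo corresponds to the third case, and leaving it
unset corresponds to the first.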



