[libfabric-users] fi_read questions
Arne
arnestruck at astruck.de
Tue Oct 20 10:37:17 PDT 2020
OK, I am confused. Totally, to be honest.
First of all, I think I have to explain something. In the program I build
one connection for message-based communication and another for RMA-based
communication, basically so that I can use a different provider for each
communication type later (both allocate their own fabric, domain, EQ and
CQs, so no complications there).
Both endpoints have 127.0.0.1:RandomPort as dest-address.
Since I am using the sockets provider for testing at the moment and
message-based communication worked on that endpoint, I tried issuing the
fi_read call on the message endpoint instead.
Long story short: not everything is fine (there seems to be a problem
with the transmitted data), but most of the data transfer seems to work.
Do you have any idea why this is? Is it not possible to build such an
infrastructure?
A question regarding the verbs provider: the man pages mention that it
does not support FI_SOURCE for FI_EP_MSG. Does that mean the capability or
the flag for fi_getinfo? I assume it is the capability. If it is not,
how do I give a passive endpoint the address to listen on?
Greetings, Arne.
On 19.10.20 at 22:15, Hefty, Sean wrote:
>> It seems I can get a libfabric version with debugging enabled sometime
>> next week.
>>
>> I have set FI_LOG_LEVEL to info, which generated this output just
>> before (or during?) the fi_read call:
>>
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():594<warn> Connect
>> error, retrying - Operation now in progress - -1
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():596<warn> Retry
>> connect to peer : fi_sockaddr_in://127.0.0.1:33607
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():594<warn> Connect
>> error, retrying - Operation now in progress - -1
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():596<warn> Retry
>> connect to peer : fi_sockaddr_in://127.0.0.1:33607
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():594<warn> Connect
>> error, retrying - Operation now in progress - -1
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():596<warn> Retry
>> connect to peer : fi_sockaddr_in://127.0.0.1:33607
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():594<warn> Connect
>> error, retrying - Operation now in progress - -1
>> libfabric:30377:sockets:ep_ctrl:sock_ep_connect():596<warn> Retry
>> connect to peer : fi_sockaddr_in://127.0.0.1:33607
>> libfabric:30377:sockets:ep_ctrl:sock_ep_get_conn():1879<warn> Unable to
>> find connection entry. Error in connecting: Resource temporarily unavailable
>> libfabric:30377:sockets:ep_ctrl:sock_ep_get_conn():1882<warn> Unable to
>> connect to: fi_sockaddr_in://127.0.0.1:33607
>>
>>
>> To me this looks like the socket endpoint was connected to localhost
>> (which should not be my endpoint, since that runs on 10.0.10.61:4711)
>> on an unused port. Is that part of the inner workings of libfabric,
>> or what am I reading here?
> Did you specify source and destination addresses to use? If not, then it's possible that the loopback address was picked by default. On the client, you would usually pass in the server's IP address as the node parameter to fi_getinfo.
>
> - Sean
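
As a rough illustration of Sean's point about the default address, here is a
minimal sketch using POSIX getaddrinfo() rather than libfabric itself. It is
only an analogy: it does not claim to show what the sockets provider does
internally. The helper name resolve_ipv4 is hypothetical; the address
10.0.10.61 and port 4711 are taken from the thread. With libfabric, the
corresponding step would be to pass the server address as the node argument,
along the lines of fi_getinfo(version, "10.0.10.61", "4711", 0, hints, &info).

```c
#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: resolve node/service to an IPv4 address string.
 * node may be NULL, which corresponds to "no address specified".
 * Returns 0 on success, or a getaddrinfo() error code.
 */
static int resolve_ipv4(const char *node, const char *service,
                        char *out, size_t out_len)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICSERV;        /* service is a port number */
    if (node != NULL)
        hints.ai_flags |= AI_NUMERICHOST;   /* node is an IP literal, no DNS */

    rc = getaddrinfo(node, service, &hints, &res);
    if (rc != 0)
        return rc;

    /* Only the first result is inspected in this sketch. */
    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    if (inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)out_len) == NULL)
        rc = EAI_SYSTEM;

    freeaddrinfo(res);
    return rc;
}
```

Calling resolve_ipv4(NULL, "4711", buf, sizeof(buf)) fills buf with the
loopback address 127.0.0.1, whereas passing "10.0.10.61" as node yields that
address. This mirrors the situation in the log above, where no node was given
and the peer ended up as 127.0.0.1.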