[libfabric-users] connection-less send/recv with verbs
drocco at di.unito.it
Sat Jul 22 04:09:59 PDT 2017
After spending some time playing with the “control services” (fi_getinfo etc.), I concluded that all the “issues” I mentioned in the previous emails boil down to the following two aspects:
1. the “verbs” provider seems to have a number of limitations (i.e., mismatches w.r.t. the specification), whereas the “verbs;ofi_rxm” provider works in all the cases I tested; my code stresses connectionless communication among highly asynchronous processes (i.e., no initial handshake), with multiple node:port addresses for each process (i.e., multiple endpoints per domain);
2. from the documentation, it is quite hard (actually impossible for me) to understand how to bind an address to an endpoint; by looking at the fabtests code and experimenting myself, I concluded that the binding is done either statically (at endpoint creation time, by passing an fi_info structure whose src_addr is set) or dynamically (by calling fi_setname); however, the documentation is vague at best on this point: for instance, I could not find anything about the static binding.
If someone is still interested in debugging the verbs provider, I wrote some minimal tests ranging from a fabtests-equivalent pingpong to its multiple-endpoints-per-domain variant:
All of them work with the “sockets” and “verbs;ofi_rxm” providers, whereas the “verbs” provider fails in most cases.
Thank you again for your support :)
University of Torino, department of Computer Science
Via Pessinetto 12, 10149 Torino - Italy
> On 19 Jul 2017, at 17:47, Hefty, Sean <sean.hefty at intel.com> wrote:
>> Another issue that I experience with the verbs provider is that I am
>> not able to instantiate two endpoints to a single domain.
>> Starting from a code working correctly on verbs with a single endpoint
>> e0, if I create an additional endpoint e1 to the same domain as e0,
>> the code does not work anymore, even if I do not access e1 at all
>> after having created it. By “the code does not work”, I mean it stalls
>> on the first call to fi_recv.
>> Should I add both mentioned issues to the GitHub issue tracker?
> Yes, please open as many issues as you find. :)
> Note that the intent is to replace the verbs RDM endpoint support with the ofi-rdm provider, assuming it provides equivalent performance. So, we'd like to understand why the ofi-rdm;verbs combo doesn't work for you.
> - Sean