[libfabric-users] connection-less send/recv with verbs

Maurizio Drocco drocco at di.unito.it
Sat Jul 22 04:09:59 PDT 2017


Hello Sean,

After spending some time playing with the “control services” (fi_getinfo etc.), I concluded that all the “issues” I mentioned in my previous emails boil down to the following two aspects:

1. the “verbs” provider seems to have a number of limitations (i.e., mismatches w.r.t. the specification), whereas the “verbs;ofi_rxm” provider works in all the cases I tested; my code stresses connectionless communication among highly asynchronous processes (i.e., no initial handshake), with multiple node:port addresses per process (i.e., multiple endpoints per domain);

2. from the documentation, it is quite hard (actually impossible for me) to understand how to bind an address to an endpoint; by reading the fabtests code and experimenting on my own, I concluded that the binding is done either statically (at endpoint creation time, by passing an info structure with src_addr set) or dynamically (by calling fi_setname); however, the documentation is vague at best on this point: for instance, I could not find anything about the static binding.
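
For illustration, the two binding styles from point 2 might be sketched in C roughly as follows. This is only a sketch under stated assumptions, not the fabtests code or the code from my tests: make_src_addr, open_ep_static, open_ep_dynamic and the USE_LIBFABRIC build macro are hypothetical names introduced here, the address format is assumed to be IPv4 (FI_SOCKADDR_IN), and whether a given provider actually honors src_addr at fi_endpoint time is exactly the part the documentation leaves unclear.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of libfabric): fill an IPv4 sockaddr for a
 * node:port pair. Returns 0 on success, -1 if `node` is not a dotted-quad
 * IPv4 address. */
int make_src_addr(const char *node, uint16_t port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port);
    return inet_pton(AF_INET, node, &out->sin_addr) == 1 ? 0 : -1;
}

#ifdef USE_LIBFABRIC /* assumption: build with -DUSE_LIBFABRIC -lfabric */
#include <stdlib.h>
#include <rdma/fabric.h>
#include <rdma/fi_cm.h>
#include <rdma/fi_endpoint.h>
#include <rdma/fi_errno.h>

/* "Static" binding: the endpoint is created from an info structure whose
 * src_addr is already set. `base` is an info previously returned by
 * fi_getinfo() for the same domain. */
int open_ep_static(struct fid_domain *domain, const struct fi_info *base,
                   const struct sockaddr_in *sa, struct fid_ep **ep)
{
    struct fi_info *info = fi_dupinfo(base);
    if (!info)
        return -FI_ENOMEM;

    /* fi_freeinfo() frees src_addr, so it has to be heap-allocated. */
    free(info->src_addr);
    info->src_addrlen = 0;
    info->src_addr = malloc(sizeof *sa);
    if (!info->src_addr) {
        fi_freeinfo(info);
        return -FI_ENOMEM;
    }
    memcpy(info->src_addr, sa, sizeof *sa);
    info->src_addrlen = sizeof *sa;
    info->addr_format = FI_SOCKADDR_IN;

    int ret = fi_endpoint(domain, info, ep, NULL);
    fi_freeinfo(info);
    return ret;
}

/* "Dynamic" binding: create the endpoint first, then assign its address
 * with fi_setname() (to be done before the endpoint is enabled). */
int open_ep_dynamic(struct fid_domain *domain, struct fi_info *info,
                    const struct sockaddr_in *sa, struct fid_ep **ep)
{
    int ret = fi_endpoint(domain, info, ep, NULL);
    if (ret)
        return ret;

    ret = fi_setname(&(*ep)->fid, (void *)sa, sizeof *sa);
    if (ret) {
        fi_close(&(*ep)->fid);
        *ep = NULL;
    }
    return ret;
}
#endif /* USE_LIBFABRIC */
```

Without USE_LIBFABRIC only the address helper is compiled, so the sketch builds on a machine without libfabric installed. In the multiple-endpoints-per-domain case, either function would be called once per node:port pair on the same domain.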

If someone is still interested in debugging the verbs provider, I wrote some minimal tests ranging from a fabtests-equivalent pingpong to its multiple-endpoints-per-domain variant:
http://alpha.di.unito.it:8080/drocco/fabspikes
All of them work with the “sockets” and “verbs;ofi_rxm” providers, whereas the “verbs” provider fails in most cases.

Thank you again for your support :)

Maurizio
---
Maurizio Drocco
PhD Candidate
University of Torino, department of Computer Science
Via Pessinetto 12, 10149 Torino - Italy

> On 19 Jul 2017, at 17:47, Hefty, Sean <sean.hefty at intel.com> wrote:
> 
>> Another issue that I experience with the verbs provider is that I am
>> not able to instantiate two endpoints to a single domain.
>> Starting from a code working correctly on verbs with a single endpoint
>> e0, if I create an additional endpoint e1 to the same domain as e0,
>> the code does not work anymore, even if I do not access e1 at all
>> after having created it. By “the code does not work”, I mean it stalls
>> on the first call to fi_recv.
>> 
>> Should I add both mentioned issues to the GitHub issue tracker?
> 
> Yes, please open as many issues as you find.  :)
> 
> Note that the intent is to replace the verbs RDM endpoint support with the ofi-rdm provider, assuming it provides equivalent performance.  So, we'd like to understand why the ofi-rdm;verbs combo doesn't work for you.
> 
> - Sean
