<div dir="ltr">Hi Sean,<div><br></div><div>Thanks for the answer!<br><div><br></div><div>> <span style="font-size:12.8px">Please look at the code in prov/util for help.  The socket code was designed around using it as a development tool, so I wouldn't recommend trying to copy its implementation.</span></div><div><br></div><div>I decided to analyze the sockets provider because it implements full tx/rx processing at the provider level, which we also need.<br></div><div><br></div><div>> <span style="font-size:12.8px">If you are attempting to implement reliable-datagram semantics, then the use of lists may be better than a queue.  Messages may complete out of order when targeting different peers.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">Yes, I'm working on an FI_EP_RDM implementation. I hadn't thought about out-of-order message completion before your advice, b</span><span style="font-size:12.8px">ut lists fit this case really well. Thank you!</span></div><div><span style="font-size:12.8px"><br></span></div><div>> <span style="font-size:12.8px">Depending on your provider, you may also be able to take advantage of the utility providers.</span></div><div><span style="font-size:12.8px"><br></span></div><div>I noticed the utility provider code and am trying to use its functions where possible.</div></div><div><br></div><div>> <span style="font-size:12.8px">Yes, the app calling cq_read needs to be sufficient to drive progress.  Note that this is expected by the app in the manual progress mode even if no completions are expected.</span></div><div><br></div><div>As far as I understand, calling <span style="font-size:12.8px">cq_read</span> is sufficient but not necessary for the application, since other libfabric API functions (for example, fi_cntr_read) can also drive data progress in FI_PROGRESS_MANUAL mode. 
</div><div>Maybe I should ask this on the MPICH/OpenMPI development mailing lists, but is it possible to run these MPI implementations over a provider that supports only the fi_cq part of the API for manual data progress?</div><div><br></div><div>BR,</div><div>Mikhail Khalilov</div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-03-16 19:50 GMT+03:00 Hefty, Sean <span dir="ltr"><<a href="mailto:sean.hefty@intel.com" target="_blank">sean.hefty@intel.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">copying ofiwg -- that mail list is better suited for your questions.<br>

<span class=""><br>

> My group is working on implementing a new libfabric provider for our HPC<br>

> interconnect. Our current main goal is to run MPICH and OpenMPI over<br>

> this provider.<br>

<br>

</span>welcome!<br>

<span class=""><br>

> The problem is that this NIC doesn't have any software or hardware rx/tx<br>

> queues for send/recv operations. We've decided to implement them at the<br>

> libfabric provider level. So, I'm looking for a data structure for queue<br>

> storage and processing.<br>

><br>

> I took a look at the sockets provider code. As far as I understand, tx_ctx<br>

> stores pointers to all information (flags, data, src_address, etc.)<br>

> about every message to send in a ring buffer, but rx_ctx stores every<br>

> rx_entry in a doubly-linked list. What was the motivation for choosing<br>

> different data structures for the queues used to process tx and<br>

> rx?<br>

<br>

</span>Please look at the code in prov/util for help.  The socket code was designed around using it as a development tool, so I wouldn't recommend trying to copy its implementation.<br>

<br>

The udp provider is a good place to start for how to construct a very simple software provider.  You may also want to scan the include/ofi_xxx.h files for helpful abstractions.  There's a slightly out of date document in docs/providers that describes what's available.  ofi_list.h and ofi_mem.h both have useful abstractions.<br>

<span class=""><br>

> Maybe you can give advice on the implementation of queues or give some<br>

> useful information on this topic?<br>

<br>

</span>If you are attempting to implement reliable-datagram semantics, then the use of lists may be better than a queue.  Messages may complete out of order when targeting different peers.<br>
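<br>
To illustrate the point about lists, here is a minimal sketch in plain C. It assumes a hypothetical intrusive doubly-linked list and a hypothetical tx_entry type; none of these names come from libfabric or ofi_list.h. It only shows why removing an entry from the middle of a list suits completions that arrive out of posting order across peers.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical intrusive list node; NOT the ofi_list.h API. */
struct node {
    struct node *prev, *next;
};

static void list_init(struct node *head) { head->prev = head->next = head; }

static int list_empty(const struct node *head) { return head->next == head; }

static void list_add_tail(struct node *head, struct node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* O(1) removal from any position, which a ring buffer cannot offer. */
static void list_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

/* Hypothetical pending-send record. 'link' is the first member, so a
 * pointer to the node can be cast back to the enclosing entry. */
struct tx_entry {
    struct node link;
    int peer;
    int msg_id;
};

/* Append a send to the pending list in posting order. */
static void post_send(struct node *pending, struct tx_entry *e,
                      int peer, int msg_id)
{
    e->peer = peer;
    e->msg_id = msg_id;
    list_add_tail(pending, &e->link);
}

/* Remove and return the oldest pending entry for 'peer', or NULL.
 * Ordering is preserved per peer, but not across peers. */
static struct tx_entry *complete_for_peer(struct node *pending, int peer)
{
    struct node *n;

    for (n = pending->next; n != pending; n = n->next) {
        struct tx_entry *e = (struct tx_entry *)n;
        if (e->peer == peer) {
            list_del(n);
            return e;
        }
    }
    return NULL;
}
```

With sends posted to peers 1, 2, 1 in that order, a completion for peer 2 can be retired first without disturbing the two entries still pending for peer 1.<br>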

<br>

Depending on your provider, you may also be able to take advantage of the utility providers.  RxM will implement reliable-datagram support over reliable-connections.  That is functional today.  RxD targets reliable-datagram over unreliable-datagram.  That is a work in progress, however.<br>

<span class=""><br>

> The second problem is choosing a suitable progress model. For CPU<br>

> performance reasons I want to choose FI_PROGRESS_MANUAL as the primary<br>

> mode for processing asynchronous requests, but I do not<br>

> quite understand how an application thread provides data progress. For<br>

> example, is it enough to call fi_cq_read() from the MPI implementation<br>

> whenever it wants to make progress?<br>

<br>

</span>Yes, the app calling cq_read needs to be sufficient to drive progress.  Note that this is expected by the app in the manual progress mode even if no completions are expected.<br>
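<br>
One way to picture this requirement is the sketch below, in plain C. Everything in it is hypothetical (the sketch_ep structure, the sketch_* functions, the fixed queue depth) and it is not how any real provider is written. It only shows the idea that the read call itself runs the progress engine, so posted work advances whenever the application polls, even when it expects no completions.<br>

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 8

/* Hypothetical endpoint state; none of these names are libfabric API. */
struct sketch_ep {
    int posted[QUEUE_DEPTH];     /* submitted, not yet processed */
    size_t n_posted;
    int completed[QUEUE_DEPTH];  /* completions waiting for the app */
    size_t n_completed;
};

/* Queue an operation. No work is done here: in a manual progress
 * model nothing advances until the application polls. */
static int sketch_post(struct sketch_ep *ep, int op_id)
{
    if (ep->n_posted == QUEUE_DEPTH)
        return -1;
    ep->posted[ep->n_posted++] = op_id;
    return 0;
}

/* Progress engine: process posted operations in order while there is
 * room to report their completions, then keep the remainder queued. */
static void sketch_progress(struct sketch_ep *ep)
{
    size_t done = 0, i;

    while (done < ep->n_posted && ep->n_completed < QUEUE_DEPTH)
        ep->completed[ep->n_completed++] = ep->posted[done++];

    for (i = done; i < ep->n_posted; i++)
        ep->posted[i - done] = ep->posted[i];
    ep->n_posted -= done;
}

/* Read up to 'count' completions into 'buf'. Progress is driven first,
 * so calling this is enough to move posted work forward. */
static size_t sketch_cq_read(struct sketch_ep *ep, int *buf, size_t count)
{
    size_t n, i;

    sketch_progress(ep);

    n = ep->n_completed < count ? ep->n_completed : count;
    for (i = 0; i < n; i++)
        buf[i] = ep->completed[i];
    for (i = n; i < ep->n_completed; i++)
        ep->completed[i - n] = ep->completed[i];
    ep->n_completed -= n;
    return n;
}
```

In this sketch an application that never calls sketch_cq_read() never sees its posted operations advance, which is the behaviour the manual progress mode expects the application to account for.<br>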

<span class="HOEnZb"><font color="#888888"><br>

- Sean<br>

</font></span></blockquote></div><br></div>