[ofiwg] credits proposal
Reese Faucette (rfaucett)
rfaucett at cisco.com
Mon Dec 8 08:56:50 PST 2014
After a bit of off-list back and forth on the credit / send / recv API, here's my best shot:
First, a definition: the "size" of a send/recv context (aka queue for many providers) is the number of "MAX size" operations that can be posted and pending. For IB, this is probably the number of WQEs/RQEs; for usnic it is the number of HW queue entries divided by iov_limit. So, with no API changes, we get at least to where verbs is now: easy accounting (just --x on post, ++x on completion), but providers like usnic pay a penalty in wasted queue space when iov_limit is large.
This "size" should be as conservative as possible, accounting for injected data, immediate data, atomic this and that, everything. If size is N, an application is guaranteed to be able to perform N sends/recvs/writes/reads with any legal parameters.
I propose adding two calls, fi_rx_size_left() and fi_tx_size_left(), that return the number of "MAX size" operations that can currently be posted, given what is already posted and pending. Note that the parameters of a MAX size operation are already reported via fi_info (iov_limit, inject_size, max_msg_size, etc.). This eliminates the "divide queue depth by N" penalty paid by providers where send/recv operations consume variable amounts of hardware resources.
- apps that just want EAGAIN work fine
- apps that want to manage credits but care more about cycles than queue depth can just use --/++ on credits and be safe
- apps that want to manage credits and care about maximizing queue depth can use:
if (fi_tx_size_left(ep) > 0) fi_send(); else queue_it();
The reason the 3rd bullet works is that the app may call fi_tx_size_left() and get back, say, 5, then do 3 "small" sends, and fi_tx_size_left() might then return 4: size_left() will always return a number >= the value the app would get from simply counting the number of operations posted vs. completions. I cannot think of a provider whose internal accounting could not be represented this way.
I'm hoping we can use this to answer both:
- how do we define/specify/learn "size" of a tx/rx context?
- how can the app manage send/recv credits cleanly and efficiently?
size_t fi_rx_size_left(struct fid_ep *ep);
size_t fi_tx_size_left(struct fid_ep *ep);
Sean correctly points out that one weakness here is a provider where a single "max sized send/recv" may consume *all* resources, such that fi_tx_size_left() returns 1 on an empty queue. I think there are a couple of ways of handling that, "fall back to EAGAIN" being the simplest. Or, worst case, if there is a compelling use case for this, we could add an extended version of size_left() that takes some additional arguments to specify the operation as something smaller than max size, but that additional complexity is what stalled us earlier and what this approach is trying to avoid.