[ofiwg] send/recv "credits"

Tue Oct 7 23:04:05 PDT 2014

Following up on the discussion in today's call, I propose two new classes of calls: one to check available credits and one to check the cost of operations.  I am going to keep calling them "credits" instead of "bytes" for now, but these are arbitrary units of "something": you get some number of them on an EP, each operation you do consumes some number of them, and they are returned when the operation completes.

int fi_get_send_credits(ep);  // current number of send credits on EP
int fi_get_recv_credits(ep);  // current number of recv credits on EP

// how many credits would this send require?
int fi_sendv_cost(ep, struct iovec *iov, int iovcnt);  

// how many credits would this recv require?
int fi_recvv_cost(ep, struct iovec *iov, int iovcnt);  

(A non-SGL send/recv could be defined to use the same number of credits as the corresponding iovcnt=1 operation, but we could also add fi_send_cost(ep, buf, len).)

The user then has a lot of flexibility in how to use these, though generally I would expect the app to call fi_xxx_cost() a few times at startup for different classes of sends and cache the results.

One way would be to call fi_sendv_cost() or fi_recvv_cost() on the maximal-size operation the app will do, cache the result (max_credits_per_send below), and always test:
   if (fi_get_send_credits(ep) >= max_credits_per_send) { compose and send; }

If an app does not care about shrinking send queue depth, it is free to say:
  my_credits = fi_get_send_credits(ep) / max_credits_per_send; 
and then always increment/decrement my_credits by 1 on consumption/completion, independent of provider.
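
A minimal sketch of that normalized counter follows.  Since the fi_xxx calls above are only proposed, the sketch uses hypothetical stand-ins (mock_ep, mock_get_send_credits, mock_sendv_cost) with a made-up cost model; the send_slots helpers are also illustrative names, not part of any API.

```c
#include <assert.h>

/* Hypothetical stand-in for an endpoint; the credit total is made up. */
struct mock_ep {
    int send_credits;
};

/* Stand-in for the proposed fi_get_send_credits(ep). */
static int mock_get_send_credits(const struct mock_ep *ep)
{
    return ep->send_credits;
}

/* Stand-in for the proposed fi_sendv_cost(); this cost model
 * (2 credits of overhead plus 1 per iov entry) is invented. */
static int mock_sendv_cost(const struct mock_ep *ep, int iovcnt)
{
    (void)ep;
    return 2 + iovcnt;
}

/* App-side count of maximal-size sends that can still be posted. */
struct send_slots {
    int avail;
};

/* Query the provider once at startup and normalize to whole sends. */
static void slots_init(struct send_slots *s, const struct mock_ep *ep,
                       int max_iovcnt)
{
    int max_credits_per_send = mock_sendv_cost(ep, max_iovcnt);

    assert(max_credits_per_send > 0);
    s->avail = mock_get_send_credits(ep) / max_credits_per_send;
}

/* Returns 1 and takes a slot if a send may be posted, else 0. */
static int slots_try_consume(struct send_slots *s)
{
    if (s->avail == 0)
        return 0;
    s->avail--;
    return 1;
}

/* Call when a send completion is reaped. */
static void slots_complete(struct send_slots *s)
{
    s->avail++;
}
```

With these made-up numbers (64 credits, 6 credits per maximal send) the app gets 10 slots and leaves 4 credits unused, which is the queue depth it gives up in exchange for provider-independent bookkeeping.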

Often, sends go down different silos based on various criteria in an application anyhow, so the app could check the cost of a send for each silo and select its own send routine with constants for the credit values, avoiding the memory references.

And, of course, an app could ignore all of these calls and rely on EAGAIN.  Also, not all EPs necessarily support this mode of operation, for example EPs with multiple underlying HW resources (queues) where any given send/recv gets late-bound to one of the hidden resources.

This mode of operation would be requested via an attribute (FI_SEND_CREDITS / FI_RECV_CREDITS ?) at endpoint open.  It is an open question whether this implicitly turns off the provider's double-checking of the credits on each send (I *think* I'd like it to), whether that mode is a separate flag (FI_NO_CREDIT_CHECK?) (I'm ok with that also), or whether it's not really worth it to get the provider to forgo the check.

-r

> -----Original Message-----
> From: Hefty, Sean [mailto:sean.hefty at intel.com]
> Sent: Monday, September 29, 2014 12:01 PM
> To: Jason Gunthorpe
> Cc: Reese Faucette (rfaucett); Sur, Sayantan; Doug Ledford; Jeff Squyres
> (jsquyres); ofiwg at lists.openfabrics.org
> Subject: RE: [ofiwg] send/recv "credits"
> 
> > > Independent from EAGAIN, Does the op_size / iov_size / op_alignment
> > > proposal work for apps that want to track send queue usage separate
> > > from the provider's tracking?
> >
> > I didn't follow it too closely, sorry.  How does an app adapt a
> > provider that is telling it to use sge entries to work with a wire
> > protocol that is defined in terms of wqes?
> 
> The size of the transmit queue is reported in bytes.  An app does this check
> to determine if it can queue an entry into the transmit queue.  (An app can
> simplify this check in certain cases.)
> 
> 	needed = ((op_size + iov_size * nsge) + op_alignment - 1) &
> ~(op_alignment - 1)
> 
> For providers that support WQEs, needed = op_alignment.
> For providers that support SGEs, needed = iov_size * nsge.
> 
> This should also support providers where the size of the queue is fixed, but
> the number of entries is not.
> 
> > The remote CQ doesn't overflow because every SQE and RQE is still
> > guaranteed by the app to have an available CQE before it is posted. So
> > you are guaranteed to hit RQ exhaustion before you hit CQ exhaustion.
> 
> Libfabric supports, but does not assume a 1:1 mapping between a posted
> receive buffer and a CQE.  This allows for more efficient use of receive
> buffering, but does require a more advanced form of flow control than
> current hardware supports.
> 
> I don't want the API to assume that even a CQ has a fixed number of entries.
> An app should be able to determine the minimum number of entries any
> queue may support, without restricting all providers or applications to using
> that same model.
> 
> - Sean
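
For reference, the "needed" computation in the quoted message can be sketched as a small helper.  The function name txq_needed and the sample sizes in the note below are assumptions for illustration, and the mask trick only works if op_alignment is a power of two, which the quoted formula implies but does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of transmit queue consumed by one operation with nsge
 * scatter-gather entries, rounded up to op_alignment.
 * Assumes op_alignment is a nonzero power of two. */
static size_t txq_needed(size_t op_size, size_t iov_size, size_t nsge,
                         size_t op_alignment)
{
    assert(op_alignment != 0 && (op_alignment & (op_alignment - 1)) == 0);
    return (op_size + iov_size * nsge + op_alignment - 1)
           & ~(op_alignment - 1);
}
```

One way to read the two special cases: a WQE-style provider could report op_size = op_alignment and iov_size = 0, so every operation costs exactly op_alignment; an SGE-style provider could report op_size = 0 and op_alignment = 1, so the cost is iov_size * nsge.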