[ofiwg] send/recv "credits"

Sat Sep 27 09:59:07 PDT 2014

</on soapbox>
Honestly, I think an app designed for EAGAIN will perform best.  A credit-based scheme introduces more branches, additional memory reads and writes, and arithmetic operations.  EAGAIN optimizes for the common case of having queue space available.
</off soapbox>

From the perspective of an application, it submits an operation followed by some number of IO vectors.  (The only IOV defined now is an SGE.)  I think the following mechanism allows for generic transmit 'credit' handling.

A transmit context (see my other email) has these attributes:

size - The size of the transmit side window, in bytes.  (This is the ctx_size used below.)

iov_limit - The maximum number of IO vectors that a single operation can support.

op_alignment - Any alignment restriction on the amount of window space consumed for a single operation.

A domain has these attributes:

op_size - The amount of window space (out of ctx_size) consumed for a single data transfer operation.  Conceptually, this is the size of any command header required to track the request.

iov_size - The amount of window space consumed for each IOV.

An advanced application can use these values to determine if it has sufficient window space available to issue an operation.  For IB/iWarp, the provider can set:

	op_size = ctx_size / nentries
	iov_size = 0
	op_alignment = op_size

An app check of:

	op_alignment >= op_size + iov_size * iov_limit

indicates that a simple credit-based scheme will work, with credits = ctx_size / op_alignment.  For usnic, the provider can set:

	op_size = 0
	iov_size = ctx_size / nentries
	op_alignment = iov_size
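
The app-side check against provider settings like the two above could be sketched in C roughly as follows.  This is only an illustration: the struct tx_window_attr and the function simple_credits() are hypothetical names that just bundle the attributes listed in this email, not an existing interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the attributes named in this email. */
struct tx_window_attr {
	size_t ctx_size;     /* transmit window size, in bytes */
	size_t iov_limit;    /* max IO vectors per operation */
	size_t op_alignment; /* alignment of window space used per operation */
	size_t op_size;      /* window space consumed per operation */
	size_t iov_size;     /* window space consumed per IOV */
};

/*
 * Returns the number of credits if a simple credit scheme works
 * (one aligned slot always holds the largest possible operation),
 * or 0 if the app has to track the window in bytes instead.
 */
static size_t simple_credits(const struct tx_window_attr *a)
{
	assert(a->op_alignment > 0);
	if (a->op_alignment >= a->op_size + a->iov_size * a->iov_limit)
		return a->ctx_size / a->op_alignment;
	return 0;
}
```

With the IB/iWarp style settings the check passes for any iov_limit, since iov_size is 0.  With the usnic style settings it only passes when iov_limit is 1; otherwise the app falls back to byte-level window tracking.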

These are just examples.  The only requirement is that an app be able to track the available window to prevent overruns.  Windowing can be handled generically:

	needed = ((op_size + iov_size * nsge) + op_alignment - 1) & ~(op_alignment - 1)

assuming that op_alignment is a power of 2, and my math doesn't suck.  An app can pre-calculate any common cases, such as nsge == 1. 
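
As a minimal sketch, the calculation above maps to C like this.  The function name window_needed() is made up for illustration; the assert encodes the power-of-2 assumption stated above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Window space consumed by one operation carrying nsge IO vectors,
 * rounded up to op_alignment.  Assumes op_alignment is a power of 2.
 */
static size_t window_needed(size_t op_size, size_t iov_size,
			    size_t nsge, size_t op_alignment)
{
	/* A power of 2 has exactly one bit set. */
	assert(op_alignment != 0 &&
	       (op_alignment & (op_alignment - 1)) == 0);

	return (op_size + iov_size * nsge + op_alignment - 1) &
	       ~(op_alignment - 1);
}
```

For example, with op_size = 16, iov_size = 8, nsge = 3 and op_alignment = 32, the raw size is 40 bytes and the aligned result is 64.  An app could call this once at setup for nsge == 1 and cache the value.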

The handling of inject() (i.e. inline_data) is a derivation of this, where the needed window size is op_size + inject_size, subject to alignment restrictions.
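
A rough sketch of the inject case, under the assumption that "subject to alignment restrictions" means rounding up to op_alignment the same way as for a normal operation.  The name inject_needed() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Window space consumed by an inject of inject_size bytes.
 * Assumption: the same power-of-2 op_alignment rounding applies
 * as for regular operations.
 */
static size_t inject_needed(size_t op_size, size_t inject_size,
			    size_t op_alignment)
{
	assert(op_alignment != 0 &&
	       (op_alignment & (op_alignment - 1)) == 0);

	return (op_size + inject_size + op_alignment - 1) &
	       ~(op_alignment - 1);
}
```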

Of course, all of this only works if the transmit window is a fixed size.  :)  But it enables a provider to make more efficient use of command queue space.  Because of the various completion mechanisms, the app would be responsible for determining when and how much window space is freed.
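
To make the app's side of this concrete, here is one possible way to track a fixed-size window.  Everything here is an assumption for illustration: struct tx_window, tx_window_reserve() and tx_window_release() are hypothetical names, and how the app learns how much space a completion frees is left open, as noted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical app-side view of a fixed-size transmit window. */
struct tx_window {
	size_t size;   /* total window size in bytes (ctx_size) */
	size_t in_use; /* bytes consumed by outstanding operations */
};

/* Claim 'needed' bytes before posting; false means the op would overrun. */
static bool tx_window_reserve(struct tx_window *w, size_t needed)
{
	if (needed > w->size - w->in_use)
		return false;
	w->in_use += needed;
	return true;
}

/* Return 'freed' bytes once the app determines they have completed. */
static void tx_window_release(struct tx_window *w, size_t freed)
{
	assert(freed <= w->in_use);
	w->in_use -= freed;
}
```

The app would call tx_window_reserve() with the per-operation needed value before each post, and tx_window_release() from whatever completion path it uses.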

My head hurts.

- Sean