[ofiwg] send/recv "credits"

Thu Oct 9 05:29:10 PDT 2014

On Oct 9, 2014, at 3:04 AM, Reese Faucette (rfaucett) <rfaucett at cisco.com> wrote:

>> Specifically, why does the app need to know if its next post operation will
>> return EAGAIN or not?
> 
> Take a look at the usnic BTL for Open MPI.  Once it learns it has credits to send a packet, it potentially does a lot of work: building the packet from the SGL, updating counters and pointers, updating sequence numbers, possibly piggy-backing ACKs, etc.  That is pretty messy to unravel if you get EAGAIN.  A simple suggestion is "leave all that bookkeeping work done, don't unwind, just queue the send and re-post it when you get completions until it succeeds."  The problem with that approach is that the world may have changed in the interim, and this may no longer be the next send you really want to post, but you are committed since all the accounting has been done (or you defer the rewind, ugh, lots of state).
> 
> It's not just usnic, either - the openib BTL also manages credits itself and blows up if ibv_post_send() fails.  The portals BTL is the same.
> 
>> Any credit based solution also needs to incorporate 'injected' (aka inline) data.  
> 
> Yes, I figured those "cost" functions could need more arguments / semantic richness to be effective, the function prototypes were conceptual.
> 
> I totally agree a simple scalar credit mechanism cannot be applied everywhere.  In the old Myricom days, we had a send semantic where you had a limited total number of send bytes pending, and also a limited number of send operations pending, so a single compare would not suffice - had to be two: (send_len <= send_space && desc_avail > 0).
> 
> That said, I think the number of provider interfaces where it cannot be applied seems small compared to where it can work - InfiniBand? yes; usnic? yes; sockets? yes; PSM? Don't know.
> 
> So far support for this has been mostly Jason and me talking with app-writer hats on, claiming that this is an app-driven request, and Sean telling us to learn to write better apps :).  Anyone else care to weigh in from an app-writer perspective?  "No, I always eagerly try to send until EAGAIN, then queue it, works fine."  or "yes, I really need to know before I attempt the send because XXXX" ?
> 
> Thanks,
> -reese

I prefer not to manage credits/slots/foo. I assume that in the common case resources are available and I unwind for EAGAIN/ENOBUFS in the uncommon case.
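
A minimal sketch of that pattern in C, using a mock queue in place of any real provider. The names mock_txq, mock_post_send, app_send, and app_on_completion are hypothetical, not libfabric or verbs calls. The one assumption is that a post either succeeds or returns -EAGAIN without side effects.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a provider transmit queue (not a real API). */
struct mock_txq {
    int in_flight;  /* posted operations that have not completed yet */
    int depth;      /* maximum operations allowed in flight */
};

/* Mock post: succeeds, or returns -EAGAIN and changes nothing. */
static int mock_post_send(struct mock_txq *q, const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    if (q->in_flight >= q->depth)
        return -EAGAIN;
    q->in_flight++;
    return 0;
}

#define MAX_PENDING 16

struct pending {
    const void *buf;
    size_t len;
};

struct app {
    struct mock_txq *q;
    struct pending backlog[MAX_PENDING];  /* FIFO ring of deferred sends */
    int head;
    int count;
};

/* Common case: post right away. Uncommon case: defer on -EAGAIN. */
static int app_send(struct app *a, const void *buf, size_t len)
{
    int tail;

    /* Only post directly when nothing is deferred, to preserve ordering. */
    if (a->count == 0) {
        int rc = mock_post_send(a->q, buf, len);
        if (rc != -EAGAIN)
            return rc;
    }
    if (a->count == MAX_PENDING)
        return -ENOBUFS;
    tail = (a->head + a->count) % MAX_PENDING;
    a->backlog[tail].buf = buf;
    a->backlog[tail].len = len;
    a->count++;
    return 0;
}

/* Called when one send completes: free its slot, then drain the backlog. */
static void app_on_completion(struct app *a)
{
    assert(a->q->in_flight > 0);
    a->q->in_flight--;
    while (a->count > 0) {
        struct pending *p = &a->backlog[a->head];
        if (mock_post_send(a->q, p->buf, p->len) == -EAGAIN)
            break;
        a->head = (a->head + 1) % MAX_PENDING;
        a->count--;
    }
}
```

The app keeps no credit count of its own here. The only extra state is the backlog, and it is touched only on the uncommon path.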

I am skeptical of the usnic BTL example. The BTL has to provide reliability, which is not the case for most apps. If the libfabric provider provides reliability, then all I, as the app-writer, need to do is manage my own state, which I assert is less complicated if I am not implementing a reliability protocol. The provider can check the send arguments, determine the resource requirements, and return failure if they are not available, all before updating counters, sequence numbers, acks, etc.
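
As a rough sketch of that last point, a provider could do every check first and commit state only afterward. Everything here is hypothetical: prov_ep, prov_send, and the two limits are invented for illustration, borrowing the bytes-plus-descriptors model Reese describes above. It is not how any actual provider is written.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical provider-side endpoint state (illustrative only). */
struct prov_ep {
    size_t   send_space;  /* bytes still available for pending sends */
    int      desc_avail;  /* free send descriptors */
    uint32_t next_seq;    /* sequence number for the next packet */
    uint32_t acks_owed;   /* ACKs waiting to be piggy-backed */
};

static int prov_send(struct prov_ep *ep, const void *buf, size_t len,
                     uint32_t *seq_out)
{
    /* Phase 1: validate arguments and check resources.
     * Nothing is modified on any failure path. */
    if (ep == NULL || seq_out == NULL || (buf == NULL && len > 0))
        return -EINVAL;
    if (len > ep->send_space || ep->desc_avail <= 0)
        return -EAGAIN;

    /* Phase 2: commit. From here on the send cannot fail,
     * so there is never anything to unwind. */
    ep->send_space -= len;
    ep->desc_avail--;
    assert(ep->desc_avail >= 0);
    *seq_out = ep->next_seq++;
    ep->acks_owed = 0;  /* pending ACKs ride along with this packet */
    return 0;
}
```

With this split, a caller that gets -EAGAIN sees the same counters and sequence number as before the call, so it has nothing to rewind.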

That said, having written apps using sockets, MX, Portals3, Verbs, and uGNI, I adapt to whatever the interface gives me.

> 
>> -----Original Message-----
>> From: Hefty, Sean [mailto:sean.hefty at intel.com]
>> Sent: Wednesday, October 08, 2014 4:26 PM
>> To: Reese Faucette (rfaucett); Jason Gunthorpe
>> Cc: Sur, Sayantan; Doug Ledford; Jeff Squyres (jsquyres);
>> ofiwg at lists.openfabrics.org
>> Subject: RE: [ofiwg] send/recv "credits"
>> 
>> It would be helpful to understand better the application usage model here.
>> Specifically, why does the app need to know if its next post operation will
>> return EAGAIN or not?
>> 
>> Exposing attributes to the app is relatively low burden on the providers.
>> Exposing an API that could be called at any point has the potential of
>> negatively impacting performance.  For example, it could result in
>> serialization between the posting of operations and their completion, with
>> a significant impact dealing with requests that will not generate a
>> completion.
>> 
>> Based on various application requirements, the libfabric API has evolved to
>> contain the following objects.
>> 
>> Endpoint - an endpoint is associated with a transport level address
>> Transmit context - i.e. send queue, only used by advanced apps
>> Receive context - i.e. receive queue, only used by advanced apps
>> 
>> (Transmit and receive contexts are exposed using the struct fid_ep.)
>> Ultimately, there will be a many-to-many relationship between contexts
>> and endpoints.  The semantic that transmit and receive contexts are fixed
>> sized queues is something that I was hoping to move away from.  (Though,
>> to be fair, I was also hoping to hide the transmit and receive contexts
>> completely.)
>> 
>> Any credit based solution also needs to incorporate 'injected' (aka inline)
>> data.  Injected buffers are not necessarily referenced using an IOV, but may
>> instead require copying the data directly into the transmit queue.  The size
>> of the consumed queue space may also be dependent on the operation, not
>> the IOV.  Some of the complex atomic operations that have been defined
>> could easily consume 2-3 times more space in a transmit queue than a
>> simpler operation.
>> 
>> And we haven't even touched on immediate data, which apps have
>> requested be larger than the current 32-bits.
>> 
>> - Sean