[openib-general] [RFC] libibverbs completion event handling

Michael Krause krause at cup.hp.com
Thu Sep 22 08:38:44 PDT 2005


At 03:33 PM 9/21/2005, Caitlin Bestler wrote:
>I'm not sure I follow what a "completion channel" is.
>My understanding is that work completions are stored in
>user-accessible memory (typically a ring buffer). This
>enables fast-path reaping of work completions. The OS
>has no involvement unless notifications are enabled.
>
>The "completion vector" is used to report completion
>notifications. So is the completion vector a *single*
>resource used by the driver/verbs to report completions,
>where said notifications are then split into user
>context dependent "completion channels"?
>
>The RDMAC verbs did not define callbacks to userspace
>at all. Instead it is assumed that the proxy for user
>mode services will receive the callbacks, and how it
>relays those notifications to userspace is outside
>the scope of the verbs.

Correct.


>Both uDAPL and ITAPI define relays of notifications
>to AEVDS/CNOs and/or file descriptors. Forwarding
>a completion notification to userspace in order to
>make a callback in userspace so that it can kick
>an fd to wake up another thread doesn't make much
>sense. The uDAPL/ITAPI/whatever proxy can perform
>all of these functions without any device dependencies
>and in a way that is fully optimal for the usermode
>API that is being used.

Exactly. This was the intention.  It does not really matter what the API is,
only that there be an API that does this work on behalf of the consumer.

>For kernel clients, I don't see any need for anything beyond the already 
>defined
>callbacks direct from the device-dependent code.

This was the intention when we designed the verbs.

>Even in the typical case where the usermode application
>does an evd_wait() on the DAT or ITAPI endpoint, the
>DAT/ITAPI proxy will be able to determine which thread
>should be woken and could even do so optimally. It
>also allows the proxy to implement Access Layer features
>such as EVD thresholding without device-specific support.

Correct.


> > -----Original Message-----
> > From: openib-general-bounces at openib.org
> > [mailto:openib-general-bounces at openib.org] On Behalf Of Roland Dreier
> > Sent: Wednesday, September 21, 2005 12:22 PM
> > To: openib-general at openib.org
> > Subject: [openib-general] [RFC] libibverbs completion event handling
> >
> > While thinking about how to handle some of the issues raised
> > by Al Viro in <http://lkml.org/lkml/2005/9/16/146>, I
> > realized that our verbs interface could be improved to make
> > delivery of completion events more flexible.  For example,
> > Arlin's request for using one FD for each CQ can be
> > accommodated quite nicely.
> >
> > The basic idea is to create new objects that I call
> > "completion vectors" and "completion channels."  Completion
> > vectors refer to the interrupt generated when a completion
> > event occurs.  With the current drivers, there will always be
> > a single completion vector, but once we have full MSI-X
> > support, multiple completion vectors will be possible.

When I proposed the use of multiple completion handlers, it was based on
the operating assumption that either MSI or MSI-X would be used by the
underlying hardware.  Either is possible: MSI limits the device to a single
address with 32 data values, which allows a different handler to be bound to
each value, though all of them target a single processor.  MSI-X builds upon
technology we've been shipping for nearly 20 years now and allows up to 2048
different addresses, which may target one or multiple processors.  Any API
should be able to deal with both approaches and thus should not assume
anything about whether one or more handlers are bound to a given processor.
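[To make the "don't assume anything about vector-to-processor binding"
point concrete: a consumer can still spread its CQs across however many
completion vectors the device exposes, whether that is MSI's single vector
or many MSI-X vectors.  A minimal sketch; the helper name
comp_vector_for_cq is hypothetical, not part of any proposed API.]

```c
#include <assert.h>

/* Hypothetical helper: pick a completion vector for the i-th CQ by
 * simple round-robin over the vectors the device reports.  With plain
 * MSI num_comp_vectors is typically 1, so everything falls on vector 0;
 * with MSI-X it may be much larger.  No processor affinity is assumed. */
int comp_vector_for_cq(int cq_index, int num_comp_vectors)
{
        if (num_comp_vectors <= 0)
                return 0;                       /* degenerate: single vector */
        return cq_index % num_comp_vectors;     /* round-robin spread */
}
```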

> > Orthogonal to this is the notion of a completion channel.
> > This is a FD used for delivering completion events to userspace.
> >
> > Completion vectors are handled by the kernel, and userspace
> > cannot change the number of vectors that are available.  On the
> > other hand, completion channels are created at the request of
> > a userspace process, and userspace can create as many
> > channels as it wants.
> >
> > Every userspace CQ has a completion vector and a completion channel.
> > Multiple CQs can share the same completion vector and/or the
> > same completion channel.  CQs with different completion
> > vectors can still share a completion channel, and vice versa.
> >
> > The exact API would be something like the below.  Thoughts?

Why wouldn't it just be akin to the verbs interface - here are the event
handler and callback routines to associate with a given CQ.  The handler
might be nothing more than an index into a set of functions stored within
the kernel - these functions are either device-specific (i.e. supplied by
the IHV) or OS-specific, such as those dealing with error events (which
might also have a device-specific component).  When the routine is invoked,
it basically has three parameters: the CQ to target, the number of CQEs to
reap, and the address at which to store the CQEs.  I do not see what more
is required.
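[The handler-index scheme described above can be sketched as a fixed
dispatch table: the kernel owns the table, a CQ stores only an index, and
invocation passes exactly the three parameters mentioned.  All names here
(handler_table, dispatch_completion, MAX_HANDLERS) are hypothetical
illustrations, not an existing kernel interface.]

```c
#include <assert.h>
#include <stddef.h>

/* Handler signature per the three parameters above:
 * the CQ to target, the number of CQEs to reap, where to store them. */
typedef void (*comp_handler_t)(void *cq, int num_cqe, void *cqe_buf);

#define MAX_HANDLERS 8

/* Table of IHV- or OS-supplied handlers, owned by the kernel in this
 * sketch; a CQ would record only an index into it. */
static comp_handler_t handler_table[MAX_HANDLERS];

/* Invoke the handler a CQ is bound to; -1 if the index is unregistered. */
static int dispatch_completion(int handler_index, void *cq,
                               int num_cqe, void *cqe_buf)
{
        if (handler_index < 0 || handler_index >= MAX_HANDLERS ||
            !handler_table[handler_index])
                return -1;
        handler_table[handler_index](cq, num_cqe, cqe_buf);
        return 0;
}
```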

Mike

> >
> > Thanks,
> >   Roland
> >
> > struct ibv_comp_channel {
> >       int                     fd;
> > };
> >
> > /**
> >  * ibv_create_comp_channel - Create a completion event channel
> >  */
> > extern struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context);
> >
> > /**
> >  * ibv_destroy_comp_channel - Destroy a completion event channel
> >  */
> > extern int ibv_destroy_comp_channel(struct ibv_comp_channel *channel);
> >
> > /**
> >  * ibv_create_cq - Create a completion queue
> >  * @context - Context CQ will be attached to
> >  * @cqe - Minimum number of entries required for CQ
> >  * @cq_context - Consumer-supplied context returned for completion events
> >  * @channel - Completion channel where completion events will be queued.
> >  *     May be NULL if completion events will not be used.
> >  * @comp_vector - Completion vector used to signal completion events.
> >  *     Must be >= 0 and < context->num_comp_vectors.
> >  */
> > extern struct ibv_cq *ibv_create_cq(struct ibv_context *context, int cqe,
> >                                     void *cq_context,
> >                                     struct ibv_comp_channel *channel,
> >                                     int comp_vector);
> >
> > /**
> >  * ibv_get_cq_event - Read next CQ event
> >  * @channel: Channel to get next event from.
> >  * @cq: Used to return pointer to CQ.
> >  * @cq_context: Used to return consumer-supplied CQ context.
> >  *
> >  * All completion events returned by ibv_get_cq_event() must
> >  * eventually be acknowledged with ibv_ack_cq_events().
> >  */
> > extern int ibv_get_cq_event(struct ibv_comp_channel *channel,
> >                             struct ibv_cq **cq, void **cq_context);
> >
> > _______________________________________________
> > openib-general mailing list
> > openib-general at openib.org
> > http://openib.org/mailman/listinfo/openib-general
> >
> > To unsubscribe, please visit
> > http://openib.org/mailman/listinfo/openib-general
> >
> >
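[Since a completion channel in the proposal above is just a file
descriptor, it composes with an ordinary poll()/select() event loop - which
is exactly what lets one FD serve one CQ, or several.  A minimal runnable
sketch of the readiness check; a pipe would stand in for channel->fd when
no RDMA device is present, and the helper name channel_has_event is
hypothetical.]

```c
#include <poll.h>

/* Wait up to timeout_ms for a completion event on the channel's fd.
 * Returns 1 if an event is queued, 0 on timeout, -1 on error. */
int channel_has_event(int fd, int timeout_ms)
{
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int rc = poll(&pfd, 1, timeout_ms);

        if (rc < 0)
                return -1;
        return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

Once the fd is readable, the consumer would call ibv_get_cq_event() to
learn which CQ fired, acknowledge it with ibv_ack_cq_events(), and then
reap completions from userspace via the CQ's fast path.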
>