[Openib-windows] RE: Reference count as a solution to the problem of an object life time

Wed Sep 14 15:20:29 PDT 2005

> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> Sent: Wednesday, September 14, 2005 1:46 PM
> 
> This mail introduces a generic approach to solving this family of problems.
> As the subject of this mail suggests, the method is reference counting.
> Reference counting is a very simple yet extremely effective mechanism for
> managing object lifetimes in a multithreaded environment. It is used in
> many programs; the best-known example is probably COM.
> The big advantage of this method is that at each use of the object one
> doesn’t have to think about the entire problem, only about what one is
> doing with this object. Following a relatively simple set of
> (per-function) rules ensures that the object is destroyed at the correct
> time. The idea is similar to “the last person leaving the room
> should turn off the light”. Since the object itself counts the
> references to it, a user only has to tell the object when it starts using it
> and when it has finished. Once the last user has finished, the object
> destroys itself.
> 
> Each object should implement three methods: 1) AddRef(), 2) Release() and
> 3) DestructMe() (in C++ this is usually the destructor). Anyone who uses an
> object should call AddRef() before using the object, and call Release()
> when finished working with the object. The object has an integer
> field that holds its reference count. Whoever creates the object
> creates it with a reference count of 1. When the reference count drops to 0,
> DestructMe() is called (implicitly), the object is destroyed
> and its memory is released. This happens without the caller needing
> to know anything else about the object.
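For readers who want something concrete, here is a minimal C sketch of the three methods described in the quoted text, assuming C11 <stdatomic.h>.  The names (my_obj_t, obj_create, obj_add_ref, obj_release, destruct_me, g_destroyed) are hypothetical stand-ins for AddRef(), Release() and DestructMe(); this is not code from IBAL or any existing stack.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object; only the reference-count field matters here. */
typedef struct my_obj {
    atomic_int ref_count;
    int payload;
} my_obj_t;

/* Counts destructions so the behavior can be observed. */
static atomic_int g_destroyed;

/* DestructMe(): runs exactly once, when the last reference is dropped. */
static void destruct_me(my_obj_t *obj)
{
    atomic_fetch_add(&g_destroyed, 1);
    free(obj);
}

/* The creator receives the object already holding one reference. */
my_obj_t *obj_create(int payload)
{
    my_obj_t *obj = malloc(sizeof(*obj));
    if (obj == NULL)
        return NULL;
    atomic_init(&obj->ref_count, 1);
    obj->payload = payload;
    return obj;
}

/* AddRef(): the caller must already hold a reference, so the count is >= 1. */
void obj_add_ref(my_obj_t *obj)
{
    int old = atomic_fetch_add(&obj->ref_count, 1);
    assert(old > 0);
}

/* Release(): whoever moves the count from 1 to 0 destroys the object. */
void obj_release(my_obj_t *obj)
{
    int old = atomic_fetch_sub(&obj->ref_count, 1);
    assert(old > 0);
    if (old == 1)
        destruct_me(obj);
}
```

The atomic fetch-and-subtract is what makes "the last one out turns off the light" safe: exactly one caller observes the count going from 1 to 0, so destruct_me() runs exactly once no matter how the releases interleave.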

Hopefully the concept of reference counting isn't new to kernel developers.  One
thing you didn't note is that object creation should perform an implicit
AddRef() call so that the returned object is always valid.  Otherwise there is a
race between the object being allocated and the allocating call returning to
the user.  The reference must be taken as soon as the object is allocated, even
before it is returned to the user.
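A sketch of that race, assuming the new object becomes visible to other threads (here, by being linked into a hypothetical parent list) before the create call returns.  All names (child_t, parent_t, child_create, child_release, parent_flush, g_freed) are invented for illustration.  Both the caller's reference and the list's reference are counted before the object is published, so a concurrent teardown of the list cannot free the object out from under the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct child {
    atomic_int ref_count;
    struct child *next;              /* link in the parent's list */
} child_t;

/* Hypothetical parent whose list other threads can walk and tear down. */
typedef struct parent {
    pthread_mutex_t lock;
    child_t *head;
} parent_t;

static atomic_int g_freed;

void child_release(child_t *c)
{
    int old = atomic_fetch_sub(&c->ref_count, 1);
    assert(old > 0);
    if (old == 1) {
        atomic_fetch_add(&g_freed, 1);
        free(c);
    }
}

/* Both references are taken BEFORE the object becomes visible. */
child_t *child_create(parent_t *p)
{
    child_t *c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    atomic_init(&c->ref_count, 2);   /* 1 for the caller, 1 for the list */
    pthread_mutex_lock(&p->lock);
    c->next = p->head;
    p->head = c;                     /* now reachable by other threads */
    pthread_mutex_unlock(&p->lock);
    return c;                        /* still valid: caller's reference is held */
}

/* May run on another thread at any time, even while child_create() returns. */
void parent_flush(parent_t *p)
{
    pthread_mutex_lock(&p->lock);
    child_t *c = p->head;
    p->head = NULL;
    pthread_mutex_unlock(&p->lock);
    while (c != NULL) {
        child_t *next = c->next;     /* read before the release may free c */
        child_release(c);            /* drops only the list's reference */
        c = next;
    }
}
```

If child_create() instead set the count to 1 and let the caller call AddRef() after it returned, a parent_flush() slipping in between would free the object before the caller ever saw it.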

Simple reference counting doesn't work with callback-driven objects - just
releasing a reference doesn't tell the object whether to stop invoking
callbacks.  Likewise, having released a reference doesn't tell the user that
there are no more callbacks outstanding.  For callback-driven objects something
more than reference counting must be used.

One solution is to never have callback-driven objects, and require explicit
requests for notifications from the user.  However, this requires reciprocal
functionality to allow a user to cancel an event notification request.
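A minimal sketch of the explicit-request model with its reciprocal cancel call.  The names (event_src_t, request_notify, cancel_notify, event_fire, count_cb) are hypothetical.  Each request yields at most one callback, and cancel_notify() reports whether it removed the request before delivery.  If it returns false, the callback has already been claimed, so the user must let it run to completion before releasing its context.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*notify_cb_t)(void *context);

/* Hypothetical event source: at most one outstanding request at a time. */
typedef struct event_src {
    pthread_mutex_t lock;
    notify_cb_t cb;                  /* NULL when no request is pending */
    void *context;
} event_src_t;

/* Arm a one-shot notification; returns false if one is already pending. */
bool request_notify(event_src_t *src, notify_cb_t cb, void *context)
{
    bool armed = false;
    pthread_mutex_lock(&src->lock);
    if (src->cb == NULL) {
        src->cb = cb;
        src->context = context;
        armed = true;
    }
    pthread_mutex_unlock(&src->lock);
    return armed;
}

/* True if the pending request was removed before delivery (its callback
 * will never run); false if there was nothing left to cancel. */
bool cancel_notify(event_src_t *src)
{
    pthread_mutex_lock(&src->lock);
    bool cancelled = (src->cb != NULL);
    src->cb = NULL;
    src->context = NULL;
    pthread_mutex_unlock(&src->lock);
    return cancelled;
}

/* Called by the event source when the event occurs. */
void event_fire(event_src_t *src)
{
    pthread_mutex_lock(&src->lock);
    notify_cb_t cb = src->cb;
    void *context = src->context;
    src->cb = NULL;                  /* consume the request: one event per request */
    src->context = NULL;
    pthread_mutex_unlock(&src->lock);
    if (cb != NULL)
        cb(context);                 /* invoked without the lock held */
}

/* Example callback: counts deliveries into an int supplied as context. */
static void count_cb(void *context)
{
    (*(int *)context)++;
}
```

Because the request is consumed under the lock, every request ends in exactly one of two ways (delivered or cancelled), which is what lets the user reason about when its context is no longer referenced.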

Another solution (and what IBAL implements) is to provide reference counting,
but rather than using the reference count to trigger destruction, provide a
function that *initiates* destruction, and invokes a callback when said
destruction is complete.  Destruction would only proceed when the reference
count reaches zero, and the user would subsequently receive an asynchronous
notification that destruction is complete.  Only upon such notification is it
safe for a user to release their context information.
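The initiate-then-notify model might look roughly like this.  This is a hypothetical sketch of the pattern, not IBAL's actual code or API; cq_t, cq_create, cq_ref, cq_deref, cq_destroy and mark_destroyed are invented names, and C11 atomics are assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef void (*destroy_cb_t)(void *context);

/* Hypothetical callback-driven object (think of a CQ). */
typedef struct cq {
    atomic_int ref_count;
    destroy_cb_t destroy_cb;         /* "destruction complete" notification */
    void *context;                   /* user context handed back in callbacks */
} cq_t;

cq_t *cq_create(void *context)
{
    cq_t *cq = malloc(sizeof(*cq));
    if (cq == NULL)
        return NULL;
    atomic_init(&cq->ref_count, 1);  /* creation reference, dropped by cq_destroy() */
    cq->destroy_cb = NULL;
    cq->context = context;
    return cq;
}

/* Held, for example, by the event dispatcher for the duration of each callback. */
void cq_ref(cq_t *cq)
{
    int old = atomic_fetch_add(&cq->ref_count, 1);
    assert(old > 0);
}

void cq_deref(cq_t *cq)
{
    int old = atomic_fetch_sub(&cq->ref_count, 1);
    assert(old > 0);
    if (old == 1) {
        /* Last reference gone: no callback can still be running. */
        destroy_cb_t cb = cq->destroy_cb;
        void *context = cq->context;
        free(cq);
        if (cb != NULL)
            cb(context);             /* only now may the user free its context */
    }
}

/* Initiates destruction and returns without blocking.
 * Must be called exactly once, by the creator. */
void cq_destroy(cq_t *cq, destroy_cb_t cb)
{
    cq->destroy_cb = cb;
    cq_deref(cq);                    /* drop the creation reference */
}

/* Example destroy callback: records completion in a bool supplied as context. */
static void mark_destroyed(void *context)
{
    *(bool *)context = true;
}
```

cq_destroy() never waits, which is what makes this model usable from contexts that cannot block; the cost is that the user must defer freeing its context until the destroy callback runs.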

An alternative to what IBAL does, and one implemented by most IB stacks today
(as well as the HCA driver in Windows), is blocking destroy semantics.  In this
model, the user still initiates destruction via a function call, but the call
does not return until destruction is complete.  While this works well for
user-mode clients, it is unsuitable for kernel clients where the execution
context may not permit blocking operations.
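For contrast, a blocking destroy can be sketched with a POSIX mutex and condition variable, used here as assumed stand-ins for whatever wait primitive the environment provides.  bobj_t and its functions are hypothetical.  The pthread_cond_wait() call in bobj_destroy() is exactly the kind of blocking operation a kernel client may be unable to perform.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct bobj {
    pthread_mutex_t lock;
    pthread_cond_t zero;             /* signalled when ref_count reaches 0 */
    int ref_count;
} bobj_t;

bobj_t *bobj_create(void)
{
    bobj_t *o = malloc(sizeof(*o));
    if (o == NULL)
        return NULL;
    pthread_mutex_init(&o->lock, NULL);
    pthread_cond_init(&o->zero, NULL);
    o->ref_count = 1;                /* creation reference, dropped by bobj_destroy() */
    return o;
}

void bobj_ref(bobj_t *o)
{
    pthread_mutex_lock(&o->lock);
    assert(o->ref_count > 0);
    o->ref_count++;
    pthread_mutex_unlock(&o->lock);
}

void bobj_deref(bobj_t *o)
{
    pthread_mutex_lock(&o->lock);
    assert(o->ref_count > 0);
    if (--o->ref_count == 0)
        pthread_cond_broadcast(&o->zero);
    pthread_mutex_unlock(&o->lock);
}

/* Drops the creation reference and blocks until every other reference is
 * gone.  On return no callback can be running, so the caller may free its
 * own context immediately. */
void bobj_destroy(bobj_t *o)
{
    pthread_mutex_lock(&o->lock);
    assert(o->ref_count > 0);
    o->ref_count--;
    while (o->ref_count > 0)
        pthread_cond_wait(&o->zero, &o->lock);
    pthread_mutex_unlock(&o->lock);
    pthread_cond_destroy(&o->zero);
    pthread_mutex_destroy(&o->lock);
    free(o);
}

/* Example worker: simulates an in-flight callback finishing on another thread. */
static void *finish_callback(void *arg)
{
    bobj_deref(arg);
    return NULL;
}
```

The caller's code is simpler than in the asynchronous model (destroy, then free), but only because the wait has moved inside bobj_destroy().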

Yet another solution is to have callback-driven objects take an interface
representing the client's context, giving the object the ability to take
and release references on the client's context.  This would allow a CQ, for
example, to hold a reference on its user's context as long as a callback could
be outstanding.  This isn't much different from the current implementation,
however, aside from providing the ability to take additional references on the
client's context.
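A sketch of this last option, assuming a hypothetical client_iface_t that the client passes to the object, containing ref, deref and event-callback function pointers.  The CQ holds a reference on the client's context from the moment a callback becomes possible (cq_arm) until it has been delivered or cancelled, so the client may drop its own reference at any time.  All names are invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical interface the client hands to the object. */
typedef struct client_iface {
    void (*ref)(void *ctx);          /* take a reference on the client's context */
    void (*deref)(void *ctx);        /* release it */
    void (*on_event)(void *ctx);     /* the event callback itself */
} client_iface_t;

typedef struct cq {
    pthread_mutex_t lock;
    int armed;                       /* nonzero while a callback may be delivered */
    const client_iface_t *iface;
    void *ctx;
} cq_t;

void cq_init(cq_t *cq, const client_iface_t *iface, void *ctx)
{
    pthread_mutex_init(&cq->lock, NULL);
    cq->armed = 0;
    cq->iface = iface;
    cq->ctx = ctx;
}

/* Arming pins the client's context until delivery or disarm. */
void cq_arm(cq_t *cq)
{
    pthread_mutex_lock(&cq->lock);
    if (!cq->armed) {
        cq->iface->ref(cq->ctx);
        cq->armed = 1;
    }
    pthread_mutex_unlock(&cq->lock);
}

/* Event path: consume the armed state, deliver, then drop the reference. */
void cq_deliver(cq_t *cq)
{
    pthread_mutex_lock(&cq->lock);
    int was_armed = cq->armed;
    cq->armed = 0;
    pthread_mutex_unlock(&cq->lock);
    if (was_armed) {
        cq->iface->on_event(cq->ctx);
        cq->iface->deref(cq->ctx);   /* context may be freed here, never before */
    }
}

/* Cancel a pending callback and release the reference it held. */
void cq_disarm(cq_t *cq)
{
    pthread_mutex_lock(&cq->lock);
    int was_armed = cq->armed;
    cq->armed = 0;
    pthread_mutex_unlock(&cq->lock);
    if (was_armed)
        cq->iface->deref(cq->ctx);
}

/* Example client context that frees itself on its last reference. */
typedef struct client {
    atomic_int refs;
    atomic_int events;
} client_t;

static atomic_int g_clients_freed;

static void client_ref(void *ctx) { atomic_fetch_add(&((client_t *)ctx)->refs, 1); }

static void client_deref(void *ctx)
{
    client_t *c = ctx;
    if (atomic_fetch_sub(&c->refs, 1) == 1) {
        atomic_fetch_add(&g_clients_freed, 1);
        free(c);
    }
}

static void client_on_event(void *ctx) { atomic_fetch_add(&((client_t *)ctx)->events, 1); }

static const client_iface_t example_client_iface = {
    client_ref, client_deref, client_on_event
};
```

The client never has to ask "are callbacks still outstanding?"; its context simply stays alive until the object's reference is gone.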

What components in the current stack are lacking reference counting?  Are you
suggesting changing the way reference counting is performed?  Are you suggesting
that we change how objects are destroyed?  Is there a need to expose AddRef and
Deref functionality in the API?  Your mail makes it sound like there is a
rampant lack of reference counting, which isn't the case as far as I can tell.

- Fab