[Openib-windows] RE: Reference count as a solution to the problem of an object life time

Mon Sep 19 14:51:09 PDT 2005

> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
> Sent: Monday, September 19, 2005 1:45 PM
> 
> Hi Fab,
>
> Perhaps I'm wrong about this, but there is one difference between the two
> models.
>
> I believe that this difference is what forces one to wait for the destroy
> completion. It shows up when the number of callbacks is not known to the
> user. In my approach the library can adjust the reference count and make
> sure that it is incremented for each callback, even when there is more than
> one, while in your model (where one removes the reference in the callback)
> one cannot know when to remove the reference.

You do know when to remove the reference - in the destroy callback.  Once the
deref call is invoked, no further callbacks will ever occur for that object.

It's a slight change in logic.  Your model takes a reference when it needs to
invoke a callback, and releases it after the callback returns.  The IBAL object
model holds a reference for the lifetime of the object, and releases it when
that object is destroyed.  Obviously, this doesn't support sharing objects
between multiple users well, but this hasn't been a problem yet.
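To make the difference concrete, here is a minimal single-threaded C sketch of the two reference-counting styles. All names (user_ctx_t, al_obj_t, obj_create, deliver_with_ref, and so on) are hypothetical and are not the IBAL API. C11 atomics stand in for the interlocked operations a driver would actually use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user context; not an IBAL type. */
typedef struct user_ctx {
    atomic_int refs;   /* outstanding references on the context */
    int events;        /* event callbacks received */
} user_ctx_t;

static void ctx_init(user_ctx_t *ctx) {
    atomic_init(&ctx->refs, 1);   /* the user's own reference */
    ctx->events = 0;
}

static void ctx_ref(user_ctx_t *ctx) { atomic_fetch_add(&ctx->refs, 1); }

/* Returns true when this call released the last reference. */
static bool ctx_deref(user_ctx_t *ctx) {
    return atomic_fetch_sub(&ctx->refs, 1) == 1;
}

/* Hypothetical access-layer object bound to a user context. */
typedef struct al_obj {
    user_ctx_t *ctx;
} al_obj_t;

/* Lifetime model: one reference is taken when the object is created... */
static void obj_create(al_obj_t *obj, user_ctx_t *ctx) {
    obj->ctx = ctx;
    ctx_ref(ctx);
}

/* ...so event callbacks need no reference of their own... */
static void obj_event_cb(al_obj_t *obj) {
    obj->ctx->events++;
}

/* ...and the destroy callback, the last callback ever delivered for the
 * object, releases it. */
static bool obj_destroy_cb(al_obj_t *obj) {
    return ctx_deref(obj->ctx);
}

/* Per-callback model: the library brackets every callback with ref/deref. */
static bool deliver_with_ref(user_ctx_t *ctx) {
    ctx_ref(ctx);
    ctx->events++;     /* stands in for the user's callback */
    return ctx_deref(ctx);
}
```

In the lifetime model the destroy callback is the single, well-defined point where the reference goes away; in the per-callback model the library decides when references come and go.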

If you're sharing an object, waiting until that object is destroyed does keep a
user's context around longer than might be necessary.  But objects aren't shared
in any of the public APIs, so this isn't a problem as far as I know.

> A good example of this is the CM APIs. If I understand correctly, one calls
> CM_REQ and doesn't know how many callbacks there will be. For example, I have
> received a REP, and because I was too busy to answer it, I then received a
> DREQ. As I understand it, the only way to know that the last callback was
> delivered is to wait for the destruction of the QP to complete.

The current CM design invokes the CM callbacks on a QP, and you can't destroy
the QP until all its associated callbacks have been either delivered or
cancelled.  The user's QP context can't be freed until the access layer
notifies the user that it is safe to do so, and I don't see how your model
would change that.

I believe that in the current code, all pending (not in flight) events are
flushed as soon as you initiate destruction of the QP.  If you initiate QP
destruction from the REP callback, the REJ gets flushed.  When the REP callback
unwinds, its references are released, the queue pair's reference count drops to
zero, and your destroy callback is invoked.
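The flush-then-unwind sequence can be modeled in a few lines. This is a single-threaded C sketch under stated assumptions: the qp_t structure, the event codes, and all functions are hypothetical, not the access layer's internals. A real implementation would also need a lock so that checking the destroying flag and taking the in-flight reference happen atomically.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical CM event codes; not the real IBAL constants. */
enum cm_event { CM_EV_REP = 1, CM_EV_REJ = 2 };

#define MAX_PENDING 8

typedef struct qp {
    atomic_int refs;                     /* creation ref + in-flight callbacks */
    enum cm_event pending[MAX_PENDING];  /* queued, not yet in flight */
    int n_pending;
    bool destroying;
    int delivered;                       /* events handed to the user */
    int destroy_cb_calls;                /* times the destroy callback ran */
    int destroy_calls_during_cb;         /* snapshot taken inside the REP callback */
} qp_t;

static void qp_init(qp_t *qp) {
    atomic_init(&qp->refs, 1);           /* creation reference */
    qp->n_pending = 0;
    qp->destroying = false;
    qp->delivered = 0;
    qp->destroy_cb_calls = 0;
    qp->destroy_calls_during_cb = -1;
}

static void qp_queue_event(qp_t *qp, enum cm_event ev) {
    if (!qp->destroying && qp->n_pending < MAX_PENDING)
        qp->pending[qp->n_pending++] = ev;
}

/* User's destroy callback: the QP context may be freed from here. */
static void on_destroy(qp_t *qp) { qp->destroy_cb_calls++; }

static void qp_deref(qp_t *qp) {
    if (atomic_fetch_sub(&qp->refs, 1) == 1)
        on_destroy(qp);
}

/* Start destruction: flush pending events, drop the creation reference. */
static void qp_destroy(qp_t *qp) {
    qp->destroying = true;
    qp->n_pending = 0;
    qp_deref(qp);
}

/* User's CM callback: destroys the QP as soon as the REP arrives. */
static void on_cm_event(qp_t *qp, enum cm_event ev) {
    qp->delivered++;
    if (ev == CM_EV_REP) {
        qp_destroy(qp);
        qp->destroy_calls_during_cb = qp->destroy_cb_calls;
    }
}

/* Deliver the oldest queued event, holding a reference while it runs. */
static bool qp_deliver_next(qp_t *qp) {
    if (qp->destroying || qp->n_pending == 0)
        return false;
    enum cm_event ev = qp->pending[0];
    for (int i = 1; i < qp->n_pending; i++)
        qp->pending[i - 1] = qp->pending[i];
    qp->n_pending--;
    atomic_fetch_add(&qp->refs, 1);      /* in-flight reference */
    on_cm_event(qp, ev);
    qp_deref(qp);                        /* callback unwound */
    return true;
}
```

With a REP and a REJ queued, delivering the REP destroys the QP from inside its callback: the REJ is flushed, and the destroy callback fires only after the REP callback returns and drops its in-flight reference.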

> A similar example is a timer object that fires automatically: you set it
> once and get a callback every second. A new callback is started even if the
> previous one hasn't finished. In this model, when I want to stop things, I
> just can't know when the last callback will happen. The only way to solve
> this is to call stop on the timer, wait (on an event, a callback, or
> whatever) for the timer to stop, and then remove my reference (I want to
> make this clear: there might still be others using this object!!!).

So you want to have a timer that can invoke multiple users' callbacks?  This
introduces a new object model - currently, there is only one recipient of
callbacks.  For a multi-client object, the ability to take references would let
clients deregister without having to wait until all clients have deregistered.
Shutdown/destroy/whatever wouldn't really destroy the timer; it would just
deregister the callback, and when there are no references left, the timer would
be destroyed implicitly.  This isn't a timer object anymore, as it
introduces the notion of event dispatching to allow multiplexing to multiple
clients.  For this, I agree that allowing the dispatcher to take references on
each client can be helpful.
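A rough C sketch of such a dispatcher follows, assuming a fixed-size client table and single-threaded execution. dispatcher_t, client_t, and the disp_* functions are hypothetical and do not correspond to anything in the current API. Each registered client holds a reference on the dispatcher, the dispatcher holds a reference on each client, and the timer is destroyed implicitly when the last client deregisters.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CLIENTS 4

typedef struct client {
    atomic_int refs;    /* client's own reference + dispatcher's references */
    int ticks;          /* callbacks received */
    bool released;      /* set when the last reference is dropped */
} client_t;

typedef struct dispatcher {
    client_t *clients[MAX_CLIENTS];  /* NULL marks a free slot */
    atomic_int refs;                 /* one per registered client */
    bool timer_destroyed;
} dispatcher_t;

static void client_init(client_t *c) {
    atomic_init(&c->refs, 1);
    c->ticks = 0;
    c->released = false;
}

static void client_ref(client_t *c) { atomic_fetch_add(&c->refs, 1); }

static void client_deref(client_t *c) {
    if (atomic_fetch_sub(&c->refs, 1) == 1)
        c->released = true;          /* a real client would free its context */
}

static void disp_init(dispatcher_t *d) {
    for (int i = 0; i < MAX_CLIENTS; i++)
        d->clients[i] = NULL;
    atomic_init(&d->refs, 0);
    d->timer_destroyed = false;
}

static bool disp_register(dispatcher_t *d, client_t *c) {
    for (int i = 0; i < MAX_CLIENTS; i++) {
        if (d->clients[i] == NULL) {
            client_ref(c);                  /* dispatcher's ref on the client */
            d->clients[i] = c;
            atomic_fetch_add(&d->refs, 1);  /* client's ref on the dispatcher */
            return true;
        }
    }
    return false;
}

/* Deregister one client without waiting for the others. */
static void disp_deregister(dispatcher_t *d, client_t *c) {
    for (int i = 0; i < MAX_CLIENTS; i++) {
        if (d->clients[i] == c) {
            d->clients[i] = NULL;
            client_deref(c);
            if (atomic_fetch_sub(&d->refs, 1) == 1)
                d->timer_destroyed = true;  /* last client gone */
            return;
        }
    }
}

/* One timer expiry: call each client while holding an extra reference, so a
 * concurrent deregistration (not modeled here) could not free it mid-call. */
static void disp_tick(dispatcher_t *d) {
    for (int i = 0; i < MAX_CLIENTS; i++) {
        client_t *c = d->clients[i];
        if (c == NULL)
            continue;
        client_ref(c);
        c->ticks++;                  /* stands in for the client's callback */
        client_deref(c);
    }
}
```

One client can deregister and release its context while another keeps receiving ticks; the timer itself goes away only when the dispatcher's count reaches zero.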

This is only an issue for multi-client objects - single-user objects don't need
the ability to take extra references.  I don't see any need to expose
multi-client objects in the API; if you could explain what you're trying to do,
it might make more sense to me.

> In the model that I propose, the timer will increase the reference count
> before each callback and decrease it after the callback. As a result, after
> I call stop on the timer, I can safely remove my reference, and I DON'T HAVE
> TO WAIT.

When you say you don't have to wait, do you mean wait in the WaitForSingleObject
sense, or that you don't wait for the reference count to reach zero at all (i.e.
your object can be freed immediately)?  I think at a minimum, you must wait for
the reference count to reach zero, whether through a call to WaitForSingleObject
or by letting the dereference handler run asynchronously.  If you are going to
wait for the reference count to reach zero, this isn't any different from
waiting for the destroy callback.
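The point can be illustrated with a hypothetical timer that uses the per-callback model: stopping it and dropping the owner's reference returns immediately, but the handler that says "the object can be freed" still runs only when the last in-flight callback drops its reference. timer_obj_t and its functions are invented for this single-threaded C sketch; the callback is split into begin/end steps to model one that is still running when stop is called.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct timer_obj {
    atomic_int refs;   /* owner's reference + one per running callback */
    bool stopped;
    int fired;         /* callbacks started */
    int freed;         /* times the last-reference handler ran */
} timer_obj_t;

static void timer_init(timer_obj_t *t) {
    atomic_init(&t->refs, 1);
    t->stopped = false;
    t->fired = 0;
    t->freed = 0;
}

/* Dereference handler: the point at which the object may be freed. */
static void on_last_deref(timer_obj_t *t) { t->freed++; }

static void timer_deref(timer_obj_t *t) {
    if (atomic_fetch_sub(&t->refs, 1) == 1)
        on_last_deref(t);
}

/* Timer expiry: take a reference before invoking the user's callback. */
static bool timer_callback_begin(timer_obj_t *t) {
    if (t->stopped)
        return false;
    atomic_fetch_add(&t->refs, 1);
    t->fired++;
    return true;
}

/* The user's callback has returned: drop the in-flight reference. */
static void timer_callback_end(timer_obj_t *t) { timer_deref(t); }

/* Stop the timer and release the owner's reference without blocking. */
static void timer_stop_and_release(timer_obj_t *t) {
    t->stopped = true;
    timer_deref(t);
}
```

Whether the caller blocks (for example, on an event signaled from on_last_deref) or lets the handler run asynchronously, the object can only be reclaimed at the moment the count reaches zero, which is the same moment a destroy callback would be delivered.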

> Is there some way to solve this problem in your model?

I don't see what the problem is.  Can you be more specific about what you want
to change in the code, such as which APIs?  Your object model can work fine; I
just don't see why we should change the model throughout the API when there are
more important problems to solve.

- Fab