[Openib-windows] RE: Reference count as a solution to the problem of an object life time

Tzachi Dar tzachid at mellanox.co.il
Wed Sep 21 07:37:22 PDT 2005


See also my answer to Sean.

One more thing to note: I see a difference between "destroying" the
object and freeing the memory that it is using.


>-----Original Message-----
>From: Fab Tillier [mailto:ftillier at silverstorm.com]
>Sent: Tuesday, September 20, 2005 12:51 AM
>To: 'Tzachi Dar'; openib-windows at openib.org
>Subject: RE: Reference count as a solution to the problem of an object life
>time
>
>> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
>> Sent: Monday, September 19, 2005 1:45 PM
>>
>> Hi Fab,
>>
>> Perhaps I'm wrong about this, but there is one difference between the
>> two models.
>>
>> I believe that this difference is what forces one to wait for the
>> destroy completion. This difference arises when the number of callbacks
>> is not known to the user. In my approach the library can play with the
>> reference count and make sure that it is increased even when there is
>> more than one callback, while in your model (one removes the reference
>> in the callback) one cannot know when to remove the reference.
>
>You do know when to remove the reference - in the destroy callback.  Once
>the deref call is invoked, no further callbacks will ever occur for that
>object.
>
>It's a slight change in logic.  Your model takes a reference when it needs
>to invoke a callback, and releases it after the callback returns.  The
>IBAL object model holds a reference for the lifetime of the object, and
>releases it when that object is destroyed.  Obviously, this doesn't
>support sharing objects between multiple users well, but this hasn't been
>a problem yet.
>
>If you're sharing an object, waiting until that object is destroyed does
>keep a user's context around longer than might be necessary.  But objects
>aren't shared in any of the public APIs, so this isn't a problem as far
>as I know.
This is probably not a problem now, but I want to allow a more general
model in which our internal and external APIs are consistent and can be
used more effectively. Please note that we have had a problem in which
objects were destroyed while they were busy. In my approach there is no
problem with destroying an object while it is busy.
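To make the idea concrete, here is a minimal sketch of the model (the
names obj_t, obj_ref, obj_deref, obj_destroy are hypothetical, not actual
IBAL code): "destroy" only marks the object and drops the owner's
reference, while the memory is freed when the last reference goes away.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal sketch of the proposed model (hypothetical names, not IBAL
 * code): obj_destroy() marks the object and drops the owner's reference;
 * the memory is freed only when the reference count reaches zero. */
typedef struct obj {
    long ref_cnt;    /* outstanding references                  */
    int  destroyed;  /* set by obj_destroy(); no new callbacks  */
} obj_t;

static obj_t *obj_create(void)
{
    obj_t *p = calloc(1, sizeof *p);
    if (p)
        p->ref_cnt = 1;              /* the owner's reference */
    return p;
}

static void obj_ref(obj_t *p) { p->ref_cnt++; }

static int obj_deref(obj_t *p)       /* returns 1 if this call freed p */
{
    if (--p->ref_cnt == 0) {
        free(p);
        return 1;
    }
    return 0;
}

static int obj_destroy(obj_t *p)     /* safe even while p is busy */
{
    p->destroyed = 1;
    return obj_deref(p);
}
```

A callback path takes a reference before running and releases it when it
returns, so obj_destroy() can be called at any time without waiting for
the last callback.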

>> A good example for this is the CM APIs. If I understand correctly, one
>> calls CM_REQ and doesn't know how many callbacks there will be. For
>> example, I have received a REP, and if I was too busy to answer, I have
>> received a DREQ. As I understand it, the only way to know that the last
>> callback was sent is to wait for the completion of the destroy of the
>> QP.
>
>The current CM design invokes the CM callbacks on a QP, and you can't
>destroy the QP until all its associated callbacks have been either
>delivered or cancelled.  I don't see how your model would change that.
>The user's QP context can't be freed until the access layer notifies the
>user that it is safe to do so.  I don't see how your model helps this.
In my model, the QP can be destroyed whenever you want. If there are CM
callbacks, then the QP is destroyed, but the memory isn't freed. Once the
CM has finished with its callbacks (whether it delivers them or not), it
will release the (last) reference and the memory will be freed. This is
why I believe that this method is so good.
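As a sketch of what this could look like for the QP/CM case (hypothetical
names; this is not the actual IBAL CM, just the shape of the idea): the CM
takes a reference when it queues an event and releases it after delivery,
so the user may destroy the QP at any moment without waiting.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the QP/CM case under the proposed model (hypothetical names,
 * not the actual IBAL CM): the CM holds one reference per queued event. */
typedef struct qp {
    long ref_cnt;
    int  destroyed;
    int  callbacks_delivered;
} qp_t;

static void qp_ref(qp_t *qp) { qp->ref_cnt++; }

static int qp_deref(qp_t *qp)            /* returns 1 when freed */
{
    if (--qp->ref_cnt == 0) {
        free(qp);
        return 1;
    }
    return 0;
}

static qp_t *qp_create(void)
{
    qp_t *qp = calloc(1, sizeof *qp);
    if (qp)
        qp->ref_cnt = 1;                 /* the user's reference */
    return qp;
}

static int qp_destroy(qp_t *qp)          /* legal even with events queued */
{
    qp->destroyed = 1;
    return qp_deref(qp);
}

/* The CM takes a reference when it queues an event for this QP ... */
static void cm_queue_event(qp_t *qp) { qp_ref(qp); }

/* ... and releases it after delivery; the user callback is skipped once
 * the QP has been destroyed, but the memory stays valid until here. */
static int cm_deliver_event(qp_t *qp)
{
    if (!qp->destroyed)
        qp->callbacks_delivered++;       /* stand-in for the user callback */
    return qp_deref(qp);
}
```

The key point is the last line of cm_deliver_event(): whoever drops the
last reference, whether the user or the CM, is the one that frees the
memory, so neither side has to wait for the other.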

>I believe that in the current code, all pending (not in flight) events are
>flushed as soon as you initiate destruction of the QP.  If you initiate QP
>destruction from the REP callback, the REJ gets flushed, and when the REP
>callback unwinds, reference counts get released, the queue pair's
>reference count goes to zero, and your destroy callback gets invoked.
>
Please note that when I start the destroy of the QP, I don't know its
state (its state might have changed a microsecond ago), so I don't know
which callbacks will happen. As a result I have to wait, and that
complicates the code.


>> A similar example is a timer object that works automatically, that is,
>> you set it once and every second you get a callback. A new callback is
>> started even if the previous one hasn't finished. In this model, when I
>> want to stop things, I just can't know when the last callback will
>> happen. The only way to solve this is to call stop on the timer, wait
>> (event, callback or whatever) for the timer to stop, and then remove my
>> reference (I want to make this clear: there might still be others using
>> this object!!!).
>
>So you want to have a timer that can invoke multiple users' callbacks?
>This introduces a new object model - currently, there is only one
>recipient of callbacks.  For a multi-client object, having the ability to
>take references would help let clients deregister without having to wait
>until all clients have deregistered.  Shutdown/destroy/whatever wouldn't
>really destroy the timer, it would just deregister the callback, and when
>there are no references left, would implicitly destroy the timer.  This
>isn't a timer object anymore, as it introduces the notion of event
>dispatching to allow multiplexing to multiple clients.  For this, I agree
>that allowing the dispatcher to take references on each client can be
>helpful.
>
>This is only an issue for multi-client objects - single-user objects
>don't need the ability to take extra references.  I don't see any need
>for multi-client objects being exposed in the API - maybe if you could
>explain what you're trying to do it might make more sense to me.
>
The problem is not multi-client objects, but rather one object that has
more than one callback. If you look at the CM state diagram on page 687 of
the IB spec, you will see that the number of possible callbacks in any
state is large. As a result, I don't know when I have received the last
one, and I have to wait. (See also below.)


>> In the model that I propose, the timer will increase the reference
>> count before each callback and will decrease the reference after the
>> callback. As a result, after I call stop on the timer, I can safely
>> remove my reference, and I DON'T HAVE TO WAIT.
>
>When you say you don't have to wait, do you mean wait in the
>WaitForSingleObject sense, or do you wait for the reference count to
>reach zero (i.e. your object can immediately be freed)?  I think at a
>minimum, you must wait for the reference count to reach zero, whether
>through a call to WaitForSingleObject or letting the dereference handler
>work asynchronously.  If you are going to wait for the reference count to
>reach zero, this isn't any different than waiting for the destroy
>callback.
I mean wait in both senses. In my model I don't have to wait for anything -
the object will destroy itself once its reference count has reached zero.
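The timer case can be sketched the same way (hypothetical names, a
single-threaded simulation of the ticks rather than a real kernel timer):
each tick takes a reference only for the duration of the callback, so
stop releases the owner's reference immediately and never waits.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the periodic-timer case under the proposed model
 * (hypothetical names; ticks are simulated, not a real timer). */
typedef struct tmr {
    long ref_cnt;
    int  stopped;
    int  ticks;
} tmr_t;

static void tmr_ref(tmr_t *t) { t->ref_cnt++; }

static int tmr_deref(tmr_t *t)           /* returns 1 when freed */
{
    if (--t->ref_cnt == 0) {
        free(t);
        return 1;
    }
    return 0;
}

static tmr_t *tmr_create(void)
{
    tmr_t *t = calloc(1, sizeof *t);
    if (t)
        t->ref_cnt = 1;                  /* the owner's reference */
    return t;
}

/* One tick: the reference is held only while the callback runs. */
static int tmr_tick(tmr_t *t)
{
    tmr_ref(t);
    if (!t->stopped)
        t->ticks++;                      /* stand-in for the user callback */
    return tmr_deref(t);
}

/* Stop releases the owner's reference at once; a tick already in flight
 * still holds its own reference, so nobody has to wait for it. */
static int tmr_stop(tmr_t *t)
{
    t->stopped = 1;
    return tmr_deref(t);
}
```

After tmr_stop() returns, the caller is done with the object; if a last
tick is still running, it frees the memory itself when it drops its
reference.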


>> Is there some way to solve this problem in your model?
>
>I don't see what the problem is.  Can you be more specific about what it
>is you want to change in the code - like APIs etc.?  Your object model
>can work fine, I just don't see why we should change the model throughout
>the API when there are more important problems to solve.
As was described above and in my answer to Sean Hefty, my model makes life
simpler. In particular, I don't like the fact that I have to wait on a QP
destroy (and worse, I don't know what to wait for; if I knew exactly how
many callbacks there would be, it would probably be fine).
I believe that if we are doing a revolution in our API (and our
implementation), then we should consider making this change as well.

>
>- Fab