[Openib-windows] RE: Reference count as a solution to the pro blem of an object lif e time

Wed Sep 21 13:27:19 PDT 2005

Hi Fab, 
It seems that I still don't get your model (sorry for that)
Just a small question that might help me understand it. Why do you call your
model a reference counting model. If I understand correctly you only take
the reference once, and when you get the callback you remove it.

Is their any chance that the reference will ever reach 5 for example? (or 10
or 20). It still seems to me as an exist don't exist one.

Thanks
Tzachi

>-----Original Message-----
>From: Fabian Tillier [mailto:ftillier at silverstorm.com]
>Sent: Wednesday, September 21, 2005 9:11 PM
>To: Tzachi Dar
>Cc: openib-windows at openib.org
>Subject: Re: [Openib-windows] RE: Reference count as a solution to the
>problem of an object lif e time
>
>On 9/21/05, Tzachi Dar <tzachid at mellanox.co.il> wrote:
>>
>>
>> See also my answer to Sean.
>>
>> One more thing to notice is that I see a difference between "destroying"
>the
>> object and freeing the memory that it is using.
>>
>> >-----Original Message-----
>> >From: Fab Tillier [mailto:ftillier at silverstorm.com]
>> >Sent: Tuesday, September 20, 2005 12:51 AM
>> >
>> >> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
>> >> Sent: Monday, September 19, 2005 1:45 PM
>> >>
>> >> Hi Fab,
>> >>
>> >> Perhaps I'm wrong about this, but there is one difference between the
>two
>> >> models.
>> >>
>> >> I believe that this difference is what forces one to wait for the
>destroy
>> >> completion. This difference is when the number of call backs is not
>known
>> >> to
>> >> the user. In my approach the library can play with the reference count
>> >> and
>> >> make sure that it is increased even when there are more than one
>> >> callback,
>> >> while on your model (one removes the reference in the callback) one
>can
>> >> not
>> >> know when to remove the reference.
>> >
>> >You do know when to remove the reference - in the destroy callback.
>Once
>> >the
>> >deref call is invoked, no further callbacks will ever occur for that
>> >object.
>> >
>> >It's a slight change in logic.  Your model takes a reference when it
>needs
>> >to
>> >invoke a callback, and releases it after the callback returns.  The IBAL
>> >object
>> >model holds a reference for the lifetime of the object, and releases it
>> >when
>> >that object is destroyed.  Obviously, this doesn't support sharing
>objects
>> >between multiple users well, but this hasn't been a problem yet.
>> >
>> >If you're sharing an object, waiting until that object is destroyed does
>> >keep a
>> >user's context around longer than might be necessary.  But objects
>aren't
>> >shared
>> >in any of the public APIs, so this isn't a problem as far as I know.
>>
>> This is probably not a problem now, but I want to allow a more general
>model
>> in which our internal and external API's are consistent and can be used
>> better.
>
>I don't see the value in this, as IB resources are not designed to be
>shared, and sharing them will require a lot more work to properly
>synchronize things - what if a shared QP is driven into two states by
>two different clients simulatenously?
>
>As I've said before, your model forces a policy on the clients - the
>API forces users to implement reference counting and use a destroy
>method in which the last deref cleans the object up.  This may or may
>not fit into their model.  The current model allows a client to use
>reference counting to control the lifetime of an object, but does not
>force an API on the client for how to do so.
>
>Beyond the thoretical, I don't see any client needing this.  Until
>then, I would delay any implementation changes to do this as it will
>just slow other more important things down.
>
>> Please note that we had a problem in which objects were destroyed
>> while they were busy. In my approach there is no problem in destroying an
>> object while it is busy.
>
>Was this problem in the access layer?  In either case, it sounds like
>a bug to me and needs to be fixed.  I don't think fixing the bug
>requires changing the object model - the buggy implementation is
>clearly not using the destroy callback properly.
>
>> >> A good example for this is the CM API's. If
>> >> I understand correctly, one call CM_REQ and doesn't know how many call
>> >> backs
>> >> there will be. For example I have received a REP, and if I was too
>busy
>> >> to
>> >> answer I have received a DREQ. As I understood it, the only way to
>know
>> >> that
>> >> the last callback was sent is to wait for the completion of the
>destroy
>> >> of the
>> >> QP.
>> >
>> >The current CM design invokes the CM callbacks on a QP, and you can't
>> >destroy
>> >the QP until all its associated callbacks have been either delivered or
>> >cancelled.  I don't see how your model would change that.  The user's QP
>> >context
>> >can't be freed until the access layer notifies the user that it is safe
>to
>> >do
>> >so.  I don't see how your model helps this.
>>
>> In my model, the QP can be destroyed whenever you want. If there are CM
>> callbacks, than the QP is destroyed, but memory isn't freed. Once the CM
>has
>> finished doing his callbacks, (either doing them or not) he will release
>the
>> (last) reference and the memory will be freed. This is why I belive that
>> this method is so good.
>
>What if your CM callbacks try to perform state changes on the QP?  Now
>you need to add logic in the CM callbacks to make sure that you
>haven't destroyed the QP already.  Further, you need to add logic in
>the access layer to check that you're not trying to use a destroyed
>object, since the client's check in their CM callback could race with
>the destruction.
>
>What is the harm in keeping the QP around until all callbacks have
>completed?  I don't see the value in destroying the QP sooner - what
>are you accomplishing with this, aside from complicating the logic at
>all levels in the stack?
>
>> >I believe that in the current code, all pending (not in flight) events
>are
>> >flushed as soon as you initiate destruction of the QP.  If you initiate
>QP
>> >destruction from the REP callback, the REJ gets flushed, and when the
>REP
>> >callback unwinds, reference counts get released and the queue pair's
>> >reference
>> >count goes to zero and your destroy callback gets invoked.
>> >
>> Please note that when I start the destroy of the QP I don't know it's
>state.
>> (It's state might be changing a micro second ago) so I don't know what
>> callbacks should happen. As a result I have to wait and it complicates
>the
>> code.
>
>You should have a reference on your context because the QP exists.
>When that QP is destroyed, the reference will be released.  This is
>independent of what callbacks are pending, the QP state, etc.  If the
>CM was in the process of invoking a callback, you would have to wait
>in your object model too.  Your solution doesn't avoid the need to
>wait.
>
>> >> A similar example is a timer object that works automatically, that is
>you
>> >> set
>> >> it once and every second, you get a call back. A new callback, is
>started
>> >> even
>> >> if the previous wasn't. In this model, when I want to stop things, I
>just
>> >> can't know when the last call back will happen. The only way to solve
>> >> this is
>> >> to call stop on timer, wait (event, callback or whatever) for the
>timer
>> >> to
>> >> stop and than remove my reference (I want to make this clear there
>might
>> >> still
>> >> be others using this object!!!).
>> >
>> >So you want to have a timer that can invoke multiple user's callbacks?
>> >This
>> >introduces a new object model - currently, there is only one recipient
>of
>> >callbacks.  For a multi-client object, having the ability to take
>> >references
>> >would help let clients deregister without having to wait until all
>clients
>> >have
>> >deregistered.  Shutdown/destroy/whatever wouldn't really destroy the
>timer,
>> >it
>> >would just deregister the callback, and when there are no references
>left,
>> >would
>> >implicitly destroy the timer.  This isn't a timer object anymore, as it
>> >introduces the notion of event dispatching to allow multiplexing to
>> >multiple
>> >clients.  For this, I agree that allowing the dispatcher to take
>references
>> >on
>> >each client can be helpful.
>> >
>> >This is only an issue for multi-client objects - single user objects
>don't
>> >need
>> >the ability to take extra references.  I don't see any need for multi-
>> >client
>> >objects being exposed in the API - maybe if you could explain what
>you're
>> >trying
>> >to do it might make more sense to me.
>>
>> The problem is not multi-client objects but rather one object that has
>more
>> than one call back. If you look at the CM state diagram  page 687 in the
>IB
>> spec you will see that the numbers of callback possible at any state is
>> large. As a result I don't know when I have received the last one and I
>have
>> to wait. (see also bellow)
>
>You have to wait as long as your object has a reference on it.  This
>is true of any reference counting model.  It is only safe to release
>the reference in the destroy callback, no matter what CM callbacks are
>pending.
>
>Also, while the number of events that can be generated from any state
>in the CM state diagram is large, multiple events cannot be generated
>from any state - events are associated with a state change.  Further
>events could happen from the new state, and a new event notification
>be queued, but there's nothing that would prevent that - it's just a
>fact of the state machine.
>
>> >> On the model that I propose, the timer will increase the reference
>count
>> >> before each callback and will decrease the reference after the
>callback.
>> >> As a result, after I call stop on the timer, I can safely remove me
>> >reference,
>> >> and I DON'T HAVE TO WAIT.
>> >
>> >When you say you don't have to wait, do you mean wait in the
>> >WaitForSingleObject
>> >sense, or do you wait for the reference count to reach zero (i.e. your
>> >object
>> >can immediately be freed)?  I think at a minimum, you must wait for the
>> >reference count to reach zero, whether through a call to
>> >WaitForSingleObject or
>> >letting the dereference handler work asynchronously.  If you are going
>to
>> >wait
>> >for the reference count to reach zero, this isn't any different than
>> >waiting for
>> >the destroy callback.
>>
>> I mean wait in both meaning. In my model I don't have to wait for
>anything -
>> the object will destroy itself once it's reference has reached zero.
>
>No, in your model you have to wait for the reference count to reach
>zero.  In the existing model you have to wait for the reference count
>to reach zero.  In the existing model, you decrement your reference
>count from the destroy callback.  There is no more or less waiting
>required than in your model than the existing one.
>
>THE DESTROY CALLBACK IS EQUIVALENT TO A DEREF CALLBACK!!!
>
>> >> Is there someway to solve this problem in your model?
>> >
>> >I don't see what the problem is.  Can you be more specific about what it
>is
>> >you
>> >want to change in the code - like APIs etc?  Your object model can work
>> >fine, I
>> >just don't see why we should change the model throughout the API when
>there
>> >are
>> >more important problems to solve.
>>
>> As was descried above and in my answer to Sean Hefty my model makes life
>> simpler (especially I don't like the fact that I have to wait on a QP
>> destroy (more than this the problem is probably that I don't know what to
>> wait for, If I new exactly how many call backs there will be it would be
>> probably fine)).
>
>You don't care how many callbacks it will be - the only callback from
>which it is safe to release your reference is the destroy callback.
>Period, end of story.  Anything else is a bug.
>
>As to destruction being async, this is a requirement to allow clients
>to initiate destruction from DISPATCH.  This will not go away, just
>the opposite.
>
>> I believe that if we are doing a revolution in our API (and our
>> implementation) than we should consider doing this change as well.
>
>I think it's clear that I've considered the suggestion from the length
>of this exchange.  I still don't see the value in changing the APIs to
>do this, and I think your desire to do so comes from a
>misunderstanding of how to use the current API to achieve exactly the
>same results as what you are looking for.
>
>The current object model is pretty simple - take a reference on your
>context when you create a QP, and release that reference from the QP's
>destroy callback.  It's irrelevant how many CM callbacks are invoked
>in between these two events, a reference on your context should exist
>as long as the QP is alive.  If you do this, taking an extra reference
>for each CM callback accomplishes nothing.
>
>- Fab
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20050921/447bf0b7/attachment.html>