[Openib-windows] RE: Reference count as a solution to the problem of an object lif e time

Tzachi Dar tzachid at mellanox.co.il
Sun Sep 18 12:37:46 PDT 2005


Hi Fab,

It seems that the only place in which I'm missing you is why there is a need
for a "destroy complete" call-back. According to the method that I mention,
there is no need for that I just call destroy and forget about the object. I
haven't yet figured why the destroy call back is needed on your approach.

Some more small comments inside. 

>-----Original Message-----
>From: Fab Tillier [mailto:ftillier at silverstorm.com]
>Sent: Friday, September 16, 2005 8:07 PM
>To: 'Tzachi Dar'
>Subject: RE: Reference count as a solution to the problem of an object life
>time
>
>> From: Tzachi Dar [mailto:tzachid at mellanox.co.il]
>> Sent: Friday, September 16, 2005 6:51 AM
>>
>> The way that I would like to see, is an improvement of one of the
>suggestions
>> that you have mentioned. I'm adding it bellow.
>>
>> "Yet another solution is to have callback driven objects take an
>interface
>> representing the client's context, providing the object with the ability
>to
>> take and release references on the client's context.  This would allow a
>CQ,
>> for example, to hold a reference on its user's context as long as a
>callback
>> could be outstanding.  This isn't much different than the current
>> implementation, however, aside from providing the ability to take
>additional
>> references on the client's context."
>>
>> The change that I would like to see, is not "providing the object with
>the
>> ability to take reference" but rather *forcing* the object to take the
>> reference.
>
>Just like the access layer must take a reference on an object before
>returning
>that object to a user, the user must take a reference on its context before
>supplying that context to an object.  Putting code in the callee to take
>the
>reference leaves a timing window that shouldn't be there.  It is up to the
>provider of the object to take the reference on its object.  This is
>similar to
>how the IRP_MN_QUERY_INTERFACE process works, where the provider of an
>interface
>takes a reference on that interface on behalf of the caller.
>
There is a difference between the case of one "returning an object" to the
case of one "supplying an object". The difference is when I return an object
I might not want to hold the object anymore. If I remove me reference this
might be the last reference and the object might be destroyed. On the other
hand when I supply an object to someone (by calling his function with a
pointer to the object) I must be having at least one reference and that
reference can not be removed by me until the function returns, so there is
no "window" in which the reference can go back to zero.


>So with the destroy callback we currently have (which can be used to
>dereference
>the object), I believe we have exactly the behavior you are looking for.
>The
>only difference is when the destroy callback is supplied - in the destroy
>function rather than at creation time.
>
As I said, I don't understand why I need a destroy callback and more than
that, why should I wait for one.

>> As my original mail said, everyone who is holding a pointer to the
>> object and would like to use it *must* increase the reference count.
>
>The reference count must be increased before the object pointer is given to
>anyone.  The user therefore doesn't need to increase the reference count at
>all.
>
>> There are
>> other two things that should also apply (those are probably obvious but I
>> would like to make sure that we have a common language): 1) Each object
>should
>> have a Shutdown() function (probably something like a destroy today) 2)
>Each
>> object must have a lock. One of the things that shutdown will do will be
>to
>> flag that the object is in shutdown state. So every thing that will be
>done on
>> the object will be: 1) take the lock (nothing new). 2) Check that the
>object
>> wasn't destroyed yet (again I believe that this is already the case. I'm
>not
>> sure although that this is done when the lock is taken - and if not this
>is a
>> mistake.) The last thing that each object should have is a function that
>will
>> do the entire cleaning, once the reference count reaches 0 ( in the C++
>> methodology, this is the destructor).
>
>I don't think you need to check the state all the time.  If you have a
>reference
>on the object, then it is safe to manipulate it.  The functions
>manipulating the
>object can check the state if appropriate.
>
>For example, there is no need to check the state for calls to ref_al_obj
>and
>deref_al_obj.  Assertions here are sufficient - anyone calling ref_al_obj
>*must*
>already be holding a reference, otherwise the object pointer could be
>stale.
>
I agree that for ref_al and dref_al there is no need to check if an object
is destroyed or not, but for all other cases there is probably a need to
check. This is because the model that I'm referring to allow everyone that
has a reference to the object to call destroy when needed (error, shutdown
or any other reason).

>As an optimization, we should try to use atomic operations for the state
>variables, allowing the code to eliminate the locking around state checks
>and
>changes.  Note that not all state machines can work properly with atomics,
>though, so some cases will still need locks.
>
I agree on the places that it is possible. 

>> Here is an example for a timer object: this is the interface that I think
>that
>> a timer object should have. I'm writing this in C++, as I believe that
>this is
>> a better language, if we agree on the concepts we can always stay in C
>and
>> have the same effects.
>>
>> So here it is:
>>
>> Enum reason {
>>         TimeArrived,
>>         Cancelled,
>>         Shutdown
>>
>> } // There are three reasons why a shutdown might happen: The time
>arrived,
>> the operation was cancalled, or the object (driver) is shutting down.
>>
>> Interface TimerCallback {
>>         Void AddRef();
>>         Void Release();
>>         Void TimerAction(Reason reason);
>> }
>>
>> What I mean here is that if one wants to use the timer he should pass
>three
>> functions: AddRef(), Release() and TimerAction that will be called when
>the
>> time comes. The object itself should probably else be passed, if we stay
>in C.
>>
>> Class Timer {
>>         HREULT Init();
>>         HREULT Schedule(DWORD time, TimerCallBack * pTimer);
>>         HRESULT Cancel();
>>         VOID Shutdown();
>>         /// And very likely also addref, release
>> }
>>
>> And so if I have an object for example an sdp connection, and I want to
>make
>> sure that a function will be called, than all I have to do is this:
>>
>> Connection pConnection; // This is the connection that I have
>>
>> Timer.Schdule(pConnection, 50000);
>>
>> What will happen is this:
>> 1) If I don't want to use pConnection any more I can release it's
>reference
>> and forget it.
>> 2) The call back function will always be called. If the timer is
>cancelled,
>> than the flag will tell that it was cancelled and the object can decide
>to
>> behave in a different way (for example do nothing). The same thing is
>true
>> about shutdown of the system.
>>
>> What I like about this system is that one can do an Async or non Async
>> operation and just forget about it. If you want an object destroyed you
>can
>> always do it, and you don't have to think about the influence of this on
>other
>> parts of the program.
>
>Your Timer example is identical in behavior to how SA queries and other
>asynchronous operations behave in IBAL today - the callback is always
>invoked,
>regardless of the reason.  This gives a consistent usage model for users,
>rather
>than forcing them to handle callback notifications for some cases, and not
>for
>others.  The difference lies on what object gets referenced.  For the SA
>queries, a reference is taken on the AL instance.  The AL instance can
>therefore
>not complete destruction until the SA query callback is invoked and
>unwinds.
>
>The complib objects don't support asynchronous destruction, and I don't
>know if
>they should.  As we develop new drivers, I think we should use the native
>Windows APIs rather than the complib abstraction.  For example, in user-
>mode,
>the timer objects used by the cl_timer abstraction don't support a callback
>driven asynchronous model, but rather an event driven one (see
>DeleteTimerQueueTimer).  Changing this object model be support callback
>driven
>destruction notifications will require an internal thread, and the destroy
>call
>to create an event that this thread waits on.  The thread would then invoke
>the
>callback when that event is set by the Win32 subsystem.  This adds a lot of
>extra complexity that I don't think is necessary, as well as introducing
>more
>error paths that must be handled (what if event creation fails due to
>limited
>resources?)  Rather than creating more abstractions, I would like to see us
>interface to the native API.
>
I agree on this, the "sad" thing is that we are not creating new drivers we
are changing existing ones, so we have to face this more complicated
situation.

>Lastly, the callback model has a few limitations, in that the callbacks
>invoke
>client code.  The reason ib_close_al is currently a synchronous call is
>that
>callback driven completion notifications don't work properly for the last
>call
>the user makes (before unloading).  If the callback release the last
>reference
>on the driver object, the driver object can be unloaded before the callback
>unwinds (I have seen this happen).  The callback then creates an access
>violation and BSOD due to referencing freed memory when it unwinds.  I plan
>on
>extending ib_close_al to be asynchronous.  In user-mode, this would be done
>similarly to DeleteTimerQueueTimer, with the user passing in an event that
>would
>be signaled when the AL instance is destroyed.  In kernel-mode, ib_open_al
>would
>take as input parameter the client's DEVICE_OBJECT.  Because the AL
>instance
>itself does not perform any callbacks (only child objects), the ib_close_al
>function would call ObDereferenceObject once the instance is destroyed.
>Both
>these solutions solve the unwind problem in the best possible manner for
>their
>environment.
>
If we will agree that destroy call backs are not needed for all other
reasons except for shutdown I'll send a description of a mechanism that
allows to solve this problems as well (without interfering with other code)


>It sounds to me like we are in agreement.  Are there specific instances
>where
>reference counting is lacking?  Can we focus the discussion on the actual
>areas
>that need fixing?
>
>Thanks,
>
>- Fab
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ofw/attachments/20050918/971b5a2d/attachment.html>


More information about the ofw mailing list