[ofw] IBAL CEP reference counting is... interesting

Fab Tillier ftillier at windows.microsoft.com
Fri Jan 9 15:17:36 PST 2009


> So, it's not a mad count either... maybe the fix is to change the name
> to outstanding_mads and initialize it to 0.  I have no idea if such a
> simple change will work, but I can look into it.  This should make the
> code cleaner, but won't fix the hang problem.

OK, fine: it's a reference that triggers the destroy callback once all MADs have completed.  That doesn't change the fact that nothing is wrong with this part of the code; it works as intended.
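
To make the counting scheme concrete, here is a minimal sketch of the pattern, not the IBAL code.  Every name in it (toy_cep_t, toy_mad_send, and so on) is hypothetical, and it assumes the count starts at 1 so that the destroy call itself drops the initial reference.  The callback fires only when the last holder lets go, whether that is the destroy call or the final MAD completion.

```c
#include <assert.h>

/* Hypothetical toy model; none of these names exist in IBAL. */
typedef struct toy_cep {
    int ref_cnt;           /* assumed to start at 1: the reference destroy drops */
    int destroy_cb_calls;  /* counts invocations of the destroy callback */
} toy_cep_t;

static void toy_cep_init(toy_cep_t *cep)
{
    cep->ref_cnt = 1;
    cep->destroy_cb_calls = 0;
}

/* Drop one reference; whoever drops the last one fires the callback. */
static void toy_cep_deref(toy_cep_t *cep)
{
    assert(cep->ref_cnt > 0);
    if (--cep->ref_cnt == 0)
        cep->destroy_cb_calls++;  /* stand-in for the user's destroy callback */
}

/* Each outstanding MAD holds a reference until it completes. */
static void toy_mad_send(toy_cep_t *cep)     { cep->ref_cnt++; }
static void toy_mad_complete(toy_cep_t *cep) { toy_cep_deref(cep); }

/* Destroy releases only the initial reference. */
static void toy_cep_destroy(toy_cep_t *cep)  { toy_cep_deref(cep); }
```

With one MAD in flight, toy_cep_destroy() leaves the count at 1 and the callback waits; it runs when toy_mad_complete() drops the last reference.  With nothing in flight, destroy fires it immediately.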

>>> For large clusters, the CM timeout can be huge.  My idea to fix
>>> this was to have the DREQ sent once without being tied to the cep
>>> if initiated from the destroy call.  Comments?
>>
>> Why not cancel the DREQ in __cleanup_cep if you're in the DREQ_SENT
>> state when destroying?  Any MAD that is sent via __cep_send_retry
>> (any MAD that is retried until the CEP manager cancels it) sets
>> p_cep->p_send_mad.  Use that to cancel in __cleanup_cep.  I think
>> this will give you the behavior you want: the DREQ process gets
>> aborted if the CEP is destroyed by the app.
>
> The __cleanup_cep() call is what sends the DREQ in the first place...
> The cep enters the function in the established state.

When __cleanup_cep falls through to the DREQ_SENT state, cancel the MAD.  That way the DREQ is cut short only when the CEP is being destroyed; it is still retried as usual if the user is going to wait for the DREP.

> I was looking at changing __dreq_cep() to use __cep_send_mad(), which
> doesn't take a reference on the cep.

Then you affect DREQ processing for all CEPs, even those where the caller will wait for the DREP (or DREQ timeout).  You'll effectively break that functionality.  What's wrong with canceling the DREQ in the DREQ_SENT case?
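
A rough sketch of what I mean, as a toy model rather than a patch.  Only the DREQ_SENT state and the p_send_mad field echo the real code; the types, the demo_* functions, and the cancelled flag are hypothetical stand-ins.  The point is that the normal disconnect path is untouched, and only the destroy path cancels the retried DREQ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; nothing here is IBAL code. */
typedef enum { DEMO_ESTABLISHED, DEMO_DREQ_SENT, DEMO_DESTROYED } demo_state_t;

typedef struct demo_mad { int sends; int cancelled; } demo_mad_t;

typedef struct demo_cep {
    demo_state_t state;
    demo_mad_t  *p_send_mad;  /* MAD currently being retried, if any */
    demo_mad_t   dreq;
} demo_cep_t;

static void demo_cep_init(demo_cep_t *cep)
{
    cep->state = DEMO_ESTABLISHED;
    cep->p_send_mad = NULL;
    cep->dreq.sends = 0;
    cep->dreq.cancelled = 0;
}

/* Retried send: remembers the MAD so it can be cancelled later. */
static void demo_send_retry(demo_cep_t *cep, demo_mad_t *mad)
{
    mad->sends++;
    cep->p_send_mad = mad;
}

/* Normal disconnect: the DREQ keeps retrying until DREP or timeout. */
static void demo_disconnect(demo_cep_t *cep)
{
    demo_send_retry(cep, &cep->dreq);
    cep->state = DEMO_DREQ_SENT;
}

/* Destroy path: send the DREQ if needed, then cancel the retries. */
static void demo_cleanup(demo_cep_t *cep)
{
    switch (cep->state) {
    case DEMO_ESTABLISHED:
        demo_send_retry(cep, &cep->dreq);
        cep->state = DEMO_DREQ_SENT;
        /* fall through */
    case DEMO_DREQ_SENT:
        if (cep->p_send_mad != NULL) {
            cep->p_send_mad->cancelled = 1;  /* stand-in for cancelling the MAD */
            cep->p_send_mad = NULL;
        }
        break;
    default:
        break;
    }
    cep->state = DEMO_DESTROYED;
}
```

A CEP that goes through demo_disconnect() keeps its DREQ pending, so callers waiting for the DREP see no change.  A CEP destroyed while established sends the DREQ once and then stops retrying.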

-Fab



