[Openib-windows] races in __destroy_obj function

Yossi Leybovich sleybo at mellanox.co.il
Thu Jul 6 07:19:36 PDT 2006


Fab
 
We ran a test that brings OpenSM up and down for several minutes.
I hit an ASSERT in the function __destroy_cb:
".....
static void
__destroy_cb(
 IN    cl_async_proc_item_t  *p_item )
{
 cl_obj_t    *p_obj, *p_last_parent;
 
 CL_ASSERT( p_item );
 
 p_obj = PARENT_STRUCT( p_item, cl_obj_t, async_item );
 CL_ASSERT( !p_obj->ref_cnt );
 
...."
 
I tried to track which reference was not released and found that one of the
mcast join queries still holds a reference.
If we look at the code in __destroy_obj, we can see that it waits 10
sec, then assumes that the references have been released and moves on to __destroy_cb:
 

 if( destroy_type == CL_DESTROY_SYNC )
 {
  if( ref_cnt )
  {
   /* Wait for all other references to go away. */
   cl_event_wait_on( &p_obj->event, 10000000, FALSE );
  }
  __destroy_cb( &p_obj->async_item );
 } 
 
Unfortunately, the IPoIB registry configuration was set to
10 retries with a timeout of 1000 ms each (10 sec in total).
So we end up going to __destroy_cb while the query still holds a reference.
Don't you think it would be better to pass an infinite timeout to
cl_event_wait_on and wait until all the references are returned?
I don't think any fixed value can predict how long it will take for
ref_cnt to go to 0.
We had better find the problem and wait for ref_cnt to reach 0, rather than
continue and get a blue screen. (We ran this test on the free build and got a
blue screen.)
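
The timing overlap described above can be checked with a little arithmetic. This is only a sketch of the reasoning, not code from the stack: the helper names are hypothetical, the constants are the values quoted in this message, and it assumes the wait argument is in microseconds and that "10 retries" means ten timeout periods in total.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the message; the macro names are hypothetical. */
#define DESTROY_WAIT_US   10000000u  /* wait passed to cl_event_wait_on */
#define QUERY_RETRIES     10u        /* IPoIB registry setting */
#define QUERY_TIMEOUT_MS  1000u      /* IPoIB registry setting */

/* Worst-case time, in microseconds, before the query gives up and can
 * release its reference. Assumes each retry waits one full timeout. */
static uint64_t query_worst_case_us(uint32_t retries, uint32_t timeout_ms)
{
    return (uint64_t)retries * timeout_ms * 1000u;
}

/* True when a fixed destroy wait can expire before (or at the same time
 * as) the query finishes, i.e. the destroy path may run with ref_cnt != 0. */
static bool destroy_can_race(uint64_t wait_us, uint32_t retries,
                             uint32_t timeout_ms)
{
    return query_worst_case_us(retries, timeout_ms) >= wait_us;
}

/* Usage with the configuration from the failing test. */
void example_check(void)
{
    assert(query_worst_case_us(QUERY_RETRIES, QUERY_TIMEOUT_MS) == 10000000u);
    assert(destroy_can_race(DESTROY_WAIT_US, QUERY_RETRIES, QUERY_TIMEOUT_MS));
}
```

With these numbers both sides come to 10 sec, so whichever timer fires first decides whether the assert is hit. Raising the fixed wait only moves the threshold, since the retry settings come from the registry and can be changed.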
 
BTW,
do we have a way in our stack to track which client did not return its
reference?
To debug this problem I created an array in which each index represents a
call to cl_obj_ref on the IPoIB port object.
Each call to ref increments its array entry and each call to deref decrements
it; in that way I found which client did not return its reference.
(There are cases where one call to ref maps to several calls to deref.)
Do you have another suggestion for how to debug these problems? I know Leonid
is also chasing a lost ref in IBAL.
Do you think we should add the ref tracking code to the trunk?
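
For reference, the tracking array described above could look roughly like the following. This is a minimal sketch, not the code I actually ran: the site names and helper functions are hypothetical, and the plain counters are not thread safe (a real driver would need interlocked operations).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical call-site identifiers, one per place that takes a
 * reference on the tracked object. */
enum ref_site {
    REF_SITE_MCAST_JOIN,
    REF_SITE_PORT_QUERY,
    REF_SITE_SEND,
    REF_SITE_COUNT
};

static const char *ref_site_name[REF_SITE_COUNT] = {
    "mcast join", "port query", "send"
};

/* Outstanding references per call site. */
static long ref_track[REF_SITE_COUNT];

/* Call next to cl_obj_ref: records one reference taken at this site. */
static void track_ref(enum ref_site site)
{
    assert((unsigned)site < REF_SITE_COUNT);
    ref_track[site]++;
}

/* Call next to cl_obj_deref. The caller names the site whose reference
 * it releases, so several deref locations can map to one ref site. */
static void track_deref(enum ref_site site)
{
    assert((unsigned)site < REF_SITE_COUNT);
    ref_track[site]--;
}

/* Prints every site with a non-zero balance; returns how many there are. */
static int report_leaks(FILE *out)
{
    int leaks = 0;
    for (int i = 0; i < REF_SITE_COUNT; i++) {
        if (ref_track[i] != 0) {
            fprintf(out, "site '%s': balance %ld\n",
                    ref_site_name[i], ref_track[i]);
            leaks++;
        }
    }
    return leaks;
}
```

Calling report_leaks just before the CL_ASSERT in __destroy_cb would show which site still holds its reference.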
 
Yossi 
 
 
 
 