<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.2873" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=318342013-06072006><FONT face=Arial
size=2>Fab</FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2>We ran a test
that repeatedly brings OpenSM up and down for several minutes.</FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2>I hit an ASSERT in the
function __destroy_cb:</FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial
size=2>".....</FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2>static
void<BR>__destroy_cb(<BR> IN cl_async_proc_item_t *p_item
)<BR>{<BR> cl_obj_t *p_obj,
*p_last_parent;</FONT></SPAN></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2> CL_ASSERT(
p_item );</FONT></SPAN></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2> p_obj =
PARENT_STRUCT( p_item, cl_obj_t, async_item );<BR><STRONG> CL_ASSERT(
!p_obj->ref_cnt );</STRONG></FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006><STRONG><FONT face=Arial
size=2></FONT></STRONG></SPAN> </DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial
size=2>...."</FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006><STRONG><FONT face=Arial
size=2></FONT></STRONG></SPAN> </DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2>I tried to track
which reference was not released and found that one of the mcast join queries
still holds a reference.</FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2>If we look at the
code in __destroy_obj, we can see that it waits 10 sec, then assumes that the
reference count went down and moves on to __destroy_cb:</FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006></SPAN> </DIV><SPAN
class=318342013-06072006><FONT face=Arial size=2>
<DIV><BR> if( destroy_type == CL_DESTROY_SYNC
)<BR> {<BR> if( ref_cnt
)<BR> {<BR> /* Wait for all other references to go
away. */<BR> cl_event_wait_on( &p_obj->event,
<STRONG>10000000</STRONG>, FALSE
);<BR> }<BR> __destroy_cb( &p_obj->async_item
);<BR> }</FONT> </DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV></SPAN><SPAN class=318342013-06072006><FONT face=Arial
size=2>Unfortunately, our IPoIB registry configuration was set to
10 retries with a timeout of 1000 ms each (10 sec in total).</FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2>So we end up going
to __destroy_cb while a reference is still held.</FONT></SPAN></DIV>
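<DIV><FONT face=Arial size=2>The clash between the two timeouts can be checked with a little arithmetic. The sketch below is only an illustration: the macro and function names are made up for this mail and do not exist in the stack. It assumes that the 10000000 passed to cl_event_wait_on is in microseconds (which matches the 10 sec above) and counts the query time the same way as above, retries times timeout.</FONT></DIV>

```c
/* Illustration only: all names here are hypothetical. */

/* Wait used in the synchronous destroy path (assumed to be microseconds). */
#define DESTROY_WAIT_US 10000000ULL

/* Longest time a query may keep its reference, counted as
 * retries x per-try timeout, converted from ms to us. */
unsigned long long query_worst_case_us(unsigned int retries,
                                       unsigned int timeout_ms)
{
    return (unsigned long long)retries * timeout_ms * 1000ULL;
}

/* Returns 1 when the destroy wait can expire before the query is
 * guaranteed to have released its reference, 0 otherwise. */
int destroy_can_race(unsigned int retries, unsigned int timeout_ms)
{
    return query_worst_case_us(retries, timeout_ms) >= DESTROY_WAIT_US;
}
```

<DIV><FONT face=Arial size=2>With 10 retries of 1000 ms the two values are equal, so the wait gives no margin at all.</FONT></DIV>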
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2>Don't you think
it would be better to pass an infinite timeout to cl_event_wait_on and wait until all the
references are released?</FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2>I don't think any
fixed value can predict how long it will take for ref_cnt to drop to
0.</FONT></SPAN></DIV>
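<DIV><FONT face=Arial size=2>To make the proposal concrete, here is a rough sketch of "wait until ref_cnt is 0" written as a loop that re-checks the count after every wait. It is not the complib code: obj_t, wait_fn_t, destroy_sync and slow_client_wait are hypothetical names, the wait is passed in as a function pointer so the idea can be tried without the real event, and a real implementation would have to read ref_cnt under the object's lock.</FONT></DIV>

```c
/* Sketch only: every name below is hypothetical. */

typedef struct obj {
    int ref_cnt;    /* real code must protect this with a lock or atomics */
} obj_t;

/* Waits on the object's event: returns 1 if signalled, 0 on timeout. */
typedef int (*wait_fn_t)(obj_t *p_obj, unsigned int timeout_us);

/* Keep waiting until every reference is gone instead of giving up
 * after a single timed wait. Returns how many waits timed out, so a
 * caller could log that some client is slow to release. */
int destroy_sync(obj_t *p_obj, wait_fn_t wait_fn, unsigned int timeout_us)
{
    int timeouts = 0;

    while (p_obj->ref_cnt != 0) {
        if (!wait_fn(p_obj, timeout_us))
            timeouts++;
    }
    /* Only at this point is the final cleanup safe to run. */
    return timeouts;
}

/* Stand-in wait for trying the loop out: it always reports a timeout
 * and models a slow client dropping one reference in the meantime. */
int slow_client_wait(obj_t *p_obj, unsigned int timeout_us)
{
    (void)timeout_us;
    p_obj->ref_cnt--;
    return 0;
}
```

<DIV><FONT face=Arial size=2>Counting the timeouts keeps the wait unbounded while still leaving a place to report that a reference is overdue.</FONT></DIV>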
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2>It is better to find the
problem and wait for ref_cnt to reach 0 than to continue and get a blue
screen. (</FONT></SPAN><SPAN class=318342013-06072006><FONT face=Arial size=2>We
ran this test on the free build and got a blue screen.)</FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial
size=2>BTW</FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2>Do we have a way in
our stack to track which client did not return its
reference?</FONT></SPAN></DIV>
<DIV><SPAN class=318342013-06072006><FONT face=Arial size=2>To debug this
problem I created an array in which each index represents a call to cl_obj_ref on the IPoIB
port object.</FONT></SPAN></DIV>
<DIV><FONT face=Arial><FONT size=2><SPAN class=318342013-06072006>Each call to
ref increments its array entry and each call to deref decrements it. That way I
found which client did not return its reference.</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT size=2><SPAN class=318342013-06072006>(There are
cases where one call to ref maps to several calls to deref.)
</SPAN></FONT></FONT></DIV>
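<DIV><FONT face=Arial size=2>The tracking array described above looks roughly like the sketch below. This is not the exact debug code: the enum entries and function names are hypothetical placeholders for the real cl_obj_ref call sites, and a real driver would need interlocked increments because refs and derefs come from different threads.</FONT></DIV>

```c
/* Sketch only: site names and function names are hypothetical. */

/* One entry per place that takes a reference on the port object. */
enum ref_site {
    REF_SITE_MCAST_JOIN,
    REF_SITE_PORT_GET,
    REF_SITE_SEND,
    REF_SITE_COUNT
};

/* Outstanding references per call site (not thread safe as written). */
static int ref_track[REF_SITE_COUNT];

/* Call next to cl_obj_ref at the given site. */
void track_ref(enum ref_site site)
{
    ref_track[site]++;
}

/* Call next to every deref that balances a ref taken at the given site. */
void track_deref(enum ref_site site)
{
    ref_track[site]--;
}

/* Returns the first site whose refs and derefs do not balance,
 * or -1 when every site is balanced. */
int first_leaked_site(void)
{
    int i;

    for (i = 0; i < REF_SITE_COUNT; i++) {
        if (ref_track[i] != 0)
            return i;
    }
    return -1;
}
```

<DIV><FONT face=Arial size=2>Passing the site of the original ref to every deref is what handles the case where one ref is balanced from several different deref locations.</FONT></DIV>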
<DIV><FONT face=Arial><FONT size=2><SPAN class=318342013-06072006>Do you have
another suggestion for how to debug these problems? I know Leonid is also chasing a lost
ref in IBAL.</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT size=2><SPAN class=318342013-06072006>Do you think
we should add the ref tracking code to the trunk?</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT size=2><SPAN
class=318342013-06072006></SPAN></FONT></FONT> </DIV>
<DIV><FONT face=Arial><FONT size=2><SPAN class=318342013-06072006>Yossi
</SPAN></FONT></FONT></DIV>
<DIV><SPAN class=318342013-06072006><STRONG><FONT face=Arial
size=2></FONT></STRONG></SPAN> </DIV></BODY></HTML>