[Openib-windows] A race in IBAL while destroying a CQ
Leonid Keller
leonid at mellanox.co.il
Mon May 15 09:19:52 PDT 2006
Hi Fab,
While running a test that creates and destroys a lot of CQs from several
threads, I hit the assertion 'ref_cnt != 1' in ref_al_obj() in one
thread while another thread is in the middle of destroy_cq. It looks
like a race between the asynchronous CQ destroy and other code paths
that are still using the CQ.
I also see other failures, such as INVALID_CQ_HANDLE, which also suggest
that a released CQ is being used.
A failure of the 'ref_cnt != 1' assertion suggests that the object is in
an invalid state, but ref_al_obj() doesn't report this to the caller.
The patch below makes ref_al_obj() return ref_cnt, and makes
__process_comp_cb() check this value so that it skips handling a CQ
that is being destroyed.
I guess there are more functions that need to check the result of
ref_al_obj().
Thoughts?
Index: E:/svn.openib/gen1/trunk/core/al/al_common.c
===================================================================
--- E:/svn.openib/gen1/trunk/core/al/al_common.c (revision 347)
+++ E:/svn.openib/gen1/trunk/core/al/al_common.c (working copy)
@@ -352,7 +352,7 @@
* Increment a reference count on an object. This object should not be
* an object's parent.
*/
-void
+int32_t
ref_al_obj(
IN al_obj_t * const p_obj )
{
@@ -365,6 +365,7 @@
CL_ASSERT( ref_cnt != 1 );
CL_EXIT( AL_DBG_AL_OBJ, g_al_dbg_lvl );
+ return ref_cnt;
}
Index: E:/svn.openib/gen1/trunk/core/al/al_common.h
===================================================================
--- E:/svn.openib/gen1/trunk/core/al/al_common.h (revision 347)
+++ E:/svn.openib/gen1/trunk/core/al/al_common.h (working copy)
@@ -286,7 +286,7 @@
/*
* Increment the reference count on an AL object.
*/
-void
+AL_EXPORT int32_t AL_API
ref_al_obj(
IN al_obj_t * const p_obj );
Index: E:/svn.openib/gen1/trunk/core/al/user/ual_mgr.c
===================================================================
--- E:/svn.openib/gen1/trunk/core/al/user/ual_mgr.c (revision 347)
+++ E:/svn.openib/gen1/trunk/core/al/user/ual_mgr.c (working copy)
@@ -690,13 +690,15 @@
IN comp_cb_ioctl_info_t* p_comp_cb_info )
{
ib_cq_handle_t h_cq;
+ int32_t ref_cnt;
CL_ASSERT( p_comp_cb_info->cq_context );
h_cq = (ib_cq_handle_t)(p_comp_cb_info->cq_context);
- ref_al_obj( &h_cq->obj );
+ ref_cnt = ref_al_obj( &h_cq->obj );
CL_ASSERT( h_cq->pfn_user_comp_cb );
- h_cq->pfn_user_comp_cb( h_cq, (void*)h_cq->obj.context );
+ if (ref_cnt > 1)
+ h_cq->pfn_user_comp_cb( h_cq, (void*)h_cq->obj.context );
deref_al_obj( &h_cq->obj );
}
More information about the ofw mailing list