[Openib-windows] a race in IBAL while CQ destroying

Leonid Keller leonid at mellanox.co.il
Mon May 15 09:19:52 PDT 2006


Hi Fab,
 
While running a test that creates and destroys many CQs from several
threads, I hit the assertion 'ref_cnt != 1' in ref_al_obj() in one
thread while another thread is in the middle of destroy_cq. It looks
like a race between the asynchronous CQ destroy and other code paths
that are still using the CQ.
I also see other failures, such as INVALID_CQ_HANDLE, which likewise
hint at the use of an already released CQ.
 
The assertion 'ref_cnt != 1' suggests that this is an invalid state, but
ref_al_obj() doesn't report it to the caller.
 
The patch below makes ref_al_obj() return the ref_cnt and checks this
value in __process_comp_cb() in order to skip the handling of a CQ
that is being destroyed.
I guess there are more functions that need to check the result of
ref_al_obj().
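
To show the idea in isolation, here is a minimal standalone sketch. It is
not IBAL code: demo_obj_t, demo_ref(), demo_deref() and demo_process_cb()
are hypothetical names, C11 atomics stand in for the AL atomic helpers, and
I assume that a count of 1 right after the increment means nobody else held
a reference, i.e. the object is on its way out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for al_obj_t; not the real IBAL structure. */
typedef struct demo_obj {
    atomic_int ref_cnt;   /* assumed: 0 means destroy is in progress */
    int        cb_calls;  /* counts how often the "user callback" ran */
} demo_obj_t;

/* Take a reference and return the new count so the caller can inspect it. */
static int32_t demo_ref(demo_obj_t *p_obj)
{
    assert(p_obj != NULL);
    return (int32_t)(atomic_fetch_add(&p_obj->ref_cnt, 1) + 1);
}

/* Drop a reference and return the new count. */
static int32_t demo_deref(demo_obj_t *p_obj)
{
    assert(p_obj != NULL);
    return (int32_t)(atomic_fetch_sub(&p_obj->ref_cnt, 1) - 1);
}

/*
 * Same shape as the patched __process_comp_cb(): the callback runs only
 * when someone else already held a reference before ours was taken.
 * Returns 1 if the callback ran, 0 if it was skipped.
 */
static int demo_process_cb(demo_obj_t *p_obj)
{
    int32_t ref_cnt = demo_ref(p_obj);
    int ran = 0;

    if (ref_cnt > 1) {
        p_obj->cb_calls++;   /* stands in for pfn_user_comp_cb() */
        ran = 1;
    }

    demo_deref(p_obj);
    return ran;
}
```

Calling demo_process_cb() on an object whose count starts at 1 runs the
callback; calling it on one whose count starts at 0 skips it. Note that the
check only helps while the object's memory is still valid; it does not by
itself protect against the object being freed before the increment.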
 
Thoughts?
 
 
Index: E:/svn.openib/gen1/trunk/core/al/al_common.c
===================================================================
--- E:/svn.openib/gen1/trunk/core/al/al_common.c (revision 347)
+++ E:/svn.openib/gen1/trunk/core/al/al_common.c (working copy)
@@ -352,7 +352,7 @@
  * Increment a reference count on an object.  This object should not be
  * an object's parent.
  */
-void
+int32_t
 ref_al_obj(
  IN    al_obj_t * const   p_obj )
 {
@@ -365,6 +365,7 @@
  CL_ASSERT( ref_cnt != 1 );
 
  CL_EXIT( AL_DBG_AL_OBJ, g_al_dbg_lvl );
+ return ref_cnt;
 }
 
Index: E:/svn.openib/gen1/trunk/core/al/al_common.h
===================================================================
--- E:/svn.openib/gen1/trunk/core/al/al_common.h (revision 347)
+++ E:/svn.openib/gen1/trunk/core/al/al_common.h (working copy)
@@ -286,7 +286,7 @@
 /*
  * Increment the reference count on an AL object.
  */
-void
+AL_EXPORT int32_t AL_API
 ref_al_obj(
  IN    al_obj_t * const   p_obj );
 
Index: E:/svn.openib/gen1/trunk/core/al/user/ual_mgr.c
===================================================================
--- E:/svn.openib/gen1/trunk/core/al/user/ual_mgr.c (revision 347)
+++ E:/svn.openib/gen1/trunk/core/al/user/ual_mgr.c (working copy)
@@ -690,13 +690,15 @@
  IN  comp_cb_ioctl_info_t*   p_comp_cb_info )
 {
  ib_cq_handle_t h_cq;
+ int32_t ref_cnt;
  CL_ASSERT( p_comp_cb_info->cq_context );
  h_cq = (ib_cq_handle_t)(p_comp_cb_info->cq_context);
 
- ref_al_obj( &h_cq->obj );
+ ref_cnt = ref_al_obj( &h_cq->obj );
 
  CL_ASSERT( h_cq->pfn_user_comp_cb );
- h_cq->pfn_user_comp_cb( h_cq, (void*)h_cq->obj.context );
+ if (ref_cnt > 1)
+  h_cq->pfn_user_comp_cb( h_cq, (void*)h_cq->obj.context );
 
  deref_al_obj( &h_cq->obj );
 }
