[Openib-windows] Failures on ib_close_al

Fabian Tillier ftillier at silverstorm.com
Mon Aug 7 12:08:04 PDT 2006


Hi Leonid,

On 8/6/06, Leonid Keller <leonid at mellanox.co.il> wrote:
>
> Hi Fab,
> While preparing IBADM for release we saw one of its tools failing on exit.

What functionality is there in IBADM?  Will IBADM make it into SVN?

> In the debug version it produces several assertions:
>
>     Assertion Failed:ref_cnt != 1 || p_obj->type == AL_OBJ_TYPE_H_CQ
>     Assertion Failed:p_list_item->p_list != p_list
>
> As far as I can see, this happens because, while performing ib_close_al(),
> IBAL destroys POOL_KEY before QP.
>
> In more details:
>     - ib_close_al calls destroy() of AL object;
>     - destroy() of AL takes the first child - CA - and calls its destroy();

This should be the last child - child destruction goes from tail to
head.  Objects are inserted at the tail of the list (see
attach_al_obj).  When an object is destroyed, it processes items
starting at the tail.
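To make the ordering concrete, here is a minimal sketch of tail insertion with tail-to-head teardown. The obj_t type and the attach_child()/destroy_obj() helpers are hypothetical simplifications, not the real al_obj_t or attach_al_obj() code; they only model the list discipline described above. Under this discipline the most recently attached child is destroyed first, so if a pool key were attached to the PD after the QP, it would be torn down before the QP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down object: only the sibling and child links
 * needed to model ordering. The real al_obj_t carries much more state. */
typedef struct obj {
    struct obj *prev, *next;   /* links within the parent's child list */
    struct obj *head, *tail;   /* this object's own children */
    int id;
} obj_t;

/* Destruction order, recorded for illustration only. */
static int g_order[16];
static int g_count;

/* Append a child at the tail of the parent's list (the behavior the mail
 * attributes to attach_al_obj). */
static void attach_child(obj_t *parent, obj_t *child)
{
    child->next = NULL;
    child->prev = parent->tail;
    if (parent->tail != NULL)
        parent->tail->next = child;
    else
        parent->head = child;
    parent->tail = child;
}

/* Tear down children starting at the tail, then record the object itself. */
static void destroy_obj(obj_t *obj)
{
    while (obj->tail != NULL) {
        obj_t *child = obj->tail;

        /* Unlink the tail child before recursing into it. */
        obj->tail = child->prev;
        if (obj->tail != NULL)
            obj->tail->next = NULL;
        else
            obj->head = NULL;

        destroy_obj(child);
    }
    assert(g_count < 16);
    g_order[g_count++] = obj->id;
}
```

For example, attaching children 1, 2 and 3 (in that order) to a parent 0 and then calling destroy_obj() on the parent records the order 3, 2, 1, 0.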

>     - destroy() of CA takes the first child - PD - and calls its destroy();
>     - destroy() of PD takes the first child - AV - and calls its destroy();
>     - destroy() of PD takes the second child - POOL_KEY - and calls its
>       destroy();
>     - destroy() of POOL_KEY decreases its ref_cnt to 0 and queues an object
>       for *asynchronous* destruction of the POOL_KEY. Then it returns to
>       destroy() of PD;
>     - destroy() of PD takes the third child - QP - and calls its destroy();
>     - destroy() of QP calls destroying_qp(), which takes a reference on
>       POOL_KEY before calling dereg_mad_pool();
>     The latter action causes the first assert:
>             Assertion Failed:ref_cnt != 1 || p_obj->type == AL_OBJ_TYPE_H_CQ

Why is this a CQ?  Something is amiss here.

>     - dereg_mad_pool() then starts a second destroy of POOL_KEY, which causes
>       the second assert
>             Assertion Failed:p_list_item->p_list != p_list
>       when queuing the same object for *asynchronous* destruction of the
>       POOL_KEY.
>
> Do you have any idea how to fix that?
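The reported failure can be modeled as a plain reference-counting sequence. The sketch below is a hypothetical simplification (model_obj_t, model_ref() and model_deref() are not IBAL functions): reaching zero queues the object for asynchronous destruction, and taking a reference on an object whose count is already zero is counted as a resurrection. It assumes the first debug assertion checks the post-increment count, so it roughly corresponds to the resurrection case, and the second to queuing the same object twice.

```c
#include <assert.h>

/* Hypothetical model of a reference-counted object whose destruction is
 * queued asynchronously when the count reaches zero. Not the real
 * al_obj_t / ref_al_obj / deref_al_obj implementation. */
typedef struct {
    int ref_cnt;
    int times_queued;   /* times put on the async destruction queue */
    int resurrections;  /* references taken on an already-dead object */
} model_obj_t;

static void model_ref(model_obj_t *o)
{
    /* Going from 0 to 1 revives a dead object (assumed first assert). */
    if (++o->ref_cnt == 1)
        o->resurrections++;
}

static void model_deref(model_obj_t *o)
{
    /* Dropping the last reference queues the object for destruction. */
    if (--o->ref_cnt == 0)
        o->times_queued++;
}

/* Replays the teardown in the order it was observed in the report. */
static model_obj_t replay_reported_sequence(void)
{
    model_obj_t pool_key = { 1, 0, 0 };  /* one reference, held via the PD */

    model_deref(&pool_key);  /* PD destroys POOL_KEY: 1 -> 0, queued */
    model_ref(&pool_key);    /* destroying_qp() refs the key: 0 -> 1 */
    model_deref(&pool_key);  /* dereg_mad_pool(): 1 -> 0, queued again */
    return pool_key;
}
```

Replaying the sequence leaves the key queued twice and revived once, matching the two assertions in the report.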

The QP probably needs to take a reference on the pool key when it is
created, since it stores it.  Right now it takes a reference right
before destroying it, but it could already have been destroyed.
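The proposed fix can be modeled with the same kind of simplified counter. In this hypothetical sketch (model_obj_t, model_ref() and model_deref() are illustrative, not IBAL functions) the QP takes its reference when it stores the pool key pointer, so the PD's release no longer drops the count to zero. It assumes, as the patch's error path suggests, that a successful dereg_mad_pool() consumes the QP's reference.

```c
#include <assert.h>

/* Hypothetical model of a reference-counted object with asynchronous
 * destruction; not the real IBAL object code. */
typedef struct {
    int ref_cnt;
    int times_queued;   /* times put on the async destruction queue */
    int resurrections;  /* references taken on an already-dead object */
} model_obj_t;

static void model_ref(model_obj_t *o)
{
    if (++o->ref_cnt == 1)
        o->resurrections++;   /* reviving a dead object */
}

static void model_deref(model_obj_t *o)
{
    if (--o->ref_cnt == 0)
        o->times_queued++;    /* last reference gone: queue destruction */
}

/* Same teardown as the report, but with a creation-time reference. */
static model_obj_t replay_with_creation_time_ref(void)
{
    model_obj_t pool_key = { 1, 0, 0 };  /* reference held via the PD */

    model_ref(&pool_key);    /* QP stores the pointer and refs it: 1 -> 2 */
    model_deref(&pool_key);  /* PD destroys its child: 2 -> 1, still alive */
    model_deref(&pool_key);  /* QP teardown releases its ref: 1 -> 0, queued */
    return pool_key;
}
```

With this ordering the key is queued for destruction exactly once and is never revived, regardless of whether the PD or the QP releases its reference first.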

Here's a patch that fixes the pool key reference stuff.  Let me know
if this works or not.  If it works, you can check this in or you can
let me know and I'll check it in.

Thanks,

- Fab

Index: core/al/al_qp.c
===================================================================
--- core/al/al_qp.c	(revision 440)
+++ core/al/al_qp.c	(working copy)
@@ -448,7 +448,14 @@
 			status = ib_reg_mad_pool( gh_mad_pool, h_pd,
 				&((al_qp_alias_t*)h_qp)->pool_key );
 			if( status == IB_SUCCESS )
+			{
+				/*
+				 * Take a reference on the pool key since we don't have a
+				 * mechanism for the pool key to clear the QP's pointer to it.
+				 */
+				ref_al_obj( &((al_qp_alias_t*)h_qp)->pool_key->obj );
 				*p_pool_key = ((al_qp_alias_t*)h_qp)->pool_key;
+			}
 		}
 		break;

@@ -1063,12 +1070,13 @@
 		{
 			ib_api_status_t		status;
 			/* Deregister the pool_key. */
-			ref_al_obj( &p_qp_alias->pool_key->obj );
 			status = dereg_mad_pool( p_qp_alias->pool_key, AL_KEY_ALIAS );
 			if( status != IB_SUCCESS )
 			{
-				AL_PRINT( TRACE_LEVEL_ERROR, AL_DBG_ERROR, ("dereg_mad_pool returned %s.\n",
+				AL_PRINT( TRACE_LEVEL_ERROR, AL_DBG_ERROR,
+					("dereg_mad_pool returned %s.\n",
 					ib_get_err_str(status)) );
+				/* Release the reference taken when we created the pool key. */
 				deref_al_obj( &p_qp_alias->pool_key->obj );
 			}
 			p_qp_alias->pool_key = NULL;


More information about the ofw mailing list