[Openib-windows] Failures on ib_close_al

Leonid Keller leonid at mellanox.co.il
Tue Aug 15 09:06:23 PDT 2006


Committed in rev. 455, thank you.

> -----Original Message-----
> From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com] 
> On Behalf Of Fabian Tillier
> Sent: Monday, August 07, 2006 10:08 PM
> To: Leonid Keller
> Cc: openib-windows at openib.org
> Subject: Re: [Openib-windows] Failures on ib_close_al
> 
> Hi Leonid,
> 
> On 8/6/06, Leonid Keller <leonid at mellanox.co.il> wrote:
> >
> > Hi Fab,
> > While preparing IBADM for release, we saw one of its tools failing on exit.
> 
> What functionality is there in IBADM?  Will IBADM make it into SVN?
> 
> > In the debug version, it produces several assertions:
> >
> >     Assertion Failed:ref_cnt != 1 || p_obj->type == AL_OBJ_TYPE_H_CQ
> >     Assertion Failed:p_list_item->p_list != p_list
> >
> > As far as I can see, this happens because, while performing
> > ib_close_al(), IBAL destroys the POOL_KEY before the QP.
> >
> > In more detail:
> >     - ib_close_al() calls destroy() of the AL object;
> >     - destroy() of AL takes the first child - CA - and calls its destroy();
> 
> This should be the last child - child destruction goes from 
> tail to head.  Objects are inserted at the tail of the list 
> (see attach_al_obj).  When an object is destroyed, it 
> processes items starting at the tail.
> 
> >     - destroy() of CA takes the first child - PD - and calls its destroy();
> >     - destroy() of PD takes the first child - AV - and calls its destroy();
> >     - destroy() of PD takes the second child - POOL_KEY - and calls its destroy();
> >     - destroy() of POOL_KEY decreases its ref_cnt to 0 and queues an object
> >     for *asynchronous* destruction of the POOL_KEY. Then it returns to destroy() of PD;
> >     - destroy() of PD takes the third child - QP - and calls its destroy();
> >     - destroy() of QP calls destroying_qp(), which takes a reference
> >     on POOL_KEY before calling dereg_mad_pool();
> >     The latter action causes the first assert:
> >             Assertion Failed:ref_cnt != 1 || p_obj->type == AL_OBJ_TYPE_H_CQ
> 
> Why is this a CQ?  Something is amiss here.
> 
> >     - dereg_mad_pool() then starts a second destroy of POOL_KEY, which
> >     causes the second assert
> >             Assertion Failed:p_list_item->p_list != p_list
> >     on queuing the same object for *asynchronous* destruction of the
> >     POOL_KEY.
> >
> > Do you have any idea how to fix that?
> 
> The QP probably needs to take a reference on the pool key 
> when it is created, since it stores it.  Right now it takes a 
> reference right before destroying it, but it could already 
> have been destroyed.
> 
> Here's a patch that fixes the pool key reference stuff.  Let 
> me know if this works or not.  If it works, you can check 
> this in or you can let me know and I'll check it in.
> 
> Thanks,
> 
> - Fab
> 
> Index: core/al/al_qp.c
> ===================================================================
> --- core/al/al_qp.c	(revision 440)
> +++ core/al/al_qp.c	(working copy)
> @@ -448,7 +448,14 @@
>  			status = ib_reg_mad_pool( gh_mad_pool, h_pd,
>  				&((al_qp_alias_t*)h_qp)->pool_key );
>  			if( status == IB_SUCCESS )
> +			{
> +				/*
> +				 * Take a reference on the pool key since we don't have a
> +				 * mechanism for the pool key to clear the QP's pointer to it.
> +				 */
> +				ref_al_obj( &((al_qp_alias_t*)h_qp)->pool_key->obj );
>  				*p_pool_key = ((al_qp_alias_t*)h_qp)->pool_key;
> +			}
>  		}
>  		break;
> 
> @@ -1063,12 +1070,13 @@
>  		{
>  			ib_api_status_t		status;
>  			/* Deregister the pool_key. */
> -			ref_al_obj( &p_qp_alias->pool_key->obj );
>  			status = dereg_mad_pool( p_qp_alias->pool_key, AL_KEY_ALIAS );
>  			if( status != IB_SUCCESS )
>  			{
> -				AL_PRINT( TRACE_LEVEL_ERROR, AL_DBG_ERROR, ("dereg_mad_pool returned %s.\n",
> +				AL_PRINT( TRACE_LEVEL_ERROR, AL_DBG_ERROR,
> +					("dereg_mad_pool returned %s.\n",
>  					ib_get_err_str(status)) );
> +				/* Release the reference taken when we created the pool key. */
>  				deref_al_obj( &p_qp_alias->pool_key->obj );
>  			}
>  			p_qp_alias->pool_key = NULL;
> 



