[Openib-windows] Failures on ib_close_al
Leonid Keller
leonid at mellanox.co.il
Tue Aug 15 09:06:23 PDT 2006
Committed in rev. 455, thank you.
> -----Original Message-----
> From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com]
> On Behalf Of Fabian Tillier
> Sent: Monday, August 07, 2006 10:08 PM
> To: Leonid Keller
> Cc: openib-windows at openib.org
> Subject: Re: [Openib-windows] Failures on ib_close_al
>
> Hi Leonid,
>
> On 8/6/06, Leonid Keller <leonid at mellanox.co.il> wrote:
> >
> > Hi Fab,
> > While preparing IBADM for release, we saw one of its tools
> > failing on exit.
>
> What functionality is there in IBADM? Will IBADM make it into SVN?
>
> > In the debug version, it produces several assertions:
> >
> > Assertion Failed:ref_cnt != 1 || p_obj->type == AL_OBJ_TYPE_H_CQ
> > Assertion Failed:p_list_item->p_list != p_list
> >
> > As far as I can see, this happens because, during ib_close_al(),
> > IBAL destroys POOL_KEY before QP.
> >
> > In more detail:
> > - ib_close_al calls destroy() of AL object;
> > - destroy() of AL takes the first child - CA - and calls its
> > destroy();
>
> This should be the last child - child destruction goes from
> tail to head. Objects are inserted at the tail of the list
> (see attach_al_obj). When an object is destroyed, it
> processes items starting at the tail.
>
> > - destroy() of CA takes the first child - PD - and calls its destroy();
> > - destroy() of PD takes the first child - AV - and calls its destroy();
> > - destroy() of PD takes the second child - POOL_KEY - and calls its
> > destroy();
> > - destroy() of POOL_KEY decreases its ref_cnt to 0 and queues the object
> > for *asynchronous* destruction of the POOL_KEY. Then it returns to
> > destroy() of PD;
> > - destroy() of PD takes the third child - QP - and calls its destroy();
> > - destroy() of QP calls destroying_qp(), which takes a reference
> > on POOL_KEY before calling dereg_mad_pool();
> > The latter action causes the first assert:
> > Assertion Failed:ref_cnt != 1 || p_obj->type == AL_OBJ_TYPE_H_CQ
>
> Why is this a CQ? Something is amiss here.
>
> > - then dereg_mad_pool() starts a second destroy of POOL_KEY, which
> > causes the second assert
> > Assertion Failed:p_list_item->p_list != p_list
> > on queuing the same object for *asynchronous* destruction of the
> > POOL_KEY.
> >
> > Do you have any idea how to fix that?
>
> The QP probably needs to take a reference on the pool key
> when it is created, since it stores it. Right now it takes a
> reference right before destroying it, but it could already
> have been destroyed.
>
> Here's a patch that fixes the pool key reference stuff. Let
> me know if this works or not. If it works, you can check
> this in or you can let me know and I'll check it in.
>
> Thanks,
>
> - Fab
>
> Index: core/al/al_qp.c
> ===================================================================
> --- core/al/al_qp.c (revision 440)
> +++ core/al/al_qp.c (working copy)
> @@ -448,7 +448,14 @@
> status = ib_reg_mad_pool( gh_mad_pool, h_pd,
> &((al_qp_alias_t*)h_qp)->pool_key );
> if( status == IB_SUCCESS )
> + {
> + /*
> + * Take a reference on the pool key since we don't have a
> + * mechanism for the pool key to clear the QP's pointer to it.
> + */
> + ref_al_obj( &((al_qp_alias_t*)h_qp)->pool_key->obj );
> *p_pool_key = ((al_qp_alias_t*)h_qp)->pool_key;
> + }
> }
> break;
>
> @@ -1063,12 +1070,13 @@
> {
> ib_api_status_t status;
> /* Deregister the pool_key. */
> - ref_al_obj( &p_qp_alias->pool_key->obj );
> status = dereg_mad_pool( p_qp_alias->pool_key, AL_KEY_ALIAS );
> if( status != IB_SUCCESS )
> {
> - AL_PRINT( TRACE_LEVEL_ERROR, AL_DBG_ERROR, ("dereg_mad_pool returned %s.\n",
> + AL_PRINT( TRACE_LEVEL_ERROR, AL_DBG_ERROR,
> + ("dereg_mad_pool returned %s.\n",
> ib_get_err_str(status)) );
> + /* Release the reference taken when we created the pool key. */
> deref_al_obj( &p_qp_alias->pool_key->obj );
> }
> p_qp_alias->pool_key = NULL;
>
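
For readers following the thread, here is a minimal sketch of the ordering
problem discussed above. It is a toy model, not IBAL code: toy_obj_t,
toy_ref(), toy_deref() and both simulate_*() functions are hypothetical
names invented for illustration, and "queued for destruction" is reduced
to a counter. It only assumes what the thread describes: the pool key is
queued when its reference count drops to 0, and the parent PD releases the
pool key before the QP is torn down.

```c
#include <assert.h>

/* Hypothetical toy object; NOT the real IBAL al_obj_t. */
typedef struct toy_obj {
    int ref_cnt;      /* outstanding references */
    int times_queued; /* how often it was queued for async destruction */
} toy_obj_t;

/* A new object starts with one reference held by its creator. */
static void toy_init(toy_obj_t *o)
{
    o->ref_cnt = 1;
    o->times_queued = 0;
}

static void toy_ref(toy_obj_t *o)
{
    o->ref_cnt++;
}

/* Drop one reference; queue the object when the last one goes away. */
static void toy_deref(toy_obj_t *o)
{
    assert(o->ref_cnt > 0);
    if (--o->ref_cnt == 0)
        o->times_queued++;
}

/* Before the patch: the QP references the key only during teardown. */
int simulate_late_ref(void)
{
    toy_obj_t key;
    toy_init(&key);
    toy_deref(&key); /* PD destroys the key: 1 -> 0, queued */
    toy_ref(&key);   /* QP teardown: 0 -> 1, revives a dying object */
    toy_deref(&key); /* QP deregisters: 1 -> 0, queued a second time */
    return key.times_queued;
}

/* After the patch: the QP references the key when it stores the pointer. */
int simulate_ref_at_create(void)
{
    toy_obj_t key;
    toy_init(&key);
    toy_ref(&key);   /* QP creation: 1 -> 2 */
    toy_deref(&key); /* PD destroys the key: 2 -> 1, not queued yet */
    toy_deref(&key); /* QP teardown: 1 -> 0, queued exactly once */
    return key.times_queued;
}
```

In this model, simulate_late_ref() queues the key twice, which corresponds
to the double-queue assertion reported above, while simulate_ref_at_create()
queues it once. The real destroy path in IBAL has more states than this
sketch covers.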