[Openib-windows] A problem in ib_close_al

Wed Jul 12 09:32:47 PDT 2006

Hi Leonid,

On 7/12/06, Leonid Keller <leonid at mellanox.co.il> wrote:
>
> > -----Original Message-----
> > From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com]
> > On Behalf Of Fabian Tillier
> > Sent: Tuesday, July 11, 2006 9:18 PM
> >
> > Hi Leo,
> >
> > On 7/10/06, Leonid Keller <leonid at mellanox.co.il> wrote:
> > > Hi Fab,
> > > Our regression runs opensm and then kills it at some point.
> > > Sometimes opensm enters a "zombie" state and there is no way
> > > to kill it.
> > > I've investigated and found that it is stuck in an infinite
> > > loop in sync_destroy_obj while destroying opensm's instance
> > > of AL.
> > > I saw ref_cnt = 13, and the mad_list of the AL object contained
> > > 13 records.
> >
> > The proxy code should free all outstanding MADs before
> > calling ib_close_al - see the code path in al_dev_close
> > (al_dev.c at 371).  The call to __proxy_cleanup_map will free
> > all MADs queued internally.  Do you know what these MADs are
> > from?  Are they received MADs that were not delivered to the
> > client?  Or are they send MADs that are waiting to be sent?
> >
> I don't know so far how to reproduce that so:
> Is there a way to tell received MADs from sent ones apart from adding a
> mad_type field anywhere and setting it after every call to ib_get_mad()
> ?

You can look at the MAD header - the method should indicate whether it
is a send or a receive.  The ib_get_mad function cannot set whether
the MAD will be used for a send or receive, as that is not an input
parameter.  It cannot be an input parameter anyway as it is valid for
a received MAD to be used to send a response without putting the MAD
back into the pool.
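
A minimal sketch of that header check, assuming the standard MAD
header layout in which the top bit of the method field is the response
(R) bit.  The ex_* names are hypothetical and are not IBAL identifiers.
Note that the bit only separates requests from responses; whether a
request was sent or received still depends on the role of the client.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* R bit of the MAD method field (assumed: bit 7, as in the IB spec). */
#define EX_MAD_METHOD_RESP_MASK 0x80u

/* First four bytes of the common MAD header; illustrative names only. */
typedef struct ex_mad_hdr {
    uint8_t base_ver;
    uint8_t mgmt_class;
    uint8_t class_ver;
    uint8_t method;
} ex_mad_hdr_t;

/* Returns true when the method marks the MAD as a response. */
static bool ex_mad_is_response(const ex_mad_hdr_t *hdr)
{
    assert(hdr);
    return (hdr->method & EX_MAD_METHOD_RESP_MASK) != 0;
}
```

Walking the tracking list and applying a check like this to each entry
would give a rough split between requests and responses without adding
a new field to the MAD element.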

> > > I think the problem is that mad_list is freed in free_al(),
> > > which can't be called before ref_cnt = 0, and ref_cnt can
> > > reach zero only after free_al() has been called.
> >
> > Yes, we have a chicken and egg issue here.  The free_mads
> > function is there to do internal cleanup if we proceed with
> > destruction when the reference count doesn't hit zero.  Note
> > that this can only happen in a debug build (where destruction
> > will eventually timeout and blow the object away).
> >
> > > I've prepared a patch which calls __free_mads() from AL destroying
> > > function.
> >
> > We could call __free_mads from the destroying_al function,
> > but this wouldn't necessarily prevent MADs from being queued
> > after that function is called, but before the AL object moves
> > to the cleanup phase of destruction (after all references are
> > released).

I just realized that we cannot call __free_mads from the destroying_al
function - it is never safe to reclaim a MAD that a user is using.

> I'm now working on a hang during shutdown after regression, which we
> see constantly. The stuck function is ib_deregister_ca(), which sits
> in sync_destroy_obj(), waiting for the CA to be released; the CA is in
> turn held by its PD object.
> Today I found that the PD object is held by several MADs that were
> taken from the pool_key of CI_CA. This resembles the problem being
> discussed here, and I'd appreciate it if you could suggest why these
> MADs would not be released by __cleanup_pool_key().

You can't pull MADs back from a user while that user is using them.  The
user *must* return the MADs.  The MAD tracking list in IBAL is for
debugging purposes only.  If a MAD is in the tracking list it means
that a client has pulled that MAD out of the pool.  It is not
technically safe to just put the MAD back in the pool, as the user may
still be accessing it.  If the user later tries to return the MAD, you
will very likely get a system crash.

> For now I have an idea that probably could solve both problems:
> What if we forbid inserting new MADs while the AL object is being
> destroyed ?

The tracking list exists only for debugging, so that we know which
instance of AL has which MADs outstanding.  We can't reclaim these
automatically - we need to figure out why they aren't being returned
to the pool.
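
As a rough illustration of a tracking list used purely for debugging,
here is a toy sketch that records who took each MAD and reports what is
still outstanding, without ever reclaiming anything.  All ex_* names
and the taken_by tag are hypothetical; this is not how IBAL's mad_list
is actually laid out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct ex_mad {
    struct ex_mad *next;      /* link in the owner's tracking list */
    const char    *taken_by;  /* debug tag: who pulled it from the pool */
} ex_mad_t;

typedef struct ex_al {
    ex_mad_t *tracked;        /* MADs currently held by clients */
    size_t    outstanding;    /* number of entries in the list */
} ex_al_t;

/* Record that a client took `mad` out of the pool. */
static void ex_track(ex_al_t *al, ex_mad_t *mad, const char *who)
{
    mad->taken_by = who;
    mad->next = al->tracked;
    al->tracked = mad;
    al->outstanding++;
}

/* Called only when the client itself returns the MAD. */
static void ex_untrack(ex_al_t *al, ex_mad_t *mad)
{
    ex_mad_t **pp = &al->tracked;
    while (*pp != NULL && *pp != mad)
        pp = &(*pp)->next;
    assert(*pp == mad);       /* returning an untracked MAD is a bug */
    *pp = mad->next;
    mad->next = NULL;
    al->outstanding--;
}

/* Report, but do not free, every MAD that has not been returned. */
static size_t ex_report_outstanding(const ex_al_t *al, FILE *out)
{
    size_t n = 0;
    for (const ex_mad_t *m = al->tracked; m != NULL; m = m->next, n++)
        fprintf(out, "outstanding MAD %p taken by %s\n",
                (const void *)m, m->taken_by);
    return n;
}
```

A report like this at destruction time points at the code path that
failed to return its MADs, which is the thing that needs fixing.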

> Say, by returning an error from ib_get_mad() when
> (pool_key->h_al->obj.type == CL_DESTROYING) ?
> I can prepare a patch if you don't mind.

If AL is being destroyed, so is the pool_key object, so there's no need
to check AL.  Destruction is initiated on all child objects as soon as it
is initiated on the parent.  The ib_get_mad function already checks
the state of the pool, and fails if the pool is not in the
CL_INITIALIZED state.
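
The behaviour described above can be sketched as follows.  This is a
simplified model under stated assumptions, not the IBAL source: the
ex_* types, the state names, and the free_mads counter are all
hypothetical stand-ins for the real object and pool structures.

```c
#include <assert.h>
#include <stddef.h>

typedef enum ex_state {
    EX_UNINITIALIZED, EX_INITIALIZED, EX_DESTROYING, EX_DESTROYED
} ex_state_t;

typedef enum ex_status {
    EX_SUCCESS, EX_INVALID_STATE, EX_NO_RESOURCES
} ex_status_t;

typedef struct ex_obj {
    ex_state_t     state;
    struct ex_obj *first_child;
    struct ex_obj *next_sibling;
} ex_obj_t;

typedef struct ex_pool_key {
    ex_obj_t obj;        /* child of the AL instance object */
    int      free_mads;  /* MADs still available in the pool */
} ex_pool_key_t;

/* Starting destruction on a parent starts it on every child too. */
static void ex_begin_destroy(ex_obj_t *obj)
{
    obj->state = EX_DESTROYING;
    for (ex_obj_t *c = obj->first_child; c != NULL; c = c->next_sibling)
        ex_begin_destroy(c);
}

/* Hands out a MAD only while the pool key is fully initialized. */
static ex_status_t ex_get_mad(ex_pool_key_t *key)
{
    assert(key);
    if (key->obj.state != EX_INITIALIZED)
        return EX_INVALID_STATE;
    if (key->free_mads == 0)
        return EX_NO_RESOURCES;
    key->free_mads--;
    return EX_SUCCESS;
}
```

In this model, checking the AL object inside ex_get_mad would be
redundant, because the pool key has already left the initialized state
by the time the AL object is marked as destroying.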

> > A MAD completion that happens just after the destroying_al
> > function returns would be able to insert the MAD in the
> > tracking list if the AL instance still had some other reference held.
> >
> > __free_mads must be called from free_al though we could call
> > it from destroying_al too, but as I said that doesn't solve
> > the problem.
>
> If we can prevent adding new MADs while the AL object is being
> destroyed, it will be OK to call __free_mads from destroying_al.

Calling __free_mads is only possible in debug builds, after you've
ignored the assertion hit because destruction timed out.  __free_mads
is not a function that can be called to reclaim MADs while users are
using them.  What we need to do is find out who is holding the MADs
and why they aren't being returned to their pool, and fix that.

- Fab