[Openib-windows] A problem in ib_close_al

Tue Jul 18 11:08:18 PDT 2006

I'm still investigating the stuck on shutdown and see a lot of "strange"
things, look like bugs.
Sure, I'm missing something, but ...

Here are the questions:

1. mad_list in AL object: 
   usually protected by mad_lock, but ...
	sometimes - by h_al->obj.lock:
		- __cleanup_pool_key();
		- __proxy_cleanup_map();	 
	sometimes - not protected at all:
		- __proxy_mad_recv_cb;
		- __free_mads()

2. recv_queue in SPL_QP_SVC object: 
   usually protected by p_spl_qp_svc->obj.lock, but is not protected in:
	- create_spl_qp_svc() (call to spl_qp_svc_post_recvs());
	- spl_qp_svc_reset_cb()(call to spl_qp_svc_post_recvs());
	- free_spl_qp_svc();

3. send_queue in SPL_QP_SVC object: 
   usually protected by p_spl_qp_svc->obj.lock, but is not protected in:
	- spl_qp_svc_reset_cb();

4. may be the same problems exist with other link lists, like
	- recv_queue in AL_MAD_QP object;
	- send_queue in AL_MAD_QP object;
	- et al

See, please, also my comments below.

> -----Original Message-----
> From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com] 
> On Behalf Of Fabian Tillier
> Sent: Wednesday, July 12, 2006 7:33 PM
> To: Leonid Keller
> Cc: openib-windows at openib.org
> Subject: Re: [Openib-windows] A problem in ib_close_al
> 
> Hi Leonid,
> 
> On 7/12/06, Leonid Keller <leonid at mellanox.co.il> wrote:
> >
> > > -----Original Message-----
> > > From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com] On 
> > > Behalf Of Fabian Tillier
> > > Sent: Tuesday, July 11, 2006 9:18 PM
> > >
> > > Hi Leo,
> > >
> > > On 7/10/06, Leonid Keller <leonid at mellanox.co.il> wrote:
> > > > Hi Fab,
> > > > Our regression uses to run opensm and then kill it at 
> some moment.
> > > > Sometimes opensm enters "zombi" state and there is no 
> way to kill 
> > > > it.
> > > > I've investigated that and found that it is stuck on 
> infinite loop 
> > > > in sync_destroy_obj while destroying opensm's instance of AL.
> > > > I saw ref_cnt = 13 and mad_list of AL object contained 
> 13 records.
> > >
> > > The proxy code should free all outstanding MADs before calling 
> > > ib_close_al - see the code path in al_dev_close 
> (al_dev.c at 371).  The 
> > > call to __proxy_cleanup_map will free all MADs queued 
> internally.  
> > > Do you know what these MADs are from?  Are they received 
> MADs that 
> > > were not delivered to the client?  Or are they send MADs that are 
> > > waiting to be sent?
> > >
> > I don't know so far how to reproduce that so:
> > Is there a way to tell received MADs from sent ones apart 
> from adding 
> > a mad_type field anywhere and setting it after every call to 
> > ib_get_mad() ?
> 
> You can look at the MAD header - the method should indicate 
> whether it is a send or a receive.  The ib_get_mad function 
> cannot set whether the MAD will be used for a send or 
> receive, as that is not an input parameter.  It cannot be an 
> input parameter anyway as it is valid for a received MAD to 
> be used to send a response without putting the MAD back into the pool.
> 

Now, after SPL_QP_SVC has been destroyed, I see CI_CA object stuck 
with several IB_MAD_METHOD_GETTABLE_RESP MADs, which were taken 
yet in create_spl_qp_svc() (I mark on the MAD, in what function it was
taken).

> > > > I think, the problem is in that, that mad_list is freed in
> > > > free_al()
> > > > which can't be called before ref_cnt = 0, which can become such 
> > > > only after calling free_al().
> > >
> > > Yes, we have a chicken and egg issue here.  The free_mads 
> function 
> > > is there to do internal cleanup if we proceed with 
> destruction when 
> > > the reference count doesn't hit zero.  Note that this can only 
> > > happen in a debug build (where destruction will 
> eventually timeout 
> > > and blow the object away).
> > >
> > > > I've prepared a patch which calls __free_mads() from AL 
> destroying 
> > > > function.
> > >
> > > We could call __free_mads from the destroying_al 
> function, but this 
> > > wouldn't necessarilly prevent MADs from being queued after that 
> > > function is called, but before the AL object moves to the cleanup 
> > > phase of destruction (after all references are released).
> 
> I just realized that we cannot call __free_mads from the 
> destroying_al function - it is never safe to reclaim a MAD 
> that a user is using.
> 
> > I'm working now on a stuck during shutdown after regression, which 
> > (the
> > stuck) we experiences constantly. The function stuck is 
> > ib_deregister_ca(), which stands in sync_destroy_obj(), 
> waiting for CA 
> > to release it, which is in turn hold by its PD object.
> > Today I revealed that the PD object is hold by several mads, taken 
> > once from the pool_key of CI_CA. It reminds the problem, being 
> > discussed here and I'll appreciate if you could suggest an 
> idea, why 
> > wouldn't these mads be released by __cleanup_pool_key().
> 
> You can't pull MADs back from a user if that user is using 
> it.  The user *must* return the MADs.  The MAD tracking list 
> in IBAL is for debugging purposes only.  If a MAD is in the 
> tracking list it means that a client has pulled that MAD out 
> of the pool.  It is not technically safe to just put the MAD 
> back in the pool, as the user may still be accessing it.  If 
> the user later tries to return the MAD, you will very likely 
> get a system crash.
> 
> > For now I have an idea that probably could solve both problems:
> > What if we forbid inserting of new MADs while the AL object 
> is being 
> > destroyed ?
> 
> The tracking list is there only to track MADs - it is there 
> only to allow debugging so that we know what instance of AL 
> has what MADs outstanding.  We can't reclaim these 
> automatically - we need to figure out why they aren't being 
> returned to the pool.

It's not just for debugging, if you use it 
in __cleanup_pool_key() to release the pool_key.

> 
> > Say, by returning an error from ib_get_mad() when 
> > (pool_key->h_al->obj.type == CL_DESTROYING) ?
> > I can prepare a patch if you don't mind.
> 
> If AL is destroying so will the pool_key object, so there's 
> no need to check AL.  Destruction is initiated on all child 
> objects as soon as it is initiated on the parent.  The 
> ib_get_mad function already checks the state of the pool, and 
> fails if the pool is not in the CL_INITIALIZED state.
> 

It's not quite so. 
ib_deregister_ca() initiates destroy of CI_CA and consequently CA, but
not AL.
I saw calls to ib_get_mad() during CI_CA destroying. Her is an example
of debug print:

***** ib_get_mad: MAD taken during destroying. pool_key
FFFFFADFCCFBE010, al_mad FFFFFADFCD1C8010, ib_mad FFFFFADFCD1C8038,
pool_state 2, ci_ca_state 3

> > > A MAD completion that happens just after the 
> destroying_al function 
> > > returns would be able to insert the MAD in the tracking 
> list if the 
> > > AL instance still had some other reference held.
> > >
> > > __free_mads must be called from free_al though we could 
> call it from 
> > > destroying_al too, but as I said that doesn't solve the problem.
> >
> > If we can prevent adding new MADs while AL object 
> destroying, it will 
> > be OK to call __free_mads from destroying_al.
> 
> Calling __free_mads is only possible in debug builds, after 
> you've ignored the assertion hit because destruction timed 
> out.  __free_mads is not a function that can be called to 
> reclaim MADs while users are using them.  What we need to do 
> is find out who/why the MADs aren't being returned to their 
> pool, and fix that.
> 
> - Fab
>