[Openib-windows] A problem in ib_close_al

Wed Jul 19 09:32:57 PDT 2006

One more question.
Look at __cleanup_pool_key():

__cleanup_pool_key(
	IN				al_obj_t*
p_obj )
{
	...
	CL_ASSERT( !p_pool_key->mad_cnt );
	...
	if( p_pool_key->mad_cnt ) 
	{
		// release mad_list
	}
}

Questions:
	(minor) Why you expect mad_cnt=0 in ASSERT and then release the
MADs ?
	(major) Why do we need to release MADs *here* at all?
		( Because if there is at least one MAD, not released to
pool, 
		  __cleanup_pool_key() will be never called! )

> -----Original Message-----
> From: Leonid Keller 
> Sent: Tuesday, July 18, 2006 9:08 PM
> To: 'Fabian Tillier'
> Cc: openib-windows at openib.org
> Subject: RE: [Openib-windows] A problem in ib_close_al
> Importance: High
> 
> I'm still investigating the stuck on shutdown and see a lot 
> of "strange" things, look like bugs.
> Sure, I'm missing something, but ...
> 
> Here are the questions:
> 
> 1. mad_list in AL object: 
>    usually protected by mad_lock, but ...
> 	sometimes - by h_al->obj.lock:
> 		- __cleanup_pool_key();
> 		- __proxy_cleanup_map();	 
> 	sometimes - not protected at all:
> 		- __proxy_mad_recv_cb;
> 		- __free_mads()
> 
> 2. recv_queue in SPL_QP_SVC object: 
>    usually protected by p_spl_qp_svc->obj.lock, but is not 
> protected in:
> 	- create_spl_qp_svc() (call to spl_qp_svc_post_recvs());
> 	- spl_qp_svc_reset_cb()(call to spl_qp_svc_post_recvs());
> 	- free_spl_qp_svc();
> 
> 3. send_queue in SPL_QP_SVC object: 
>    usually protected by p_spl_qp_svc->obj.lock, but is not 
> protected in:
> 	- spl_qp_svc_reset_cb();
> 
> 4. may be the same problems exist with other link lists, like
> 	- recv_queue in AL_MAD_QP object;
> 	- send_queue in AL_MAD_QP object;
> 	- et al
> 
> See, please, also my comments below.
> 
> 
> > -----Original Message-----
> > From: ftillier.sst at gmail.com 
> [mailto:ftillier.sst at gmail.com] On Behalf 
> > Of Fabian Tillier
> > Sent: Wednesday, July 12, 2006 7:33 PM
> > To: Leonid Keller
> > Cc: openib-windows at openib.org
> > Subject: Re: [Openib-windows] A problem in ib_close_al
> > 
> > Hi Leonid,
> > 
> > On 7/12/06, Leonid Keller <leonid at mellanox.co.il> wrote:
> > >
> > > > -----Original Message-----
> > > > From: ftillier.sst at gmail.com [mailto:ftillier.sst at gmail.com] On 
> > > > Behalf Of Fabian Tillier
> > > > Sent: Tuesday, July 11, 2006 9:18 PM
> > > >
> > > > Hi Leo,
> > > >
> > > > On 7/10/06, Leonid Keller <leonid at mellanox.co.il> wrote:
> > > > > Hi Fab,
> > > > > Our regression uses to run opensm and then kill it at
> > some moment.
> > > > > Sometimes opensm enters "zombi" state and there is no
> > way to kill
> > > > > it.
> > > > > I've investigated that and found that it is stuck on
> > infinite loop
> > > > > in sync_destroy_obj while destroying opensm's instance of AL.
> > > > > I saw ref_cnt = 13 and mad_list of AL object contained
> > 13 records.
> > > >
> > > > The proxy code should free all outstanding MADs before calling 
> > > > ib_close_al - see the code path in al_dev_close
> > (al_dev.c at 371).  The
> > > > call to __proxy_cleanup_map will free all MADs queued
> > internally.  
> > > > Do you know what these MADs are from?  Are they received
> > MADs that
> > > > were not delivered to the client?  Or are they send 
> MADs that are 
> > > > waiting to be sent?
> > > >
> > > I don't know so far how to reproduce that so:
> > > Is there a way to tell received MADs from sent ones apart
> > from adding
> > > a mad_type field anywhere and setting it after every call to
> > > ib_get_mad() ?
> > 
> > You can look at the MAD header - the method should indicate 
> whether it 
> > is a send or a receive.  The ib_get_mad function cannot set whether 
> > the MAD will be used for a send or receive, as that is not an input 
> > parameter.  It cannot be an input parameter anyway as it is 
> valid for 
> > a received MAD to be used to send a response without 
> putting the MAD 
> > back into the pool.
> > 
> 
> Now, after SPL_QP_SVC has been destroyed, I see CI_CA object 
> stuck with several IB_MAD_METHOD_GETTABLE_RESP MADs, which 
> were taken yet in create_spl_qp_svc() (I mark on the MAD, in 
> what function it was taken).
>  
> 
> 
> > > > > I think, the problem is in that, that mad_list is freed in
> > > > > free_al()
> > > > > which can't be called before ref_cnt = 0, which can 
> become such 
> > > > > only after calling free_al().
> > > >
> > > > Yes, we have a chicken and egg issue here.  The free_mads
> > function
> > > > is there to do internal cleanup if we proceed with
> > destruction when
> > > > the reference count doesn't hit zero.  Note that this can only 
> > > > happen in a debug build (where destruction will
> > eventually timeout
> > > > and blow the object away).
> > > >
> > > > > I've prepared a patch which calls __free_mads() from AL
> > destroying
> > > > > function.
> > > >
> > > > We could call __free_mads from the destroying_al
> > function, but this
> > > > wouldn't necessarilly prevent MADs from being queued after that 
> > > > function is called, but before the AL object moves to 
> the cleanup 
> > > > phase of destruction (after all references are released).
> > 
> > I just realized that we cannot call __free_mads from the 
> destroying_al 
> > function - it is never safe to reclaim a MAD that a user is using.
> > 
> > > I'm working now on a stuck during shutdown after 
> regression, which 
> > > (the
> > > stuck) we experiences constantly. The function stuck is 
> > > ib_deregister_ca(), which stands in sync_destroy_obj(),
> > waiting for CA
> > > to release it, which is in turn hold by its PD object.
> > > Today I revealed that the PD object is hold by several 
> mads, taken 
> > > once from the pool_key of CI_CA. It reminds the problem, being 
> > > discussed here and I'll appreciate if you could suggest an
> > idea, why
> > > wouldn't these mads be released by __cleanup_pool_key().
> > 
> > You can't pull MADs back from a user if that user is using it.  The 
> > user *must* return the MADs.  The MAD tracking list in IBAL is for 
> > debugging purposes only.  If a MAD is in the tracking list it means 
> > that a client has pulled that MAD out of the pool.  It is not 
> > technically safe to just put the MAD back in the pool, as 
> the user may 
> > still be accessing it.  If the user later tries to return 
> the MAD, you 
> > will very likely get a system crash.
> > 
> > > For now I have an idea that probably could solve both problems:
> > > What if we forbid inserting of new MADs while the AL object
> > is being
> > > destroyed ?
> > 
> > The tracking list is there only to track MADs - it is there only to 
> > allow debugging so that we know what instance of AL has what MADs 
> > outstanding.  We can't reclaim these automatically - we 
> need to figure 
> > out why they aren't being returned to the pool.
> 
> It's not just for debugging, if you use it in 
> __cleanup_pool_key() to release the pool_key.
> 
> > 
> > > Say, by returning an error from ib_get_mad() when 
> > > (pool_key->h_al->obj.type == CL_DESTROYING) ?
> > > I can prepare a patch if you don't mind.
> > 
> > If AL is destroying so will the pool_key object, so there's 
> no need to 
> > check AL.  Destruction is initiated on all child objects as 
> soon as it 
> > is initiated on the parent.  The ib_get_mad function already checks 
> > the state of the pool, and fails if the pool is not in the 
> > CL_INITIALIZED state.
> > 
> 
> It's not quite so. 
> ib_deregister_ca() initiates destroy of CI_CA and 
> consequently CA, but not AL.
> I saw calls to ib_get_mad() during CI_CA destroying. Her is 
> an example of debug print:
> 
> ***** ib_get_mad: MAD taken during destroying. pool_key 
> FFFFFADFCCFBE010, al_mad FFFFFADFCD1C8010, ib_mad 
> FFFFFADFCD1C8038, pool_state 2, ci_ca_state 3
> 
> 
> > > > A MAD completion that happens just after the
> > destroying_al function
> > > > returns would be able to insert the MAD in the tracking
> > list if the
> > > > AL instance still had some other reference held.
> > > >
> > > > __free_mads must be called from free_al though we could
> > call it from
> > > > destroying_al too, but as I said that doesn't solve the problem.
> > >
> > > If we can prevent adding new MADs while AL object
> > destroying, it will
> > > be OK to call __free_mads from destroying_al.
> > 
> > Calling __free_mads is only possible in debug builds, after you've 
> > ignored the assertion hit because destruction timed out.  
> __free_mads 
> > is not a function that can be called to reclaim MADs while 
> users are 
> > using them.  What we need to do is find out who/why the MADs aren't 
> > being returned to their pool, and fix that.
> > 
> > - Fab
> >