[openib-general] RFC: revert module ref counting patches (was Re: [PATCH] ipoib_flush_paths)

Mon Apr 10 02:44:57 PDT 2006

Quoting r. Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH] ipoib_flush_paths
> 
>     Michael> Actually, it turned out to be the simplest solution - and
>     Michael> quite elegant since there's no room for mistakes: if
>     Michael> query is going to be running this means module is still
>     Michael> loaded so we can take a reference to it without races.
> 
> Yes, this is suprisingly clean.

As it turns out, every problem has a simple, incorrect solution :(

>     Michael> As a bonus, and assertion inside __module_get increases
>     Michael> the chance to catch races where user forgets to cancel
>     Michael> the query - much nicer than crashing randomly.
> 
> Actually I think __module_get() will do the wrong thing if called
> during module unloading -- it doesn't test module_is_live().  In other
> words, calling __module_get() without already holding a ref has a
> race: __try_stop_module() can see the ref count as 0, then
> __module_get() can increment it, and then __try_stop_module() sets the
> module state to GOING and returns.
> 
> So the right thing to do is BUG_ON(!try_module_get(owner))

Ugh. I was wrong: this approach does not work at all.
Unfortunately we have:

asmlinkage long
sys_delete_module(const char __user *name_user, unsigned int flags)
{

....

        /* Stop the machine so refcounts can't move and disable module. */
        ret = try_stop_module(mod, flags, &forced);
        if (ret != 0)
                goto out;

        /* Never wait if forced. */
        if (!forced && module_refcount(mod) != 0)
                wait_for_zero_refcount(mod);

        /* Final destruction now noone is using it. */
        if (mod->exit != NULL) {
                up(&module_mutex);
                mod->exit();
                down(&module_mutex);
        }
        free_module(mod);

 out:
        up(&module_mutex);
        return ret;
}

----

This means that incrementing module reference count once the reference
count might have once reached 0 is useless *even if the module cleanup
did not finish yet*.

Worse, when a module e.g. cancels SA queries as part of module unload,
we get callbacks when module is not live anymore, so
BUG_ON(!try_module_get(owner)) will trigger things like

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at drivers/infiniband/core/sa_query.c:867

And since that's what ipoib does, I am actually seeing these :(

The patch I posted and tested does __module_get instead, so there's
no BUG_ON and thus I did not notice anything untoward, but the fact
remains that this play with reference counting around callback
seems to get us zero safety against module unloading.

I guess this was what people were saying all the time: we can't safely use
try_module_get/__module_get not in module context. Sigh.

At this point it seems to me that the only viable short-term approach is to
revert the module reference counting patches.

Opinions?

It also seems that all we need is an exported API to prevent modules from
unloading while we are inside the callback. Just adding 2 exported functions
module_mutex_lock();
module_mutex_unlock();
and calling them around callbacks will work.

Alternatively, we can go back to the original idea of adding API for flushing
WQs to ib_mad, ib_sa and ib_addr modules and calling that at module cleanup.

Comments on these ideas?
All this might be 2.6.18 material though.

-- 
MST