[ewg] Re: B/ipoib: Fix neigh destructor oops for kernels older than 2.6.21

Eli Cohen eli at dev.mellanox.co.il
Sun May 25 00:45:36 PDT 2008


On Fri, 2008-05-23 at 15:05 -0700, Dave Olson wrote:

> I don't think this patch actually fixes anything (or even really works
> around the problem).   As soon as somebody unloads the new module, exactly
> the same problem occurs.

The patch won't let you unload the module, because the module takes a
reference on itself at init time:

+static int __init ipoib_helper_init(void)
+{
+       if (!try_module_get(THIS_MODULE))
+               return -1;
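
To illustrate why the self-reference blocks unloading, here is a minimal
userspace sketch of the use-count check. It is a model under stated
assumptions, not kernel code: the struct and the model_* functions are
hypothetical names made up for this example, and -EWOULDBLOCK is assumed
as the error for a refused, non-forced unload.

```c
/* Userspace model only, not kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_module {
    int refcount;   /* stands in for the kernel's per-module use count */
    bool live;      /* becomes false once an unload has been accepted */
};

/* Models try_module_get(): take a reference unless the module is going away. */
static bool model_try_get(struct model_module *m)
{
    if (!m->live)
        return false;
    m->refcount++;
    return true;
}

/* Models the init path in the patch fragment: the module pins itself. */
static int model_helper_init(struct model_module *m)
{
    m->live = true;
    m->refcount = 0;
    if (!model_try_get(m))
        return -1;  /* same return value as the quoted patch fragment */
    return 0;       /* the reference is never dropped afterwards */
}

/* Models a non-forced unload request: refused while references remain. */
static int model_unload(struct model_module *m)
{
    if (m->refcount != 0)
        return -EWOULDBLOCK;    /* assumed error code for this sketch */
    m->live = false;
    return 0;
}
```

In this model, calling model_helper_init() and then model_unload() on the
same struct leaves the count at one, so the unload request is refused.
Forced unloading is outside the scope of the sketch.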


> When testing for memory leaks, and other bugs, you want to unload all
> of the modules that you have loaded, over and over, and typically
> you automate that, and such tests are going to unload this module as well,
> since the new module is loaded by modprobe due to its dependency, and
> therefore is in the list of modules loaded by the test.
Same as above. Did you actually try to unload the module, and did it
succeed?

> 
> So I think it's better to do nothing at all, than a bandaid like this.
And what, have the kernel oops every so often? This problem showed up on
RHAS 5.0 kernels (2.6.18-based), and we have a few reports of this
failure.

> 
> The "right" fix is to do the cleanup such that the core networking code
> won't do the callback.  That may not be trivial or simple, but it's
> really the only viable fix for the long run.   Bandaiding over it
> doesn't seem very useful to me.
The upstream kernel already has a fix for that. If we can convince
distributors to integrate that fix into their kernels, it would be much
better, but until that happens we can't just leave this unaddressed.

> By the way, when a patch is proposed for a bug in the openfabrics
> bugzilla (985/1021/1028 in this case), I think it would be helpful
> (i.e., good policy) to attach the proposed patch to the bug, so that
> people following the bug have some idea that work is being done (and
> probably mention that discussion is occurring, and on what list; I would
> have expected this discussion to be on the general list, rather than ewg).
I agree on the need to attach the patch in bugzilla. As for which
mailing list should be used, I thought ewg was the right place since it
is not a problem in the upstream kernel. But I can just CC both lists in
the future.

> 
> In summary, my preference would be to simply leave the bug open, and not
> fix the problem at all, rather than to bandaid over it in a fashion that
> itself doesn't really solve any problems, just moves them over one step.
> Of course, what I'd really like is a real fix, so that the neighbour
> code is correctly cleaned up...
> 
So if we agree that this patch does solve the problem -- at least will
prevent the kernel oops -- I think we would like to have it until we
have a better solution. Let's hear other opinions.



