[ewg] Re: IB/ipoib: Fix neigh destructor oops for kernels older than 2.6.21

Dave Olson dave.olson at qlogic.com
Tue May 27 08:43:39 PDT 2008


On Sun, 25 May 2008, Eli Cohen wrote:

| On Fri, 2008-05-23 at 15:05 -0700, Dave Olson wrote:
| 
| > I don't think this patch actually fixes anything (or even really works
| > around the problem).   As soon as somebody unloads the new module, exactly
| > the same problem occurs.
| 
| This won't let you unload the module:
| 
| +static int __init ipoib_helper_init(void)
| +{
| +       if (!try_module_get(THIS_MODULE))
| +               return -1;

OK, so at least the problem won't show up.  It's still
a pretty monumental kluge, and means that you can't really
test for memory leaks, since some resources won't be released.
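
A minimal user-space sketch of the trade-off described above. This is not
kernel code: struct fake_module, fake_try_module_get(), fake_helper_init()
and fake_unload() are hypothetical stand-ins for the module reference count
and for the init-time try_module_get(THIS_MODULE) call quoted from the patch.
It only models the idea that a module holding a reference on itself can never
reach its cleanup path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a loadable module and its reference count. */
struct fake_module {
    int refcount;        /* number of references pinning the module */
    bool resources_held; /* set at init, released only on unload */
};

/* Stand-in for try_module_get(): take one reference on the module. */
static bool fake_try_module_get(struct fake_module *m)
{
    m->refcount++;
    return true;
}

/* Mirrors the quoted init function: the module pins itself at load time. */
static int fake_helper_init(struct fake_module *m)
{
    m->resources_held = true;
    if (!fake_try_module_get(m))
        return -1;
    return 0;
}

/* Unload is refused while any reference is held, so cleanup never runs. */
static bool fake_unload(struct fake_module *m)
{
    if (m->refcount > 0)
        return false;
    m->resources_held = false; /* cleanup happens only on a real unload */
    return true;
}
```

In this model, calling fake_helper_init() and then fake_unload() on the same
object makes the unload fail and leaves resources_held set, which is the
reason leak testing becomes impractical with the self-pinning approach.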

I haven't had a chance to actually try the patch (and probably
won't for some time; yesterday was a holiday, and I'm taking
a short vacation starting Thursday).

| > So I think it's better to do nothing at all, than a bandaid like this.
| And what, have kernel oops every so often? This problem showed up at
| RHAS5.0 kernels (2.6.18 based) and we have a few reports on this
| failure.

Yeah, I filed one of the bugs.  There's no doubt it's a real
problem and needs a fix.

| > The "right" fix is to do the cleanup such that the core networking code
| > won't do the callback.  That may not be trivial or simple, but it's
| > really the only viable fix for the long run.   Bandaiding over it
| > doesn't seem very useful to me.
| The upstream kernel already has a fix for that. If we can convince
| distributors to integrate that fix into their kernel it would be much
| better but till that happens we can't just leave this unanswered.

That won't help for any released kernel/distro.   It just seems that
cleanup should be possible without core changes.   If "too much" effort
has been invested in that without results, I'll stop complaining, but
this really is a nasty kluge.

| > By the way, when a patch is proposed for a bug in the openfabrics
| > bugzilla (985/1021/1028 in this case), I think it would be helpful
| > (i.e., good policy) to attach the proposed patch to the bug, so that
| > people following the bug have some idea that work is being done (and
| > probably mention that discussion is occurring, and on what list; I would
| > have expected this discussion to be on the general list, rather than ewg).
|
| I agree on the need to attach the patch in bugzilla. As for which
| mailing list should be used, I thought ewg was the right place since it
| is not a problem in the upstream kernel. But I can just CC both lists in
| the future.

Could be my mistake, but I had thought ewg was more for the administrative
discussions, and that general was for non-administrative discussions,
regardless of which code base.  But if other people don't feel that way,
it's OK with me.


Dave Olson
dave.olson at qlogic.com


