[ofa-general] Re: dst_ifdown breaks infiniband?
Michael S. Tsirkin
mst at dev.mellanox.co.il
Tue Mar 20 09:02:17 PDT 2007
> Quoting Alexey Kuznetsov <kuznet at ms2.inr.ac.ru>:
> Subject: Re: dst_ifdown breaks infiniband?
> > This might work. Could you post a patch to better show what you mean to do?
> Here it is.
> ->neigh_destructor() is killed (not used) and replaced with ->neigh_cleanup(),
> which is called when a neighbor entry goes to the dead state. At this point
> everything is still valid: neigh->dev, neigh->parms etc.
> The device should guarantee that dead neighbor entries (neigh->dead != 0)
> do not get their private part initialized, otherwise nobody will clean it up.
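
For readers without the patch at hand, the ordering described above can be modeled roughly as follows. This is a simplified userspace sketch, not the patch itself: the model_* structures and functions are hypothetical stand-ins for the kernel ones, and it assumes the hook is invoked right after the entry is marked dead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct model_dev {
    int cleanup_calls;              /* counts hook invocations, for illustration */
};

struct model_neigh;

struct model_parms {
    /* Modeled on ->neigh_cleanup() as described in the quoted text. */
    void (*neigh_cleanup)(struct model_neigh *n);
};

struct model_neigh {
    struct model_dev *dev;
    struct model_parms *parms;
    bool dead;                      /* mirrors neigh->dead != 0 */
    void *priv;                     /* driver-private part of the entry */
};

/* Hypothetical driver hook: drop private state while dev and parms are valid. */
static void model_driver_cleanup(struct model_neigh *n)
{
    assert(n->dev != NULL && n->parms != NULL);
    n->priv = NULL;
    n->dev->cleanup_calls++;
}

/* Mark the entry dead and run the hook exactly once (assumed ordering). */
static void model_neigh_mark_dead(struct model_neigh *n)
{
    if (n->dead)
        return;                     /* already dead: hook has already run */
    n->dead = true;
    if (n->parms != NULL && n->parms->neigh_cleanup != NULL)
        n->parms->neigh_cleanup(n);
}
```

The point of the sketch is only that the hook runs at the transition to the dead state, when dev and parms can still be dereferenced, rather than at final destruction.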
OK, I stress-tested this for about 9 hours - apparently this resolves the issues
I was seeing with both hotplug device unregister and module removal.
This is an old bug, but somehow it did not trigger on
older kernels - some code restructuring in infiniband is probably
the reason - so from that POV it's a regression in 2.6.21,
and several people are now experiencing these crashes.
David, Alexey, what do you think about this patch? Is it right?
Could this patch be considered for 2.6.21?
Acked-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>
> I think this is enough for ipoib, which is the only user of this thing.
> Initialization of the private part of neighbor entries happens in the ipoib
> start_xmit routine, which is not reached when the device is down.
> But it would be better to add an explicit test for neigh->dead
> in any case.
Additionally, IP over InfiniBand actually tests a separate flag,
IPOIB_FLAG_ADMIN_UP, before looking at an skb.
This flag is cleared before the device goes down.
Taken together, I think this should be sufficient.
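
A rough sketch of the two checks discussed here, again as a simplified userspace model rather than the actual ipoib code. Only the flag name IPOIB_FLAG_ADMIN_UP and the neigh->dead test come from this thread; the sketch_* names, the bit number and the structure layouts are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit number is illustrative; it stands in for IPOIB_FLAG_ADMIN_UP. */
enum { SKETCH_FLAG_ADMIN_UP = 0 };

/* Hypothetical stand-ins for the driver and neighbor structures. */
struct sketch_priv {
    unsigned long flags;
};

struct sketch_neigh {
    bool dead;                      /* mirrors neigh->dead != 0 */
    void *priv;                     /* driver-private part of the entry */
};

/* True only if the interface is administratively up and the entry is live. */
static bool sketch_may_init_private(const struct sketch_priv *p,
                                    const struct sketch_neigh *n)
{
    if (!(p->flags & (1UL << SKETCH_FLAG_ADMIN_UP)))
        return false;               /* flag cleared: do not look at the skb */
    if (n->dead)
        return false;               /* explicit test for a dead entry */
    return true;
}

/* Transmit-path model: attach private state only when both checks pass. */
static bool sketch_xmit_prepare(const struct sketch_priv *p,
                                struct sketch_neigh *n, void *state)
{
    assert(state != NULL);
    if (!sketch_may_init_private(p, n))
        return false;
    if (n->priv == NULL)
        n->priv = state;
    return true;
}
```

With both checks in place, a dead entry never gets its private part initialized, so nothing is left behind that the cleanup hook would have missed.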