[ofa-general] Re: dst_ifdown breaks infiniband?
Michael S. Tsirkin
mst at dev.mellanox.co.il
Sun Mar 18 12:53:55 PDT 2007
Quoting Alexey Kuznetsov <kuznet at ms2.inr.ac.ru>:
Subject: Re: dst_ifdown breaks infiniband?
>
> Hello!
>
> > This is not new code, and should have triggered a long time ago,
> > so I am not sure how come we are triggering this only now,
> > but somehow this did not lead to crashes in 2.6.20
>
> I see. I guess this was plain luck.
>
>
> > Why is neighbour->dev changed here?
>
> It holds reference to device and prevents its destruction.
> If dst is held somewhere, we cannot destroy the device, and we deadlock
> during unregister.
>
> We could not invalidate dst->neighbour but it looked safe to invalidate
> neigh->dev after quiescent state. Obviously, it is not and it never was safe.
> Was supposed to be repaired asap, but this did not happen. :-(
>
> > Can dst->neighbour be changed to point to NULL instead, and the neighbour
> > released?
>
> It should be cleared and we should be sure it will not be destroyed
> before quiescent state.
I'm confused. Didn't you say dst_ifdown is called after quiescent state?
> Seems, this is the only correct solution, but to do this we have
> to audit all the places where dst->neighbour is dereferenced for
> RCU safety.
>
> Actually, it is very good you caught this eventually, the bug was
> so _disgusting_ that it was "forgotten" all the time, waiting for
> someone who will point out that the king is naked. :-)
>
> Alexey
This does not sound like something that's likely to be accepted in 2.6.21, right?
Any simpler ideas?
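
To check that I understand the reference problem described above, here is a
rough user-space sketch of it. It is only a model under my own assumptions, not
the kernel code: the struct layouts, the model_* helper names, and the
"placeholder" device that the dst and neighbour get repointed to are all
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's types. */
struct device {
    const char *name;
    int refcnt;                 /* outstanding references to this device */
};

struct neighbour {
    struct device *dev;         /* holds one reference on dev */
};

struct dst_entry {
    struct device *dev;         /* holds one reference on dev */
    struct neighbour *neighbour;
};

static inline void model_dev_hold(struct device *dev)
{
    dev->refcnt++;
}

static inline void model_dev_put(struct device *dev)
{
    assert(dev->refcnt > 0);    /* dropping more references than were taken is a bug */
    dev->refcnt--;
}

/* In this model, unregister can complete only once no references remain. */
static inline int model_can_unregister(const struct device *dev)
{
    return dev->refcnt == 0;
}

/*
 * Model of the step under discussion: move the references held by a dst and
 * by its neighbour from the device going away to a placeholder device
 * (assumption), so the old device can be destroyed.
 */
static inline void model_dst_ifdown(struct dst_entry *dst, struct device *dev,
                                    struct device *placeholder)
{
    if (dst->dev == dev) {
        model_dev_hold(placeholder);
        dst->dev = placeholder;
        model_dev_put(dev);
    }
    if (dst->neighbour && dst->neighbour->dev == dev) {
        model_dev_hold(placeholder);
        /* Anyone still relying on neighbour->dev now sees a different device. */
        dst->neighbour->dev = placeholder;
        model_dev_put(dev);
    }
}
```

In this model the old device becomes unregisterable after model_dst_ifdown(),
but only because neighbour->dev silently changes under whoever else is still
using the neighbour, which is the part that hurts us.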
--
MST