[ofa-general] Re: dst_ifdown breaks infiniband?
Alexey Kuznetsov
kuznet at ms2.inr.ac.ru
Sun Mar 18 12:12:38 PDT 2007
Hello!
> This is not new code, and should have triggered long time ago,
> so I am not sure how come we are triggering this only now,
> but somehow this did not lead to crashes in 2.6.20
I see. I guess this was plain luck.
> Why is neighbour->dev changed here?
It holds a reference to the device and prevents its destruction.
If the dst is held somewhere, we cannot destroy the device and we deadlock
during unregister.
We could not invalidate dst->neighbour, but it looked safe to invalidate
neigh->dev after a quiescent state. Obviously, it is not safe and it never was.
It was supposed to be repaired ASAP, but this did not happen. :-(
> Can dst->neighbour be changed to point to NULL instead, and the neighbour
> released?
It should be cleared, and we must be sure the neighbour will not be destroyed
before a quiescent state.
This seems to be the only correct solution, but to do this we have
to audit all the places where dst->neighbour is dereferenced for
RCU safety.
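
To illustrate the ordering (clear the pointer first, release the neighbour
only after readers are done), here is a minimal userspace sketch in C. It is
not kernel code: the struct layouts, the helper names (neigh_attach,
neigh_release, dst_lookup_dev, dst_detach_neighbour) and the
wait_for_readers() stub that stands in for an RCU grace period are all
assumptions made for illustration.

```c
/* Userspace model only. All names here are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct device {
    int refcnt;                 /* device cannot be destroyed while > 0 */
};

struct neighbour {
    struct device *dev;         /* holds one reference on dev */
};

struct dst_entry {
    _Atomic(struct neighbour *) neighbour;
};

static int grace_periods_waited;

/* Neighbour takes a reference on the device it points to. */
static void neigh_attach(struct neighbour *n, struct device *dev)
{
    n->dev = dev;
    dev->refcnt++;
}

/* Dropping the neighbour drops its device reference. */
static void neigh_release(struct neighbour *n)
{
    assert(n->dev != NULL);
    n->dev->refcnt--;
    n->dev = NULL;
}

/*
 * Stand-in for waiting out a grace period (assumption: in real code this
 * would block until all pre-existing readers have finished). This
 * single-threaded model only counts the calls.
 */
static void wait_for_readers(void)
{
    grace_periods_waited++;
}

/* Reader side: every dereference must tolerate a cleared pointer. */
static struct device *dst_lookup_dev(struct dst_entry *dst)
{
    struct neighbour *n = atomic_load(&dst->neighbour);

    return n ? n->dev : NULL;
}

/* Writer side: unpublish first, wait, and only then release. */
static void dst_detach_neighbour(struct dst_entry *dst)
{
    struct neighbour *old =
        atomic_exchange(&dst->neighbour, (struct neighbour *)NULL);

    if (old == NULL)
        return;
    wait_for_readers();         /* no reader can still see 'old' after this */
    neigh_release(old);         /* device reference is dropped last */
}
```

In this model the device refcount reaches zero only after the pointer has been
cleared and the wait has completed, so a held dst no longer pins the device.
The NULL check in dst_lookup_dev() is the kind of change the audit of every
dst->neighbour dereference would have to confirm.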
Actually, it is very good that you caught this eventually. The bug was
so _disgusting_ that it stayed "forgotten" all this time, waiting for
someone to point out that the king is naked. :-)
Alexey