[openib-general] possible oops when calling ipoib_neigh_destructor while ipoib module is down.

Roland Dreier roland at topspin.com
Mon Nov 29 09:33:05 PST 2004


    Ido> As far as I understand, if the kernel calls
    Ido> "ipoib_neigh_destructor" after the ib_ipoib module has been
    Ido> taken down a kernel oops can occur. In most cases when a
    Ido> driver is taken down, the kernel cleanup has already
    Ido> destroyed all the ipoib driver relevant entries. We noticed
    Ido> that while applications such as NetPerf are running while
    Ido> ipoib is taken down, the neighbor entry may be held (by the
    Ido> kernel) after the module is taken down.  The destructor will
    Ido> only be called way after the application exits or being
    Ido> terminated.

This seems quite likely.  I didn't check this very carefully -- it
looked like the kernel killed all neighbours when an interface is
unregistered, but I guess when the neighbours are still in use they
can hang around for a while.

It does seem like IPoIB needs to keep track of all the neighbour
structures it has added path context to.  When unregistering an
interface, IPoIB should free the path context and reset the
destructor.  However I'm not sure exactly how to do this while making
sure the kernel hasn't freed the neighbour first (it seems there are
some tricky races here).
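
As a rough illustration of the bookkeeping described above, here is a
single-threaded userspace model in C.  It is only a sketch: the model_*
types and functions are hypothetical stand-ins, not the kernel's
struct neighbour or the actual IPoIB code, and it deliberately ignores
the locking needed to handle the race with the kernel freeing a
neighbour at the same time.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only. Every name here is hypothetical; none of this
 * is the real kernel or IPoIB API. */
struct model_neigh;

struct model_path_ctx {
    struct model_neigh *neigh;     /* back-pointer to the owning neighbour */
    struct model_path_ctx *next;   /* link in the driver's tracking list */
};

struct model_neigh {
    int refcnt;                                /* references still held */
    void (*destructor)(struct model_neigh *);  /* driver callback, may be NULL */
    struct model_path_ctx *ctx;                /* driver-private path context */
};

static struct model_path_ctx *tracked;  /* every context the driver attached */
static int destructor_calls;            /* counts destructor invocations */

/* Unlink one context from the tracking list, if present. */
static void untrack(struct model_path_ctx *ctx)
{
    struct model_path_ctx **pp;

    for (pp = &tracked; *pp; pp = &(*pp)->next) {
        if (*pp == ctx) {
            *pp = ctx->next;
            return;
        }
    }
}

/* Destructor run on the last put while the driver is still loaded. */
static void model_destructor(struct model_neigh *n)
{
    destructor_calls++;
    if (n->ctx) {
        untrack(n->ctx);
        free(n->ctx);
        n->ctx = NULL;
    }
}

/* Attach a path context to a neighbour and remember it. */
static int model_attach(struct model_neigh *n)
{
    struct model_path_ctx *ctx = malloc(sizeof(*ctx));

    if (!ctx)
        return -1;
    ctx->neigh = n;
    ctx->next = tracked;
    tracked = ctx;
    n->ctx = ctx;
    n->destructor = model_destructor;
    return 0;
}

/* Interface unregister: free every tracked context and reset the
 * destructor so a later put never calls back into the driver. */
static void model_unregister(void)
{
    while (tracked) {
        struct model_path_ctx *ctx = tracked;

        tracked = ctx->next;
        ctx->neigh->destructor = NULL;
        ctx->neigh->ctx = NULL;
        free(ctx);
    }
}

/* Drop one reference; the last put runs the destructor if one is set. */
static void model_neigh_put(struct model_neigh *n)
{
    if (--n->refcnt == 0 && n->destructor)
        n->destructor(n);
}
```

In this model a neighbour that is still referenced when
model_unregister() runs is later released without calling into the
driver, which is the behaviour wanted here.  A real version would also
have to take whatever lock protects the neighbour against being freed
by the kernel while the list is walked.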

 - Roland


