[ofa-general] Re: [ewg] OFED April 21 meeting summary

Roland Dreier rdreier at cisco.com
Mon Apr 28 16:07:35 PDT 2008


 > I just saw this bug report today, but we've had similar crashes. 
 > Looks like the problem is that in ipoib_neigh_cleanup() this is 
 > done (no locking):
 > 
 >     neigh = *to_ipoib_neigh(n);
 > 
 > then later:
 > 
 >       spin_lock_irqsave(&priv->lock, flags);
 >       if (neigh->ah)
 >                ah = neigh->ah;
 >       list_del(&neigh->list); <---- neigh may be stale now
 >       ipoib_neigh_free(n->dev, neigh);
 >       spin_unlock_irqrestore(&priv->lock, flags);
 > 
 > neigh wasn't re-read after acquiring the lock, so it may point
 > to an already freed data structure.

Ugh, looks delicate to fix properly, since we don't have a lock to take
until we find out whether the neighbour is attached to an IPoIB device.
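
To make the quoted race concrete, here is a minimal userspace sketch of
the "re-read after taking the lock" pattern. It is not a patch for
ipoib_neigh_cleanup(): struct slot, struct entry and slot_cleanup() are
hypothetical names, a pthread mutex stands in for priv->lock, and the
sketch assumes the lock is already known, which is exactly the part
that is hard in the real code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-neighbour data. */
struct entry {
	int ah;
};

/* Hypothetical stand-in for the slot read via *to_ipoib_neigh(n). */
struct slot {
	pthread_mutex_t lock;	/* plays the role of priv->lock */
	struct entry *cur;	/* may be freed and cleared by another path */
};

/*
 * Read the pointer only after the lock is held, so a concurrent
 * cleanup cannot hand us a stale value. Returns the saved ah, or -1
 * if the entry was already gone.
 */
static int slot_cleanup(struct slot *s)
{
	struct entry *e;
	int ah = -1;

	pthread_mutex_lock(&s->lock);
	e = s->cur;		/* (re-)read under the lock */
	if (e) {
		ah = e->ah;
		s->cur = NULL;	/* later callers see that it is gone */
		free(e);
	}
	pthread_mutex_unlock(&s->lock);

	return ah;
}
```

Calling slot_cleanup() twice on the same slot returns the saved ah the
first time and -1 the second time, instead of touching freed memory.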

 > Unable to handle kernel paging request at 0000000000100108
 >                                           ^^^^^^^^^^^^^^^^
 >                                           LIST_POISON1 + 0x8

Strange that the OFA bugzilla entry shows a different address for
the crash.
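
For reference, the arithmetic behind the "LIST_POISON1 + 0x8" annotation
in the quoted oops can be sketched as below. This assumes LIST_POISON1
is 0x00100100 (its value in include/linux/poison.h at the time) and
64-bit pointers; struct list_head64 and second_del_fault_addr() are
hypothetical names used only for illustration. list_del() leaves
entry->next set to LIST_POISON1, so a second list_del() on the same
entry writes to next->prev, which sits 8 bytes past the poison value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value of LIST_POISON1 on kernels of that era. */
#define POISON1 UINT64_C(0x00100100)

/* Layout of struct list_head with 64-bit pointers modelled as integers. */
struct list_head64 {
	uint64_t next;
	uint64_t prev;
};

/*
 * Address written by "next->prev = prev" when next == LIST_POISON1,
 * i.e. when list_del() runs on an entry that was already deleted.
 */
static uint64_t second_del_fault_addr(void)
{
	return POISON1 + offsetof(struct list_head64, prev);
}
```

Under those assumptions second_del_fault_addr() evaluates to 0x100108,
matching the address in the quoted trace.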


