[ofa-general] Re: [ewg] OFED April 21 meeting summary
    Roland Dreier 
    rdreier at cisco.com
       
    Mon Apr 28 16:07:35 PDT 2008
    
    
  
 > I just saw this bug report today, but we've had similar crashes. 
 > Looks like the problem is that in ipoib_neigh_cleanup() this is 
 > done (no locking):
 > 
 >     neigh = *to_ipoib_neigh(n);
 > 
 > then later:
 > 
 >     spin_lock_irqsave(&priv->lock, flags);
 >     if (neigh->ah)
 >             ah = neigh->ah;
 >     list_del(&neigh->list);    <---- neigh may be stale now
 >     ipoib_neigh_free(n->dev, neigh);
 >     spin_unlock_irqrestore(&priv->lock, flags);
 > 
 > neigh wasn't re-read after acquiring the lock, so it may point
 > to an already freed data structure.
Ugh, looks delicate to fix properly, since we don't have a lock to take
until we find out whether the neighbour is attached to an IPoIB device.
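
To make the quoted analysis concrete, here is a minimal userspace sketch of
the "re-read after taking the lock" idea. It is not the IPoIB code and not a
proposed patch: every name (model_priv, model_neigh, model_lock,
model_cleanup) is a hypothetical stand-in, the "lock" is a plain flag in a
single-threaded model, and it deliberately ignores the hard part mentioned
above, namely that there is no lock to take until we know the neighbour
belongs to an IPoIB device.

```c
/* Userspace sketch only: models re-reading a pointer under a lock.
 * All names are hypothetical stand-ins, not kernel or IPoIB symbols. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct model_neigh {
    int ah;                 /* stand-in for neigh->ah */
};

struct model_priv {
    int locked;             /* stand-in for priv->lock (single-threaded) */
    int frees;              /* how many entries this model has freed */
};

static void model_lock(struct model_priv *p)   { p->locked = 1; }
static void model_unlock(struct model_priv *p) { p->locked = 0; }

/* Read the slot only once the lock is held, and clear it before unlocking,
 * so a later caller finds NULL rather than a pointer to freed memory.
 * Returns 1 if this call freed the entry, 0 if it was already gone. */
static int model_cleanup(struct model_priv *p, struct model_neigh **slot)
{
    struct model_neigh *neigh;
    int freed = 0;

    model_lock(p);
    assert(p->locked);
    neigh = *slot;          /* read under the lock, not before it */
    if (neigh) {
        *slot = NULL;       /* nobody else can pick up the stale pointer */
        free(neigh);
        p->frees++;
        freed = 1;
    }
    model_unlock(p);
    return freed;
}
```

Calling model_cleanup() twice on the same slot frees the entry once and
returns 0 the second time, which is the behaviour the unlocked read in the
quoted code cannot guarantee.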
 > Unable to handle kernel paging request at 0000000000100108
 >                                           ^^^^^^^^^^^^^^^^
 >                                           LIST_POISON1 + 0x8
Strange that the OFA bugzilla entry shows a different address for the
crash.
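
For reference, the arithmetic behind the quoted "LIST_POISON1 + 0x8"
annotation can be sketched as below. This assumes LIST_POISON1 is
0x00100100, as in include/linux/poison.h in kernels of this era, and a
64-bit machine where the prev pointer sits 8 bytes into struct list_head.
list_del() leaves entry->next set to LIST_POISON1, so a second list_del()
on the same entry writes to next->prev, i.e. to LIST_POISON1 plus the
offset of prev. The model_ names are hypothetical and only mirror the
kernel layout.

```c
/* Userspace sketch: where a second list_del() on the same entry faults.
 * model_list_head mirrors the layout of the kernel's struct list_head. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_list_head {
    struct model_list_head *next, *prev;
};

/* Assumed value of LIST_POISON1 (no pointer delta applied). */
#define MODEL_LIST_POISON1 ((uintptr_t)0x00100100)

/* After the first list_del(), entry->next == LIST_POISON1. The second
 * list_del() stores into next->prev, which is this address. */
static uintptr_t model_second_del_fault_addr(void)
{
    return MODEL_LIST_POISON1 + offsetof(struct model_list_head, prev);
}
```

With 8-byte pointers this gives 0x100108, matching the address in the quoted
oops.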
    
    