[ofa-general] [RFC] ipoib: avoid using stale ipoib_neigh* in ipoib_neigh_cleanup()

Or Gerlitz ogerlitz at Voltaire.com
Sat May 30 23:41:54 PDT 2009


akepner at sgi.com wrote @ http://lists.openfabrics.org/pipermail/general/2009-May/059730.html
> What would prevent a race between a tx completion (with an 
> error) and the cleanup of a neighbour? 

Okay, so maybe this design, using the stashed ipoib_neigh in the tx
completion code, is the root cause of all these troubles?!

From a quick look at the code and at two patches that touched this area (f56bcd801... "Use separate CQ for UD send completions" and 57ce41d1... "Fix transmit queue stalling forever"), I see that the original tx CQ handler, ipoib_ib_handle_tx_wc(), doesn't touch the neighbour, but today it is called only from the drain timer and dev-stop flows. Meanwhile, ipoib_cm_handle_tx_wc() is called in the "normal" flow for both datagram and connected modes, and this function does touch the neighbour.
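To make the hazard concrete, here is a minimal userspace C model. It is not IPoIB code. It assumes the tx path stashes a neighbour pointer in the request at post time, and it shows one common mitigation (a swapped-in technique, not necessarily what the IPoIB maintainers would choose): each outstanding send holds a reference, so cleanup cannot free the entry while a completion is still pending. All names (struct model_neigh, neigh_get, neigh_put, post_send, neigh_cleanup, tx_completion) are hypothetical. A real kernel fix would need atomic refcounting (e.g. kref) and proper locking, or might instead stop using the neighbour in the completion path altogether.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a neighbour entry; NOT the real struct ipoib_neigh. */
struct model_neigh {
    int refcnt;    /* plain int: single-threaded model; kernel code would use kref/atomics */
    int tx_errors; /* something an error completion might update */
};

int neigh_frees; /* counts frees so the behaviour can be observed */

struct model_neigh *neigh_alloc(void)
{
    struct model_neigh *n = calloc(1, sizeof(*n));
    if (n)
        n->refcnt = 1; /* reference owned by the (hypothetical) neighbour table */
    return n;
}

void neigh_get(struct model_neigh *n)
{
    n->refcnt++;
}

void neigh_put(struct model_neigh *n)
{
    assert(n->refcnt > 0); /* catches a double put in this model */
    if (--n->refcnt == 0) {
        free(n);
        neigh_frees++;
    }
}

/* Hypothetical tx request that stashes the neighbour pointer at post time. */
struct model_tx_req {
    struct model_neigh *neigh;
};

void post_send(struct model_tx_req *req, struct model_neigh *n)
{
    neigh_get(n); /* pin the entry for as long as the send is outstanding */
    req->neigh = n;
}

void neigh_cleanup(struct model_neigh *n)
{
    neigh_put(n); /* drops only the table's reference; may not free yet */
}

void tx_completion(struct model_tx_req *req, int error)
{
    if (error)
        req->neigh->tx_errors++; /* safe: this request still holds a reference */
    neigh_put(req->neigh);       /* dropping the last reference frees the entry */
    req->neigh = NULL;
}
```

Calling post_send(), then neigh_cleanup(), then tx_completion(&req, 1) keeps the entry alive until the completion drops the last reference. Without the neigh_get() in post_send(), the error completion would write into freed memory, which is the kind of race described above.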

I am not sure why commit f56bcd801... made UD completions go through ipoib_cm_handle_tx_wc(), nor why this function must use the neighbour to reach the data structures it needs. Maybe Eli can comment on that?

Or.