[ofa-general] [RFC] ipoib: avoid using stale ipoib_neigh* in ipoib_neigh_cleanup()

Eli Cohen eli at dev.mellanox.co.il
Sun May 31 00:21:15 PDT 2009


On Sun, May 31, 2009 at 09:41:54AM +0300, Or Gerlitz wrote:
> akepner at sgi.com wrote @ http://lists.openfabrics.org/pipermail/general/2009-May/059730.html
> > What would prevent a race between a tx completion (with an 
> > error) and the cleanup of a neighbour? 
> 
> Okay, so maybe this code/design of using the stashed ipoib_neighbour at the tx
> completion code is the root cause of all these troubles?! 
> 
> From a quick look at the code and the two patches that touched this area (f56bcd801... "Use separate CQ for UD send completions" and 57ce41d1... "Fix transmit queue stalling forever"), I see that the original tx cq handler, ipoib_ib_handle_tx_wc(), doesn't touch the neighbour but today is called only from the drain timer & dev-stop flows. Now, ipoib_cm_handle_tx_wc() is called for the "normal" flow in both datagram and connected modes, and this function touches the neighbour.

Or, I don't follow you: ipoib_cm_handle_tx_wc() has called
ipoib_neigh_free() since the first commit. Also please note the
following designation of CQs:
recv_cq: used for all receives and for CM send
send_cq: used for UD send

Thus, since in ipoib_poll() we poll "recv_cq", any non-receive
completion must be that of a CM mode send.
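
To make the CQ split concrete, here is a minimal user-space sketch of
the dispatch rule described above. It is not the driver code: every
sketch_* name, the SKETCH_OP_* flag bits and their values are
hypothetical and exist only for illustration. It assumes the receive
flag is carried in wr_id, and shows that a non-receive completion
found on recv_cq can only be a CM send.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits folded into wr_id; values are illustrative only. */
#define SKETCH_OP_RECV (UINT64_C(1) << 31)
#define SKETCH_OP_CM   (UINT64_C(1) << 30)

/* The two completion queues named in the text. */
enum sketch_cq { SKETCH_RECV_CQ, SKETCH_SEND_CQ };

/* Which kind of handler a completion would be routed to. */
enum sketch_handler {
    SKETCH_UD_RX,   /* datagram receive */
    SKETCH_CM_RX,   /* connected-mode receive */
    SKETCH_CM_TX,   /* connected-mode send */
    SKETCH_UD_TX    /* datagram send */
};

/* Stand-in for a work completion; only wr_id matters here. */
struct sketch_wc {
    uint64_t wr_id;
};

/* Classify a completion by the CQ it was polled from and its wr_id flags. */
enum sketch_handler sketch_dispatch(enum sketch_cq cq,
                                    const struct sketch_wc *wc)
{
    assert(wc != NULL);

    /* send_cq is used for UD sends only. */
    if (cq == SKETCH_SEND_CQ)
        return SKETCH_UD_TX;

    /* recv_cq: all receives, UD or CM. */
    if (wc->wr_id & SKETCH_OP_RECV)
        return (wc->wr_id & SKETCH_OP_CM) ? SKETCH_CM_RX : SKETCH_UD_RX;

    /* recv_cq but not a receive: must be a CM send. */
    return SKETCH_CM_TX;
}
```

With this model, a completion polled from SKETCH_RECV_CQ whose wr_id
has no receive bit set is classified as SKETCH_CM_TX, and nothing
polled from SKETCH_RECV_CQ is ever classified as SKETCH_UD_TX.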

> 
> I am not sure why commit f56bcd801... made UD completions go through ipoib_cm_handle_tx_wc(), nor why this function must use the neighbor to access the data structure it needs; maybe Eli can comment on that?
> 
> Or.
> 