[ofa-general] Re: [RFC][PATCH] last recv race patch
Roland Dreier
rdreier at cisco.com
Fri Jun 27 21:39:28 PDT 2008
> The race is between the context being taken off the reap_list for
> destroy because of the timeout and being re-added to the reap_list
> from this condition. So we need a condition check here.
I'm sorry, I still don't see what the bug is. What timeout removes
things from the rx_reap_list? And how could any context get added to
the rx_reap_list more than once? The code you are changing is:
> if (!--p->recv_count) {
> spin_lock_irqsave(&priv->lock, flags);
> - list_move(&p->list, &priv->cm.rx_reap_list);
> - spin_unlock_irqrestore(&priv->lock, flags);
> - queue_work(ipoib_workqueue, &priv->cm.rx_reap_task);
> + if (p->state == IPOIB_CM_RX_LIVE) {
> + list_move(&p->list, &priv->cm.rx_reap_list);
> + spin_unlock_irqrestore(&priv->lock, flags);
> + queue_work(ipoib_workqueue, &priv->cm.rx_reap_task);
> + } else
> + spin_unlock_irqrestore(&priv->lock, flags);
and I don't see how recv_count could ever be 0 twice: we never increment it.
In any case I don't see how testing against the LIVE state could be
correct, since for example we want to reap stale QPs that we moved to
IPOIB_CM_RX_ERROR after all the pending receives have completed.
- R.