[openib-general] [PATCH/RFC] IB/mad: Fix race between cancel and receive
Sean Hefty
mshefty at ichips.intel.com
Tue Nov 7 09:08:24 PST 2006
Roland Dreier wrote:
> I don't believe we should generate receive callbacks for canceled
> sends, so I came up with the patch below (much simpler than the
> explanation that led up to it). I am no longer able to reproduce the
> IPoIB crash with this applied so I feel pretty good about this.
I agree that this should be the case. If you look in ib_find_send_mad(), it
checks that the wr->status for the send is still IB_WC_SUCCESS, but only in one
of the two return paths. I think that we either want to fix the problem in
ib_find_send_mad() or remove the check for status there.
struct ib_mad_send_wr_private*
ib_find_send_mad(struct ib_mad_agent_private *mad_agent_priv,
struct ib_mad_recv_wc *wc)
{
struct ib_mad_send_wr_private *wr;
struct ib_mad *mad;
mad = (struct ib_mad *)wc->recv_buf.mad;
list_for_each_entry(wr, &mad_agent_priv->wait_list, agent_list) {
if ((wr->tid == mad->mad_hdr.tid) &&
rcv_has_same_class(wr, wc) &&
/*
* Don't check GID for direct routed MADs.
* These might have permissive LIDs.
*/
(is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
rcv_has_same_gid(mad_agent_priv, wr, wc)))
return wr;
*** Missing check for status == SUCCESS
}
/*
* It's possible to receive the response before we've
* been notified that the send has completed
*/
list_for_each_entry(wr, &mad_agent_priv->send_list, agent_list) {
if (is_data_mad(mad_agent_priv, wr->send_buf.mad) &&
wr->tid == mad->mad_hdr.tid &&
wr->timeout &&
rcv_has_same_class(wr, wc) &&
/*
* Don't check GID for direct routed MADs.
* These might have permissive LIDs.
*/
(is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
rcv_has_same_gid(mad_agent_priv, wr, wc)))
/* Verify request has not been canceled */
return (wr->status == IB_WC_SUCCESS) ? wr : NULL;
*** Has check against canceled MADs
}
return NULL;
}
- Sean
More information about the general
mailing list