[openib-general] [PATCH/RFC] IB/mad: Fix race between cancel and receive

Sean Hefty mshefty at ichips.intel.com
Tue Nov 7 09:08:24 PST 2006


Roland Dreier wrote:
> I don't believe we should generate receive callbacks for canceled
> sends, so I came up with the patch below (much simpler than the
> explanation that led up to it).  I am no longer able to reproduce the
> IPoIB crash with this applied so I feel pretty good about this.

I agree that this should be the case.  If you look in ib_find_send_mad(), it 
checks that the wr->status for the send is still IB_WC_SUCCESS, but only in one 
of the two return paths.  I think that we either want to fix the problem in 
ib_find_send_mad() or remove the check for status there.

struct ib_mad_send_wr_private*
ib_find_send_mad(struct ib_mad_agent_private *mad_agent_priv,
		 struct ib_mad_recv_wc *wc)
{
	struct ib_mad_send_wr_private *wr;
	struct ib_mad *mad;

	mad = (struct ib_mad *)wc->recv_buf.mad;

	list_for_each_entry(wr, &mad_agent_priv->wait_list, agent_list) {
		if ((wr->tid == mad->mad_hdr.tid) &&
		    rcv_has_same_class(wr, wc) &&
		    /*
		     * Don't check GID for direct routed MADs.
		     * These might have permissive LIDs.
		     */
		    (is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
		     rcv_has_same_gid(mad_agent_priv, wr, wc)))
			return wr;
*** Missing check for status == SUCCESS
	}

	/*
	 * It's possible to receive the response before we've
	 * been notified that the send has completed
	 */
	list_for_each_entry(wr, &mad_agent_priv->send_list, agent_list) {
		if (is_data_mad(mad_agent_priv, wr->send_buf.mad) &&
		    wr->tid == mad->mad_hdr.tid &&
		    wr->timeout &&
		    rcv_has_same_class(wr, wc) &&
		    /*
		     * Don't check GID for direct routed MADs.
		     * These might have permissive LIDs.
		     */
		    (is_direct(wc->recv_buf.mad->mad_hdr.mgmt_class) ||
		     rcv_has_same_gid(mad_agent_priv, wr, wc)))
			/* Verify request has not been canceled */
			return (wr->status == IB_WC_SUCCESS) ? wr : NULL;
*** Has check against canceled MADs
	}
	return NULL;
}

- Sean




More information about the general mailing list