[openib-general] [PATCH] Missing check for atomic_dec in ib_post_send_mad

Krishna Kumar krkumar at us.ibm.com
Mon Nov 1 17:40:56 PST 2004


Hi Sean,

I agree on the race between the threads, and this is something that I
had considered as a separate problem (but now it comes back to haunt
me :-).

An easier solution for this problem is to make sure that whoever
gets the agent (ib_mad_recv_done_handler) validates the mad_agent
before calling us. Basically, find_mad_agent can hold a refcnt
on the agent. Is that correct? If so, I can make a patch to handle
races on that front. This code is pretty complicated, so please let
me know if I have grossly misstated something (agents and agent_private,
and whatnot :-).
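
To make the idea concrete, here is a rough user-space sketch of a lookup
that takes a reference while still holding the lookup lock, so the agent
cannot be freed between "found it" and "using it". This is not the openib
code: struct agent, register_agent, find_agent_get, agent_put and
unregister_agent are all hypothetical names, and a pthread mutex with C11
atomics stands in for the kernel's locking and atomic_t.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_AGENTS 8

/* Hypothetical stand-in for a mad agent; not the real structure. */
struct agent {
    atomic_int refcount;   /* 1 for the registration, +1 per active user */
    int id;
};

static struct agent *registry[MAX_AGENTS];
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Publish the agent in the registry. Returns 0 on success, -1 if full. */
int register_agent(struct agent *a, int id)
{
    int ret = -1;

    atomic_init(&a->refcount, 1);
    a->id = id;

    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < MAX_AGENTS; i++) {
        if (registry[i] == NULL) {
            registry[i] = a;
            ret = 0;
            break;
        }
    }
    pthread_mutex_unlock(&registry_lock);
    return ret;
}

/* Look up an agent and take a reference before dropping the lock. */
struct agent *find_agent_get(int id)
{
    struct agent *found = NULL;

    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < MAX_AGENTS; i++) {
        if (registry[i] != NULL && registry[i]->id == id) {
            found = registry[i];
            atomic_fetch_add(&found->refcount, 1);
            break;
        }
    }
    pthread_mutex_unlock(&registry_lock);
    return found;
}

/* Drop one reference. Returns the number of references remaining. */
int agent_put(struct agent *a)
{
    int prev = atomic_fetch_sub(&a->refcount, 1);

    assert(prev > 0);
    return prev - 1;
}

/*
 * Remove the agent from the registry so no new lookup can find it, then
 * drop the registration reference. Returns the references still held by
 * other users; the caller must wait for that to reach 0 before freeing.
 */
int unregister_agent(struct agent *a)
{
    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < MAX_AGENTS; i++) {
        if (registry[i] == a)
            registry[i] = NULL;
    }
    pthread_mutex_unlock(&registry_lock);
    return agent_put(a);
}
```

With this shape, a lookup that races with unregister either finds the
agent and holds a reference (so unregister has to wait), or finds nothing
at all. It does not help a caller that already holds a bare pointer and
never went through the lookup.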

Thanks for your feedback,

- KK

On Mon, 1 Nov 2004, Sean Hefty wrote:

> On Mon, 1 Nov 2004 16:38:03 -0800 (PST)
> Krishna Kumar <krkumar at us.ibm.com> wrote:
>
> > Hi Sean,
> >
> > I think it is reasonable to have current senders racing with
> > unregister. The unregister is waiting for all references to drop to
> > zero before freeing up the resources. It killed the ones waiting for
> > responses (mad_cancel), killed the ones who are executing in callback
> > handlers, and finally after dropping the loader's module refcnt, it
> > waits for the refcnt to drop to zero. These can only be threads which
> > are actively receiving mad packets and those threads in the process of
> > sending mad packets while the unregister was going on (and the ones
> > which fail are the only cause of the problem). Essentially I think the
> > unregister will hang and not free up the resource.
>
> The difference here is that a client is calling into the API at the same
> time that they are trying to unregister.  The code, even with this
> change, cannot handle this condition.
>
> For example, if the thread calling ib_unregister_mad_agent executes
> completely before the thread calling ib_post_send_mad runs (or can take
> a reference on the mad_agent), the mad_agent is no longer valid, and the
> structure will have been freed.  The thread executing ib_post_send_mad
> can crash the system at this point.
>
> If we want to allow a client to call ib_unregister_mad_agent and
> ib_post_send_mad simultaneously, then ib_post_send_mad would need to
> perform some sort of lookup (likely in some global map) to validate the
> mad_agent.
>
> - Sean
>
>



