[openib-general] [PATCH] Missing check for atomic_dec in ib_post_send_mad

Sean Hefty mshefty at ichips.intel.com
Mon Nov 1 16:59:04 PST 2004


On Mon, 1 Nov 2004 16:38:03 -0800 (PST)
Krishna Kumar <krkumar at us.ibm.com> wrote:

> Hi Sean,
> 
> I think it is reasonable to have current senders racing with
> unregister. The unregister is waiting for all references to drop to
> zero before freeing up the resources. It killed the ones waiting for
> responses (mad_cancel), killed the ones who are executing in callback
> handlers, and finally after dropping the loader's module refcnt, it
> waits for the refcnt to drop to zero. These can only be threads which
> are actively receiving mad packets and those threads in the process of
> sending mad packets while the unregister was going on (and the ones
> which fail are the only cause of the problem). Essentially I think the
> unregister will hang and not free up the resource.

The difference here is that the client is calling into the API at the
same time that it is trying to unregister.  The code, even with this
change, cannot handle that condition.

For example, if the thread calling ib_unregister_mad_agent runs to
completion before the thread calling ib_post_send_mad runs (or before
that thread can take a reference on the mad_agent), the mad_agent is no
longer valid: the structure has already been freed.  The thread
executing ib_post_send_mad then touches freed memory and can crash the
system.
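
To make the two orderings concrete, here is a small single-threaded
model of the race. It is only a sketch: struct model_agent and the
model_* functions are hypothetical names, not the real MAD code, and the
"freed" flag stands in for kfree() so that the model itself never
touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the agent; not the real struct ib_mad_agent. */
struct model_agent {
    int  refcount;  /* references held by in-flight sends */
    bool freed;     /* models kfree(): true once the memory is gone */
};

/*
 * Models unregister. Returns true if the agent was freed, false if
 * references are outstanding and real code would have to wait.
 */
static bool model_unregister(struct model_agent *a)
{
    assert(!a->freed);
    if (a->refcount > 0)
        return false;
    a->freed = true;
    return true;
}

/*
 * Models the start of a send. Returns false at the point where real
 * code would dereference freed memory. Real code has no "freed" flag to
 * test: the reference count lives inside the structure that is gone.
 */
static bool model_post_send(struct model_agent *a)
{
    if (a->freed)
        return false;   /* use-after-free in real code */
    a->refcount++;
    return true;
}

/* Models send completion: drop the reference taken by model_post_send. */
static void model_send_done(struct model_agent *a)
{
    assert(a->refcount > 0);
    a->refcount--;
}
```

If model_post_send runs first, model_unregister sees the reference and
waits. If model_unregister runs first, the sender has nothing valid left
to take a reference on, which is the case the reference count cannot
cover.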

If we want to allow a client to call ib_unregister_mad_agent and
ib_post_send_mad simultaneously, then ib_post_send_mad would need to
perform some sort of lookup (likely in some global map) to validate the
mad_agent.
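
A rough userspace sketch of what such a lookup could look like follows.
Everything in it is an assumption made for illustration: the fixed-size
table, the integer handle, the pthread mutex, and the names
agent_register, agent_get, agent_put, agent_unregister and post_send are
hypothetical and do not come from the MAD layer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_AGENTS 16

enum agent_state { AGENT_FREE = 0, AGENT_LIVE, AGENT_DYING };

struct agent {
    enum agent_state state;
    int refcount;           /* references held by in-flight callers */
};

/* Global map from handle (array index) to agent, guarded by one lock. */
static struct agent agent_table[MAX_AGENTS];
static pthread_mutex_t agent_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Claim a free slot. Returns a handle >= 0, or -1 if the table is full. */
static int agent_register(void)
{
    int id = -1;

    pthread_mutex_lock(&agent_table_lock);
    for (int i = 0; i < MAX_AGENTS; i++) {
        if (agent_table[i].state == AGENT_FREE) {
            agent_table[i].state = AGENT_LIVE;
            agent_table[i].refcount = 0;
            id = i;
            break;
        }
    }
    pthread_mutex_unlock(&agent_table_lock);
    return id;
}

/* Validate the handle and take a reference, both under the lock. */
static struct agent *agent_get(int id)
{
    struct agent *a = NULL;

    pthread_mutex_lock(&agent_table_lock);
    if (id >= 0 && id < MAX_AGENTS && agent_table[id].state == AGENT_LIVE) {
        a = &agent_table[id];
        a->refcount++;
    }
    pthread_mutex_unlock(&agent_table_lock);
    return a;
}

/* Drop a reference. The last put on a dying agent releases its slot. */
static void agent_put(struct agent *a)
{
    pthread_mutex_lock(&agent_table_lock);
    assert(a->refcount > 0);
    if (--a->refcount == 0 && a->state == AGENT_DYING)
        a->state = AGENT_FREE;
    pthread_mutex_unlock(&agent_table_lock);
}

/*
 * Make the handle invalid for new lookups. Returns the number of
 * references still outstanding (real code would wait for them), or -1
 * if the handle does not name a live agent.
 */
static int agent_unregister(int id)
{
    int pending = -1;

    pthread_mutex_lock(&agent_table_lock);
    if (id >= 0 && id < MAX_AGENTS && agent_table[id].state == AGENT_LIVE) {
        pending = agent_table[id].refcount;
        agent_table[id].state = pending ? AGENT_DYING : AGENT_FREE;
    }
    pthread_mutex_unlock(&agent_table_lock);
    return pending;
}

/* A send that fails cleanly when it races with unregister. */
static int post_send(int id)
{
    struct agent *a = agent_get(id);

    if (a == NULL)
        return -1;      /* agent already unregistered */
    /* ... the send itself would be queued here ... */
    agent_put(a);
    return 0;
}
```

The point of the sketch is that the caller passes a handle rather than a
pointer, so a send that loses the race gets an error back instead of
dereferencing a freed structure. One limitation: a slot can be reused by
a later registration, so a stale handle could match a different agent. A
real design would likely need something like a generation count in the
handle, and the extra lock on every send has a cost.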

- Sean


