[ofa-general] Possible memory leak and null pointer dereference in local_completions()

Ralph Campbell ralph.campbell at qlogic.com
Wed Feb 4 11:58:05 PST 2009


On Wed, 2009-02-04 at 04:29 -0800, Hal Rosenstock wrote:
> On Tue, Feb 3, 2009 at 2:26 PM, Ralph Campbell
> <ralph.campbell at qlogic.com> wrote:
> > I was doing some tests with different MAD packets and
> > then reading the infiniband/core/mad.c code.
> >
> > handle_outgoing_dr_smp() can queue a struct ib_mad_local_private *local
> > on the mad_agent_priv->local_work work queue with
> > local->mad_priv == NULL if device->process_mad() returns
> > IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY and
> > (!ib_response_mad(&mad_priv->mad.mad) ||
> >  !mad_agent_priv->agent.recv_handler).
> >
> > In this case, local_completions() will be called with
> > local->mad_priv == NULL. The code does check for this
> > case and skips calling recv_mad_agent->agent.recv_handler().
> > This means recv == 0 so kmem_cache_free() is called with a
> > NULL pointer.
> 
> That could be fixed by changing the check for !recv prior to the
> kmem_cache_free there to a check for (!recv && local->mad_priv).

This is what we did to continue making progress so I know
it works.

> > Even if local->mad_priv != NULL, I don't see how local->mad_priv
> > is freed when recv == 1. Thus, it appears to be a memory leak.
> 
> For those cases, it's either freed in local_completions (as recv is
> set to 1 for local->mad_priv != NULL except when there is no mad recv
> agent but that is another bug (see below)) or earlier in the else
> clause of the IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY of
> handle_outgoing_dr_smp(). That's another issue that this points out
> where recv = 1 needs to be moved up in local_completions.

The other problem I noticed with setting recv = 1, is that recv = 0
is outside the while (!list_empty) loop so it is never reset back
to zero.

I'm not really following you about recv = 1 needs to be moved up in
local_completions.

What I was really looking for was a confirmation that the original
code had a memory leak. I don't see any reason to special case the
call to kmem_cache_free(). It seems to me that it is needed any time
local->mad_priv != NULL.
The NULL pointer bug is easily fixed in a number of different ways.

> Would you try the untested patch below and see if it fixes the problem
> you found ? Thanks.

We are in the middle of moving our office so I won't be able to
reproduce this until next week.

> -- Hal
> 
> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> index 5c54fc2..cca87e6 100644
> --- a/drivers/infiniband/core/mad.c
> +++ b/drivers/infiniband/core/mad.c
> @@ -2371,13 +2371,13 @@ static void local_completions(struct work_struct *work)
>                 list_del(&local->completion_list);
>                 spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
>                 if (local->mad_priv) {
> +                       recv = 1;
>                         recv_mad_agent = local->recv_mad_agent;
>                         if (!recv_mad_agent) {
>                                 printk(KERN_ERR PFX "No receive MAD agent for lo
>                                 goto local_send_completion;
>                         }
> 
> -                       recv = 1;
>                         /*
>                          * Defined behavior is to complete response
>                          * before request
> @@ -2422,7 +2422,7 @@ local_send_completion:
> 
>                 spin_lock_irqsave(&mad_agent_priv->lock, flags);
>                 atomic_dec(&mad_agent_priv->refcount);
> -               if (!recv)
> +               if (!recv && local->mad_priv)
>                         kmem_cache_free(ib_mad_cache, local->mad_priv);
>                 kfree(local);
>         }
> 
> > So, I'm proposing the following patch:
> >
> > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> > index 5c54fc2..93d80e5 100644
> > --- a/drivers/infiniband/core/mad.c
> > +++ b/drivers/infiniband/core/mad.c
> > @@ -2356,7 +2356,6 @@ static void local_completions(struct work_struct *work)
> >        struct ib_mad_local_private *local;
> >        struct ib_mad_agent_private *recv_mad_agent;
> >        unsigned long flags;
> > -       int recv = 0;
> >        struct ib_wc wc;
> >        struct ib_mad_send_wc mad_send_wc;
> >
> > @@ -2377,7 +2376,6 @@ static void local_completions(struct work_struct *work)
> >                                goto local_send_completion;
> >                        }
> >
> > -                       recv = 1;
> >                        /*
> >                         * Defined behavior is to complete response
> >                         * before request
> > @@ -2422,7 +2420,7 @@ local_send_completion:
> >
> >                spin_lock_irqsave(&mad_agent_priv->lock, flags);
> >                atomic_dec(&mad_agent_priv->refcount);
> > -               if (!recv)
> > +               if (local->mad_priv)
> >                        kmem_cache_free(ib_mad_cache, local->mad_priv);
> >                kfree(local);
> >        }
> >
> >
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >




More information about the general mailing list