[ofa-general] Possible memory leak and null pointer dereference in local_completions()

Hal Rosenstock hal.rosenstock at gmail.com
Wed Feb 4 15:35:15 PST 2009


On Wed, Feb 4, 2009 at 2:58 PM, Ralph Campbell
<ralph.campbell at qlogic.com> wrote:
> On Wed, 2009-02-04 at 04:29 -0800, Hal Rosenstock wrote:
>> On Tue, Feb 3, 2009 at 2:26 PM, Ralph Campbell
>> <ralph.campbell at qlogic.com> wrote:
>> > I was doing some tests with different MAD packets and
>> > then reading the infiniband/core/mad.c code.
>> >
>> > handle_outgoing_dr_smp() can queue a struct ib_mad_local_private *local
>> > on the mad_agent_priv->local_work work queue with
>> > local->mad_priv == NULL if device->process_mad() returns
>> > IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY and
>> > (!ib_response_mad(&mad_priv->mad.mad) ||
>> >  !mad_agent_priv->agent.recv_handler).
>> >
>> > In this case, local_completions() will be called with
>> > local->mad_priv == NULL. The code does check for this
>> > case and skips calling recv_mad_agent->agent.recv_handler().
>> > This means recv == 0 so kmem_cache_free() is called with a
>> > NULL pointer.
>>
>> That could be fixed by changing the check for !recv prior to the
>> kmem_cache_free there to a check for (!recv && local->mad_priv).
>
> This is what we did to continue making progress so I know
> it works.
>
>> > Even if local->mad_priv != NULL, I don't see how local->mad_priv
>> > is freed when recv == 1. Thus, it appears to be a memory leak.
>>
>> For those cases, it's either freed in local_completions (as recv is
>> set to 1 for local->mad_priv != NULL except when there is no mad recv
>> agent but that is another bug (see below)) or earlier in the else
>> clause of the IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY of
>> handle_outgoing_dr_smp(). That's another issue that this points out
>> where recv = 1 needs to be moved up in local_completions.
>
> The other problem I noticed with setting recv = 1, is that recv = 0
> is outside the while (!list_empty) loop so it is never reset back
> to zero.
>
> I'm not really following you about recv = 1 needs to be moved up in
> local_completions.

I was referring to handling the case where local->mad_priv != NULL and
there is no mad recv agent:

                if (local->mad_priv) {
                        recv_mad_agent = local->recv_mad_agent;
                        if (!recv_mad_agent) {
                                printk(KERN_ERR PFX "No receive MAD agent for lo
cal completion\n");
                                goto local_send_completion;
                        }

That was another case where there was a leak so I moved recv = 1 from
below this to above it just after the check of local->mad_priv in the
patch I proposed.

> What I was really looking for was a confirmation that the original
> code had a memory leak.

I need to look at this further for this. Haven't looked at this code
much in the past couple years.

> I don't see any reason to special case the
> call to kmem_cache_free(). It seems to me that it is needed any time
> local->mad_priv != NULL.
> The NULL pointer bug is easily fixed in a number of different ways.

I agree that if it turns out that this case was missed, then your
patch is simpler but it will take me a little bit to check this out.

>> Would you try the untested patch below and see if it fixes the problem
>> you found ? Thanks.
>
> We are in the middle of moving our office so I won't be able to
> reproduce this until next week.

I no longer have any test bed setup for this. Any chance you can
regress with the Mellanox HCAs to be sure this works there ? Part of
that testing should be running OpenSM as it creates some of those
cases.

-- Hal

>> -- Hal
>>
>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>> index 5c54fc2..cca87e6 100644
>> --- a/drivers/infiniband/core/mad.c
>> +++ b/drivers/infiniband/core/mad.c
>> @@ -2371,13 +2371,13 @@ static void local_completions(struct work_struct *work)
>>                 list_del(&local->completion_list);
>>                 spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
>>                 if (local->mad_priv) {
>> +                       recv = 1;
>>                         recv_mad_agent = local->recv_mad_agent;
>>                         if (!recv_mad_agent) {
>>                                 printk(KERN_ERR PFX "No receive MAD agent for lo
>>                                 goto local_send_completion;
>>                         }
>>
>> -                       recv = 1;
>>                         /*
>>                          * Defined behavior is to complete response
>>                          * before request
>> @@ -2422,7 +2422,7 @@ local_send_completion:
>>
>>                 spin_lock_irqsave(&mad_agent_priv->lock, flags);
>>                 atomic_dec(&mad_agent_priv->refcount);
>> -               if (!recv)
>> +               if (!recv && local->mad_priv)
>>                         kmem_cache_free(ib_mad_cache, local->mad_priv);
>>                 kfree(local);
>>         }
>>
>> > So, I'm proposing the following patch:
>> >
>> > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>> > index 5c54fc2..93d80e5 100644
>> > --- a/drivers/infiniband/core/mad.c
>> > +++ b/drivers/infiniband/core/mad.c
>> > @@ -2356,7 +2356,6 @@ static void local_completions(struct work_struct *work)
>> >        struct ib_mad_local_private *local;
>> >        struct ib_mad_agent_private *recv_mad_agent;
>> >        unsigned long flags;
>> > -       int recv = 0;
>> >        struct ib_wc wc;
>> >        struct ib_mad_send_wc mad_send_wc;
>> >
>> > @@ -2377,7 +2376,6 @@ static void local_completions(struct work_struct *work)
>> >                                goto local_send_completion;
>> >                        }
>> >
>> > -                       recv = 1;
>> >                        /*
>> >                         * Defined behavior is to complete response
>> >                         * before request
>> > @@ -2422,7 +2420,7 @@ local_send_completion:
>> >
>> >                spin_lock_irqsave(&mad_agent_priv->lock, flags);
>> >                atomic_dec(&mad_agent_priv->refcount);
>> > -               if (!recv)
>> > +               if (local->mad_priv)
>> >                        kmem_cache_free(ib_mad_cache, local->mad_priv);
>> >                kfree(local);
>> >        }
>> >
>> >
>> > _______________________________________________
>> > general mailing list
>> > general at lists.openfabrics.org
>> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>> >
>> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>> >
>
>



More information about the general mailing list