[ofa-general] Possible memory leak and null pointer dereference in local_completions()
Hal Rosenstock
hal.rosenstock at gmail.com
Wed Feb 4 15:35:15 PST 2009
On Wed, Feb 4, 2009 at 2:58 PM, Ralph Campbell
<ralph.campbell at qlogic.com> wrote:
> On Wed, 2009-02-04 at 04:29 -0800, Hal Rosenstock wrote:
>> On Tue, Feb 3, 2009 at 2:26 PM, Ralph Campbell
>> <ralph.campbell at qlogic.com> wrote:
>> > I was doing some tests with different MAD packets and
>> > then reading the infiniband/core/mad.c code.
>> >
>> > handle_outgoing_dr_smp() can queue a struct ib_mad_local_private *local
>> > on the mad_agent_priv->local_work work queue with
>> > local->mad_priv == NULL if device->process_mad() returns
>> > IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY and
>> > (!ib_response_mad(&mad_priv->mad.mad) ||
>> > !mad_agent_priv->agent.recv_handler).
>> >
>> > In this case, local_completions() will be called with
>> > local->mad_priv == NULL. The code does check for this
>> > case and skips calling recv_mad_agent->agent.recv_handler().
>> > This means recv == 0 so kmem_cache_free() is called with a
>> > NULL pointer.
>>
>> That could be fixed by changing the check for !recv prior to the
>> kmem_cache_free there to a check for (!recv && local->mad_priv).
>
> This is what we did to continue making progress so I know
> it works.
>
>> > Even if local->mad_priv != NULL, I don't see how local->mad_priv
>> > is freed when recv == 1. Thus, it appears to be a memory leak.
>>
>> For those cases, it's either freed in local_completions (as recv is
>> set to 1 for local->mad_priv != NULL except when there is no mad recv
>> agent but that is another bug (see below)) or earlier in the else
>> clause of the IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY of
>> handle_outgoing_dr_smp(). That's another issue that this points out
>> where recv = 1 needs to be moved up in local_completions.
>
> The other problem I noticed with setting recv = 1, is that recv = 0
> is outside the while (!list_empty) loop so it is never reset back
> to zero.
>
> I'm not really following you about recv = 1 needs to be moved up in
> local_completions.
I was referring to handling the case where local->mad_priv != NULL and
there is no mad recv agent:
if (local->mad_priv) {
recv_mad_agent = local->recv_mad_agent;
if (!recv_mad_agent) {
printk(KERN_ERR PFX "No receive MAD agent for lo
cal completion\n");
goto local_send_completion;
}
That was another case where there was a leak so I moved recv = 1 from
below this to above it just after the check of local->mad_priv in the
patch I proposed.
> What I was really looking for was a confirmation that the original
> code had a memory leak.
I need to look at this further for this. Haven't looked at this code
much in the past couple years.
> I don't see any reason to special case the
> call to kmem_cache_free(). It seems to me that it is needed any time
> local->mad_priv != NULL.
> The NULL pointer bug is easily fixed in a number of different ways.
I agree that if it turns out that this case was missed, then your
patch is simpler but it will take me a little bit to check this out.
>> Would you try the untested patch below and see if it fixes the problem
>> you found ? Thanks.
>
> We are in the middle of moving our office so I won't be able to
> reproduce this until next week.
I no longer have any test bed setup for this. Any chance you can
regress with the Mellanox HCAs to be sure this works there ? Part of
that testing should be running OpenSM as it creates some of those
cases.
-- Hal
>> -- Hal
>>
>> diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>> index 5c54fc2..cca87e6 100644
>> --- a/drivers/infiniband/core/mad.c
>> +++ b/drivers/infiniband/core/mad.c
>> @@ -2371,13 +2371,13 @@ static void local_completions(struct work_struct *work)
>> list_del(&local->completion_list);
>> spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
>> if (local->mad_priv) {
>> + recv = 1;
>> recv_mad_agent = local->recv_mad_agent;
>> if (!recv_mad_agent) {
>> printk(KERN_ERR PFX "No receive MAD agent for lo
>> goto local_send_completion;
>> }
>>
>> - recv = 1;
>> /*
>> * Defined behavior is to complete response
>> * before request
>> @@ -2422,7 +2422,7 @@ local_send_completion:
>>
>> spin_lock_irqsave(&mad_agent_priv->lock, flags);
>> atomic_dec(&mad_agent_priv->refcount);
>> - if (!recv)
>> + if (!recv && local->mad_priv)
>> kmem_cache_free(ib_mad_cache, local->mad_priv);
>> kfree(local);
>> }
>>
>> > So, I'm proposing the following patch:
>> >
>> > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
>> > index 5c54fc2..93d80e5 100644
>> > --- a/drivers/infiniband/core/mad.c
>> > +++ b/drivers/infiniband/core/mad.c
>> > @@ -2356,7 +2356,6 @@ static void local_completions(struct work_struct *work)
>> > struct ib_mad_local_private *local;
>> > struct ib_mad_agent_private *recv_mad_agent;
>> > unsigned long flags;
>> > - int recv = 0;
>> > struct ib_wc wc;
>> > struct ib_mad_send_wc mad_send_wc;
>> >
>> > @@ -2377,7 +2376,6 @@ static void local_completions(struct work_struct *work)
>> > goto local_send_completion;
>> > }
>> >
>> > - recv = 1;
>> > /*
>> > * Defined behavior is to complete response
>> > * before request
>> > @@ -2422,7 +2420,7 @@ local_send_completion:
>> >
>> > spin_lock_irqsave(&mad_agent_priv->lock, flags);
>> > atomic_dec(&mad_agent_priv->refcount);
>> > - if (!recv)
>> > + if (local->mad_priv)
>> > kmem_cache_free(ib_mad_cache, local->mad_priv);
>> > kfree(local);
>> > }
>> >
>> >
>> > _______________________________________________
>> > general mailing list
>> > general at lists.openfabrics.org
>> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>> >
>> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
>> >
>
>
More information about the general
mailing list