[ofa-general] [PATCH] IB/core: ib_post_send_mad() returns zero but doesn't generate send completion

Ralph Campbell ralph.campbell at qlogic.com
Mon Mar 2 16:57:24 PST 2009


On Sun, 2009-03-01 at 17:23 -0800, Hal Rosenstock wrote:
> On Fri, Feb 27, 2009 at 4:45 PM, Ralph Campbell
> <ralph.campbell at qlogic.com> wrote:
> 
> Good catches.
> 
> > If ib_post_send_mad() returns zero, it guarantees that there will be
> > a callback to the send_buf->mad_agent->send_handler() so that the
> > sender can call ib_free_send_mad(). Otherwise, the ib_mad_send_buf
> > will be leaked and the mad_agent reference count will never go to zero
> > and the IB device module cannot be unloaded.
> > The above can happen without this patch if process_mad() returns
> > (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED).
> 
> This looks right although I'm not sure why this wasn't seen a long time ago.

mthca returns IB_MAD_RESULT_CONSUMED for TRAP repress. I verified that
using libibumad to send a direct routed TRAP repress with hop=1/cnt=0
does hang on mthca. I think opensm only sends LID routed TRAP repress
so that would explain why this wasn't seen earlier.
For ipath, it was fairly recently that we made SubnSet(Portinfo) to
disabled to not send a reply and we have a program that uses libibumad
to disable ports. Without the patch, the program hangs.

> > If process_mad() returns IB_MAD_RESULT_SUCCESS and there is no agent
> > registered to receive the mad being sent, handle_outgoing_dr_smp()
> > returns zero which causes a MAD packet which is at the end of the
> > directed route to be incorrectly sent on the wire but doesn't cause
> > a hang since the HCA generates a send completion.
> 
> This also looks right.
> 
> Does the packet really get out on the IB wire ? Was that with a
> Mellanox HCA ? I don't recall ever seeing these extra packets but
> maybe they end up swallowed by certain HCAs.

Yes, a packet is sent with either mthca or ipath.

> Unfortunately, I have no current way of regressing these changes. I'm
> presuming you've done this with Mellanox HCAs and with OpenSM running.

opensm runs fine with the patch applied on either mthca or ipath.

> One other question is whether this should be separated into two increments.

I think the changes are small enough and related enough to justify a
single patch but I will split it into two if someone feels I should.

> -- Hal
> 
> > Signed-off-by: Ralph Campbell <ralph.campbell at qlogic.com>
> >
> > diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
> > index dbcd285..62a99dc 100644
> > --- a/drivers/infiniband/core/mad.c
> > +++ b/drivers/infiniband/core/mad.c
> > @@ -742,9 +742,7 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
> >                break;
> >        case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED:
> >                kmem_cache_free(ib_mad_cache, mad_priv);
> > -               kfree(local);
> > -               ret = 1;
> > -               goto out;
> > +               break;
> >        case IB_MAD_RESULT_SUCCESS:
> >                /* Treat like an incoming receive MAD */
> >                port_priv = ib_get_mad_port(mad_agent_priv->agent.device,
> > @@ -755,10 +753,12 @@ static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
> >                                                        &mad_priv->mad.mad);
> >                }
> >                if (!port_priv || !recv_mad_agent) {
> > +                       /*
> > +                        * No receiving agent so drop packet and
> > +                        * generate send completion.
> > +                        */
> >                        kmem_cache_free(ib_mad_cache, mad_priv);
> > -                       kfree(local);
> > -                       ret = 0;
> > -                       goto out;
> > +                       break;
> >                }
> >                local->mad_priv = mad_priv;
> >                local->recv_mad_agent = recv_mad_agent;
> >
> >
> > _______________________________________________
> > general mailing list
> > general at lists.openfabrics.org
> > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
> >
> > To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> >




More information about the general mailing list