[ofa-general] Re: [PATCH RFC/untested v0.5] IPoIB/CM: fix SRQ WR leak
Michael S. Tsirkin
mst at dev.mellanox.co.il
Thu May 17 15:21:39 PDT 2007
> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH RFC/untested v0.5] IPoIB/CM: fix SRQ WR leak
>
> > Hmm, I would like to quote the spec *literally*. Maybe
> > - and then invoke a Destroy QP or Reset QP.
>
> That's fine ... in fact the spec has a bullet for that line too (I
> just looked). The way you have it formatted now is visually kind of
> strange (the last line looks odd):
>
> > + * - Put the QP in the Error State
> > + * - Wait for the Affiliated Asynchronous Last WQE Reached Event;
> > + * - either:
> > + * drain the CQ by invoking the Poll CQ verb and either wait for CQ
> > + * to be empty or the number of Poll CQ operations has exceeded
> > + * CQ capacity size;
> > + * - or
> > + * post another WR that completes on the same CQ and wait for this
> > + * WR to return as a WC; (NB: this is the option that we use)
> > + * and then invoke a Destroy QP or Reset QP.
>
> which doesn't really match the spec's formatting... I guess this is
> pretty minor, but I would write the comment as:
>
> * - Put the QP in the Error State;
> * - wait for the Affiliated Asynchronous Last WQE Reached Event;
> * - either:
> * - drain the CQ by invoking the Poll CQ verb and either wait for CQ
> * to be empty or the number of Poll CQ operations has exceeded
> * CQ capacity size; or
> * - post another WR that completes on the same CQ and wait for this
> * WR to return as a WC;
> * - and then invoke a Destroy QP or Reset QP.
> *
> * For IPoIB we choose the second option of posting another WR, and we
> * keep a dedicated QP in the error state for doing this.
Yes, that's exactly what I did.
> > > This actually seems like a good motivation for the mthca RESET ->
> > > ERROR fix. We could avoid the transition to INIT if we fixed mthca
> > > and mlx4, right?
> >
> > Yes. That was the motivation.
>
> OK, I just queued the mthca and mlx4 versions of the RESET->ERROR fix.
> So I guess you can drop the dummy transition to INIT within IPoIB for
> the final version of the WQE leakage patch.
Yea. I'm just fixing it to work on top of the pkey change:
ipoib_cm_dev_stop will need a flush flag, and I think this means
I'll need a reap_list for RX connections like I do for TX:
so I can just go over this list and destroy all QPs safely
from inside the ipoib_wq.
BTW, it seems that for 2.6.23 the right thing to do will be
to merge TX and RX structures, and reuse some more of
the code between the two.
--
MST
More information about the general
mailing list