[ofa-general] Re: [PATCH RFC/untested v0.5] IPoIB/CM: fix SRQ WR leak

Michael S. Tsirkin mst at dev.mellanox.co.il
Thu May 17 15:21:39 PDT 2007


> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: [PATCH RFC/untested v0.5] IPoIB/CM: fix SRQ WR leak
> 
>  > Hmm, I would like to quote the spec *literally*. Maybe
>  > - and then invoke a Destroy QP or Reset QP.
> 
> That's fine ... in fact the spec has a bullet for that line too (I
> just looked).  The way you have it formatted now is visually kind of
> strange (the last line looks odd):
> 
>  > + * - Put the QP in the Error State
>  > + * - Wait for the Affiliated Asynchronous Last WQE Reached Event;
>  > + * - either:
>  > + *       drain the CQ by invoking the Poll CQ verb and either wait for CQ
>  > + *       to be empty or the number of Poll CQ operations has exceeded
>  > + *       CQ capacity size;
>  > + * - or
>  > + *       post another WR that completes on the same CQ and wait for this
>  > + *       WR to return as a WC; (NB: this is the option that we use)
>  > + * and then invoke a Destroy QP or Reset QP.
> 
> which doesn't really match the spec's formatting... I guess this is
> pretty minor, but I would write the comment as:
> 
> * - Put the QP in the Error State;
> * - wait for the Affiliated Asynchronous Last WQE Reached Event;
> * - either:
> *   - drain the CQ by invoking the Poll CQ verb and either wait for CQ
> *     to be empty or the number of Poll CQ operations has exceeded
> *     CQ capacity size; or
> *   - post another WR that completes on the same CQ and wait for this
> *     WR to return as a WC;
> * - and then invoke a Destroy QP or Reset QP.
> *
> * For IPoIB we choose the second option of posting another WR, and we
> * keep a dedicated QP in the error state for doing this.

Yes, that's exactly what I did.

>  > > This actually seems like a good motivation for the mthca RESET ->
>  > > ERROR fix.  We could avoid the transition to INIT if we fixed mthca
>  > > and mlx4, right?
>  > 
>  > Yes. That was the motivation.
> 
> OK, I just queued the mthca and mlx4 versions of the RESET->ERROR fix.
> So I guess you can drop the dummy transition to INIT within IPoIB for
> the final version of the WQE leakage patch.

Yea. I'm just fixing it to work on top of the pkey change:
ipoib_cm_dev_stop will need a flush flag, and I think this means
I'll need a reap_list for RX connections like I do for TX:
so I can just go over this list and destroy all QPs safely
from inside the ipoib_wq.

BTW, it seems that for 2.6.23 the right thing to do will be
to merge TX and RX structures, and reuse some more of
the code between the two.

-- 
MST



More information about the general mailing list