[ofa-general] Re: Re: [GIT PULL] please pull infiniband.git

Michael S. Tsirkin mst at dev.mellanox.co.il
Thu Mar 29 22:00:12 PDT 2007


> Quoting Roland Dreier <rdreier at cisco.com>:
> Subject: Re: Re: [GIT PULL] please pull infiniband.git
> 
>  > > I think the (ugly) solution that the IB spec authors had in mind is to
>  > > transition the QP to the error state and wait for the "last WQE reached"
>  > > affiliated event on that QP.
> 
>  > No, this does not work.
> 
>  > The last WQE reached event is on SRQ, not on QP, and it will never occur if we
>  > repost WRs on SRQ as we should to make other QPs on the same SRQ continue to
>  > work.
> 
> Look at the spec again.  The last WQE reached event is definitely
> affiliated with a QP (not an SRQ) and exists exactly to solve the
> problem we're talking about.

Right, I confused this with the low watermark event.
In fact, the spec says explicitly:

        Note, for QPs that are associated with an SRQ, the Consumer should take
        the QP through the Error State before invoking a Destroy QP or a Modify
        QP to the Reset State. The Consumer may invoke the Destroy QP without
        first performing a Modify QP to the Error State and waiting for the
        Affiliated Asynchronous Last WQE Reached Event. However, if the Consumer
        does not wait for the Affiliated Asynchronous Last WQE Reached Event,
        then WQE and Data Segment leakage may occur. Therefore, it is good
        programming practice to tear down a QP that is associated with an SRQ
        by using the following process:

        -   Put the QP in the Error State;
        -   wait for the Affiliated Asynchronous Last WQE Reached Event;
        -   either:
            -   drain the CQ by invoking the Poll CQ verb and either wait for CQ
                to be empty or the number of Poll CQ operations has exceeded
                CQ capacity size; or
            -   post another WR that completes on the same CQ and wait for this
                WR to return as a WC;
        -   and then invoke a Destroy QP or Reset QP.

        (Release 1.2, October 2004, "Software Transport Interface", page 452)

So the bug is in IPoIB CM, and only there.
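
To make the ordering in the quoted passage concrete, here is a rough
sketch of the teardown sequence, using the "drain the CQ" alternative for
the third step. This is not the IPoIB CM code and not a proposed patch:
struct teardown_ops, srq_qp_teardown() and the fake_* backend are all
hypothetical names made up for illustration, so that the sequence can be
compiled and exercised without hardware. In the kernel the four hooks
would presumably map onto ib_modify_qp() with IB_QPS_ERR, the
IB_EVENT_QP_LAST_WQE_REACHED async event, ib_poll_cq() and
ib_destroy_qp().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend hooks (not a kernel or libibverbs API). Each returns
 * 0 on success or a negative value on error, except poll_cq, which returns
 * the number of completions it reaped (0 means the CQ is empty). */
struct teardown_ops {
    int (*qp_to_error)(void *ctx);            /* Modify QP to Error State */
    int (*wait_last_wqe_reached)(void *ctx);  /* wait for the async event */
    int (*poll_cq)(void *ctx);                /* reap at most one completion */
    int (*destroy_qp)(void *ctx);             /* Destroy QP (or Reset QP) */
};

/* Tear down a QP attached to an SRQ in the order the spec recommends. */
static int srq_qp_teardown(const struct teardown_ops *ops, void *ctx,
                           int cq_capacity)
{
    int ret, polls;

    assert(ops != NULL && cq_capacity >= 0);

    ret = ops->qp_to_error(ctx);
    if (ret < 0)
        return ret;

    ret = ops->wait_last_wqe_reached(ctx);
    if (ret < 0)
        return ret;

    /* Stop once the CQ is empty, or once the number of polls has
     * exceeded the CQ capacity (cq_capacity + 1 polls at most). */
    for (polls = 0; polls <= cq_capacity; polls++) {
        ret = ops->poll_cq(ctx);
        if (ret < 0)
            return ret;
        if (ret == 0)
            break;
    }

    return ops->destroy_qp(ctx);
}

/* Toy in-memory backend, only there to exercise the ordering above. */
struct fake_hw {
    int in_error;
    int event_seen;
    int pending_cqes;
    int destroyed;
};

static int fake_to_error(void *ctx)
{
    struct fake_hw *hw = ctx;

    hw->in_error = 1;
    return 0;
}

static int fake_wait_event(void *ctx)
{
    struct fake_hw *hw = ctx;

    if (!hw->in_error)
        return -1;      /* in this model the event needs the Error State */
    hw->event_seen = 1;
    return 0;
}

static int fake_poll_cq(void *ctx)
{
    struct fake_hw *hw = ctx;

    if (hw->pending_cqes == 0)
        return 0;
    hw->pending_cqes--;
    return 1;
}

static int fake_destroy(void *ctx)
{
    struct fake_hw *hw = ctx;

    if (!hw->event_seen || hw->pending_cqes != 0)
        return -1;      /* destroying now would leak WQEs in this model */
    hw->destroyed = 1;
    return 0;
}

static const struct teardown_ops fake_ops = {
    .qp_to_error = fake_to_error,
    .wait_last_wqe_reached = fake_wait_event,
    .poll_cq = fake_poll_cq,
    .destroy_qp = fake_destroy,
};
```

With a fake_hw that has a few pending completions, calling
srq_qp_teardown(&fake_ops, &hw, cq_capacity) should walk the steps in
order and only destroy the QP once the event has been seen and the CQ is
empty. A real consumer would wait for the event asynchronously rather
than block in a hook like this.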

-- 
MST


