[ofa-general] [PATCH][RFC] IB: Return "maybe missed event" hint from ib_req_notify_cq()

Hoang-Nam Nguyen HNGUYEN at de.ibm.com
Mon Apr 30 13:11:03 PDT 2007


Hi Roland!
As far as this concerns ehca this looks great.
Thanks
Nam

general-bounces at lists.openfabrics.org wrote on 27.04.2007 00:43:19:

>  >   - "IB: Return "maybe missed event" hint from ib_req_notify_cq()"
>  >     This extends the API in a way that lets us implement NAPI, but may
>  >     be useful for other things too.  It touches all the drivers, and I
>  >     still need to finish updating cxgb3 to work correctly.  I haven't
>  >     heard anything negative about this, so I'll fix it up, post it one
>  >     more time for review, and plan on merging it.
>
> As promised, here is that patch for review, with a cxgb3
> implementation included.
>
> ---
>
> The semantics defined by the InfiniBand specification say that
> completion events are only generated when a completions is added to a
> completion queue (CQ) after completion notification is requested.  In
> other words, this means that the following race is possible:
>
>    while (CQ is not empty)
>       ib_poll_cq(CQ);
>    // new completion is added after while loop is exited
>    ib_req_notify_cq(CQ);
>    // no event is generated for the existing completion
>
> To close this race, the IB spec recommends doing another poll of the
> CQ after requesting notification.
>
> However, it is not always possible to arrange code this way (for
> example, we have found that NAPI for IPoIB cannot poll after
> requesting notification).  Also, some hardware (eg Mellanox HCAs)
> actually will generate an event for completions added before the call
> to ib_req_notify_cq() -- which is allowed by the spec, since there's
> no way for any upper-layer consumer to know exactly when a completion
> was really added -- so the extra poll of the CQ is just a waste.
>
> Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
> ib_req_notify_cq() so that it can return a hint about whether the a
> completion may have been added before the request for notification.
> The return value of ib_req_notify_cq() is extended so:
>
>     < 0   means an error occurred while requesting notification
>    == 0   means notification was requested successfully, and if
>       IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
>       events were missed and it is safe to wait for another
>       event.
>     > 0   is only returned if IB_CQ_REPORT_MISSED_EVENTS was
>       passed in.  It means that the consumer must poll the
>       CQ again to make sure it is empty to avoid the race
>       described above.
>
> We add a flag to enable this behavior rather than turning it on
> unconditionally, because checking for missed events may incur
> significant overhead for some low-level drivers, and consumers that
> don't care about the results of this test shouldn't be forced to pay
> for the test.
>
> Signed-off-by: Roland Dreier <rolandd at cisco.com>
> ---
>  drivers/infiniband/hw/amso1100/c2.h         |    2 +-
>  drivers/infiniband/hw/amso1100/c2_cq.c      |   16 ++++++++---
>  drivers/infiniband/hw/cxgb3/cxio_hal.c      |    3 ++
>  drivers/infiniband/hw/cxgb3/iwch_provider.c |    8 +++--
>  drivers/infiniband/hw/ehca/ehca_iverbs.h    |    2 +-
>  drivers/infiniband/hw/ehca/ehca_reqs.c      |   14 +++++++--
>  drivers/infiniband/hw/ehca/ipz_pt_fn.h      |    8 +++++
>  drivers/infiniband/hw/ipath/ipath_cq.c      |   15 +++++++---
>  drivers/infiniband/hw/ipath/ipath_verbs.h   |    2 +-
>  drivers/infiniband/hw/mthca/mthca_cq.c      |   12 +++++---
>  drivers/infiniband/hw/mthca/mthca_dev.h     |    4 +-
>  include/rdma/ib_verbs.h                     |   40
> +++++++++++++++++++++------
>  12 files changed, 93 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/infiniband/hw/amso1100/c2.h
> b/drivers/infiniband/hw/amso1100/c2.h
> index 04a9db5..fa58200 100644
> --- a/drivers/infiniband/hw/amso1100/c2.h
> +++ b/drivers/infiniband/hw/amso1100/c2.h
> @@ -519,7 +519,7 @@ extern void c2_free_cq(struct c2_dev *c2dev,
> struct c2_cq *cq);
>  extern void c2_cq_event(struct c2_dev *c2dev, u32 mq_index);
>  extern void c2_cq_clean(struct c2_dev *c2dev, struct c2_qp *qp,
u32mq_index);
>  extern int c2_poll_cq(struct ib_cq *ibcq, int num_entries, struct
> ib_wc *entry);
> -extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify);
> +extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags);
>
>  /* CM */
>  extern int c2_llp_connect(struct iw_cm_id *cm_id,
> diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c
> b/drivers/infiniband/hw/amso1100/c2_cq.c
> index 5175c99..d2b3366 100644
> --- a/drivers/infiniband/hw/amso1100/c2_cq.c
> +++ b/drivers/infiniband/hw/amso1100/c2_cq.c
> @@ -217,17 +217,19 @@ int c2_poll_cq(struct ib_cq *ibcq, int
> num_entries, struct ib_wc *entry)
>     return npolled;
>  }
>
> -int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
> +int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags)
>  {
>     struct c2_mq_shared __iomem *shared;
>     struct c2_cq *cq;
> +   unsigned long flags;
> +   int ret = 0;
>
>     cq = to_c2cq(ibcq);
>     shared = cq->mq.peer;
>
> -   if (notify == IB_CQ_NEXT_COMP)
> +   if ((notify_flags & IB_CQ_SOLICITED_MASK) == IB_CQ_NEXT_COMP)
>        writeb(C2_CQ_NOTIFICATION_TYPE_NEXT, &shared->notification_type);
> -   else if (notify == IB_CQ_SOLICITED)
> +   else if ((notify_flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
>        writeb(C2_CQ_NOTIFICATION_TYPE_NEXT_SE,
&shared->notification_type);
>     else
>        return -EINVAL;
> @@ -241,7 +243,13 @@ int c2_arm_cq(struct ib_cq *ibcq, enum
> ib_cq_notify notify)
>      */
>     readb(&shared->armed);
>
> -   return 0;
> +   if (notify_flags & IB_CQ_REPORT_MISSED_EVENTS) {
> +      spin_lock_irqsave(&cq->lock, flags);
> +      ret = !c2_mq_empty(&cq->mq);
> +      spin_unlock_irqrestore(&cq->lock, flags);
> +   }
> +
> +   return ret;
>  }
>
>  static void c2_free_cq_buf(struct c2_dev *c2dev, struct c2_mq *mq)
> diff --git a/drivers/infiniband/hw/cxgb3/cxio_hal.c
> b/drivers/infiniband/hw/cxgb3/cxio_hal.c
> index f5e9aee..76049af 100644
> --- a/drivers/infiniband/hw/cxgb3/cxio_hal.c
> +++ b/drivers/infiniband/hw/cxgb3/cxio_hal.c
> @@ -114,7 +114,10 @@ int cxio_hal_cq_op(struct cxio_rdev *rdev_p,
> struct t3_cq *cq,
>              return -EIO;
>           }
>        }
> +
> +      return 1;
>     }
> +
>     return 0;
>  }
>
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> index 24e0df0..e89957f 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -292,7 +292,7 @@ static int iwch_resize_cq(struct ib_cq *cq, int
> cqe, struct ib_udata *udata)
>  #endif
>  }
>
> -static int iwch_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
> +static int iwch_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags
flags)
>  {
>     struct iwch_dev *rhp;
>     struct iwch_cq *chp;
> @@ -303,7 +303,7 @@ static int iwch_arm_cq(struct ib_cq *ibcq, enum
> ib_cq_notify notify)
>
>     chp = to_iwch_cq(ibcq);
>     rhp = chp->rhp;
> -   if (notify == IB_CQ_SOLICITED)
> +   if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
>        cq_op = CQ_ARM_SE;
>     else
>        cq_op = CQ_ARM_AN;
> @@ -317,9 +317,11 @@ static int iwch_arm_cq(struct ib_cq *ibcq, enum
> ib_cq_notify notify)
>     PDBG("%s rptr 0x%x\n", __FUNCTION__, chp->cq.rptr);
>     err = cxio_hal_cq_op(&rhp->rdev, &chp->cq, cq_op, 0);
>     spin_unlock_irqrestore(&chp->lock, flag);
> -   if (err)
> +   if (err < 0)
>        printk(KERN_ERR MOD "Error %d rearming CQID 0x%x\n", err,
>               chp->cq.cqid);
> +   if (err > 0 && !(flags & IB_CQ_REPORT_MISSED_EVENTS))
> +      err = 0;
>     return err;
>  }
>
> diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h
> b/drivers/infiniband/hw/ehca/ehca_iverbs.h
> index 95fd59f..9e5460d 100644
> --- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
> +++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
> @@ -135,7 +135,7 @@ int ehca_poll_cq(struct ib_cq *cq, int
> num_entries, struct ib_wc *wc);
>
>  int ehca_peek_cq(struct ib_cq *cq, int wc_cnt);
>
> -int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify);
> +int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify_flags
> notify_flags);
>
>  struct ib_qp *ehca_create_qp(struct ib_pd *pd,
>                struct ib_qp_init_attr *init_attr,
> diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c
> b/drivers/infiniband/hw/ehca/ehca_reqs.c
> index 08d3f89..caec9de 100644
> --- a/drivers/infiniband/hw/ehca/ehca_reqs.c
> +++ b/drivers/infiniband/hw/ehca/ehca_reqs.c
> @@ -634,11 +634,13 @@ poll_cq_exit0:
>     return ret;
>  }
>
> -int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify)
> +int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify_flags
> notify_flags)
>  {
>     struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq);
> +   unsigned long spl_flags;
> +   int ret = 0;
>
> -   switch (cq_notify) {
> +   switch (notify_flags & IB_CQ_SOLICITED_MASK) {
>     case IB_CQ_SOLICITED:
>        hipz_set_cqx_n0(my_cq, 1);
>        break;
> @@ -649,5 +651,11 @@ int ehca_req_notify_cq(struct ib_cq *cq, enum
> ib_cq_notify cq_notify)
>        return -EINVAL;
>     }
>
> -   return 0;
> +   if (notify_flags & IB_CQ_REPORT_MISSED_EVENTS) {
> +      spin_lock_irqsave(&my_cq->spinlock, spl_flags);
> +      ret = ipz_qeit_is_valid(&my_cq->ipz_queue);
> +      spin_unlock_irqrestore(&my_cq->spinlock, spl_flags);
> +   }
> +
> +   return ret;
>  }
> diff --git a/drivers/infiniband/hw/ehca/ipz_pt_fn.h
> b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
> index 8199c45..57f141a 100644
> --- a/drivers/infiniband/hw/ehca/ipz_pt_fn.h
> +++ b/drivers/infiniband/hw/ehca/ipz_pt_fn.h
> @@ -140,6 +140,14 @@ static inline void
> *ipz_qeit_get_inc_valid(struct ipz_queue *queue)
>     return cqe;
>  }
>
> +static inline int ipz_qeit_is_valid(struct ipz_queue *queue)
> +{
> +   struct ehca_cqe *cqe = ipz_qeit_get(queue);
> +   u32 cqe_flags = cqe->cqe_flags;
> +
> +   return cqe_flags >> 7 == (queue->toggle_state & 1);
> +}
> +
>  /*
>   * returns and resets Queue Entry iterator
>   * returns address (kv) of first Queue Entry
> diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c
> b/drivers/infiniband/hw/ipath/ipath_cq.c
> index 87462e0..9582145 100644
> --- a/drivers/infiniband/hw/ipath/ipath_cq.c
> +++ b/drivers/infiniband/hw/ipath/ipath_cq.c
> @@ -306,17 +306,18 @@ int ipath_destroy_cq(struct ib_cq *ibcq)
>  /**
>   * ipath_req_notify_cq - change the notification type for a completion
queue
>   * @ibcq: the completion queue
> - * @notify: the type of notification to request
> + * @notify_flags: the type of notification to request
>   *
>   * Returns 0 for success.
>   *
>   * This may be called from interrupt context.  Also called by
>   * ib_req_notify_cq() in the generic verbs code.
>   */
> -int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
> +int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags
> notify_flags)
>  {
>     struct ipath_cq *cq = to_icq(ibcq);
>     unsigned long flags;
> +   int ret = 0;
>
>     spin_lock_irqsave(&cq->lock, flags);
>     /*
> @@ -324,9 +325,15 @@ int ipath_req_notify_cq(struct ib_cq *ibcq,
> enum ib_cq_notify notify)
>      * any other transitions (see C11-31 and C11-32 in ch. 11.4.2.2).
>      */
>     if (cq->notify != IB_CQ_NEXT_COMP)
> -      cq->notify = notify;
> +      cq->notify = notify_flags & IB_CQ_SOLICITED_MASK;
> +
> +   if ((notify_flags & IB_CQ_REPORT_MISSED_EVENTS) &&
> +       cq->queue->head != cq->queue->tail)
> +      ret = 1;
> +
>     spin_unlock_irqrestore(&cq->lock, flags);
> -   return 0;
> +
> +   return ret;
>  }
>
>  /**
> diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h
> b/drivers/infiniband/hw/ipath/ipath_verbs.h
> index c0c8d5b..6b3b770 100644
> --- a/drivers/infiniband/hw/ipath/ipath_verbs.h
> +++ b/drivers/infiniband/hw/ipath/ipath_verbs.h
> @@ -716,7 +716,7 @@ struct ib_cq *ipath_create_cq(struct ib_device
> *ibdev, int entries,
>
>  int ipath_destroy_cq(struct ib_cq *ibcq);
>
> -int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify);
> +int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags
> notify_flags);
>
>  int ipath_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata
*udata);
>
> diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c
> b/drivers/infiniband/hw/mthca/mthca_cq.c
> index efd79ef..cf0868f 100644
> --- a/drivers/infiniband/hw/mthca/mthca_cq.c
> +++ b/drivers/infiniband/hw/mthca/mthca_cq.c
> @@ -726,11 +726,12 @@ repoll:
>     return err == 0 || err == -EAGAIN ? npolled : err;
>  }
>
> -int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify)
> +int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags)
>  {
>     __be32 doorbell[2];
>
> -   doorbell[0] = cpu_to_be32((notify == IB_CQ_SOLICITED ?
> +   doorbell[0] = cpu_to_be32(((flags & IB_CQ_SOLICITED_MASK) ==
> +               IB_CQ_SOLICITED ?
>                 MTHCA_TAVOR_CQ_DB_REQ_NOT_SOL :
>                 MTHCA_TAVOR_CQ_DB_REQ_NOT)      |
>                to_mcq(cq)->cqn);
> @@ -743,7 +744,7 @@ int mthca_tavor_arm_cq(struct ib_cq *cq, enum
> ib_cq_notify notify)
>     return 0;
>  }
>
> -int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
> +int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags
flags)
>  {
>     struct mthca_cq *cq = to_mcq(ibcq);
>     __be32 doorbell[2];
> @@ -755,7 +756,8 @@ int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum
> ib_cq_notify notify)
>
>     doorbell[0] = ci;
>     doorbell[1] = cpu_to_be32((cq->cqn << 8) | (2 << 5) | (sn << 3) |
> -              (notify == IB_CQ_SOLICITED ? 1 : 2));
> +              ((flags & IB_CQ_SOLICITED_MASK) ==
> +               IB_CQ_SOLICITED ? 1 : 2));
>
>     mthca_write_db_rec(doorbell, cq->arm_db);
>
> @@ -766,7 +768,7 @@ int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum
> ib_cq_notify notify)
>     wmb();
>
>     doorbell[0] = cpu_to_be32((sn << 28)                       |
> -              (notify == IB_CQ_SOLICITED ?
> +              ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED ?
>                 MTHCA_ARBEL_CQ_DB_REQ_NOT_SOL :
>                 MTHCA_ARBEL_CQ_DB_REQ_NOT)      |
>                cq->cqn);
> diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h
> b/drivers/infiniband/hw/mthca/mthca_dev.h
> index b7e42ef..9bae3cc 100644
> --- a/drivers/infiniband/hw/mthca/mthca_dev.h
> +++ b/drivers/infiniband/hw/mthca/mthca_dev.h
> @@ -495,8 +495,8 @@ void mthca_unmap_eq_icm(struct mthca_dev *dev);
>
>  int mthca_poll_cq(struct ib_cq *ibcq, int num_entries,
>          struct ib_wc *entry);
> -int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify);
> -int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify);
> +int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags);
> +int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags);
>  int mthca_init_cq(struct mthca_dev *dev, int nent,
>          struct mthca_ucontext *ctx, u32 pdn,
>          struct mthca_cq *cq);
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 765589f..529a69d 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -431,9 +431,11 @@ struct ib_wc {
>     u8         port_num;   /* valid only for DR SMPs on switches */
>  };
>
> -enum ib_cq_notify {
> -   IB_CQ_SOLICITED,
> -   IB_CQ_NEXT_COMP
> +enum ib_cq_notify_flags {
> +   IB_CQ_SOLICITED         = 1 << 0,
> +   IB_CQ_NEXT_COMP         = 1 << 1,
> +   IB_CQ_SOLICITED_MASK      = IB_CQ_SOLICITED | IB_CQ_NEXT_COMP,
> +   IB_CQ_REPORT_MISSED_EVENTS   = 1 << 2,
>  };
>
>  enum ib_srq_attr_mask {
> @@ -987,7 +989,7 @@ struct ib_device {
>                       struct ib_wc *wc);
>     int                        (*peek_cq)(struct ib_cq *cq, int wc_cnt);
>     int                        (*req_notify_cq)(struct ib_cq *cq,
> -                      enum ib_cq_notify cq_notify);
> +                      enum ib_cq_notify_flags flags);
>     int                        (*req_ncomp_notif)(struct ib_cq *cq,
>                          int wc_cnt);
>     struct ib_mr *             (*get_dma_mr)(struct ib_pd *pd,
> @@ -1414,14 +1416,34 @@ int ib_peek_cq(struct ib_cq *cq, int wc_cnt);
>  /**
>   * ib_req_notify_cq - Request completion notification on a CQ.
>   * @cq: The CQ to generate an event for.
> - * @cq_notify: If set to %IB_CQ_SOLICITED, completion notification will
> - *   occur on the next solicited event. If set to %IB_CQ_NEXT_COMP,
> - *   notification will occur on the next completion.
> + * @flags:
> + *   Must contain exactly one of %IB_CQ_SOLICITED or %IB_CQ_NEXT_COMP
> + *   to request an event on the next solicited event or next work
> + *   completion at any type, respectively. %IB_CQ_REPORT_MISSED_EVENTS
> + *   may also be |ed in to request a hint about missed events, as
> + *   described below.
> + *
> + * Return Value:
> + *    < 0 means an error occurred while requesting notification
> + *   == 0 means notification was requested successfully, and if
> + *        IB_CQ_REPORT_MISSED_EVENTS was passed in, then no events
> + *        were missed and it is safe to wait for another event.  In
> + *        this case is it guaranteed that any work completions added
> + *        to the CQ since the last CQ poll will trigger a completion
> + *        notification event.
> + *    > 0 is only returned if IB_CQ_REPORT_MISSED_EVENTS was passed
> + *        in.  It means that the consumer must poll the CQ again to
> + *        make sure it is empty to avoid missing an event because of a
> + *        race between requesting notification and an entry being
> + *        added to the CQ.  This return value means it is possible
> + *        (but not guaranteed) that a work completion has been added
> + *        to the CQ since the last poll without triggering a
> + *        completion notification event.
>   */
>  static inline int ib_req_notify_cq(struct ib_cq *cq,
> -               enum ib_cq_notify cq_notify)
> +               enum ib_cq_notify_flags flags)
>  {
> -   return cq->device->req_notify_cq(cq, cq_notify);
> +   return cq->device->req_notify_cq(cq, flags);
>  }
>
>  /**
> --
> 1.5.1.2
> _______________________________________________
> general mailing list
> general at lists.openfabrics.org
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit
http://openib.org/mailman/listinfo/openib-general




More information about the general mailing list