[ofa-general] Re: IPoIB CM (NOSRQ) [PATCH 1] review

Pradeep Satyanarayana pradeeps at linux.vnet.ibm.com
Tue Sep 18 12:54:29 PDT 2007


Sean,

In my earlier reply I missed two issues that you had pointed out. They
are addressed here.

Pradeep

>>> +static int allocate_and_post_rbuf_nosrq(struct ib_cm_id *cm_id,
>>> +                    struct ipoib_cm_rx *p, unsigned psn)
>> Function name is a little long.  Maybe there should be multiple
>> functions here.  (Use of 'and' in the function name points to multiple
>> functions that are grouped together.  Maybe we should add a function
>> naming rule: if the function name contains 'and', create separate
>> functions...)
> 
> Agreed; this can be dealt with in the rest of the restructuring.
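A split might look like this, with the allocation and the posting
separated (the names here are placeholders, not from the patch):

        static int ipoib_cm_nosrq_init_rx_ring(struct ipoib_cm_rx *p);
        static int ipoib_cm_nosrq_post_rbufs(struct net_device *dev,
                                             struct ipoib_cm_rx *p,
                                             unsigned psn);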
>>> +{
>>> +    struct net_device *dev = cm_id->context;
>>> +    struct ipoib_dev_priv *priv = netdev_priv(dev);
>>> +    int ret;
>>> +    u32 qp_num, index;
>>> +    u64 i, recv_mem_used;
>>> +
>>> +    qp_num = p->qp->qp_num;
>> qp_num is only used in one place in this function, and only for a debug
>> print.
> 
> OK
> 
>>> +
>>> +    /* In the SRQ case there is a common rx buffer called the srq_ring.
>>> +     * However, for the NOSRQ case we create an rx_ring for every
>>> +     * struct ipoib_cm_rx.
>>> +     */
>>> +    p->rx_ring = kzalloc(ipoib_recvq_size * sizeof *p->rx_ring, GFP_KERNEL);
>>> +    if (!p->rx_ring) {
>>> +        printk(KERN_WARNING "Failed to allocate rx_ring for 0x%x\n",
>>> +               qp_num);
>>> +        return -ENOMEM;
>>> +    }
>>> +
>>> +    spin_lock_irq(&priv->lock);
>>> +    list_add(&p->list, &priv->cm.passive_ids);
>>> +    spin_unlock_irq(&priv->lock);
>>> +
>>> +    init_context_and_add_list(cm_id, p, priv);
>> stale_task thread could be executing on 'p' at this point.  Is that
>> acceptable?  (I'm pretty sure I pointed this out before, but I don't
>> remember what the response was.)
> 
> In the previous review of version v6, you had caught a bug (with which I
> concurred). That has been fixed now.
> 
>> We just added 'p' to the passive_ids list here, but
>> init_context_and_add_list() also adds it to the list, though only in the
>> srq case.  It would be cleaner to always add it to the list in
>> init_context_and_add_list() or to always do it outside of the function.
> 
> I am not sure I understand this. init_context_and_add_list() adds to the list
> conditionally.
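If I read the suggestion correctly, it would mean dropping the conditional
add inside init_context_and_add_list() and doing the list manipulation at
every call site instead, something like this (a sketch only, not in the
patch; init_context() is a hypothetical name for what remains):

        spin_lock_irq(&priv->lock);
        list_add(&p->list, &priv->cm.passive_ids);
        spin_unlock_irq(&priv->lock);
        init_context(cm_id, p);  /* only sets cm_id->context etc. */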
> 
>>> +    spin_lock_irq(&priv->lock);
>> Including the call above, we end up acquiring this lock 3 times in a
>> row, setting 2 variables between the first and second time, and doing
>> nothing between the second and third time.
> 
> Would it be better to not drop the lock at all, but hold it until all 3 are done?
> This is not in the packet receive path, and hence not critical.
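That would mean one critical section covering the list add, the index
search, and the table store, roughly like this (a sketch; the limit
checks and error paths are omitted):

        spin_lock_irq(&priv->lock);
        list_add(&p->list, &priv->cm.passive_ids);
        for (index = 0; index < max_rc_qp; index++)
                if (priv->cm.rx_index_table[index] == NULL)
                        break;
        /* ... limit checks ... */
        p->index = index;
        priv->cm.rx_index_table[index] = p;
        spin_unlock_irq(&priv->lock);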
> 
>>> +
>>> +    for (index = 0; index < max_rc_qp; index++)
>>> +        if (priv->cm.rx_index_table[index] == NULL)
>>> +            break;
>> See previous comment about avoiding a linear search.
>>
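One way to avoid dereferencing every table entry (purely illustrative,
assuming a bitmap rx_index_map were added next to rx_index_table) would
be find_first_zero_bit(), which scans a machine word at a time:

        index = find_first_zero_bit(priv->cm.rx_index_map, max_rc_qp);
        if (index < max_rc_qp)
                set_bit(index, priv->cm.rx_index_map); /* claim the slot */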
>>> +
>>> +    recv_mem_used = (u64)ipoib_recvq_size *
>>> +            (u64)atomic_inc_return(&current_rc_qp) * CM_PACKET_SIZE;
>>> +    if ((index == max_rc_qp) ||
>>> +        (recv_mem_used >= max_recv_buf * (1ul << 20))) {
>> I would prefer a single check against max_rc_qp.  (Fold memory
>> constraints into limiting the value of max_rc_qp.)  Otherwise, we can
>> end up allocating a larger array of rx_index_table than is actually
>> usable.
>>
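Folding the memory constraint into max_rc_qp once at initialization might
look roughly like this (a sketch, assuming it runs at module init before
any QPs exist; max_qps_by_mem is a made-up name):

        /* Clamp max_rc_qp so a full complement of QPs can never
         * exceed max_recv_buf MB of receive buffers.
         */
        u64 max_qps_by_mem = ((u64)max_recv_buf << 20) /
                             ((u64)ipoib_recvq_size * CM_PACKET_SIZE);
        max_rc_qp = min_t(u32, max_rc_qp, max_qps_by_mem);

The REQ handler would then only need the single 'index == max_rc_qp'
test, and rx_index_table would never be larger than what is usable.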
>>> +        spin_unlock_irq(&priv->lock);
>>> +        ipoib_warn(priv, "NOSRQ has reached the configurable limit "
>>> +               "of either %d RC QPs or, max recv buf size of "
>>> +               "0x%x MB\n", max_rc_qp, max_recv_buf);
>>> +
>>> +        /* We send a REJ to the remote side indicating that we
>>> +         * have no more free RC QPs and leave it to the remote side
>>> +         * to take appropriate action. This should leave the
>>> +         * current set of QPs unaffected and any subsequent REQs
>>> +         * will be able to use RC QPs if they are available.
>>> +         */
>>> +        ib_send_cm_rej(cm_id, IB_CM_REJ_NO_QP, NULL, 0, NULL, 0);
>>> +        ret = -EINVAL;
>>> +        goto err_alloc_and_post;
>>> +    }
>>> +
>>> +    priv->cm.rx_index_table[index] = p;
>>> +    spin_unlock_irq(&priv->lock);
>>> +
>>> +    /* We will subsequently use this stored pointer while freeing
>>> +     * resources in stale task
>>> +     */
>>> +    p->index = index;
>> Is it dangerous to have this not set before releasing the lock?  (It
>> doesn't look like it, but wanted to check.)  Could anything be iterating
>> over the table expecting p->index to be set?

Only the stale task will iterate over the table. This initialization happens
when a REQ is received. So, if this thread gets scheduled out before p->index
is set, there is a possibility of a race. Good catch!
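The straightforward fix would be to set p->index while the lock is still
held, before the pointer becomes visible in the table, along these lines:

        p->index = index;               /* publish before the table entry */
        priv->cm.rx_index_table[index] = p;
        spin_unlock_irq(&priv->lock);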


>>
>>> +
>>> +    ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
>>> +    if (ret) {
>>> +        ipoib_warn(priv, "ipoib_cm_modify_rx_qp() failed %d\n", ret);
>>> +        ipoib_cm_dev_cleanup(dev);
>>> +        goto err_alloc_and_post;
>>> +    }
>>> +
>>> +    for (i = 0; i < ipoib_recvq_size; ++i) {
>>> +        if (!ipoib_cm_alloc_rx_skb(dev, i << 32 | index,
>>> +                       IPOIB_CM_RX_SG - 1,
>>> +                       p->rx_ring[i].mapping)) {
>>> +            ipoib_warn(priv, "failed to allocate receive "
>>> +                   "buffer %d\n", (int)i);
>>> +            ipoib_cm_dev_cleanup(dev);
>>> +            ret = -ENOMEM;
>>> +            goto err_alloc_and_post;
>>> +        }
>>> +
>>> +        if (post_receive_nosrq(dev, i << 32 | index)) {
>>> +            ipoib_warn(priv, "post_receive_nosrq "
>>> +                   "failed for  buf %lld\n", (unsigned long long)i);
>>> +            ipoib_cm_dev_cleanup(dev);
>>> +            ret = -EIO;
>>
>> Why not just do:
>>
>> ret = post_receive_nosrq()?
>> if (ret) ...
>>
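That would also propagate the real error code from post_receive_nosrq()
instead of overwriting it with -EIO, e.g.:

        ret = post_receive_nosrq(dev, i << 32 | index);
        if (ret) {
                ipoib_warn(priv, "post_receive_nosrq failed for buf %lld\n",
                           (unsigned long long)i);
                ipoib_cm_dev_cleanup(dev);
                goto err_alloc_and_post;
        }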
>>> +            goto err_alloc_and_post;
>>> +        }
>>> +    }
>>> +
>>> +    return 0;
>>> +
>>> +err_alloc_and_post:
>>> +    atomic_dec(&current_rc_qp);
>>> +    kfree(p->rx_ring);
>>> +    list_del_init(&p->list);
>> We need a lock here.
> 
> Agreed. You are correct.
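With the lock, the error path would read something like (a sketch):

        err_alloc_and_post:
                atomic_dec(&current_rc_qp);
                kfree(p->rx_ring);
                spin_lock_irq(&priv->lock);
                list_del_init(&p->list);
                spin_unlock_irq(&priv->lock);
                return ret;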
> 
>> Is priv->cm.rx_index_table[index] cleaned up anywhere?
> 
> Yes, in dev_stop_nosrq().
> 
>>> +    return ret;
>>> +}
>>> +
>>>  static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
>>>  {
>>>      struct net_device *dev = cm_id->context;
>>> @@ -301,9 +477,6 @@ static int ipoib_cm_req_handler(struct i
>>>          return -ENOMEM;
>>>      p->dev = dev;
>>>      p->id = cm_id;
>>> -    cm_id->context = p;
>>> -    p->state = IPOIB_CM_RX_LIVE;
>>> -    p->jiffies = jiffies;
>>>      INIT_LIST_HEAD(&p->list);
>>>  
>>>      p->qp = ipoib_cm_create_rx_qp(dev, p);
>>> @@ -313,19 +486,21 @@ static int ipoib_cm_req_handler(struct i
>>>      }
>>>  
>>>      psn = random32() & 0xffffff;
>>> -    ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
>>> -    if (ret)
>>> -        goto err_modify;
>>> +    if (!priv->cm.srq) {
>>> +        ret = allocate_and_post_rbuf_nosrq(cm_id, p, psn);
>>> +        if (ret)
>>> +            goto err_modify;
>>> +    } else {
>>> +        p->rx_ring = NULL;
>>> +        ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
>>> +        if (ret)
>>> +            goto err_modify;
>>> +    }
>>>  
>>> -    spin_lock_irq(&priv->lock);
>>> -    queue_delayed_work(ipoib_workqueue,
>>> -               &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>>> -    /* Add this entry to passive ids list head, but do not re-add it
>>> -     * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
>>> -    p->jiffies = jiffies;
>>> -    if (p->state == IPOIB_CM_RX_LIVE)
>>> -        list_move(&p->list, &priv->cm.passive_ids);
>>> -    spin_unlock_irq(&priv->lock);
>>> +    if (priv->cm.srq) {
>>> +        p->state = IPOIB_CM_RX_LIVE;
>> This if can be merged with the previous if statement above, which
>> performs a similar check.
>>
>> Does it matter that the state is set outside of any locks?

No, it does not matter, since we have not yet added p to the list.


>>
>>> +        init_context_and_add_list(cm_id, p, priv);
>>> +    }
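For the merge Sean suggests, the two srq checks could collapse into the
earlier if/else, roughly (a sketch):

        if (!priv->cm.srq) {
                ret = allocate_and_post_rbuf_nosrq(cm_id, p, psn);
                if (ret)
                        goto err_modify;
        } else {
                p->rx_ring = NULL;
                ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
                if (ret)
                        goto err_modify;
                p->state = IPOIB_CM_RX_LIVE;
                init_context_and_add_list(cm_id, p, priv);
        }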
>>>  
>>>      ret = ipoib_cm_send_rep(dev, cm_id, p->qp, &event->param.req_rcvd, psn);
>>>      if (ret) {
>>> @@ -398,29 +573,60 @@ static void skb_put_frags(struct sk_buff
>>>      }
>>>  }
>>>  
>



