[ofa-general] Re: IPoIB CM (NOSRQ) [PATCH 1] review
Pradeep Satyanarayana
pradeeps at linux.vnet.ibm.com
Tue Sep 18 12:54:29 PDT 2007
Sean,
In my earlier reply I missed two issues that you had pointed out. They
are addressed here.
Pradeep
>>> +static int allocate_and_post_rbuf_nosrq(struct ib_cm_id *cm_id,
>>> + struct ipoib_cm_rx *p, unsigned psn)
>> Function name is a little long. Maybe there should be multiple
>> functions here. (Use of 'and' in the function name points to multiple
>> functions that are grouped together. Maybe we should add a function
>> naming rule: if the function name contains 'and', create separate
>> functions...)
>
> Agreed, this can be dealt with along with the rest of the restructuring.
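Once the restructuring happens, something along these lines might work; the
helper names below are only illustrative, not part of the patch:

	/* Hypothetical split: one helper allocates the per-QP rx_ring and
	 * claims a slot in rx_index_table, the other posts the receive
	 * buffers.
	 */
	static int ipoib_cm_nosrq_alloc_rx_ring(struct ipoib_cm_rx *p);
	static int ipoib_cm_nosrq_post_rbufs(struct net_device *dev,
					     struct ipoib_cm_rx *p);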
>>> +{
>>> + struct net_device *dev = cm_id->context;
>>> + struct ipoib_dev_priv *priv = netdev_priv(dev);
>>> + int ret;
>>> + u32 qp_num, index;
>>> + u64 i, recv_mem_used;
>>> +
>>> + qp_num = p->qp->qp_num;
>> qp_num is only used in one place in this function, and only for a debug
>> print.
>
> OK
>
>>> +
>>> + /* In the SRQ case there is a common rx buffer called the srq_ring.
>>> + * However, for the NOSRQ case we create an rx_ring for every
>>> + * struct ipoib_cm_rx.
>>> + */
>>> + p->rx_ring = kzalloc(ipoib_recvq_size * sizeof *p->rx_ring,
>>> GFP_KERNEL);
>>> + if (!p->rx_ring) {
>>> + printk(KERN_WARNING "Failed to allocate rx_ring for 0x%x\n",
>>> + qp_num);
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + spin_lock_irq(&priv->lock);
>>> + list_add(&p->list, &priv->cm.passive_ids);
>>> + spin_unlock_irq(&priv->lock);
>>> +
>>> + init_context_and_add_list(cm_id, p, priv);
>> stale_task thread could be executing on 'p' at this point. Is that
>> acceptable? (I'm pretty sure I pointed this out before, but I don't
>> remember what the response was.)
>
> In the previous review of version v6, you had caught a bug (which I
> concurred with). That has been fixed now.
>
>> We just added 'p' to the passive_ids list here, but
>> init_context_and_add_list() also adds it to the list, but only in the
>> srq case. It would be cleaner to always just add it to the list in
>> init_context_and_add_list() or always do it outside of the list.
>
> I am not sure I understand this. init_context_and_add_list() adds to the list
> conditionally.
>
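One way to keep the list manipulation in a single place would be to let
init_context_and_add_list() always do the list_add and drop the explicit add
in the caller. A rough sketch of the helper; its real body is not in this
hunk, so the details are only illustrative:

	static void init_context_and_add_list(struct ib_cm_id *cm_id,
					      struct ipoib_cm_rx *p,
					      struct ipoib_dev_priv *priv)
	{
		cm_id->context = p;
		p->jiffies = jiffies;

		spin_lock_irq(&priv->lock);
		/* One add for both the SRQ and NOSRQ paths; callers no
		 * longer touch priv->cm.passive_ids directly.
		 */
		list_add(&p->list, &priv->cm.passive_ids);
		spin_unlock_irq(&priv->lock);
	}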
>>> + spin_lock_irq(&priv->lock);
>> Including the call above, we end up acquiring this lock 3 times in a
>> row, setting 2 variables between the first and second time, and doing
>> nothing between the second and third time.
>
> Would it be better not to drop the lock at all, but hold it until all three
> steps are done? This is not in the packet receive path, and hence not
> performance critical.
>
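If holding the lock across all three steps is acceptable, a sketch of that
shape (the memory cap is assumed to be folded into max_rc_qp, per the
discussion further down):

	/* Take priv->lock once and keep it held across the list add, the
	 * index search, and the table store. This is the REQ path, not the
	 * packet receive path, so the longer hold should be acceptable.
	 */
	spin_lock_irq(&priv->lock);
	list_add(&p->list, &priv->cm.passive_ids);

	for (index = 0; index < max_rc_qp; index++)
		if (priv->cm.rx_index_table[index] == NULL)
			break;

	if (index == max_rc_qp) {
		spin_unlock_irq(&priv->lock);
		ib_send_cm_rej(cm_id, IB_CM_REJ_NO_QP, NULL, 0, NULL, 0);
		ret = -EINVAL;
		goto err_alloc_and_post;
	}

	p->index = index;
	priv->cm.rx_index_table[index] = p;
	spin_unlock_irq(&priv->lock);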
>>> +
>>> + for (index = 0; index < max_rc_qp; index++)
>>> + if (priv->cm.rx_index_table[index] == NULL)
>>> + break;
>> See previous comment about avoiding a linear search.
>>
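For reference, the scan could be replaced with the kernel idr allocator. A
sketch only: idr_alloc() is the current upstream API (the API available to
this tree may differ), and priv->cm.rx_idr is a hypothetical replacement for
rx_index_table:

	spin_lock_irq(&priv->lock);
	/* Allocate the smallest free index in [0, max_rc_qp) and store p
	 * in one step, instead of scanning a table of pointers.
	 */
	index = idr_alloc(&priv->cm.rx_idr, p, 0, max_rc_qp, GFP_ATOMIC);
	spin_unlock_irq(&priv->lock);
	if (index < 0) {
		ib_send_cm_rej(cm_id, IB_CM_REJ_NO_QP, NULL, 0, NULL, 0);
		ret = index;
		goto err_alloc_and_post;
	}
	p->index = index;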
>>> +
>>> + recv_mem_used = (u64)ipoib_recvq_size *
>>> + (u64)atomic_inc_return(&current_rc_qp) * CM_PACKET_SIZE;
>>> + if ((index == max_rc_qp) ||
>>> + (recv_mem_used >= max_recv_buf * (1ul << 20))) {
>> I would prefer a single check against max_rc_qp. (Fold memory
>> constraints into limiting the value of max_rc_qp.) Otherwise, we can
>> end up allocating a larger array of rx_index_table than is actually
>> usable.
>>
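Folding the memory cap into max_rc_qp could be done once, e.g. at module
load, so that this path only needs the index == max_rc_qp check. A sketch;
the helper name is made up, the module parameters keep their current names:

	static void ipoib_cm_nosrq_clamp_max_rc_qp(void)
	{
		u64 per_qp_bytes = (u64)ipoib_recvq_size * CM_PACKET_SIZE;
		u64 qp_limit = ((u64)max_recv_buf << 20) / per_qp_bytes;

		/* Never allow more RC QPs than the receive-buffer budget
		 * (max_recv_buf MB) can support, so rx_index_table is not
		 * sized larger than what is actually usable.
		 */
		if (qp_limit < max_rc_qp)
			max_rc_qp = qp_limit;
	}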
>>> + spin_unlock_irq(&priv->lock);
>>> + ipoib_warn(priv, "NOSRQ has reached the configurable limit "
>>> + "of either %d RC QPs or, max recv buf size of "
>>> + "0x%x MB\n", max_rc_qp, max_recv_buf);
>>> +
>>> + /* We send a REJ to the remote side indicating that we
>>> + * have no more free RC QPs and leave it to the remote side
>>> + * to take appropriate action. This should leave the
>>> + * current set of QPs unaffected and any subsequent REQs
>>> + * will be able to use RC QPs if they are available.
>>> + */
>>> + ib_send_cm_rej(cm_id, IB_CM_REJ_NO_QP, NULL, 0, NULL, 0);
>>> + ret = -EINVAL;
>>> + goto err_alloc_and_post;
>>> + }
>>> +
>>> + priv->cm.rx_index_table[index] = p;
>>> + spin_unlock_irq(&priv->lock);
>>> +
>>> + /* We will subsequently use this stored pointer while freeing
>>> + * resources in stale task
>>> + */
>>> + p->index = index;
>> Is it dangerous to have this not set before releasing the lock? (It
>> doesn't look like it, but wanted to check.) Could anything be iterating
>> over the table expecting p->index to be set.
Only the stale task iterates over the table. This initialization happens when
a REQ is received, so if this thread gets scheduled out before p->index is
set there is a possibility of a race. Good catch!
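A minimal fix, assuming the surrounding code stays as in this patch, is to
publish p->index while still holding priv->lock:

	/* Set p->index before dropping priv->lock, so the stale task
	 * (which takes the same lock) can never see a table entry whose
	 * index field is still unset.
	 */
	spin_lock_irq(&priv->lock);
	p->index = index;
	priv->cm.rx_index_table[index] = p;
	spin_unlock_irq(&priv->lock);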
>>
>>> +
>>> + ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
>>> + if (ret) {
>>> + ipoib_warn(priv, "ipoib_cm_modify_rx_qp() failed %d\n", ret);
>>> + ipoib_cm_dev_cleanup(dev);
>>> + goto err_alloc_and_post;
>>> + }
>>> +
>>> + for (i = 0; i < ipoib_recvq_size; ++i) {
>>> + if (!ipoib_cm_alloc_rx_skb(dev, i << 32 | index,
>>> + IPOIB_CM_RX_SG - 1,
>>> + p->rx_ring[i].mapping)) {
>>> + ipoib_warn(priv, "failed to allocate receive "
>>> + "buffer %d\n", (int)i);
>>> + ipoib_cm_dev_cleanup(dev);
>>> + ret = -ENOMEM;
>>> + goto err_alloc_and_post;
>>> + }
>>> +
>>> + if (post_receive_nosrq(dev, i << 32 | index)) {
>>> + ipoib_warn(priv, "post_receive_nosrq "
>>> + "failed for buf %lld\n", (unsigned long long)i);
>>> + ipoib_cm_dev_cleanup(dev);
>>> + ret = -EIO;
>>
>> Why not just do:
>>
>> ret = post_receive_nosrq()?
>> if (ret) ...
>>
>>> + goto err_alloc_and_post;
>>> + }
>>> + }
>>> +
>>> + return 0;
>>> +
>>> +err_alloc_and_post:
>>> + atomic_dec(&current_rc_qp);
>>> + kfree(p->rx_ring);
>>> + list_del_init(&p->list);
>> We need a lock here.
>
> Agreed. You are correct.
>
>> Is priv->cm.rx_index_table[index] cleaned up anywhere?
>
> Yes, in dev_stop_nosrq().
>
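With the missing locking added, and optionally releasing the table slot right
away instead of waiting for dev_stop_nosrq(), the error path might look like
this sketch:

	err_alloc_and_post:
		atomic_dec(&current_rc_qp);
		kfree(p->rx_ring);

		spin_lock_irq(&priv->lock);
		list_del_init(&p->list);
		/* Optional: drop the table slot here as well if it was
		 * already claimed, rather than leaving it to
		 * dev_stop_nosrq().
		 */
		if (index < max_rc_qp && priv->cm.rx_index_table[index] == p)
			priv->cm.rx_index_table[index] = NULL;
		spin_unlock_irq(&priv->lock);
		return ret;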
>>> + return ret;
>>> +}
>>> +
>>> static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct
>>> ib_cm_event *event)
>>> {
>>> struct net_device *dev = cm_id->context;
>>> @@ -301,9 +477,6 @@ static int ipoib_cm_req_handler(struct i
>>> return -ENOMEM;
>>> p->dev = dev;
>>> p->id = cm_id;
>>> - cm_id->context = p;
>>> - p->state = IPOIB_CM_RX_LIVE;
>>> - p->jiffies = jiffies;
>>> INIT_LIST_HEAD(&p->list);
>>>
>>> p->qp = ipoib_cm_create_rx_qp(dev, p);
>>> @@ -313,19 +486,21 @@ static int ipoib_cm_req_handler(struct i
>>> }
>>>
>>> psn = random32() & 0xffffff;
>>> - ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
>>> - if (ret)
>>> - goto err_modify;
>>> + if (!priv->cm.srq) {
>>> + ret = allocate_and_post_rbuf_nosrq(cm_id, p, psn);
>>> + if (ret)
>>> + goto err_modify;
>>> + } else {
>>> + p->rx_ring = NULL;
>>> + ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
>>> + if (ret)
>>> + goto err_modify;
>>> + }
>>>
>>> - spin_lock_irq(&priv->lock);
>>> - queue_delayed_work(ipoib_workqueue,
>>> - &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>>> - /* Add this entry to passive ids list head, but do not re-add it
>>> - * if IB_EVENT_QP_LAST_WQE_REACHED has moved it to flush list. */
>>> - p->jiffies = jiffies;
>>> - if (p->state == IPOIB_CM_RX_LIVE)
>>> - list_move(&p->list, &priv->cm.passive_ids);
>>> - spin_unlock_irq(&priv->lock);
>>> + if (priv->cm.srq) {
>>> + p->state = IPOIB_CM_RX_LIVE;
>> This if can be merged with the previous if statement above, which
>> performs a similar check.
>>
>> Does it matter that the state is set outside of any locks?
No, it does not matter, since we have not added p to the list at that point.
(A merged form of the two if statements is sketched below.)
>>
>>> + init_context_and_add_list(cm_id, p, priv);
>>> + }
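For reference, a sketch of the merged form (a single srq check, same
behavior as the two separate checks in the hunk above):

	if (priv->cm.srq) {
		p->rx_ring = NULL;
		ret = ipoib_cm_modify_rx_qp(dev, cm_id, p->qp, psn);
		if (ret)
			goto err_modify;
		p->state = IPOIB_CM_RX_LIVE;
		init_context_and_add_list(cm_id, p, priv);
	} else {
		ret = allocate_and_post_rbuf_nosrq(cm_id, p, psn);
		if (ret)
			goto err_modify;
	}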
>>>
>>> ret = ipoib_cm_send_rep(dev, cm_id, p->qp,
>>> &event->param.req_rcvd, psn);
>>> if (ret) {
>>> @@ -398,29 +573,60 @@ static void skb_put_frags(struct sk_buff
>>> }
>>> }
>>>
>