[ofa-general] IPOB CM (NOSRQ) [PATCH V7] patch resubmit

Wed Jul 18 17:55:48 PDT 2007

Roland Dreier wrote:
> There's still some rather obvious problems with this patch.  It would
> really help if you would read over your patch again I think... anyway:
> 
>  > +#define CM_PACKET_SIZE (1ul << 16)
> 
> This duplicates IPOIB_CM_MTU I think... certainly it needs to be kept
> in sync with it somehow.

They are not quite the same. How about:
#define CM_PACKET_SIZE (ALIGN(IPOIB_CM_MTU, PAGE_SIZE))

This should keep the two in sync.

> 
>  > @@ -564,10 +574,9 @@ static inline void ipoib_cm_skb_too_long
>  >  	dev_kfree_skb_any(skb);
>  >  }
>  >  
>  > -static inline void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
>  > +void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
> 
> Why is this change here?  (This is in the CONFIG_INFINIBAND_IPOIB_CM=n
> part of ipoib.h)
> 
>  >  }
>  > -
>  >  #endif

Will do

> 
>  > -		ipoib_warn(priv, "post srq failed for buf %d (%d)\n", id, ret);
>  > +		ipoib_warn(priv, "post srq failed for buf %ld (%d)\n", id, ret);
> 
> extra noise here (and still wrong -- id might be long long on some
> architectures).

Correct, it should have been %lld

> 
>  > -		.event_handler = ipoib_cm_rx_event_handler,
> 
> why?  seems harmless to just leave this alone for all QPs even if an
> SRQ isn't attached.
> 

If memory serves me right, I tried that and ran into some inexplicable problems.
Maybe it was hang or no traffic went through -don't exactly recollect what it was.
After this change the problem went away.

> 
>  > +		spin_unlock_irq(&priv->lock);
>  > +		ipoib_warn(priv, "NOSRQ has reached the configurable limit "
>  > +			   "of either %d RC QPs or, max recv buf size of "
>  > +			   "0x%x MB\n", max_rc_qp, max_recv_buf);
> 
>  > +		ib_send_cm_rej(cm_id, IB_CM_REJ_NO_QP, NULL, 0, NULL, 0);
>  > +		ret = -EINVAL;
>  > +		goto err_alloc_and_post;
> 
> there's a bug here... you never undo the atomic_inc() of the number of
> RC QPs even though you exit without creating a new connection.

The atomic_dec() does happen, but that is in ipoib_cm_req_handler(). There are
several places where allocate_and_post_rbuf_nosrq() could return an error after 
the atomic_inc(). So, there is an atomic_dec() in the calling routine. On the
other hand I could move that to allocate_and_post_rbuf_nosrq() itself.

> 
>  > -	if (!p)
>  > +	if (!p) {
>  > +		printk(KERN_WARNING "Failed to allocate RX control block when "
>  > +		       "REQ arrived\n");
>  >  		return -ENOMEM;
>  > +	}
> 
> more unrelated changes... (feel free to send these as separate
> patches)
> 

OK

>  >  		kfree(p);
>  >  	}
>  >  
>  > +
>  >  	cancel_delayed_work(&priv->cm.stale_task);
>  >  }
> 
> extra noise in the patch
> 
>  > +		if (!priv->cm.srq) {
>  > +			atomic_dec(&current_rc_qp);
>  > +		}
> 
> no need for { } here

OK

> 
>  > +	/* We increase the size of the CQ in the NOSRQ case to prevent CQ
>  > +	 * overflow. Every new REQ creates a new RX QP and each QP has an
>  > +	 * RX ring associated with it. Therefore we could have
>  > +	 * NOSRQ_INDEX_TABLE_SIZE*ipoib_recvq_size + ipoib_sendq_size CQEs
>  > +	 * in a CQ.
>  > +	 */
>  > +	if (!priv->cm.srq)
>  > +		size += (NOSRQ_INDEX_TABLE_SIZE -1) * ipoib_recvq_size;
> 
> only need to do this if CM is enabled
> 
> space after - here please too.
> 
> that's just from a quick skim of the patch...
>

OK

Pradeep