[ofa-general] Re: [PATCH 4/4] [RFC] IPoIB/cm: Add connected mode support for devices without SRQs

Roland Dreier rdreier at cisco.com
Tue Nov 20 14:31:40 PST 2007


 > static int ipoib_cm_nonsrq_init_rx(struct net_device *dev, struct ib_cm_id *cm_id,
 >                                   struct ipoib_cm_rx *rx)

...

 >                 if (!ipoib_cm_alloc_rx_skb(dev, rx->rx_ring, i, IPOIB_CM_RX_SG - 1,
 >                                            rx->rx_ring[i].mapping)) {
 >                         ipoib_warn(priv, "failed to allocate receive buffer %d\n", i);
 >                                 ret = -ENOMEM;

 > This will cause an skb leak- see generic comments about ipoib_cm_alloc_rx_skb() below

Thanks... I fixed the error path here so that it cleans up properly I
think.

 > Generic comment about ipoib_cm_alloc_rx_skb() which is true for both the srq and non srq cases 
 > (except in the receive wc handler):
 > I find that there will be skb leakage if ipoib_cm_alloc_rx_skb() fails before all the rx skbs
 > are allocated. We must undo those allocations and mappings. Probably we should call 
 > ipoib_cm_dev_cleanup() and free the skbs and do the unmap in that routine.

I audited all the calls to ipoib_cm_alloc_rx_skb() and I think they
are all OK now.

 >                 if (wr_id == (IPOIB_CM_RX_DRAIN_WRID & ~(IPOIB_OP_CM | IPOIB_OP_RECV))) {
 >                         spin_lock_irqsave(&priv->lock, flags);
 >                         list_splice_init(&priv->cm.rx_drain_list, &priv->cm.rx_reap_list);
 >                         ipoib_cm_start_rx_drain(priv);
 > 
 > 
 > I do not understand why we need to call ipoib_cm_start_rx_drain(). We have already received a work 
 > completion with RX_DRAIN set.

This is not new code... it is there since the original SRQ cleanup
code in 518b1646.  Basically after one drain completion, we need to go
back and start draining any new connections that are waiting to clean up.

 >                 if (!p->rx_ring)
 > 
 > 
 > If rx_ring is indeed NULL, there is no point in continuing further,
 > we will probably crash sooner or later. We should insert a BUG_ON and 
 > thus crash the system.

This is actually checking whether the rx structure has an rx_ring or
not -- it is an obfuscated way of checking whether we are in the SRQ
case or not.  I changed things so that there is an explicit "has_srq"
local variable in the RX wc handler to make things clearer.

 > repost:
 >         if (p->rx_ring) {
 > 
 > 
 > Shouldn't this be if(!ipoib_cm_has_srq())?

It's equivalent as I mentioned above.  Anyway I changed this to if (has_srq).

I pushed out a new tree.  Let me know what you think now.

 - R.



More information about the general mailing list