[ofa-general] RE: [PATCH] ipoib/cm: make stale task actually run once in a while

Scott Weitzenkamp (sweitzen) sweitzen at cisco.com
Fri May 11 10:29:05 PDT 2007


I see the first patch is in OFED-1.2-20070511-0600 now; I'll try it out.

Scott 

> -----Original Message-----
> From: Scott Weitzenkamp (sweitzen) 
> Sent: Wednesday, May 09, 2007 4:46 PM
> To: Michael S. Tsirkin; Scott Weitzenkamp (sweitzen)
> Cc: Yohad Dickman; Amit Krig; Tziporet Koren; 
> mst at mellanox.co.il; general at lists.openfabrics.org; Roland Dreier
> Subject: RE: [PATCH] ipoib/cm: make stale task actually run 
> once in a while
> 
> I see a new patch, ipoib_correct_timers.patch, in 
> OFED-1.2-20070509-0600. Which patch should I try?
> 
> Scott 
> 
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst at dev.mellanox.co.il] 
> > Sent: Monday, May 07, 2007 1:03 PM
> > To: Scott Weitzenkamp (sweitzen)
> > Cc: Yohad Dickman; Amit Krig; Tziporet Koren; 
> > mst at mellanox.co.il; general at lists.openfabrics.org; Roland Dreier
> > Subject: [PATCH] ipoib/cm: make stale task actually run once 
> > in a while
> > 
> > In the presence of some active passive connections, the stale
> > task would never run: every 4 RX CQEs we repeat the
> > queue_delayed_work call, which delays the task by some
> > 10 minutes.  As a result, on a noisy system with failing
> > ports, we slowly run out of resources, slowing connection
> > setup down and eventually making it fail.
> > 
> > What we actually want is to start the stale task when the first
> > passive connection is added, and to rerun it every 10 minutes
> > as long as there are outstanding passive connections.
> > 
> > As a happy side effect, this removes some code from the RX data path.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>
> > 
> > ---
> > 
> > Scott, I think this might address bugs 541 and 465: slow
> > IPoIB CM HA failover and eventually failing IPoIB HA.
> > Could you test this, please?
> > 
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > index 2b242a4..b77e8d7 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > @@ -258,10 +258,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
> >  	cm_id->context = p;
> >  	p->jiffies = jiffies;
> >  	spin_lock_irqsave(&priv->lock, flags);
> > +	if (list_empty(&priv->cm.passive_ids))
> > +		queue_delayed_work(ipoib_workqueue,
> > +				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
> >  	list_add(&p->list, &priv->cm.passive_ids);
> >  	spin_unlock_irqrestore(&priv->lock, flags);
> > -	queue_delayed_work(ipoib_workqueue,
> > -			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
> >  	return 0;
> >  
> >  err_rep:
> > @@ -380,8 +381,6 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
> >  			if (!list_empty(&p->list))
> >  				list_move(&p->list, &priv->cm.passive_ids);
> >  			spin_unlock_irqrestore(&priv->lock, flags);
> > -			queue_delayed_work(ipoib_workqueue,
> > -					   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
> >  		}
> >  	}
> >  
> > @@ -1104,6 +1103,10 @@ static void ipoib_cm_stale_task(struct work_struct *work)
> >  		kfree(p);
> >  		spin_lock_irqsave(&priv->lock, flags);
> >  	}
> > +
> > +	if (!list_empty(&priv->cm.passive_ids))
> > +		queue_delayed_work(ipoib_workqueue,
> > +				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
> >  	spin_unlock_irqrestore(&priv->lock, flags);
> >  }
> >  
> > -- 
> > MST
> > 
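
The scheduling policy described in the quoted patch (arm the stale task when the first passive connection is added, then have the task re-arm itself while connections remain) can be sketched as a small user-space model. This is only an illustration under stated assumptions: the `sim_*` names, the `struct sim` fields, and the tick value of `SIM_DELAY` are hypothetical and do not come from ipoib_cm.c, and the sketch models only the arming decisions, not the kernel workqueue itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for IPOIB_CM_RX_DELAY, in arbitrary ticks (assumed value). */
#define SIM_DELAY 10

/* Hypothetical model state; these names are not from ipoib_cm.c. */
struct sim {
	int  passive;   /* passive connections currently on the list */
	bool armed;     /* true while the stale task is queued */
	long fire_at;   /* tick at which the queued stale task is due */
	int  runs;      /* number of times the stale task has run */
};

/* Queue the stale task SIM_DELAY ticks from now. */
static void sim_arm(struct sim *s, long now)
{
	s->armed = true;
	s->fire_at = now + SIM_DELAY;
}

/* A passive connection is added: arm only if the list was empty. */
static void sim_add_passive(struct sim *s, long now)
{
	if (s->passive == 0) {
		assert(!s->armed);  /* in this model an empty list is never armed */
		sim_arm(s, now);
	}
	s->passive++;
}

/* RX completion: in this model the RX path never touches the timer. */
static void sim_rx(struct sim *s)
{
	(void)s;
}

/* Stale task body: reap up to `reap` connections, re-arm if any remain. */
static void sim_stale_task(struct sim *s, long now, int reap)
{
	assert(s->armed);
	s->armed = false;
	s->runs++;
	if (reap > s->passive)
		reap = s->passive;
	s->passive -= reap;
	if (s->passive > 0)
		sim_arm(s, now);
}
```

In this model, adding one connection at tick 0 and another at tick 3 leaves the task due at tick 10, and RX traffic in between does not push that deadline out. Once the task has reaped the last connection it stays idle until the next connection is added.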


