[ofa-general] RE: [PATCH] ipoib/cm: make stale task actually run once in a while

Scott Weitzenkamp (sweitzen) sweitzen at cisco.com
Wed May 9 16:45:33 PDT 2007


I see a new patch, ipoib_correct_timers.patch, in OFED-1.2-20070509-0600.
Which patch should I try?

Scott 

> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at dev.mellanox.co.il] 
> Sent: Monday, May 07, 2007 1:03 PM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Yohad Dickman; Amit Krig; Tziporet Koren; 
> mst at mellanox.co.il; general at lists.openfabrics.org; Roland Dreier
> Subject: [PATCH] ipoib/cm: make stale task actually run once in a while
> 
> In the presence of active passive connections, the stale task would
> never run: on every 4 RX CQEs we repeat the queue_delayed_work call,
> which delays the task by some 10 minutes.  As a result, on a noisy
> system with failing ports, we slowly run out of resources, slowing
> connection setup down and eventually making it fail.
> 
> What we actually want is to start the stale task when the first
> passive connection is added, and to rerun it every 10 minutes as long
> as there are outstanding passive connections.
> 
> As a happy side effect, this removes some code from RX data path.
> 
> Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>
> 
> ---
> 
> Scott, I think this might address bugs 541 and 465: slow IPoIB CM HA
> failover and eventual failing IPoIB HA. Could you test this please?
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index 2b242a4..b77e8d7 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -258,10 +258,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
>  	cm_id->context = p;
>  	p->jiffies = jiffies;
>  	spin_lock_irqsave(&priv->lock, flags);
> +	if (list_empty(&priv->cm.passive_ids))
> +		queue_delayed_work(ipoib_workqueue,
> +				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  	list_add(&p->list, &priv->cm.passive_ids);
>  	spin_unlock_irqrestore(&priv->lock, flags);
> -	queue_delayed_work(ipoib_workqueue,
> -			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  	return 0;
>  
>  err_rep:
> @@ -380,8 +381,6 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
>  			if (!list_empty(&p->list))
>  				list_move(&p->list, &priv->cm.passive_ids);
>  			spin_unlock_irqrestore(&priv->lock, flags);
> -			queue_delayed_work(ipoib_workqueue,
> -					   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  		}
>  	}
>  
> @@ -1104,6 +1103,10 @@ static void ipoib_cm_stale_task(struct work_struct *work)
>  		kfree(p);
>  		spin_lock_irqsave(&priv->lock, flags);
>  	}
> +
> +	if (!list_empty(&priv->cm.passive_ids))
> +		queue_delayed_work(ipoib_workqueue,
> +				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  	spin_unlock_irqrestore(&priv->lock, flags);
>  }
>  
> -- 
> MST
> 


