[ofa-general] RE: [PATCH] ipoib/cm: make stale task actually run once in a while (DOES NOT HELP)
Amit Krig
amitk at mellanox.co.il
Mon May 14 10:59:58 PDT 2007
Still failing in our test as well.
Amit
-----Original Message-----
From: Scott Weitzenkamp (sweitzen) [mailto:sweitzen at cisco.com]
Sent: Saturday, May 12, 2007 1:32 AM
To: Michael S. Tsirkin; Scott Weitzenkamp (sweitzen)
Cc: Yohad Dickman; Amit Krig; Tziporet Koren; Michael S. Tsirkin;
general at lists.openfabrics.org; Roland Dreier
Subject: RE: [PATCH] ipoib/cm: make stale task actually run once in a
while (DOES NOT HELP)
Importance: High
This patch, which is in OFED-1.2-20070511-0600, does NOT help. I am
still seeing 105-second port failover times. Amit, did you try it?
Scott Weitzenkamp
SQA and Release Manager
Server Virtualization Business Unit
Cisco Systems
> -----Original Message-----
> From: Michael S. Tsirkin [mailto:mst at dev.mellanox.co.il]
> Sent: Monday, May 07, 2007 1:03 PM
> To: Scott Weitzenkamp (sweitzen)
> Cc: Yohad Dickman; Amit Krig; Tziporet Koren; mst at mellanox.co.il;
> general at lists.openfabrics.org; Roland Dreier
> Subject: [PATCH] ipoib/cm: make stale task actually run once in a
> while
>
> While any passive connection is still receiving traffic, the stale
> task never runs: every 4 RX CQEs we call queue_delayed_work again,
> which pushes the task back by some 10 minutes. As a result, on a noisy
> system with failing ports, we slowly run out of resources, which slows
> connection setup down and eventually makes it fail.
>
> What we actually want is to start the stale task when the first
> passive connection is added, and to rerun it every 10 minutes as long
> as there are outstanding passive connections.
>
> As a happy side effect, this removes some code from the RX data path.
>
> Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>
>
> ---
>
> Scott, I think this might address bugs 541 and 465: slow IPoIB CM HA
> failover and eventual failing IPoIB HA. Could you test this please?
>
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index 2b242a4..b77e8d7 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -258,10 +258,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
>  	cm_id->context = p;
>  	p->jiffies = jiffies;
>  	spin_lock_irqsave(&priv->lock, flags);
> +	if (list_empty(&priv->cm.passive_ids))
> +		queue_delayed_work(ipoib_workqueue,
> +				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  	list_add(&p->list, &priv->cm.passive_ids);
>  	spin_unlock_irqrestore(&priv->lock, flags);
> -	queue_delayed_work(ipoib_workqueue,
> -			   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  	return 0;
>
> err_rep:
> @@ -380,8 +381,6 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
>  			if (!list_empty(&p->list))
>  				list_move(&p->list, &priv->cm.passive_ids);
>  			spin_unlock_irqrestore(&priv->lock, flags);
> -			queue_delayed_work(ipoib_workqueue,
> -					   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  		}
>  	}
>
> @@ -1104,6 +1103,10 @@ static void ipoib_cm_stale_task(struct work_struct *work)
>  		kfree(p);
>  		spin_lock_irqsave(&priv->lock, flags);
>  	}
> +
> +	if (!list_empty(&priv->cm.passive_ids))
> +		queue_delayed_work(ipoib_workqueue,
> +				   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
>  	spin_unlock_irqrestore(&priv->lock, flags);
>  }
>
> --
> MST
>
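
The rescheduling policy described in the quoted patch (arm the stale task
when the first passive connection is added, re-arm it from the task itself
while connections remain, and leave the timer alone on RX) can be modelled
in plain userspace C. This is only a sketch, not kernel code: struct
stale_model, model_add_conn, model_rx, model_tick and the 600-second
STALE_DELAY are hypothetical names and values chosen for illustration,
loosely standing in for priv->cm.passive_ids, the stale_task delayed work
and IPOIB_CM_RX_DELAY. It makes no claim about the failover times reported
in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONNS   16
#define STALE_DELAY 600L /* assumed: roughly the "10 min" in the patch text */

/* Hypothetical userspace stand-in for the passive list and its timer. */
struct stale_model {
	long last_rx[MAX_CONNS]; /* last activity time per passive connection */
	size_t count;            /* number of passive connections */
	bool armed;              /* is the stale task currently queued? */
	long fire_at;            /* time at which the stale task is due */
	int runs;                /* how many times the stale task has run */
};

/* New passive connection: arm the task only if the list was empty. */
static void model_add_conn(struct stale_model *m, long now)
{
	assert(m->count < MAX_CONNS);
	if (m->count == 0) {
		m->armed = true;
		m->fire_at = now + STALE_DELAY;
	}
	m->last_rx[m->count++] = now;
}

/* RX activity: refresh the timestamp, never touch the timer. */
static void model_rx(struct stale_model *m, size_t idx, long now)
{
	assert(idx < m->count);
	m->last_rx[idx] = now;
}

/* Advance time: run the task if due, then re-arm while the list is non-empty. */
static void model_tick(struct stale_model *m, long now)
{
	size_t kept = 0;

	if (!m->armed || now < m->fire_at)
		return;

	m->runs++;
	/* Keep only connections that saw traffic within STALE_DELAY. */
	for (size_t i = 0; i < m->count; i++)
		if (now - m->last_rx[i] < STALE_DELAY)
			m->last_rx[kept++] = m->last_rx[i];
	m->count = kept;

	m->armed = m->count > 0;
	if (m->armed)
		m->fire_at = now + STALE_DELAY;
}
```

In this model, adding two connections at time 0, calling model_rx on the
first one every 100 seconds and model_tick once per second up to t=1300
makes the task run at t=600 and t=1200. The idle connection is dropped on
the first run and the busy one survives, even though RX never re-queues
anything.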