[ofa-general] RE: [PATCH] ipoib/cm: make stale task actually run once in a while
Scott Weitzenkamp (sweitzen)
sweitzen at cisco.com
Fri May 11 10:29:05 PDT 2007
I see the first patch is in OFED-1.2-20070511-0600 now. I'll try it out.
Scott
> -----Original Message-----
> From: Scott Weitzenkamp (sweitzen)
> Sent: Wednesday, May 09, 2007 4:46 PM
> To: Michael S. Tsirkin; Scott Weitzenkamp (sweitzen)
> Cc: Yohad Dickman; Amit Krig; Tziporet Koren;
> mst at mellanox.co.il; general at lists.openfabrics.org; Roland Dreier
> Subject: RE: [PATCH] ipoib/cm: make stale task actually run
> once in a while
>
> I see a new patch, ipoib_correct_timers.patch, in
> OFED-1.2-20070509-0600. Which patch should I try?
>
> Scott
>
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst at dev.mellanox.co.il]
> > Sent: Monday, May 07, 2007 1:03 PM
> > To: Scott Weitzenkamp (sweitzen)
> > Cc: Yohad Dickman; Amit Krig; Tziporet Koren;
> > mst at mellanox.co.il; general at lists.openfabrics.org; Roland Dreier
> > Subject: [PATCH] ipoib/cm: make stale task actually run once
> > in a while
> >
> > In the presence of some active passive connections, the stale task
> > would never run, since every 4 RX CQEs we repeat the
> > queue_delayed_work call, which delays it by some 10 minutes. As a
> > result, on a noisy system with failing ports, we slowly run out of
> > resources, slowing connection setup down and eventually failing.
> >
> > What we actually want to do is start the stale task when the first
> > passive connection is added, and rerun it every 10 minutes as long
> > as there are outstanding passive connections.
> >
> > As a happy side effect, this removes some code from the RX data path.
> >
> > Signed-off-by: Michael S. Tsirkin <mst at dev.mellanox.co.il>
> >
> > ---
> >
> > Scott, I think this might address bugs 541 and 465: slow IPoIB CM HA
> > failover and eventual failing IPoIB HA. Could you test this please?
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > index 2b242a4..b77e8d7 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > @@ -258,10 +258,11 @@ static int ipoib_cm_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *even
> >          cm_id->context = p;
> >          p->jiffies = jiffies;
> >          spin_lock_irqsave(&priv->lock, flags);
> > +        if (list_empty(&priv->cm.passive_ids))
> > +                queue_delayed_work(ipoib_workqueue,
> > +                                   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
> >          list_add(&p->list, &priv->cm.passive_ids);
> >          spin_unlock_irqrestore(&priv->lock, flags);
> > -        queue_delayed_work(ipoib_workqueue,
> > -                           &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
> >          return 0;
> >
> > err_rep:
> > @@ -380,8 +381,6 @@ void ipoib_cm_handle_rx_wc(struct net_device *dev, struct ib_wc *wc)
> >                          if (!list_empty(&p->list))
> >                                  list_move(&p->list, &priv->cm.passive_ids);
> >                          spin_unlock_irqrestore(&priv->lock, flags);
> > -                        queue_delayed_work(ipoib_workqueue,
> > -                                           &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
> >                  }
> >          }
> >
> > @@ -1104,6 +1103,10 @@ static void ipoib_cm_stale_task(struct work_struct *work)
> >                  kfree(p);
> >                  spin_lock_irqsave(&priv->lock, flags);
> >          }
> > +
> > +        if (!list_empty(&priv->cm.passive_ids))
> > +                queue_delayed_work(ipoib_workqueue,
> > +                                   &priv->cm.stale_task, IPOIB_CM_RX_DELAY);
> >          spin_unlock_irqrestore(&priv->lock, flags);
> >  }
> >
> > --
> > MST
> >
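
The scheduling scheme described in the quoted patch (arm the stale task when the first passive connection is added, then let the task re-arm itself while any passive connections remain) can be sketched in userspace C. This is only an illustrative simulation, not driver code: struct sim, arm(), add_passive(), rx(), tick(), STALE_AFTER and MAX_CONNS are hypothetical names and values invented for this sketch, and RX_DELAY merely stands in for IPOIB_CM_RX_DELAY, counted in abstract ticks.

```c
#include <assert.h>
#include <stdbool.h>

#define RX_DELAY    10  /* ticks between runs; stands in for IPOIB_CM_RX_DELAY */
#define STALE_AFTER 15  /* hypothetical idle threshold, not taken from the driver */
#define MAX_CONNS   8   /* hypothetical table size for the simulation */

struct sim {
    bool in_use[MAX_CONNS];
    long last_rx[MAX_CONNS]; /* tick of the most recent activity */
    int  nconns;             /* plays the role of the passive_ids list length */
    bool armed;              /* is the stale task currently scheduled? */
    long fire_at;            /* tick at which the stale task should run */
    int  runs;               /* how many times the stale task has run */
};

static void arm(struct sim *s, long now)
{
    s->armed = true;
    s->fire_at = now + RX_DELAY;
}

/* Add a passive connection; arm the task only on the empty -> non-empty edge. */
static int add_passive(struct sim *s, long now)
{
    int i;

    for (i = 0; i < MAX_CONNS && s->in_use[i]; i++)
        ;
    assert(i < MAX_CONNS);
    if (s->nconns == 0)
        arm(s, now);
    s->in_use[i] = true;
    s->last_rx[i] = now;
    s->nconns++;
    return i;
}

/* RX path: only records activity, never touches the timer. */
static void rx(struct sim *s, int id, long now)
{
    s->last_rx[id] = now;
}

/* Reap idle connections, then re-arm only if some connections remain. */
static void stale_task(struct sim *s, long now)
{
    s->armed = false;
    s->runs++;
    for (int i = 0; i < MAX_CONNS; i++) {
        if (s->in_use[i] && now - s->last_rx[i] >= STALE_AFTER) {
            s->in_use[i] = false;
            s->nconns--;
        }
    }
    if (s->nconns > 0)
        arm(s, now);
}

/* Advance simulated time by one step; run the task if it is due. */
static void tick(struct sim *s, long now)
{
    if (s->armed && now >= s->fire_at)
        stale_task(s, now);
}
```

In this model a busy connection calling rx() on every tick does not change when the task fires, so an idle connection added alongside it is still reaped on schedule, and the task stops rescheduling itself once the table is empty.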