[ofa-general] IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!
Roland Dreier
rdreier at cisco.com
Wed Feb 28 08:42:41 PST 2007
I guess the solution is to merge IPoIB NAPI to avoid overloading the
system with interrupts. I'll fix up a few last things with my NAPI
patch and we can try to get it in shape to merge for 2.6.22.
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> index f2aa923..97ea26f 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> @@ -301,6 +301,7 @@ void ipoib_ib_completion(struct ib_cq *c
> n = ib_poll_cq(cq, IPOIB_NUM_WC, priv->ibwc);
> for (i = 0; i < n; ++i)
> ipoib_ib_handle_wc(dev, priv->ibwc + i);
> + cond_resched();
Obviously this is wrong, because ipoib_ib_completion() is not
necessarily called in process context (in fact, ehca with its scaling
hack is probably the only driver that calls it where it is safe to
reschedule).
> } while (n == IPOIB_NUM_WC);
> }
>
> However I still saw that BUG trace occurred on 3-4 cpus after several hrs.
Right, because this patch is not really doing anything to reduce the
interrupt load.
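
To make the contrast with the cond_resched() attempt concrete, here is a
rough user-space sketch in C of the budgeted polling idea that NAPI is
built around. It is not the IPoIB NAPI patch and uses no kernel API:
struct fake_cq, fake_poll_cq(), budgeted_poll() and NUM_WC are
hypothetical names made up for this illustration, with NUM_WC standing in
for IPOIB_NUM_WC. The sketch assumes completion interrupts stay disabled
while polling and are re-enabled only once the queue has been drained.

```c
#include <assert.h>

#define NUM_WC 4	/* batch size; hypothetical stand-in for IPOIB_NUM_WC */

/* Hypothetical model of a completion queue, reduced to counters. */
struct fake_cq {
	int pending;	/* completions waiting to be polled */
	int armed;	/* nonzero when completion interrupts are enabled */
	int handled;	/* total completions processed so far */
};

/* Stand-in for ib_poll_cq(): remove up to max entries, return the count. */
static int fake_poll_cq(struct fake_cq *cq, int max)
{
	int n = cq->pending < max ? cq->pending : max;

	cq->pending -= n;
	return n;
}

/*
 * Process at most `budget` completions, then return to the caller.
 * Interrupts are re-enabled only when a poll comes back short, i.e.
 * the queue is empty. If the budget runs out first, interrupts stay
 * off and the caller is expected to invoke the poll again later.
 * (A real implementation must also recheck for completions that
 * arrive while re-arming; that is omitted here.)
 */
static int budgeted_poll(struct fake_cq *cq, int budget)
{
	int done = 0;

	assert(budget > 0);

	while (done < budget) {
		int want = budget - done < NUM_WC ? budget - done : NUM_WC;
		int n = fake_poll_cq(cq, want);

		cq->handled += n;
		done += n;

		if (n < want) {
			cq->armed = 1;	/* drained: leave polling mode */
			break;
		}
	}

	return done;
}
```

Unlike the do/while loop in the quoted patch, which keeps going for as
long as completions keep arriving, each call here does a bounded amount
of work, and a busy queue does not generate further interrupts until it
has been emptied.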