[ofa-general] OFED1.3.1: soft lockup in completion handler

Mon Apr 20 20:08:25 PDT 2009

Hi there,
I suffered a soft lockup in the completion handler. To be clear, I'm
running a Mellanox Technologies MT25418 with ofed1.3.1 on a 16-core AMD
system, and I have never seen this problem with <= 8 cores.
The completion handler looks like this:
void kiblnd_cq_completion (struct ib_cq *cq, void *arg)
{
        /* Declarations were left out of the snippet as posted; the type
         * name kib_conn_t is assumed here. */
        kib_conn_t    *conn = arg;
        unsigned long  flags;

        spin_lock_irqsave(&conn->ibc_sched->ibs_lock, flags);
        conn->ibc_ready = 1;
        if (!conn->ibc_scheduled &&
            (conn->ibc_nrx > 0 ||
             conn->ibc_nsends_posted > 0)) {
                kiblnd_conn_addref(conn); /* +1 ref for sched_conns */
                conn->ibc_scheduled = 1;
                list_add_tail(&conn->ibc_sched_list,
                              &conn->ibc_sched->ibs_conns);
                wake_up(&conn->ibc_sched->ibs_waitq);
        }
        spin_unlock_irqrestore(&conn->ibc_sched->ibs_lock, flags);
}

As you can see, the completion handler does basically nothing except
wake_up(). ibs_waitq is a per-CPU waitq, and the thread waiting on it is
bound to that CPU.
Call Trace:
<IRQ> [<ffffffff88631913>] :ko2iblnd:kiblnd_cq_completion+0x43/0xb0
[<ffffffff880bea5b>] :mlx4_core:mlx4_eq_int+0x3b/0x26f
[<ffffffff880bec9e>] :mlx4_core:mlx4_msi_x_interrupt+0xf/0x17
[<ffffffff800109a8>] handle_IRQ_event+0x29/0x58
[<ffffffff800b73ae>] __do_IRQ+0xa4/0x103
[<ffffffff880bd54a>] :mlx4_core:poll_catas+0x0/0x13c
[<ffffffff883d4f28>] :ib_cm:cm_work_handler+0x0/0xc52
[<ffffffff8006c575>] do_IRQ+0xe7/0xf5
[<ffffffff883d4f28>] :ib_cm:cm_work_handler+0x0/0xc52
[<ffffffff8005d615>] ret_from_intr+0x0/0xa
[<ffffffff880bd564>] :mlx4_core:poll_catas+0x1a/0x13c
[<ffffffff880bd54a>] :mlx4_core:poll_catas+0x0/0x13c
[<ffffffff883d4f28>] :ib_cm:cm_work_handler+0x0/0xc52
[<ffffffff800952ad>] run_timer_softirq+0x133/0x1af
[<ffffffff80011efc>] __do_softirq+0x5e/0xd6
[<ffffffff80154dd3>] end_msi_irq_w_maskbit+0xf/0x1c
[<ffffffff8005e2fc>] call_softirq+0x1c/0x28
[<ffffffff8006c6f2>] do_softirq+0x2c/0x85
[<ffffffff8006c57a>] do_IRQ+0xec/0xf5
[<ffffffff8005d615>] ret_from_intr+0x0/0xa

In our module, we have a per-CPU thread and a per-CPU waitq. Each thread
has its own connection list to poll on; the completion handler dispatches
a connection to its scheduler and wakes the scheduler.
I'm very sure that the scheduler doesn't do any heavy or slow operation
while holding ibs_lock, and I never hit this problem on systems with <= 8
cores.
It looks like the completion handler always runs on the same core, so it
races with all the other cores on their own waitqs and is very likely to
hit a soft lockup...
I've tried turning off irqbalance and setting /proc/irq/.../smp_affinity
to more cores, but nothing changed and I still get the soft lockup.
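
To make the dispatch path above easier to reason about, here is a small
userspace model of the "schedule a connection at most once" logic. It is
only a sketch: the model_* names are hypothetical stand-ins rather than the
real ko2iblnd types, the wake_up() is reduced to a counter, and the locking
is left out because the model is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection; fields mirror the ibc_* flags. */
struct model_conn {
        int ready;                 /* mirrors ibc_ready */
        int scheduled;             /* mirrors ibc_scheduled */
        int nrx;                   /* mirrors ibc_nrx */
        int nsends_posted;         /* mirrors ibc_nsends_posted */
        int refcount;
        struct model_conn *next;   /* link in the scheduler's list */
};

/* Hypothetical stand-in for one per-CPU scheduler. */
struct model_sched {
        struct model_conn *head;
        struct model_conn *tail;
        int nqueued;               /* connections added to the list */
        int nwakeups;              /* stands in for wake_up() calls */
};

static void model_cq_completion(struct model_sched *sched,
                                struct model_conn *conn)
{
        assert(sched != NULL && conn != NULL);

        /* The real handler wraps this body in spin_lock_irqsave() /
         * spin_unlock_irqrestore() on the scheduler's lock. */
        conn->ready = 1;
        if (!conn->scheduled &&
            (conn->nrx > 0 || conn->nsends_posted > 0)) {
                conn->refcount++;          /* +1 ref for the list */
                conn->scheduled = 1;
                conn->next = NULL;
                if (sched->tail != NULL)
                        sched->tail->next = conn;
                else
                        sched->head = conn;
                sched->tail = conn;
                sched->nqueued++;
                sched->nwakeups++;
        }
}
```

In this model a second completion on an already scheduled connection only
sets the ready flag again, so the work done per call is small; the cost in
the real handler is in taking the lock, not in the body.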

After I installed ofed1.4.1 and created the CQ with
ib_create_cq(....comp_vector), the problem went away and I get really good
performance. The problem now is that ofed1.4.1:mlx4 seems to be the only
driver that really supports multiple completion vectors, and we can't
expect all customers to have the same environment...
Is there any other possible way to resolve this?
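
For reference, this is roughly how a completion vector could be picked for
each scheduler. It is a minimal sketch, not the exact code in our module:
pick_comp_vector() is a hypothetical helper, and the simple modulo mapping
is an assumption. It falls back to vector 0 when the driver reports only
one vector.

```c
#include <assert.h>

/* Hypothetical helper: map a scheduler's CPU to a completion vector. */
static int pick_comp_vector(int cpu, int num_comp_vectors)
{
        assert(cpu >= 0);

        /* A driver with a single vector leaves no choice: every CQ
         * ends up on vector 0. */
        if (num_comp_vectors <= 1)
                return 0;

        /* Spread CPUs round-robin over the available vectors. */
        return cpu % num_comp_vectors;
}

/*
 * In the kernel module the result would be passed as the last argument
 * of ib_create_cq(), roughly as below (event_cb, ncqe and cpu are
 * placeholders):
 *
 *   cq = ib_create_cq(device, kiblnd_cq_completion, event_cb, conn, ncqe,
 *                     pick_comp_vector(cpu, device->num_comp_vectors));
 */
```

With only one vector every CPU maps to vector 0, which is the same
situation as on ofed1.3.1.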

Thanks
Liang