[ofa-general] OFED1.3.1: soft lockup in completion handler

Eli Cohen eli at dev.mellanox.co.il
Tue Apr 21 08:25:36 PDT 2009


On Tue, Apr 21, 2009 at 11:08:25AM +0800, Liang Zhen wrote:
> In our module, we have a per-CPU thread and a per-CPU waitq; each thread has
> its own connection list to poll on. The completion handler dispatches a
> connection to its scheduler and wakes the scheduler.
> I'm quite sure that the scheduler doesn't do any heavy or slow operation
> while holding ibs_lock, and I never hit this problem on systems with <= 8
> cores.
> It looks like the completion handler always runs on the same core, so
> it races with all the other cores on their own waitqs and is very
> likely to hit a soft lockup...

Where else do you acquire conn->ibc_sched->ibs_lock?
Is it in the per-CPU thread? Maybe you should try to reduce the time
the thread holds the lock. Can you send all the places in the code
that acquire the lock?
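
To illustrate what I mean by reducing the hold time, here is a rough
user-space sketch, not your actual code. A pthread mutex stands in for
ibs_lock, and struct conn, struct sched, sched_enqueue() and sched_drain()
are hypothetical names. The thread detaches the whole pending list while
holding the lock and does the per-connection work only after releasing it,
so the completion handler never waits behind that work.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the module's connection and scheduler types. */
struct conn {
    struct conn *next;
    int handled;
};

struct sched {
    pthread_mutex_t lock;   /* plays the role of ibs_lock */
    struct conn *pending;   /* connections queued by the completion handler */
};

/* Completion-handler side: queue one connection, holding the lock briefly. */
static void sched_enqueue(struct sched *s, struct conn *c)
{
    assert(s != NULL && c != NULL);

    pthread_mutex_lock(&s->lock);
    c->next = s->pending;
    s->pending = c;
    pthread_mutex_unlock(&s->lock);
}

/* Scheduler-thread side: detach the whole list under the lock, then
 * process it with the lock released. Returns the number handled. */
static int sched_drain(struct sched *s)
{
    struct conn *batch;
    int n = 0;

    pthread_mutex_lock(&s->lock);
    batch = s->pending;
    s->pending = NULL;
    pthread_mutex_unlock(&s->lock);

    while (batch != NULL) {
        struct conn *c = batch;

        batch = c->next;
        c->next = NULL;
        c->handled = 1;     /* placeholder for the real per-connection work */
        n++;
    }
    return n;
}
```

In the kernel the same pattern would use a spinlock and a list splice, but
whether it applies depends on what your thread actually does under the lock.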


> I've tried turning off irqbalancer and setting /proc/irq/.../smp_affinity
> to cover more cores, but nothing changed and I still get the soft lockup.
> 
> After I installed ofed1.4.1 and created the CQ with
> ib_create_cq(....comp_vector), the problem went away and I get really good
> performance. The problem now is that ofed1.4.1:mlx4 seems to be the only
> driver that really supports multiple completion vectors, and we can't
> expect all customers to have the same environment...
> Is there any other possible way to resolve this?
> 
> Thanks
> Liang
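
For the comp_vector approach described above, a minimal sketch of how a
vector might be chosen per scheduler so that the same code still works on
drivers that expose only one completion vector. pick_comp_vector() is a
hypothetical helper, and num_comp_vectors is assumed to be the count the
device reports (the num_comp_vectors field of struct ib_device in the
kernel). With a single vector it falls back to vector 0, which does not
spread the load.

```c
#include <assert.h>

/* Hypothetical helper: choose the completion vector for the CQ served by
 * the scheduler bound to `cpu`. Spreads CQs round-robin over the vectors
 * the driver reports, and falls back to vector 0 when there is only one. */
static int pick_comp_vector(int cpu, int num_comp_vectors)
{
    assert(cpu >= 0);

    if (num_comp_vectors <= 1)
        return 0;               /* single-vector driver: only vector 0 */
    return cpu % num_comp_vectors;
}
```

The result would be passed as the comp_vector argument of ib_create_cq().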