[ofa-general] OFED1.3.1: soft lockup in completion handler
Liang Zhen
Zhen.Liang at Sun.COM
Thu Apr 23 08:13:51 PDT 2009
Eli Cohen wrote:
> Where is the other place that you acquire conn->ibc_sched->ibs_lock?
> Is it in the per CPU thread? Maybe you should try to decrease the time
> when the lock is acquired at the thread. Can you send all references
> to the code acquiring the lock?
>
Eli,
Yes, it's a per-CPU lock (for the CPU-affinity thread), so the completion
callback can only race with one thread at a time, and I'm quite sure the
thread does only very light operations while holding the lock.
I actually already know the reason: most of the time, interrupts prefer
to land on the same CPU, so the CPU watchdog thread gets no chance to be
scheduled when there are a lot of interrupts on that CPU (e.g.
100K/sec). We do have irqbalance, but it only wakes up every 10 seconds
(it's a pity that we can't change the interval, it's a hard-coded
constant), which is long enough to trigger the soft lockup warning.
So I added a static counter to the completion handler, and when we find
that there have been too many interrupts for several seconds, we just
call touch_softlockup_watchdog() to tell the watchdog that the CPU is OK
and not soft locked up. Also, we reserve the first core on each CPU
socket (no affinity thread is bound to it), to make sure there are
enough cores to handle interrupts.
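To make the counter idea concrete, here is a minimal userspace sketch of
it, not the actual code. The names cq_rate, completion_tick,
BUSY_EVENTS_PER_SEC and BUSY_SECONDS and both threshold values are
hypothetical. In the kernel, the stub below would be the real
touch_softlockup_watchdog(), and the time would be derived from jiffies
rather than passed in as seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thresholds, chosen only for illustration. */
#define BUSY_EVENTS_PER_SEC 50000UL /* completions/sec that count as "busy" */
#define BUSY_SECONDS        3UL     /* consecutive busy seconds before touching */

/* Counts calls to the stub so the behaviour can be observed in userspace. */
static unsigned long touches;

/* Stand-in for the kernel's touch_softlockup_watchdog(). */
static void touch_softlockup_watchdog_stub(void)
{
    touches++;
}

/* Per-CPU rate state (a static variable in the real handler). */
struct cq_rate {
    unsigned long window_start; /* second in which the current window began */
    unsigned long events;       /* completions seen in the current window */
    unsigned long busy_windows; /* consecutive one-second windows above threshold */
};

/* Called once per completion event; now_sec is the current time in seconds. */
static void completion_tick(struct cq_rate *r, unsigned long now_sec)
{
    assert(r != NULL);

    if (now_sec != r->window_start) {
        /* The previous window has closed: decide whether it was busy. */
        if (now_sec == r->window_start + 1 &&
            r->events >= BUSY_EVENTS_PER_SEC)
            r->busy_windows++;
        else
            r->busy_windows = 0; /* idle window or a gap in time: start over */

        r->window_start = now_sec;
        r->events = 0;

        /* Busy for long enough: tell the watchdog this CPU is still alive. */
        if (r->busy_windows >= BUSY_SECONDS)
            touch_softlockup_watchdog_stub();
    }
    r->events++;
}
```

With these values the watchdog is touched at most once per second, and
only after three consecutive busy seconds, so a CPU that is really stuck
outside the completion path would still be reported.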
I know it's ugly, but it works, and it's the only way I can find to
resolve my problem when multiple completion vectors are not available.
Anyway, I really think multiple completion vectors will be an important
feature in the near future, because our hardware is getting faster and
faster, and machines have more and more CPU cores.
Thanks
Liang
>
>
>> I've tried turning off irqbalance and setting /proc/irq/.../smp_affinity
>> to more cores, but it changed nothing and the soft lockup still occurs.
>>
>> After I installed ofed1.4.1 and created the CQ with
>> ib_create_cq(....comp_vector), the problem was gone and I got really good
>> performance. The problem now is that ofed1.4.1:mlx4 seems to be the only
>> driver that really supports multiple completion vectors, and we can't
>> expect all customers to have the same environment...
>> Is there any other possible way to resolve this?
>>
>> Thanks
>> Liang