[ofa-general] SRP/mlx4 interrupts throttling performance
cameron at harr.org
Fri Oct 24 12:38:32 PDT 2008
Vladislav Bolkhovitin wrote:
>> ** Sometimes the benchmark "zombied" (process doing no work, but
>> process can't be killed) after running a certain amount of time.
>> However, it wasn't repeatable in a reliable way, so I mark that this
>> particular run has zombied before.
> That means that there is a bug somewhere. Usually such bugs are found
> in a few hours of code auditing (the srpt driver is pretty simple) or
> by using kernel debug facilities (example diff to .config attached). I
> personally always prefer to put my effort into fixing real things, not
> inventing various workarounds, like srpt_thread in this case.
> So I would:
> 1. Completely remove the srpt thread and all related code. It doesn't
> do anything that can't be done in SIRQ context (tasklet).
> 2. Audit the code to check whether it does anything it shouldn't do in
> SIRQ context, and fix it. This step isn't required, but it usually
> saves a lot of time of puzzled debugging in the future.
> 3. In srpt_handle_rdma_comp() and srpt_handle_new_iu(), change
> SCST_CONTEXT_THREAD to SCST_CONTEXT_DIRECT_ATOMIC.
I also changed it in srpt_handle_err_comp().
> Then I would run the problematic tests (heavy tpc-h workload, e.g.) on
> a debug kernel and fix the problems found.
> Anyway, Cameron, can you get the latest code from SCST trunk and try
> with it? It was recently updated. Also please add the case with the
> changes from (3) above.
This is all with version 1.0.1 of SCST (v532).
In my fio test, I do runs with srpt thread=1 and then thread=0. With it
set to 0, FIO printed many errors during the test and the target
eventually crashed. This is the first part of a long call trace:
NMI Watchdog detected LOCKUP on CPU 0
Modules linked in: ib_srpt(U) scst_vdisk(U) scst(U) fio_driver(PU)
fio_port(PU) autofs4 hidp rfcomm l2cap bluetooth sunrpc ib_ipoib mlx4_ib
ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api nls_utf8 hfsplus
dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec button battery
asus_acpi acpi_memhotplug ac parport_pc lp parport i2c_i801 shpchp
i2c_core e1000e mlx4_core i5000_edac edac_mc pcspkr ata_piix libata
sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 25732, comm: scsi_tgt0 Tainted: P 2.6.18-92.1.13.el5 #1
RIP: 0010:[<ffffffff80064bcb>] [<ffffffff80064bcb>]
RSP: 0018:ffffffff80418a88 EFLAGS: 00000086
RAX: ffff810785307fd8 RBX: ffffffff884e68a0 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff884e68a0
RBP: ffffffff884e62a0 R08: ffff810790926900 R09: ffff8107909268e8
R10: 0000000000000018 R11: ffffffff884fcab3 R12: 0000000000000001
R13: 0000000000000001 R14: 0000000000000000 R15: ffff8107f0f374c0
FS: 0000000000000000(0000) GS:ffffffff803a0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000037bc0986d0 CR3: 0000000000201000 CR4: 00000000000006e0
Process scsi_tgt0 (pid: 25732, threadinfo ffff810785306000, task
Stack: 0000000000000000 ffffffff884c509d ffff8107909268e8 ffff810790926900
00000002071dd688 0000020000000220 0000000000000200 00000000da984c08
0000000000000000 ffff8107909267f0 ffff810806ceee20 0000000000000001
<IRQ> [<ffffffff884c509d>] :scst:sgv_pool_alloc+0x10c/0x5d3
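
As an aid to reading the register dump above, here is a minimal sketch that decodes the EFLAGS value from the trace. The bit positions are the standard x86 flag layout; the helper name decode_eflags is hypothetical and only for illustration.

```python
# Standard x86 EFLAGS bit positions (status/control flags only).
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
    "TF": 8, "IF": 9, "DF": 10, "OF": 11,
}

def decode_eflags(value):
    """Return the names of the flags set in an EFLAGS value, in bit order."""
    return [name for name, bit in EFLAGS_BITS.items() if value & (1 << bit)]

eflags = 0x00000086  # value copied from the "RSP: ... EFLAGS:" line of the trace
flags = decode_eflags(eflags)
print(flags)
print("interrupts enabled:", "IF" in flags)
```

For this value the IF bit is clear, so interrupts were disabled on CPU 0 when the watchdog fired. That is consistent with the CPU spinning with interrupts off somewhere under sgv_pool_alloc(), but it does not by itself say which lock or loop is involved.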