[ofa-general] SRP/mlx4 interrupts throttling performance

Cameron Harr cameron at harr.org
Tue Nov 4 11:38:03 PST 2008


Vladislav Bolkhovitin wrote:
> Cameron Harr wrote:
>> Vladislav Bolkhovitin wrote:
>>>> ** Sometimes the benchmark "zombied" (the process did no work but
>>>> couldn't be killed) after running for a certain amount of time.
>>>> However, it wasn't reliably repeatable, so I just note that this
>>>> particular run has zombied before.
>>> That means there is a bug somewhere. Usually such bugs are found in
>>> a few hours of code auditing (the srpt driver is pretty simple) or by
>>> using kernel debug facilities (example diff to .config attached). I
>>> personally always prefer to put my effort into fixing the real
>>> problem rather than inventing workarounds, like srpt_thread in this
>>> case.
>>>
>>> So I would:
>>>
>>>   1. Completely remove the srpt thread and all related code. It doesn't
>>> do anything that can't be done in SIRQ (tasklet) context.
>>>
>>>   2. Audit the code for anything it shouldn't be doing in SIRQ
>>> context and fix it. This step isn't strictly required, but it usually
>>> saves a lot of puzzled debugging later.
>>>
>>>   3. In srpt_handle_rdma_comp() and srpt_handle_new_iu(), change
>>> SCST_CONTEXT_THREAD to SCST_CONTEXT_DIRECT_ATOMIC.
>>

I'm assuming you didn't want me to implement this change this time, correct?
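
For reference, a minimal sketch of what the change in (3) could look like.
This assumes the preferred execution context is the argument that
srpt_handle_new_iu() passes to scst_cmd_init_done(); the actual call sites
in ib_srpt may differ:

    /* hypothetical call site in srpt_handle_new_iu(); the same change
     * would apply in srpt_handle_rdma_comp() */
-   scst_cmd_init_done(cmd, SCST_CONTEXT_THREAD);
+   scst_cmd_init_done(cmd, SCST_CONTEXT_DIRECT_ATOMIC);

SCST_CONTEXT_DIRECT_ATOMIC tells SCST to process the command inline in the
current (atomic-safe) context instead of queuing it to a thread.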

>> I also changed it in srpt_handle_err_comp()
>>> Then I would run the problematic tests (e.g., a heavy TPC-H workload)
>>> on a debug kernel and fix the problems found.
>>>
>>> Anyway, Cameron, can you get the latest code from SCST trunk and try
>>> it? It was recently updated. Please also add the case with the
>>> changes from (3) above.
>> This is all with version 1.0.1 of SCST (v532).
>> In my fio test, I do runs with srpt thread=1 and then with thread=0.
>> When it was set to zero, fio printed many errors during the test and
>> the target eventually crashed. This is the first part of a long call
>> trace.
>>
>> NMI Watchdog detected LOCKUP on CPU 0
>> CPU 0
>> Modules linked in: ib_srpt(U) scst_vdisk(U) scst(U) fio_driver(PU) 
>> fio_port(PU) autofs4 hidp rfcomm l2cap bluetooth sunrpc ib_ipoib 
>> mlx4_ib ib_cm ib_sa ib_mad ib_core ipv6 xfrm_nalgo crypto_api 
>> nls_utf8 hfsplus dm_mirror dm_multipath dm_mod video sbs backlight 
>> i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp 
>> parport i2c_i801 shpchp i2c_core e1000e mlx4_core i5000_edac edac_mc 
>> pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd 
>> ehci_hcd
>> Pid: 25732, comm: scsi_tgt0 Tainted: P      2.6.18-92.1.13.el5 #1
>> RIP: 0010:[<ffffffff80064bcb>]  [<ffffffff80064bcb>] 
>> .text.lock.spinlock+0x29/0x30
>> RSP: 0018:ffffffff80418a88  EFLAGS: 00000086
>> RAX: ffff810785307fd8 RBX: ffffffff884e68a0 RCX: 0000000000000000
>> RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff884e68a0
>> RBP: ffffffff884e62a0 R08: ffff810790926900 R09: ffff8107909268e8
>> R10: 0000000000000018 R11: ffffffff884fcab3 R12: 0000000000000001
>> R13: 0000000000000001 R14: 0000000000000000 R15: ffff8107f0f374c0
>> FS:  0000000000000000(0000) GS:ffffffff803a0000(0000) 
>> knlGS:0000000000000000
>> CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> CR2: 00000037bc0986d0 CR3: 0000000000201000 CR4: 00000000000006e0
>> Process scsi_tgt0 (pid: 25732, threadinfo ffff810785306000, task 
>> ffff810810852100)
>> Stack:  0000000000000000 ffffffff884c509d ffff8107909268e8 
>> ffff810790926900
>>  00000002071dd688 0000020000000220 0000000000000200 00000000da984c08
>>  0000000000000000 ffff8107909267f0 ffff810806ceee20 0000000000000001
>> Call Trace:
>>  <IRQ>  [<ffffffff884c509d>] :scst:sgv_pool_alloc+0x10c/0x5d3
>>  [<ffffffff884c1f85>] :scst:scst_alloc_space+0x5b/0x106
>>  [<ffffffff884bdc90>] :scst:scst_process_active_cmd+0x4fc/0x131c
>>  [<ffffffff884bee46>] :scst:scst_cmd_init_done+0x17f/0x3ef
>>  [<ffffffff884fb1ff>] :ib_srpt:srpt_handle_new_iu+0x281/0x4e7
>>  [<ffffffff8835ec3d>] :mlx4_ib:mlx4_ib_free_srq_wqe+0x27/0x4f
>>  [<ffffffff883591da>] :mlx4_ib:get_sw_cqe+0x12/0x30
>>  [<ffffffff88359c97>] :mlx4_ib:mlx4_ib_poll_cq+0x432/0x48f
>>  [<ffffffff884fcc43>] :ib_srpt:srpt_completion+0x190/0x250
>>  [<ffffffff8811aa5b>] :mlx4_core:mlx4_eq_int+0x3b/0x26f
>>  [<ffffffff8811ac9e>] :mlx4_core:mlx4_msi_x_interrupt+0xf/0x17
>
> According to this trace, Vu was incorrect when he wrote that
> srpt_handle_new_iu is called in tasklet context. It is, at least
> sometimes, called from IRQ context. Try the attached patch. It's
> against the latest trunk.
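
(Presumably the fix is along the lines of choosing the SCST context from the
calling context. A rough sketch only, not the attached patch; the variable
name is a placeholder, and in_irq() and the SCST constants are as in the
2.6.18-era kernel and SCST headers:

    int context;

    /* sketch: srpt_completion() may run either from the mlx4 hard-IRQ
     * handler (as the trace above shows) or from a tasklet; fall back
     * to thread context when in hard IRQ */
    if (in_irq())
            context = SCST_CONTEXT_THREAD;
    else
            context = SCST_CONTEXT_DIRECT_ATOMIC;

    scst_cmd_init_done(cmd, context);
)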
I tried with the latest scst and srpt as of this morning. Previously, I
had used srpt-1.0.0. The following results are with BLOCKIO, and I'll
have NULLIO results in a bit. You can see from these that I don't hang
any more, but the srpt thread=0 numbers are a little lower.

As before, this run was done with ioengine=libaio and iodepth=16. I
pretty much always get significantly better performance with libaio than
with sync or other engines, and that iodepth setting tended to give me
better results.
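
A fio job of the kind described above would look roughly like this (a
sketch, not the actual job file used for the runs below; the job name,
device path, and runtime are placeholders, and bs was 512 or 4k):

    [randwrite-sketch]
    ; illustrative job only; filename and runtime are placeholders
    rw=randwrite
    ioengine=libaio
    iodepth=16
    bs=512
    filename=/dev/sdb
    runtime=60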
----------------------------------------------
type=randwrite  bs=512  drives=1 scst_threads=1 srptthread=1 iops=67073.48
type=randwrite  bs=4k   drives=1 scst_threads=1 srptthread=1 iops=54876.82
type=randwrite  bs=512  drives=2 scst_threads=1 srptthread=1 iops=74858.00
type=randwrite  bs=4k   drives=2 scst_threads=1 srptthread=1 iops=75357.15
type=randwrite  bs=512  drives=3 scst_threads=1 srptthread=1 iops=83257.72
type=randwrite  bs=4k   drives=3 scst_threads=1 srptthread=1 iops=82186.79
type=randwrite  bs=512  drives=1 scst_threads=2 srptthread=1 iops=59908.06
type=randwrite  bs=4k   drives=1 scst_threads=2 srptthread=1 iops=50982.91
type=randwrite  bs=512  drives=2 scst_threads=2 srptthread=1 iops=99243.07
type=randwrite  bs=4k   drives=2 scst_threads=2 srptthread=1 iops=79670.62
type=randwrite  bs=512  drives=3 scst_threads=2 srptthread=1 iops=102898.37
type=randwrite  bs=4k   drives=3 scst_threads=2 srptthread=1 iops=92248.25
type=randwrite  bs=512  drives=1 scst_threads=3 srptthread=1 iops=63086.77
type=randwrite  bs=4k   drives=1 scst_threads=3 srptthread=1 iops=53020.41
type=randwrite  bs=512  drives=2 scst_threads=3 srptthread=1 iops=95990.06
type=randwrite  bs=4k   drives=2 scst_threads=3 srptthread=1 iops=77487.26
type=randwrite  bs=512  drives=3 scst_threads=3 srptthread=1 iops=105945.85
type=randwrite  bs=4k   drives=3 scst_threads=3 srptthread=1 iops=95389.01
type=randwrite  bs=512  drives=1 scst_threads=1 srptthread=0 iops=50299.36
type=randwrite  bs=4k   drives=1 scst_threads=1 srptthread=0 iops=48070.11
type=randwrite  bs=512  drives=2 scst_threads=1 srptthread=0 iops=54017.21
type=randwrite  bs=4k   drives=2 scst_threads=1 srptthread=0 iops=50407.20
type=randwrite  bs=512  drives=3 scst_threads=1 srptthread=0 iops=55822.11
type=randwrite  bs=4k   drives=3 scst_threads=1 srptthread=0 iops=50447.82
type=randwrite  bs=512  drives=1 scst_threads=2 srptthread=0 iops=60672.48
type=randwrite  bs=4k   drives=1 scst_threads=2 srptthread=0 iops=48811.93
type=randwrite  bs=512  drives=2 scst_threads=2 srptthread=0 iops=81919.87
type=randwrite  bs=4k   drives=2 scst_threads=2 srptthread=0 iops=72912.99
type=randwrite  bs=512  drives=3 scst_threads=2 srptthread=0 iops=91036.45
type=randwrite  bs=4k   drives=3 scst_threads=2 srptthread=0 iops=88994.63
type=randwrite  bs=512  drives=1 scst_threads=3 srptthread=0 iops=58929.21
type=randwrite  bs=4k   drives=1 scst_threads=3 srptthread=0 iops=48698.90
type=randwrite  bs=512  drives=2 scst_threads=3 srptthread=0 iops=83967.58
type=randwrite  bs=4k   drives=2 scst_threads=3 srptthread=0 iops=73932.36
type=randwrite  bs=512  drives=3 scst_threads=3 srptthread=0 iops=96686.46
type=randwrite  bs=4k   drives=3 scst_threads=3 srptthread=0 iops=88689.27



