[Scst-devel] [ofa-general] SRP/mlx4 interrupts throttling performance
Cameron Harr
cameron at harr.org
Tue Jan 13 08:42:59 PST 2009
Vladislav Bolkhovitin wrote:
> Cameron Harr, on 01/13/2009 02:56 AM wrote:
>> Vladislav Bolkhovitin wrote:
>>>>> I think srptthread=0 performs worse in this case because part of
>>>>> the processing is done in SIRQ context, and the scheduler seems to
>>>>> put it on the same CPU as fct0-worker, which does the data transfer
>>>>> job to your SSD device. That thread always consumes about 100% CPU,
>>>>> so it gets less CPU time, hence lower overall performance.
>>>>>
>>>>> So, try to affine the fctX-worker threads, the SCST threads and the
>>>>> SIRQ processing to different CPUs and check again. You can set
>>>>> thread affinity using the utility from
>>>>> http://www.kernel.org/pub/linux/kernel/people/rml/cpu-affinity/;
>>>>> for how to affine IRQs, see Documentation/IRQ-affinity.txt in your
>>>>> kernel tree.
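For reference, the pinning can also be scripted rather than done with the
utility above; a minimal sketch follows, assuming Python 3.3+ for
os.sched_setaffinity, with placeholder PIDs, IRQ number and CPU choices
(none of these values come from this thread):

    # Sketch: pin a thread/process to specific CPUs and point an IRQ at
    # another CPU set. PIDs and the IRQ number are hypothetical.
    import os

    FCT_WORKER_PID = 1234          # hypothetical PID of fct0-worker
    SCST_THREAD_PID = 1235         # hypothetical PID of a scsi_tgt thread
    MLX4_IRQ = 98                  # hypothetical IRQ number of the mlx4 HCA

    os.sched_setaffinity(FCT_WORKER_PID, {7})      # worker on CPU 7
    os.sched_setaffinity(SCST_THREAD_PID, {4, 5})  # SCST thread on CPUs 4-5

    # IRQ affinity is set via procfs with a hex CPU mask
    # (see Documentation/IRQ-affinity.txt); CPUs 1-3 -> mask 0x0e.
    with open("/proc/irq/%d/smp_affinity" % MLX4_IRQ, "w") as f:
        f.write("0e\n")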
>>
>> I ran with the two fct-worker threads pinned to cpus 7 and 8, the
>> scsi_tgt threads pinned to cpus 4, 5 or 6, and irqbalance pinned to
>> cpus 1-3. I wasn't sure if I should play with the 8 ksoftirqd procs,
>> since there is one process per cpu. From these results, I don't see a
>> big difference,
>
> Hmm, you sent me before the following results:
>
> type=randwrite bs=4k drives=1 scst_threads=1 srptthread=1 iops=54934.31
> type=randwrite bs=4k drives=1 scst_threads=1 srptthread=0 iops=50199.90
> type=randwrite bs=4k drives=1 scst_threads=2 srptthread=1 iops=51510.68
> type=randwrite bs=4k drives=1 scst_threads=2 srptthread=0 iops=49951.89
> type=randwrite bs=4k drives=1 scst_threads=3 srptthread=1 iops=51924.17
> type=randwrite bs=4k drives=1 scst_threads=3 srptthread=0 iops=49874.57
> type=randwrite bs=4k drives=2 scst_threads=1 srptthread=1 iops=79680.42
> type=randwrite bs=4k drives=2 scst_threads=1 srptthread=0 iops=74504.65
> type=randwrite bs=4k drives=2 scst_threads=2 srptthread=1 iops=78558.77
> type=randwrite bs=4k drives=2 scst_threads=2 srptthread=0 iops=75224.25
> type=randwrite bs=4k drives=2 scst_threads=3 srptthread=1 iops=75411.52
> type=randwrite bs=4k drives=2 scst_threads=3 srptthread=0 iops=73238.46
>
> I see quite a big improvement. For instance, for the drives=1
> scst_threads=1 srptthread=1 case it is 36%. Or do you use different
> hardware, so that those results can't be compared?
Vlad, you've got a good eye. Unfortunately, those results can't really
be compared, because I believe the previous results were intentionally
run in a worst-case performance scenario. However, I did run no-affinity
runs before the affinity runs, and I would say the performance increase
is variable and somewhat inconclusive:
type=randwrite bs=4k drives=1 scst_threads=1 srptthread=1 iops=76724.08
type=randwrite bs=4k drives=2 scst_threads=1 srptthread=1 iops=91318.28
type=randwrite bs=4k drives=1 scst_threads=2 srptthread=1 iops=60374.94
type=randwrite bs=4k drives=2 scst_threads=2 srptthread=1 iops=91618.18
type=randwrite bs=4k drives=1 scst_threads=3 srptthread=1 iops=63076.21
type=randwrite bs=4k drives=2 scst_threads=3 srptthread=1 iops=92251.24
type=randwrite bs=4k drives=1 scst_threads=1 srptthread=0 iops=50539.96
type=randwrite bs=4k drives=2 scst_threads=1 srptthread=0 iops=57884.80
type=randwrite bs=4k drives=1 scst_threads=2 srptthread=0 iops=54502.85
type=randwrite bs=4k drives=2 scst_threads=2 srptthread=0 iops=93230.44
type=randwrite bs=4k drives=1 scst_threads=3 srptthread=0 iops=55941.89
type=randwrite bs=4k drives=2 scst_threads=3 srptthread=0 iops=94480.92
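As a side note, with results in this one-line-per-run format, a quick way
to compare two sets is to parse the key=value fields and print the
per-configuration IOPS delta; a minimal sketch in Python, with placeholder
file names:

    # Sketch: compare two benchmark logs whose lines look like
    # "type=randwrite bs=4k drives=1 scst_threads=1 srptthread=1 iops=76724.08"
    def parse(path):
        runs = {}
        for line in open(path):
            if "iops=" not in line:
                continue
            fields = dict(f.split("=", 1) for f in line.split())
            iops = float(fields.pop("iops"))
            runs[tuple(sorted(fields.items()))] = iops
        return runs

    base = parse("baseline.log")    # hypothetical file names
    test = parse("affinity.log")
    for key in sorted(base):
        if key in test:
            delta = 100.0 * (test[key] - base[key]) / base[key]
            print(key, "%+.1f%%" % delta)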
>
>> but would still give srpt thread=1 a slight performance advantage.
>
> At this level, CPU caches start playing an essential role. To get the
> maximum performance, the processing of each command should use the
> same CPU L2+ cache(s), i.e. be done on the same physical CPU, but on
> different cores. Most likely, the affinity you assigned was worse than
> the scheduler's decisions. What's your CPU configuration? Please send
> me the top/vmstat output from the target during the tests, as well as
> the dmesg from the target just after it's booted.
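On the cache point: which cores share an L2 can be read from sysfs. A
minimal sketch, assuming the /sys/devices/system/cpu cache layout is
available on the target:

    # Sketch: print which CPUs share each cache level, so threads can be
    # pinned to cores behind the same L2 as suggested above.
    import glob

    for path in sorted(glob.glob(
            "/sys/devices/system/cpu/cpu[0-9]*/cache/index*")):
        cpu = path.split("/")[5]                     # e.g. "cpu0"
        level = open(path + "/level").read().strip()
        shared = open(path + "/shared_cpu_list").read().strip()
        print("%s L%s shared with CPUs %s" % (cpu, level, shared))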
My CPU config on the target (where I did the affinity) is 2 quad-core
Xeon E5440 CPUs @ 2.83GHz. I didn't have my script configured to dump
top and vmstat, so here's data from a rerun (and I have attached the
requested info). I'm not sure what accounts for the spike at the
beginning, but it seems consistent.
type=randwrite bs=4k drives=1 scst_threads=1 srptthread=1 iops=104699.43
type=randwrite bs=4k drives=2 scst_threads=1 srptthread=1 iops=133928.98
type=randwrite bs=4k drives=1 scst_threads=2 srptthread=1 iops=82736.73
type=randwrite bs=4k drives=2 scst_threads=2 srptthread=1 iops=82221.42
type=randwrite bs=4k drives=1 scst_threads=3 srptthread=1 iops=70203.53
type=randwrite bs=4k drives=2 scst_threads=3 srptthread=1 iops=85628.45
type=randwrite bs=4k drives=1 scst_threads=1 srptthread=0 iops=75646.90
type=randwrite bs=4k drives=2 scst_threads=1 srptthread=0 iops=87124.32
type=randwrite bs=4k drives=1 scst_threads=2 srptthread=0 iops=74545.84
type=randwrite bs=4k drives=2 scst_threads=2 srptthread=0 iops=88348.71
type=randwrite bs=4k drives=1 scst_threads=3 srptthread=0 iops=71837.15
type=randwrite bs=4k drives=2 scst_threads=3 srptthread=0 iops=84387.22
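For future runs, one possible way to capture vmstat and batch-mode top
alongside the workload; a minimal sketch with a placeholder workload
command (not the actual test script used here):

    # Sketch: record vmstat and top output for the duration of a run.
    import subprocess

    vmstat_out = open("vmstat.target", "w")
    top_out = open("top.target", "w")
    p1 = subprocess.Popen(["vmstat", "1"], stdout=vmstat_out)
    p2 = subprocess.Popen(["top", "-b", "-d", "1"], stdout=top_out)
    try:
        subprocess.call(["fio", "randwrite-4k.job"])   # hypothetical workload
    finally:
        p1.terminate()
        p2.terminate()
        vmstat_out.close()
        top_out.close()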
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.out
Type: application/octet-stream
Size: 34008 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20090113/38bae937/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: top.target.bz2
Type: application/octet-stream
Size: 72060 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20090113/38bae937/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vmstat.target.bz2
Type: application/octet-stream
Size: 15983 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/general/attachments/20090113/38bae937/attachment-0002.obj>