[ofa-general] SRP/mlx4 interrupts throttling performance
Cameron Harr
cameron at harr.org
Wed Oct 8 10:13:49 PDT 2008
Vu Pham wrote:
> Cameron Harr wrote:
>>
>> One thing that makes the results hard to interpret is that they vary
>> enormously. I've been doing more testing with 3 physical LUNs
>> (instead of two) on the target, with srpt_thread=0 and
>> scst_threads=[1,2,3]. With scst_threads=1, I'm fairly low (50K IOPs),
>> while at 2 and 3 threads the results are higher, though in all
>> cases the context-switch rate is low, often less than 1:1.
>>
>
> Can you test again with srpt_thread=0,1 and scst_threads=1,2,3 in
> NULLIO mode (exporting 1, 2, or 3 NULLIO LUNs)?
srpt_thread=0:
scst_t: |    1    |     2     |     3      |
--------|---------|-----------|------------|
1 LUN*  |   54K   | 54K-75K   | 54K-75K    |
2 LUNs* |120K-200K|150K-200K**|120K-180K** |
3 LUNs* |170K-195K|160K-195K  |130K-170K** |

srpt_thread=1:
scst_t: |    1    |     2     |     3      |
--------|---------|-----------|------------|
1 LUN*  |   74K   | 54K       | 55K        |
2 LUNs* |140K-190K|130K-200K  |150K-220K   |
3 LUNs* |170K-195K|170K-195K  |175K-195K   |
* One FIO (benchmark) process was run per LUN, so with 3 LUNs there
were three FIO processes running simultaneously.
** Sometimes the benchmark "zombied" (the process did no work but could
not be killed) after running for some amount of time. This wasn't
reliably repeatable, so the mark only indicates that this particular
run has zombied at least once.
- Note 1: There were a number of outliers (often between 98K and 230K),
but I tried to capture where the bulk of the activity happened. It's
still somewhat of a rough guess, though. Where the range is large, it
usually means the results were just widely scattered.
Summary: It's hard to draw firm conclusions given the variation in the
results. I would say the runs with srpt_thread=1 tended to have fewer
outliers at the beginning, but as time went on they scattered as well.
Running with 2 or 3 threads seems to be almost a toss-up.
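For reference, the benchmark setup described above (one FIO process per LUN, 512-byte direct random writes) might be launched roughly as sketched below. The device names, iodepth, and job options are assumptions; the original fio job parameters aren't shown in the thread.

```shell
# Hypothetical sketch of the benchmark: one fio command per LUN, doing
# 512-byte direct random writes. Device names and iodepth are assumed,
# not taken from the original post.
luns="/dev/sdb /dev/sdc /dev/sdd"        # assumed LUN device names
cmds=""
for dev in $luns; do
    cmds="${cmds}fio --name=randwrite-$(basename "$dev") --filename=$dev \
--rw=randwrite --bs=512 --direct=1 --ioengine=libaio --iodepth=32
"
done
printf '%s' "$cmds"    # one command per LUN; append '&' to run them in parallel
```

Running the printed commands in parallel (one per LUN, backgrounded with `&`) reproduces the "three FIO processes simultaneously" setup from the footnote above.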
>>
>> Also a little disconcerting is that the average request size on the
>> target has gotten larger. I'm always writing 512B packets, and when I
>> run on one initiator, the average reqsz is around 600-800B. When I
>> add an initiator, the average reqsz basically doubles, to around
>> 1200-1600B. I'm specifying direct IO in the test and scst is
>> configured as blockio (and thus direct IO), but it appears something
>> is cached at some point and coalesced when another initiator is
>> involved. Does this seem odd or normal? This holds true whether the
>> initiators are writing to different partitions on the same LUN or to
>> the same LUN with no partitions.
>
> What IO scheduler are you running on the local storage? Since you are
> using blockio, you should experiment with the IO scheduler's tunable
> parameters (for the deadline scheduler, for example: front_merges,
> writes_starved, ...). Please see ~/Documentation/block/*.txt
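A sketch of the deadline-scheduler tuning Vu suggests, assuming a block device that exposes the standard sysfs queue directory (which, as noted in the reply below, Cameron's back end does not); `sdb` is an assumed device name and the commands require root:

```shell
# Switch a device to the deadline scheduler and adjust two of its
# tunables via sysfs. Device name 'sdb' is an assumption; run as root.
dev=sdb
q=/sys/block/$dev/queue
if [ -d "$q" ]; then
    echo deadline > "$q/scheduler"          # switch from cfq to deadline
    echo 1 > "$q/iosched/front_merges"      # allow front merging of requests
    echo 2 > "$q/iosched/writes_starved"    # read batches served per write batch
    cat "$q/scheduler"                      # active scheduler shown in [brackets]
else
    echo "no queue directory for $dev" >&2  # the situation Cameron reports below
fi
```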
I'm using CFQ. Months ago I tried different schedulers with their
default options and saw basically no difference. I can try some of that
again; however, I don't believe I can tune the schedulers because my
back end doesn't expose a "queue" directory in /sys/block/<dev>/
-Cameron