[ofa-general] SRP/mlx4 interrupts throttling performance
Cameron Harr
cameron at harr.org
Tue Oct 7 12:51:13 PDT 2008
Cameron Harr wrote:
> I may be hitting the instability problems and am currently rebooting
> my initiators again after the test (FIO) went into zombie-mode.
>
> When I first set thread=0, with scst_threads=8, my performance was
> much lower (around 50-60K IOPs) than normal and it appeared that only
> one target could be written to at a time. I set scst_threads=2 after
> that and got pretty wide performance differences, between 55K and 85K
> IOPs. I then brought in another initiator and was seeing numbers as
> high as 135K IOPs and as low as 70K IOPs, but could also see that a lot
> of the requests were being coalesced by the time they got to the
> target. I let it run for a while, and when I came back, the tests were
> still "running" but no work was being done and the processes couldn't
> be killed.
>
One thing that makes results hard to interpret is that they vary
enormously. I've been doing more testing with 3 physical LUNs (instead
of two) on the target, srpt_thread=0, and changing between
scst_threads=[1,2,3]. With scst_threads=1, I'm fairly low (50K IOPs),
while at two and three threads the results are higher, though in all
cases the context switches are low, often less than 1:1.
My best performance again comes with scst_threads=3 (where the CPU is
often pegged at 100%), but the results seem to go in phases: the low
80s, the 90s, 110s and 120s, and they will run for a while in the 130s
and 140s (in thousands of IOPs). For reference, locally on the three
LUNs I get around 130-150K IOPs. But the numbers really vary.
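
Since the numbers swing this much, it may help to report the spread
rather than a single figure. Below is a minimal sketch, assuming you
have a list of per-interval IOPS readings from the test tool; the
function name summarize_iops and the sample values are hypothetical and
are not measurements from this thread.

```python
import statistics


def summarize_iops(samples):
    """Return min, max, mean and coefficient of variation for IOPS samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to measure spread")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": mean,
        "cv": stdev / mean,  # relative spread; 0.2 means +/- 20% of the mean
    }


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    example = [80_000, 120_000, 100_000]
    summary = summarize_iops(example)
    print(f"min={summary['min']} max={summary['max']} "
          f"mean={summary['mean']:.0f} cv={summary['cv']:.2f}")
```

Comparing the coefficient of variation across the scst_threads settings
would show whether a setting is actually faster or just noisier.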
Also a little disconcerting is that my average request size on the
target has gotten larger. I'm always writing 512B requests, and when I
run on one initiator, the average reqsz is around 600-800B. When I add
an initiator, the average reqsz basically doubles to around 1200-1600B.
I'm specifying direct IO in the test and scst is configured as blockio
(and thus direct IO), but it appears something is cached at some point
and is coalesced when another initiator is involved. Does this seem odd
or normal? This holds true whether the initiators are writing to
different partitions on the same LUN or to the same LUN with no
partitions.
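
One way to check whether requests are being merged on the target is to
compare two snapshots of /proc/diskstats. The sketch below assumes the
standard Linux layout (after the device name: reads completed, reads
merged, sectors read, ms reading, writes completed, writes merged,
sectors written) and 512-byte sector units. The function names are
hypothetical, and this is not necessarily how the reqsz figures above
were obtained.

```python
SECTOR_BYTES = 512  # /proc/diskstats reports sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> write counters from /proc/diskstats content."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        stats[fields[2]] = {
            "writes": int(fields[7]),   # writes completed
            "merged": int(fields[8]),   # writes merged before dispatch
            "sectors": int(fields[9]),  # sectors written
        }
    return stats


def write_request_profile(before, after, device):
    """Average write size and merges per write between two snapshots."""
    b, a = before[device], after[device]
    writes = a["writes"] - b["writes"]
    if writes <= 0:
        return None  # no writes completed in the interval
    return {
        "avg_bytes": (a["sectors"] - b["sectors"]) * SECTOR_BYTES / writes,
        "merges_per_write": (a["merged"] - b["merged"]) / writes,
    }


if __name__ == "__main__":
    import time

    with open("/proc/diskstats") as f:
        first = parse_diskstats(f.read())
    time.sleep(5)
    with open("/proc/diskstats") as f:
        second = parse_diskstats(f.read())
    for dev in sorted(second):
        if dev in first:
            print(dev, write_request_profile(first, second, dev))
```

If merges_per_write stays near zero while avg_bytes still exceeds 512,
the coalescing is probably happening before the requests reach the
target's block layer.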