[ofa-general] SRP/mlx4 interrupts throttling performance

Vladislav Bolkhovitin vst at vlnb.net
Thu Oct 16 12:57:29 PDT 2008


(Sorry for the delay. I was busy with a related task that had to be 
completed before we could go further.)

Cameron Harr wrote:
> Vu Pham wrote:
>> Cameron Harr wrote:
>>> One thing that makes results hard to interpret is that they vary 
>>> enormously. I've been doing more testing with 3 physical LUNs 
>>> (instead of two) on the target, srpt_thread=0, and changing between 
>>> scst_thread=[1,2,3]. With scst_thread=1, I'm fairly low (50K IOPS), 
>>> while at 2 and 3 threads the results are higher, though in all 
>>> cases the context switches are low, often less than 1:1.
>>>
>> Can you test again with srpt_thread=0,1 and scst_threads=1,2,3 in 
>> NULLIO mode (exporting 1, 2, and 3 NULLIO LUNs)?
> srpt_thread=0:
> scst_t: |    1    |    2      |    3       |
> -------------------------------------------|
> 1 LUN*  |  54K    | 54K-75K   | 54K-75K    |
> 2 LUNs* |120K-200K|150K-200K**| 120K-180K**|
> 3 LUNs* |170K-195K|160K-195K  | 130K-170K**|
> 
> srpt_thread=1:
> scst_t: |    1    |    2      |    3      |
> ------------------------------------------|
> 1 LUN*  |   74K   |    54K    |   55K     |
> 2 LUNs* |140K-190K| 130K-200K | 150K-220K |
> 3 LUNs* |170K-195K| 170K-195K | 175K-195K |
> 
> * a FIO (benchmark) process was run for each LUN, so when there were 3 
> LUNs, there were three FIO processes running simultaneously.

What FIO script do you use? Also, how long does each run take? Big 
variations are usually caused by test runs that are too short.

Also, it would be better to use O_DIRECT mode, or the sg interface 
directly, to pass requests to the target. Otherwise, the variations can 
be caused by page cache activity on the initiators.
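
For example, a small fio job along these lines is what I mean by 
O_DIRECT testing; /dev/sdX, the queue depth and the runtime below are 
only placeholders for your setup:

; 512-byte O_DIRECT random writes against one LUN (illustrative values)
[srp-512b-randwrite]
filename=/dev/sdX
rw=randwrite
bs=512
; direct=1 means O_DIRECT, so the initiator page cache is bypassed
direct=1
ioengine=libaio
iodepth=32
; cap the run; make it long enough that short-term variation averages out
runtime=300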

> ** Sometimes the benchmark "zombied" (the process does no work but 
> can't be killed) after running for a certain amount of time. However, 
> it wasn't reliably repeatable, so the mark only indicates that this 
> particular run has zombied before.

That means there is a bug somewhere. Such bugs are usually found within 
a few hours of code auditing (the srpt driver is pretty simple) or by 
using the kernel debug facilities (an example diff to .config is 
attached). I personally always prefer to put my effort into fixing the 
real problem rather than inventing workarounds, like srpt_thread in 
this case.
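
By kernel debug facilities I mean options of this kind (only an 
illustration, not the attached diff; exact option names depend on the 
kernel version):

# illustrative debug options only
CONFIG_DEBUG_KERNEL=y
# catch incorrect spinlock usage and sleeping in atomic context
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# lock dependency checker (lockdep)
CONFIG_PROVE_LOCKING=y
# catch use-after-free and list corruption in the data path
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_LIST=y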

So I would:

   1. Completely remove the srpt thread and all related code. It doesn't 
do anything that can't be done in SIRQ context (tasklet).

   2. Audit the code to check whether it does anything it shouldn't do 
in SIRQ context, and fix it. This step isn't required, but it usually 
saves a lot of puzzled debugging later.

   3. In srpt_handle_rdma_comp() and srpt_handle_new_iu(), change 
SCST_CONTEXT_THREAD to SCST_CONTEXT_DIRECT_ATOMIC (see the sketch below).
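
To be concrete about (3), the change is only in the last argument of the 
calls into SCST. A rough sketch (variable names and exact call sites may 
differ in the current trunk):

/* in srpt_handle_rdma_comp(), when the RDMA data transfer completes: */
scst_rx_data(scmnd, SCST_RX_STATUS_SUCCESS,
             SCST_CONTEXT_DIRECT_ATOMIC);   /* was SCST_CONTEXT_THREAD */

/* in srpt_handle_new_iu(), when a new SRP IU (command) arrives: */
scst_cmd_init_done(scmnd,
             SCST_CONTEXT_DIRECT_ATOMIC);   /* was SCST_CONTEXT_THREAD */

With the srpt thread removed (step 1), both paths run in SIRQ context, 
which is what SCST_CONTEXT_DIRECT_ATOMIC is meant for.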

Then I would run the problematic tests (e.g., a heavy TPC-H workload) on 
a debug kernel and fix whatever problems are found.

Anyway, Cameron, can you get the latest code from the SCST trunk and try 
with it? It was recently updated. Please also add a run with the change 
from (3) above.

> - Note 1: There were a number of outliers (often between 98K and 230K), 
> but I tried to capture where the bulk of the activity happened. It's 
> still somewhat of a rough guess though. Where the range is large, it 
> usually means the results were just really scattered.
> 
> Summary: It's hard to draw a good summary due to the variation of 
> results. I would say the runs with srpt_thread=1 tended to have fewer 
> outliers at the beginning, but as time went on, they scattered as well. 
> Running with 2 or 3 threads almost seems to be a toss-up.
>>> Also a little disconcerting is that my average request size on the 
>>> target has gotten larger. I'm always writing 512B packets, and when I 
>>> run on one initiator, the average reqsz is around 600-800B. When I 
>>> add an initiator, the average reqsz basically doubles and is now 
>>> around 1200 - 1600B. I'm specifying direct IO in the test and scst is 
>>> configured as blockio (and thus direct IO), but it appears something 
>>> is cached at some point and seems to be coalesced when another 
>>> initiator is involved. Does this seem odd or normal? This holds true 
>>> whether the initiators are writing to different partitions on the 
>>> same LUN or the same LUN with no partitions.
>> What I/O scheduler are you running on the local storage? Since you are 
>> using blockio, you should experiment with the I/O scheduler's tunable 
>> parameters (for example, for the deadline scheduler: front_merges, 
>> writes_starved, ...). Please see ~/Documentation/block/*.txt
> I'm using CFQ. Months ago, I tried different schedulers with their 
> default options and saw basically no difference. I can try some of that 
> again; however I don't believe I can tune the schedulers because my back 
> end doesn't give me a "queue" directory in /sys/block/<dev>/
> 
> -Cameron
> 



