[ofa-general] SRP/mlx4 interrupts throttling performance
Cameron Harr
cameron at harr.org
Tue Oct 7 11:15:07 PDT 2008
Vu Pham wrote:
>>> Using that, and watching who's moving up in amount of time waiting,
>>> the main culprits are all of the scst_threads when scst_threads=8,
>>> and when threads=2, the culprit is srpt_thread.
>>
>> After some code examination, I figured out that Vu has chosen a
>> "defensive programming" way ;): always switch to another thread.
>>
>> I personally don't see why srpt_thread is needed at all. Vu, if you
>> think that the processing is too heavy weighted, you should rather
>> use tasklets instead.
>>
>> SCST functions scst_cmd_init_done() and scst_rx_data() should be
>> called with context SCST_CONTEXT_DIRECT_ATOMIC from interrupt context
>> or SCST_CONTEXT_DIRECT from thread context. Then the number of context
>> switches per cmd will go down to the same reasonable level (<=1) as for
>> qla2x00t.
>
> You are correct - by default SRP runs in thread mode - it can also run
> in tasklet mode (module parameter thread=0); however, the main trade-off
> is instability (under heavy TPC-H workloads).
>
> I already let Cameron know about this. We should have some preliminary
> numbers from him soon (running with thread=0), and we will need some
> quality time to debug and fix the instability under certain workloads.
>
I may be hitting the instability problems; I am currently rebooting my
initiators again after the test (FIO) went into zombie mode.
When I first set thread=0, with scst_threads=8, my performance was much
lower (around 50-60K IOPS) than normal, and it appeared that only one
target could be written to at a time. I then set scst_threads=2 and got
fairly wide performance swings, between 55K and 85K IOPS. I then brought
in another initiator and saw numbers as high as 135K IOPS and as low as
70K IOPS, but could also see that many of the requests were being
coalesced by the time they reached the target. I let it run for a while,
and when I came back, the tests were still "running" but no work was
being done and the processes could not be killed.