[ofa-general] SRP/mlx4 interrupts throttling performance

Cameron Harr cameron at harr.org
Tue Oct 7 11:15:07 PDT 2008


Vu Pham wrote:
>>> Using that, and watching which threads accumulate the most time 
>>> waiting, the main culprits are all of the scst_threads when 
>>> scst_threads=8, and when scst_threads=2, the culprit is srpt_thread.
>>
>> After some code examination, I figured out that Vu has chosen a 
>> "defensive programming" way ;): always switch to another thread.
>>
>> I personally don't see why srpt_thread is needed at all. Vu, if you 
>> think that the processing is too heavyweight, you should rather 
>> use tasklets instead.
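
(For reference, the generic tasklet pattern being suggested looks 
roughly like this; a sketch only, with made-up srpt_* names, not the 
actual ib_srpt code:)

    #include <linux/interrupt.h>
    #include <rdma/ib_verbs.h>

    static void srpt_process_completions(unsigned long data)
    {
            /* softirq context: drain the CQ and pass finished
             * commands on to SCST */
    }

    static DECLARE_TASKLET(srpt_tasklet, srpt_process_completions, 0);

    /* IB CQ completion callback, runs in interrupt context */
    static void srpt_comp_handler(struct ib_cq *cq, void *ctx)
    {
            /* defer the work to a softirq instead of waking a
             * dedicated kernel thread */
            tasklet_schedule(&srpt_tasklet);
    }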
>>
>> SCST functions scst_cmd_init_done() and scst_rx_data() should be 
>> called with context SCST_CONTEXT_DIRECT_ATOMIC from interrupt context 
>> or SCST_CONTEXT_DIRECT from thread context. Then amount of context 
>> switches per cmd will go to the same reasonable level <=1 as for 
>> qla2x00t. \
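
(Concretely, something like this in the driver's command-arrival path; 
a sketch assuming the standard scst.h prototypes, not Vu's actual 
code:)

    #include <linux/interrupt.h>
    #include "scst.h"

    static void srpt_pass_cmd_to_scst(struct scst_cmd *cmd)
    {
            /* let SCST continue processing the cmd inline instead
             * of queueing it to another thread */
            if (in_interrupt())
                    scst_cmd_init_done(cmd, SCST_CONTEXT_DIRECT_ATOMIC);
            else
                    scst_cmd_init_done(cmd, SCST_CONTEXT_DIRECT);
    }

and the same pref_context choice for scst_rx_data() when the RDMA read 
of the data-out buffer completes.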
>
> You are correct - by default SRP runs in thread mode - SRP can also 
> run in tasklet mode (parameter thread=0); however, the main trade-off 
> is instability (under a heavy TPC-H workload).
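
(The thread parameter is a module parameter, so it would be declared 
and used roughly like this; the exact ib_srpt declaration may differ:)

    #include <linux/module.h>
    #include <linux/moduleparam.h>

    static int thread = 1;        /* default: use srpt_thread */
    module_param(thread, int, 0444);
    MODULE_PARM_DESC(thread,
            "1 = process completions in a kernel thread, "
            "0 = tasklet mode");

so tasklet mode is selected at load time with "modprobe ib_srpt 
thread=0".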
>
> I already let Cameron know about this. We should have some 
> preliminary numbers from him soon (running with thread=0), and we 
> will need some quality time to debug/fix the instability under 
> certain workloads.
>
I may be hitting the instability problems and am currently rebooting my 
initiators again after the test (FIO) went into zombie-mode.

When I first set thread=0, with scst_threads=8, my performance was much 
lower (around 50-60K IOPs) than normal and it appeared that only one 
target could be written to at a time. I set scst_threads=2 after that 
and got pretty wide performance differences, between 55K and 85K IOPs. I 
then brought in another initiator and was seeing numbers as high as 135K 
IOPs and as low as 70K IOPs, but could also see that a lot of the 
requests were being coalesced by the time they got to the target. I let 
it run for a while, and when I came back, the tests were still "running" 
but no work was being done and the processes couldn't be killed.



