[ofa-general] SRP/mlx4 interrupts throttling performance

Cameron Harr cameron at harr.org
Mon Oct 6 13:11:50 PDT 2008



Vu Pham wrote:
> Cameron Harr wrote:
>> Vu Pham wrote:
>>>
>>>> Alternatively, is there anything in the SCST layer I should tweak? I'm
>>>> still running rev 245 of that code (kinda old, but works with OFED
>>>> 1.3.1 w/o hacks).
>
> With blockio I get the best performance + stability with scst_threads=1

I got the best performance with threads=2 or 3, and I've noticed that the 
srpt_thread is often at 99%, though increasing or decreasing the 
"thread=?" parameter for ib_srpt doesn't seem to make a difference. 
A second initiator doesn't seem to help much either; with a single 
initiator writing to two targets, I can now usually get between 95K and 
105K IOPs.
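
For reference, a minimal sketch of how those two knobs get set, assuming 
the module parameter names used in this thread (scst_threads for scst, 
thread for ib_srpt); the values are just the ones under discussion, not 
recommendations:

    # number of SCST I/O threads (the scst_threads=1 vs. 2-3 discussion)
    modprobe scst scst_threads=2
    # process SRP requests in a dedicated kernel thread
    # (the srpt_thread that shows up at 99% above)
    modprobe ib_srpt thread=1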
>>>>>
>>>>> My target server (with DAS) contains eight 2.8 GHz CPU cores and can 
>>>>> sustain over 200K IOPs locally, but only around 73K IOPs over SRP.
>>>
>>> Is this number from one initiator or multiple?
>> One initiator. At first I thought it might be a limitation of SRP, 
>> and added a second initiator, but the aggregate performance of the 
>> two was about equal to that of a single initiator.
>
> Try again with scst_threads=1. I expect that you can get ~140K with 
> two initiators.
>
Unfortunately, I'm nowhere close to that high, though I am significantly 
higher than before. Two initiators do seem to reduce the context-switching 
rate, however, which is good.
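
For what it's worth, a sketch of the kind of random-write run behind these 
numbers, assuming fio with libaio; /dev/sdc is a hypothetical SRP-attached 
LUN, and the job parameters are illustrative, not the exact ones used here:

    fio --name=srp-randwrite --filename=/dev/sdc --ioengine=libaio \
        --direct=1 --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
        --runtime=30 --group_reporting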
>>>>> Looking at /proc/interrupts, I see that the mlx4_core (comp) device 
>>>>> is pushing about 135K Int/s on 1 of 2 CPUs. All CPUs are enabled 
>>>>> for that PCI-E slot, but it only ever uses 2 of the CPUs, and only 
>>>>> 1 at a time. None of the other CPUs has an interrupt rate higher 
>>>>> than about 40-50K/s.
>>> The number of interrupts can be cut down if there are more 
>>> completions to be processed by software per interrupt, i.e. please 
>>> test with multiple QPs between one initiator and your target, and 
>>> with multiple initiators against your target.
Interrupts are still pretty high (around 160K/s now), but that does not 
seem to be my bottleneck. Context switching seems to be about 2-2.5 
switches per IOP, and sometimes less - not perfect, but not horrible 
either.
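
For concreteness, a sketch of how to watch those rates and steer the 
completion interrupt to a different core; the IRQ number 48 is 
hypothetical, so read the real one out of /proc/interrupts first:

    # per-CPU interrupt counts for the HCA; sample twice to get a rate
    grep mlx /proc/interrupts
    # "in" column = interrupts/s, "cs" column = context switches/s
    vmstat 1
    # pin IRQ 48 (hypothetical) to CPU2 via a hex affinity bitmask
    echo 4 > /proc/irq/48/smp_affinity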
>
> ib_srpt processes completions in its event callback handler. With more 
> QPs there are more completions pending per interrupt instead of one 
> completion event per interrupt.
> You can have multiple QPs between the initiator and the target by 
> using different initiator extensions (initiator_ext), i.e.:
> echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=1 > 
> /sys/class/infiniband_srp/.../add_target
> echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=2 > 
> /sys/class/infiniband_srp/.../add_target
> echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=3 > 
> /sys/class/infiniband_srp/.../add_target
This doesn't seem to net much of an improvement, though I understand the 
reasoning behind it. My hunch is that there's another bottleneck to look 
for now.
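
To make the multi-QP recipe above concrete, a sketch with made-up GUIDs; 
on a real system, ibsrpdm -c on the initiator prints the full 
id_ext/ioc_guid/dgid/pkey/service_id string, and the srp-mlx4_0-1 device 
name here is hypothetical:

    TARGET="id_ext=200100A0B81146F1,ioc_guid=00A0B80200402BD5,dgid=...,pkey=ffff,service_id=..."
    for ext in 1 2 3; do
        # each login with a distinct initiator_ext creates a new SCSI
        # host, and with it a separate QP/CQ for completions
        echo "$TARGET,initiator_ext=$ext" \
            > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target
    done

Each echo should show up as a new entry under /sys/class/scsi_host, which 
is an easy way to confirm the extra QPs actually got established.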

Cameron


