[ofa-general] SRP/mlx4 interrupts throttling performance
Cameron Harr
cameron at harr.org
Mon Oct 6 13:11:50 PDT 2008
Vu Pham wrote:
> Cameron Harr wrote:
>> Vu Pham wrote:
>>>
>>>> Alternatively, is there anything in the SCST layer I should tweak. I'm
>>>> still running rev 245 of that code (kinda old, but works with OFED
>>>> 1.3.1
>>>> w/o hacks).
>
> With blockio I get the best performance + stability with scst_threads=1
I got the best performance with threads=2 or 3. I've noticed that the
srpt_thread is often at 99% CPU, though increasing or decreasing the
"thread=?" parameter for ib_srpt doesn't seem to make a difference.
A second initiator doesn't seem to help much either. With a single
initiator writing to two targets, I can now usually get between 95K and
105K IOPs.
>>>>>
>>>>> My target server (with DAS) contains 8 2.8 GHz CPU cores and can
>>>>> sustain over 200K IOPs locally, but only around 73K IOPs over SRP.
>>>
>>> Is this number from one initiator or multiple?
>> One initiator. At first I thought it might be a limitation of the
>> SRP, and added a second initiator, but the aggregate performance of
>> the two was about equal to that of a single initiator.
>
> Try again with scst_threads=1. I expect that you can get ~140K with
> two initiators
>
Unfortunately, I'm nowhere close to that, though I am significantly
higher than before. Two initiators do seem to reduce the context
switching rate, however, which is good.
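
For anyone wanting to reproduce the context-switch numbers, here is a
minimal sketch of one way to estimate context switches per I/O. It
relies on the cumulative "ctxt" counter in /proc/stat; the function
names are made up for illustration, and the I/O count has to come from
whatever benchmark tool is generating the load.

```python
def read_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def switches_per_io(ctxt_before, ctxt_after, ios_completed):
    """Context switches per completed I/O over one sampling window.

    ios_completed is an input taken from the benchmark tool (assumption:
    the tool reports how many I/Os finished in the same window).
    """
    if ios_completed <= 0:
        raise ValueError("ios_completed must be positive")
    return (ctxt_after - ctxt_before) / ios_completed
```

Read /proc/stat once before and once after a benchmark window, pass both
snapshots through read_ctxt(), and divide by the I/Os completed in that
window. The counter is system-wide, so other activity on the box
inflates the result.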
>>>>> Looking at /proc/interrupts, I see that the mlx_core (comp) device
>>>>> is pushing about 135K Int/s on 1 of 2 CPUs. All CPUs are enabled
>>>>> for that PCI-E slot, but it only ever uses 2 of the CPUs, and only
>>>>> 1 at a time. None of the other CPUs has an interrupt rate more
>>>>> than about 40-50K/s.
>>> The number of interrupts can be cut down if there are more
>>> completions to be processed by software per interrupt, i.e. please
>>> test with multiple QPs between one initiator and your target, and
>>> with multiple initiators and your target.
Interrupts are still pretty high (around 160K/s now), but that doesn't
seem to be my bottleneck. I'm seeing about 2-2.5 context switches for
every I/O and sometimes fewer - not perfect, but not horrible either.
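
A rough sketch of how the per-CPU interrupt rate can be derived from two
snapshots of /proc/interrupts. The function names and the filter string
are illustrative assumptions; use whatever name the HCA's completion
vector shows on your system.

```python
def parse_interrupts(text, name_filter):
    """Sum per-CPU counts over /proc/interrupts rows matching name_filter."""
    lines = text.splitlines()
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        _label, _, rest = line.partition(":")
        fields = rest.split()
        counts = fields[:len(cpus)]
        description = " ".join(fields[len(cpus):])
        # Skip short rows (ERR, MIS) and rows without a full set of counts.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        if name_filter not in description:
            continue
        for i, c in enumerate(counts):
            totals[i] += int(c)
    return dict(zip(cpus, totals))


def rates(before, after, seconds):
    """Interrupts per second for each CPU between two snapshots."""
    return {cpu: (after[cpu] - before[cpu]) / seconds for cpu in before}

# Usage idea (the filter "mlx4_core" is an assumption about the IRQ name):
#   b = parse_interrupts(open("/proc/interrupts").read(), "mlx4_core")
#   ... wait N seconds ...
#   a = parse_interrupts(open("/proc/interrupts").read(), "mlx4_core")
#   print(rates(b, a, N))
```

This makes it easy to see whether the completion interrupts stay on one
CPU at a time and whether the rate changes when more QPs are added.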
>
> ib_srpt processes completions in the event callback handler. With more
> QPs there are more completions pending per interrupt instead of one
> completion event per interrupt.
> You can have multiple QPs between an initiator and the target by using
> a different initiator_ext for each login, e.g.
> echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=1 >
> /sys/class/infiniband_srp/.../add_target
> echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=2 >
> /sys/class/infiniband_srp/.../add_target
> echo id_ext=xxx,ioc_guid=yyy,....initiator_ext=3 >
> /sys/class/infiniband_srp/.../add_target
This doesn't seem to net much of an improvement, though I understand the
reasoning behind it. My hunch is there's another bottleneck now to look for.
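
For reference, the multiple-QP logins quoted above can be scripted. This
is only a sketch: the helper names are hypothetical, and both the base
parameter string and the add_target path must be supplied by the caller,
since they are elided in the commands above.

```python
def add_target_lines(base_params, count):
    """Return `count` login strings that differ only in initiator_ext.

    base_params is the caller's own "id_ext=...,ioc_guid=...,..." string
    (not filled in here; it depends on the target).
    """
    return [f"{base_params},initiator_ext={n}" for n in range(1, count + 1)]


def write_targets(add_target_path, lines):
    """Write each login string to the add_target file, one write per login."""
    for line in lines:
        # Reopen per write so each string is a separate write, like echo.
        with open(add_target_path, "w") as f:
            f.write(line + "\n")
```

Each distinct initiator_ext value results in a separate login, and so a
separate QP, between the same initiator and target.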
Cameron