[ofa-general] SRP/mlx4 interrupts throttling performance

Fri Oct 3 11:09:36 PDT 2008

Vu Pham wrote:
>
>> Alternatively, is there anything in the SCST layer I should tweak? I'm
>> still running rev 245 of that code (kinda old, but works with OFED 1.3.1
>> w/o hacks).
>>
>
> What is the mode (pass thru, blockio...)?
blockio
> What is the scst_threads=<xx> parameters?
Default, which I believe is #cpus
>
>>
>>>
>>>
>>> My target server (with DAS) contains 8 2.8 GHz CPU cores and can 
>>> sustain over 200K IOPs locally, but only around 73K IOPs over SRP.
>
> Is this number from one initiator or multiple?
One initiator. At first I thought it might be a limitation of the SRP, 
and added a second initiator, but the aggregate performance of the two 
was about equal to that of a single initiator.
>
>>> Looking at /proc/interrupts, I see that the mlx_core (comp) device 
>>> is pushing about 135K Int/s on 1 of 2 CPUs. All CPUs are enabled for 
>>> that PCI-E slot, but it only ever uses 2 of the CPUs, and only 1 at 
>>> a time. None of the other CPUs has an interrupt rate more than about 
>>> 40-50K/s.
>>>
>
> The number of interrupts can be cut down if there are more completions 
> for software to process per interrupt, i.e. please test with multiple 
> QPs between one initiator and your target, and with multiple initiators 
> against your target.
>
A couple of questions here on my side. How would more QP connections reduce 
interrupts? It seems like they'd still need to come through the same mlx 
device, causing the same number of interrupts, or more. More 
importantly, though, how would one increase the number of QPs between 
an initiator and a target? I did have my ib_srpt thread count turned up; 
would that be comparable?
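
For anyone who wants to reproduce the per-CPU interrupt rates quoted above, here is a minimal Python sketch that diffs two snapshots of /proc/interrupts taken a known interval apart. The sample text, the IRQ number 24, and the device label "example-comp" are made up for illustration; on a real system you would read /proc/interrupts twice with a sleep in between. As a rough sanity check on the numbers in this thread, 135K interrupts/s against 73K IOPs works out to a little under two interrupts per I/O.

```python
def parse_interrupts(text):
    """Map each IRQ label to its list of per-CPU counts.

    `text` is the content of /proc/interrupts. Rows such as ERR or MIS
    carry fewer columns than there are CPUs; they are kept as shorter lists.
    """
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        per_cpu = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the controller/device description columns
            per_cpu.append(int(field))
        counts[label.strip()] = per_cpu
    return counts


def interrupt_rates(before, after, seconds):
    """Per-CPU interrupts per second for every IRQ present in both snapshots."""
    rates = {}
    for irq, new in after.items():
        old = before.get(irq)
        if old is None or len(old) != len(new):
            continue
        rates[irq] = [(n - o) / seconds for n, o in zip(new, old)]
    return rates


# Hypothetical snapshots taken 2 seconds apart (not real measurements).
SNAPSHOT_T0 = """\
           CPU0       CPU1
 24:       1000          0   PCI-MSI-edge      example-comp
ERR:          0
"""
SNAPSHOT_T1 = """\
           CPU0       CPU1
 24:     271000          0   PCI-MSI-edge      example-comp
ERR:          0
"""

sample_rates = interrupt_rates(
    parse_interrupts(SNAPSHOT_T0), parse_interrupts(SNAPSHOT_T1), 2.0
)
for irq, per_cpu in sample_rates.items():
    print(irq, per_cpu)
```

To use it against a live system, replace the two sample strings with the result of reading /proc/interrupts before and after a time.sleep() of the chosen interval.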

>>> Does anyone know of a trick to spread those interrupts out more 
>>> (which I realize might be bad due to context switching), or 
>>> something else that will reduce my interrupts on that cpu? The mlx4 
>>> is a MSI-X interrupt. I've changed it to an APIC int, but it seems 
>>> to give slightly lower performance.
>>>
> There is a userspace daemon, irqbalance, that dynamically directs IRQs 
> to different CPUs. You can define which CPUs CAN handle an IRQ but you 
> cannot control how it is done. You can look at 
> Documentation/IRQ-affinity.txt for details on how to configure it. In 
> some cases I found it better performance-wise to shut irqbalance off, 
> assign the process to one (or more) CPUs, and use a different CPU 
> to serve interrupts.
>
Earlier, I did go over that file, and tried playing around with 
/sys/class/pci_bus/<slot>/cpu_affinity and /proc/irq/<slot>/smp_affinity 
for the pci slot I was using, but didn't have much luck. I also tried 
turning off irqbalance, but that made no difference.
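
Since the smp_affinity files take a hexadecimal CPU bitmask, a small helper can make the values less error-prone to work out by hand. The Python sketch below only converts between CPU lists and masks; it does not write to /proc. The function names are my own, and it assumes the mask format described in Documentation/IRQ-affinity.txt (bit N set means CPU N may service the IRQ, with long masks printed as comma-separated 32-bit groups). Note that the kernel documentation keys that file by IRQ number, i.e. /proc/irq/<IRQ number>/smp_affinity.

```python
def cpus_to_mask(cpus):
    """Return the hex bitmask string for an iterable of CPU numbers."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu  # bit N corresponds to CPU N
    return format(mask, "x")


def mask_to_cpus(mask_hex):
    """Return the sorted list of CPU numbers enabled in a hex mask string.

    Commas separating 32-bit groups (as the kernel prints them) are ignored.
    """
    mask = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


# Examples for an 8-core box like the target described in this thread.
print(cpus_to_mask(range(8)))      # all eight cores
print(cpus_to_mask([2, 3]))        # restrict the IRQ to CPUs 2 and 3
print(mask_to_cpus("00000000,000000ff"))
```

The resulting string is what would be written (as root) to the smp_affinity file of the IRQ in question; whether that actually helps here is a separate matter, given that irqbalance may rewrite it unless it is stopped first.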

Additionally, I found that I can load the newer scst code if I use the 
kernel-supplied modules and the standalone srpt-1.0.0 package that I 
think you provide, Vu. I was about to try it, along with dropping a module 
param for ib_srpt (I was using a thread count of 32 that had given me 
better performance on an earlier test). I'll report back on this.

Thanks for the help,
Cameron