<br><font size=2><tt>Hello Leonid,</tt></font>
<br>
<br><font size=2><tt>Leonid Arsh <leonida@voltaire.com> wrote on
04/23/2006 06:38:00 AM:<br>
> Shirley,<br>
> <br>
> some additional information you may be interested in:<br>
> <br>
> According to our experience with the Voltaire IPoIB driver,<br>
> splitting CQ harmed the throughput (we checked with the iperf
<br>
> application, UDP mode.) Splitting the CQ caused more interrupts,
<br>
> context switches and CQ polls.</tt></font>
<br><font size=2><tt>> Note, the case is rather different from
OpenIB mthca, since Voltaire <br>
> IPoIB is based on the VAPI driver,<br>
> where CQ completions are handled in a tasklet context,<br>
> unlike mthca where CQ completions are handled in the
HW interrupt <br>
> context.<br>
</tt></font>
<br><font size=2><tt>That is expected, because a given tasklet can run
on only one CPU at a time.</tt></font>
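<br><font size=2><tt>As a rough illustration of that serialization rule, here is a minimal userspace C model (not kernel code). The model_tasklet type and the model_* functions are hypothetical; the running flag is only loosely modeled on the kernel's TASKLET_STATE_RUN bit. A second CPU that finds the tasklet already running has to back off and retry later, so one tasklet's work never proceeds in parallel.</tt></font>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of tasklet serialization. The "running"
 * flag loosely mirrors the kernel's TASKLET_STATE_RUN bit: only the CPU
 * that wins the flag may execute the handler. */
struct model_tasklet {
    atomic_flag running;
    void (*func)(unsigned long);
    unsigned long data;
};

static void model_tasklet_init(struct model_tasklet *t,
                               void (*func)(unsigned long), unsigned long data)
{
    atomic_flag_clear(&t->running);   /* start in the "not running" state */
    t->func = func;
    t->data = data;
}

/* Try to claim the tasklet; fails if another CPU is already running it. */
static bool model_tasklet_trylock(struct model_tasklet *t)
{
    return !atomic_flag_test_and_set(&t->running);
}

static void model_tasklet_unlock(struct model_tasklet *t)
{
    atomic_flag_clear(&t->running);
}

/* Returns true if the handler ran, false if it must be rescheduled. */
static bool model_tasklet_run(struct model_tasklet *t)
{
    if (!model_tasklet_trylock(t))
        return false;                 /* busy elsewhere: caller retries later */
    t->func(t->data);
    model_tasklet_unlock(t);
    return true;
}

/* Hypothetical handler: counts "CQ completions" processed. */
static unsigned long completions_handled;

static void count_completion(unsigned long n)
{
    completions_handled += n;
}
```

In the usage below, one "CPU" holds the tasklet via model_tasklet_trylock() while another calls model_tasklet_run() and gets false, which is the serialization that limits a tasklet-based completion path to one CPU at a time.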
<br><font size=2><tt>Have you tried using a different softirq instead of TASKLET_SOFTIRQ?</tt></font>
<br><font size=2><tt>I expect the performance would be better, since
the same softirq can run on multiple CPUs</tt></font>
<br><font size=2><tt>simultaneously. If it is a simple
change to your code, could you please try it?</tt></font>
<br>
<br><font size=2><tt>I am thinking of splitting mthca CQ completion handling
between HW interrupt context and softirq context.</tt></font>
<br><font size=2><tt><br>
> NAPI gave us some improvement. I think NAPI should improve
much more <br>
> in mthca, with the HW interrupt CQ completions.</tt></font>
<br><font size=2 face="sans-serif">Yes, because hardware interrupts are
disabled while polling.<br>
</font>
<br><font size=2 face="sans-serif">It would be interesting to compare
CQ completion handling under NAPI with handling in softirq context.</font>
<br>
<br><font size=2 face="sans-serif">It all depends on how you implement
NAPI. If you only implement NAPI without </font>
<br><font size=2 face="sans-serif">changing the sender, NAPI might not
perform better than softirq.</font>
<br><font size=2 face="sans-serif">The benefit of NAPI is that only one dev->poll
runs at a time across all CPUs, which </font>
<br><font size=2 face="sans-serif">completely prevents out-of-order packets.</font>
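<br><font size=2 face="sans-serif">A minimal userspace C sketch of NAPI-style completion handling, assuming a single CQ consumer. All model_* names are hypothetical and this is not the mthca or IPoIB code: the "interrupt handler" only masks further CQ interrupts and schedules a poll (loosely analogous to netif_rx_schedule()), and the poll routine drains at most budget completions per call, re-arming interrupts only once the CQ is empty. With one consumer, completions are delivered in the order they were posted.</font>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CQ_SIZE 64   /* assumption: callers never post more than this */

/* Hypothetical userspace model of a NAPI-style receive path. */
struct model_cq {
    unsigned int entries[MODEL_CQ_SIZE]; /* packet sequence numbers */
    size_t head, tail;                   /* consume at head, produce at tail */
    bool irq_enabled;                    /* CQ event interrupt armed? */
    bool poll_scheduled;                 /* a single poll is pending */
    unsigned int irqs;                   /* interrupts taken */
    unsigned int last_seq;               /* last sequence delivered upward */
    unsigned int delivered;
    bool out_of_order;
};

static void model_cq_init(struct model_cq *cq)
{
    *cq = (struct model_cq){0};
    cq->irq_enabled = true;
}

/* Hard-interrupt part: mask further CQ interrupts and defer the work. */
static void model_irq_handler(struct model_cq *cq)
{
    cq->irqs++;
    cq->irq_enabled = false;
    cq->poll_scheduled = true;
}

/* "Hardware" posts a completion; interrupts only while they are armed. */
static void model_cq_post(struct model_cq *cq, unsigned int seq)
{
    cq->entries[cq->tail % MODEL_CQ_SIZE] = seq;
    cq->tail++;
    if (cq->irq_enabled)
        model_irq_handler(cq);
}

/* Deferred part: deliver at most `budget` completions in order. Returns
 * true when the CQ is drained and interrupts are re-armed, false if more
 * work remains and the poll stays scheduled. */
static bool model_poll(struct model_cq *cq, unsigned int budget)
{
    unsigned int done = 0;

    if (!cq->poll_scheduled)
        return true;                     /* nothing scheduled */
    while (done < budget && cq->head != cq->tail) {
        unsigned int seq = cq->entries[cq->head % MODEL_CQ_SIZE];
        cq->head++;
        if (cq->delivered > 0 && seq != cq->last_seq + 1)
            cq->out_of_order = true;
        cq->last_seq = seq;
        cq->delivered++;
        done++;
    }
    if (cq->head == cq->tail) {
        cq->poll_scheduled = false;      /* poll complete */
        cq->irq_enabled = true;          /* re-arm the CQ interrupt */
        return true;
    }
    return false;
}
```

In this model a burst of ten completions costs one interrupt, and they are drained over several budget-limited polls. The model ignores the race in which a completion arrives after the last poll but before interrupts are re-armed; a real driver has to re-check the CQ after re-arming.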
<br>
<br><font size=2 face="sans-serif">Thanks<br>
Shirley Ma<br>
IBM Linux Technology Center<br>
15300 SW Koll Parkway<br>
Beaverton, OR 97006-6063<br>
Phone(Fax): (503) 578-7638</font>