<br><font size=2><tt>Roland,</tt></font>

<br>

<br><font size=2><tt>Roland Dreier &lt;rdreier@cisco.com&gt; wrote on 04/17/2006

01:12:38 PM:<br>

> Have you ever seen this hurt performance?  It seems that splitting<br>

> receives and send CQs will increase the number of events generated

and<br>

> possibly use more CPU.<br>

The performance gain was not free; it cost about 3-5% more CPU utilization.

</tt></font>

<br><font size=2><tt>I don't have a comparison of the number of interrupts

at the same throughput.</tt></font>

<br><font size=2><tt><br>

> Actually, do you have some explanation for why this helps performance?<br>

> My intuition would be that it just generates more interrupts for the<br>

> same workload.<br>

The only lock contention I saw in IPoIB is on tx_lock. Separating </tt></font>

<br><font size=2><tt>the completion queue so that send and receive have separate completion handlers

could improve</tt></font>

<br><font size=2><tt>performance. I didn't look at the driver code; it

might have some impact </tt></font>

<br><font size=2><tt>there? </tt></font>

<br>

<br><font size=2><tt>I did see a high interrupt rate, and I had patched IPoIB,

as I mentioned before, to use </tt></font>

<br><font size=2><tt>a different NUM_WC under different workloads. It could

reduce interrupts by a factor of N </tt></font>

<br><font size=2><tt>for the same throughput, and give better throughput

at the same CPU utilization. </tt></font>

<br><font size=2><tt>I am still investigating interrupts/cpu utilization/throughput

issues.</tt></font>
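<br><font size=2><tt>To illustrate why the poll batch size matters, here is a minimal userspace sketch. It only models how many poll calls it takes to drain a queue for a given NUM_WC; it does not model interrupts. The names fake_wc, fake_cq, fake_poll_cq and drain_poll_calls are hypothetical stand-ins, not the verbs API or the actual patch.</tt></font>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a work completion and a completion queue. */
struct fake_wc { int wr_id; };
struct fake_cq { int pending; int poll_calls; };

#define MAX_BATCH 64

/* Return up to num_entries completions, the way a CQ poll call would. */
static int fake_poll_cq(struct fake_cq *cq, int num_entries, struct fake_wc *wc)
{
    int n = cq->pending < num_entries ? cq->pending : num_entries;

    cq->poll_calls++;
    for (int i = 0; i < n; i++)
        wc[i].wr_id = i;
    cq->pending -= n;
    return n;
}

/* Drain the queue in batches of num_wc; return the number of poll calls. */
int drain_poll_calls(int pending, int num_wc)
{
    struct fake_wc wc[MAX_BATCH];
    struct fake_cq cq = { pending, 0 };
    int n;

    assert(num_wc > 0 && num_wc <= MAX_BATCH);
    do {
        n = fake_poll_cq(&cq, num_wc, wc);
        /* A real handler would process wc[0..n-1] here. */
    } while (n == num_wc); /* a full batch means more may be waiting */
    return cq.poll_calls;
}
```

<br><font size=2><tt>In this model, draining 100 completions takes 26 poll calls with a batch of 4 and 4 poll calls with a batch of 32.</tt></font>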

<br><font size=2><tt><br>

> One specific question:<br>

> <br>

>  > -       struct ib_wc ibwc[IPOIB_NUM_WC];<br>

>  > +       struct ib_wc *send_ibwc;<br>

>  > +       struct ib_wc *recv_ibwc;<br>

> <br>

> Why are you changing these to be dynamically allocated outside of

the<br>

> main structure?  Is it to avoid false sharing of cachelines?<br>

Yep, this was one of the reasons.</tt></font>

<br>

<br><font size=2><tt>> It might be better to sort the whole structure

so that we have all the<br>

> common, read-mostly stuff first, then TX stuff (marked with<br>

> ____cacheline_aligned_in_smp) and then RX stuff, also marked to be<br>

> cacheline aligned.<br>

> <br>

>  - R.<br>

</tt></font><font size=2 face="sans-serif">Sure. I will rearrange the structure that way and

rerun the test to see the difference.</font>
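<br><font size=2 face="sans-serif">The layout Roland describes can be sketched in userspace C11 as follows. This is only an illustration: the struct and field names are hypothetical, a 64-byte cache line is assumed, and alignas stands in for the kernel's ____cacheline_aligned_in_smp annotation.</font>

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHELINE 64 /* assumed cache line size */

/* Hypothetical private structure, not the real IPoIB one. */
struct dev_priv_sketch {
    /* Common, read-mostly fields first. */
    int mtu;
    unsigned flags;

    /* TX state starts on its own cache line. */
    alignas(CACHELINE) unsigned tx_head;
    unsigned tx_tail;

    /* RX state starts on another cache line. */
    alignas(CACHELINE) unsigned rx_head;
};

/* True if two offsets within an aligned struct share a cache line. */
int same_cacheline(size_t a, size_t b)
{
    return a / CACHELINE == b / CACHELINE;
}

/* Compile-time checks that the TX and RX groups are line aligned. */
static_assert(offsetof(struct dev_priv_sketch, tx_head) % CACHELINE == 0,
              "TX group must start on a cache line boundary");
static_assert(offsetof(struct dev_priv_sketch, rx_head) % CACHELINE == 0,
              "RX group must start on a cache line boundary");
```

<br><font size=2 face="sans-serif">With this layout the TX and RX fields never share a line, so updates on the send path do not invalidate the line used by the receive path, at the cost of some padding.</font>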

<br>

<br><font size=2 face="sans-serif">Thanks<br>

Shirley Ma<br>

IBM Linux Technology Center<br>

15300 SW Koll Parkway<br>

Beaverton, OR 97006-6063<br>

Phone(Fax): (503) 578-7638</font>