<br><font size=2><tt>"Michael S. Tsirkin" <mst@mellanox.co.il>
wrote on 04/29/2006 06:01:41 PM:<br>
<br>
> Quoting r. Shirley Ma <xma@us.ibm.com>:<br>
> > Subject: Re: [openib-general] Re: Re: [PATCH] IPoIB splitting
CQ,?<br>
> increase both send/recv poll NUM_WC & interval<br>
> > <br>
> > <br>
> > Michael,<br>
> > <br>
> > "Michael S. Tsirkin" <mst@mellanox.co.il> wrote
on 04/29/2006 03:23:51 PM:<br>
> > > Quoting r. Shirley Ma <xma@us.ibm.com>:<br>
> > > > Subject: Re: [openib-general] Re: Re: [PATCH] IPoIB
splitting CQ,?<br>
> > > increase both send/recv poll NUM_WC & interval<br>
> > > ><br>
> > > ><br>
> > > > Michael,<br>
> > > ><br>
> > > > smp kernel on UP result is very bad. It dropped 40%
throughput.<br>
> > > > up kernel on UP thoughput dropped with cpu utilization
dropped<br>
> > > from 75% idle to 52% idle.<br>
> > ><br>
> > > Hmm. So far it seems the approach only works well on 2 CPUs.<br>
> > <br>
> > Did a clean 2.6.16 uniprocessor kernel build on both sides,<br>
> > + patch1 (splitting CQ & handler)<br>
> > + patch2 (tune CQ polling interval)<br>
> > + patch3 (use work queue in CQ handler)<br>
> > + patch4 (remove tx_ring) (rx_ring removal hasn't done yet)<br>
> > <br>
> > Without tuning, i got 1-3% throughput increase with average 10%<br>
> > cpu utiilzation reduce on netserver side. W/O patches, netperf
side<br>
> > is 100% cpu utilization.<br>
> > <br>
> > The best result I got so far with tunning, 25% throughput increase<br>
> > + 2-5% cpu utilization saving in netperf side.<br>
> <br>
> Is the difference with previous result the tx_ring removal?<br>
</tt></font>
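
For anyone reading along, here is a rough, illustrative sketch (not the
actual patch code) of what patch1 and patch3 amount to at the verbs level:
a separate recv CQ whose completion handler only schedules a work item,
with the actual ib_poll_cq() batching (the NUM_WC tunable) done from
process context. All demo_* names and the NUM_WC values below are made up
for illustration, and the sketch assumes the 2.6.16-era ib_create_cq()
signature and the three-argument INIT_WORK() API.

/*
 * Illustrative sketch only, not the actual patch code.  Assumes the
 * 2.6.16-era ib_create_cq() and 3-argument INIT_WORK() APIs; all
 * demo_* names and NUM_WC values are made up for this example.
 */
#include <linux/err.h>
#include <linux/workqueue.h>
#include <rdma/ib_verbs.h>

#define DEMO_RECV_NUM_WC 32	/* poll batch size, the "NUM_WC" tunable */

struct ipoib_split_demo {
	struct ib_device *ca;
	struct ib_cq *send_cq;		/* patch1: separate send CQ ...   */
	struct ib_cq *recv_cq;		/* ... and separate recv CQ       */
	struct work_struct recv_work;	/* patch3: poll from a work queue */
	struct ib_wc recv_wc[DEMO_RECV_NUM_WC];
};

/* Runs in process context; drains the recv CQ in NUM_WC-sized batches. */
static void demo_recv_work(void *arg)
{
	struct ipoib_split_demo *demo = arg;
	int n;

	do {
		n = ib_poll_cq(demo->recv_cq, DEMO_RECV_NUM_WC, demo->recv_wc);
		/* ... hand each demo->recv_wc[i] to the netdev rx path ... */
	} while (n == DEMO_RECV_NUM_WC);

	/* Re-arm the CQ once the backlog is drained (simplified; the real
	 * driver has to deal with completions racing the re-arm). */
	ib_req_notify_cq(demo->recv_cq, IB_CQ_NEXT_COMP);
}

/* Interrupt-context completion handler: just kick the work queue. */
static void demo_recv_completion(struct ib_cq *cq, void *cq_context)
{
	struct ipoib_split_demo *demo = cq_context;

	schedule_work(&demo->recv_work);
}

static int demo_create_cqs(struct ipoib_split_demo *demo, int sendq, int recvq)
{
	INIT_WORK(&demo->recv_work, demo_recv_work, demo);

	/* 2.6.16-era signature: (device, comp_handler, event_handler,
	 * context, cqe); later kernels take extra arguments. */
	demo->recv_cq = ib_create_cq(demo->ca, demo_recv_completion, NULL,
				     demo, recvq);
	if (IS_ERR(demo->recv_cq))
		return PTR_ERR(demo->recv_cq);

	/* Send completions are reaped from the send path in this sketch,
	 * so no completion handler is armed on the send CQ. */
	demo->send_cq = ib_create_cq(demo->ca, NULL, NULL, demo, sendq);
	if (IS_ERR(demo->send_cq)) {
		ib_destroy_cq(demo->recv_cq);
		return PTR_ERR(demo->send_cq);
	}

	ib_req_notify_cq(demo->recv_cq, IB_CQ_NEXT_COMP);
	return 0;
}

Deferring the poll loop to a work queue is what introduces the extra
context switch Michael asks about with respect to latency below.
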
The previous comparison test was based on one node being UP with 4x mthca
and the other being SMP with 12x ehca, without tx_ring removal, since one
of my machines was dead.

The poor result bothered me, so I fixed the other node. This time I made
a clean UP kernel build, used mthca on both the netperf and netserver
sides, and reran the test W/O the above patches.

> > > > I didn't see a latency difference. I used the TCP_RR test.
> > >
> > > This is somewhat surprising, isn't it? One would expect the extra
> > > context switch to have some effect on latency, would one not?
> >
> > I got around a 4% latency decrease on UP with less CPU utilization.
>
> You mean, latency actually got better? If so, that is surprising.
>
> --
> MST

Sorry, I should have said that latency increased around 4% with all of
these patches, with less CPU utilization.
<br><font size=2 face="sans-serif"><br>
Thanks</font>
<br><font size=2 face="sans-serif">Shirley Ma<br>
IBM Linux Technology Center<br>
15300 SW Koll Parkway<br>
Beaverton, OR 97006-6063<br>
Phone(Fax): (503) 578-7638<br>
<br>
</font>