<br><font size=2><tt>"Michael S. Tsirkin" &lt;mst@mellanox.co.il&gt;

wrote on 04/29/2006 06:01:41 PM:<br>

<br>

> Quoting r. Shirley Ma &lt;xma@us.ibm.com&gt;:<br>

> > Subject: Re: [openib-general] Re: Re: [PATCH] IPoIB splitting

CQ,?<br>

> increase both send/recv poll NUM_WC & interval<br>

> > <br>

> > <br>

> > Michael,<br>

> > <br>

> > "Michael S. Tsirkin" &lt;mst@mellanox.co.il&gt; wrote

on 04/29/2006 03:23:51 PM:<br>

> > > Quoting r. Shirley Ma &lt;xma@us.ibm.com&gt;:<br>

> > > > Subject: Re: [openib-general] Re: Re: [PATCH] IPoIB

splitting CQ,?<br>

> > > increase both send/recv poll NUM_WC & interval<br>

> > > ><br>

> > > ><br>

> > > > Michael,<br>

> > > ><br>

> > > > smp kernel on UP result is very bad. It dropped 40%

throughput.<br>

> > > > up kernel on UP throughput dropped with cpu utilization

dropped<br>

> > > from 75% idle to 52% idle.<br>

> > ><br>

> > > Hmm. So far it seems the approach only works well on 2 CPUs.<br>

> > <br>

> > Did a clean 2.6.16 uniprocessor kernel build on both sides,<br>

> > + patch1 (splitting CQ & handler)<br>

> > + patch2 (tune CQ polling interval)<br>

> > + patch3 (use work queue in CQ handler)<br>

> > + patch4 (remove tx_ring) (rx_ring removal hasn't been done yet)<br>

> > <br>

> > Without tuning, I got 1-3% throughput increase with average 10%<br>

> > cpu utilization reduction on netserver side. W/O patches, netperf

side<br>

> > is 100% cpu utilization.<br>

> > <br>

> > The best result I got so far with tuning, 25% throughput increase<br>

> > + 2-5% cpu utilization saving in netperf side.<br>

> <br>

> Is the difference with previous result the tx_ring removal?<br>

</tt></font>

<br><font size=2><tt>The previous comparison test was based on one node

UP with 4x mthca,</tt></font>

<br><font size=2><tt>one node SMP with 12x ehca without tx_ring removal

since one of my machines</tt></font>

<br><font size=2><tt>was dead.</tt></font>

<br>

<br><font size=2><tt>The poor result bothered me. So I fixed the other

node.</tt></font>

<br><font size=2><tt>This time I made a clean UP kernel build, used mthca

on both netperf</tt></font>

<br><font size=2><tt>and netserver, and reran the test W/O the above patches.</tt></font>

<br><font size=2><tt><br>

> > > > I didn't see latency difference. I used TCP_RR test.<br>

> > ><br>

> > > This is somewhat surprising, isn't it? One would expect

the extra<br>

> > > context switch to have some effect on latency, would one

not?<br>

> > <br>

> > I got around 4% latency decrease on UP with less cpu utilization.<br>

> <br>

> You mean, latency actually got better? If so, that is surprising.<br>

> <br>

> -- <br>

> MST</tt></font>

<br>

<br><font size=2 face="sans-serif">Sorry, I should have said latency

increased by around 4% with all</font>

<br><font size=2 face="sans-serif">of these patches, along with lower CPU utilization.</font>

<br><font size=2 face="sans-serif"><br>

Thanks</font>

<br><font size=2 face="sans-serif">Shirley Ma<br>

IBM Linux Technology Center<br>

15300 SW Koll Parkway<br>

Beaverton, OR 97006-6063<br>

Phone(Fax): (503) 578-7638<br>

<br>

</font>
