[ofw] IPoIB-CM MTU and bandwidth scaling issue - TCP only, UDP OK.

Tzachi Dar tzachid at mellanox.co.il
Tue Nov 23 23:56:55 PST 2010


Ntttcp can be downloaded from here:

http://www.microsoft.com/whdc/device/network/tcp_tool.mspx
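
(Exact command-line syntax varies between ntttcp releases; with a recent
build, a run looks roughly like the following, where <receiver-ip> is the
receiver's address and -m maps sessions to CPUs and the target address:

    receiver:  ntttcp.exe -r -m 8,*,<receiver-ip>
    sender:    ntttcp.exe -s -m 8,*,<receiver-ip> -t 15

Consult your version's own help output for the flags it actually supports.)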

> -----Original Message-----
> From: Smith, Stan [mailto:stan.smith at intel.com]
> Sent: Wednesday, November 24, 2010 3:13 AM
> To: Tzachi Dar; Alex Naslednikov
> Cc: ofw at lists.openfabrics.org
> Subject: RE: IPoIB-CM MTU and bandwidth scaling issue - TCP only, UDP OK.
> 
> Tzachi Dar wrote:
> > In short, you need to disable Nagle's algorithm when running this
> > test (or use bigger send buffers), or run with ntttcp instead.
> 
> Do you have a Windows version of ntttcp.exe you could share?
> Or source + build env?
> 
> >
> > When Nagle's algorithm is enabled, the sender allows at most one
> > unacknowledged small segment in flight; further data is held back
> > until that segment is ACKed or a full-sized packet can be sent.
> >
> > On the other hand, TCP has a feature known as delayed ACKs: a
> > receiver that gets a packet does not send the ACK until it either
> > has data to piggyback the ACK on or 200 ms have elapsed.
> >
> > In other words, you can get into a buffer combination where the
> > sender is not allowed to send more data until an ACK arrives, but
> > the ACK only comes once every 200 ms.
> >
> > See also:
> > http://www.stuartcheshire.org/papers/NagleDelayedAck/
> > http://en.wikipedia.org/wiki/Nagle's_algorithm
> >
> >
> > Thanks
> > Tzachi
> >
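
A minimal sketch of the setsockopt call behind this advice, assuming a
connected Winsock TCP socket (error handling trimmed; link with
ws2_32.lib):

    #include <winsock2.h>

    /* Disable Nagle's algorithm: small sends go out immediately instead
     * of being held back while an earlier segment is still unACKed. */
    int disable_nagle(SOCKET s)
    {
        BOOL on = TRUE;
        return setsockopt(s, IPPROTO_TCP, TCP_NODELAY,
                          (const char *)&on, sizeof(on));
    }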
> 
> Thanks for the excellent education.
> 
> ttcp -D (sockopt TCP_NODELAY) cures the symptoms... not so sure about
> the underlying problem.
> 
> Jumbo packets tend to embrace the old saying "bigger is not always
> better".
> 
> stan.
> >
> >> -----Original Message-----
> >> From: Smith, Stan [mailto:stan.smith at intel.com]
> >> Sent: Tuesday, November 23, 2010 12:42 AM
> >> To: Alex Naslednikov; Tzachi Dar
> >> Cc: ofw at lists.openfabrics.org
> >> Subject: IPoIB-CM MTU and bandwidth scaling issue - TCP only, UDP OK.
> >>
> >> Hello,
> >>   I've been observing strange TCP bandwidth performance behaviors
> >> with varying IPoIB-CM MTU sizes. For a fixed ttcp/TCP transfer size
> >> (-n 2048 -l 8192), one would expect bandwidth to increase as MTU
> >> size increases.
> >>
> >> Using a heavily debug/checked IPoIB-CM driver on twin Windows 2008 R2
> >> x64 systems with InfiniHost HCAs (MT25208), the observed behavior is:
> >>
> >> When the MTU is > 8KB ... 64KB, TCP performance (ttcp -n 2048 -l
> >> 8192) is around 0.04 MB/sec?
> >>
> >> When the MTU is 2048 ... 8KB, TCP performance scales from 85 to 113
> >> MB/sec.
> >>
> >>  MTU (bytes)            sender's BW (MB/sec)
> >>  --------------------------------------------
> >>   2048 ( 2K)                      85.11
> >>   4096 ( 4K)                     102.56
> >>   8192 ( 8K)                     113.48
> >>   9216 ( 9K)                       0.04
> >>  12288 (12K)                       0.04
> >>  16384 (16K)                       0.04
> >>  65536 (64K)                       0.04
> >>
> >>  65536 (64K, UDP)                128.00
> >>
> >> Here I define Round-Trip Time (RTT) as the time from a TCP
> >> segment/packet send (ib_post_send) until the corresponding TCP ACK
> >> arrives, as observed in Wireshark.
> >>
> >> When the IPoIB-CM MTU is <= 8KB, such that the NDIS send SGL
> >> (scatter/gather list) size is <= 8KB, the RTT is around 90 us
> >> (microseconds).
> >>
> >> When the IPoIB-CM MTU is > 8KB, RTT climbs to around 203 ms
> >> (milliseconds); ouch!
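
(As a consistency check, assuming one 8192-byte send completes per
203 ms stall, 8192 bytes / 0.203 s comes to roughly 40 KB/sec, matching
the 0.04 MB/sec observed above the 8KB MTU.)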
> >>
> >> The same ttcp transfer size using UDP instead of TCP does not suffer
> >> the performance drop @ 8KB.
> >>
> >> Are you aware of any TCP protocol settings/properties which
> >> could/should be adjusted to bring TCP bandwidth performance at the
> >> larger MTUs up to, or beyond, the 8KB MTU BW levels?
> >>
> >> thanks,
> >>
> >> Stan.
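
For completeness on the delayed-ACK side of the question: Windows exposes
a per-interface TcpAckFrequency registry value (documented for Windows XP
and Server 2003 in Microsoft KB 328890, and reportedly honored by later
versions); setting it to 1 disables delayed ACKs. A sketch, with the
interface GUID left as a placeholder:

    reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{INTERFACE-GUID}" /v TcpAckFrequency /t REG_DWORD /d 1 /f

A reboot is typically required before the value takes effect.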



