[ofw] IPoIB-CM MTU and bandwidth scaling issue - TCP only, UDP OK.

Tzachi Dar tzachid at mellanox.co.il
Tue Nov 23 02:11:04 PST 2010


In short, you need to disable Nagle's algorithm when running this test (or use bigger send buffers), or run the test with ntttcp instead of ttcp.
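As a rough illustration of the first suggestion (not what ttcp or ntttcp do internally), here is a minimal Python sketch that sets TCP_NODELAY on a socket to turn Nagle's algorithm off. The helper name make_test_socket and its parameters are made up for this example. Reading "bigger send buffers" as the socket send buffer (SO_SNDBUF) is an assumption, and the OS is free to adjust whatever size is requested.

```python
import socket


def make_test_socket(disable_nagle=True, send_buffer_bytes=None):
    """Create a TCP socket for a bandwidth test (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if disable_nagle:
        # TCP_NODELAY turns Nagle's algorithm off for this socket.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    if send_buffer_bytes is not None:
        # Assumption: "bigger send buffers" means SO_SNDBUF.
        # The OS may round or cap the requested size.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buffer_bytes)
    return sock


if __name__ == "__main__":
    s = make_test_socket(disable_nagle=True, send_buffer_bytes=64 * 1024)
    print("TCP_NODELAY:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    print("SO_SNDBUF:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    s.close()
```

The options have to be set on the sending side before the transfer starts.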

When Nagle's algorithm is enabled, the sender transmits only if it has a complete (full-size) packet to send, or if there is no unacked data in the air. In effect, there is never more than one small unacked packet in the air.
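The rule can be written down as a small decision function. This is a simplified Python sketch of the rule as described above, not the code of any real TCP stack. The name nagle_allows_send and the MSS values in the usage lines are illustrative assumptions.

```python
def nagle_allows_send(pending_bytes, mss, unacked_bytes):
    """Return True if the simplified Nagle rule lets the sender transmit now.

    pending_bytes: data queued by the application but not yet sent
    mss:           maximum segment size (payload of one full packet)
    unacked_bytes: data already sent but not yet acknowledged
    """
    if pending_bytes <= 0:
        return False          # nothing to send
    if pending_bytes >= mss:
        return True           # a complete packet is available
    return unacked_bytes == 0  # a small packet goes out only if nothing is unacked


if __name__ == "__main__":
    # Illustrative numbers only: an 8192-byte write against an assumed MSS.
    print(nagle_allows_send(8192, 4056, 8192))  # MSS below the write size
    print(nagle_allows_send(8192, 9176, 0))     # MSS above it, nothing unacked
    print(nagle_allows_send(8192, 9176, 8192))  # MSS above it, previous write unacked
```

With an 8192-byte write, the third case (write smaller than one full packet while an earlier write is still unacked) is the one that has to wait for an ACK.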

On the other hand, there is a TCP feature known as delayed ACKs. With delayed ACKs, a receiver that gets a packet does not send an ACK until either the ACK can be carried along with outgoing data or 200 ms have elapsed.

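In other words, you get into a combination of buffer sizes that does not allow you to send more data until an ACK arrives, but the ACK only comes once every 200 ms. That matches the roughly 203 ms RTT you measured once the MTU is above 8KB.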
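A back-of-the-envelope check, under the assumption that the stall lets exactly one 8192-byte ttcp write (-l 8192) through per delayed-ACK interval, taking the 203 ms RTT from the original report as that interval. The function name is made up for this sketch.

```python
def stalled_throughput_mb_per_s(write_bytes, ack_interval_s):
    """Throughput in MB/sec if one write completes per ACK interval."""
    return write_bytes / ack_interval_s / 1e6


if __name__ == "__main__":
    # 8192 bytes per write (ttcp -l 8192), one write per ~203 ms.
    print(round(stalled_throughput_mb_per_s(8192, 0.203), 2))  # 0.04
```

The result, about 0.04 MB/sec, is the same figure reported for every MTU above 8KB.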

See also:
http://www.stuartcheshire.org/papers/NagleDelayedAck/
http://en.wikipedia.org/wiki/Nagle's_algorithm 


Thanks
Tzachi






> -----Original Message-----
> From: Smith, Stan [mailto:stan.smith at intel.com]
> Sent: Tuesday, November 23, 2010 12:42 AM
> To: Alex Naslednikov; Tzachi Dar
> Cc: ofw at lists.openfabrics.org
> Subject: IPoIB-CM MTU and bandwidth scaling issue - TCP only, UDP OK.
> 
> Hello,
>   I've been observing strange TCP bandwidth performance behaviors with
> varying IPoIB-CM MTU sizes.
> For a fixed ttcp/TCP transfer size (-n 2048 -l 8192), one would expect
> bandwidth to increase as MTU size increases.
> 
> Using a heavily debug/checked IPoIB-CM driver on twin Windows 2008 R2
> x64 systems with Infinihost HCAs (MT25208), the observed behavior is:
> 
> When the MTU is above 8KB (up to 64KB), TCP performance
> (ttcp -n 2048 -l 8192) is only around 0.04 MB/sec?
> 
> When the MTU is 2048 ... 8KB, TCP performance scales from 85 to 113
> MB/sec.
> 
>  MTU (bytes)           sender's BW (MB/sec)
>  ------------------------------------------
>  2048   2K                        85.11
>  4096   4K                       102.56
>  8192   8K                       113.48
>  9216   9K                         0.04
>  12288 12K                         0.04
>  16384 16K                         0.04
>  65536 64K                         0.04
> 
>  65536 64K (UDP)                 128.00
> 
> My local definition of Round-Trip Time (RTT) is the time measured from
> a TCP segment/packet send (ib_post_send) to the arrival of the TCP ACK,
> as viewed from Wireshark.
> 
> When the IPoIB-CM MTU is <= 8KB, such that the NDIS send SGL (scatter
> gather list) size is <= 8KB, RTT is around 90 us (microseconds).
> 
> When the IPoIB-CM MTU is > 8KB, RTT climbs to around 203 ms
> (milliseconds); ouch!
> 
> The same ttcp transfer size using UDP instead of TCP does not suffer
> the performance drop above 8KB.
> 
> Are you aware of any TCP protocol settings/properties which
> could/should be adjusted to bring TCP bandwidth performance for MTUs
> above 8KB up to, or beyond, the 8KB MTU levels?
> 
> thanks,
> 
> Stan.
> 
> 
> 



