[ofw] IPoIB-CM MTU and bandwidth scaling issue - TCP only, UDP OK.
Smith, Stan
stan.smith at intel.com
Tue Nov 23 17:12:55 PST 2010
Tzachi Dar wrote:
> In short you need to disable Nagle's algorithm when running this test
> (or use bigger send buffers), or run with ntttcp.
Do you have a Windows version of ntttcp.exe you could share?
Or the source plus a build environment?
>
> When Nagle is enabled, a segment is sent only if there is no unacked
> data in flight, or if there is a complete (full-size) packet to send.
>
> On the other hand, there is a TCP property known as delayed ACKs.
> Delayed ACKs mean that a receiver does not send an ACK for a packet
> until either the ACK can carry data or 200 ms have elapsed.
>
> In other words, you get into a buffer combination that doesn't allow
> you to send more data until there is an ACK, but the ACK only comes
> once every 200 ms.
>
> See also:
> http://www.stuartcheshire.org/papers/NagleDelayedAck/
> http://en.wikipedia.org/wiki/Nagle's_algorithm
>
>
> Thanks
> Tzachi
>
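The interaction described above can be sketched as a simplified send rule. This is a textbook model of Nagle's algorithm, not the actual Windows TCP stack; the function name `nagle_allows_send` and the MSS values (MTU minus 40 bytes of IPv4 and TCP headers, no options) are assumptions made for illustration.

```python
def nagle_allows_send(pending_bytes, mss, unacked_bytes):
    """Simplified Nagle rule: send now if a full segment is queued,
    or if nothing previously sent is still waiting for an ACK."""
    return pending_bytes >= mss or unacked_bytes == 0


WRITE_SIZE = 8192  # bytes per application write (ttcp -l 8192)

for mtu in (8192, 9216):
    mss = mtu - 40  # assumed: 20-byte IPv4 header + 20-byte TCP header
    first = nagle_allows_send(WRITE_SIZE, mss, unacked_bytes=0)
    # After the first write, at most one MSS is in flight and un-ACKed.
    in_flight = min(WRITE_SIZE, mss)
    leftover = WRITE_SIZE - in_flight
    second = nagle_allows_send(leftover + WRITE_SIZE, mss,
                               unacked_bytes=in_flight)
    print(f"MTU {mtu}: first write sent={first}, second write sent={second}")
```

In this model, once the MSS exceeds the 8192-byte write size, the second write is held back until the first is ACKed, and a receiver using delayed ACKs may withhold that ACK for roughly 200 ms.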
Thanks for the excellent education.
ttcp -D (sockopt TCP_NODELAY) cures the symptoms... not so sure about the problem.
Jumbo packets tend to embrace the old saying "bigger is not always better".
stan.
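For reference, a minimal sketch of what setting TCP_NODELAY looks like at the socket level. It uses Python's standard `socket` module rather than ttcp's own source, and only sets and reads back the option on an unconnected socket; it is an illustration, not the ttcp implementation.

```python
import socket

# Create a TCP socket; no connection is needed to set the option.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm so small writes are sent without
# waiting for outstanding data to be ACKed.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm it took effect (nonzero means enabled).
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY enabled:", bool(nodelay))

sock.close()
```

The option has to be set on the sending side, since that is where Nagle's algorithm holds back small segments.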
>
>> -----Original Message-----
>> From: Smith, Stan [mailto:stan.smith at intel.com]
>> Sent: Tuesday, November 23, 2010 12:42 AM
>> To: Alex Naslednikov; Tzachi Dar
>> Cc: ofw at lists.openfabrics.org
>> Subject: IPoIB-CM MTU and bandwidth scaling issue - TCP only, UDP OK.
>>
>> Hello,
>> I've been observing strange TCP bandwidth performance behaviors
>> with varying IPoIB-CM MTU sizes. For a fixed ttcp/TCP transfer size
>> (-n 2048 -l 8192), one would expect bandwidth to increase as MTU
>> size increases.
>>
>> Using a heavily debug/checked IPoIB-CM driver on twin Windows 2008 R2
>> x64 systems with Infinihost HCAs (MT25208), the observed behavior is:
>>
>> When the MTU is > 8KB (up to 64KB), TCP performance (ttcp -n 2048 -l
>> 8192) is around 0.04 MB/sec.
>>
>> When the MTU is between 2KB and 8KB, TCP performance scales from 85
>> to 113 MB/sec.
>>
>> MTU (bytes)          Sender's BW (MB/sec)
>> -----------------------------------------
>>  2048   2K                  85.11
>>  4096   4K                 102.56
>>  8192   8K                 113.48
>>  9216   9K                   0.04
>> 12288  12K                   0.04
>> 16384  16K                   0.04
>> 65536  64K                   0.04
>>
>> 65536  64K (UDP)           128.00
>>
>> I define Round-Trip Time (RTT) here as the time measured from a TCP
>> segment/packet send (ib_post_send) to the arrival of the TCP ACK,
>> as viewed in Wireshark.
>>
>> When the IPoIB-CM MTU is <= 8KB, such that the NDIS send SGL
>> (scatter-gather list) size is <= 8KB, RTT is around 90 us (microseconds).
>>
>> When the IPoIB-CM MTU is > 8KB, RTT climbs to around 203 ms
>> (milliseconds); ouch!
>>
>> The same ttcp transfer size using UDP instead of TCP does not suffer
>> the performance drop @ 8KB.
>>
>> Are you aware of any TCP protocol settings/properties which
>> could/should be adjusted to bring TCP bandwidth performance for
>> MTUs > 8KB up to or above the 8KB-MTU bandwidth levels?
>>
>> thanks,
>>
>> Stan.
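As a rough consistency check on the numbers in the quoted message: if each 8192-byte write has to wait one stalled round trip of about 203 ms before the next can go out, the resulting throughput is close to the observed 0.04 MB/sec. The one-write-per-RTT assumption is a simplification for this estimate, not something measured in the thread.

```python
WRITE_SIZE = 8192    # bytes per ttcp write (-l 8192)
STALLED_RTT = 0.203  # seconds; RTT reported for MTU > 8KB

# Assumed: exactly one write completes per stalled round trip.
bytes_per_sec = WRITE_SIZE / STALLED_RTT
mb_per_sec = bytes_per_sec / 1e6

print(f"{mb_per_sec:.2f} MB/sec")  # prints "0.04 MB/sec"
```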