[ofw] IPoIB-CM MTU and bandwidth scaling issue - TCP only, UDP OK.

Smith, Stan stan.smith at intel.com
Tue Nov 23 17:12:55 PST 2010


Tzachi Dar wrote:
> In short, you need to disable Nagle's algorithm when running this test
> (or use bigger send buffers), or run the test with ntttcp.

Do you have a Windows version of ntttcp.exe you could share?
Or the source and a build environment?

>
> When Nagle is enabled, a small packet is sent only if no earlier packet
> is still unacknowledged; otherwise data is held until there is a
> complete (full-sized) packet to send.
>
> On the other hand, there is a TCP behavior known as delayed ACK.
> With delayed ACK, a receiver that gets a packet does not send an ACK
> until either the ACK can carry data or 200 ms have elapsed.
>
> In other words, you get into a combination of buffering rules that
> doesn't let you send more data until there is an ACK, but the ACK only
> arrives once every 200 ms.
>
> See also:
> http://www.stuartcheshire.org/papers/NagleDelayedAck/
> http://en.wikipedia.org/wiki/Nagle's_algorithm
>
>
> Thanks
> Tzachi
>
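A minimal sketch of the two rules described above, in Python, to make their interaction concrete. It is a simplified model, not a real TCP stack: the function names are hypothetical, the "ACK immediately once a second segment is waiting" behavior is the common RFC 1122 recommendation (real stacks vary), and MSS = MTU - 40 assumes IPv4 and TCP headers without options.

```python
# Simplified model of Nagle's algorithm and delayed ACK; not a real TCP stack.

DELAYED_ACK_TIMEOUT_S = 0.200  # the 200 ms delayed-ACK timer quoted above


def nagle_allows_send(segment_len: int, mss: int, unacked_bytes: int) -> bool:
    """Nagle's rule: a full-sized segment may always be sent; a smaller
    segment may be sent only when no sent data is still unacknowledged."""
    return segment_len >= mss or unacked_bytes == 0


def ack_delay_s(unacked_segments: int, has_reply_data: bool) -> float:
    """Simplified delayed-ACK rule: ACK at once if the ACK can ride on reply
    data or a second segment is waiting; otherwise wait for the timer."""
    if has_reply_data or unacked_segments >= 2:
        return 0.0
    return DELAYED_ACK_TIMEOUT_S


# Hypothetical MTU 9216 case; assumes MSS = MTU - 40 (IPv4 + TCP, no options).
MSS = 9216 - 40
WRITE = 8192  # ttcp -l 8192: every write is smaller than one MSS

# The first 8 KB write goes out because nothing is outstanding.
assert nagle_allows_send(WRITE, MSS, unacked_bytes=0)
# The second write is held by Nagle until the first one is ACKed...
blocked = not nagle_allows_send(WRITE, MSS, unacked_bytes=WRITE)
# ...while the receiver, holding one segment and no reply data, delays its ACK.
stall = ack_delay_s(unacked_segments=1, has_reply_data=False)
print(blocked, stall)  # True 0.2
```

With TCP_NODELAY set, the second write is sent without waiting, the receiver then holds two segments, and `ack_delay_s(2, False)` returns 0.0 in this model, which is consistent with `ttcp -D` curing the stall.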

Thanks for the excellent education.

ttcp -D (socket option TCP_NODELAY) cures the symptoms... I'm not so sure it cures the problem.
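For reference, a sketch of what `ttcp -D` does at the socket level, together with the larger-send-buffer alternative, using Python's standard `socket` module. The helper name `make_ttcp_style_socket` and the 256 KB buffer size are arbitrary choices for illustration, not values from this thread.

```python
import socket
from typing import Optional


def make_ttcp_style_socket(nodelay: bool = True,
                           sndbuf: Optional[int] = None) -> socket.socket:
    """Hypothetical helper: create a TCP socket with the workarounds discussed."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if nodelay:
        # TCP_NODELAY disables Nagle's algorithm; this is the option ttcp -D sets.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    if sndbuf is not None:
        # The other workaround mentioned: a larger socket send buffer.
        # The OS may round or cap the value it actually applies.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    return s


sock = make_ttcp_style_socket(nodelay=True, sndbuf=256 * 1024)
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)  # True
sock.close()
```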

Jumbo packets tend to confirm the old saying "bigger is not always better".

stan.
>
>> -----Original Message-----
>> From: Smith, Stan [mailto:stan.smith at intel.com]
>> Sent: Tuesday, November 23, 2010 12:42 AM
>> To: Alex Naslednikov; Tzachi Dar
>> Cc: ofw at lists.openfabrics.org
>> Subject: IPoIB-CM MTU and bandwidth scaling issue - TCP only, UDP OK.
>>
>> Hello,
>>   I've been observing strange TCP bandwidth behavior with varying
>> IPoIB-CM MTU sizes. For a fixed ttcp/TCP transfer size
>> (-n 2048 -l 8192, i.e. 2048 writes of 8192 bytes, 16 MiB in total),
>> one would expect bandwidth to increase as the MTU size increases.
>>
>> Using a debug-heavy checked build of the IPoIB-CM driver on twin
>> Windows Server 2008 R2 x64 systems with InfiniHost HCAs (MT25208),
>> I observe the following:
>>
>> When the MTU is greater than 8KB (up to 64KB), TCP performance
>> (ttcp -n 2048 -l 8192) is only around 0.04 MB/sec.
>>
>> When the MTU is between 2048 bytes and 8KB, TCP performance scales
>> from 85 to 113 MB/sec.
>>
>>  MTU (bytes)  MTU   Protocol  Sender's BW (MB/sec)
>>  -----------  ----  --------  --------------------
>>         2048    2K  TCP                      85.11
>>         4096    4K  TCP                     102.56
>>         8192    8K  TCP                     113.48
>>         9216    9K  TCP                       0.04
>>        12288   12K  TCP                       0.04
>>        16384   16K  TCP                       0.04
>>        65536   64K  TCP                       0.04
>>
>>        65536   64K  UDP                     128.00
>>
>> Here I define Round-Trip Time (RTT) as the time from a TCP
>> segment/packet send (ib_post_send) until the corresponding TCP ACK
>> arrives, as seen in Wireshark.
>>
>> When the IPoIB-CM MTU is <= 8KB, so that the NDIS send SGL
>> (scatter-gather list) is <= 8KB, RTT is around 90 us (microseconds).
>>
>> When the IPoIB-CM MTU is > 8KB, RTT climbs to around 203 ms
>> (milliseconds); ouch!
>>
>> The same ttcp transfer size using UDP instead of TCP does not suffer
>> the performance drop above 8KB.
>>
>> Are you aware of any TCP protocol settings/properties that
>> could/should be adjusted to bring TCP bandwidth at MTUs above 8KB up
>> to (or beyond) the levels seen with an 8KB MTU?
>>
>> thanks,
>>
>> Stan.
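A quick back-of-the-envelope check in Python, assuming each 8 KB write effectively waits one ~203 ms round trip before the next can go out (the stop-and-wait pattern Tzachi describes). Under that assumption the estimate lands on the 0.04 MB/sec reported for MTUs above 8KB; MB is taken as 2**20 bytes here, and using 10**6 gives the same rounded figure.

```python
# Stop-and-wait estimate: one ttcp write delivered per observed RTT.
WRITE_BYTES = 8192   # ttcp -l 8192
NUM_WRITES = 2048    # ttcp -n 2048
RTT_S = 0.203        # RTT observed when the IPoIB-CM MTU is > 8KB

bytes_per_sec = WRITE_BYTES / RTT_S
mb_per_sec = bytes_per_sec / 2**20   # MB taken as 2**20 bytes
total_s = NUM_WRITES * RTT_S         # time for the whole 16 MiB transfer

print(f"{mb_per_sec:.2f} MB/sec, {total_s:.0f} s for the whole transfer")
# prints "0.04 MB/sec, 416 s for the whole transfer"
```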



