[openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices
Rimmer, Todd
trimmer at silverstorm.com
Mon Sep 18 07:09:00 PDT 2006
> From: Or Gerlitz
> Sent: Monday, September 18, 2006 5:45 AM
> To: Michael S. Tsirkin
> Cc: OPENIB
> Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU
> for MT23108 devices
>
> Michael S. Tsirkin wrote:
> > Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
>
> >> Eitan Zahavi wrote:
> >>> The following patch solves an issue with OpenSM preferring largest
> >>> MTU for PathRecord/MultiPathRecord for paths going to or from
> >>> MT23108 (Tavor) devices instead of using a 1K MTU which is best
> >>> for this device.
>
> >> Doesn't the 2K MTU issue with Tavor come into play only under RC QP?
>
> > I don't think so, no. Tavor supports 2K MTU, but it has better
> > performance with 1K MTU than 2K MTU. QP type should not matter.
>
> Can you double check that please? As far as I know there is something
> like a 40-50% BW drop with Tavor/RC/2048 vs Tavor/RC/1024, but the BW
> with Tavor/UD/2048 is **no less** than with Tavor/UD/1024.
>
> So it's very common for IPoIB net device implementations to expose a
> 2044- or 1500-byte MTU to the OS, e.g. to cope with Ethernet and
> reduce IP fragmentation/reassembly of UDP/TCP traffic.
>
Putting this in the SM alone and making it a fabric-wide setting is
inappropriate. The performance difference depends on application
message size, which can vary per ULP and/or per application. For
example, one MPI application may send mostly large messages while
another may send mostly small messages. The same could be true of
applications for other ULPs such as uDAPL, SDP, etc.
The root issue is that the Tavor HCA has one too few credits to truly
double-buffer at 2K MTU. However, at message sizes > 1K but < 2K, the
2K MTU still performs better.
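To illustrate the per-ULP alternative (a minimal sketch only, not part
of this patch, assuming the standard libibverbs ibv_modify_qp() call;
the helper name and the blanket 1K clamp are hypothetical):

/* Minimal sketch: a ULP that knows it sends mostly large messages
 * clamps the path MTU to 1K when moving its RC QP to RTR, instead
 * of relying on a fabric-wide SM policy.  Error handling trimmed. */
#include <string.h>
#include <stdint.h>
#include <infiniband/verbs.h>

static int move_to_rtr_with_1k_mtu(struct ibv_qp *qp, uint16_t dlid,
                                   uint32_t dest_qpn, uint32_t rq_psn,
                                   uint8_t port)
{
        struct ibv_qp_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.qp_state           = IBV_QPS_RTR;
        attr.path_mtu           = IBV_MTU_1024; /* the clamp */
        attr.dest_qp_num        = dest_qpn;
        attr.rq_psn             = rq_psn;
        attr.max_dest_rd_atomic = 1;
        attr.min_rnr_timer      = 12;
        attr.ah_attr.dlid       = dlid;
        attr.ah_attr.port_num   = port;

        return ibv_modify_qp(qp, &attr,
                             IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU |
                             IBV_QP_DEST_QPN | IBV_QP_RQ_PSN |
                             IBV_QP_MAX_DEST_RD_ATOMIC |
                             IBV_QP_MIN_RNR_TIMER);
}

A real ULP would presumably take the minimum of the SM-returned
PathRecord MTU and its own preferred MTU rather than a hard clamp.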
Here are some MPI bandwidth results (message size in bytes on the
left, bandwidth on the right):

Msg size    Tavor w/ 2K MTU    Tavor w/ 1K MTU
 512         140.394173         140.261964
1024         310.553002         300.789425
1500         407.003858         379.746835
1800         435.538752         416.726957
2048         392.831026         425.227096
4096         417.592991         501.442289
Note that the message sizes shown on the left do not include MPI
headers; hence the actual IB message size is approximately 50 bytes
larger.
So we see that at IB message sizes < 1024 (the MPI 512 message),
performance is the same.
At IB message sizes between 1024 and 2048 (the MPI 1024-1800
messages), performance is best with the 2K MTU.
At IB message sizes > 2048 (the MPI 2048-4096 messages above),
performance is best with the 1K MTU.
At larger IB message sizes (the MPI 4096 message), the 1K MTU
advantage starts to take off, and ultimately at a 128K message size
(not shown) the 50% difference between 1K and 2K MTU reaches its peak.
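For intuition on where the crossover comes from, here is a trivial,
illustrative-only calculation (mine, not from the measurements above)
of IB packets per message at each MTU, using the MPI payloads from the
table plus the ~50 bytes of headers noted above:

/* Illustrative arithmetic only: IB packets per message at a given
 * path MTU, via ceiling division. */
#include <stdio.h>

static unsigned packets_for(unsigned msg_bytes, unsigned mtu)
{
        return (msg_bytes + mtu - 1) / mtu;
}

int main(void)
{
        /* MPI payload sizes from the table plus ~50 bytes of headers */
        unsigned sizes[] = { 562, 1074, 1550, 1850, 2098, 4146 };
        unsigned i;

        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
                printf("%4u bytes: %u pkts @ 1K MTU, %u pkts @ 2K MTU\n",
                       sizes[i], packets_for(sizes[i], 1024),
                       packets_for(sizes[i], 2048));
        return 0;
}

In the 1074-1850 range the 2K MTU carries the whole message in one
packet where the 1K MTU needs two, which matches the 2K advantage
above; from ~2098 bytes up, both MTUs need multiple packets per
message, so the Tavor credit shortfall at 2K MTU dominates and the 1K
MTU wins.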
Todd Rimmer