[openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices

Rimmer, Todd trimmer at silverstorm.com
Mon Sep 18 07:09:00 PDT 2006


> From: Or Gerlitz
> Sent: Monday, September 18, 2006 5:45 AM
> To: Michael S. Tsirkin
> Cc: OPENIB
> Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU
> for MT23108 devices
> 
> Michael S. Tsirkin wrote:
> > Quoting r. Or Gerlitz <ogerlitz at voltaire.com>:
> 
> >> Eitan Zahavi wrote:
> >>> The following patch solves an issue with OpenSM preferring largest
> >>> MTU for PathRecord/MultiPathRecord for paths going to or from
> >>> MT23108 (Tavor) devices instead of using a 1K MTU which is best for
> >>> this device.
> 
> >> Doesn't the 2K MTU issue with Tavor come into play only under RC QP?
> 
> > I don't think so, no. Tavor supports 2K MTU, but it has better
> > performance with 1K MTU than 2K MTU. QP type should not matter.
> 
> Can you double-check that, please? As far as I know there is something
> like a 40-50% BW drop with Tavor/RC/2048 vs Tavor/RC/1024, but the BW
> with Tavor/UD/2048 is **no less** than Tavor/UD/1024.
> 
> So it's very common for IPoIB net device implementations to expose a
> 2044- or 1500-byte MTU to the OS, e.g. to cope with Ethernet and reduce
> IP fragmentation/reassembly of UDP/TCP traffic.
> 

Putting this in the SM alone and making it a fabric-wide setting is
inappropriate.  The performance difference depends on application
message size, which can vary per ULP and per application.  For example,
one MPI application may send mostly large messages while another may
send mostly small messages.  The same could be true of applications for
other ULPs such as uDAPL and SDP.

The root issue is that the Tavor HCA has one credit too few to truly
double-buffer at 2K MTU.  However, at message sizes > 1K but < 2K, the
2K MTU performs better.

Here are some MPI bandwidth results (left column: MPI message size;
right column: measured bandwidth):

Tavor w/ 2K MTU:
512             140.394173
1024            310.553002
1500            407.003858
1800            435.538752
2048            392.831026
4096            417.592991

Tavor w/ 1K MTU:
512             140.261964
1024            300.789425
1500            379.746835
1800            416.726957
2048            425.227096
4096            501.442289

Note that the message sizes shown on the left do not include MPI
headers, so the actual IB message size is approx. 50 bytes larger.
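
As a rough illustration of why the MPI sizes above fall into different
ranges, the sketch below counts how many packets each message needs at
each MTU.  It assumes the MTU is the per-packet payload limit and uses a
flat 50-byte MPI header, which is only the approximation given above.
The function name is made up for this example, and the sketch says
nothing about the credit behaviour of the HCA; it is packet arithmetic
only.

```python
import math

# Assumption: flat 50-byte MPI header, per the approximation in the text.
MPI_HEADER_BYTES = 50


def packets_per_message(mpi_size, mtu):
    """Packets needed for one MPI message of mpi_size bytes at a given MTU.

    Hypothetical helper: treats the MTU as the per-packet payload limit.
    """
    ib_size = mpi_size + MPI_HEADER_BYTES
    return math.ceil(ib_size / mtu)


for size in (512, 1024, 1500, 1800, 2048, 4096):
    print(size, packets_per_message(size, 1024), packets_per_message(size, 2048))
```

Under these assumptions the MPI 1024-1800 messages fit in one packet at
2K MTU but need two at 1K MTU, while the MPI 2048 message no longer fits
in a single 2K packet.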

So we see that at IB message sizes < 1024 (MPI 512 message), performance
is the same.
At IB message sizes > 1024 and < 2048 (MPI 1024-1800 messages),
performance is best with 2K MTU.
At IB message sizes > 2048 (MPI 2048-4096 messages above), performance
is best with 1K MTU.
At larger IB message sizes (MPI 4096 message), the advantage of 1K MTU
starts to grow, and ultimately at 128K message size (not shown) the
difference between 1K and 2K MTU peaks at about 50%.
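
For anyone who wants to check the comparison, here is a minimal sketch
that computes the relative difference per message size from the figures
quoted above.  The function names and the 1% threshold for calling two
results "the same" are assumptions made for this example, not part of
the measurements.

```python
# Bandwidth figures copied from the results above, keyed by MPI message size.
results_2k = {512: 140.394173, 1024: 310.553002, 1500: 407.003858,
              1800: 435.538752, 2048: 392.831026, 4096: 417.592991}
results_1k = {512: 140.261964, 1024: 300.789425, 1500: 379.746835,
              1800: 416.726957, 2048: 425.227096, 4096: 501.442289}

# Assumption: differences under 1% are treated as "the same".
SAME_THRESHOLD_PERCENT = 1.0


def gain_of_2k_percent(size):
    """Relative bandwidth gain of 2K MTU over 1K MTU, in percent."""
    return (results_2k[size] - results_1k[size]) / results_1k[size] * 100.0


def better_mtu(size):
    """Return '2K', '1K', or 'same' for the given MPI message size."""
    gain = gain_of_2k_percent(size)
    if abs(gain) < SAME_THRESHOLD_PERCENT:
        return "same"
    return "2K" if gain > 0 else "1K"


for size in sorted(results_2k):
    print(f"{size:5d}  {gain_of_2k_percent(size):+7.2f}%  {better_mtu(size)}")
```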

Todd Rimmer



