[openib-general] [PATCH for-2.6.18] IB/cma: option to limit MTU to 1K

Wed Sep 13 14:54:26 PDT 2006

> From: Sean Hefty
> Sent: Wednesday, September 13, 2006 5:23 PM
> To: Michael S. Tsirkin
> Cc: openib-general at openib.org
> Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to
> limit MTU to 1K
> 
> Michael S. Tsirkin wrote:
> >>Although, I don't like the idea of the CMA changing every path to use an
> >>MTU of 1k.
> >
> > Well, that's why it's off by default.
> > So, Ack?
> 
> I'd like to find a way to support a 1k MTU to tavor HCAs without making
> the MTU 1k to other HCAs, in case we're dealing with a heterogeneous
> environment.
> 
> Is this really the responsibility of the querying node or the SA?
> 
> - Sean
> 

The real issue here is how to handle "optimization" tricks for selected
models of HCAs.  While Tavor supports a 2K MTU and works with it, it has
been found to deliver better MPI bandwidth when running with a 1K MTU.
For many other ULPs, no difference in performance is observable (because
many other ULPs don't stress the HCA the way MPI bandwidth benchmarks do).

Another dimension to this problem is that it's not clear what the best
optimization will be in heterogeneous environments, such as a Tavor HCA
talking to a Sinai, Arbel or other type of TCA-based device using a
non-MPI protocol (such as a storage target).  In those environments a 2K
MTU may perform the same (or, depending on the storage target, perhaps
even better).

At this point I would suggest this is a subtle performance issue
specific to MPI, and MPI libraries can appropriately provide options to
tune the maximum MTU that MPI uses or requests (which is only one of
dozens of MPI tunables needed to fine-tune MPI).  MPI environments will
tend to be more homogeneous, which also simplifies the solution.
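For illustration only, here is a minimal sketch of what such an MPI-level tunable might look like. It assumes the MPI library reads a cap from a hypothetical environment variable (EXAMPLE_MPI_MAX_MTU is invented here and is not a real MPI option) and applies that cap to the path MTU returned by the SA before it connects. The MTU codes follow the InfiniBand MTU encoding (256 bytes = 1 through 4096 bytes = 5), the same numbering libibverbs uses for enum ibv_mtu. The helper names are also hypothetical.

```c
#include <stdlib.h>

/* MTU codes following the InfiniBand MTU encoding (the same numbering
 * libibverbs uses for enum ibv_mtu). Names here are illustrative. */
enum example_mtu {
    EX_MTU_256  = 1,
    EX_MTU_512  = 2,
    EX_MTU_1024 = 3,
    EX_MTU_2048 = 4,
    EX_MTU_4096 = 5
};

/* Map a byte count (e.g. 1024) to an MTU code; 0 if it is not a valid IB MTU. */
int mtu_code_from_bytes(long bytes)
{
    switch (bytes) {
    case 256:  return EX_MTU_256;
    case 512:  return EX_MTU_512;
    case 1024: return EX_MTU_1024;
    case 2048: return EX_MTU_2048;
    case 4096: return EX_MTU_4096;
    default:   return 0;
    }
}

/* Hypothetical tunable: read a maximum MTU in bytes from the environment.
 * Returns 0 (no cap) if the variable is unset or not a valid IB MTU. */
int read_max_mtu_tunable(const char *envvar)
{
    const char *s = getenv(envvar);
    char *end;
    long bytes;

    if (s == NULL)
        return 0;
    bytes = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return 0;
    return mtu_code_from_bytes(bytes);
}

/* Only ever lower the SA-reported path MTU; a cap of 0 means "use the path MTU". */
int effective_mtu(int path_mtu, int max_mtu_cap)
{
    if (max_mtu_cap > 0 && max_mtu_cap < path_mtu)
        return max_mtu_cap;
    return path_mtu;
}
```

On a Tavor-only cluster, a user could set EXAMPLE_MPI_MAX_MTU=1024 for an MPI job. A 2K path MTU would then be requested as 1K, while other jobs and other ULPs keep whatever path MTU the SA reports. Because the cap only ever lowers the MTU, setting it never asks an HCA for more than its path supports, and the core stack and SM stay untouched.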

Pushing these types of ULP and source/destination-specific issues into
the core stack or SM will get very complex very quickly.  Given that the
issue on the table (Tavor performance) is specific to an older HCA model,
it may not even be that critical, since the highest-performance customers
have long since moved toward PCIe and DDR fabrics, neither of which is
supported by Tavor.

Todd Rimmer