[openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
Michael S. Tsirkin
mst at mellanox.co.il
Wed Sep 13 15:01:03 PDT 2006
Quoting r. Rimmer, Todd <trimmer at silverstorm.com>:
> Subject: RE: [openib-general] [PATCH for-2.6.18] IB/cma: option to limitMTU to 1K
>
> > From: Sean Hefty
> > Sent: Wednesday, September 13, 2006 5:23 PM
> > To: Michael S. Tsirkin
> > Cc: openib-general at openib.org
> > Subject: Re: [openib-general] [PATCH for-2.6.18] IB/cma: option to
> > limitMTU to 1K
> >
> > Michael S. Tsirkin wrote:
> > >>Although, I don't like the idea of the CMA changing every path to
> > >>use an MTU of 1k.
> > >
> > > Well, that's why it's off by default.
> > > So, Ack?
> >
> > I'd like to find a way to support a 1k MTU to tavor HCAs without
> > making the MTU 1k to other HCAs, in case we're dealing with a
> > heterogeneous environment.
> >
> > Is this really the responsibility of the querying node or the SA?
> >
> > - Sean
> >
>
> The real issue here is how to handle "optimization" tricks for selected
> models of HCAs. While Tavor supports a 2K MTU and works with it, it has
> been found to offer better MPI bandwidth when running 1K MTU. For many
> other ULPs no difference in performance is observable (because many
> other ULPs don't stress the HCA the way MPI bandwidth benchmarks do).
>
> Another dimension to this problem is that it's not clear what the best
> optimization will be in heterogeneous environments, such as a Tavor HCA
> talking to a Sinai, Arbel or other TCA-based device using a non-MPI
> protocol (such as a storage target). In those environments a 2K MTU may
> perform the same (or, depending on the storage target, perhaps even
> better).
If Tavor is involved at either end, 1K MTU is better than 2K MTU.
> At this point I would suggest this is a subtle performance issue
> specific to MPI
This is not specific to MPI. All ULPs experience this issue.
> and MPI libraries can appropriately provide options to
> tune the maximum MTU MPI to use or request (which is only one of dozens
> of MPI tunables needed to fine tune MPI). MPI environments will tend to
> be more homogeneous which also simplifies the solution.
>
> Pushing these types of ULP and source/destination specific issues into
> the core stack or SM will get very complex very quickly.
It's actually relatively simple.
> Given the issue
> on the table (Tavor performance) is specific to an older HCA model, it
> may not even be that critical since the highest performance customers
> have long since moved toward PCIe and DDR fabrics, neither of which are
> supported by Tavor.
All the more reason to put the simple logic in one place
and not expect all applications to optimize for this hardware.
--
MST