[openib-general] [PATCH] IB/ipoib: use appropriate path selector

Hal Rosenstock halr at voltaire.com
Thu Sep 14 04:35:10 PDT 2006


On Thu, 2006-09-14 at 07:14, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > 
> > On Thu, 2006-09-14 at 00:46, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock <halr at voltaire.com>:
> > > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > > > 
> > > > On Wed, 2006-09-13 at 18:08, Michael S. Tsirkin wrote:
> > > > > Quoting r. Roland Dreier <rdreier at cisco.com>:
> > > > > > Subject: Re: [PATCH] IB/ipoib: use appropriate path selector
> > > > > > 
> > > > > >     Michael> IPoIB in Linux needs a 2K MTU. Therefore it must set the MTU
> > > > > >     Michael> selector in the path record query accordingly.
> > > > > > 
> > > > > > Umm -- why does it need a 2K MTU?  As far as I know it should work
> > > > > > fine with any MTU, assuming the SA sets the MTU of the broadcast
> > > > > > multicast group correctly.
> > > > > 
> > > > > Hmm, you are right; it is just that existing implementations all
> > > > > set it to 2K.
> > > > 
> > > > By default, yes. It can be configured.
> > > > 
> > > > > But there is a silent assumption that the MTU of any path is >= the broadcast
> > > > > multicast group MTU, and this is what I want to fix.
> > > > 
> > > > The spec says:
> > > > "The value (for IB MTU) assigned to the broadcast-GID must not be
> > > > greater than any physical link MTU spanned by the IPoIB subnet".
> > > > so if the broadcast group is improperly set up and does not follow this,
> > > > there will be other issues.
> > > 
> > > Correct. IPoIB uses the broadcast group MTU to derive the MTU it reports to
> > > Linux. If some link has a lower MTU, IPoIB cannot use it.
> > > 
> > > > It doesn't need to be included in the PR request.
> > > 
> > > I disagree here. If you do not set the selector, the SA is free to return
> > > a path with a lower MTU even though the physical link allows a higher MTU.
> > > Does the spec say otherwise somewhere?
> > 
> > No, but isn't this relying on IPoIB implementations (and any other UD
> > application) using PRs in a certain way, as opposed to connected apps?
> 
> Not really.
> 
> Tavor is faster with a 1K MTU than with a 2K MTU, whether connected or
> not. So, to me, it makes sense for the SM to choose 1K if Tavor is involved,
> unless the application requested otherwise.
> 
> If an application (again, connected or UD) needs a specific MTU, it
> should set the MTU selector in the path query. If it does not, the SM is free to
> choose any MTU supported by the link, for best performance. If one end is Tavor,
> this happens to be 1K rather than the maximum MTU.
> 
> So what we have here is an IPoIB bug: it requires that path MTU >= broadcast
> group MTU, but does not pass this requirement in the query. This only happens
> to work if the SM always selects the maximum link MTU for each path query.

> Makes sense?

Understood. As I said in a previous email, if it happens that the path
MTU < broadcast group MTU, I think there would be join issues for some
nodes out there.

-- Hal
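
The MTU-selector argument in this thread can be modeled in a few lines. This is a hypothetical sketch, not code from IPoIB or from any SM: `choose_path_mtu`, `ipoib_path_usable`, and the `sm_preferred` parameter (standing in for an SM policy such as "pick 1K when a Tavor HCA is on the path") are illustrative assumptions. Only the MTU encoding (1 = 256 bytes up to 5 = 4096 bytes) and the MTUSelector encoding (0 = greater than, 1 = less than, 2 = exactly, 3 = largest available) follow the PathRecord definition in the InfiniBand Architecture Specification.

```python
from enum import IntEnum
from typing import Optional


class Mtu(IntEnum):
    """PathRecord MTU encoding: enum value -> payload size in bytes."""
    MTU_256 = 1
    MTU_512 = 2
    MTU_1024 = 3
    MTU_2048 = 4
    MTU_4096 = 5


class Selector(IntEnum):
    """PathRecord MTUSelector encoding."""
    GREATER_THAN = 0
    LESS_THAN = 1
    EXACTLY = 2
    LARGEST_AVAILABLE = 3


def choose_path_mtu(link_max: Mtu, sm_preferred: Mtu,
                    selector: Optional[Selector] = None,
                    requested: Optional[Mtu] = None) -> Optional[Mtu]:
    """Toy SA (hypothetical): choose the MTU for one PathRecord response.

    link_max: largest MTU supported by every link on the path.
    sm_preferred: assumed SM policy, e.g. 1K when a Tavor HCA is an endpoint.
    Returns None when nothing satisfies the selector (no record returned).
    """
    candidates = [m for m in Mtu if m <= link_max]
    if selector is None:
        # No selector in the query: the SM may pick any MTU the path supports.
        allowed = candidates
    elif selector == Selector.LARGEST_AVAILABLE:
        return max(candidates)
    else:
        if requested is None:
            raise ValueError("GREATER_THAN/LESS_THAN/EXACTLY need a requested MTU")
        if selector == Selector.GREATER_THAN:
            allowed = [m for m in candidates if m > requested]  # strictly greater
        elif selector == Selector.LESS_THAN:
            allowed = [m for m in candidates if m < requested]  # strictly less
        else:
            allowed = [m for m in candidates if m == requested]
    if not allowed:
        return None
    # Honor the SM's preference when the query allows it, else take the largest allowed.
    return sm_preferred if sm_preferred in allowed else max(allowed)


def ipoib_path_usable(path_mtu: Optional[Mtu], bcast_mtu: Mtu) -> bool:
    """IPoIB sizes packets from the broadcast group MTU, so the path must carry that size."""
    return path_mtu is not None and path_mtu >= bcast_mtu


bcast = Mtu.MTU_2048
# Query without a selector: the SM picks its preferred 1K although the link supports 2K.
p1 = choose_path_mtu(Mtu.MTU_2048, sm_preferred=Mtu.MTU_1024)
# p1 == Mtu.MTU_1024, so ipoib_path_usable(p1, bcast) is False.

# Query with "greater than 1K", i.e. at least 2K.
p2 = choose_path_mtu(Mtu.MTU_2048, sm_preferred=Mtu.MTU_1024,
                     selector=Selector.GREATER_THAN, requested=Mtu.MTU_1024)
# p2 == Mtu.MTU_2048, so ipoib_path_usable(p2, bcast) is True.
```

The selector comparisons are strict, so "at least 2K" has to be phrased as "greater than 1K". The `max(allowed)` fallback is only one plausible tie-break; a real SM may apply other policies.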