[openib-general] Problem is routing CM REQ

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Mon Feb 12 18:03:30 PST 2007


On Mon, Feb 12, 2007 at 04:45:33PM -0800, Sean Hefty wrote:
> >>4. A PR from the local SA with reversible=1 indicates that data sent from 
> >>the remote GID to the local GID using the PR TC and FL will route locally 
> >>using the specified LID pair.  This holds whether the PR SGID is local or 
> >>remote.
> >
> >>5. A PR from a remote SA with reversible=1 indicates that data sent from 
> >>the local GID to the remote GID using the PR TC and FL will route 
> >>remotely using the specified LID pair.  This holds whether the PR SGID is 
> >>local or remote.
> >
> >I can't think how to actually implement these restrictions in the
> >general case without SLID spoofing and the general method I outlined
> >in my prior email.
> 
> But you agree with the expectations, and what reversible indicates?  Or are 
> you claiming that reversible paths between different subnets is undefined, 
> or means something different than specified in 13.5.4?  (E.g. reversible 
> applies only at the network level if global routing is used.)

I think pure reversible paths are a good idea to support on routed
paths - meaning strictly the definition from 13.5.4. That is a GMP
sender can request a PR with reversible=1 and know that if the
receiver applies 13.5.4 then the reply packet will get back to the
sender. Note: As per the QP LID matching rules the SLID is not
matched for UD - so a reversible PR would not have to guarantee the
return path router SLID on the local side.
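
To make the 13.5.4 reading above concrete, here is a minimal sketch,
assuming a reversible reply is built by swapping the source and
destination fields of the packet as it was received and reusing its
TClass and FlowLabel. The PacketAddr type, the field names and the
LID values are hypothetical and are not taken from the spec text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PacketAddr:
    """Hypothetical view of the addressing fields of one packet."""
    slid: int
    dlid: int
    sgid: str
    dgid: str
    tclass: int
    flow_label: int


def reverse_path(received: PacketAddr) -> PacketAddr:
    """Build reply addressing: swap LIDs and GIDs, keep TClass/FlowLabel."""
    return replace(
        received,
        slid=received.dlid,
        dlid=received.slid,
        sgid=received.dgid,
        dgid=received.sgid,
    )


# Request as seen on Node2's subnet: the SLID is the router's LID there
# (assumed to be 0x01), the DLID is Node2's LID (assumed to be 0x20).
received = PacketAddr(slid=0x01, dlid=0x20, sgid="Node1", dgid="Node2",
                      tclass=0, flow_label=0x5)
reply = reverse_path(received)
```

The reply goes back to the router LID the request arrived from, and
nothing here pins down which router SLID shows up on Node1's subnet.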

What your #4 and #5 are talking about is not just that, but also PR
queries that can unambiguously identify the LID selections of the
router in advance. That is hugely different! IMHO, just because a
reversible path exists and will be used by the router shouldn't be
taken to mean that it is the only one or that the SA can tell you
which of many possible choices it will be.

> >Think about this - it is backwards for the UD case. You have specified
> >that the SGID->DGID direction uses the returned SLID/DLID which are
> >ensured by the flowlabel in the GRH. But the local side only controls
> >what it sends. How does this GRH get to the remote side? In UD the
> >returned GRH from the PR controls the selection of LID on the DGID's
> >subnet. That is how it must be.
> 
> I'm not following you here.  For UD, query the local SA, then direct the 
> send to the router LID.  I would only query the remote SA for RC, in order 
> to get the remote LID information to put into the CM REQ.

I'm talking about the locality of information in the PR.

Eg:
PR query to SA: SGID=Node1 DGID=Node2 ==> Flowlabel=XX SLID=Node1 DLID=1

What direction does FlowLabel=xx refer to? Do you put it in the local
side's QP or do you put it in the CM REQ?

The use model that UD defines says it is to go in the QP, not the CM
REQ. It also more or less requires that the remote SA have a hand in
selecting the FlowLabel since the router on the Node2 subnet is the
one that acts on it.

When I read your mails I get the impression you want to put the
FlowLabel from the local PR in the CM REQ - which makes huge amounts
of sense, but is not really what is set out in IBA I feel. :<

Staying aligned with the UD use model for PRs is why I outlined a
solution that required the local SA to consult the remote SA to get
the FlowLabel.
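
A rough sketch of that split of responsibility, assuming the local SA
fills in the LIDs it knows about and asks the remote SA for the
FlowLabel. The LocalSA and RemoteSA classes, their methods and all
values are hypothetical illustrations, not an existing SA interface.

```python
class RemoteSA:
    """Hypothetical SA on Node2's subnet; it owns the FlowLabel choice."""

    def __init__(self, flow_labels):
        self.flow_labels = dict(flow_labels)

    def flow_label_for(self, sgid, dgid):
        return self.flow_labels[(sgid, dgid)]


class LocalSA:
    """Hypothetical SA on Node1's subnet; it only knows local LIDs."""

    def __init__(self, lids, router_lid, remote_sa):
        self.lids = dict(lids)
        self.router_lid = router_lid
        self.remote_sa = remote_sa

    def path_query(self, sgid, dgid):
        # Local half: SLID of the sender and DLID of the local router port.
        # Remote half: the FlowLabel the far-side router will act on.
        return {
            "sgid": sgid,
            "dgid": dgid,
            "slid": self.lids[sgid],
            "dlid": self.router_lid,
            "flow_label": self.remote_sa.flow_label_for(sgid, dgid),
        }


remote_sa = RemoteSA({("Node1", "Node2"): 0x2A})   # 0x2A is a placeholder
local_sa = LocalSA({"Node1": 0x10}, router_lid=1, remote_sa=remote_sa)
pr = local_sa.path_query("Node1", "Node2")
```

The returned FlowLabel would then go into the local QP, as in the UD
use model, rather than into the CM REQ.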

> >The major problem is that there are multiple router paths that a given
> >GRH can take that are only fully disambiguated by the router lid at
> >the sender.
> 
> But doesn't 19.2.4.1 imply that once a router selects a path, it will 
> continue to use that same path for similar packets?  So, if we inject a GRH 
> into the internetwork from the source router, then isn't a single path 
> followed to the remote endpoint?

Yes. Absolutely. 

I view this problem not as if there is an existing fixed path, but
trying to find a way to support unambiguous identification of that
path when the DGID alone is not enough information.
[Ingress port, DGID, Flowlabel and TClass are the minimum required set
AFAIK]
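
As an illustration of why the DGID alone is not enough, here is a
small sketch of a router that keys its path choice on that four-field
set and sticks with the choice for later packets. The Router class and
its round-robin assignment are assumptions for illustration only; a
real router is free to pick paths any way it likes.

```python
from typing import NamedTuple


class SessionKey(NamedTuple):
    ingress_port: int
    dgid: str
    flow_label: int
    tclass: int


class Router:
    """Hypothetical router that pins one egress path per session key."""

    def __init__(self, egress_choices):
        self.egress_choices = list(egress_choices)
        self.sessions = {}

    def select(self, key: SessionKey) -> str:
        # The first packet of a session picks a path (round-robin here);
        # later packets with the same key reuse that path.
        if key not in self.sessions:
            idx = len(self.sessions) % len(self.egress_choices)
            self.sessions[key] = self.egress_choices[idx]
        return self.sessions[key]


router = Router(["path_a", "path_b"])
k1 = SessionKey(ingress_port=1, dgid="Node2", flow_label=0x1, tclass=0)
k2 = SessionKey(ingress_port=1, dgid="Node2", flow_label=0x2, tclass=0)
first = router.select(k1)
second = router.select(k2)
again = router.select(k1)
```

Both keys carry the same DGID, yet they can land on different paths,
while repeating k1 always returns the path chosen the first time.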

BTW, 19.2.4.1 seems to imply that nothing in the spec is going to
cause a problem for the router's path selection since 'a session is
used in a deliberately vague way'. My reading of 9.6.1.5 makes me
pretty sure it causes a problem due to the LRH.SLID matching - you
also agree right?
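
A minimal sketch of the check I have in mind, assuming a connected QP
only accepts packets whose LRH.SLID equals the LID it was configured
to talk to, while a UD QP does not look at the SLID at all. The QP
class and the accepts() helper are hypothetical and simplify whatever
9.6.1.5 actually requires.

```python
from dataclasses import dataclass


@dataclass
class QP:
    qp_type: str      # "RC" or "UD"
    remote_lid: int   # LID the QP was told its peer (or router) uses


def accepts(qp: QP, lrh_slid: int) -> bool:
    """Return True if the QP would accept a packet with this LRH.SLID."""
    if qp.qp_type == "UD":
        return True            # SLID is not matched for UD
    return lrh_slid == qp.remote_lid


# The router is assumed to have two LIDs on this subnet: 1 and 2.
rc_qp = QP(qp_type="RC", remote_lid=1)
ud_qp = QP(qp_type="UD", remote_lid=1)

same_router_lid = accepts(rc_qp, lrh_slid=1)
other_router_lid = accepts(rc_qp, lrh_slid=2)
ud_any_lid = accepts(ud_qp, lrh_slid=2)
```

If the return traffic comes in through router LID 2 the RC QP drops
it, even though the router followed a perfectly valid path.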

> Relaxing 9.6.1.5 seems like a nice solution to most of the problems, but it 
> also seems like one that would fail to work with any existing HCAs.

I agree. In fact until your mail last week I was operating under the
assumption (reinforced by text like 19.2.4.1) that nothing like
9.6.1.5 existed in the spec.

It wouldn't surprise me if the spec writers intended things to work as
though 9.6.1.5 didn't cause this problem and reworked it. If so then
cards that can't be fixed with a firmware upgrade wouldn't support
multiple routed paths, but would support the simple single router LID
case. That might be acceptable. 

If so then I'd expect also for a SGID=off-subnet query to return the
remote LIDs to make CM work properly with existing conforming
implementations (that use 3 PR queries to get non-reversible paths
;>).

Jason



