[openib-general] Problem is routing CM REQ

Michael Krause krause at cup.hp.com
Mon Feb 12 15:31:15 PST 2007


At 12:56 PM 2/12/2007, Jason Gunthorpe wrote:
>On Mon, Feb 12, 2007 at 09:23:06AM -0800, Sean Hefty wrote:
> > >Ah, I think I missed the key step in your scheme.. You plan to query
> > >the local SM for SGID=remote DGID=local? (ie reversed from 'normal'. I
> > >was thinking only about the SGID=local DGID=remote query direction)
> >
> > I'm not sure that the query needs the GIDs reversed, as long as the path is
> > reversible.  So, the local query would be:
> >
> > SGID=local, DGID=remote, reversible=1   (to SA)
> >
> > And the remote query would be:
> >
> > SGID=local, DGID=remote, reversible=1,  (to SA')
> > TClass & FlowLabel=from previous query response
>
>1) What does the TClass and FlowLabel returned from SGID=local
>    DGID=remote mean?
>    Do you use it in the Node1 -> Node2 direction or the Node2 -> Node1 direction
>    or both?
>1a) If it is Node1 -> Node2 then the local SA has to query SA' to figure
>     what FlowLabel to return.
>1b) If it is for both directions then somehow SA, SA' and all four
>     router ports need to agree on global flowlabels.
>2) In the 2nd query, passing SGID=local, DGID=remote is 'reversed'
>    since SGID=local is the wrong subnet for SA'.
>    I think defining this to mean something is risky.
>2b) A PR query with TClass and FlowLabel present in the query is
>     currently expected to return an answer with those fields matching.
>     That implies #1b..

TClass is intended to communicate the desired end-to-end QoS.  TClass is
then mapped to an SL that is local to each subnet.  A flow label is
intended to serve much the same purpose as in the IP world and is left, in
essence, to routers to manage.  An endnode lookup should find the address
vector to the remote.  A lookup may return multiple vectors.  The SLID
would correspond to each local subnet router port that acts as a first-hop
destination to the remote subnet.  I don't see why the router protocol
would not simply enable all paths on the local subnet to a given remote
subnet to be acquired.  All of the work is kept local to the SA / SM in the
source subnet when determining a remote path to take.  Why is there any
need to define more than just this?  Define a router protocol to
communicate each subnet's prefix, TClass, etc. and apply KISS.  A
management entity that wanted to manage how each subnet provides router
management in terms of route selection, etc. can be constructed by using
the existing protocols / tools combined with a new router protocol which
only does DGID to next-hop SLID mapping.
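
As a rough illustration of the lookup described above, here is a minimal
Python sketch.  Everything in it is an assumption made for illustration: the
names AddressVector, NEXT_HOP, TCLASS_TO_SL and lookup, the table contents
(the router port LIDs 1, 2 and 3 are borrowed from the diagram quoted below),
and the fallback to SL 0.  None of it is defined by IBA or by any existing
SA implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressVector:
    """One candidate first hop toward a remote subnet (hypothetical type)."""
    remote_prefix: int  # upper 64 bits of the DGID (subnet prefix)
    router_lid: int     # LID of a local router port acting as first hop
    sl: int             # SL on the local subnet, derived from TClass


# Hypothetical tables held by the SA / SM of the source subnet.
# Remote subnet prefix -> LIDs of local router ports that reach it.
NEXT_HOP = {0xFEC0000000000002: [1, 2, 3]}
# End-to-end TClass -> SL that is local to this subnet.
TCLASS_TO_SL = {0: 0, 32: 1}


def lookup(dgid, tclass, next_hop=NEXT_HOP, tclass_to_sl=TCLASS_TO_SL):
    """Return every address vector on the local subnet toward dgid."""
    prefix = dgid >> 64                # a GID is 128 bits; keep the prefix
    sl = tclass_to_sl.get(tclass, 0)   # assumed default: SL 0
    return [AddressVector(prefix, lid, sl) for lid in next_hop.get(prefix, [])]


if __name__ == "__main__":
    remote_gid = (0xFEC0000000000002 << 64) | 0x0002C90200001234
    for vector in lookup(remote_gid, tclass=32):
        print(vector)
```

The point of the sketch is only that both tables live in the source subnet,
so the endnode lookup never needs to consult the remote SA.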

Mike


>So, here is how I see this working..
>
>- There is a single well known 'reversible' flowlabel. When a router
>   processes a GRH with that flowlabel it produces a packet that
>   has a SLID that is always the same, no matter what router port is
>   used (A' or B' in my example). The LRH is also reversible according
>   to the rules in IBA.
>
>   A well known value side-steps the global information problem and
>   allows the GRH to be reversible.
>- Whenever a PR has reversible=1 the result returns the well known flowlabel.
>   The router LID is always the single shared SLID.
>- To get a more optimal path the following sequence of queries is used:
>   to SA: SGID=Node1 DGID=Node2
>    [In the background SA asks SA' what flow label to use]
>   to SA': SGID=Node1 DGID=Node2 FlowLabel=(from above)
>   to SA': SGID=Node2 DGID=Node1 SLID=(dlid from above)
>    [In the background SA' asks SA what flow label to use]
>   to SA: SGID=Node2 DGID=Node1 FlowLabel=(from above)
>
>   It is almost guaranteed that the FlowLabel will be asymmetric. This
>   is to keep the flowlabel space local to each subnet.
>
>   In the background queries SA and SA' also examine the global route
>   topology to select an optimal no-spoof needed router LID. The
>   background exchange is how the disambiguation problem with
>   multiple-router path is solved.
>
>Implicit in this are five IBA affecting things:
>  - that PRs with SGID=non-local mean something specific
>  - PRs with DGID=non-local cause the SA to communicate with the remote
>    SA to learn the GRH's FlowLabel
>    (except in the case where reversible=1)
>  - clients can communicate with remote SA's
>  - Routers do the SLID spoofing you outlined.
>  - SA's and routers collaborate quite closely on how the
>    router produces a LRH. In particular the SA controls the SLID
>    spoofing
>
>A new query type or maybe some kind of modified multi-path-record
>query could be defined by IBA to reduce the 6 exchanges required to
>something more efficient.
>
>Does this match what you are thinking?
>
> > >   SA                                                      SA'
> > >Node1 --> (LID 1) Router A -------  Router A' (LID A) ---> Node2
> > >      |-> (LID 2) Router A                              |
> > >      |-> (LID 3) Router B -------  Router B' (LID B) --|
> > >
> > >Router A and Router B are independent redundant devices, not a route
> > >cloud of some sort. B -> A' is not a possible path.
> >
> > Since A' and B' connect to the same subnet, B -> A' should be a valid path.
>
>Please don't dismiss this case as it is a simple case of a more
>generalized problem. People will want to deploy primary and secondary
>routers (like dual star switching) that don't intercommunicate for
>reliability. The B -> A' path does not exist because the A and B
>routers are separate non-linked devices and not just 4 ports on one
>large router. [A more general view would be a router ring architecture
>where the clockwise and counterclockwise paths use different
>hardware/cables]
>
>There is a lot of complex work in the router and SA side to make this
>kind of topology work, but it is critical that the clients use path
>queries that can provide enough data to the SA and return enough data
>to the client to support this.
>
> > >I can think of the following downsides:
> > > 1) Re-reading Michael Krause's email makes me think that defeating
> > >    the QP SLID check is contrary to the spirit of IBA
> >
> > I don't think we need to defeat the QP SLID check if we want extra routing,
> > but having redundant routers use the same link layer address isn't
> > necessarily a bad thing.
>
>Well, it is one and the same, the SLID is really only used in the QP
>SLID check so changing it around only serves to defeat that check.
>
>Jason





