[openib-general] Problem is routing CM REQ

Mon Feb 12 12:56:34 PST 2007

On Mon, Feb 12, 2007 at 09:23:06AM -0800, Sean Hefty wrote:
> >Ah, I think I missed the key step in your scheme.. You plan to query
> >the local SM for SGID=remote DGID=local? (ie reversed from 'normal'. I
> >was thinking only about the SGID=local DGID=remote query direction)
> 
> I'm not sure that the query needs the GIDs reversed, as long as the path is 
> reversible.  So, the local query would be:
> 
> SGID=local, DGID=remote, reversible=1   (to SA)
> 
> And the remote query would be:
>
> SGID=local, DGID=remote, reversible=1,  (to SA')
> TClass & FlowLabel=from previous query response

1) What do the TClass and FlowLabel returned from SGID=local
   DGID=remote mean?
   Do you use them in the Node1 -> Node2 direction, the Node2 -> Node1
   direction, or both?
1a) If it is Node1 -> Node2, then the local SA has to query SA' to figure
    out what FlowLabel to return.
1b) If it is for both directions then somehow SA, SA' and all four
    router ports need to agree on global flowlabels.
2) In the 2nd query, passing SGID=local, DGID=remote is 'reversed',
   since SGID=local is on the wrong subnet for SA'.
   I think defining this to mean something is risky.
2b) A PR query with TClass and FlowLabel present in the query is
    currently expected to return an answer with those fields matching.
    That implies #1b..

So, here is how I see this working..

- There is a single well-known 'reversible' flowlabel. When a router
  processes a GRH with that flowlabel it produces a packet whose SLID
  is always the same, no matter which router port is used (A' or B'
  in my example). The LRH is also reversible according to the rules
  in IBA.

  A well-known value side-steps the global information problem and
  allows the GRH to be reversible.
- Whenever a PR query has reversible=1, the result returns the
  well-known flowlabel. The router LID is always the single shared SLID.
- To get a more optimal path, the following sequence of queries is used:
  to SA: SGID=Node1 DGID=Node2
   [In the background SA asks SA' what flow label to use]
  to SA': SGID=Node1 DGID=Node2 FlowLabel=(from above)
  to SA': SGID=Node2 DGID=Node1 SLID=(dlid from above)
   [In the background SA' asks SA what flow label to use]
  to SA: SGID=Node2 DGID=Node1 FlowLabel=(from above)

  The FlowLabels are almost guaranteed to be asymmetric; this keeps
  each flowlabel space local to its own subnet.

  In the background queries, SA and SA' also examine the global route
  topology to select an optimal router LID that needs no spoofing. The
  background exchange is how the disambiguation problem with
  multiple-router paths is solved.
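The query sequence above can be played through as a toy simulation. This is a minimal sketch, not IBA: SubnetSA, PathResult, pick_label and resolve are hypothetical names, the LIDs and per-subnet label ranges are invented, and the reversible=1 shortcut and router selection are left out. It shows two properties of the scheme: each direction's FlowLabel is allocated from the destination subnet's own label space (so the two labels differ), and the whole resolution costs 4 client queries plus 2 background SA-to-SA exchanges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PathResult:
    slid: int                  # LRH source LID on the subnet that answered
    dlid: int                  # LRH destination LID on the subnet that answered
    flow_label: Optional[int]  # GRH FlowLabel for this direction


class SubnetSA:
    """Toy SA for one subnet; owns that subnet's private flow-label space."""

    def __init__(self, lids, router_lid, first_label):
        self.lids = lids              # endnode GID -> LID on this subnet
        self.router_lid = router_lid  # router port LID this SA selects (e.g. A or A')
        self.first_label = first_label
        self.labels = {}              # (sgid, dgid) -> label this subnet's router honours
        self.peer = None              # the SA on the other side of the router
        self.exchanges = 0            # client queries + background requests handled

    def pick_label(self, sgid, dgid):
        """Background SA-to-SA request: allocate from *this* subnet's label space."""
        self.exchanges += 1
        key = (sgid, dgid)
        if key not in self.labels:
            self.labels[key] = self.first_label + len(self.labels)
        return self.labels[key]

    def path(self, sgid, dgid, flow_label=None, slid=None):
        """Client path query; the answer describes the LRH on this SA's subnet."""
        self.exchanges += 1
        if dgid in self.lids:
            # Last subnet on the path: the packet arrives from the router port.
            return PathResult(self.router_lid, self.lids[dgid], flow_label)
        # Destination is remote: ask the peer SA which label its router acts on.
        label = self.peer.pick_label(sgid, dgid)
        src = slid if slid is not None else self.lids[sgid]
        return PathResult(src, self.router_lid, label)


def resolve(sa, sa_remote, node1, node2):
    """Run the four client queries in the order listed in the message."""
    r1 = sa.path(node1, node2)                                   # to SA
    r2 = sa_remote.path(node1, node2, flow_label=r1.flow_label)  # to SA'
    r3 = sa_remote.path(node2, node1, slid=r2.dlid)              # to SA'
    r4 = sa.path(node2, node1, flow_label=r3.flow_label)         # to SA
    return r1, r2, r3, r4


sa = SubnetSA({"Node1": 10}, router_lid=1, first_label=0x100)
sa_remote = SubnetSA({"Node2": 20}, router_lid=0xA, first_label=0x200)
sa.peer, sa_remote.peer = sa_remote, sa

r1, r2, r3, r4 = resolve(sa, sa_remote, "Node1", "Node2")
print(r1.flow_label, r3.flow_label)        # 512 256
print(sa.exchanges + sa_remote.exchanges)  # 6
```

Allocating each label on the destination side is the design choice that keeps the flowlabel spaces subnet-local; the price is the extra background exchange per direction.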

Implicit in this are five things that affect IBA:
 - that PRs with SGID=non-local mean something specific
 - PRs with DGID=non-local cause the SA to communicate with the remote
   SA to learn the GRH's FlowLabel
   (except in the case where reversible=1)
 - clients can communicate with remote SAs
 - Routers do the SLID spoofing you outlined.
 - SAs and routers collaborate quite closely on how the
   router produces an LRH. In particular, the SA controls the SLID
   spoofing.

A new query type, or maybe some kind of modified multi-path-record
query, could be defined by IBA to replace the 6 exchanges required
with something more efficient.

Does this match what you are thinking?

> >   SA                                                      SA'
> >Node1 --> (LID 1) Router A -------  Router A' (LID A) ---> Node2
> >      |-> (LID 2) Router A                              |
> >      |-> (LID 3) Router B -------  Router B' (LID B) --|
> >
> >Router A and Router B are independent redundant devices, not a route
> >cloud of some sort. B -> A' is not a possible path.
> 
> Since A' and B' connect to the same subnet, B -> A' should be a valid path.

Please don't dismiss this case; it is a simple instance of a more
general problem. People will want to deploy primary and secondary
routers (like dual star switching) that don't intercommunicate, for
reliability. The B -> A' path does not exist because the A and B
routers are separate, non-linked devices, not just 4 ports on one
large router. [A more general view would be a router ring architecture
where the clockwise and counterclockwise paths use different
hardware/cables]
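A small sketch of why the SA cannot assume that any entry router port reaches any exit port on the same far subnet. The device/port model is hypothetical and reuses the LIDs from the diagram (Router A with local LIDs 1 and 2 and remote port A', Router B with local LID 3 and remote port B'); treating the routers as "one big router" invents paths such as B -> A' that the hardware cannot carry.

```python
from itertools import product

# Hypothetical model of the diagram: each independent router device has
# ports on the local subnet (SA) and on the remote subnet (SA').
DEVICES = {
    "RouterA": {"local": {1, 2}, "remote": {"A"}},  # remote port A'
    "RouterB": {"local": {3}, "remote": {"B"}},     # remote port B'
}


def naive_paths(devices):
    """Assume any local router port can exit via any remote port."""
    local = set().union(*(d["local"] for d in devices.values()))
    remote = set().union(*(d["remote"] for d in devices.values()))
    return set(product(local, remote))


def real_paths(devices):
    """Independent devices: traffic must leave through the device it entered."""
    return {(l, r) for d in devices.values() for l in d["local"] for r in d["remote"]}


bogus = naive_paths(DEVICES) - real_paths(DEVICES)
print(sorted(bogus))  # [(1, 'B'), (2, 'B'), (3, 'A')]
```

The (3, 'A') entry is the B -> A' path; an SA that only knows "A' and B' are on the same subnet" cannot rule it out without per-device topology.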

There is a lot of complex work on the router and SA side to make this
kind of topology work, but it is critical that clients use path
queries that can provide enough data to the SA, and return enough data
to the client, to support it.

> >I can think of the following downsides:
> > 1) Re-reading Michael Krause's email makes me think that defeating
> >    the QP SLID check is contrary to the spirit of IBA
> 
> I don't think we need to defeat the QP SLID check if we want extra routing, 
> but having redundant routers use the same link layer address isn't 
> necessarily a bad thing.

Well, it is one and the same: the SLID is really only used in the QP
SLID check, so changing it around only serves to defeat that check.
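To make this concrete, here is a simplified, hypothetical model of the source check a connected QP applies to incoming packets. The real IBA validation covers more fields; ConnectedQP and Packet are illustrative names and the shared SLID value is invented. Once every router port stamps the same spoofed SLID, the LID comparison accepts the packet whichever router delivered it, so it no longer distinguishes paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    slid: int  # LRH source LID as seen on the receiving subnet (a router port)
    sgid: str  # GRH source GID: the real remote endnode


@dataclass(frozen=True)
class ConnectedQP:
    remote_lid: int  # LID the QP was connected to (here: a router port)
    remote_gid: str  # GID of the remote endnode

    def accepts(self, pkt: Packet) -> bool:
        # Simplified source check: the LRH SLID and the GRH SGID must both
        # match what the QP was connected to.
        return pkt.slid == self.remote_lid and pkt.sgid == self.remote_gid


# Without spoofing: Node2's QP was connected through A' (LID 0xA).
qp = ConnectedQP(remote_lid=0xA, remote_gid="Node1")
print(qp.accepts(Packet(slid=0xA, sgid="Node1")))  # True: arrived via A'
print(qp.accepts(Packet(slid=0xB, sgid="Node1")))  # False: arrived via B'

# With spoofing: A' and B' both stamp one shared SLID (hypothetical value),
# so the SLID comparison passes whichever router delivered the packet.
SHARED_SLID = 0x50
qp_shared = ConnectedQP(remote_lid=SHARED_SLID, remote_gid="Node1")
via_a = Packet(slid=SHARED_SLID, sgid="Node1")
via_b = Packet(slid=SHARED_SLID, sgid="Node1")
print(qp_shared.accepts(via_a), qp_shared.accepts(via_b))  # True True
print(via_a == via_b)  # True: the receiver cannot tell the two paths apart
```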

Jason