[ofa-general] Re: [OpenSM][0/18] - Routing Chaining

Sasha Khapyorsky sashak at voltaire.com
Sun Sep 28 13:26:48 PDT 2008


Hi Al,

Sorry about delay. It took some time to review this patch series in
deep.

On 12:20 Mon 15 Sep     , Al Chu wrote:
> 
> As we've discussed before, we wanted to put routing chaining into
> opensm.

This is great. Thanks!

> osm_ucast defaults to minhop - The current code automatically
> defaulted to minhop if anything in the selected routing engine failed.
> Naturally this had to be changed for routing chaining.  I moved minhop
> out of the ucast_mgr code to make it its own routing engine instead.

I fully agree that minhop should be implemented as regular routing
engine. However I don't think we need to move
osm_ucast_mgr_build_lid_matrices() and ucast_mgr_build_lfts() and make
it minhop routing engine specific. osm_ucast_mgr_build_lid_matrices()
implements generic lid matrix generation code and ucast_mgr_build_lfts()
has generic balancer, both are useful by other routing engines.

> osm_ucast assumption on routing failures - The current code defaulted
> to minhop if anything in the selected routing engine failed.  Because
> of this some routing engines (most notably "file" routing)
> intentionally "failed" when it wanted default to some portion of
> minhop behavior.  All routing behavior had to be moved into routing
> engines to have the routing engines fully fail/succeed on their own.

Maybe we can use simpler solution - to use method's return status. It
can return negative value on failure, zero on success and positive if
method fallback is requested.

This will help to keep routing engine method to be optional (potentially
we can add multicast routing methods there) and to keep this simple
enough when single method fallback is desired.

> updn routing - currently utilizes the minhop build_fwd_tables but
> minhop's code assumes if build_lid_matrices is not-null, it is in
> "up/dn routing mode" instead of "minhop mode".  Perfectly fine when
> you can specify max of one routing engine, but needs to be abstracted
> out of minhop so up/dn is independent in its routing "attempt" in the
> chain.

Agree. But actually usage of this check is odd IMHO. I think it is fine
to just remove this and make unconditional debug warning instead.

> minhop routing assumed to never fail - Currently minhop routing cannot
> "fail".  So if someone wanted to put minhop into the middle of a
> routing chain, it makes no sense.  I assume this was based on legacy,
> when the minhop algorithm did not have options like
> "guid_routing_order_file" that could be parsed incorrectly.

Sure it is legacy, but not for only this reason.

In some already old days OpenSM state machine was design so that
"managers" (such as ucast_mgr, lid_mgr, etc.) were needed to return
value (osm_signal_t) which indicates sending MADs instead of its actual
execution status.

Fortunately it is not the case anymore, so we can rework all managers
(including ucast_mgr) to return its real "status".

> So, lots of rearchitecture were done and lots of cleanup was done as
> well.  Some bug fixes along the way too.

Finally this patch series leaves as with us:

21 files changed, 1538 insertions(+), 698 deletions(-)

, which I think is pretty big for routing chaining :(.

Assuming that we want this important feature to be included in OFED-1.4
and that OFED release cycle is already in "RC" phase I reworked this
as single and smaller patch which is based on your original patch series
(so obviously authorship is preserved). It includes some thoughts
mentioned above and also works with ibsim (still test it although). I
will post it to the list shortly. Let me know how it looks?

Ideally it would be nice to have it integrated before the next RC release
(6 Oct).

Sasha



More information about the general mailing list