[ofa-general] [PATCH] opensm: Parallelize (Stripe) LFT sets across switches

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Tue Jul 21 11:44:46 PDT 2009


On Tue, Jul 21, 2009 at 02:03:12PM -0400, Hal Rosenstock wrote:
 
> Currently, MADs are pipelined to a single switch at a time which
> effectively serializes these requests due to processing at the SMA.
> This patch pipelines (stripes) them across the switches first before
> proceeding with successive blocks. As a result of this striping,
> multiple switches can process the set and respond concurrently
> which results in an improvement to the subnet initialization time.

Doing this without also using LID routing to the target switch is just
going to overload the SMAs in the intermediate switches with too many
DR SMPs.

The most efficient approach is to program LFTs using a breadth first
search of the connectivity graph starting from the SM end port:
 1) Within the BFS consider things as layers (number of hops from the
    SM). All switches in a layer can proceed largely in parallel,
    within the capability of the SM to capture the MAD replies at
    least. Advancing to the next layer must wait until the prior
    layer is fully programmed.
 2) At each switch send at most two partial DR MADs - where the MAD is
    LID routed up to the parent switch in the BFS, and then direct routed
    one step to the target switch. The two MADs would program the LFT
    block for the SM LID and for the switch LID.
 3) Send LID routed MADs directly to the target switch to fill in the
    rest of the LFT entries.
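
As a rough sketch of the layering described above (Python, not opensm
code): the adjacency map, the node names, and the function names are all
hypothetical, and the graph is assumed to contain only the SM node and
switches. It groups switches by hop count from the SM and records the BFS
parent that step 2 would LID route to before the final DR hop.

```python
from collections import deque

def bfs_layers(adjacency, sm_node):
    """Group nodes by hop count from sm_node and record each BFS parent."""
    parent = {sm_node: None}
    depth = {sm_node: 0}
    layers = [[sm_node]]
    queue = deque([sm_node])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor in parent:
                continue  # already reached by a path that is no longer
            parent[neighbor] = node
            depth[neighbor] = depth[node] + 1
            if depth[neighbor] == len(layers):
                layers.append([])
            layers[depth[neighbor]].append(neighbor)
            queue.append(neighbor)
    return layers, parent

def programming_plan(adjacency, sm_node):
    """One stage per BFS layer; a stage must finish before the next starts.

    Each entry is (target_switch, parent) where parent is the node the
    partial DR MADs of step 2 would be LID routed to.
    """
    layers, parent = bfs_layers(adjacency, sm_node)
    return [[(sw, parent[sw]) for sw in layer] for layer in layers[1:]]

# Hypothetical fabric: the SM hangs off s1; s4 is reachable via s2 or s3.
fabric = {
    "sm": ["s1"],
    "s1": ["sm", "s2", "s3"],
    "s2": ["s1", "s4"],
    "s3": ["s1", "s4"],
    "s4": ["s2", "s3"],
}
plan = programming_plan(fabric, "sm")
```

Here s2 and s3 land in the same stage and could be programmed in
parallel, while s4 has to wait until that stage is complete.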

Step 2 needs to respect the concurrency limit for the parent switch,
and #3 needs to respect the concurrency limit for the target switch.
As you go out the BFS there are more parent switches and target
switches available to run in parallel.
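
To illustrate the concurrency limits, here is a minimal sketch (Python
again, same caveats) that batches MADs so no switch has more than `limit`
outstanding at once. The function name, the tuple layout, and the idea of
sending in fixed rounds are assumptions made for illustration; a real SM
would more likely keep a sliding window per switch. The switch in each
tuple is whichever one the limit applies to: the parent for step 2, the
target for step 3.

```python
from collections import defaultdict

def schedule_rounds(mads, limit):
    """Split MADs into rounds with at most `limit` per switch per round.

    mads: list of (limited_switch, label) tuples, in desired send order.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    pending = list(mads)
    rounds = []
    while pending:
        in_flight = defaultdict(int)
        this_round, deferred = [], []
        for switch, label in pending:
            if in_flight[switch] < limit:
                in_flight[switch] += 1
                this_round.append((switch, label))
            else:
                deferred.append((switch, label))  # wait for a later round
        rounds.append(this_round)
        pending = deferred
    return rounds

# Three LFT blocks for switch A and one for switch B, limit of two each.
rounds = schedule_rounds(
    [("A", "blk0"), ("A", "blk1"), ("A", "blk2"), ("B", "blk0")], limit=2
)
```

With more switches in a layer, more MADs fit into each round without
raising the load on any one SMA.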

Eliminating DR hops will significantly improve MAD round trip time
and give you more possible parallelism before the SMAs in intermediate
switches become overloaded.
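
A back-of-the-envelope way to see this, under the simplifying assumption
that every switch on the directed-route part of the path handles the SMP
in its management agent while LID routed hops are forwarded without
involving the SMA (the function and mode names are made up for this
sketch):

```python
def sma_touches(depth, mode):
    """Switch SMAs that handle one request to a switch `depth` hops away.

    mode: "dr"      - directed route all the way from the SM
          "partial" - LID routed to the BFS parent, then one DR hop
          "lid"     - LID routed all the way to the target
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if mode == "dr":
        return depth          # every switch on the path
    if mode == "partial":
        return min(depth, 2)  # parent and target (parent is the SM at depth 1)
    if mode == "lid":
        return 1              # target only
    raise ValueError("unknown mode: " + mode)

# For a switch five hops out:
counts = {m: sma_touches(5, m) for m in ("dr", "partial", "lid")}
```

Under that assumption a switch five hops out costs five SMA visits per
request with pure DR, two with the partial DR form, and one once it is
LID routable.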

Overall, optimizing things so that as few DR MADs are used as possible
is desirable. The PortInfo write to set the switch LID can also be put
into the above process and then all other switch programming MADs can
simply happen via LID routed packets.

Jason


