[ofa-general] Re: [PATCHv2] opensm: Parallelize (Stripe) LFT sets across switches

Hal Rosenstock hal.rosenstock at gmail.com
Tue Aug 4 13:44:06 PDT 2009


On Tue, Aug 4, 2009 at 4:15 PM, Sasha Khapyorsky <sashak at voltaire.com> wrote:

> On 12:45 Tue 04 Aug, Hal Rosenstock wrote:
> > >
> > > > This patch also introduces a new config option (max_smps_per_node)
> > > > which indicates how deep the per-node pipeline is (the current
> > > > default is 4). This also has the effect of limiting the number of
> > > > times that the switch list is traversed. Maybe this embellishment
> > > > is unnecessary.
> > >
> > > Then why is it needed?
> >
> >
> > Also, as was discussed earlier in this thread, it gives a way to
> > control possible VL15 overflow.
>
> VL15 overflow is controlled by max_wire_smps not by max_smps_per_node.


It's a different control on VL15 overflow: max_wire_smps caps outstanding
SMPs subnet-wide, while max_smps_per_node caps them per target switch. It
can easily be eliminated if that's what you want; in fact, removing it
simplifies the code slightly.
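
To make the two knobs concrete, here is a rough pseudo-C sketch of the
per-node throttling that max_smps_per_node implies. All names below
(sw_state, pipeline_lft_blocks, send_lft_block_smp) are illustrative
only, not the actual patch code:

/* Illustrative sketch only - not the actual patch code.
 * max_wire_smps caps SMPs outstanding on the wire subnet-wide;
 * max_smps_per_node additionally caps how many LFT-set SMPs may
 * be in flight to any single switch at once. */

#define MAX_SMPS_PER_NODE 4     /* the patch's current default */

struct sw_state {
        unsigned smps_outstanding; /* LFT SMPs in flight to this switch */
        unsigned next_block;       /* next LFT block to send */
        unsigned max_block;        /* last LFT block for this switch */
};

static void send_lft_block_smp(struct sw_state *sw, unsigned block);

/* Queue LFT block SMPs up to the per-node limit. The MAD
 * completion handler decrements smps_outstanding and calls this
 * again, so each switch keeps at most MAX_SMPS_PER_NODE requests
 * in flight. */
static void pipeline_lft_blocks(struct sw_state *sw)
{
        while (sw->smps_outstanding < MAX_SMPS_PER_NODE &&
               sw->next_block <= sw->max_block) {
                send_lft_block_smp(sw, sw->next_block++);
                sw->smps_outstanding++;
        }
}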


>
>
> > > I don't really like such a separation (osm_ucast_mgr_set_fwd_tbl_top
> > > and osm_ucast_pipeline_tbl).
> >
> >
> > Why not? What's the matter with doing this?
>
> So as not to expose this (LFT setup) algorithm to the routing engines,
> and to eliminate duplicated function calls.
>
> > > Why not use a single function and update all routing engines
> > > appropriately (you need to do that anyway), so that each engine only
> > > fills in the new_lfts table?
> >
> >
> > I'm not following what you're describing. set_fwd_tbl_top sets
> > LinearFDBTop, whereas pipeline_tbl starts the cascade of LFT sets
> > based on max_smps_per_node.
>
> You can set up the new_lfts arrays in the routing engines and, at the
> end of the cycle, call a single osm_*setup*_lfts() which will do
> everything - set the tops and start the LFT block updates.
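
OK, so if I follow, something like the pseudo-C sketch below (names
illustrative, reusing pipeline_lft_blocks() from the sketch earlier in
this mail): the routing engines only fill in new_lfts, and one common
function then sets the tops and starts the block sends, keeping the send
algorithm out of the engines.

static void setup_all_lfts(struct subnet *subn)
{
        struct sw_state *sw;

        /* Routing engines have already filled each switch's
         * new_lfts; none of the send logic is exposed to them. */
        for (sw = first_switch(subn); sw; sw = next_switch(sw)) {
                set_fwd_tbl_top(sw);       /* program LinearFDBTop */
                sw->next_block = 0;
                pipeline_lft_blocks(sw);   /* start LFT block sends */
        }
}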
>
> > > > +
> > > > +             p_path =
> > > > +                 osm_physp_get_dr_path_ptr(osm_node_get_physp_ptr(p_sw->p_node, 0));
> > > > +
> > > > +             mad_context.lft_context.node_guid = node_guid;
> > > > +             mad_context.lft_context.set_method = TRUE;
> > > > +
> > > > +             osm_sm_set_next_lft_block(sm, p_sw, &block[0], p_path,
> > > > +                                       &mad_context);
> > > > +
> > > > +             p_sw->lft_block_id_ho++;
> > >
> > > Wouldn't it be simpler to encode block_id in a mad context?
> >
> >
> > Why simpler? I think it complicates the receiver code to do that
> > (assuming max_smps_per_node remains).
>
> Ok.
>
> > > I suppose that this breaks the 'file' routing engine (did you test
> > > it?) - instead of setting up the switch LFTs, this will only update
> > > their tops.
> >
> > At this point, I don't recall.
>
> You removed the osm_ucast_mgr_set_fwd_table() calls and put
> osm_ucast_mgr_set_fwd_tbl_top() in their place - so obviously nothing
> will run the actual LFT block setup.


>
> > > Is it possible (for example, in the case of send errors) that sending
> > > "partial" LFT blocks will trigger wait_for_pending_transaction()
> > > completion?
> >
> >
> > I don't know. Is this different from the original algorithm in the
> > case of send errors?
>
> Yes, it is different - unlike the original code, it leaves the ucast mgr
> (and goes to wait in wait_for_pending()) before all of the required LFT
> block update requests have been sent.
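
If I follow, the failure mode is that with the per-node pipeline only the
first max_smps_per_node blocks per switch are queued before the manager
goes off to wait, so the pending count can reach zero while blocks remain
unsent. In pseudo-C (again with the illustrative names from the sketches
above):

/* Completion handler sketch. On success the next block is
 * queued; on a send error nothing is re-queued, so the global
 * pending-transaction count can drop to zero while
 * sw->next_block <= sw->max_block, i.e. the wait completes with
 * LFT blocks still unsent. */
static void on_lft_smp_done(struct sw_state *sw, int status)
{
        sw->smps_outstanding--;
        if (status == 0)
                pipeline_lft_blocks(sw); /* keep the pipeline full */
}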


This goes away if there is no max_smps_per_node support.

So do you want to also preserve the original behavior/algorithm, or do you
have no preference?

-- Hal


>
>
> Sasha
>