[ofa-general] Re: [PATCHv2] opensm: Parallelize (Stripe) LFT sets across switches

Sasha Khapyorsky sashak at voltaire.com
Tue Aug 4 13:15:05 PDT 2009


On 12:45 Tue 04 Aug, Hal Rosenstock wrote:
> >
> > > This patch also introduces a new config option (max_smps_per_node)
> > > which indicates how deep the per node pipeline is (current default is 4).
> > > This also has the effect of limiting the number of times that the switch
> > > list is traversed. Maybe this embellishment is unnecessary.
> >
> > Then why is it needed?
> 
> 
> Also, as was discussed in the thread on this, it gives a way to control
> possible VL15 overflow.

VL15 overflow is controlled by max_wire_smps, not by max_smps_per_node.
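
To illustrate the difference, here is a minimal sketch. It is not OpenSM
code: the struct, the helper names and the numbers are made up for the
example, and it only assumes that max_wire_smps caps the total number of
SMPs in flight while max_smps_per_node caps them per node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the two limits; values are illustrative. */
struct sketch_limits {
	unsigned max_wire_smps;		/* cap on SMPs in flight overall */
	unsigned max_smps_per_node;	/* cap on SMPs in flight per node */
};

/* May one more SMP be sent to a node, given the current counters? */
static bool sketch_may_send(const struct sketch_limits *lim,
			    unsigned on_wire, unsigned on_node)
{
	assert(lim != NULL);
	return on_wire < lim->max_wire_smps &&
	       on_node < lim->max_smps_per_node;
}

/* Worst-case number of SMPs in flight across n_nodes nodes. */
static unsigned sketch_worst_case(const struct sketch_limits *lim,
				  unsigned n_nodes, bool wire_limit_enforced)
{
	unsigned per_node_total;

	assert(lim != NULL);
	per_node_total = n_nodes * lim->max_smps_per_node;
	if (wire_limit_enforced && per_node_total > lim->max_wire_smps)
		return lim->max_wire_smps;
	/* The per-node limit alone grows with the number of nodes. */
	return per_node_total;
}
```

With 100 switches and a per-node depth of 4, the per-node limit alone would
allow 400 SMPs in flight; only the wire limit bounds the total.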

> > I don't really like such separation (osm_ucast_mgr_set_fwd_tbl_top and
> > osm_ucast_pipeline_tbl).
> 
> 
> Why not ? What's the matter with doing this ?

So as not to expose this (LFT setup) algorithm to the routing engines, and
to eliminate duplicated function calls.

> > Why not use a single function and update all
> > routing engines appropriately (you need to do it anyway), so that this
> > will only fill up new_lfts table?
> 
> 
> I'm not following what you're describing. set_fwd_tbl_top sets LinearFDBTop
> whereas pipeline_tbl starts the cascade of LFT sets based on
> max_smps_per_node.

You can set up the new_lfts arrays in the routing engines and, at the end
of the cycle, call a single osm_*setup*_lfts() which will do everything:
set up the TOPs and start running the LFT block updates.
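
Roughly something like the sketch below. This is only an illustration of
the split, not a proposed implementation: struct sketch_switch and all
sketch_* names are hypothetical stand-ins rather than OpenSM API, the table
is kept tiny, and a memcpy() stands in for actually sending a block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 64	/* LFT entries carried by one block */
#define SKETCH_MAX_LID    256	/* small table, enough for the sketch */
#define SKETCH_NO_PATH    0xFF	/* marks an unrouted LID */

/* Hypothetical stand-in for a switch object (not osm_switch_t). */
struct sketch_switch {
	uint8_t lft[SKETCH_MAX_LID];	 /* what the switch currently holds */
	uint8_t new_lft[SKETCH_MAX_LID]; /* filled in by the routing engine */
	uint16_t max_lid;		 /* highest LID routed so far */
	uint16_t lft_top;		 /* last TOP value set */
	unsigned blocks_sent;		 /* simulated block updates */
};

static void sketch_switch_init(struct sketch_switch *sw)
{
	memset(sw, 0, sizeof(*sw));
	memset(sw->lft, SKETCH_NO_PATH, sizeof(sw->lft));
	memset(sw->new_lft, SKETCH_NO_PATH, sizeof(sw->new_lft));
}

/* Routing engine side: only record the route, send nothing. */
static void sketch_route(struct sketch_switch *sw, uint16_t lid, uint8_t port)
{
	assert(lid < SKETCH_MAX_LID);
	sw->new_lft[lid] = port;
	if (lid > sw->max_lid)
		sw->max_lid = lid;
}

/*
 * Single entry point, called once at the end of the routing cycle.
 * Sets the TOP of every switch and issues the changed LFT blocks.
 * Returns the number of block updates issued.
 */
static unsigned sketch_setup_lfts(struct sketch_switch *sws, unsigned n_sws)
{
	unsigned total = 0;

	for (unsigned i = 0; i < n_sws; i++) {
		struct sketch_switch *sw = &sws[i];
		unsigned n_blocks = sw->max_lid / SKETCH_BLOCK_SIZE + 1;

		sw->lft_top = sw->max_lid;	/* the TOP setup */

		for (unsigned b = 0; b < n_blocks; b++) {
			size_t off = (size_t)b * SKETCH_BLOCK_SIZE;

			if (!memcmp(sw->lft + off, sw->new_lft + off,
				    SKETCH_BLOCK_SIZE))
				continue;	/* block unchanged, skip */
			/* Stands in for sending the block update. */
			memcpy(sw->lft + off, sw->new_lft + off,
			       SKETCH_BLOCK_SIZE);
			sw->blocks_sent++;
			total++;
		}
	}
	return total;
}
```

The routing engines then never see how (or in what order) the blocks are
sent, so striping or pipelining can change inside the single function
without touching them.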

> > > +
> > > +             p_path = osm_physp_get_dr_path_ptr(osm_node_get_physp_ptr(p_sw->p_node, 0));
> > > +
> > > +             mad_context.lft_context.node_guid = node_guid;
> > > +             mad_context.lft_context.set_method = TRUE;
> > > +
> > > +             osm_sm_set_next_lft_block(sm, p_sw, &block[0], p_path,
> > > +                                       &mad_context);
> > > +
> > > +             p_sw->lft_block_id_ho++;
> >
> > Wouldn't it be simpler to encode block_id in a mad context?
> 
> 
> Why simpler ? I think it complicates the receiver code to do that (assuming
> max_smps_per_node remains).

Ok.

> > I suppose that this breaks 'file' routing engine (did you test it?) -
> > instead of switch LFTs setup this will only update its TOPs.
> 
> At this point, I don't recall.

You removed the osm_ucast_mgr_set_fwd_table() calls and placed
osm_ucast_mgr_set_fwd_tbl_top() there instead, so obviously nothing will
run the actual LFT block setup.

> > Is it possible (for example in case of send errors) that "partial" LFT
> > blocks sending will trigger wait_for_pending_transaction() completion?
> 
> 
> I don't know. Is this different from the original algorithm in the case of
> send errors ?

Yes, it is different: unlike the original code, it leaves the ucast mgr
(and goes to wait in wait_for_pending()) before all required LFT block
update requests have been sent.
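
A toy model of the concern, under assumptions of mine: the next block of a
switch is sent only from the response handler, and a failed send simply
drops out of the chain. The struct and the sketch_* functions are
hypothetical and do not mirror the patch code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-switch pipeline state. */
struct sketch_pipeline {
	unsigned total_blocks;	/* blocks that must be sent */
	unsigned next_block;	/* next block to send */
	unsigned outstanding;	/* requests awaiting a response */
	unsigned depth;		/* per-node pipeline depth */
};

/* Initial burst: fill the pipeline, then the caller goes to wait. */
static void sketch_start(struct sketch_pipeline *p)
{
	while (p->next_block < p->total_blocks && p->outstanding < p->depth) {
		p->next_block++;
		p->outstanding++;
	}
}

/*
 * Response handler: retire one request and try to send the next block.
 * Returns true when nothing is outstanding, which is the point where a
 * wait for pending transactions would complete.
 */
static bool sketch_on_response(struct sketch_pipeline *p, bool next_send_fails)
{
	assert(p->outstanding > 0);
	p->outstanding--;
	if (p->next_block < p->total_blocks && !next_send_fails) {
		p->next_block++;
		p->outstanding++;
	}
	return p->outstanding == 0;
}

/* Have all required blocks actually been requested? */
static bool sketch_all_sent(const struct sketch_pipeline *p)
{
	return p->next_block == p->total_blocks;
}
```

With depth 1 and three blocks, a single failed send in the handler makes the
outstanding count drop to zero after the first block, so the wait completes
while two blocks were never requested.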

Sasha


