[ofa-general] Re: [OpenSM] updn routing performance fix???

Sasha Khapyorsky sashak at voltaire.com
Fri Feb 29 17:46:35 PST 2008


On 09:58 Fri 29 Feb, Albert Chu wrote:
> 
> What you're trying to do is calculate "ignore_existing_lfts" when the port
> trap is received rather than during routing later on?

Not on the trap, but when PortInfo is received during the subnet discovery
phase of the sweep (before routing configuration).

> Logically it looks
> fine.  I tried to make a fix from the "trap side" instead of the "routing
> side" initially too, but I didn't see a clean way to do it (obviously I
> don't know the code as well).  I'll try it out when I get a chance.
> 
> (FYI, I noticed
> +        if (p_physp->need_update)
> should probably be:
> +        if (p_physp->need_update && p_node->sw)
> given the code a few lines above?
> )

Yeah, this should be similar, but I don't yet understand why the p_node->sw
check is really needed a few lines above: for switches, PortInfo is queried
only after SwitchInfo is received, which is where p_node->sw is initialized.
We can probably just remove this check here.
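
To make the check under discussion concrete, here is a minimal,
self-contained sketch. It is not the actual OpenSM code: the struct and
function names are hypothetical stand-ins, and it assumes the body of the
"if" simply raises ignore_existing_lfts when a switch port reports that it
needs an update.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the subnet, node and
 * physical-port objects; the real OpenSM structures have many more fields. */
struct sketch_switch { int num_ports; };
struct sketch_node   { struct sketch_switch *sw; /* NULL for non-switch nodes */ };
struct sketch_physp  { int need_update; };
struct sketch_subn   { int ignore_existing_lfts; };

/* Assumed to be called once per PortInfo received during subnet discovery,
 * i.e. before the routing configuration phase of the sweep. */
void sketch_on_port_info(struct sketch_subn *subn,
                         const struct sketch_node *node,
                         const struct sketch_physp *physp)
{
    assert(subn != NULL && node != NULL && physp != NULL);

    /* The guarded form suggested in the quoted text: only a switch port
     * that needs an update forces existing LFTs to be ignored. */
    if (physp->need_update && node->sw != NULL)
        subn->ignore_existing_lfts = 1;
}
```

If PortInfo for a switch really is only queried after SwitchInfo has set
the sw pointer, the "node->sw != NULL" test only matters for non-switch
nodes, which is the case the guard filters out in this sketch.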

> > Regardless of this, it could also be useful to add a console command
> > to set p_subn->ignore_existing_lfts manually.
> 
> Yeah, like you said above, this would especially be needed when a new
> switch is added to the network.  I'll work with Ira on this.

Thanks.

> > Hmm, interesting... Are you running mpibench during a heavy sweep? If
> > so, could the degradation be due to path migration and potential
> > packet drops?
> 
> Afraid not, it was after the heavy sweeps.  I ran opensm in the foreground
> and saw nothing going on besides the occasional lite sweep.
> 
> I've seen similar "inconsistencies" in performance when I've run ~120 node
> jobs on this cluster.  So I personally think the differences between tests
> are due to the randomness of the nodes selected.  I don't know if anything
> can be definitive until a 140+ node job is run (which I don't know if I
> can :-().

Ok.

Sasha


