[ofa-general] Re: [OpenSM] updn routing performance fix???
Sasha Khapyorsky
sashak at voltaire.com
Fri Feb 29 17:46:35 PST 2008
On 09:58 Fri 29 Feb , Albert Chu wrote:
>
> What you're trying to do is calculate "ignore_existing_lfts" when the port
> trap is received rather than during routing later on?
Not on the trap, but when PortInfo is received during the subnet discovery
phase of the sweep (before routing configuration).
> Logically it looks
> fine. I tried to make a fix from the "trap side" instead of the "routing
> side" initially too, but I didn't see a clean way to do it (obviously I
> don't know the code as well). I'll try it out when I get a chance.
>
> (FYI, I noticed
> + if (p_physp->need_update)
> should probably be:
> + if (p_physp->need_update && p_node->sw)
> given the code a few lines above?
> )
Yeah, this should be similar, but I don't understand yet why the p_node->sw
check is really needed a few lines above: for switches, PortInfo is
queried only after SwitchInfo is received, which is where p_node->sw is
initialized. We can probably just remove that check.
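
To make the condition under discussion concrete, here is a minimal sketch.
It is not the actual patch: the sketch_* types and sketch_on_port_info()
are hypothetical, simplified stand-ins, and only the field names
need_update, sw and ignore_existing_lfts come from this thread. It assumes
the flag is raised at the point where PortInfo for a port is processed
during discovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the OpenSM structures named in the thread. */
struct sketch_switch { int unused; };
struct sketch_node   { struct sketch_switch *sw; };  /* NULL for non-switch nodes */
struct sketch_physp  { bool need_update; };
struct sketch_subn   { bool ignore_existing_lfts; };

/*
 * Assumed to be called when PortInfo for a port arrives during the
 * discovery phase of a sweep, i.e. before routing is configured.
 */
static void sketch_on_port_info(struct sketch_subn *p_subn,
                                const struct sketch_node *p_node,
                                const struct sketch_physp *p_physp)
{
    assert(p_subn != NULL && p_node != NULL && p_physp != NULL);

    /* Albert's variant: only a changed port on a switch raises the flag. */
    if (p_physp->need_update && p_node->sw != NULL)
        p_subn->ignore_existing_lfts = true;
}
```

If p_node->sw is always initialized before PortInfo is queried for a
switch, the second half of the condition only filters out non-switch nodes.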
> > Regardless of this, it could also be useful to add a console
> > command to set p_subn->ignore_existing_lfts manually.
>
> Yeah, like you said above, this would especially be needed when a new
> switch is added to the network. I'll work with Ira on this.
Thanks.
> > Hmm, interesting... Are you running mpibench during a heavy sweep? If so,
> > could the degradation be due to path migration and potential
> > packet drops?
>
> Afraid not, it was after the heavy sweeps. I ran opensm in the foreground
> and saw nothing going on besides the occasional lite sweep.
>
> I've seen similar "inconsistencies" on performance when I've run ~120 node
> jobs on this cluster. So I personally think the test results are due to
> randomness in the nodes selected. I don't know if anything can be
> definitive until a 140+ node job is run (which I don't know if I can :-().
Ok.
Sasha