[ofa-general] Re: [PATCHv2] osm: Clearing lid matrices before rebuilding them

Sasha Khapyorsky sashak at voltaire.com
Mon Mar 19 11:55:31 PDT 2007


On 16:50 Mon 19 Mar     , Yevgeny Kliteynik wrote:
> 
> In __osm_ucast_mgr_process_neighbor(), there is the following assertion:
> 
>        CL_ASSERT( hops <= osm_switch_get_hop_count( p_sw, lid_ho,
>                                                     port_num ) );
> 
> This assertion fails, since the hop count becomes inconsistent.

This is not a big problem IMO; we just need to avoid dealing with
non-existent LIDs there (so the __osm_ucast_mgr_process_neighbor() code
should be improved in this direction and this assertion removed). The
LFT generation code doesn't try to build entries for non-existent LIDs,
so the "old" min hop vectors will be ignored there.

But I think we could have a problem when the port (switch with master)
is reconnected at a different location. Then the old/invalid hop counts
will be counted again, and if one of them "wins" we can get unexpected
routing paths. So hop matrix cleanup is obviously the simplest fix.
Agreed.
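
To illustrate the "stale entry wins" case, here is a minimal sketch with
made-up names (toy_switch, toy_set_hops() and so on are hypothetical, not
the actual OpenSM structures or API). It assumes a hop count is only
overwritten when the new value is lower, and that 0xFF stands in for
"no path":

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LID   8     /* hypothetical tiny fabric */
#define NUM_PORTS 4
#define NO_PATH   0xFF  /* assumed "no path" marker for this sketch */

/* Hypothetical stand-in for one switch's min-hop (lid) matrix. */
struct toy_switch {
    uint8_t hops[MAX_LID + 1][NUM_PORTS];
};

/* Reset every entry to "no path" before the matrix is rebuilt. */
static void toy_clear_hops(struct toy_switch *sw)
{
    memset(sw->hops, NO_PATH, sizeof(sw->hops));
}

/* Store a hop count only if it is lower than the one already stored. */
static void toy_set_hops(struct toy_switch *sw, unsigned lid,
                         unsigned port, uint8_t hops)
{
    assert(lid <= MAX_LID && port < NUM_PORTS);
    if (hops < sw->hops[lid][port])
        sw->hops[lid][port] = hops;
}

/* Return the port with the fewest hops to lid, or -1 if unreachable. */
static int toy_best_port(const struct toy_switch *sw, unsigned lid)
{
    int best_port = -1;
    uint8_t best_hops = NO_PATH;

    assert(lid <= MAX_LID);
    for (unsigned port = 0; port < NUM_PORTS; port++) {
        if (sw->hops[lid][port] < best_hops) {
            best_hops = sw->hops[lid][port];
            best_port = (int)port;
        }
    }
    return best_port;
}
```

If LID 5 was once 1 hop away through port 1 and is now 3 hops away
through port 3, then without toy_clear_hops() the old entry still wins
and toy_best_port() keeps returning port 1. Clearing first and then
rebuilding leaves only the port 3 entry.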

> >>I'm not sure about the trunk though.
> >>Sasha,
> >>Can you please check that your latest improvements to the
> >>routing don't have this problem?
> >
> >With disconnecting switches should be similar behavior I guess.
> 
> Right, I checked it - same problem.

Interesting. This function is different in the master and no longer
scans LIDs from 1 up to max; instead it scans only the switches that
exist at the moment.
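
Roughly, the difference between the two scanning styles looks like the
sketch below. All names here (demo_switch, demo_scan_lid_range(),
demo_scan_switches()) are invented for illustration and are not the real
OpenSM code; the sketch only assumes that a range scan looks at hop
entries per LID, while a list scan walks the switch objects currently
known:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_LID 8
#define DEMO_NO_PATH 0xFF  /* assumed "no path" marker for this sketch */

/* Hypothetical record for a switch that exists right now. */
struct demo_switch {
    unsigned lid;
    const struct demo_switch *next;
};

/* Range scan: visit every LID in 1..max that has a hop entry,
 * including entries left over from switches that are gone. */
static unsigned demo_scan_lid_range(const uint8_t least_hops[DEMO_MAX_LID + 1],
                                    uint8_t processed[DEMO_MAX_LID + 1])
{
    unsigned count = 0;

    for (unsigned lid = 1; lid <= DEMO_MAX_LID; lid++) {
        if (least_hops[lid] == DEMO_NO_PATH)
            continue;
        processed[lid] = 1;
        count++;
    }
    return count;
}

/* List scan: visit only the LIDs of switches present in the list. */
static unsigned demo_scan_switches(const struct demo_switch *head,
                                   uint8_t processed[DEMO_MAX_LID + 1])
{
    unsigned count = 0;

    for (const struct demo_switch *sw = head; sw != NULL; sw = sw->next) {
        assert(sw->lid >= 1 && sw->lid <= DEMO_MAX_LID);
        processed[sw->lid] = 1;
        count++;
    }
    return count;
}
```

With a leftover hop entry for a LID that no longer belongs to any
switch, the range scan still processes that LID, while the list scan
never reaches it.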

Could you provide more details about the master? Are you able to see the
problem with just switch disconnections? What is the test case?

Sasha


