[ofa-general] Re: [PATCHv2] osm: Clearing lid matrices before rebuilding them
Yevgeny Kliteynik
kliteyn at dev.mellanox.co.il
Tue Mar 20 02:52:04 PDT 2007
Sasha Khapyorsky wrote:
> On 16:50 Mon 19 Mar , Yevgeny Kliteynik wrote:
>> In __osm_ucast_mgr_process_neighbor(), there is the following assertion:
>>
>> CL_ASSERT( hops <= osm_switch_get_hop_count( p_sw, lid_ho,
>> port_num ) );
>>
>> This assertion fails, since the hop count becomes inconsistent.
>
> This is not a big problem IMO, we just need to not deal with non-existing
> LIDs there (so __osm_ucast_mgr_process_neighbor() code should be
> improved in this direction and this assertion removed). And the LFTs
> generation code doesn't try to build entries for non-existing LIDs, so
> "old" min hop vectors will be ignored there.
>
> But I think we could have a problem when the port (switch with master)
> is reconnected at a different location. Then old/invalid hop counts will
> be counted again and if they "win" we can get unexpected routing paths.
> So obviously hop matrix cleanup is the simplest fix - Agreed.
>
>>>> I'm not sure about the trunk though.
>>>> Sasha,
>>>> Can you please check that your latest improvements to the
>>>> routing don't have this problem?
>>> Disconnecting switches should show similar behavior, I guess.
>> Right, I checked it - same problem.
>
> Interesting. This function is different in the master and doesn't scan
> LIDs from 1 up to max anymore, instead it scans only switches existing at
> the moment.
>
> Could you provide more details about the master? Are you able to see the
> problem with just switch disconnections? What is the test case?
I had this problem on a copy of master that wasn't up to date.
After updating it I can't reproduce the problem anymore.
But the hop count is not cleared there either, so even if I can't
recreate this problem (or even if the new flow solves this particular
bug), I think we do agree that it would be better to clear the hop
count anyway.
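
To illustrate why clearing matters, here is a minimal standalone sketch.
It is not OpenSM code: toy_switch_t, toy_clear_hops(), toy_set_hops(),
toy_least_hops() and the sizes are hypothetical names made up for this
example. It only models a per-switch hop matrix indexed by LID and port,
and assumes that a rebuild writes only the entries it actually visits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, kept tiny for illustration only. */
#define TOY_MAX_LID   8
#define TOY_NUM_PORTS 4
#define TOY_NO_PATH   0xFF   /* marks "LID not reachable via this port" */

/* Toy model of a switch's hop matrix: hops[lid][port]. */
typedef struct {
    uint8_t hops[TOY_MAX_LID + 1][TOY_NUM_PORTS];
} toy_switch_t;

/* Reset every entry to "no path"; meant to run before each rebuild. */
static void toy_clear_hops(toy_switch_t *sw)
{
    memset(sw->hops, TOY_NO_PATH, sizeof(sw->hops));
}

/* Record the hop count to 'lid' through 'port' (plain overwrite). */
static void toy_set_hops(toy_switch_t *sw, unsigned lid, unsigned port,
                         uint8_t hops)
{
    assert(lid <= TOY_MAX_LID && port < TOY_NUM_PORTS);
    sw->hops[lid][port] = hops;
}

/* Smallest hop count to 'lid' over all ports, or TOY_NO_PATH if none. */
static uint8_t toy_least_hops(const toy_switch_t *sw, unsigned lid)
{
    uint8_t best = TOY_NO_PATH;
    unsigned port;

    assert(lid <= TOY_MAX_LID);
    for (port = 0; port < TOY_NUM_PORTS; port++)
        if (sw->hops[lid][port] < best)
            best = sw->hops[lid][port];
    return best;
}
```

In this toy model, suppose LID 5 was 1 hop away via port 2 and is then
reconnected so that it is 3 hops away via port 1. A rebuild that only
calls toy_set_hops(&sw, 5, 1, 3) leaves the old port 2 entry in place,
so toy_least_hops() still reports the stale value. Calling
toy_clear_hops() before the rebuild removes it.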
-- Yevgeny
> Sasha
>