[ofa-general] Re: [PATCH] opensm/osm_switch.h: use updated LFT for routing

Yevgeny Kliteynik kliteyn at dev.mellanox.co.il
Sun Nov 23 05:24:37 PST 2008


Sasha,

Sasha Khapyorsky wrote:
> On 14:20 Sun 23 Nov     , Yevgeny Kliteynik wrote:
>>>> One immediate outcome of this bug is opensm.fdbs file - when it
>>>> is dumped from the switch LFT (and not from lft_buf),
>>> Why is this bug triggered only now?
>> I sometimes had errors in simulations, and after some analysis
>> I decided that they are timing problems with the tests.
>> Now that I did some stress testing of ucast cache, I started
>> to see more of these errors.
> 
> If you are sure that these are simulator or test problems then just close
> #1406 as invalid. Obviously we don't need such patch then.

No, I'm not sure. My original patch has eliminated this problem.
In any case, deeper investigation is needed.

-- Yevgeny

>>>> it sometimes
>>>> doesn't match the lst file.
>>> What does this "sometimes" mean? I think the case should be investigated
>>> deeper. With such a patch we are just trying to hide a possible issue.
>>> As far as I understand, opensm.fdbs (and the other routing dumps) are
>>> generated only after all LinFwdTbl responses have arrived; when some of
>>> them fail, the 'subnet_initialization_error' flag is raised and OpenSM
>>> will resweep. If so, why is 'opensm.fdbs' broken? It is not immediately
>>> clear to me.
>> I didn't see 'subnet_initialization_error' in such cases.
>> Anyway, here's what I can do: at the end of each ucast_mgr_process
>> I'll compare lft and lft_buf (something that the other patch is
>> doing, the one that frees lft_buf), and if there is a difference,
>> then we have a problem. If not - then I'll look for the cause
>> elsewhere.
> 
> Yes, seems deeper investigation is needed here. Thanks.
> 
> Sasha
> 
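
For reference, the lft vs. lft_buf comparison proposed in the quoted text
could look roughly like the sketch below. This is only an illustration: the
struct and function names (sketch_switch, sketch_count_lft_mismatches) are
hypothetical stand-ins and not the real osm_switch_t or any existing OpenSM
function. It assumes both tables are plain byte arrays of the same length,
indexed by LID, with one output port number per entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-switch state; not the real osm_switch_t. */
struct sketch_switch {
	uint8_t *lft;     /* table as last confirmed by the switch */
	uint8_t *lft_buf; /* table computed by the routing engine */
	size_t lft_size;  /* number of LID entries in both tables */
};

/* Return the number of LIDs whose port differs between lft and lft_buf,
 * reporting each differing entry on stderr. */
static size_t sketch_count_lft_mismatches(const struct sketch_switch *sw)
{
	size_t lid, mismatches = 0;

	assert(sw != NULL && sw->lft != NULL && sw->lft_buf != NULL);

	for (lid = 0; lid < sw->lft_size; lid++) {
		if (sw->lft[lid] != sw->lft_buf[lid]) {
			fprintf(stderr,
				"LID %zu: lft port %u != lft_buf port %u\n",
				lid, (unsigned)sw->lft[lid],
				(unsigned)sw->lft_buf[lid]);
			mismatches++;
		}
	}
	return mismatches;
}
```

A non-zero return value at the end of a routing cycle would point at a real
divergence between the two tables; a zero result would suggest looking for
the cause elsewhere, as described above.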



