[ofa-general] opensm dumps core when using LASH for routing
    Sasha Khapyorsky 
    sashak at voltaire.com
       
    Sun Jan 13 12:17:47 PST 2008
    
    
  
On 22:25 Sun 13 Jan     , Max Matveev wrote:
> >>>>> "sashak" == Sasha Khapyorsky writes:
> 
>  sashak> I suspect that the failure scenario is different. This switch
>  sashak> was just connected/discovered by OpenSM (it still has
>  sashak> hops = 0x0, which indicates that it has not passed the lid
>  sashak> matrix generation stage yet) and it is still uninitialized by
>  sashak> LASH. If that is really so, checking ->priv for NULL looks
>  sashak> like a valid fix.
> 
> Should opensm ignore requests while it's initializing?
It is initialized, except for the newly added switch.
I ran some tests today trying to reproduce the failure with the simulator,
but without much success: a PathRecord query should be rejected when its
path passes through non-prepared switches. At least that is what happens
with the master branch.
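
To make the idea concrete, here is a minimal sketch of the ->priv NULL check
and of rejecting a path that crosses a non-prepared switch. It is not the
actual OpenSM code: the struct layouts, get_lash_id_checked(),
path_is_routable() and LASH_ID_INVALID are all hypothetical names made up
for illustration, and the real get_lash_id() may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real OpenSM objects (assumed layout). */
struct lash_switch_ctx {
    unsigned id;            /* id assigned by the routing engine */
};

struct sm_switch {
    void *priv;             /* routing engine data, NULL until prepared */
};

/* Hypothetical sentinel meaning "switch not prepared by LASH yet". */
#define LASH_ID_INVALID ((unsigned)-1)

/* Return the LASH id of a switch, or LASH_ID_INVALID when the switch
 * was discovered but has not been initialized by the routing engine. */
static unsigned get_lash_id_checked(const struct sm_switch *sw)
{
    const struct lash_switch_ctx *ctx;

    if (sw == NULL || sw->priv == NULL)
        return LASH_ID_INVALID;

    ctx = sw->priv;
    return ctx->id;
}

/* Return 1 if every switch on the path is prepared, 0 otherwise.
 * A caller would reject the query instead of dereferencing NULL. */
static int path_is_routable(const struct sm_switch *const *path, size_t n)
{
    size_t i;

    assert(path != NULL || n == 0);

    for (i = 0; i < n; i++)
        if (get_lash_id_checked(path[i]) == LASH_ID_INVALID)
            return 0;

    return 1;
}
```

With something like this, a query whose path includes the freshly
discovered switch would be refused rather than crash the SM.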
>  sashak> Is this a reproducible failure?
> 
> We've hit it twice - first time cores were disabled, so I only know
> that opensm died in get_lash_id() but I don't know where it was called
> from. And this is the second time.
It would be interesting to know in which OpenSM state it happens.
Could you send me the core file and the exact git tree hash? I would like
to investigate this more deeply.
Sasha
    
    
More information about the general
mailing list