[ofa-general] opensm dumps core when using LASH for routing

Sasha Khapyorsky sashak at voltaire.com
Sun Jan 13 12:17:47 PST 2008


On 22:25 Sun 13 Jan     , Max Matveev wrote:
> >>>>> "sashak" == Sasha Khapyorsky writes:
> 
>  sashak> I suspect that the failure scenario is different. This switch
>  sashak> was just connected/discovered by OpenSM (it still has hops =
>  sashak> 0x0, which indicates that it has not passed the lid matrix
>  sashak> generation stage yet) and it is still uninitialized by
>  sashak> LASH. If that is really so, checking ->priv for NULL looks
>  sashak> like a valid fix.
> 
> Should opensm ignore requests while it's initializing?

It is initialized, except for the newly added switch.

I ran some tests today to try to reproduce the failure with the simulator,
but without much success - a PathRecord query should be rejected when its
path passes through non-prepared switches. At least that is the case with
the master branch.
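
To make the ->priv check mentioned above concrete, here is a minimal
sketch. The struct layouts and the name get_lash_id_checked() are
assumptions for illustration only, not the actual OpenSM definitions;
only the idea of testing ->priv for NULL before dereferencing it comes
from this thread.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the OpenSM switch object and
 * the LASH per-switch data; the real structures differ. */
struct lash_switch {
	int id;
};

struct sw {
	void *priv;	/* NULL until the routing engine has prepared the switch */
};

/* Return the LASH id of the switch, or -1 when the switch (or its
 * LASH private data) is not available yet. */
static int get_lash_id_checked(const struct sw *p_sw)
{
	const struct lash_switch *ls;

	if (!p_sw || !p_sw->priv)
		return -1;

	ls = p_sw->priv;
	return ls->id;
}
```

A caller in the PathRecord path could then treat -1 as "switch not
prepared yet" and reject the query instead of dereferencing a NULL
pointer.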

>  sashak> Is this reproducible failure?
> 
> We've hit it twice - the first time cores were disabled, so I only know
> that opensm died in get_lash_id() but I don't know where it was called
> from. And this is the second time.

It would be interesting to know which state OpenSM is in when this happens.

Could you send me the core file and exact git tree hash? I would like
to investigate this deeper.

Sasha


