[ofa-general] opensm dumps core when using LASH for routing
Sasha Khapyorsky
sashak at voltaire.com
Sun Jan 13 12:17:47 PST 2008
On 22:25 Sun 13 Jan, Max Matveev wrote:
> >>>>> "sashak" == Sasha Khapyorsky writes:
>
> sashak> I suspect that the failure scenario is different. This switch
> sashak> was just connected/discovered by OpenSM (it has hops = 0x0
> sashak> yet - this indicates that it does not pass lid matrix
> sashak> generation stage yet) and it still be uninitialized by
> sashak> LASH. If it is really so checking ->priv for NULL looks like
> sashak> valid fix.
>
> Should opensm ignore requests while it's initializing?
It is initialized, except for the newly added switch.
I ran some tests today trying to reproduce the failure with the simulator,
but without much success: a PathRecord query should be rejected when its
path passes through non-prepared switches. At least it is on the master branch.
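
For reference, the ->priv NULL check discussed above could look roughly
like the sketch below. The types and the function name here
(struct sm_switch, struct lash_switch, get_lash_id_checked) are
simplified, hypothetical stand-ins and not the actual OpenSM
definitions; the only thing taken from this thread is the idea that a
switch not yet set up by LASH has a NULL ->priv.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-switch data that LASH attaches. */
struct lash_switch {
    int id; /* LASH-internal switch id */
};

/* Hypothetical stand-in for the SM switch object. */
struct sm_switch {
    void *priv; /* assumed NULL until LASH has prepared the switch */
};

/*
 * Return the LASH id of the switch, or -1 when the switch has not
 * been prepared by LASH yet (priv is still NULL), instead of
 * dereferencing a NULL pointer.
 */
static int get_lash_id_checked(const struct sm_switch *sw)
{
    const struct lash_switch *lsw;

    if (sw == NULL || sw->priv == NULL)
        return -1;

    lsw = sw->priv;
    return lsw->id;
}
```

A caller would then have to treat a negative id as "switch not ready"
and reject the request rather than continue with the lookup.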
> sashak> Is this reproducible failure?
>
> We've hit it twice - first time cores were disabled, so I only know
> what opensm died in get_lash_id() but I don't know where it was called
> from. And this is the second time.
It would be interesting to know which state OpenSM is in when this happens.
Could you send me the core file and the exact git tree hash? I would like
to investigate this further.
Sasha