[ofa-general] Re: pkey.sim.tcl

Sasha Khapyorsky sashak at voltaire.com
Sun Jul 29 09:47:47 PDT 2007


Hi Eitan,

On 12:11 Sun 29 Jul     , Eitan Zahavi wrote:
> Regarding the test:
> Once I know the exact condition causing a full re-sweep I will use
> it in the test.
> In OFED 1.2 it was enough to set one switch ChangeBit to force a full
> reconfiguration.

You can set the PortState of the port whose pkey table was modified to
INIT, and this will trigger an update.

> Regarding incremental flow in general:
> 1. Yes - it is good.

Ok.

> 2. But we must make sure it is robust enough that we do not lose some
> nodes or functionality
>     under extreme cases of reboot or HW errors.

Test reports are welcome (as usual). I'm testing too.

> 3. We should have a way to force a full sweep without killing the SM:
> As the size of the clusters grows there is a growing chance that "soft
> errors" will hit the devices.
> Most of the device memory is guarded and would be auto detected if
> affected. 
> However I think it is wise to allow for the user to force full
> reconfiguration without making the SM "go away".

We can add a config option to force an update unconditionally. Would that
be sufficient?
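
As a rough illustration of the two triggers mentioned so far (a port put
into INIT state, and the proposed option to force an update
unconditionally), here is a toy Python model. It is not OpenSM code: the
names Port, ports_to_update and force_update are hypothetical, and the
selection rule is only an assumption based on what is described in this
thread.

```python
from dataclasses import dataclass

INIT = "INIT"
ACTIVE = "ACTIVE"


@dataclass
class Port:
    name: str   # hypothetical label, e.g. "sw1/p2"
    state: str  # simplified PortState: INIT or ACTIVE


def ports_to_update(ports, force_update=False):
    """Return the names of ports whose pkey tables would be re-sent.

    Assumed rule: without the force option only ports in INIT state are
    updated; with the option set every port is updated.
    """
    if force_update:
        return [p.name for p in ports]
    return [p.name for p in ports if p.state == INIT]


ports = [Port("sw1/p1", ACTIVE), Port("sw1/p2", INIT)]
print(ports_to_update(ports))                     # incremental case
print(ports_to_update(ports, force_update=True))  # forced case
```

In this sketch the force option simply bypasses the per-port check, so it
never needs the SM to be restarted.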

> Regarding OpenSM does not respond to SA queries during sweep:
> It is due to the fact there is no "double buffer" for the internal DB.
> So whenever the SM starts a sweep the SA will see an "empty" DB. 

The specific problem was due to the fact that the OpenSM DB is in a
"locked" state most of the time during a sweep, and the SA is waiting to
get access.

> The solution for that problem may be having a "previous" DB during
> sweeps. 
> I suspect that approach will also enable a fine-grained incremental
> capability.

I agree, this could be a good direction too, as could some others such as
more granular locking, etc.
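
To make the "previous DB" idea concrete, here is a minimal Python sketch
of a double-buffered store. It is not how OpenSM is implemented: the class
DoubleBufferedDB and the methods query and sweep are hypothetical names,
and the DB is reduced to a plain dictionary. The point is only that the
sweep builds a new copy on the side, and the lock is held just long enough
to swap a reference, so queries are served from the last published copy
while the sweep runs.

```python
import threading


class DoubleBufferedDB:
    """Toy double-buffered DB: readers see the last published snapshot."""

    def __init__(self, initial=None):
        self._lock = threading.Lock()  # guards only the reference swap
        self._published = dict(initial or {})

    def query(self, key):
        # SA-like path: grab the current snapshot, then read without the lock.
        with self._lock:
            snapshot = self._published
        return snapshot.get(key)

    def sweep(self, discover):
        # SM-like path: build the new DB without holding the lock,
        # then publish it with a short swap.
        new_db = dict(discover())
        with self._lock:
            self._published = new_db


db = DoubleBufferedDB({"node1": "lid 1"})


def discover():
    # A query issued in the middle of the sweep still sees the previous DB.
    print("during sweep:", db.query("node1"))
    return {"node1": "lid 1", "node2": "lid 2"}


db.sweep(discover)
print("after sweep:", db.query("node2"))
```

The trade-off in this sketch is memory (two copies of the DB exist during
a sweep) in exchange for queries never waiting on the whole sweep.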

Sasha


