[ofa-general] RE: pkey.sim.tcl

Eitan Zahavi eitan at mellanox.co.il
Sun Jul 29 02:11:05 PDT 2007


Regarding the test:
Once I know the exact condition that causes a full re-sweep, I will use
it in the test.
In OFED 1.2 it was enough to set one switch ChangeBit to force a full
reconfiguration.

Regarding incremental flow in general:
1. Yes - it is good.
2. But we must make sure it is robust enough that we do not lose
nodes or functionality under extreme cases of reboot or HW errors.
3. We should have a way to force a full sweep without killing the SM:
As clusters grow in size, there is a growing chance that "soft
errors" will hit the devices.
Most of the device memory is guarded, so corruption there would be
detected automatically.
However, I think it is wise to let the user force a full
reconfiguration without making the SM "go away".

Regarding OpenSM not responding to SA queries during a sweep:
This is because there is no "double buffer" for the internal DB.
So whenever the SM starts a sweep, the SA sees an "empty" DB.
The solution may be to keep the "previous" DB available during
sweeps.
I suspect that approach would also enable a fine-grained incremental
capability.

Eitan


Eitan Zahavi
Senior Engineering Director, Software Architect
Mellanox Technologies LTD
Tel:+972-4-9097208
Fax:+972-4-9593245
P.O. Box 586 Yokneam 20692 ISRAEL

 

> -----Original Message-----
> From: Sasha Khapyorsky [mailto:sashak at voltaire.com] 
> Sent: Sunday, July 29, 2007 12:55 AM
> To: Eitan Zahavi
> Cc: Yevgeny Kliteynik; Hal Rosenstock; general at lists.openfabrics.org
> Subject: Re: pkey.sim.tcl
> 
> Hi Eitan,
> 
> On 07:56 Fri 27 Jul     , Eitan Zahavi wrote:
> > > 
> > > On 09:26 Thu 26 Jul     , Eitan Zahavi wrote:
> > > > 
> > > > I am happy you actually use the simulator.
> > > > Please provide more info regarding the failure. You should tar 
> > > > compress the /tmp/ibmgtsim.XXXX of your run.
> > > 
> > > I can send this for you if you want, but the failure is trivial.
> > No need if you already know where the bug is...
> > > 
> > > Yes, and it is due (6), where default Pkey is removed 
> "externally". 
> > > I'm not sure that OpenSM should handle the case when pkey 
> table is 
> > > modified externally by something which is not SM.
> > > 
> > 
> > For a few years it just worked fine, so I wonder why this
> > functionality was removed?
> > It is a really BAD case when Pkeys are altered externally, but I
> > think it would be wise to "refresh" these tables on a heavy sweep.
> 
> We discussed how and when port table refresh should be done
> just a few days ago in this thread. My impression was that we
> are "in sync" about this.
> 
> > In general it seems OpenSM has lost its "heavy sweep" 
> concept. Now it 
> > does not refresh the fabric setup even on heavy sweep.
> 
> Not on each heavy sweep, but it does when needed or when the
> data could change. I don't think the concept was changed,
> just optimized. Let's just look at the numbers:
> 
> $ time ./opensm/opensm -e -f ./osm.log -o ...
> SUBNET UP
> Exiting SM
> 
> real    0m7.995s
> user    0m4.488s
> sys     0m6.072s
> 
> $ time ./opensm/opensm -e -f ./osm.log -o --qos ...
> SUBNET UP
> Exiting SM
> 
> real    0m22.521s
> user    0m10.921s
> sys     0m17.173s
> 
> 
> These are simulated runs (with ibsim); the fabric is ~1300 nodes.
> 
> The difference is the '--qos' flag: OpenSM skips the SL2VL
> and VLArb updates in the first run and performs them in the
> second - sweep times are 8 vs. 22 seconds.
> 
> > This assumes "perfect" HW and software, and I really think
> > we should have preserved that capability.
> 
> What about an option? Now with subn->need_update flag (which 
> always enforces updates) it is trivial to implement.
> 
> > Note that a "heavy sweep" does not happen unless something
> > changed or trapped.
> 
> Yes, for example some port was connected/disconnected, some
> node rebooted, etc. OpenSM starts a huge heavy sweep, it takes
> a while, the SA is not responsive most of the time, TCP connections
> over IPoIB time out, applications fail. This is production
> experience... :(
> 
> Sasha
> 


