[openib-general] nightly osm_sim report 2006-12-18:normal completion

Eitan Zahavi eitan at mellanox.co.il
Mon Dec 18 05:33:50 PST 2006


Hi Sasha,

The failure analysis takes time and is manual...
The logs and related files are pretty big and will take space to upload.

Today I simulated with an OpenSM that was compiled on the side, with the
fixes for DONE/DONE_PENDING (my bad - I should have applied my patches to
the clone, but I was not sure this would not "contaminate" that git tree
forever).

The tests that failed today are actually false violations:
1. The IS1-16 failed due to lack of free sockets to connect to the
server. Still not clear why. I will increase the number of sockets the
client/server try to connect on.
2. The IS3-128 failed due to a temporary replacement of the opensm with
the one that has my fixes for DONE/DONE_PENDING. This was a mistake I made
manually while compiling the "clone". As I was watching the log I
noticed that the same wrong signal was happening.

BTW: The DONE/DONE_PENDING bug was discovered through a change I made to
the simulator dispatcher. The change introduced a bug that overloaded the
machine with a busy loop in the simulator dispatcher.
Apparently this produced different timing and exposed these bugs.

EZ

> -----Original Message-----
> From: Sasha Khapyorsky [mailto:sashak at voltaire.com]
> Sent: Monday, December 18, 2006 3:31 PM
> To: Eitan Zahavi
> Cc: Eitan Zahavi; Yevgeny Kliteynik; halr at voltaire.com;
> openib-general at openib.org
> Subject: Re: nightly osm_sim report 2006-12-18:normal completion
> 
> Hi Eitan,
> 
> On 13:19 Mon 18 Dec     , Eitan Zahavi wrote:
> > OSM Simulation Regression Summary
> > OpenSM rev = Fri_Dec_15_20:29:07_2006 d5e724
> > ibutils rev = Thu_Dec_14_21:48:18_2006 fd82d4 MOD_FILES=1
> > Total=221 Pass=219 Fail=2
> >
> > Pass:
> > 31 LidMgr IS1-16.topo
> > 30 Stability IS1-16.topo
> > 30 Pkey IS1-16.topo
> > 30 Multicast IS1-16.topo
> > 29 OsmStress IS1-16.topo
> > 10 Stability IS3-loop.topo
> > 10 Stability IS3-128.topo
> > 10 Pkey IS3-128.topo
> > 10 Multicast IS3-loop.topo
> > 10 Multicast IS3-128.topo
> > 10 LidMgr IS3-128.topo
> > 9 OsmStress IS3-128.topo
> >
> > Failures:
> > 1 OsmStress IS3-128.topo
> > 1 OsmStress IS1-16.topo
> 
> Is it possible to have more details about failures (in case when it is
> real failures)? Probably to upload the logs to somewhere?
> 
> Sasha



