[openib-general] nightly osm_sim report 2006-12-14:normal completion

Hal Rosenstock halr at voltaire.com
Thu Dec 14 12:48:11 PST 2006


On Thu, 2006-12-14 at 15:24, Eitan Zahavi wrote:
> Hal Rosenstock wrote:
> > On Thu, 2006-12-14 at 14:53, Eitan Zahavi wrote:
> >   
> >> Update on analysis of failures:
> >>
> >> Eitan Zahavi wrote:
> >>     
> >>> Hal Rosenstock wrote:
> >>>   
> >>>       
> >>>> Hi Eitan,
> >>>>
> >>>> On Thu, 2006-12-14 at 02:11, Eitan Zahavi wrote:
> >>>>   
> >>>>     
> >>>>         
> >>>>> OSM Simulation Regression Summary
> >>>>> OpenSM rev = ____  
> >>>>> ibutils rev = ____  
> >>>>> Total=264 Pass=261 Fail=3
> >>>>>
> >>>>> Pass:
> >>>>> 36 Stability IS1-16.topo
> >>>>> 36 Pkey IS1-16.topo
> >>>>> 36 Multicast IS1-16.topo
> >>>>> 36 LidMgr IS1-16.topo
> >>>>> 35 OsmStress IS1-16.topo
> >>>>> 12 Stability IS3-loop.topo
> >>>>> 12 Stability IS3-128.topo
> >>>>> 12 Pkey IS3-128.topo
> >>>>> 12 OsmStress IS3-128.topo
> >>>>> 12 Multicast IS3-loop.topo
> >>>>> 11 Multicast IS3-128.topo
> >>>>> 11 LidMgr IS3-128.topo
> >>>>>
> >>>>> Failures:
> >>>>> 1 OsmStress IS1-16.topo
> >>>>>       
> >>>>>           
> >> Job was killed in the middle. Just an accident.
> >>     
> >
> > Is that always the case ? This one has been consistently failing.
> > I think you had written something about this failure back in July. I can
> > dig it out if you want.
> >
> >   
> >>>>> 1 Multicast IS3-128.topo
> >>>>>       
> >>>>>           
> >> A single packet was dropped on the way to the SM. Still not clear where.
> >> However, I have seen a perfectly good link reported by the drop manager 
> >> as missing.
> >>     
> >
> > I think I may have seen this as well on some rare occasions. I could
> > never figure out why this happened.
> >
> >   
> >> I will rerun some tests with valgrind, as I think this might be a memory 
> >> corruption issue.
> >>     
> >
> > OK.
> >
> >   
> >>>>> 1 LidMgr IS3-128.topo
> >>>>>       
> >>>>>           
> >> It seems the last sweep started before the last LID change was made, 
> >> so it missed one of the nodes.
> >> An additional sweep was enforced at the end of the test, just to make 
> >> sure all changes are handled.
> >>     
> >
> > So is this being reported as a failure improperly then ?
> >   
> Well the test failed. The fix was committed.

Which fix ? Are you referring to the one Yevgeny just sent ?

-- Hal

>  We will see in the next few 
> days if it is really a false alarm.
> > -- Hal
> >
> >   
> >>>>>     
> >>>>>       
> >>>>>           
> >>>> There are now 2 more failures. You had previously explained OsmStress
> >>>> failure as needing more investigation. Now there is a Multicast and
> >>>> LidMgr failure yet nothing really changed since the previous run the
> >>>> night before. Are these new tests ? What were the failures ?
> >>>>   
> >>>>     
> >>>>         
> >>> The tests use random seeds and thus can catch other bugs in each run.
> >>> I am investigating these failures. Some might be due to bugs in the 
> >>> checker code too.
> >>>
> >>> Please note that the failure rate is low (LidMgr passed 36+11 runs 
> >>> and failed 1).
> >>> This implies the bug is a hard one to find.
> >>>   
> >>>       
> >>>> The repetitions have also been reduced from previous reports. Are these
> >>>> the same or different tests ?
> >>>>   
> >>>>     
> >>>>         
> >>> The number of repetitions depends on runtime. The regression started 
> >>> later and thus ran fewer iterations.
> >>> I run the "same" tests ("same" means same code not same random sequence).
> >>>   
> >>>       
> >>>> -- Hal
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> openib-general mailing list
> >>>> openib-general at openib.org
> >>>> http://openib.org/mailman/listinfo/openib-general
> >>>>
> >>>> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
> 




