[openib-general] OpenSM Debug

Tue Nov 22 10:29:42 PST 2005

On Sun, Nov 20, 2005 at 09:18:27AM -0800, Fab Tillier wrote:
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Sunday, November 20, 2005 4:59 AM
> > 
> > Hi Fab,
> > 
> > On Sat, 2005-11-19 at 13:50, Fab Tillier wrote:
> > >
> > > That's correct - structure definitions change between the debug and
> > > release builds of complib.  The code above is there because in Linux,
> > > the library created by complib has the same name in debug and release
> > > builds, so it is possible to have a mismatch between the type of
> > > build for opensm and complib.  In Windows, I solved this by adding a
> > > debug-only suffix to the library name (complibd vs. complib) so that
> > > the risk of linkage errors is eliminated.  I have suggested in the
> > > past that the Linux complib adopt a similar naming scheme and
> > > that doing runtime checks for linkage errors was indicative of a
> > > poor design.
> > >
> > > This has been the basis for me pushing back on adding the
> > > cl_is_debug function to the Windows version of complib.
> > 
> > Is there a convention for naming debug libraries in Linux?
> 
> I'm no Linux expert, so I have no clue here.  Perhaps the C libraries already
> have some method?
> 
> > Is there any reason why the 2 versions of the libraries (with different
> > names) shouldn't be allowed to exist concurrently, so that one can just
> > link with the desired one?
> 
> There is none that I can think of.  In fact, the Windows drivers allow both the
> debug and release versions of the user-mode components to co-exist, as well as
> mixing debug and release kernel drivers.  This makes it easy to debug a single
> component without affecting timings in the whole stack.
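
To make the mismatch problem above concrete, here is a minimal sketch in C. It
is not complib code: the struct names, the DEMO_DEBUG macro and the demo_*
functions are all made up for illustration, and the extra debug-only field is
an assumption about what kind of change a debug build might make. It shows
why mixing build types is dangerous (the same member lands at a different
offset) and what a runtime check of the cl_is_debug kind could look like.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object as a release build lays it out. */
struct demo_obj_release {
    int value;
};

/* Assumed debug layout: an extra debug-only field shifts 'value'. */
struct demo_obj_debug {
    int state;
    int value;
};

/* Library side: reports how the library itself was compiled.
 * DEMO_DEBUG is a made-up macro for this sketch. */
bool demo_lib_is_debug(void)
{
#ifdef DEMO_DEBUG
    return true;
#else
    return false;
#endif
}

/* Application side: compare the caller's build type with the library's.
 * Returns true when both were built the same way. */
bool demo_build_types_match(bool app_is_debug)
{
    return app_is_debug == demo_lib_is_debug();
}
```

In a real setup the two functions would live in separate binaries, and the
application would pass its own compile-time setting at startup. Giving the
debug library a different name, as described above for Windows, avoids the
need for such a check altogether.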

How much timing overhead does a debug build add anyway? Based on what I saw at
Supercomputing, OpenSM spent more time in the kernel and doing context/thread
switches than doing actual computation.

At the moment, I'd prefer that debug were enabled by default, and that we had
a way to dump a stack trace and restart if something asserted. I'm going
to speculate that in 99% of the cases, a debug build on a PC will have
no trouble. For those really large clusters, people who know what they
are doing can enable optimizations.
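
A rough sketch of the "dump a stack trace when something asserts" idea,
assuming glibc on Linux, where backtrace() and backtrace_symbols_fd() are
available from <execinfo.h>. The DEMO_ASSERT macro and demo_dump_stack() are
hypothetical names for this example, not anything in OpenSM or complib, and
the restart part is left to whatever supervises the process.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define DEMO_MAX_FRAMES 64

/* Write the current call stack to stderr and return the number of
 * frames captured. */
int demo_dump_stack(void)
{
    void *frames[DEMO_MAX_FRAMES];
    int count = backtrace(frames, DEMO_MAX_FRAMES);

    /* Writes straight to the file descriptor without calling malloc. */
    backtrace_symbols_fd(frames, count, STDERR_FILENO);
    return count;
}

/* Hypothetical assertion macro: report the failed expression, dump the
 * stack, then abort so that a supervisor can restart the process. */
#define DEMO_ASSERT(expr)                                          \
    do {                                                           \
        if (!(expr)) {                                             \
            fprintf(stderr, "assertion failed: %s (%s:%d)\n",      \
                    #expr, __FILE__, __LINE__);                    \
            demo_dump_stack();                                     \
            abort();                                               \
        }                                                          \
    } while (0)
```

Function names only show up in the trace if the binary is linked with
-rdynamic; otherwise the output is raw addresses that have to be resolved
separately.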