[openib-general] OpenSM Debug
Troy Benjegerdes
hozer at hozed.org
Tue Nov 22 10:29:42 PST 2005
On Sun, Nov 20, 2005 at 09:18:27AM -0800, Fab Tillier wrote:
> > From: Hal Rosenstock [mailto:halr at voltaire.com]
> > Sent: Sunday, November 20, 2005 4:59 AM
> >
> > Hi Fab,
> >
> > On Sat, 2005-11-19 at 13:50, Fab Tillier wrote:
> > >
> > > That's correct - structure definitions change between the debug and
> > > release builds of complib. The code above is there because in Linux,
> > > the library created by complib has the same name in debug and release
> > > builds, so it is possible to have a mismatch between the type of
> > > build for opensm and complib. In Windows, I solved this by adding a
> > > debug-only suffix to the library name (complibd vs. complib) so that
> > > the risk of linkage errors is eliminated. I have suggested in the
> > > past that the Linux complib adopt a similar naming scheme and
> > > that doing runtime checks for linkage errors was indicative of a
> > > poor design.
> > >
> > > This has been the basis for me pushing back on adding the
> > > cl_is_debug function to the Windows version of complib.
> >
> > Is there a convention for naming debug libraries in Linux ?
>
> I'm no Linux expert, so I have no clue here. Perhaps the C libraries already
> have some method?
>
> > Is there any reason why the 2 versions of the libraries (with different
> > names) shouldn't be allowed concurrently to exist and just link with the
> > desired one ?
>
> There is none that I can think of. In fact, the Windows drivers allow both the
> debug and release versions of the user-mode components to co-exist, as well as
> mixing debug and release kernel drivers. This makes it easy to debug a single
> component without affecting timings in the whole stack.

How much timing overhead does debug add anyway? Based on what I saw at
supercomputing, OpenSM spent more time in the kernel and doing
context/thread switches than actually doing a lot of computations.

At the moment, I'd prefer that debug was enabled by default, and we had
a way to dump a stack trace and restart if something asserted. I'm going
to speculate that in 99% of the cases, a debug build on a PC will have
no trouble. For those really large clusters, people that know what they
are doing can enable optimizations.
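
To make the "dump a stack trace and restart" idea concrete, here is a minimal sketch in C. It is not code from OpenSM or complib, and it assumes Linux with glibc: backtrace() and backtrace_symbols_fd() come from <execinfo.h>, and the restart re-executes the running binary through /proc/self/exe. The names dump_stack_trace, assert_failed_restart and ASSERT_OR_RESTART are hypothetical, made up for illustration.

```c
#include <assert.h>
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(): glibc-specific */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Hypothetical helper: write the current call stack to fd.
 * Returns the number of frames captured. */
static int dump_stack_trace(int fd)
{
    void *frames[MAX_FRAMES];
    int n;

    assert(fd >= 0);                      /* caller bug, not a runtime condition */
    n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, fd);  /* writes directly to fd, no malloc */
    return n;
}

/* Hypothetical failure path: report the failed expression, dump the stack,
 * then replace this process with a fresh copy of the same program. */
static void assert_failed_restart(const char *expr, const char *file,
                                  int line, char *const argv[])
{
    fprintf(stderr, "assertion failed: %s (%s:%d)\n", expr, file, line);
    dump_stack_trace(STDERR_FILENO);

    execv("/proc/self/exe", argv);        /* Linux-specific; returns only on error */
    perror("execv");
    abort();                              /* restart failed: fall back to a core dump */
}

/* Hypothetical macro: like assert(), but restarts instead of just aborting.
 * argv is the original argument vector from main(). */
#define ASSERT_OR_RESTART(expr, argv) \
    ((expr) ? (void)0 \
            : assert_failed_restart(#expr, __FILE__, __LINE__, (argv)))
```

A caller would write something like ASSERT_OR_RESTART(p_node != NULL, argv) at the point where a plain assert would otherwise go. Function names only show up in the trace if the binary is linked with -rdynamic, and a real daemon would also want a restart limit so that a failure hit during startup does not loop forever.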