[openib-general] Announce: preview RPMs for FC-4 and RHEL-4 available

Hal Rosenstock halr at voltaire.com
Sat Nov 19 06:39:43 PST 2005


On Fri, 2005-11-18 at 23:12, Thomas Moschny wrote:
> Thomas Moschny wrote:
> > Exiting SM
> >
> > *** glibc detected *** double free or corruption (!prev): 0x6000000000067970 ***
> > Aborted

On what processor architecture is opensm running ?

Note that some better handling of opensm exiting went in at r3977 which
is slightly past this (r3965).

> > Subsequent runs of opensm hang in flush_cpu_workqueue or rwsem_down_failed_common.

Sounds like something isn't being cleaned up properly when the previous
instance exits. After the error, is there an opensm instance still
around ? If so, it wouldn't clean up some MAD registrations.

> Doug Ledford wrote:
> > BTW, can you try forcing opensm to run single threaded on its first
> > invocation and see if that fixes this?
>
> Did you mean calling opensm with -d1?

That would force single thread mode. You should see something like this
when opensm starts up:

opensm -d1
-------------------------------------------------
OpenSM Rev:openib-1.1.0
Command Line Arguments:
 d level = 0x1
 Debug mode: Forcing Single Thread
 Log File: /var/log/osm.log
-------------------------------------------------
OpenSM Rev:openib-1.1.0

>  Well, currently I can't see any
> consistent behavior, but if called with -d1 on the *second* -o run,

What is the state of the subnet ?

> it doesn't seem to hang (unless there are already some unkillable instances
> on this machine from earlier runs).

Did you check with ps for opensm instances ?
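
Something along these lines would show any leftover instances together
with their process state and kernel wait channel. This is only a sketch:
the column selection assumes the procps version of ps, and filter_opensm
is just a helper name made up for this example.

```shell
# Keep the ps header line plus any row whose command name is exactly "opensm".
# filter_opensm is a hypothetical helper for this example, not part of opensm.
filter_opensm() {
    awk 'NR == 1 || $NF == "opensm"'
}

# pid, process state, kernel wait channel (32 columns wide), command name.
# A STAT starting with "D" means uninterruptible sleep, which would explain
# an instance that cannot be killed; WCHAN shows where in the kernel it waits.
ps -eo pid,stat,wchan:32,comm | filter_opensm
```

If only the header line comes back, no opensm process is left. A row in
state D with a WCHAN such as flush_cpu_workqueue would match the hang
described above.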

-- Hal





