[openib-general] Kernel Oops in user-mad, mad

Jack Morgenstein jackm at dev.mellanox.co.il
Tue Oct 3 06:46:03 PDT 2006


On Tuesday 03 October 2006 12:46, Hal Rosenstock wrote:
> On Tue, 2006-10-03 at 03:46, Jack Morgenstein wrote:
> > On Sunday 01 October 2006 13:14, Michael S. Tsirkin wrote:
> > > Quoting r. Jack Morgenstein <jackm at dev.mellanox.co.il>:
> > > > Subject: Kernel Oops in user-mad, mad
> > > > 
> > > > We received the following kernel Oops while running regression
> > > > (see console picture attached).
> > > > 
> > > > This looks like a possible race condition between handling umad send completions
> > > > and ib_unregister_mad_agent.
> > > > 
> > > > The Oops is at the list_del line of dequeue_send (user_mad.c: 186)
> > > > Note that ib_unregister_mad_agent invokes unregister_mad_agent->cancel_mads -> agent send handler.
> > > > 
> > > > Is there a possibility that there is a double deletion from a list somewhere?
> > > > 
> > > > Jack
> > > > 
> > > > 
> > > > 
> > > 
> > > Was this during module unload?
> > No.
> 
> What caused the ib_unregister_mad_agent routine to be invoked ? Was
> OpenSM shutting down when this occurred ? Can you provide any more
> details on the scenario which caused this ?
> 
> -- Hal

This was during MPI testing. OpenSM is invoked once (and also shut down)
before running an MPI test; evidently, this occurred between MPI tests.
We don't have any information beyond this.
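
To illustrate the kind of double deletion asked about above, here is a
minimal user-space sketch. It is not the code from user_mad.c: the list
helpers only imitate the kernel's list_head routines, and struct packet,
dequeue_once and demo_double_dequeue are hypothetical names. It assumes the
failure mode is two paths (send completion and cancel) both trying to unlink
the same entry; in the kernel, a second list_del on an entry that has already
been removed would follow poisoned pointers and would typically oops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (user-space sketch). */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* True when the node points at itself, i.e. it is not linked anywhere. */
static bool list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink the entry and re-initialise it so a later call can detect that. */
static void list_del_init_entry(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    init_list_head(entry);
}

/* Hypothetical packet type; not the real structure from user_mad.c. */
struct packet {
    struct list_head list;
    int id;
};

/*
 * Idempotent dequeue: returns true only for the caller that actually
 * unlinked the packet. In the kernel, this check-and-unlink would have
 * to run under the lock that protects the list.
 */
static bool dequeue_once(struct packet *p)
{
    assert(p != NULL);
    if (list_is_empty(&p->list))
        return false;              /* already removed by another path */
    list_del_init_entry(&p->list);
    return true;
}

/* Simulates two paths dequeuing the same packet; returns removals done. */
int demo_double_dequeue(void)
{
    struct list_head send_list;
    struct packet p = { .id = 1 };
    int removed = 0;

    init_list_head(&send_list);
    init_list_head(&p.list);
    list_add_tail_entry(&p.list, &send_list);

    removed += dequeue_once(&p);   /* first path unlinks the packet */
    removed += dequeue_once(&p);   /* second path finds it already gone */
    return removed;
}
```

With a plain unlink in place of the guarded one, the second call would
operate on stale pointers, which is the scenario I am wondering about.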

- Jack



