[openib-general] Kernel Oops in user-mad, mad
Jack Morgenstein
jackm at dev.mellanox.co.il
Tue Oct 3 06:46:03 PDT 2006
On Tuesday 03 October 2006 12:46, Hal Rosenstock wrote:
> On Tue, 2006-10-03 at 03:46, Jack Morgenstein wrote:
> > On Sunday 01 October 2006 13:14, Michael S. Tsirkin wrote:
> > > Quoting r. Jack Morgenstein <jackm at dev.mellanox.co.il>:
> > > > Subject: Kernel Oops in user-mad, mad
> > > >
> > > > We received the following kernel Oops while running regression
> > > > (see console picture attached).
> > > >
> > > > This looks like a possible race condition between handling umad send completions
> > > > and ib_unregister_mad_agent.
> > > >
> > > > The Oops is at the list_del line of dequeue_send (user_mad.c:186).
> > > > Note that ib_unregister_mad_agent invokes unregister_mad_agent -> cancel_mads -> agent send handler.
> > > >
> > > > Is there a possibility that there is a double deletion from a list somewhere?
> > > >
> > > > Jack
> > > >
> > > >
> > > >
> > >
> > > Was this during module unload?
> > No.
>
> What caused the ib_unregister_mad_agent routine to be invoked? Was
> OpenSM shutting down when this occurred? Can you provide any more
> details on the scenario which caused this?
>
> -- Hal
This was during MPI testing. OpenSM is invoked once (and also shut down)
before running an MPI test; evidently, the Oops occurred between MPI tests.
We don't have any information beyond this.
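
To make the double-deletion question above concrete, here is a minimal
userspace sketch of a kernel-style circular list. It is NOT the actual
user_mad.c code: the helper names (init_list_head, list_add_tail_entry,
list_del_poison, dequeue_once, list_len) are hypothetical, and NULL stands in
for the kernel's LIST_POISON values. The point is only that a second list_del
on the same entry dereferences poisoned pointers, which is what an Oops at a
list_del line would look like, and that a "remove at most once" check avoids it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail_entry(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/*
 * Modeled on list_del(): unlink the entry, then poison its pointers.
 * Calling this twice on the same entry dereferences the poison
 * (NULL here), i.e. it crashes.
 */
static void list_del_poison(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = NULL;
    entry->prev = NULL;
}

/*
 * Hypothetical guarded removal: returns 1 if this call unlinked the
 * entry, 0 if some other path had already removed it. In real code the
 * check and the unlink would have to happen under the same lock.
 */
static int dequeue_once(struct list_head *entry)
{
    if (entry->next == NULL)
        return 0;
    assert(entry->prev != NULL);
    list_del_poison(entry);
    return 1;
}

/* Count the entries currently linked on the list. */
static int list_len(const struct list_head *head)
{
    int n = 0;
    const struct list_head *pos;

    for (pos = head->next; pos != head; pos = pos->next)
        n++;
    return n;
}
```

If both the send completion path and the cancel path reached the unlink for
the same packet, the unguarded variant would fault on the second call, while
dequeue_once would simply return 0 for whichever caller came second.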
- Jack