[ofw] RE: HPC head-node slow down when OpenSM is started on the head-node (RC4, svn.1691).
Leonid Keller
leonid at mellanox.co.il
Tue Oct 28 01:45:31 PDT 2008
see inline
> -----Original Message-----
> From: Smith, Stan [mailto:stan.smith at intel.com]
> Sent: Tuesday, October 28, 2008 2:11 AM
> To: Tzachi Dar; Leonid Keller; Fab Tillier
> Cc: Ishai Rabinovitz; ofw at lists.openfabrics.org
> Subject: RE: HPC head-node slow down when OpenSM is started
> on the head-node (RC4,svn.1691).
>
> Tzachi Dar wrote:
> > Hi,
> >
> > This is a bug that can arise from a few causes (see below), and we
> > have been able to reproduce it here, although we are still seeing
> > different issues. If you can find out how to reproduce it without
> > another SM, that would be great.
>
> The easy way to reproduce the HPC head-node failure is to do
> an install of WinOF RC4 and select the 'OpenSM Started'
> feature. As soon as the .msi installation is finished, you
> are in the slow-down mode.
>
>
> >
> > As to your problem: generally speaking, the code is stuck at
> > ipoib_port_up. There is an assumption that ipoib_port_up is called
> > after ipoib_port_down; it is 99% likely that you have found a flow in
> > which this is not the case. On port down we close the QPs so that all
> > packets that have been posted for receive are freed.
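For concreteness, a minimal hedged sketch of the assumption being
described, in the style of the driver code; __move_qp_to_error and the
completion-side counter handling are hypothetical stand-ins, while
recv_mgr.depth, send_mgr.depth and cl_thread_suspend() are from the
source:

    /* Hedged sketch: port-down flushes the QP so every posted work
     * request completes in error; each flushed completion decrements its
     * depth counter, and only once both counters reach zero is the port
     * fully drained and safe for a subsequent ipoib_port_up. */
    static void ipoib_port_down_sketch( ipoib_port_t *p_port )
    {
        __move_qp_to_error( p_port );   /* hypothetical: flush all WRs */

        /* completion callbacks drive recv_mgr.depth/send_mgr.depth to 0 */
        while( p_port->recv_mgr.depth || p_port->send_mgr.depth )
            cl_thread_suspend( 0 );
    }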
>
> Is there a possible race condition where Receive buffers can
> be posted before the port_up() PNP call fires?
>
> Actually, the code almost seems incorrect w.r.t. recv_mgr.depth, in
> that the original code waits for recv_mgr.depth == 0, whereas the
> initial startup value is p_port->p_adapter->params.rq_depth?
>
> >> while( p_port->recv_mgr.depth || p_port->send_mgr.depth )
> >> cl_thread_suspend( 0 );
>
(Leonid) As far as I can see, params.rq_depth is the maximum value for
recv_mgr.depth; the receive queue gets refilled in __recv_mgr_repost() up
to this depth.
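For concreteness, a hedged sketch of that invariant; __post_recv_wr and
the loop body are hypothetical, while recv_mgr.depth, params.rq_depth and
__recv_mgr_repost() come from the source:

    /* Hedged sketch: repost tops the receive ring back up, so
     * recv_mgr.depth climbs toward params.rq_depth and never exceeds it. */
    static void __recv_mgr_repost_sketch( ipoib_port_t *p_port )
    {
        while( p_port->recv_mgr.depth <
            (int32_t)p_port->p_adapter->params.rq_depth )
        {
            if( __post_recv_wr( p_port ) != IB_SUCCESS )
                break;      /* stop on a posting failure */
            cl_atomic_inc( &p_port->recv_mgr.depth );
        }
    }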
> becomes
>
>     // wait for sends to finish and all recv buffers reposted.
>     while( (p_port->recv_mgr.depth != p_port->p_adapter->params.rq_depth)
>            || p_port->send_mgr.depth )
>         cl_thread_suspend( 0 );
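If recv_mgr.depth indeed counts currently-posted receives with
params.rq_depth as its ceiling, the revised condition waits for the ring
to be refilled rather than drained, which matters on any flow where
port-up runs without a prior port-down: there the ring is already full,
the depth never reaches zero, and the original loop can never exit.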
>
> >
> > Taking one step up, it seems that we are having a problem in the plug
> > and play mechanism (IBAL). In general you are being called from the
> > function __ipoib_pnp_cb. This function is quite complicated, as it
> > takes into account the current state of ipoib, and also the different
> > events.
> >
> > In order for me to have more information, please send me a log with
> > prints at the following places.
> > Add the following print at the top of __ipoib_pnp_cb (as early as
> > possible):
> > IPOIB_PRINT( TRACE_LEVEL_ERROR, IPOIB_DBG_PNP,
> >     ("p_pnp_rec->pnp_event = 0x%x (%s) object state %s\n",
> >     p_pnp_rec->pnp_event, ib_get_pnp_event_str( p_pnp_rec->pnp_event ),
> >     ib_get_pnp_event_str( Adapter->state )) );
> >
> > On the exit from this function please also print the same line (I
> > want to see the state changes).
> >
> > Please also add prints in ipoib_port_up and ipoib_port_down,
> >
> > and send us the log. I hope that I'll be able to figure out what is
> > going on there.
>
> Will do so.
>
>
> >
> > And again a simple repro will help (probably even more).
> >
> > Thanks
> > Tzachi
> >
> >
> >
> >> -----Original Message-----
> >> From: Smith, Stan [mailto:stan.smith at intel.com]
> >> Sent: Monday, October 27, 2008 10:14 PM
> >> To: Smith, Stan; Tzachi Dar; Leonid Keller; Fab Tillier
> >> Cc: Ishai Rabinovitz; ofw at lists.openfabrics.org
> >> Subject: RE: HPC head-node slow down when OpenSM is started on the
> >> head-node (RC4,svn.1691).
> >>
> >> Hello,
> >> After further debug with source mods, I see the offending call to
> >> ipoib_port_up() with a valid p_port pointer. Where I see the failure
> >> is in ipoib_port.c @ ~line 1584:
> >>
> >> /* Wait for all work requests to get flushed. */
> >> while( p_port->recv_mgr.depth || p_port->send_mgr.depth )
> >> cl_thread_suspend( 0 );
> >>
> >> recv_mgr.depth == 512, send_mgr.depth == 0
> >>
> >> Thoughts on why recv_mgr.depth would be so high?
> >> What is preventing the recv mgr from processing work requests?
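A hedged observation that may tie the stuck loop to the CPU symptom: on
the kernel side, complib's cl_thread_suspend() is a thin wrapper over
KeDelayExecutionThread - exactly the frame sitting above ipoib_port_up in
the stack trace quoted further down. A zero-millisecond delay is
effectively a yield, so if recv_mgr.depth never drains, the wait loop
becomes a busy yield loop; one kernel thread pegging one core of four is
~25% total CPU, which matches the reported numbers. A minimal sketch of
that wrapper (the real complib implementation may differ):

    #include <ntddk.h>

    /* Hedged sketch of cl_thread_suspend() in kernel mode: a relative
     * delay in milliseconds. With pause_ms == 0 the call returns almost
     * immediately, so a caller looping on it spins, consuming a core. */
    void cl_thread_suspend_sketch( const ULONG pause_ms )
    {
        LARGE_INTEGER interval;

        /* negative value = relative time, expressed in 100 ns units */
        interval.QuadPart = -(LONGLONG)pause_ms * 10000;
        KeDelayExecutionThread( KernelMode, FALSE, &interval );
    }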
> >>
> >>
> >> Bus_pnp.c:
> >>   ExAcquireFastMutexUnsafe() --> ExAcquireFastMutex() + ExRelease....
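A hedged note on what that fast-mutex change means:
ExAcquireFastMutexUnsafe() requires the caller to already be at APC_LEVEL
(or to have normal kernel APCs otherwise disabled), while
ExAcquireFastMutex() raises the IRQL to APC_LEVEL itself. A minimal
sketch, with a hypothetical mutex name:

    #include <ntddk.h>

    /* hypothetical stand-in; initialized elsewhere with
     * ExInitializeFastMutex( &g_pnp_mutex ) before first use */
    static FAST_MUTEX g_pnp_mutex;

    void locked_region_sketch( void )
    {
        /* was: ExAcquireFastMutexUnsafe( &g_pnp_mutex );
         * the plain variant raises to APC_LEVEL on its own, so APC
         * delivery cannot sneak into the critical section. */
        ExAcquireFastMutex( &g_pnp_mutex );

        /* ... critical section ... */

        ExReleaseFastMutex( &g_pnp_mutex );  /* lowers IRQL back */
    }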
> >>
> >> Ipoib_port.c add_local locking added
> >>
> >> --- ipoib_port.c	2008-10-27 10:18:44.882358400 -0700
> >> +++ ipoib_port.c.new	2008-10-27 13:06:11.021042300 -0700
> >> @@ -5303,9 +5303,7 @@
> >>  	}
> >>
> >>  	/* __endpt_mgr_insert expects *one* reference to be held. */
> >> -	cl_atomic_inc( &p_port->endpt_rdr );
> >> -	status = __endpt_mgr_insert( p_port, p_port->p_adapter->params.conf_mac, p_endpt );
> >> -	cl_atomic_dec( &p_port->endpt_rdr );
> >> +	status = __endpt_mgr_insert_locked( p_port, p_port->p_adapter->params.conf_mac, p_endpt );
> >>  	if( status != IB_SUCCESS )
> >>  	{
> >>  		IPOIB_PRINT_EXIT( TRACE_LEVEL_ERROR, IPOIB_DBG_ERROR,
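A hedged reading of that hunk: the manual endpt_rdr increment/decrement
around __endpt_mgr_insert is folded into a wrapper that serializes the
insert itself. Something like the following, where the lock choice is a
guess and only the function and field names come from the patch:

    /* Hedged sketch of the wrapper the patch introduces. */
    static ib_api_status_t
    __endpt_mgr_insert_locked_sketch(
        IN ipoib_port_t* const p_port,
        IN const mac_addr_t mac,
        IN ipoib_endpt_t* const p_endpt )
    {
        ib_api_status_t status;

        cl_obj_lock( &p_port->obj );    /* hypothetical lock choice */
        status = __endpt_mgr_insert( p_port, mac, p_endpt );
        cl_obj_unlock( &p_port->obj );
        return status;
    }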
> >>
> >>
> >> Stan.
> >>
> >> Smith, Stan wrote:
> >>> Hello All,
> >>> Below are the results of snooping around with the debugger connected
> >>> to the head-node, which is operating in the OpenSM-induced slow-down
> >>> mode.
> >>>
> >>> The interesting item is the call to ipoib_port_up() with p_port == 0,
> >>> which looks to be a problem; a clobbered stack?
> >>> The captured windbg story is attached.
> >>>
> >>> Possibly a result of not holding the port lock from __bcast_cb()?
> >>>
> >>> Please advise on further debug.
> >>>
> >>> Stan.
> >>>
> >>> nt!DbgBreakPointWithStatus
> >>> nt!wctomb+0x4cbf
> >>> nt!KeUpdateSystemTime+0x21f (TrapFrame @ fffffa60`022e9840)
> >>> nt!KeReleaseInStackQueuedSpinLock+0x2d
> >>> nt!KeDelayExecutionThread+0x72c
> >>> ipoib!ipoib_port_up(struct _ipoib_port * p_port = 0x00000000`00000000,
> >>>   struct _ib_pnp_port_rec * p_pnp_rec = 0xfffffa60`024be780)+0x79
> >>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\ulp\ipoib\kernel\ipoib_port.c @ 5186]
> >>> ipoib!__ipoib_pnp_cb(struct _ib_pnp_rec * p_pnp_rec = 0xfffffa60`024be780)+0x20d
> >>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\ulp\ipoib\kernel\ipoib_adapter.c @ 678]
> >>> ibbus!__pnp_notify_user(struct _al_pnp * p_reg = 0xfffffa80`05262d90,
> >>>   struct _al_pnp_context * p_context = 0xfffffa60`024be110,
> >>>   struct _al_pnp_ca_event * p_event_rec = 0xfffffa80`08b65108)+0x17b
> >>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\al\kernel\al_pnp.c @ 557]
> >>> ibbus!__pnp_process_port_forward(struct _al_pnp_ca_event * p_event_rec = 0x00000000`00000000)+0xa6
> >>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\al\kernel\al_pnp.c @ 1279]
> >>> ibbus!__pnp_check_ports(struct _al_ci_ca * p_ci_ca = 0xfffffa80`04bcc8c0,
> >>>   struct _ib_ca_attr * p_old_ca_attr = 0x00000000`00000001)+0x14d
> >>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\al\kernel\al_pnp.c @ 1371]
> >>> ibbus!__pnp_check_events(struct _cl_async_proc_item * p_item = 0xfffffa80`04bc3e98)+0x171
> >>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\al\kernel\al_pnp.c @ 1566]
> >>> ibbus!__cl_async_proc_worker(void * context = 0xfffffa80`04bc3d60)+0x61
> >>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\complib\cl_async_proc.c @ 153]
> >>> ibbus!__cl_thread_pool_routine(void * context = 0xfffffa80`04bc3860)+0x41
> >>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\complib\cl_threadpool.c @ 66]
> >>> ibbus!__thread_callback(struct _cl_thread * p_thread = 0x00380031`00430032)+0x28
> >>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\complib\kernel\cl_thread.c @ 49]
> >>> nt!ProbeForRead+0xbd3
> >>> nt!_misaligned_access+0x4f6
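A hedged side note on the trace above: the p_thread argument
0x00380031`00430032 decodes, read as little-endian UTF-16, to the
printable characters '2', 'C', '1', '8' - text where a pointer should be,
which would support the clobbered-stack theory (with the caveat that
windbg's displayed arguments for non-top frames are not always reliable).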
> >>>
> >>>
> >>>
> >>>
> >>> Smith, Stan wrote:
> >>>> Gentlemen,
> >>>> The HPC head-node slow-down is back with a vengeance... instead of
> >>>> only 24% of available processor cycles, we are now up to 31%.
> >>>> Needless to say, the system is unusable. Along the path of attaching
> >>>> a debugger I've learned the slow-down is not caused solely by OpenSM
> >>>> running, as I was able to shut down OpenSM and the slow-down
> >>>> persisted.
> >>>>
> >>>> The story...
> >>>>
> >>>> A functional 15-node HPC system without OpenSM on the head-node;
> >>>> the SM is supplied by a SilverStorm switch with an embedded SM.
> >>>> On the head-node, run Server Manager, changing OpenSM startup from
> >>>> 'Disabled' to 'Manual'. Disconnect the SilverStorm IB switch, as
> >>>> it's daisy-chained to the Mellanox switch which connects all HPC
> >>>> nodes; at this point no SM is running on the fabric. From the
> >>>> head-node Server Manager, 'Start' OpenSM. Wait 20 seconds or so,
> >>>> pop open the Task Manager Performance view - notice the large % of
> >>>> CPU utilization.
> >>>> Once the system starts running slow... from the head-node Server
> >>>> Manager, 'Stop' OpenSM. CPU utilization is still high.
> >>>> Reconnect the SilverStorm switch + SM.
> >>>> CPU utilization is still high?
> >>>> Going to the head-node debugger, breaking in and showing processes
> >>>> and threads revealed little useful info?
> >>>> Debugger command suggestions?
> >>>> Will try a checked version of ipoib.sys tomorrow.
> >>>>
> >>>> Stan.
> >>>>
> >>>> BTW, I did see a shutdown BSOD with a minidump that showed
> >>>> ipoib!__cl_asynch_processor( 0 ) as the faulting call; dereferencing
> >>>> the NULL context pointer is what caused the BSOD.
> >>>>
> >>>>
> >>>> Smith, Stan wrote:
> >>>>> Hello,
> >>>>>
> >>>>> The good news is OpenSM is working nicely on all WinOS flavors.
> >>>>> The not-so-good news is that OpenSM on the HPC head-node consumes
> >>>>> 25% of the system; win2k8 works fine running OpenSM.
> >>>>>
> >>>>> On our 15-node HPC cluster, if pre_RC4 OpenSM is started during a
> >>>>> WinOF install, or if OpenSM is started on the head-node after the
> >>>>> WinOF install (OpenSM not started during the install), then
> >>>>> right-clicking the task-bar network icon and selecting Network and
> >>>>> Sharing Center fails to reach the Network and Sharing manager.
> >>>>> The best we see is the NSM GUI window popping open and remaining
> >>>>> blank (white). The rest of the system is functional, in that
> >>>>> command windows are OK and the start menu is OK, but you are
> >>>>> certain to hang a window if you access the network via a GUI
> >>>>> interface. A command window can set the IPoIB IPv4 address via
> >>>>> net set address, and ipconfig works?
> >>>>> <Ctrl-Alt-Del> -> Resource Manager shows about 25% of the system
> >>>>> (4 cores) is running the NT kernel, followed by network services.
> >>>>> I'm guessing massive amounts of system calls from a driver?
> >>>>>
> >>>>> We first started noticing similar behavior with RC2. Starting
> >>>>> OpenSM during an install always failed (caused system slow-down),
> >>>>> although if you started OpenSM after the install, the head-node was
> >>>>> OK. RC3 behaved likewise. With pre_RC4 (svn.1661) the head-node now
> >>>>> slows down when OpenSM is started after the install or when OpenSM
> >>>>> is started during the WinOF install.
> >>>>>
> >>>>> Again, all other WinOS flavors work fine with OpenSM started during
> >>>>> the install or afterwards. HPC works fine with the SilverStorm
> >>>>> embedded-SM switch. I strongly suspect the HPC head-node would work
> >>>>> fine if OpenSM were run from another Windows/Linux system.
> >>>>>
> >>>>> Thoughts or suggestions on further diagnosis as to why running
> >>>>> OpenSM causes the HPC head-node such a slow-down? Part of the story
> >>>>> may have something to do with the number of HPC compute nodes.
> >>>>>
> >>>>> Any chance you could run OpenSM on your HPC head node to see if
> >>>>> you see similar behavior?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Stan.