[ofw] RE: HPC head-node slow down when OpenSM is started on the head-node (RC4,svn.1691).

Smith, Stan stan.smith at intel.com
Mon Oct 27 17:11:14 PDT 2008


Tzachi Dar wrote:
> Hi,
>
> This is a bug that can arise for a few reasons (see below), and we
> have been able to reproduce it here, although we are still seeing
> different issues. If you can find a way to reproduce it without
> another SM, that would be great.

The easy way to reproduce the HPC head-node failure is to install WinOF RC4 and select the 'OpenSM Started' feature. As soon as the .msi installation finishes, the head-node is in slow-down mode.


>
> As to your problem: Generally speaking, the code is stuck at
> ipoib_port_up. There is an assumption that ipoib_port_up is called
> after ipoib_port_down; it is 99% certain you have found a flow in
> which this is not the case. On port down we close the QPs so that
> all packets that have been posted for receive are freed.

Is there a possible race condition where Receive buffers can be posted before the port_up() PNP call fires?

Actually, the code almost seems incorrect w.r.t. recv_mgr.depth, in that the original code waits for recv_mgr.depth == 0, whereas the initial startup value is p_port->p_adapter->params.rq_depth?

>>         while( p_port->recv_mgr.depth || p_port->send_mgr.depth )
>>                 cl_thread_suspend( 0 );

becomes

        /* Wait for sends to finish and for all recv buffers to be reposted. */
        while( (p_port->recv_mgr.depth != p_port->p_adapter->params.rq_depth)
                || p_port->send_mgr.depth )
                cl_thread_suspend( 0 );
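
To make the difference between the two wait conditions concrete, here is a small, self-contained sketch in plain user-mode C (C11 atomics). It is illustrative only, not the driver code; RQ_DEPTH and the counter names stand in for params.rq_depth and the recv/send depth fields. Seeded with the values observed in the debugger further down (recv_mgr.depth == 512, send_mgr.depth == 0), the original condition keeps spinning while the proposed condition exits immediately.

        #include <stdatomic.h>
        #include <stdio.h>

        #define RQ_DEPTH 512                    /* stand-in for params.rq_depth */

        static atomic_int recv_depth;           /* receives currently posted to the RQ */
        static atomic_int send_depth;           /* sends still outstanding */

        /* Original condition: spin until every posted receive has completed
         * or been flushed (depth drains to zero) and no sends are outstanding. */
        static int orig_keep_waiting( void )
        {
            return atomic_load( &recv_depth ) || atomic_load( &send_depth );
        }

        /* Proposed condition: spin until the receive ring is fully (re)posted
         * (depth equals rq_depth) and no sends are outstanding. */
        static int proposed_keep_waiting( void )
        {
            return ( atomic_load( &recv_depth ) != RQ_DEPTH )
                || atomic_load( &send_depth );
        }

        int main( void )
        {
            /* Snapshot reported by the debugger in this thread:
             * recv_mgr.depth == 512, send_mgr.depth == 0. */
            atomic_store( &recv_depth, RQ_DEPTH );
            atomic_store( &send_depth, 0 );

            printf( "original loop keeps spinning: %s\n",
                orig_keep_waiting() ? "yes" : "no" );     /* yes: the observed hang */
            printf( "proposed loop keeps spinning: %s\n",
                proposed_keep_waiting() ? "yes" : "no" ); /* no: exits at once */
            return 0;
        }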

>
> Taking one step up, it seems that we are having a problem in the plug
> and play mechanism (IBAL). In general you are being called from the
> function __ipoib_pnp_cb. This function is quite complicated, as it
> takes into account the current state of ipoib, and also the different
> events.
>
> In order for me to have more information, please send me a log with
> prints at the following places:
> Please add the following print at the top of __ipoib_pnp_cb
> (as early as possible):
>       IPOIB_PRINT( TRACE_LEVEL_ERROR, IPOIB_DBG_PNP,
>               ("p_pnp_rec->pnp_event = 0x%x (%s) object state %s\n",
>               p_pnp_rec->pnp_event, ib_get_pnp_event_str(
> p_pnp_rec->pnp_event ), ib_get_pnp_event_str(Adapter->state)) );
>
> On exit from this function, please also print the same line (I want
> to see the state changes).
>
> Please also add prints in ipoib_port_up and ipoib_port_down,
>
> and send us the log. I hope that I'll be able to figure out what is
> going on there.

Will do so.
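
For reference, a minimal sketch of the requested instrumentation. It reuses IPOIB_PRINT, TRACE_LEVEL_ERROR, IPOIB_DBG_PNP and ib_get_pnp_event_str exactly as they appear in the snippet above; the Adapter->state expression is also taken from that snippet, and the state is printed here as a raw hex value rather than through a string helper. Treat the placement comments as assumptions about the surrounding code rather than exact patch locations.

        /* At the top of __ipoib_pnp_cb (as early as possible): */
        IPOIB_PRINT( TRACE_LEVEL_ERROR, IPOIB_DBG_PNP,
            ("ENTER __ipoib_pnp_cb: pnp_event = 0x%x (%s), adapter state = 0x%x\n",
            p_pnp_rec->pnp_event,
            ib_get_pnp_event_str( p_pnp_rec->pnp_event ),
            Adapter->state) );

        /* Immediately before each return from __ipoib_pnp_cb: */
        IPOIB_PRINT( TRACE_LEVEL_ERROR, IPOIB_DBG_PNP,
            ("EXIT  __ipoib_pnp_cb: pnp_event = 0x%x (%s), adapter state = 0x%x\n",
            p_pnp_rec->pnp_event,
            ib_get_pnp_event_str( p_pnp_rec->pnp_event ),
            Adapter->state) );

        /* First statements of ipoib_port_up() and ipoib_port_down(): */
        IPOIB_PRINT( TRACE_LEVEL_ERROR, IPOIB_DBG_PNP,
            ("ipoib_port_up: p_port = %p\n", p_port) );

        IPOIB_PRINT( TRACE_LEVEL_ERROR, IPOIB_DBG_PNP,
            ("ipoib_port_down: p_port = %p\n", p_port) );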


>
> And again a simple repro will help (probably even more).
>
> Thanks
> Tzachi
>
>
>
>> -----Original Message-----
>> From: Smith, Stan [mailto:stan.smith at intel.com]
>> Sent: Monday, October 27, 2008 10:14 PM
>> To: Smith, Stan; Tzachi Dar; Leonid Keller; Fab Tillier
>> Cc: Ishai Rabinovitz; ofw at lists.openfabrics.org
>> Subject: RE: HPC head-node slow down when OpenSM is started
>> on the head-node (RC4,svn.1691).
>>
>> Hello,
>>  After further debugging with source modifications, I see the
>> offending call to ipoib_port_up() arrive with a valid p_port
>> pointer. Where I see the failure is in ipoib_port.c @ ~line 1584:
>>
>>         /* Wait for all work requests to get flushed. */
>>         while( p_port->recv_mgr.depth || p_port->send_mgr.depth )
>>                 cl_thread_suspend( 0 );
>>
>> recv_mgr.depth == 512, send_mgr.depth == 0
>>
>> Thoughts on why recv_mgr.depth would be so high?
>> What is preventing the recv mgr from processing work requests?
>>
>>
>> Source mods so far:
>>
>> bus_pnp.c:
>>   ExAcquireFastMutexUnsafe() --> ExAcquireFastMutex() + ExRelease....
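
For context on that change: ExAcquireFastMutex() raises the caller to APC_LEVEL itself, while the Unsafe variants assume the caller has already blocked normal kernel APC delivery. A minimal before/after sketch follows; the mutex name and the guarded work are illustrative placeholders, not the actual bus_pnp.c symbols.

        #include <ntddk.h>

        static FAST_MUTEX pnp_mutex;    /* illustrative name; initialized once
                                         * elsewhere via ExInitializeFastMutex() */

        /* Before: the Unsafe variants leave APC protection to the caller. */
        static VOID locked_region_unsafe( VOID )
        {
            ExAcquireFastMutexUnsafe( &pnp_mutex );
            /* ... touch PnP state ... */
            ExReleaseFastMutexUnsafe( &pnp_mutex );
        }

        /* After: acquire raises IRQL to APC_LEVEL (blocking APCs) and release
         * restores it, so the region is protected regardless of the caller. */
        static VOID locked_region_safe( VOID )
        {
            ExAcquireFastMutex( &pnp_mutex );
            /* ... touch PnP state ... */
            ExReleaseFastMutex( &pnp_mutex );
        }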
>>
>> ipoib_port.c: add_local locking added (diff below):
>>
>> --- ipoib_port.c        2008-10-27 10:18:44.882358400 -0700
>> +++ ipoib_port.c.new    2008-10-27 13:06:11.021042300 -0700
>> @@ -5303,9 +5303,7 @@
>>  }
>>
>>         /* __endpt_mgr_insert expects *one* reference to be held. */
>> -       cl_atomic_inc( &p_port->endpt_rdr );
>> -       status = __endpt_mgr_insert( p_port,
>> -               p_port->p_adapter->params.conf_mac, p_endpt );
>> -       cl_atomic_dec( &p_port->endpt_rdr );
>> +       status = __endpt_mgr_insert_locked( p_port,
>> +               p_port->p_adapter->params.conf_mac, p_endpt );
>>         if( status != IB_SUCCESS )
>>         {
>>                 IPOIB_PRINT_EXIT( TRACE_LEVEL_ERROR, IPOIB_DBG_ERROR,
>>
>>
>> Stan.
>>
>> Smith, Stan wrote:
>>> Hello All,
>>>   Below are the results of snooping around with the debugger
>>> connected to the head-node which is operating in the OpenSM induced
>>> slow-down mode.
>>>
>>> An interesting item is the call to ipoib_port_up() with p_port == 0,
>>> which looks to be a problem; a clobbered stack?
>>> The captured windbg story is attached.
>>>
>>> Possible results of not holding the port lock from __bcast_cb()?
>>>
>>> Please advise on further debug.
>>>
>>> Stan.
>>>
>>> nt!DbgBreakPointWithStatus
>>> nt!wctomb+0x4cbf
>>> nt!KeUpdateSystemTime+0x21f (TrapFrame @ fffffa60`022e9840)
>>> nt!KeReleaseInStackQueuedSpinLock+0x2d
>>> nt!KeDelayExecutionThread+0x72c
>>> ipoib!ipoib_port_up(struct _ipoib_port * p_port = 0x00000000`00000000, struct _ib_pnp_port_rec * p_pnp_rec = 0xfffffa60`024be780)+0x79
>>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\ulp\ipoib\kernel\ipoib_port.c @ 5186]
>>> ipoib!__ipoib_pnp_cb(struct _ib_pnp_rec * p_pnp_rec = 0xfffffa60`024be780)+0x20d
>>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\ulp\ipoib\kernel\ipoib_adapter.c @ 678]
>>> ibbus!__pnp_notify_user(struct _al_pnp * p_reg = 0xfffffa80`05262d90, struct _al_pnp_context * p_context = 0xfffffa60`024be110, struct _al_pnp_ca_event * p_event_rec = 0xfffffa80`08b65108)+0x17b
>>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\al\kernel\al_pnp.c @ 557]
>>> ibbus!__pnp_process_port_forward(struct _al_pnp_ca_event * p_event_rec = 0x00000000`00000000)+0xa6
>>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\al\kernel\al_pnp.c @ 1279]
>>> ibbus!__pnp_check_ports(struct _al_ci_ca * p_ci_ca = 0xfffffa80`04bcc8c0, struct _ib_ca_attr * p_old_ca_attr = 0x00000000`00000001)+0x14d
>>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\al\kernel\al_pnp.c @ 1371]
>>> ibbus!__pnp_check_events(struct _cl_async_proc_item * p_item = 0xfffffa80`04bc3e98)+0x171
>>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\al\kernel\al_pnp.c @ 1566]
>>> ibbus!__cl_async_proc_worker(void * context = 0xfffffa80`04bc3d60)+0x61
>>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\complib\cl_async_proc.c @ 153]
>>> ibbus!__cl_thread_pool_routine(void * context = 0xfffffa80`04bc3860)+0x41
>>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\complib\cl_threadpool.c @ 66]
>>> ibbus!__thread_callback(struct _cl_thread * p_thread = 0x00380031`00430032)+0x28
>>>   [f:\openib-windows-svn\wof2-0\rc4\trunk\core\complib\kernel\cl_thread.c @ 49]
>>> nt!ProbeForRead+0xbd3
>>> nt!_misaligned_access+0x4f6
>>>
>>>
>>>
>>>
>>> Smith, Stan wrote:
>>>> Gentlemen,
>>>>   The HPC head-node slow-down is back with a vengeance... instead
>>>> of only 24% of available processor cycles, we are now up to 31%.
>>>> Needless to say, the system is unusable. Along the path of
>>>> attaching a debugger I've learned the slow-down is triggered only
>>>> by OpenSM, and that stopping OpenSM does not clear it: I was able
>>>> to shut down OpenSM and the slow-down persisted.
>>>>
>>>> The story...
>>>>
>>>> Start with a functional 15-node HPC system without OpenSM on the
>>>> head-node; the SM is supplied by a SilverStorm switch with an
>>>> embedded SM.
>>>> On the head-node, run Server Manager and change the OpenSM startup
>>>> type from 'Disabled' to 'Manual'. Disconnect the SilverStorm IB
>>>> switch, as it's daisy-chained to the Mellanox switch which connects
>>>> all HPC nodes; at this point no SM is running on the fabric. From
>>>> the head-node Server Manager, 'Start' OpenSM. Wait 20 seconds or
>>>> so, pop open the Task Manager Performance view - notice the large
>>>> % of CPU utilization.
>>>> Once the system starts running slow, from the head-node Server
>>>> Manager, 'Stop' OpenSM. CPU utilization is still high.
>>>> Reconnect the SilverStorm switch + SM.
>>>> CPU utilization is still high?
>>>> Going to the head-node debugger, breaking in and showing processes
>>>> and threads revealed little useful info.
>>>> Debugger command suggestions?
>>>> Will try a checked version of ipoib.sys tomorrow.
>>>>
>>>> Stan.
>>>>
>>>> BTW, I did see a shutdown BSOD with a minidump that showed
>>>> ipoib!__cl_asynch_processor( 0 ) as the faulting call.
>>>> Dereferencing the NULL context pointer is what caused the BSOD.
>>>>
>>>>
>>>> Smith, Stan wrote:
>>>>> Hello,
>>>>>
>>>>> The good news is OpenSM is working nicely on all WinOS flavors.
>>>>> The not-so-good news is that OpenSM on the HPC head-node consumes
>>>>> 25% of the system; Win2K8 works fine running OpenSM.
>>>>>
>>>>> On our 15-node HPC cluster, if pre_RC4 OpenSM is started during a
>>>>> WinOF install, or if OpenSM is started on the head-node after the
>>>>> WinOF install (OpenSM not started during the install), then
>>>>> right-clicking the task-bar network icon and selecting Network
>>>>> and Sharing Center fails to reach the Network and Sharing
>>>>> manager. The best we see is the NSM GUI window pop open and
>>>>> remain blank (white). The rest of the system is functional in
>>>>> that command windows are OK and the start menu is OK, but you are
>>>>> certain to hang a window if you access the network via a GUI
>>>>> interface. A command window can set the IPoIB IPv4 address via
>>>>> net set address, and ipconfig works? <Ctrl-Alt-Del> -> resource
>>>>> manager shows about 25% of the system (4 cores) running the NT
>>>>> kernel, followed by network services. I'm guessing massive
>>>>> amounts of system calls from a driver?
>>>>>
>>>>> We first started noticing similar behavior with RC2. Starting
>>>>> OpenSM during an install always failed (caused system slow-down),
>>>>> although if you started OpenSM after the install, the head-node
>>>>> was OK. RC3 behaved likewise. With pre_RC4 (svn.1661) the
>>>>> head-node now slows down when OpenSM is started after the install
>>>>> or when OpenSM is started during the WinOF install.
>>>>>
>>>>> Again, all other WinOS flavors work fine with OpenSM started
>>>>> during the install or afterwards. HPC works fine with the
>>>>> SilverStorm embedded SM switch. I strongly suspect HPC head-node
>>>>> would work fine if OpenSM were run from another Windows/Linux
>>>>> system.
>>>>>
>>>>> Thoughts or suggestions on further diagnosis as to why running
>>>>> OpenSM causes the HPC head-node such a slow-down? Part of the
>>>>> story may have something to do with the number of HPC compute
>>>>> nodes.
>>>>>
>>>>> Any chance you could run OpenSM on your HPC head node to see if
>>>>> you see similar behavior?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Stan.



