[ofw] RE: ibbus - Control Device Object - bugzilla #1367

Sun Apr 26 10:44:21 PDT 2009

Hello,
  The WDTF 'cscript Disable_Enable_With_IO.wsf' was illuminating in that it provided more evidence that all is not well w.r.t. multiple HCA race conditions within the MAD layer.

The cscript Disable_Enable_With_IO.wsf runs for 2, sometimes 3, disable/enable cycles before the NPAGED_LOOKASIDE_LIST pool 'mad_send' (sometimes 'mad_stack') becomes corrupted and the system crashes in an ExAllocateFromNPagedLookasideList call.
'Devcon enable/disable =InfinibandController' will also induce the corrupted MAD pool crash in ExAllocateFromNPagedLookasideList after the 2nd or 3rd disable/enable cycle.

The corrupted MAD pool crash always occurs during the enable phase of the 2nd HCA, after the 1st HCA has been enabled; it never occurs during the 1st HCA enable.

My speculation is that the 1st HCA is processing MADs while the 2nd HCA is setting up to process MADs and does a MAD pool get, which crashes because the MAD pool has somehow become corrupted.

At first I suspected that stopping IBAL when disabling the last HCA was the problem; IBAL is restarted when the next HCA is enabled. To test this theory, I removed the al_cleanup() call when releasing the last HCA. Now AL is started once and not stopped until driver unload or system shutdown. Unfortunately, the corrupted MAD pool crashes remain.

It appears that stopping and starting IBAL is not the immediate problem.
Starting IBAL once is a simpler debug situation so I stayed with this approach.

After looking at al_mad_pool.c, I failed to find any specific thread serialization around the MAD pool get (ExAllocateFromNPagedLookasideList) and put (ExFreeToNPagedLookasideList) calls.
Reading the MS docs on the NPagedLookasideList calls, I did not find any references to the calls being serialized internally.
With multiple HCAs, do we have serialization issues with MAD get/put calls from both HCAs when processing MADs?
As an experiment to prove or disprove the theory that MAD pool thread serialization is lacking, I added thread serialization (mutex acquire/release) calls bracketing ExAllocateFromNPagedLookasideList and ExFreeToNPagedLookasideList when they are called from the MAD get or put routines.
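
For reference, the bracketing experiment has roughly the following shape. This is only a user-mode sketch under stated assumptions, not the actual al_mad_pool.c change: a pthread mutex stands in for the kernel mutex, a simple free list stands in for the NPagedLookasideList, and every name in it (mad_pool_t, mad_pool_get, mad_pool_put, and so on) is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-mode stand-in for a lookaside-list-backed MAD pool.
 * None of these names come from al_mad_pool.c. */
typedef struct mad_item {
    struct mad_item *next;
} mad_item_t;

typedef struct mad_pool {
    pthread_mutex_t lock;      /* serializes every get and put */
    mad_item_t     *free_list; /* cached free elements */
    size_t          item_size; /* size of each element in bytes */
} mad_pool_t;

int mad_pool_init(mad_pool_t *pool, size_t item_size)
{
    pool->free_list = NULL;
    /* An element must be large enough to hold the free-list link. */
    pool->item_size = item_size < sizeof(mad_item_t) ? sizeof(mad_item_t)
                                                     : item_size;
    return pthread_mutex_init(&pool->lock, NULL);
}

/* "Get": pop a cached element under the lock, else allocate a new one. */
void *mad_pool_get(mad_pool_t *pool)
{
    mad_item_t *item;

    pthread_mutex_lock(&pool->lock);
    item = pool->free_list;
    if (item != NULL)
        pool->free_list = item->next;
    pthread_mutex_unlock(&pool->lock);

    if (item == NULL)
        item = malloc(pool->item_size);
    return item;
}

/* "Put": push the element back onto the free list under the lock. */
void mad_pool_put(mad_pool_t *pool, void *p)
{
    mad_item_t *item = p;

    pthread_mutex_lock(&pool->lock);
    item->next = pool->free_list;
    pool->free_list = item;
    pthread_mutex_unlock(&pool->lock);
}

/* Release all cached elements; callers must have returned everything. */
void mad_pool_destroy(mad_pool_t *pool)
{
    mad_item_t *item = pool->free_list;

    while (item != NULL) {
        mad_item_t *next = item->next;
        free(item);
        item = next;
    }
    pool->free_list = NULL;
    pthread_mutex_destroy(&pool->lock);
}
```

In this sketch each mad_pool_get() is paired with exactly one mad_pool_put() on the same pool, and an element that was just put back is the one the next get hands out.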

Separately trying WaitForSingleEvent and then GuardedMutex calls bracketing the MAD pool ExAllocateFromNPagedLookasideList and ExFreeToNPagedLookasideList calls both failed, as neither type of mutex allowed the system to boot. I rechecked the mutex alloc/release code; it's not that complicated. I need to do more crash analysis here, although the bottom-line result was that serialization hurts more than it could ever help in this case.

Booting with HCAs disabled was enough to attempt disable/enable experiments. The same MAD pool corruption crash as without the mutex serialization occurred on the 2nd HCA enable cycle, while the 2nd HCA was enabling.
Lessons learned:
 1) The MAD layer is fragile w.r.t. thread timing.
 2) The pool corruption problems may not be caused by multiple simultaneous MAD get/put calls.

On reviewing the IBAL PNP CA removal code, I did not find the specific case of waiting for HCA posted recv MADs to be flushed and returned to the MAD pool prior to IBAL completing the CA removal function.
Where does this posted recv MAD flush-wait and MAD return to pool occur?
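
To make the question concrete, the flush-wait pattern I was looking for might look something like the sketch below. It is a user-mode illustration only and does not reflect code I found in IBAL: a pthread condition variable stands in for whatever kernel wait primitive would be used, and all names (ca_recv_tracker_t, tracker_post_recv, tracker_recv_done, tracker_remove_wait) are hypothetical.

```c
#include <pthread.h>

/* Hypothetical per-CA bookkeeping; not taken from the IBAL sources. */
typedef struct ca_recv_tracker {
    pthread_mutex_t lock;
    pthread_cond_t  drained;     /* signaled when outstanding reaches 0 */
    unsigned        outstanding; /* recv MADs posted but not yet returned */
    int             removing;    /* nonzero once CA removal has begun */
} ca_recv_tracker_t;

void tracker_init(ca_recv_tracker_t *t)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->drained, NULL);
    t->outstanding = 0;
    t->removing = 0;
}

/* Call before posting a recv MAD to the HCA.
 * Returns 0 if the post may proceed, -1 if the CA is being removed. */
int tracker_post_recv(ca_recv_tracker_t *t)
{
    int rc = -1;

    pthread_mutex_lock(&t->lock);
    if (!t->removing) {
        t->outstanding++;
        rc = 0;
    }
    pthread_mutex_unlock(&t->lock);
    return rc;
}

/* Call after a completed or flushed recv MAD is back in the MAD pool. */
void tracker_recv_done(ca_recv_tracker_t *t)
{
    pthread_mutex_lock(&t->lock);
    if (--t->outstanding == 0)
        pthread_cond_broadcast(&t->drained);
    pthread_mutex_unlock(&t->lock);
}

/* CA removal path: refuse new posts, then block until every posted
 * recv MAD has been returned to the pool. */
void tracker_remove_wait(ca_recv_tracker_t *t)
{
    pthread_mutex_lock(&t->lock);
    t->removing = 1;
    while (t->outstanding != 0)
        pthread_cond_wait(&t->drained, &t->lock);
    pthread_mutex_unlock(&t->lock);
}
```

The idea is that the CA removal function would only complete after tracker_remove_wait() returns, so no recv MAD from the departing HCA could be put back into the pool afterwards.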

Can you offer any suggestions on how to debug the corrupted MAD pool problems?

Thanks,

Stan.

Leonid Keller wrote:
> Thank you a lot.
> I'll test the patches.
> Some questions.
> I understood that you are closing IBAL on power down like low-level
> driver does with HCA.
> Are you opening it up back on power up ?
> Have you tried the patched driver in standby/hibernate scenarios ?
> It is a must for WHQL.
> We have now problems with some clients which are performing WHQL
> system tests on our December-released version.
> Especially in Common_Scenario_Stress and Disable_Enable tests which
> perform disable/enable and power down/up sequences.
> Have you ever run them on the driver ?
> While WHQL-ing they used to be launched from DTM, but they are also
> found in WDK - \WinDDK\6001.18001\tools\WDTF\amd64fre\SampleScripts -
> and can be run manually. The instructions how to run are found in the
> script itself.
> Shortly, one has first to install WDTF:
>    \WinDDK\6001.18001\tools\WDTF\amd64fre\InstallWDTF.cmd
> and then to invoke
>    cscript Common_Scenario_Stress_With_IO.wsf
> or
>    cscript Disable_Enable_With_IO.wsf
>
> It would be great if you could test the patches in some ways of the
> above.
>
>> -----Original Message-----
>> From: Smith, Stan [mailto:stan.smith at intel.com]
>> Sent: Monday, April 20, 2009 10:41 PM
>> To: Leonid Keller
>> Cc: ofw at lists.openfabrics.org
>> Subject: ibbus - Control Device Object - bugzilla #1367
>>
>>
>> Hello,
>>   Please review/test-drive these files and see how they work for you.
>> My testing shows the Control Device Object implementation
>> works well over multiple enable/disable cycles (WSD needs to
>> be removed 1st in order to prevent mandatory reboot).
>> Additionally System shutdown now removes the HCA devices and
>> then shuts down IBAL so IBAL-async threads no longer attempt
>> to send/forward MADs on a shutdown HCA.
>>
>> The only problem I find is the ordering of enabling devices
>> exposes what may be a bug in AL MAD pool cleanup.
>> HCAs are numbered 1 & 2, HCA 1 is loaded/enabled 1st, then #2.
>> If you disable #2, then #1 and then enable #2, the AL MAD
>> layer blows up; corrupted LookAsideList, see enclosed file fail.txt.
>>
>> If you disable HCA #2, #1, then enable #1, #2, everything
>> works as expected.
>> Failure currently under investigation.
>>
>> The CDO code is what you forwarded to me last week with the
>> additions of global dos_name & dev_nam plus removing HCAs on
>> system shutdown; bus_driver.c & bus_pnp.c handle this.
>>
>> The bus_port_mgr.c changes are white-space alignment and the
>> use of BUS_TRACE instead of BUS_PRINT.
>>
>> Thanks,
>>
>> Stan.