[nvmewin] code review: crash dump & hibernation support

Judy Brock-SSI judy.brock at ssi.samsung.com
Sun Nov 24 08:19:45 PST 2013


Hi Yong,

I suggest you change the function NVMeEnableAdapter to check for CSTS.RDY == 0 before setting CC.EN to ‘1’.
You added a check for this some lines above the call in NVMeInitialize. But I think we should avoid decoupling the check for CSTS.RDY == 0 from the controller enable itself. If it is not in the same function, it can be overlooked. For exampley, there is another call to NVMeEnableAdapter in PassiveIntialize that doesn’t check before calling.

I would modify NVMeEnableAdapter as below (additions/changes in highlight), change the prototype, and have callers check for success or failure:



* @return BOOLEAN
*     TRUE  - If Adapter is enabled correctly
*     FALSE - If anything goes wrong
******************************************************************************/
BOOLEAN NVMeEnableAdapter(
    PNVME_DEVICE_EXTENSION pAE
)
{
    PQUEUE_INFO pQI = &pAE->QueueInfo;
    NVMe_CONTROLLER_CONFIGURATION CC = {0};
    NVMe_CONTROLLER_STATUS CSTS = {0};
    ULONG PollMax = pAE->uSecCrtlTimeout / MAX_STATE_STALL_us;
    ULONG PollCount;

    /*
     * Program Admin queue registers before enabling the adapter:
     * Admin Queue Attributes
     */
    StorPortWriteRegisterUlong(
        pAE,
        (PULONG)(&pAE->pCtrlRegister->AQA),
        (pQI->pSubQueueInfo->SubQEntries - 1) +
        ((pQI->pCplQueueInfo->CplQEntries - 1) << NVME_AQA_CQS_LSB));
.
.
.
(further down):

StorPortDebugPrint(INFO, "NVMeEnableAdapter:  Setting EN...\n");
    /*
     * Set up Controller Configuration Register
     */

     /* After reset, we must wait for CSTS.RDY == 0 before setting CC.EN to 1 */

    for (PollCount = 0; PollCount < PollMax; PollCount++) {
        CSTS.AsUlong = StorPortReadRegisterUlong(pAE,
                           (PULONG)(&pAE->pCtrlRegister->CSTS.AsUlong));

        if (CSTS.RDY == 0) {
            /* Move on if RDY bit is cleared */
            break;
        }

        NVMeStallExecution(pAE, MAX_STATE_STALL_us);
    }

    if (CSTS.RDY != 0) {
        /* If RDY bit won't clear we can't enable the adapter */
        return FALSE;
    }

    CC.EN = 1;
    CC.CSS = NVME_CC_NVM_CMD;
    CC.MPS = (PAGE_SIZE >> NVME_MEM_PAGE_SIZE_SHIFT);
    CC.AMS = NVME_CC_ROUND_ROBIN;
    CC.SHN = NVME_CC_SHUTDOWN_NONE;
    CC.IOSQES = NVME_CC_IOSQES;
    CC.IOCQES = NVME_CC_IOCQES;

    StorPortWriteRegisterUlong(pAE,
                               (PULONG)(&pAE->pCtrlRegister->CC),
                               CC.AsUlong);
     return TRUE;
} /* NVMeEnableAdapter */


Thanks,
Judy

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen
Sent: Friday, November 22, 2013 1:48 PM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] code review: crash dump & hibernation support

Hi, everyone,

I hope many are busy testing the changes on your devices. If you have any feedback to share, I’d very appreciate it. Holiday is upon us and we’d like to wrap up this much delayed soon.

Thanks,

Yong

From: Yong Chen
Sent: Wednesday, November 20, 2013 12:52 AM
To: 'Alex Chang'; Uma Parepalli
Cc: 'nvmewin at lists.openfabrics.org'
Subject: RE: [nvmewin] code review: crash dump & hibernation support

Object #1: crash dump when blue screen or manual triggered, for all SKUs (server or client).
Object #2: hibernate and then resume on all client SKUs.
+ Minor cleaning up and fixes along the way.

High-level Summary:

The major change is to enable ntldrDump mode so that during crash dump or hibernation, the system memory can be dumped to pre-allocated block locations (MEMORY.DMP or HIBERFIL.SYS file). The same nvme.sys driver will be reloaded as another image into strict dumbed-down environment where usual APIs are not available anymore.

The next challenge is to re-initialize the controller properly after having resumed from hibernation image and to continue serve as system boot disk.

I need to give credits to earlier contributors (Intel, LSI, IDT and others) for having laid solid building blocks needed for the dump mode. This change solved the buffer issue  and introduced a different code path for IOs in dump mode.

Detailed Briefs:

・         nvmeInit.c
changes to manage buffers and IO queues in dump modes.

・         nvmeIo.c
minor tweak in dump mode where only boot-CPU is available

・         nvmeSnti.c
fix existing bug where FLUSH cmd should include NSID (all NSs in this case).

・         nvmeStat.c
helper function change due to some timer related API not available in dump mode.

・         nvmeStd.c
A: refactored code into NVMeWaitForCtrlRDY() for waiting after setting CC.EN = 1.
B: introduced same waiting logic after clearing CC.EN = 0.
C: during power up, Reset will issue DPC to call RecoveryDpcRoutine() to re-initialize the driver, similarly the above A+B steps are introduced.

Using trunk version code, the hardware I have always timed out on initialization. I have had this fix since this spring. I think it is the same issue listed in Kwok’s laundry list. But I would need Judy to verify whether the issue she found is fixed or not by this change.
(Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset.)

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, November 19, 2013 5:04 PM
To: Uma Parepalli; Yong Chen
Subject: RE: [nvmewin] code review: crash dump & hibernation support

Hi Yong,

Could you please summarize the changes you made? Normally, we list the changes under each file as high-level briefs.

Thanks,
Alex

From: Uma Parepalli [mailto:uma.parepalli at skhms.com]
Sent: Tuesday, November 19, 2013 4:05 PM
To: Alex Chang
Subject: RE: [nvmewin] code review: crash dump & hibernation support

Is there a change log file or something that explains what changes are made without looking at the code?
Uma

From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Tuesday, November 19, 2013 4:05 PM
To: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] code review: crash dump & hibernation support

Hi all,

Please find the attached code changes made by Yong Chen from Huawei. Please review the changes, test them accordingly and provide your feedbacks.

Thanks,
Alex

From: Yong Chen [mailto:Yong.sc.Chen at huawei.com]
Sent: Tuesday, November 19, 2013 3:59 PM
To: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>; Alex Chang
Subject: RE: code review: crash dump & hibernation support

<Resending to due to attachment issue>
Hi, all,

Please download the source code from the link in the attached email (you need to have Silverlight installed).

Or to save the trouble for everyone,… Alex, Could you reply back with the code change you downloaded. Testlog is attached and see below for the list of tests.

Thanks,

Yong

From: Yong Chen
Sent: Monday, November 18, 2013 4:13 PM
To: 'nvmewin at lists.openfabrics.org'; 'Alex Chang'
Hi, Alex and all,

Here is the code change to support crash dump and hibernation. Please review. Hopefully we can wrap up by this week before the meeting.

Using the trunk version I had problem with the initialization as well. The trunk version would timeout on me. I think it is the same CSTS.RDY issue Judy raised up. I refactored a bit and  fixed it at least for the hardware I have.

Thanks,

Yong

Tests that I have gone thru:

1.       manual crash dump.
KD>.crash and then reboot and KD -z -v MEMORY.DMP, do “!process -1 f”

2.       manual hibernation or pwrtest /sleep /c:10 /d:30 /p:30 /s:4

3.       SCSICompliance (log attached).

4.       stresses: iostress, sdstress (log attached)

5.       hibernation has been tested on win8.0 as well, but not extensively.

6.       Hibernation has also been tested with both bootable OptionROM or the newly released UEFI driver.

7.       All tests were conducted on x64 platform, involving 3 different hardware, plus another Intel MB which can’t do hibernation (no S4).

________________________________
Yong Chen
Storage Architect
华为技术有限公司 Huawei Technologies Co., Ltd
[Company_logo]Office:  408-330-5482
Mobile:  425-922-0658
Email: yong.sc.chen at huawei.com<mailto:yong.sc.chen at huawei.com>
2330 Central Expressway
Santa Clara, CA 95050
http://www.huawei.com
本邮件及其附件含有华为公司的保密信息,仅限于发送给上面地址中列出的个人或群组。禁
止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中
的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件!
This e-mail and its attachments contain confidential information from HUAWEI, which
is intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by
phone or email immediately and delete it!

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20131124/372f5d74/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 6737 bytes
Desc: image002.jpg
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20131124/372f5d74/attachment.jpg>


More information about the nvmewin mailing list