[nvmewin] code review: crash dump & hibernation support
    Yong Chen 
    Yong.sc.Chen at huawei.com
       
    Mon Dec 16 15:36:29 PST 2013
    
    
  
Hi, Alex and all,
I have uploaded the changes to the web and Alex will reply back with zipped source code when received it. Thank you, Alex!
The recursive callstack is now fixed, which was a serious offense. Thanks Carolyn, for having brought it up. I also have addressed all points raised by Alex or Rick.
The other major topic is memory usage in dump mode: I have reserved 5x 64KB, the actually usage is about 2.2x 64KB. The bulk of them is used by LunExt (0x11000=68KB), almost half. The variable size of all IO queues are very tiny, along other normal usage. 4x 64KB is probably enough, but to double this 2.2 to 5 is more prudent choice, IMO and that is the value that has been tested all along.
I have finished with SDStress and ioMeter tests (including win7 x86) and pwrtest. Two more tests remain:
A: installing Win8 now;
B: Due to the destructive-nature of SCSICompliance test, I will leave that to the last because I have only 1 controller to test with. I don¡¯t expect any changes from the initialization refactoring would affect SCSI so I am sending out now so you guys can review while I am finishing up the remaining tests.
Please let me or Alex know if you need the binaries as well.
Thanks,
Yong
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen
Sent: Tuesday, November 26, 2013 2:13 PM
To: Foster, Carolyn D; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] code review: crash dump & hibernation support
See inline.
Thanks,
Yong
From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Tuesday, November 26, 2013 12:09 PM
To: Yong Chen; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] code review: crash dump & hibernation support
See my comments below in green.  Also I found two more issues:
According to MSDN StorportEnablePassiveInitialization will fail if the system does not support DPCs, which is the case in crashdump mode.  We should have a check for ntldrDump in NVMeInitialize and call NVMePassiveInitialize directly if it¡¯s true.
[Yong>] are you talking about this block? You are right, that is exactly the case.
    /*
     * When Crashdump/Hibernation driver is being loaded, need to complete the
     * entire initialization here. In the case of normal driver loading, enable
     * passive initialization and let NVMePassiveInitialization handle the rest
     * of the initialization
     */
    if (pAE->ntldrDump == FALSE) {
¡
        /* Call StorPortPassiveInitialization to enable passive init */
        StorPortEnablePassiveInitialization(pAE, NVMePassiveInitialize);
The other potential issue is that in NVMeCallArbiter, in ntldrDump mode we will call NVMeRunning.  This could cause a very deep call stack since NVMeCallArbiter is also called from NVMeRunning.  In hibernate mode we have limited memory and this could cause issues.  I suggest making modifications to NVMeRunningStartAttempt around the NVMeRunning call.  It could be a while-loop that would call NVMeRunning if ntldrDump is true, with a stall execution, that would loop until the nextDriverState is start complete, or failed.
[Yong>] you are right about the call stack. In the while loop we could create a new separate mini state-machine for the dump mode initialization. Given more time I can experiment with it.
Carolyn
From: Yong Chen [mailto:Yong.sc.Chen at huawei.com]
Sent: Monday, November 25, 2013 11:23 PM
To: Foster, Carolyn D; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] code review: crash dump & hibernation support
Hi, Carolyn,
Thanks for the input.
1.       There is no DPC or interrupt in dump mode. It is the polling method to call the same routine. Could you elaborate why you don¡¯t expect DPCs to work properly on Win7?  I couldn¡¯t find where polling mode is set in the hibernate path, and I was also expecting to see the DPC initialization calls wrapped in ntldrDump checks.  Specifically in NVMePassiveInitialize.
[Yong>] in dump mode the driver model behaves as if in polling mode. Not something explicit to set.  NVMePassiveInitialize (&DPC initialization calls)  won¡¯t be issued in dump mode.  see first comment.
2.       The small buffer is reserved during normal driver load. In dump mode you can¡¯t allocate anything (doc says paltry 64KB). That dump buffer is guaranteed when entering dump mode. Regardless, the same code path would fail not BSOD if BUFFER is NULL as in any other cases.  I¡¯m specifically asking about line 189 in nvmeInit.  It seems possible for that line of code, the StorportAllocateContiguousMemory to be called in the crashdump path.  Can you confirm that function call in crashdump/hibernate mode simply fails?  If it does then I agree that a null buffer here will likely not crash the system.
[Yong>] I see. This new API call is recently merged change. I have not got into this failure case. From my earlier experience with other allocation function (not this new API), all these APIs simply fail with NULL returned.
3.       I added the code because for a long time I was and still am dealing with engineer-sample hardware not finished products. After several revisions, they are much more mature now than earlier this year. This is how I make them in consistent ready mode.
The rationale for cycling of CC.EN bit during resuming from hibernation is just to mirror normal Initialization step.
The timeout is predefined value STORPORT_TIMER_CB_us.
For mal-function hardware, the same logic would already have experienced same problems in NVMeInitialize() at raised level.
The best way to decide is to test on different implementation of NVMe devices from various vendors and see whether we need to tune these values.  My concern here is that the RecoveryDpc routine is not just called during hibernate, it is called during runtime if windows needs to reset the controller.  I¡¯m concerned with how these changes impact the normal runtime driver.  Did you test this function during runtime?  What happens if the maximum time is spent in it?
[Yong>] yes, it will be called by windows if hardware misbehaves, trying to reset the hardware. I didn¡¯t try to simulate this scenario, not easy with real hardware. With this change, this same reset is being exercised every time in resuming from hibernation. we can find out by turning on fault injections, however it is not something we have been running regularly.
Hope these help,
Thanks,
Yong
From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Monday, November 25, 2013 3:25 PM
To: Judy Brock-SSI; Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] code review: crash dump & hibernation support
Hi Yong, I also have some feedback and questions about the code changes.
1.      I am surprised that there are no ntldrDump checks around the DPC initialization and calls.  I wouldn¡¯t have expected the DPCs to work properly on the Windows 7 systems.
2.       In function NVMeAllocateMem, the ntldrDump check is wrapped such that if no buffer is allocated from the DumpBuffer, the code path could end up calling StorPortAllocateContiguousMemory in ntldrDump mode.  Will this cause a BSOD?  Or will it just fail?
3.       In RecoveryDpcRoutine new code has been added above the reset adapter call not related to ntldrDump.  If the controller isn¡¯t responding, this additional delay time could cause a DPC watchdog timeout bugcheck if the maximum time allowed for a DPC is exceeded.  I have some concerns about this new code, what was your reasoning for adding it?
Thanks,
Carolyn
From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Judy Brock-SSI
Sent: Monday, November 25, 2013 3:18 PM
To: Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] code review: crash dump & hibernation support
Hi Yong,
Ensuring CSTS.RDY == 0  before setting  CC.EN to ¡®1¡¯ is required by the spec ¨C it is a necessary part of enabling the adapter.  Therefore it  is not overloading the function and should be kept together.  It is a sanity check that needs to take place immediately before writing CC.EN. They should not be separated by other blocks of code. That is not flexibility, it is a design flaw in my opinion.
I don¡¯t see how it could possibly result in any destabilization of major features to make sure the RDY bit is not on before setting EN bit in the routine which is dedicated to enabling the controller. If you are worried about removing other checks for CSTS.RDY == 0, then by all means, leave them in. It doesn¡¯t a hurt a thing to have those xtra check points in the two non-runtime paths you mentioned.
Conversely, it does potentially hurt to not have an explicit check in the NVMeEnableAdapter itself.
As I mentioned previously, there is no check in the PassiveInitialization path for CSTS.RDY == 0 before calling NVMeEnableAdapter  in the current code; so  we are still violating the spec if we don¡¯t enhance your current changes one way or the other.
I say, put the fix in ¨C it¡¯s fairly trivial.
Thanks,
Judy
From: Yong Chen [mailto:Yong.sc.Chen at huawei.com]
Sent: Monday, November 25, 2013 10:53 AM
To: Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] code review: crash dump & hibernation support
Hi, Judy,
Thanks for your input. I agreed with what you trying to achieve. I also think that block of cycling CC.EN 1->0 can be further refactored into one standalone helper function, to be called by RecoveryDpcRoutine() and NVMeInitialize().
Embedding into  NVMeEnableAdapter() would make that function overloaded more than its name & meant to do, and losing the flexibility. Plus they are separated by other blocks of code and would materially change the code, currently for no obvious reason. I would try the next guy to the right thing, it is always hard to fix something, if ever happened in the future.
Unless the test is completely reset for this check-in, I would delay this further improvement of refactoring to next time, to keep it separate from and avoid destabilizing  this major feature work.
What do other folks think?
Thanks,
Yong
From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com]
Sent: Sunday, November 24, 2013 8:20 AM
To: Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] code review: crash dump & hibernation support
Hi Yong,
I suggest you change the function NVMeEnableAdapter to check for CSTS.RDY == 0 before setting CC.EN to ¡®1¡¯.
You added a check for this some lines above the call in NVMeInitialize. But I think we should avoid decoupling the check for CSTS.RDY == 0 from the controller enable itself. If it is not in the same function, it can be overlooked. For exampley, there is another call to NVMeEnableAdapter in PassiveIntialize that doesn¡¯t check before calling.
I would modify NVMeEnableAdapter as below (additions/changes in highlight), change the prototype, and have callers check for success or failure:
* @return BOOLEAN
*     TRUE  - If Adapter is enabled correctly
*     FALSE - If anything goes wrong
******************************************************************************/
BOOLEAN NVMeEnableAdapter(
    PNVME_DEVICE_EXTENSION pAE
)
{
    PQUEUE_INFO pQI = &pAE->QueueInfo;
    NVMe_CONTROLLER_CONFIGURATION CC = {0};
    NVMe_CONTROLLER_STATUS CSTS = {0};
    ULONG PollMax = pAE->uSecCrtlTimeout / MAX_STATE_STALL_us;
    ULONG PollCount;
    /*
     * Program Admin queue registers before enabling the adapter:
     * Admin Queue Attributes
     */
    StorPortWriteRegisterUlong(
        pAE,
        (PULONG)(&pAE->pCtrlRegister->AQA),
        (pQI->pSubQueueInfo->SubQEntries - 1) +
        ((pQI->pCplQueueInfo->CplQEntries - 1) << NVME_AQA_CQS_LSB));
.
.
.
(further down):
StorPortDebugPrint(INFO, "NVMeEnableAdapter:  Setting EN...\n");
    /*
     * Set up Controller Configuration Register
     */
     /* After reset, we must wait for CSTS.RDY == 0 before setting CC.EN to 1 */
    for (PollCount = 0; PollCount < PollMax; PollCount++) {
        CSTS.AsUlong = StorPortReadRegisterUlong(pAE,
                           (PULONG)(&pAE->pCtrlRegister->CSTS.AsUlong));
        if (CSTS.RDY == 0) {
            /* Move on if RDY bit is cleared */
            break;
        }
        NVMeStallExecution(pAE, MAX_STATE_STALL_us);
    }
    if (CSTS.RDY != 0) {
        /* If RDY bit won't clear we can't enable the adapter */
        return FALSE;
    }
    CC.EN = 1;
    CC.CSS = NVME_CC_NVM_CMD;
    CC.MPS = (PAGE_SIZE >> NVME_MEM_PAGE_SIZE_SHIFT);
    CC.AMS = NVME_CC_ROUND_ROBIN;
    CC.SHN = NVME_CC_SHUTDOWN_NONE;
    CC.IOSQES = NVME_CC_IOSQES;
    CC.IOCQES = NVME_CC_IOCQES;
    StorPortWriteRegisterUlong(pAE,
                               (PULONG)(&pAE->pCtrlRegister->CC),
                               CC.AsUlong);
     return TRUE;
} /* NVMeEnableAdapter */
Thanks,
Judy
From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen
Sent: Friday, November 22, 2013 1:48 PM
To: Alex Chang; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] code review: crash dump & hibernation support
Hi, everyone,
I hope many are busy testing the changes on your devices. If you have any feedback to share, I¡¯d very appreciate it. Holiday is upon us and we¡¯d like to wrap up this much delayed soon.
Thanks,
Yong
From: Yong Chen
Sent: Wednesday, November 20, 2013 12:52 AM
To: 'Alex Chang'; Uma Parepalli
Cc: 'nvmewin at lists.openfabrics.org'
Subject: RE: [nvmewin] code review: crash dump & hibernation support
Object #1: crash dump when blue screen or manual triggered, for all SKUs (server or client).
Object #2: hibernate and then resume on all client SKUs.
+ Minor cleaning up and fixes along the way.
High-level Summary:
The major change is to enable ntldrDump mode so that during crash dump or hibernation, the system memory can be dumped to pre-allocated block locations (MEMORY.DMP or HIBERFIL.SYS file). The same nvme.sys driver will be reloaded as another image into strict dumbed-down environment where usual APIs are not available anymore.
The next challenge is to re-initialize the controller properly after having resumed from hibernation image and to continue serve as system boot disk.
I need to give credits to earlier contributors (Intel, LSI, IDT and others) for having laid solid building blocks needed for the dump mode. This change solved the buffer issue  and introduced a different code path for IOs in dump mode.
Detailed Briefs:
¡¤         nvmeInit.c
changes to manage buffers and IO queues in dump modes.
¡¤         nvmeIo.c
minor tweak in dump mode where only boot-CPU is available
¡¤         nvmeSnti.c
fix existing bug where FLUSH cmd should include NSID (all NSs in this case).
¡¤         nvmeStat.c
helper function change due to some timer related API not available in dump mode.
¡¤         nvmeStd.c
A: refactored code into NVMeWaitForCtrlRDY() for waiting after setting CC.EN = 1.
B: introduced same waiting logic after clearing CC.EN = 0.
C: during power up, Reset will issue DPC to call RecoveryDpcRoutine() to re-initialize the driver, similarly the above A+B steps are introduced.
Using trunk version code, the hardware I have always timed out on initialization. I have had this fix since this spring. I think it is the same issue listed in Kwok¡¯s laundry list. But I would need Judy to verify whether the issue she found is fixed or not by this change.
(Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset.)
From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, November 19, 2013 5:04 PM
To: Uma Parepalli; Yong Chen
Subject: RE: [nvmewin] code review: crash dump & hibernation support
Hi Yong,
Could you please summarize the changes you made? Normally, we list the changes under each file as high-level briefs.
Thanks,
Alex
From: Uma Parepalli [mailto:uma.parepalli at skhms.com]
Sent: Tuesday, November 19, 2013 4:05 PM
To: Alex Chang
Subject: RE: [nvmewin] code review: crash dump & hibernation support
Is there a change log file or something that explains what changes are made without looking at the code?
Uma
From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Tuesday, November 19, 2013 4:05 PM
To: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] code review: crash dump & hibernation support
Hi all,
Please find the attached code changes made by Yong Chen from Huawei. Please review the changes, test them accordingly and provide your feedbacks.
Thanks,
Alex
From: Yong Chen [mailto:Yong.sc.Chen at huawei.com]
Sent: Tuesday, November 19, 2013 3:59 PM
To: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>; Alex Chang
Subject: RE: code review: crash dump & hibernation support
<Resending to due to attachment issue>
Hi, all,
Please download the source code from the link in the attached email (you need to have Silverlight installed).
Or to save the trouble for everyone,¡ Alex, Could you reply back with the code change you downloaded. Testlog is attached and see below for the list of tests.
Thanks,
Yong
From: Yong Chen
Sent: Monday, November 18, 2013 4:13 PM
To: 'nvmewin at lists.openfabrics.org'; 'Alex Chang'
Hi, Alex and all,
Here is the code change to support crash dump and hibernation. Please review. Hopefully we can wrap up by this week before the meeting.
Using the trunk version I had problem with the initialization as well. The trunk version would timeout on me. I think it is the same CSTS.RDY issue Judy raised up. I refactored a bit and  fixed it at least for the hardware I have.
Thanks,
Yong
Tests that I have gone thru:
1.       manual crash dump.
KD>.crash and then reboot and KD -z -v MEMORY.DMP, do ¡°!process -1 f¡±
2.       manual hibernation or pwrtest /sleep /c:10 /d:30 /p:30 /s:4
3.       SCSICompliance (log attached).
4.       stresses: iostress, sdstress (log attached)
5.       hibernation has been tested on win8.0 as well, but not extensively.
6.       Hibernation has also been tested with both bootable OptionROM or the newly released UEFI driver.
7.       All tests were conducted on x64 platform, involving 3 different hardware, plus another Intel MB which can¡¯t do hibernation (no S4).
________________________________
Yong Chen
Storage Architect
»ªÎª¼¼ÊõÓÐÏÞ¹«Ë¾ Huawei Technologies Co., Ltd
[Company_logo]Office:  408-330-5482
Mobile:  425-922-0658
Email: yong.sc.chen at huawei.com<mailto:yong.sc.chen at huawei.com>
2330 Central Expressway
Santa Clara, CA 95050
http://www.huawei.com
±¾Óʼþ¼°Æä¸½¼þº¬ÓлªÎª¹«Ë¾µÄ±£ÃÜÐÅÏ¢£¬½öÏÞÓÚ·¢Ë͸øÉÏÃæµØÖ·ÖÐÁгöµÄ¸öÈË»òȺ×é¡£½û
Ö¹ÈÎºÎÆäËûÈËÒÔÈκÎÐÎʽʹÓ㨰üÀ¨µ«²»ÏÞÓÚÈ«²¿»ò²¿·ÖµØÐ¹Â¶¡¢¸´ÖÆ¡¢»òÉ¢·¢£©±¾ÓʼþÖÐ
µÄÐÅÏ¢¡£Èç¹ûÄú´íÊÕÁ˱¾Óʼþ£¬ÇëÄúÁ¢¼´µç»°»òÓʼþ֪ͨ·¢¼þÈ˲¢É¾³ý±¾Óʼþ£¡
This e-mail and its attachments contain confidential information from HUAWEI, which
is intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by
phone or email immediately and delete it!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20131216/7350e6ca/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 6737 bytes
Desc: image001.jpg
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20131216/7350e6ca/attachment.jpg>
    
    
More information about the nvmewin
mailing list