From Alex.Chang at pmcs.com Mon Nov 3 13:25:03 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Mon, 3 Nov 2014 21:25:03 +0000
Subject: [nvmewin] Samsung Patch for Bus Reset Enhancements
In-Reply-To: <14308_1414584116_5450D734_14308_11885_1_62.08.14702.F27D0545@epcpsbgx3.samsung.com>
References: <14308_1414584116_5450D734_14308_11885_1_62.08.14702.F27D0545@epcpsbgx3.samsung.com>
Message-ID:

Hi Carolyn and Parag/Rick,

Please let me know if you approve the patch.

Thank you very much,
Alex

From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Wednesday, October 29, 2014 5:02 AM
To: Judy Brock-SSI; Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org
Subject: Re: RE: Samsung Patch for Bus Reset Enhancements

Hi Alex,

I have revised the patch with the following review comments incorporated:

1. Both polledMode and hwResetInProg are used for exactly the same purpose.
[Suman] As suggested by Judy, replaced all occurrences of hwResetInProg and polledMode with a single variable, polledResetInProg.

2. There is a call of StorPortResume(pAE) in Line2434 of nvmestd.c, which is redundant.
[Suman] Changed the code to call StorPortResume(pAE) in Line2434 of nvmestd.c only when pAE->DriverState.NextDriverState != NVMeStartComplete.

3.
To comply with our agreed coding style and make the logic easier, may I suggest changing Line#184 of nvmestat.c to:

    if (pAE->ntldrDump == FALSE) {
        if (pAE->polledMode == FALSE) {
            NVMeRunning(pAE);
        } else {
            /*
             * we poll if we're launching the reinit state machine from HwStorResetBus
             * or HwStorAdapterControl->ScsiRestartAdapter path
             */
            NVMeRunning(pAE);
            /* TO val is based on CAP register plus a few, 5, seconds to init post RDY */
            passiveTimeout = pAE->uSecCrtlTimeout + (STORPORT_TIMER_CB_us * MICRO_TO_SEC);
            ...
            return (pAE->DriverState.NextDriverState == NVMeStartComplete) ? TRUE : FALSE;
        }
    } else {
        PRES_MAPPING_TBL pRMT = &pAE->ResMapTbl;
        .....
    }

[Suman] Corrected as per suggestion.

4. Rename IoCompletionDpcRoutine to avoid confusion, since this routine will be called in both DPC and polled mode.
[Suman] As suggested by Judy, renamed this routine to IoCompletionRoutine and added a comment in the header section that this routine can either be scheduled to run as a DPC or called directly.

Please find attached the revised patch. Password is samsung123.

Thanks all for reviewing.

Thanks,
Suman

------- Original Message -------
Sender : Judy Brock-SSI
Date : Oct 29, 2014 05:24 (GMT+05:00)
Title : RE: Samsung Patch for Bus Reset Enhancements

Hi Carolyn,

Thank you for your feedback. It is a good idea to eliminate confusion over the currently-named routine – IoCompletionDpcRoutine – which does indeed imply running as a DPC.

We could either create a new function, as you suggest, or alternatively we could just rename IoCompletionDpcRoutine directly to, for example, “ProcessIoCompletions” or just “IoCompletionRoutine” and then indicate at the top of that routine that it can either be scheduled to run as a DPC or called directly, depending on context. If we take the word “Dpc” out of it, I think it might eliminate all potential confusion.
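The renaming idea can be illustrated with a minimal sketch in which one routine does the completion work and a thin callback forwards to it when the work is deferred. Everything here is an assumption made for illustration: adapter_t, io_completion_routine and io_completion_dpc are hypothetical names, the "queue" is just a counter, and none of it is taken from the driver source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the adapter extension; not the driver's real type. */
typedef struct {
    int pending;    /* completions waiting in the simulated queue */
    int processed;  /* completions handled so far */
} adapter_t;

/*
 * Process all pending completions and return how many were handled.
 * This routine can either be scheduled to run as a deferred callback
 * (through the thunk below) or be called directly from a polled path;
 * the work it does is the same in both cases.
 */
static int io_completion_routine(adapter_t *ae)
{
    int handled = 0;

    assert(ae != NULL);
    while (ae->pending > 0) {
        ae->pending--;
        ae->processed++;
        handled++;
    }
    return handled;
}

/* Deferred entry point (assumed generic callback shape): forwards to the shared routine. */
static void io_completion_dpc(void *context)
{
    (void)io_completion_routine((adapter_t *)context);
}
```

With this shape, the name of the shared routine says nothing about how it is invoked, so a direct call from a polled loop or from an ISR in dump mode does not read as if a DPC were involved.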
If it is ok with you, I think renaming IoCompletionDpcRoutine to no longer imply the method by which it's invoked might be preferable, as this would also cover/clarify the case where, in dump mode, NVMeIsrMsix is currently calling IoCompletionDpcRoutine directly.

If you still prefer a new routine, to cover both cases above, the name should imply the direct-call nature of the function rather than polled mode per se since, if we call it from the ISR, it could lead to the opposite confusion – a function that implies no ISR involvement being called from the ISR itself.

We could:

Create a new function called ImmediateCompletionProcessing (or some other name that implies non-deferred processing) which in turn would call IoCompletionDpcRoutine.

But again, I think renaming IoCompletionDpcRoutine is better because no matter what we name a new function, if it still calls a routine with the word “Dpc” in it, it will still have the potential to create confusion in (especially new) readers' minds.

Thanks,
Judy

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Tuesday, October 28, 2014 4:24 PM
To: Alex Chang; Judy Brock-SSI; suman.p at samsung.com; nvmewin at lists.openfabrics.org
Subject: RE: Samsung Patch for Bus Reset Enhancements

Hi Alex, Judy and Suman,

I have completed my testing of the proposed patch and verified that it works. I also agree with Judy’s comments below.

Since we’re now calling a function that is normally called in a DPC, for clarity, I would like to see a new function that calls the DPC routine instead. You could give the new function a name that indicates that it will handle command completions in polled mode. Then that new function would call the DPC routine directly, and the new function would be called from RunningStartAttempt. This way it doesn’t look like the RunningStartAttempt routine is doing anything with DPCs.
Thanks,
Carolyn

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Monday, October 27, 2014 8:57 PM
To: Judy Brock-SSI; Foster, Carolyn D; suman.p at samsung.com; nvmewin at lists.openfabrics.org
Subject: RE: Samsung Patch for Bus Reset Enhancements

Hi Carolyn,

Please let us know what you think.

Thanks,
Alex

From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com]
Sent: Thursday, October 23, 2014 6:45 PM
To: Foster, Carolyn D; Alex Chang; suman.p at samsung.com; nvmewin at lists.openfabrics.org
Subject: RE: Samsung Patch for Bus Reset Enhancements

Hi Carolyn,

Replies inline below in blue.

Thanks,
Judy

-----Original Message-----
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D
Sent: Wednesday, October 22, 2014 4:29 PM
To: Alex Chang; suman.p at samsung.com; nvmewin at lists.openfabrics.org
Cc: cgps at samsung.com
Subject: Re: [nvmewin] Samsung Patch for Bus Reset Enhancements

Hi Suman,

I have some feedback in addition to Alex's comments. I believe there is an issue with the loop that was added to NVMeRunningStartAttempt. The issue is that IoCompletionDpcRoutine was never meant to be called directly. It was architected and designed to always run from a DPC.

[Judy] That’s because at runtime, we don’t want to be doing time-consuming request-completion work in the ISR. Therefore the work is offloaded to a DPC which runs at a lower IRQL. However, the work we need to do to process cmd completions is fixed - there is actually no innate architectural design impediment in the routine itself to calling this routine directly in the two scenarios our patch addresses – i.e., those situations where by architectural definition we are expected to finish all work before returning to the caller (and in our case, that includes sending and completing multiple commands in our init state machine).
Those scenarios are the two that Suman listed in the change notes:

a) NVMeResetBus
b) NVMeAdapterControl->ScsiRestartAdapter

By design, we don’t want to schedule a DPC to handle completions for the commands generated by the init state machine in these 2 reset paths – we want to poll. That’s why we make the direct call instead.

It's possible that a command from the init state machine could generate an interrupt and run the IoCompletionDpcRoutine before it can be called in RunningStartAttempt.

[Judy] This can’t happen. If an interrupt is generated on behalf of a command from the init state machine during the first scenario above (NVMeResetBus), the hwResetInProg flag at the top of the ISR causes us to return immediately:

    NVMeIsrMsix (
        …
        if (pAE->hwResetInProg)
            return TRUE;

The second scenario above (NVMeAdapterControl->ScsiRestartAdapter) is not interrupt-driven by definition. That is, at the time it is called, interrupts aren’t enabled. But even if they were, the hwResetInProg flag would catch it.

A better solution would be to have a loop similar to the one at the end of NVMePassiveInitialize where RunningStartAttempt is called, and is followed by a loop that waits for the state machine to complete.

[Judy] This is actually the first approach we took and were intending to use, but we found it didn’t work. The reason was that the loop you refer to is periodic timer-driven, but the timer was not getting scheduled in the NVMeAdapterControl->ScsiRestartAdapter path as there is no timer available at that point. The reason this is not an issue for the current OFA driver is that we launch the state machine but then return from the call to NVMeAdapterControl and let the state machine run asynchronously and complete outside of that context (violates the spec).
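The polled model described in these replies can be sketched as follows: a flag is set for the duration of the synchronous reset, the ISR returns immediately while the flag is set, and the reset path steps the init state machine itself until it completes or a poll budget runs out. This is only a simplified simulation under stated assumptions: adapter_t, START_COMPLETE, isr, run_state_machine_step, reinitialize_controller and the max_polls budget are hypothetical names, and the real timeout in the thread is derived from the controller's CAP register rather than from a poll count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { START_COMPLETE = 3 };    /* hypothetical final state of the init state machine */

/* Hypothetical stand-in for the adapter extension; not the driver's real type. */
typedef struct {
    bool polled_reset_in_prog;  /* set while a synchronous reset is polling */
    int  state;                 /* advances by one per completed init command */
    int  isr_handled;           /* interrupts the ISR actually serviced */
} adapter_t;

/* ISR: while a polled reset is in progress, claim the interrupt and do no work. */
static bool isr(adapter_t *ae)
{
    if (ae->polled_reset_in_prog)
        return true;
    ae->isr_handled++;
    return true;
}

/* One simulated step: issue an init command and reap its completion directly. */
static void run_state_machine_step(adapter_t *ae)
{
    if (ae->state < START_COMPLETE)
        ae->state++;
}

/* Synchronous reinit: does not return until the state machine completes or the budget is spent. */
static bool reinitialize_controller(adapter_t *ae, unsigned max_polls)
{
    assert(ae != NULL);
    ae->polled_reset_in_prog = true;
    ae->state = 0;
    for (unsigned i = 0; i < max_polls && ae->state != START_COMPLETE; i++) {
        run_state_machine_step(ae);
        (void)isr(ae);          /* an interrupt arriving here is ignored by the ISR */
    }
    ae->polled_reset_in_prog = false;
    return ae->state == START_COMPLETE;
}
```

Because the caller drives every step, no timer or DPC has to be available in the calling context, which is the property the ScsiRestartAdapter path needs.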
As the patch is currently written I am not comfortable approving it. This change to wait for the state machine's completion could be made in the new ReinitializeController function, and then you wouldn't need the changes to RunningStartAttempt or any of the polledmode code.

[Judy] The approach you propose will not work for the reason explained above. Again, we too had first hoped it would but it won’t. Hence we went to a polled-mode model. Since we have to finish all work before returning anyway and since reset bus is not a performance path, there is no downside to polling.

Thanks,
Carolyn

-----Original Message-----
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Tuesday, October 21, 2014 1:20 PM
To: suman.p at samsung.com; nvmewin at lists.openfabrics.org
Cc: cgps at samsung.com
Subject: Re: [nvmewin] Samsung Patch for Bus Reset Enhancements

Hi Suman,

(1) There is a call of StorPortResume(pAE) in Line2434 of nvmestd.c, which is redundant because, when NextDriverState is NVMeStartComplete, StorPortResume has already been called at the end of NVMeRunning.

(2) To comply with our agreed coding style and make the logic easier, may I suggest changing Line#184 of nvmestat.c to:

    if (pAE->ntldrDump == FALSE) {
        if (pAE->polledMode == FALSE) {
            NVMeRunning(pAE);
        } else {
            /*
             * we poll if we're launching the reinit state machine from HwStorResetBus
             * or HwStorAdapterControl->ScsiRestartAdapter path
             */
            NVMeRunning(pAE);
            /* TO val is based on CAP register plus a few, 5, seconds to init post RDY */
            passiveTimeout = pAE->uSecCrtlTimeout + (STORPORT_TIMER_CB_us * MICRO_TO_SEC);
            ...
            return (pAE->DriverState.NextDriverState == NVMeStartComplete) ? TRUE : FALSE;
        }
    } else {
        PRES_MAPPING_TBL pRMT = &pAE->ResMapTbl;
        .....
    }

Thank you!
Alex

From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Wednesday, October 15, 2014 6:00 AM
To: nvmewin at lists.openfabrics.org
Cc: Alex Chang; cgps at samsung.com
Subject: Samsung Patch for Bus Reset Enhancements

Hi Everyone,

We have a patch for the Bus Reset Enhancements. Please find attached the source code. The password is samsung123

Please find the change description below:

1. There are multiple paths in the driver that reset the controller and execute the initialization state machine. Our patch is not concerned with the majority of those paths. Aside from a few additional isolated modifications, our patch focuses on the two paths that are supposed to be synchronous - i.e. they should not return to the caller until all work is completed - but which currently are not. They are:

a) NVMeResetBus (and)
b) NVMeAdapterControl->ScsiRestartAdapter

We have introduced a new routine, NVMeReInitializeController(), which will be invoked from NVMeResetBus() and NVMeAdapterControl() - ScsiRestartAdapter. This routine will reset and initialize the controller and then complete the requests. It will not return until the initialization state machine is complete.

We disallow processing of any SRB in NVMeStartIo() when NextDriverState != NVMeStartComplete.
In this way we direct the PowerUp operations to be executed in NVMeAdapterControl() - ScsiRestartAdapter only. When resuming from hibernation, for example, NVMeStartIo() will not process the POWER SRB. Instead, the Power Up operations will be invoked in NVMeAdapterControl()->ScsiRestartAdapter.

Additionally, miniport drivers should disregard requests to reset the bus when ntldrDump is set to TRUE in NVMeResetBus(), but the current implementation processes this request.

2. When pAE->ntldrDump is TRUE, in the NVMeMapCore2Queue() routine, the pPGT value is NULL. Hence a BSOD occurs when executing ULONG coreNum = (ULONG)(pPN->Number + pPGT->BaseProcessor). We fixed the problem by accessing pPGT only when ntldrDump is FALSE.

3. In ProcessIo(), when IoStatus is set to NOT_SUBMITTED, the SRB is not completed. Due to this, a BSOD was occurring when executing the WHCK test "DP WLK - Hot-Add - Device test". We fixed the problem by changing the code to complete the SRB when IoStatus is NOT_SUBMITTED.

4. We changed the use of StorPortBusy()/StorPortReady() to StorPortPause()/StorPortResume(), since StorPortBusy() will not prevent new I/Os from coming in once the current ones in the driver have been completed.

Tested the following on Win7 and Windows 2012R2:
- WHCK
- Install/Uninstall, Enable/Disable, FS Format
- Hibernation/Resume, Sleep/Resume
- IOmeter

Thanks,
Suman

_______________________________________________
nvmewin mailing list
nvmewin at lists.openfabrics.org
http://lists.openfabrics.org/mailman/listinfo/nvmewin
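The fix in item 2 can be sketched as a guard that consults the processor group table only outside dump mode. This is a minimal illustration under loud assumptions: proc_group_t and core_number are hypothetical names, and returning core 0 in dump mode is an assumption made for the sketch, since the thread does not say what value the patched routine uses in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a processor group table entry; not the driver's real type. */
typedef struct {
    unsigned base_processor;    /* first core number of the group */
} proc_group_t;

/*
 * Compute the core number for a processor.
 * The group table pointer is dereferenced only when not in dump mode,
 * because it may be NULL while the dump path is running.
 */
static unsigned core_number(bool ntldr_dump, unsigned number, const proc_group_t *pgt)
{
    if (ntldr_dump) {
        /* Assumption for this sketch only: the dump path uses a single core, 0. */
        return 0;
    }
    assert(pgt != NULL);
    return number + pgt->base_processor;
}
```

The point of the guard is ordering: the dump-mode check happens before any access through the pointer, so a NULL table can no longer be dereferenced on that path.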
Name: ~WRD000.jpg Type: image/jpeg Size: 823 bytes Desc: ~WRD000.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 13168 bytes Desc: image001.gif URL: From carolyn.d.foster at intel.com Mon Nov 3 13:30:01 2014 From: carolyn.d.foster at intel.com (Foster, Carolyn D) Date: Mon, 3 Nov 2014 21:30:01 +0000 Subject: [nvmewin] Samsung Patch for Bus Reset Enhancements In-Reply-To: References: <14308_1414584116_5450D734_14308_11885_1_62.08.14702.F27D0545@epcpsbgx3.samsung.com> Message-ID: Hi Alex, I approve the patch. Carolyn From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Monday, November 03, 2014 2:25 PM To: suman.p at samsung.com; Judy Brock-SSI; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: RE: Samsung Patch for Bus Reset Enhancements Hi Carolyn and Parag/Rick, Please let me know if you approve the patch. Thank you very much, Alex From: SUMAN PRAKASH B [mailto:suman.p at samsung.com] Sent: Wednesday, October 29, 2014 5:02 AM To: Judy Brock-SSI; Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: RE: Samsung Patch for Bus Reset Enhancements Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Alex, I have revised the patch with the following review comments incorporated - 1. 
Both polledMode and hwResetInProg are used for exactly same purpose. [Suman] As suggested by Judy, replaced all occurrences of hwResetInProg and polledMode with a single variable polledResetInProg. 2. There is a call of StorPortResume(pAE) in Line2434 of nvmestd.c, which is redundant. [Suman] Changed code to call the StorPortResume(pAE) in Line2434 of nvmestd.c only when pAE->DriverState.NextDriverState != NVMeStartComplete. 3. To comply with our agreed coding style and make the logic easier, may I suggest changing Line#184 of nvmestat.c to: if (pAE->ntldrDump == FALSE) { if (pAE->polledMode == FALSE) { NVMeRunning(pAE); } else { /* * we poll if we're launching the reinit state machine from HwStorResetBus * or HwStorAdapterControl->ScsiRestartAdapter path */ NVMeRunning(pAE); /* TO val is based on CAP register plus a few, 5, seconds to init post RDY */ passiveTimeout = pAE->uSecCrtlTimeout + (STORPORT_TIMER_CB_us * MICRO_TO_SEC); ... return (pAE->DriverState.NextDriverState == NVMeStartComplete) ? TRUE : FALSE; } } else { PRES_MAPPING_TBL pRMT = &pAE->ResMapTbl; ..... } [Suman] Corrected as per suggestion. 4. Rename IoCompletionDpcRoutine to avoid confusion, since this routine will be called in both DPC and polled mode. [Suman] As suggested by Judy, renamed this routine to IoCompletionRoutine and added comment in header section that this routine can either be scheduled to run as a DPC or called directly. Please find attached the revised patch. Password is samsung123. Thanks all for reviewing. Thanks, Suman ------- Original Message ------- Sender : Judy Brock-SSI> Date : Oct 29, 2014 05:24 (GMT+05:00) Title : RE: Samsung Patch for Bus Reset Enhancements Hi Carolyn, Thank you for your feedback. It is a good idea to eliminate confusion over the currently-named routine – IoCompletionDpcRoutine - which does indeed imply running as a DPC. 
We could either create a new function, as you suggest or alternatively, we could just rename IoCompletionDpcRoutine directly to for example, “ProcessIoCompletions” or just “IoCompletionRoutine” and then indicate at the top of that routine that it can either be scheduled to run as a DPC or called directly, depending on context. If we take the word “Dpc” out of it, I think it might eliminate all potential confusion. If it is ok with you, I think renaming IoCompletionDpcRoutine to no longer imply the method by which its invoked might be preferable as this would also cover/clarify the case where, in dump mode, NVMeIsrMsix is currently calling IoCompletionDpcRoutine directly. If you still prefer a new routine, to cover both cases above, the name should imply the direct call nature of the function rather than polled mode per se since, if we call it from the ISR it could lead to the opposite confusion – a function that implies no ISR involvement being called from the ISR itself. We could: Create new function called ImmediateCompletionProcessing (or some other name that implies non-deferred processing) which in turn would call IoCompletionDpcRoutine. But again, I think renaming IoCompletionDpcRoutine is better because no matter what we name a new function, if it still calls a routine with the word “Dpc” in it, it will still have the potential to create confusion in (especially new) readers minds. Thanks, Judy From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Tuesday, October 28, 2014 4:24 PM To: Alex Chang; Judy Brock-SSI; suman.p at samsung.com; nvmewin at lists.openfabrics.org Subject: RE: Samsung Patch for Bus Reset Enhancements Hi Alex, Judy and Suman, I have completed my testing of the proposed patch and verified that it works. I also agree with Judy’s comments below. Since we’re now calling a function that is normally called in a DPC, for clarity, I would like to see a new function that calls the DPC routine instead. 
You could give the new function a name that indicates that it will handle command completions in polled mode. Then that new function would call the DPC routine directly, and the new function would be called from RunningStartAttempt. This way it doesn’t look like the RunningStartAttempt routine is doing anything with DPCs. Thanks, Carolyn From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Monday, October 27, 2014 8:57 PM To: Judy Brock-SSI; Foster, Carolyn D; suman.p at samsung.com; nvmewin at lists.openfabrics.org Subject: RE: Samsung Patch for Bus Reset Enhancements Hi Carolyn, Please let us know what you think. Thanks, Alex From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com] Sent: Thursday, October 23, 2014 6:45 PM To: Foster, Carolyn D; Alex Chang; suman.p at samsung.com; nvmewin at lists.openfabrics.org Subject: RE: Samsung Patch for Bus Reset Enhancements Hi Carolyn, Replies inline below in blue. Thanks, Judy -----Original Message----- From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Wednesday, October 22, 2014 4:29 PM To: Alex Chang; suman.p at samsung.com; nvmewin at lists.openfabrics.org Cc: cgps at samsung.com Subject: Re: [nvmewin] Samsung Patch for Bus Reset Enhancements Hi Suman, I have some feedback in addition to Alex's comments. I believe there is an issue with the loop that was added to NVMeRunningStartAttempt. The issue is that IoCompletionDpcRoutine was never meant to be called directly. It was architected and designed to always run from a DPC. [Judy] That’s because at runtime, we don’t want to be doing time-consuming request-completion work in the ISR. Therefore the work is offloaded to a DPC which runs at a lower IRQL. 
However, the work we need to do to process cmd completions is fixed - there is actually no innate architectural design impediment in the routine itself to calling this routine directly in the two scenarios our patch addresses – i.e., those situations where by architectural definition we are expected to finish all work before returning to the caller (and in our case, that includes sending and completing multiple commands in our init state machine ).. Those scenarios are the two that Suman listed in the change notes: a) NVMeResetBus b) NVMeAdapterControl-> ScsiRestartAdapter By design, we don’t want to schedule a DPC to handle completions for the commands generated by the init state machine in these 2 reset paths – we want to poll. That’s why we make the direct call instead. It's possible that a command from the init state machine could generate an interrupt and run the IoCompletionDpcRoutine before it can be called in RunningStartAttempt. [Judy] This can’t happen. If an interrupt is generated on behalf of a command from the init state machine during the first scenario above (NVMeResetBus), the hwResetInProg flag at the top of the ISR causes us to return immediately: NVMeIsrMsix ( … if (pAE->hwResetInProg) return TRUE; The second scenario above (NVMeAdapterControl-> ScsiRestartAdapter) is not interrupt-driven by definition. That is, at the time it is called, interrupts aren’t enabled. But even if it they were, the hwResetInProg flag would catch it. A better solution would be to have a loop similar to the one at the end of NVMePassiveInitialize where RunningStartAttempt is called, and is followed by a loop that waits for the state machine to complete. [Judy] This is actually the first approach we took and were intending to use but we found it didn’t work. The reason was the loop you refer to is periodic timer-driven but the timer was not getting scheduled in the NVMeAdapterControl-> ScsiRestartAdapter path as there is no timer available at that point. 
The reason this is not an issue for the current OFA driver is because we launch the state machine but then return from the call to NVMeAdapterControl and let the state machine run asynchronously and complete outside of that context (violates the spec). As the patch is currently written I am not comfortable approving it.This change to wait for the state machine's completion could be made in the new ReinitializeController function, and then you wouldn't need the changes to RunningStartAttempt or any of the polledmode code. [Judy] The approach you propose will not work for the reason explained above. Again, we too had first hoped it would but it won’t. Hence we went to a polled-mode model. Since we have to finish all work before returning anyway and since reset bus is not a performance path, there is no downside to polling. Thanks, Carolyn -----Original Message----- From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, October 21, 2014 1:20 PM To: suman.p at samsung.com; nvmewin at lists.openfabrics.org Cc: cgps at samsung.com Subject: Re: [nvmewin] Samsung Patch for Bus Reset Enhancements Hi Suman, (1) There is a call of StorPortResume(pAE) in Line2434 of nvmestd.c, which is redundant because, when NextDriverState is NVMeStartComplete, in the end of NVMeRunning, StorPortResume had been called already. (2) To comply with our agreed coding style and make the logic easier, may I suggest changing Line#184 of nvmestat.c to: if (pAE->ntldrDump == FALSE) { if (pAE->polledMode == FALSE) { NVMeRunning(pAE); } else { /* * we poll if we're launching the reinit state machine from HwStorResetBus * or HwStorAdapterControl->ScsiRestartAdapter path */ NVMeRunning(pAE); /* TO val is based on CAP register plus a few, 5, seconds to init post RDY */ passiveTimeout = pAE->uSecCrtlTimeout + (STORPORT_TIMER_CB_us * MICRO_TO_SEC); ... return (pAE->DriverState.NextDriverState == NVMeStartComplete) ? 
TRUE : FALSE; } } else { PRES_MAPPING_TBL pRMT = &pAE->ResMapTbl; ..... } Thank you! Alex From: SUMAN PRAKASH B [mailto:suman.p at samsung.com] Sent: Wednesday, October 15, 2014 6:00 AM To: nvmewin at lists.openfabrics.org Cc: Alex Chang; cgps at samsung.com Subject: Samsung Patch for Bus Reset Enhancements Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Everyone, We have a patch for the Bus Reset Enhancements. Please find attached the source code. The password is samsung123 Please find the change description below - 1. There are multiple paths in the driver that reset the controller and execute the initialization state machine. Our patch is not concerned with the majority of those paths. Aside from a few additional isolated modifications, our patch focuses on the two paths that are supposed to be synchronous -i.e. they should not return to caller until all work is completed - but which currently are not so. They are: a) NVMeResetBus (and) b) NVMeAdapterControl-> ScsiRestartAdapter We have introduced a new routine NVMeReInitializeController(), which will be invoked from NVMeReseBus() and NVMeAdapterControl() - ScsiRestartAdapter. This routine will reset and initialize the controller and then complete the requests. It will not return until the initialization state machine is complete. 
We disallow processing of any SRB in NVMeStartIo() when NextDriverState != NVMeStateComplete. In this way we direct the PowerUp operations to be executed in NVMeAdapterControl() - ScsiRestartAdapter only. When resuming from hibernation for example, NVMeStartio() will not process the POWER SRB. Instead, the Power Up operations will be invoked in NVMeAdapterControl()->ScsiRestartAdapter. Additionally , Miniport drivers should disregard requests to reset the bus when ntldrDump is set to TRUE in NvmeResetBus(). But current implementation processes this request. 2. When pAE->ntldrDump is TRUE, in the NVMeMapCore2Queue() routine, the pPGT value is NULL. Hence a BSOD occurs when executing ULONG coreNum = (ULONG)(pPN->Number + pPGT->BaseProcessor). We fixed the problem by moving access to pPGT when ntldrDump is FALSE. 3. In ProcessIo(), when IoStatus is set to NOT_SUBMITTED, the SRB is not completed. Due to this, a BSOD was occuring when executing WHCK test "DP WLK - Hot-Add - Device test". We fixed the problem by changing the code to complete SRB when IoStatus is NOT_SUBMITTED. 4. We changed the use of StorPortBusy()/StorPortReady() to StorPortPause()/StorPortResume(), since StorPortBusy() will not prevent new IOS from coming in once the current ones in the driver have been completed. Tested the following on Win7 and Windows 2012R2. - WHCK - Install/Uninstall, Enable/Disable, FS Format - Hibernation/Resume, Sleep/Resume - IOmeter Thanks, Suman _______________________________________________ nvmewin mailing list nvmewin at lists.openfabrics.org http://lists.openfabrics.org/mailman/listinfo/nvmewin _______________________________________________ nvmewin mailing list nvmewin at lists.openfabrics.org http://lists.openfabrics.org/mailman/listinfo/nvmewin [cid:image001.gif at 01CFF772.A187FCD0] [Image removed by sender.] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.gif Type: image/gif Size: 13168 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 823 bytes Desc: image002.jpg URL: From Alex.Chang at pmcs.com Mon Nov 3 13:49:25 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Mon, 3 Nov 2014 21:49:25 +0000 Subject: [nvmewin] Samsung Patch for Bus Reset Enhancements In-Reply-To: References: <14308_1414584116_5450D734_14308_11885_1_62.08.14702.F27D0545@epcpsbgx3.samsung.com> Message-ID: Thanks, Carolyn. Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Monday, November 03, 2014 1:30 PM To: Alex Chang; suman.p at samsung.com; Judy Brock-SSI; nvmewin at lists.openfabrics.org Subject: RE: RE: Samsung Patch for Bus Reset Enhancements Hi Alex, I approve the patch. Carolyn From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Monday, November 03, 2014 2:25 PM To: suman.p at samsung.com; Judy Brock-SSI; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: RE: Samsung Patch for Bus Reset Enhancements Hi Carolyn and Parag/Rick, Please let me know if you approve the patch. Thank you very much, Alex From: SUMAN PRAKASH B [mailto:suman.p at samsung.com] Sent: Wednesday, October 29, 2014 5:02 AM To: Judy Brock-SSI; Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: RE: Samsung Patch for Bus Reset Enhancements
Hi Alex, I have revised the patch with the following review comments incorporated - 1. Both polledMode and hwResetInProg are used for exactly same purpose. [Suman] As suggested by Judy, replaced all occurrences of hwResetInProg and polledMode with a single variable polledResetInProg. 2. There is a call of StorPortResume(pAE) in Line2434 of nvmestd.c, which is redundant. [Suman] Changed code to call the StorPortResume(pAE) in Line2434 of nvmestd.c only when pAE->DriverState.NextDriverState != NVMeStartComplete. 3. To comply with our agreed coding style and make the logic easier, may I suggest changing Line#184 of nvmestat.c to: if (pAE->ntldrDump == FALSE) { if (pAE->polledMode == FALSE) { NVMeRunning(pAE); } else { /* * we poll if we're launching the reinit state machine from HwStorResetBus * or HwStorAdapterControl->ScsiRestartAdapter path */ NVMeRunning(pAE); /* TO val is based on CAP register plus a few, 5, seconds to init post RDY */ passiveTimeout = pAE->uSecCrtlTimeout + (STORPORT_TIMER_CB_us * MICRO_TO_SEC); ... return (pAE->DriverState.NextDriverState == NVMeStartComplete) ? TRUE : FALSE; } } else { PRES_MAPPING_TBL pRMT = &pAE->ResMapTbl; ..... } [Suman] Corrected as per suggestion. 4. Rename IoCompletionDpcRoutine to avoid confusion, since this routine will be called in both DPC and polled mode. [Suman] As suggested by Judy, renamed this routine to IoCompletionRoutine and added comment in header section that this routine can either be scheduled to run as a DPC or called directly. Please find attached the revised patch. Password is samsung123. Thanks all for reviewing.
Thanks, Suman ------- Original Message ------- Sender : Judy Brock-SSI Date : Oct 29, 2014 05:24 (GMT+05:00) Title : RE: Samsung Patch for Bus Reset Enhancements Hi Carolyn, Thank you for your feedback. It is a good idea to eliminate confusion over the currently-named routine – IoCompletionDpcRoutine – which does indeed imply running as a DPC. We could either create a new function, as you suggest, or alternatively we could just rename IoCompletionDpcRoutine directly to, for example, “ProcessIoCompletions” or just “IoCompletionRoutine” and then indicate at the top of that routine that it can either be scheduled to run as a DPC or called directly, depending on context. If we take the word “Dpc” out of it, I think it might eliminate all potential confusion. If it is ok with you, I think renaming IoCompletionDpcRoutine to no longer imply the method by which it is invoked might be preferable, as this would also cover/clarify the case where, in dump mode, NVMeIsrMsix is currently calling IoCompletionDpcRoutine directly. If you still prefer a new routine, to cover both cases above, the name should imply the direct-call nature of the function rather than polled mode per se since, if we call it from the ISR, it could lead to the opposite confusion – a function that implies no ISR involvement being called from the ISR itself. We could: Create a new function called ImmediateCompletionProcessing (or some other name that implies non-deferred processing) which in turn would call IoCompletionDpcRoutine. But again, I think renaming IoCompletionDpcRoutine is better because no matter what we name a new function, if it still calls a routine with the word “Dpc” in it, it will still have the potential to create confusion in (especially new) readers' minds.
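To illustrate the renaming option discussed above, here is a minimal sketch in plain C of a single completion routine with a context-neutral name that is reached both through a DPC-style entry point and by a direct call. Everything in it (SketchQueue, SketchProcessIoCompletions, SketchDpcEntry, SketchDrainNow) is hypothetical and only models the naming idea; it is not the driver's code and uses no StorPort APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical completion queue; the real driver's structures are not
 * shown in this thread. */
typedef struct {
    unsigned pending;    /* completion entries not yet processed */
    unsigned processed;  /* running total of processed entries */
} SketchQueue;

/*
 * Context-neutral name: this routine may be scheduled to run as a DPC
 * or called directly (polled reset path, or the ISR in dump mode).
 * Returns the number of entries it processed.
 */
static unsigned SketchProcessIoCompletions(SketchQueue *q)
{
    unsigned n;

    assert(q != NULL);
    n = q->pending;
    q->processed += n;
    q->pending = 0;
    return n;
}

/* DPC-style entry point: receives an opaque context pointer and
 * forwards to the shared routine. */
static void SketchDpcEntry(void *context)
{
    (void)SketchProcessIoCompletions((SketchQueue *)context);
}

/* Direct caller, e.g. a polled reset loop that must finish its work
 * before returning. */
static unsigned SketchDrainNow(SketchQueue *q)
{
    return SketchProcessIoCompletions(q);
}
```

With this shape, neither caller's name leaks into the shared routine, which is the point of dropping "Dpc" from it.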
Thanks, Judy From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Tuesday, October 28, 2014 4:24 PM To: Alex Chang; Judy Brock-SSI; suman.p at samsung.com; nvmewin at lists.openfabrics.org Subject: RE: Samsung Patch for Bus Reset Enhancements Hi Alex, Judy and Suman, I have completed my testing of the proposed patch and verified that it works. I also agree with Judy’s comments below. Since we’re now calling a function that is normally called in a DPC, for clarity, I would like to see a new function that calls the DPC routine instead. You could give the new function a name that indicates that it will handle command completions in polled mode. Then that new function would call the DPC routine directly, and the new function would be called from RunningStartAttempt. This way it doesn’t look like the RunningStartAttempt routine is doing anything with DPCs. Thanks, Carolyn From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Monday, October 27, 2014 8:57 PM To: Judy Brock-SSI; Foster, Carolyn D; suman.p at samsung.com; nvmewin at lists.openfabrics.org Subject: RE: Samsung Patch for Bus Reset Enhancements Hi Carolyn, Please let us know what you think. Thanks, Alex From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com] Sent: Thursday, October 23, 2014 6:45 PM To: Foster, Carolyn D; Alex Chang; suman.p at samsung.com; nvmewin at lists.openfabrics.org Subject: RE: Samsung Patch for Bus Reset Enhancements Hi Carolyn, Replies inline below in blue. Thanks, Judy -----Original Message----- From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Wednesday, October 22, 2014 4:29 PM To: Alex Chang; suman.p at samsung.com; nvmewin at lists.openfabrics.org Cc: cgps at samsung.com Subject: Re: [nvmewin] Samsung Patch for Bus Reset Enhancements Hi Suman, I have some feedback in addition to Alex's comments. 
I believe there is an issue with the loop that was added to NVMeRunningStartAttempt. The issue is that IoCompletionDpcRoutine was never meant to be called directly. It was architected and designed to always run from a DPC. [Judy] That’s because at runtime, we don’t want to be doing time-consuming request-completion work in the ISR. Therefore the work is offloaded to a DPC which runs at a lower IRQL. However, the work we need to do to process cmd completions is fixed - there is actually no innate architectural design impediment in the routine itself to calling this routine directly in the two scenarios our patch addresses – i.e., those situations where by architectural definition we are expected to finish all work before returning to the caller (and in our case, that includes sending and completing multiple commands in our init state machine). Those scenarios are the two that Suman listed in the change notes: a) NVMeResetBus b) NVMeAdapterControl-> ScsiRestartAdapter By design, we don’t want to schedule a DPC to handle completions for the commands generated by the init state machine in these 2 reset paths – we want to poll. That’s why we make the direct call instead. It's possible that a command from the init state machine could generate an interrupt and run the IoCompletionDpcRoutine before it can be called in RunningStartAttempt. [Judy] This can’t happen. If an interrupt is generated on behalf of a command from the init state machine during the first scenario above (NVMeResetBus), the hwResetInProg flag at the top of the ISR causes us to return immediately: NVMeIsrMsix ( … if (pAE->hwResetInProg) return TRUE; The second scenario above (NVMeAdapterControl-> ScsiRestartAdapter) is not interrupt-driven by definition. That is, at the time it is called, interrupts aren’t enabled. But even if they were, the hwResetInProg flag would catch it.
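The guard-and-poll arrangement described above can be sketched roughly as follows. This is a simplified, self-contained simulation in plain C, assuming a single flag checked at the top of the ISR and a loop that calls the completion routine directly. All names prefixed with Sketch are hypothetical stand-ins, the poll budget is only a placeholder for a real timeout, and the driver's actual structures and StorPort calls are not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the driver's real types. */
typedef enum { SketchInitInProgress, SketchStartComplete } SketchState;

typedef struct {
    bool        polledResetInProg;  /* set while a synchronous reset path polls */
    SketchState nextState;          /* simplified init state machine state */
    unsigned    pendingCompletions; /* simulated completion queue depth */
    unsigned    commandsLeft;       /* init commands still to be completed */
} SketchAdapter;

/* ISR model: while a polled reset is in progress, claim the interrupt
 * and return without scheduling any deferred work. */
static bool SketchIsr(SketchAdapter *ae, bool *dpcScheduled)
{
    if (ae->polledResetInProg) {
        *dpcScheduled = false;
        return true;
    }
    *dpcScheduled = true;   /* normal runtime path: defer to a DPC */
    return true;
}

/* Issue the next init command if none is outstanding. In this simulation
 * it completes at once and shows up as one pending completion. */
static void SketchIssueNextCommand(SketchAdapter *ae)
{
    if (ae->commandsLeft > 0 && ae->pendingCompletions == 0) {
        ae->pendingCompletions = 1;
    }
}

/* Completion processing, called directly here rather than from a DPC. */
static void SketchIoCompletionRoutine(SketchAdapter *ae)
{
    while (ae->pendingCompletions > 0) {
        assert(ae->commandsLeft > 0);
        ae->pendingCompletions--;
        ae->commandsLeft--;
    }
    if (ae->commandsLeft == 0) {
        ae->nextState = SketchStartComplete;
    }
}

/* Synchronous reinit: does not return until the state machine completes
 * or the poll budget is used up. */
static bool SketchReinitPolled(SketchAdapter *ae, unsigned maxPolls)
{
    ae->polledResetInProg = true;
    ae->nextState = SketchInitInProgress;
    while (ae->nextState != SketchStartComplete && maxPolls > 0) {
        SketchIssueNextCommand(ae);
        SketchIoCompletionRoutine(ae);
        maxPolls--;
    }
    ae->polledResetInProg = false;
    return ae->nextState == SketchStartComplete;
}
```

The flag does double duty in this sketch: it keeps the ISR from racing the polling loop, and it marks the window in which the caller is expected to finish all work before returning.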
A better solution would be to have a loop similar to the one at the end of NVMePassiveInitialize where RunningStartAttempt is called, and is followed by a loop that waits for the state machine to complete. [Judy] This is actually the first approach we took and were intending to use but we found it didn’t work. The reason was that the loop you refer to is driven by a periodic timer, but the timer was not getting scheduled in the NVMeAdapterControl-> ScsiRestartAdapter path as there is no timer available at that point. The reason this is not an issue for the current OFA driver is that we launch the state machine but then return from the call to NVMeAdapterControl and let the state machine run asynchronously and complete outside of that context (violates the spec). As the patch is currently written I am not comfortable approving it. This change to wait for the state machine's completion could be made in the new ReinitializeController function, and then you wouldn't need the changes to RunningStartAttempt or any of the polledmode code. [Judy] The approach you propose will not work for the reason explained above. Again, we too had first hoped it would but it won’t. Hence we went to a polled-mode model. Since we have to finish all work before returning anyway and since reset bus is not a performance path, there is no downside to polling. Thanks, Carolyn -----Original Message----- From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, October 21, 2014 1:20 PM To: suman.p at samsung.com; nvmewin at lists.openfabrics.org Cc: cgps at samsung.com Subject: Re: [nvmewin] Samsung Patch for Bus Reset Enhancements Hi Suman, (1) There is a call of StorPortResume(pAE) in Line2434 of nvmestd.c, which is redundant because, when NextDriverState is NVMeStartComplete, in the end of NVMeRunning, StorPortResume had been called already.
(2) To comply with our agreed coding style and make the logic easier, may I suggest changing Line#184 of nvmestat.c to: if (pAE->ntldrDump == FALSE) { if (pAE->polledMode == FALSE) { NVMeRunning(pAE); } else { /* * we poll if we're launching the reinit state machine from HwStorResetBus * or HwStorAdapterControl->ScsiRestartAdapter path */ NVMeRunning(pAE); /* TO val is based on CAP register plus a few, 5, seconds to init post RDY */ passiveTimeout = pAE->uSecCrtlTimeout + (STORPORT_TIMER_CB_us * MICRO_TO_SEC); ... return (pAE->DriverState.NextDriverState == NVMeStartComplete) ? TRUE : FALSE; } } else { PRES_MAPPING_TBL pRMT = &pAE->ResMapTbl; ..... } Thank you! Alex From: SUMAN PRAKASH B [mailto:suman.p at samsung.com] Sent: Wednesday, October 15, 2014 6:00 AM To: nvmewin at lists.openfabrics.org Cc: Alex Chang; cgps at samsung.com Subject: Samsung Patch for Bus Reset Enhancements Hi Everyone, We have a patch for the Bus Reset Enhancements. Please find attached the source code. The password is samsung123 Please find the change description below - 1. There are multiple paths in the driver that reset the controller and execute the initialization state machine. Our patch is not concerned with the majority of those paths.
Aside from a few additional isolated modifications, our patch focuses on the two paths that are supposed to be synchronous, i.e. they should not return to the caller until all work is completed, but which currently are not. They are: a) NVMeResetBus (and) b) NVMeAdapterControl-> ScsiRestartAdapter We have introduced a new routine NVMeReInitializeController(), which will be invoked from NVMeResetBus() and NVMeAdapterControl() - ScsiRestartAdapter. This routine will reset and initialize the controller and then complete the requests. It will not return until the initialization state machine is complete. We disallow processing of any SRB in NVMeStartIo() when NextDriverState != NVMeStartComplete. In this way we direct the PowerUp operations to be executed in NVMeAdapterControl() - ScsiRestartAdapter only. When resuming from hibernation, for example, NVMeStartIo() will not process the POWER SRB. Instead, the Power Up operations will be invoked in NVMeAdapterControl()->ScsiRestartAdapter. Additionally, miniport drivers should disregard requests to reset the bus when ntldrDump is set to TRUE in NVMeResetBus(), but the current implementation processes this request. 2. When pAE->ntldrDump is TRUE, in the NVMeMapCore2Queue() routine, the pPGT value is NULL. Hence a BSOD occurs when executing ULONG coreNum = (ULONG)(pPN->Number + pPGT->BaseProcessor). We fixed the problem by accessing pPGT only when ntldrDump is FALSE. 3. In ProcessIo(), when IoStatus is set to NOT_SUBMITTED, the SRB is not completed. Due to this, a BSOD was occurring when executing the WHCK test "DP WLK - Hot-Add - Device test". We fixed the problem by changing the code to complete the SRB when IoStatus is NOT_SUBMITTED. 4. We changed the use of StorPortBusy()/StorPortReady() to StorPortPause()/StorPortResume(), since StorPortBusy() will not prevent new I/Os from coming in once the current ones in the driver have been completed. Tested the following on Win7 and Windows 2012R2.
- WHCK - Install/Uninstall, Enable/Disable, FS Format - Hibernation/Resume, Sleep/Resume - IOmeter Thanks, Suman _______________________________________________ nvmewin mailing list nvmewin at lists.openfabrics.org http://lists.openfabrics.org/mailman/listinfo/nvmewin -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 13168 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 823 bytes Desc: image002.jpg URL:
From parag.sheth at seagate.com Mon Nov 3 15:53:48 2014 From: parag.sheth at seagate.com (Parag Sheth) Date: Mon, 3 Nov 2014 15:53:48 -0800 Subject: [nvmewin] Samsung Patch for Bus Reset Enhancements In-Reply-To: References: <14308_1414584116_5450D734_14308_11885_1_62.08.14702.F27D0545@epcpsbgx3.samsung.com> Message-ID: Hi Alex, The changes look good. I approve the patch. Thanks Parag Sheth On Mon, Nov 3, 2014 at 1:49 PM, Alex Chang wrote: > Thanks, Carolyn. > > Alex > > *From:* Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] > *Sent:* Monday, November 03, 2014 1:30 PM > *To:* Alex Chang; suman.p at samsung.com; Judy Brock-SSI; nvmewin at lists.openfabrics.org > *Subject:* RE: RE: Samsung Patch for Bus Reset Enhancements > > Hi Alex, I approve the patch. > > Carolyn
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 823 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 13168 bytes Desc: not available URL: From Alex.Chang at pmcs.com Mon Nov 3 16:51:15 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Tue, 4 Nov 2014 00:51:15 +0000 Subject: [nvmewin] Samsung Patch for Bus Reset Enhancements In-Reply-To: References: <14308_1414584116_5450D734_14308_11885_1_62.08.14702.F27D0545@epcpsbgx3.samsung.com> Message-ID: Thank you, Parag.
Alex From: Parag Sheth [mailto:parag.sheth at seagate.com] Sent: Monday, November 03, 2014 3:54 PM To: Alex Chang Cc: Foster, Carolyn D; suman.p at samsung.com; Judy Brock-SSI; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Samsung Patch for Bus Reset Enhancements Hi Alex, The changes look good. I approve the patch. Thanks Parag Sheth On Mon, Nov 3, 2014 at 1:49 PM, Alex Chang wrote: Thanks, Carolyn. Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Monday, November 03, 2014 1:30 PM To: Alex Chang; suman.p at samsung.com; Judy Brock-SSI; nvmewin at lists.openfabrics.org Subject: RE: RE: Samsung Patch for Bus Reset Enhancements Hi Alex, I approve the patch. Carolyn From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Monday, November 03, 2014 2:25 PM To: suman.p at samsung.com; Judy Brock-SSI; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: RE: Samsung Patch for Bus Reset Enhancements Hi Carolyn and Parag/Rick, Please let me know if you approve the patch. Thank you very much, Alex From: SUMAN PRAKASH B [mailto:suman.p at samsung.com] Sent: Wednesday, October 29, 2014 5:02 AM To: Judy Brock-SSI; Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: RE: Samsung Patch for Bus Reset Enhancements
Hi Alex, I have revised the patch with the following review comments incorporated - 1. Both polledMode and hwResetInProg are used for exactly same purpose. [Suman] As suggested by Judy, replaced all occurrences of hwResetInProg and polledMode with a single variable polledResetInProg. 2. There is a call of StorPortResume(pAE) in Line2434 of nvmestd.c, which is redundant. [Suman] Changed code to call the StorPortResume(pAE) in Line2434 of nvmestd.c only when pAE->DriverState.NextDriverState != NVMeStartComplete. 3. To comply with our agreed coding style and make the logic easier, may I suggest changing Line#184 of nvmestat.c to: if (pAE->ntldrDump == FALSE) { if (pAE->polledMode == FALSE) { NVMeRunning(pAE); } else { /* * we poll if we're launching the reinit state machine from HwStorResetBus * or HwStorAdapterControl->ScsiRestartAdapter path */ NVMeRunning(pAE); /* TO val is based on CAP register plus a few, 5, seconds to init post RDY */ passiveTimeout = pAE->uSecCrtlTimeout + (STORPORT_TIMER_CB_us * MICRO_TO_SEC); ... return (pAE->DriverState.NextDriverState == NVMeStartComplete) ? TRUE : FALSE; } } else { PRES_MAPPING_TBL pRMT = &pAE->ResMapTbl; ..... } [Suman] Corrected as per suggestion. 4. Rename IoCompletionDpcRoutine to avoid confusion, since this routine will be called in both DPC and polled mode. [Suman] As suggested by Judy, renamed this routine to IoCompletionRoutine and added comment in header section that this routine can either be scheduled to run as a DPC or called directly. Please find attached the revised patch. Password is samsung123. Thanks all for reviewing.
Thanks,
Suman

------- Original Message -------
Sender : Judy Brock-SSI
Date : Oct 29, 2014 05:24 (GMT+05:00)
Title : RE: Samsung Patch for Bus Reset Enhancements

Hi Carolyn,

Thank you for your feedback. It is a good idea to eliminate confusion over the currently-named routine – IoCompletionDpcRoutine – which does indeed imply running as a DPC. We could either create a new function, as you suggest, or alternatively we could just rename IoCompletionDpcRoutine directly to, for example, “ProcessIoCompletions” or just “IoCompletionRoutine” and then indicate at the top of that routine that it can either be scheduled to run as a DPC or called directly, depending on context. If we take the word “Dpc” out of it, I think it might eliminate all potential confusion.

If it is ok with you, I think renaming IoCompletionDpcRoutine to no longer imply the method by which it's invoked might be preferable, as this would also cover/clarify the case where, in dump mode, NVMeIsrMsix is currently calling IoCompletionDpcRoutine directly.

If you still prefer a new routine, to cover both cases above, the name should imply the direct call nature of the function rather than polled mode per se since, if we call it from the ISR, it could lead to the opposite confusion – a function that implies no ISR involvement being called from the ISR itself.

We could create a new function called ImmediateCompletionProcessing (or some other name that implies non-deferred processing) which in turn would call IoCompletionDpcRoutine. But again, I think renaming IoCompletionDpcRoutine is better because no matter what we name a new function, if it still calls a routine with the word “Dpc” in it, it will still have the potential to create confusion in (especially new) readers' minds.
Thanks,
Judy

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Tuesday, October 28, 2014 4:24 PM
To: Alex Chang; Judy Brock-SSI; suman.p at samsung.com; nvmewin at lists.openfabrics.org
Subject: RE: Samsung Patch for Bus Reset Enhancements

Hi Alex, Judy and Suman,

I have completed my testing of the proposed patch and verified that it works. I also agree with Judy’s comments below. Since we’re now calling a function that is normally called in a DPC, for clarity, I would like to see a new function that calls the DPC routine instead. You could give the new function a name that indicates that it will handle command completions in polled mode. Then that new function would call the DPC routine directly, and the new function would be called from RunningStartAttempt. This way it doesn’t look like the RunningStartAttempt routine is doing anything with DPCs.

Thanks,
Carolyn

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Monday, October 27, 2014 8:57 PM
To: Judy Brock-SSI; Foster, Carolyn D; suman.p at samsung.com; nvmewin at lists.openfabrics.org
Subject: RE: Samsung Patch for Bus Reset Enhancements

Hi Carolyn,

Please let us know what you think.

Thanks,
Alex

From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com]
Sent: Thursday, October 23, 2014 6:45 PM
To: Foster, Carolyn D; Alex Chang; suman.p at samsung.com; nvmewin at lists.openfabrics.org
Subject: RE: Samsung Patch for Bus Reset Enhancements

Hi Carolyn,

Replies inline below in blue.

Thanks,
Judy

-----Original Message-----
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D
Sent: Wednesday, October 22, 2014 4:29 PM
To: Alex Chang; suman.p at samsung.com; nvmewin at lists.openfabrics.org
Cc: cgps at samsung.com
Subject: Re: [nvmewin] Samsung Patch for Bus Reset Enhancements

Hi Suman,

I have some feedback in addition to Alex's comments.
I believe there is an issue with the loop that was added to NVMeRunningStartAttempt. The issue is that IoCompletionDpcRoutine was never meant to be called directly. It was architected and designed to always run from a DPC.

[Judy] That’s because at runtime, we don’t want to be doing time-consuming request-completion work in the ISR. Therefore the work is offloaded to a DPC which runs at a lower IRQL. However, the work we need to do to process cmd completions is fixed - there is actually no innate architectural design impediment in the routine itself to calling this routine directly in the two scenarios our patch addresses – i.e., those situations where by architectural definition we are expected to finish all work before returning to the caller (and in our case, that includes sending and completing multiple commands in our init state machine). Those scenarios are the two that Suman listed in the change notes:
a) NVMeResetBus
b) NVMeAdapterControl-> ScsiRestartAdapter
By design, we don’t want to schedule a DPC to handle completions for the commands generated by the init state machine in these 2 reset paths – we want to poll. That’s why we make the direct call instead.

It's possible that a command from the init state machine could generate an interrupt and run the IoCompletionDpcRoutine before it can be called in RunningStartAttempt.

[Judy] This can’t happen. If an interrupt is generated on behalf of a command from the init state machine during the first scenario above (NVMeResetBus), the hwResetInProg flag at the top of the ISR causes us to return immediately:

    NVMeIsrMsix (
    …
    if (pAE->hwResetInProg)
        return TRUE;

The second scenario above (NVMeAdapterControl-> ScsiRestartAdapter) is not interrupt-driven by definition. That is, at the time it is called, interrupts aren’t enabled. But even if they were, the hwResetInProg flag would catch it.
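The guard-and-poll pattern Judy describes can be illustrated with a small, self-contained sketch. This is not the OFA driver's code: `adapter_ext`, `mock_isr`, `process_completions` and `polled_reinit` are hypothetical stand-ins, the flag name merely echoes hwResetInProg/polledResetInProg from the thread, and no StorPort calls are involved. It only shows the idea that the ISR returns early while a synchronous reset is polling, and that completions are then drained by a direct call rather than by a scheduled DPC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the adapter extension; not the driver's real layout. */
struct adapter_ext {
    bool polled_reset_in_prog;  /* set while a synchronous reset path is polling */
    int  pending_completions;   /* entries waiting in the mock completion queue */
    int  processed;             /* completions handled so far */
    int  dpc_requests;          /* times the mock ISR asked for deferred processing */
};

/* Completion processing: the same work whether it is deferred or called directly. */
static void process_completions(struct adapter_ext *ae)
{
    assert(ae != NULL);
    while (ae->pending_completions > 0) {
        ae->pending_completions--;
        ae->processed++;
    }
}

/* Mock ISR. During a polled reset it claims the interrupt and returns at once,
 * so the completion is left for the polling loop to pick up. */
static bool mock_isr(struct adapter_ext *ae)
{
    assert(ae != NULL);
    if (ae->polled_reset_in_prog)
        return true;
    ae->dpc_requests++;         /* normal runtime path: defer the work */
    return true;
}

/* Synchronous re-init: each init command is completed by a direct call
 * before the next one is issued, so nothing is left pending on return. */
static bool polled_reinit(struct adapter_ext *ae, int init_cmds)
{
    assert(ae != NULL);
    int before = ae->processed;

    ae->polled_reset_in_prog = true;
    for (int i = 0; i < init_cmds; i++) {
        ae->pending_completions++;  /* mock: the device posts one completion */
        (void)mock_isr(ae);         /* an interrupt may still fire; the guard ignores it */
        process_completions(ae);    /* poll: direct call, no DPC */
    }
    ae->polled_reset_in_prog = false;

    return (ae->processed - before) == init_cmds;
}
```

In this sketch, calling `polled_reinit()` on a zero-initialized `adapter_ext` processes every init command without a single deferred request, while a later `mock_isr()` call outside the reset window takes the deferred path again.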
A better solution would be to have a loop similar to the one at the end of NVMePassiveInitialize where RunningStartAttempt is called, and is followed by a loop that waits for the state machine to complete.

[Judy] This is actually the first approach we took and were intending to use but we found it didn’t work. The reason was that the loop you refer to is periodic timer-driven, but the timer was not getting scheduled in the NVMeAdapterControl-> ScsiRestartAdapter path as there is no timer available at that point. The reason this is not an issue for the current OFA driver is that we launch the state machine but then return from the call to NVMeAdapterControl and let the state machine run asynchronously and complete outside of that context (which violates the spec).

As the patch is currently written I am not comfortable approving it. This change to wait for the state machine's completion could be made in the new ReinitializeController function, and then you wouldn't need the changes to RunningStartAttempt or any of the polledmode code.

[Judy] The approach you propose will not work for the reason explained above. Again, we too had first hoped it would but it won’t. Hence we went to a polled-mode model. Since we have to finish all work before returning anyway and since reset bus is not a performance path, there is no downside to polling.

Thanks,
Carolyn

-----Original Message-----
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Tuesday, October 21, 2014 1:20 PM
To: suman.p at samsung.com; nvmewin at lists.openfabrics.org
Cc: cgps at samsung.com
Subject: Re: [nvmewin] Samsung Patch for Bus Reset Enhancements

Hi Suman,

(1) There is a call of StorPortResume(pAE) in Line2434 of nvmestd.c, which is redundant because, when NextDriverState is NVMeStartComplete, StorPortResume has already been called at the end of NVMeRunning.
(2) To comply with our agreed coding style and make the logic easier, may I suggest changing Line#184 of nvmestat.c to:

    if (pAE->ntldrDump == FALSE) {
        if (pAE->polledMode == FALSE) {
            NVMeRunning(pAE);
        } else {
            /*
             * we poll if we're launching the reinit state machine from HwStorResetBus
             * or HwStorAdapterControl->ScsiRestartAdapter path
             */
            NVMeRunning(pAE);
            /* TO val is based on CAP register plus a few, 5, seconds to init post RDY */
            passiveTimeout = pAE->uSecCrtlTimeout + (STORPORT_TIMER_CB_us * MICRO_TO_SEC);
            ...
            return (pAE->DriverState.NextDriverState == NVMeStartComplete) ? TRUE : FALSE;
        }
    } else {
        PRES_MAPPING_TBL pRMT = &pAE->ResMapTbl;
        .....
    }

Thank you!
Alex

From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Wednesday, October 15, 2014 6:00 AM
To: nvmewin at lists.openfabrics.org
Cc: Alex Chang; cgps at samsung.com
Subject: Samsung Patch for Bus Reset Enhancements

Hi Everyone,

We have a patch for the Bus Reset Enhancements. Please find attached the source code. The password is samsung123

Please find the change description below -

1. There are multiple paths in the driver that reset the controller and execute the initialization state machine. Our patch is not concerned with the majority of those paths.
Aside from a few additional isolated modifications, our patch focuses on the two paths that are supposed to be synchronous - i.e. they should not return to caller until all work is completed - but which currently are not so. They are:
a) NVMeResetBus (and)
b) NVMeAdapterControl-> ScsiRestartAdapter

We have introduced a new routine NVMeReInitializeController(), which will be invoked from NVMeResetBus() and NVMeAdapterControl() - ScsiRestartAdapter. This routine will reset and initialize the controller and then complete the requests. It will not return until the initialization state machine is complete.

We disallow processing of any SRB in NVMeStartIo() when NextDriverState != NVMeStartComplete. In this way we direct the PowerUp operations to be executed in NVMeAdapterControl() - ScsiRestartAdapter only. When resuming from hibernation, for example, NVMeStartIo() will not process the POWER SRB. Instead, the Power Up operations will be invoked in NVMeAdapterControl()->ScsiRestartAdapter.

Additionally, miniport drivers should disregard requests to reset the bus when ntldrDump is set to TRUE in NVMeResetBus(). But the current implementation processes this request.

2. When pAE->ntldrDump is TRUE, in the NVMeMapCore2Queue() routine, the pPGT value is NULL. Hence a BSOD occurs when executing ULONG coreNum = (ULONG)(pPN->Number + pPGT->BaseProcessor). We fixed the problem by accessing pPGT only when ntldrDump is FALSE.

3. In ProcessIo(), when IoStatus is set to NOT_SUBMITTED, the SRB is not completed. Due to this, a BSOD was occurring when executing the WHCK test "DP WLK - Hot-Add - Device test". We fixed the problem by changing the code to complete the SRB when IoStatus is NOT_SUBMITTED.

4. We changed the use of StorPortBusy()/StorPortReady() to StorPortPause()/StorPortResume(), since StorPortBusy() will not prevent new IOs from coming in once the current ones in the driver have been completed.

Tested the following on Win7 and Windows 2012R2.
- WHCK
- Install/Uninstall, Enable/Disable, FS Format
- Hibernation/Resume, Sleep/Resume
- IOmeter

Thanks,
Suman

_______________________________________________
nvmewin mailing list
nvmewin at lists.openfabrics.org
http://lists.openfabrics.org/mailman/listinfo/nvmewin

From Alex.Chang at pmcs.com Mon Nov 3 16:52:45 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Tue, 4 Nov 2014 00:52:45 +0000
Subject: [nvmewin] NVMe Windows DB Is LOCKED - Pushing Samsung Patch For Reset Enhancement
Message-ID:

Locking NVMe Windows DB.

Thanks,
Alex

From Alex.Chang at pmcs.com Mon Nov 3 17:14:21 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Tue, 4 Nov 2014 01:14:21 +0000
Subject: [nvmewin] NVMe Windows DB Is UNLOCKED - Pushing Patch From Samsung For Bus Reset Enhancement
Message-ID:

Dear all,

Thank you for reviewing/testing the patch from Samsung.
Many thanks to Suman and Judy for contributing the patch. The patch has been pushed into the source base and a new tag called "Patch#30_Bus_Reset_Enhancement" has been created under the "tags" directory. Samsung is scheduled to submit the next patch, for hot plug/remove, based on the most current sources in the repository.

Hi Suman and Judy,

You may send out the patch when it's available for the community to review and test.

Thanks,
Alex

From suman.p at samsung.com Tue Nov 4 02:49:44 2014
From: suman.p at samsung.com (SUMAN PRAKASH B)
Date: Tue, 04 Nov 2014 10:49:44 +0000 (GMT)
Subject: [nvmewin] Samsung patch for Hot plug fixes
Message-ID: <7A.8C.22636.84FA8545@epcpsbgx2.samsung.com>

An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 201411041620189_OC322OJW.gif
Type: image/gif
Size: 13168 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Samung_Patch3_v1_11042014.zip
Type: application/octet-stream
Size: 188245 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ChangeDescription.txt
Type: application/octet-stream
Size: 5829 bytes
Desc: not available
URL:

From Alex.Chang at pmcs.com Tue Nov 4 08:09:41 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Tue, 4 Nov 2014 16:09:41 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To: <13088_1415098194_5458AF52_13088_13811_1_FA.8C.22636.84FA8545@epcpsbgx2.samsung.com>
References: <13088_1415098194_5458AF52_13088_13811_1_FA.8C.22636.84FA8545@epcpsbgx2.samsung.com>
Message-ID:

Thank you, Suman, for the swift response in getting the patch ready.

Dear all,

Please review/test the patch and provide feedback/comments if you have any.
Thank you all,
Alex

From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Tuesday, November 04, 2014 2:50 AM
To: nvmewin at lists.openfabrics.org; Alex Chang; judy.brock at ssi.samsung.com
Subject: Samsung patch for Hot plug fixes

Hi Everyone,

We have a patch for the Hot plug fixes. Please find attached the source code. The password is samsung123

Please find the change description below -

1) Surprise removal while IOs are in progress.
To reproduce this scenario - Connect the disk and execute IOmeter on the disk volume. When IOs are in progress, surprise remove the device. The user expects that the device should be removed from Device Manager immediately and IOmeter should increase the error count field. This does not happen since we don't handle this scenario in the OFA driver.
Resolution -
a. Added a new function IsDeviceRemoved(). This is a recursive function. It compares the current Version register value with the saved value and, in case of a mismatch, completes the outstanding commands with SRB_STATUS_ERROR. (nvmeStd.c/IsDeviceRemoved)
b. Start the timer for IsDeviceRemoved() when the NextDriverState is set to StartComplete. (nvmeStat.c/NVMeRunning)
c. Stop the timer for IsDeviceRemoved() in case of ScsiStopAdapter. (nvmeStat.c/NVMeAdapterControl)
d. Restart the timer for IsDeviceRemoved() in case of ScsiRestartAdapter. (nvmeStat.c/NVMeAdapterControl)
e. Stop the timer for IsDeviceRemoved() in case of SRB_FUNCTION_SHUTDOWN. (nvmeStd.c/NVMeBuildIo)
f. If the DeviceRemovedDuringIO flag is set to TRUE, complete the SRBs with SRB_STATUS_ERROR for the IOs. This case is to handle the IOs received once the device has been surprise removed. (nvmeStd.c/NVMeBuildIo)
g. Modified the prototype of the NVMeDetectPendingCmds function. When the device is surprise removed while IOs are pending, the outstanding IOs have to be completed with SRB_STATUS_ERROR. (nvmeIo.c/NVMeDetectPendingCmds)
h. Call the NVMeDetectPendingCmds function with SRB_STATUS_BUS_RESET. (nvmeInit.c/NVMeNormalShutdown, nvmePwrMgmt.c/NVMeAdapterControlPowerDown, nvmeStd.c/RecoveryDpcRoutine)

2) Memory leak issues.
To reproduce this scenario -
a. Memory leak observed during hot removal in Resource monitor->Non-paged pool. (On Server2012R2 -> Task Manager -> Performance -> Non-paged pool)
b. Memory leak observed during disable/enable of the NVMe controller in Device Manager.
Resolution - To fix the memory leak, in NVMeBuildIo()->SRB_FUNCTION_PNP, when PnPAction is StorRemoveDevice (disable controller) or StorSurpriseRemoval (hot remove device), NVMeAdapterControlPowerDown() is invoked to stop the adapter and then NVMeFreeBuffers is invoked to free the memory. At this point, since ShutdownInProgress is set in NVMeAdapterControlPowerDown(), nothing is done during NVMeAdapterControl() - ScsiStopAdapter -> NVMeAdapterControlPowerDown().

3) Surprise Removal during Disk Initialization
To reproduce this scenario - Hot insert the device and hot remove the device immediately. At this point, our driver might be executing the initialization state machine in NVMePassiveInitialize. The device will not be immediately removed from Device Manager. The while loop will be active till passiveTimeout happens, and then the system BSODs.
Resolution -
a. Read the Version register. This is used to compare against the value in the Version register after a surprise removal. (nvmeStd.c/NVMeFindAdapter)
b. Read the Version register and compare with the old Version register value (i.e. the value read in NVMeFindAdapter). A mismatch in these values means surprise removal. (nvmeStd.c/NVMePassiveInitialize)
c. Set the NextDriverState to NVMeStartComplete and DeviceRemovedDuringIO to TRUE and return TRUE from NVMePassiveInitialize.
d. The driver may get commands in NVMeBuildIo, where the driver returns SRB_STATUS_ERROR when DeviceRemovedDuringIO is set to TRUE.
e. Then NVMeAdapterControl() - ScsiStopAdapter is executed.

4) Delay in removing the device from Device Manager after hot removal of the device.
When the device is hot removed, NVMeAdapterControlPowerDown() -> NVMeResetAdapter() -> NVMeWaitForCtrlRDY() is invoked, which sets the EN bit to 0 and waits for the RDY bit to become 0. Since the device is physically removed, the memory mapped registers will read as all 1's and the RDY bit will never become 0. Hence the while loop in NVMeWaitForCtrlRDY() is active for some time even after device removal, and hence the device is not removed from Device Manager immediately.
Resolution - Check the value of CSTS. If it's 0xFFFFFFFF, then the device has been surprise removed and we return TRUE. (nvmeStd.c/NVMeWaitForCtrlRDY)

5) Avoid redundant call of NVMeResetAdapter()
a. File/Function: nvmeInit.c/NVMeEnableAdapter - Removed the NVMeResetAdapter() function call from NVMeEnableAdapter() as this is redundant. NVMeResetAdapter() is being invoked in RecoveryDpcRoutine() and then again it's being invoked in NVMeEnableAdapter.
b. In the NVMeInitialize() function the EN and RDY bits are set to 0 before NVMeEnableAdapter() is invoked. But NVMeResetAdapter() does the same thing again.

6) When testing hot insertion with different devices, we observed some devices returned NAMESPACE_NOT_READY for IO commands during learning cores and disk initialization (report luns, inquiry, etc). To address this issue and provide support for these devices in the driver, we have done the following changes.
a. During learning cores, the driver sends read commands on all the queues to get the core to MSI-x mapping. When the read commands are interrupted, in the NVMeInitCallback(), if the SC and SCT values are not 0, then the learning cores is not completed. This check is not required as the driver wants only the core to MSI-x mapping. Since this is not a fatal error, we can skip reading the SC and SCT values, as this will impact the performance. (nvmeInit.c/NVMeInitCallback)
b. Following the above, when the initialization state machine is complete and the kernel starts sending SCSI commands for disk initialization, and when the device returns NAMESPACE_NOT_READY, this has to be translated to the corresponding SCSI sense data so that the commands will be re-tried after some time. (nvmeSnti.c/genericCommandStatusTable[])

Tested the following.
- WHCK on Win7 and 2012R2
- Install/Uninstall, Enable/Disable, FS Format
- Hibernation/Resume, Sleep/Resume
- IOmeter
- Hot removal while IOmeter is running.
- Hot removal immediately after hot insertion.
- Continuous hot insert and remove operations.
- Check for device removal after following sequence - Hot insert, system hibernation, Hot remove, system resume.
- Check for device presence after following sequence - System hibernation, hot insert, system resume.
- Memory leaks during hot plug operations and disable/enable.

Thanks,
Suman
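The removal checks in items 3 and 4 of the change description above can be sketched in a few lines. This is only an illustration under stated assumptions, not the patch itself: `wait_for_rdy_clear`, `version_mismatch` and the `rdy_wait` enum are hypothetical names, and an array of sampled values stands in for successive register reads. The only facts relied on are the ones stated in the description (a removed device reads back as all 1's, and a changed Version register value signals removal) plus CSTS.RDY being bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REG_ALL_ONES 0xFFFFFFFFu  /* value read back once the device is gone */
#define CSTS_RDY     0x1u         /* CSTS.RDY is bit 0 */

/* Hypothetical result codes; the patch description simply returns TRUE on removal. */
enum rdy_wait { RDY_CLEARED, DEVICE_REMOVED, RDY_TIMED_OUT };

/* Walk a sequence of CSTS samples (standing in for successive register reads)
 * and stop at the first one that settles the wait. */
static enum rdy_wait wait_for_rdy_clear(const uint32_t *csts_samples, size_t count)
{
    assert(csts_samples != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint32_t csts = csts_samples[i];
        if (csts == REG_ALL_ONES)
            return DEVICE_REMOVED;   /* checked first: all 1's also has RDY set */
        if ((csts & CSTS_RDY) == 0)
            return RDY_CLEARED;
    }
    return RDY_TIMED_OUT;            /* ran out of polls with RDY still set */
}

/* Periodic check: a Version register value that differs from the one saved at
 * adapter discovery is treated as surprise removal. */
static bool version_mismatch(uint32_t saved_vs, uint32_t current_vs)
{
    return saved_vs != current_vs;
}
```

Returning a distinct code for removal, rather than a plain TRUE, is a choice made here so that the three outcomes can be told apart in a test; the description above does not say the driver distinguishes them.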
Name: ~WRD303.jpg Type: image/jpeg Size: 823 bytes Desc: ~WRD303.jpg URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 13168 bytes Desc: image001.gif URL: From Alex.Chang at pmcs.com Tue Nov 4 12:41:00 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Tue, 4 Nov 2014 20:41:00 +0000 Subject: [nvmewin] Samsung patch for Hot plug fixes In-Reply-To: <13088_1415098194_5458AF52_13088_13811_1_FA.8C.22636.84FA8545@epcpsbgx2.samsung.com> References: <13088_1415098194_5458AF52_13088_13811_1_FA.8C.22636.84FA8545@epcpsbgx2.samsung.com> Message-ID: Hi Suman, I have couple of questions for you: 1. In NVMeInitCallback function, why we don't need to check SC and SCT anymore for NVMeWaitOnLearnMapping case? 2. The resolution of start surprise removal timer is set as 1 second. What are the reasons to set it as 1 second rather than others? Thank you! Alex From: SUMAN PRAKASH B [mailto:suman.p at samsung.com] Sent: Tuesday, November 04, 2014 2:50 AM To: nvmewin at lists.openfabrics.org; Alex Chang; judy.brock at ssi.samsung.com Subject: Samsung patch for Hot plug fixes Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Everyone, We have a patch for the Hot plug fixes. Please find attached the source code. 
The password is samsung123 Please find the change description below - 1) Surprise removal while IOs are in progress. To reproduce this scenario - Connect the disk and execute IOmeter on the disk volume. When IOs are in progress, surprise remove the device. User expects that the device should be removed from device manager immediately and iometer should increase the error count field. This does not happen since we don't handle this scenario in OFA driver. Resolution - a. Added a new function IsdeviceRemoved(). This is a recursive function. Compares the values of Version Register values with old value and incase of mismatch complete the outstanding commands with SRB_STATUS_ERROR. (nvmeStd.c/IsDeviceRemoved) b. Start the Timer for IsDeviceRemoved() when the NextDriverState is set to StartComplete.(nvmeStat.c/NVMeRunning) c. Stop the timer for IsDeviceRemoved() incase of ScsiStopAdapter. (nvmeStat.c/NVMeAdapterControl) d. Restart the timer for IsDeviceRemoved() incase of ScsiRestartAdapter. (nvmeStat.c/NVMeAdapterControl) e. Stop the timer for IsDeviceRemoved() incase of SRB_FUNCTION_SHUTDOWN. (nvmeStd.c/NVMeBuildIo) f. If DeviceRemovedDuringIO flag is set to TRUE, complete the SRBs with SRB_STATUS_ERROR for the IOs. This case is to handle the IOs received once the device has been surprise removed. (nvmeStdc/NVMeBuildIo) g. Modified the prototype of NVMeDetectPendingCmds function. When device is surprise removed when IOs are pending, the outstanding IOs has to be completed with SRB_STATUS_ERROR. (nvmeIo.c/NVMeDetectPendingCmds) h. Call the NVMeDetectPendingCmds function with SRB_STATUS_BUS_RESET. (nvmeInit.c/NVMeNormalShutdown, nvmePwrMgmt.c/NVMeAdapterControlPowerDown, nvmeStd.c/RecoveryDpcRoutine) 2) Memory leak issues. To reproduce this scenario - a. Memory leak observed during hot removal in Resource monitor->Non-paged pool. (On Server2012R2 -> Task Manager -> Performance -> Non-paged pool) b. 
Memory leak observed during disable/enable the NVMe controller in device manager. Resolution - To fix memory leak, in NVMeBuildIo()->SRB_FUNCTION_PNP, when PnPAction is StorRemoveDevice(disable controller) and StorSurpriseRemoval(hot remove device), NVMeAdapterControlPowerDown() is invoked to stop the adapter and then NVMeFreeBuffers is invoked to free the memory. At this point, since the ShutdownInProgress is set in NVMeAdapterControlPowerDown(), nothing is done during NVMeAdapterControl() - ScsiStopAdapter -> NVMeAdapterControlPowerDown(). 3) Surprise Removal during Disk Initialization To reproduce this scenario - Hot insert the device and hot remove the device immediately. At this point, our driver might be executing the initialization state machine in NVMePassiveInitialize. The device will not be immediately removed from the device manger. The while loop will be active till passiveTimeout happens, then system BSOD. Resolution - a. Read the Version register. This is used to compare against the value in version register after a surprise removal. (nvmeStd.c/NVMeFindAdapter) b. Read the Version Register and compare with old Version Register value(i.e. value read in NVMeFindAdapter). Mismatch in these values means surprise removal. (nvmeStd.c/NVMePassiveInitialize) c. Set the NextDriverState to NVMeStartComplete and DeviceRemovedDuringIO to TRUE and return TRUE from NVMePassiveInitialize. d. Driver may get commands in NVMeBuildIo, where driver returns SRB_STATUS_ERROR when DeviceRemovedDuringIO is set to TRUE. e. Then NVMeAdapterControl() - ScsiStopAdapter is executed. 4) Delay in removing the device from device manager after hot removal of device. When device is hot removed, the NVMeAdapterControlPowerDown() -> NVMeResetAdapter() -> NVMeWaitForCtrlRDY() is invoked which sets the EN bit to 0 and waits for RDY bit to become 0. Since the device is physically removed, the memory mapped registers will be come all 1's and the RDY bit will never become 0. 
Hence the while loop in NVMeWaitForCtrlRDY() is active for some time even after device removal and hence device is not removed from device manager immediately. Resolution - Check for the value of CSTS. If its 0xFFFFFFFF, then device has been surprise removed and return TRUE. (nvmeStd.c/NVMeWaitForCtrlRDY) 5) Avoid redundant call of NVMeResetAdapter() a. File/Function: nvmeInit.c/NVMeEnableAdapter - Removed the NVMeResetAdapter() function call from NVMeEnableAdapter() as this is redundant. The NVMeResetAdapter() is being invoked in the RecoveryDpcRouitne() and then again its being invoked in the NVMeEnableAdapter. b. In the NVMeInitialize() function the EN and RDY bit are set to 0 before the NVMeEnableAdapter() is being invoked. But NVMeResetAdapter() does again the same functionality. 6) When testing hot insertion with different devices, we observed some devices returned NAMESPACE_NOT_READY for IO commands during learning cores and disk initialization(report luns, inquiry, etc). To address this issue and provide support for these devices in the driver, we have done the following changes. a. During learning cores, driver sends read commands on all the queues to get the core to MSI-x mapping. When the read commands are interrupted, in the NVMeInitCallback(), if the SC and SCT values are not 0, then the learning cores is not completed. This check is not required as driver wants only the core to MSI-x mapping. Since this is not a fatal error, we can skip reading the SC and SCT values, as this will impact the performance. (nvmeInit.c/NVMeInitCallback). b. Following the above, when the initialization state machine is complete and kernel starts sending SCSI commands for disk initialization, and when device returns NAMESPACE_NOT_READY, this has to be translated to the corresponding SCSI sense data so that the commands will be re-tried after some time. (nvmeSnti.c/genericCommandStatusTable[]). Tested the following. 
- WHCK on Win7 and 2012R2
- Install/Uninstall, Enable/Disable, FS Format
- Hibernation/Resume, Sleep/Resume
- IOmeter
- Hot removal while IOmeter is running.
- Hot removal immediately after hot insertion.
- Continuous hot insert and remove operations.
- Check for device removal after the following sequence - Hot insert, system hibernation, hot remove, system resume.
- Check for device presence after the following sequence - System hibernation, hot insert, system resume.
- Memory leaks during hot plug operations and disable/enable.

Thanks,
Suman

From judy.brock at ssi.samsung.com Tue Nov 4 17:02:03 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Wed, 5 Nov 2014 01:02:03 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To:
References: <13088_1415098194_5458AF52_13088_13811_1_FA.8C.22636.84FA8545@epcpsbgx2.samsung.com>
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A516B665F92@SSIEXCH-MB3.ssi.samsung.com>

Hi Alex,

Suman may have more to say about this, but let me take a crack at answering your questions:

1. In NVMeInitCallback function, why we don't need to check SC and SCT anymore for NVMeWaitOnLearnMapping case?

The read cmds we send during learning are not necessarily expected to succeed - in fact we expect they very well may fail, since it's early on in initialization and there is a great likelihood that namespaces are not ready at that point.
We don't care about getting NAMESPACE_NOT_READY status back - we just want to get a cmd out to each submission queue in order to take note of which logical processor is running when the completion for that command occurs. Since each queue pair is tied to a particular MSI-x vector, this allows us to "learn" what the optimal logical processor-to-queue pair relationship should be, and we adjust our internal tables accordingly so that IOs are launched and completed on the same logical processor, avoiding the overhead of processor context switching, etc. That is the relationship we are trying to "learn". Again, we don't care whether the read cmds complete with success or error status; we send them only as a mechanism to discover which logical processor is associated with the completion queue paired with a given submission queue.

2. The resolution of start surprise removal timer is set as 1 second. What are the reasons to set it as 1 second rather than others?

Suman can speak to the decision to use this exact number, but I believe one of the reasons is to ensure that when a device is surprise removed, it disappears from Device Manager right away.

Thanks,
Judy

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, November 04, 2014 12:41 PM
To: suman.p at samsung.com; nvmewin at lists.openfabrics.org; Judy Brock-SSI
Subject: RE: Samsung patch for Hot plug fixes

Hi Suman,

I have a couple of questions for you:
1. In NVMeInitCallback function, why we don't need to check SC and SCT anymore for NVMeWaitOnLearnMapping case?
2. The resolution of start surprise removal timer is set as 1 second. What are the reasons to set it as 1 second rather than others?

Thank you!
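The periodic surprise-removal check behind question 2 can be sketched roughly as below. This is a minimal model under stated assumptions, not the code in the patch: adapter_t, poll_device_removed() and POLL_INTERVAL_US are hypothetical names, the Version register is modelled as a plain memory location, and re-arming the timer through StorPort is left out. It shows only the idea described in this thread: compare the current Version register value with the saved one, and on a mismatch mark the device removed and fail whatever is outstanding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the adapter extension; not the OFA driver's layout. */
typedef struct {
    volatile uint32_t *vs_reg;  /* mapped Version register (simulated here) */
    uint32_t saved_version;     /* value captured while the device was present */
    bool device_removed;        /* plays the role of DeviceRemovedDuringIO */
    unsigned pending_cmds;      /* commands still outstanding */
    unsigned failed_cmds;       /* commands completed with an error status */
} adapter_t;

/* Assumed poll period: the 1 second interval discussed in this thread. */
#define POLL_INTERVAL_US 1000000u

/*
 * One timer tick. Returns true when the caller should re-arm the timer,
 * false once removal has been detected and polling can stop.
 */
bool poll_device_removed(adapter_t *a)
{
    uint32_t now = *a->vs_reg;

    if (now == a->saved_version) {
        return true;            /* register still reads the saved value */
    }

    /* Mismatch (typically all 1's after removal): fail outstanding work. */
    a->device_removed = true;
    a->failed_cmds += a->pending_cmds;
    a->pending_cmds = 0;
    return false;
}
```

A longer interval means fewer register reads but a slower disappearance from Device Manager, which is the trade-off the question is about.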
Alex

From Alex.Chang at pmcs.com Wed Nov 5 08:02:41 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Wed, 5 Nov 2014 16:02:41 +0000
Subject: [nvmewin] FW: RE: Samsung patch for Hot plug fixes
In-Reply-To: <74.1F.15273.6001A545@epcpsbgx1.samsung.com>
References: <74.1F.15273.6001A545@epcpsbgx1.samsung.com>
Message-ID:

Dear all,

Please see the reply from Suman and provide your thoughts.

Thanks,
Alex

From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Wednesday, November 05, 2014 3:55 AM
To: Alex Chang; nvmewin at lists.openfabrics.org; judy.brock at ssi.samsung.com
Subject: Re: RE: Samsung patch for Hot plug fixes

Hi Alex,

1. In NVMeInitCallback function, why we don’t need to check SC and SCT anymore for NVMeWaitOnLearnMapping case?
[Suman] As Judy has mentioned, the namespace may not be ready early on during initialization. We have tested with different devices and observed that one of the devices returns NAMESPACE_NOT_READY. And since this is not a fatal error, we can ignore checking the SC and SCT bits. Also, when the device returns NAMESPACE_NOT_READY, the driver does not complete the learning cores, and this may result in performance degradation as a context switch will happen when the device interrupts.

2. The resolution of start surprise removal timer is set as 1 second. What are the reasons to set it as 1 second rather than others?
[Suman] Generally, reading a device register in a PCIe SSD is costly and can degrade performance. When we implemented this logic, there was no impact on performance when the driver read the device register frequently, maybe because the register read was in a separate thread and not in the IO path.
But still, we did not want to keep reading the register very frequently, and when we hot remove the device during IO, the device should not take long to be removed from Device Manager. Hence the 1 second delay was a trade-off between "not reading device register frequently" and "remove the device from device manager as soon as possible".

Thanks,
Suman

From judy.brock at ssi.samsung.com Wed Nov 5 13:55:33 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Wed, 5 Nov 2014 21:55:33 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To:
References: <74.1F.15273.6001A545@epcpsbgx1.samsung.com>
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A516B66638F@SSIEXCH-MB3.ssi.samsung.com>

Hi All,

>> And since this is not a fatal error, we can ignore checking the SC and SCT bits.

And again, just to emphasize, the purpose of learning is not to see if we can access media w/out errors - the purpose is just to establish the MSI-x/logical processor relationship. This can be confirmed with the authors of the original learning code.

>> Also when device returns NAMESPACE_NOT_READY, driver does not complete the learning cores and this may result in performance degradation as context switch will happen when device interrupts.
To clarify, the above describes the behavior of the driver if it does not ignore the ST and SCT bits. This is not the behavior of the submitted hot-plug patch driver. Thanks, Judy From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, November 05, 2014 8:03 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] FW: RE: Samsung patch for Hot plug fixes Dear all, Please see the reply from Suman and provide your thoughts. Thanks, Alex From: SUMAN PRAKASH B [mailto:suman.p at samsung.com] Sent: Wednesday, November 05, 2014 3:55 AM To: Alex Chang; nvmewin at lists.openfabrics.org; judy.brock at ssi.samsung.com Subject: Re: RE: Samsung patch for Hot plug fixes Hi Alex, 1. In NVMeInitCallback function, why we don’t need to check SC and SCT anymore for NVMeWaitOnLearnMapping case? [Suman] As Judy has mentioned, the namespace may not be ready early on during initialization. We have tested with different devices and observed that one of the device returns NAMEPSACE_NOT_READY. And since this is not a fatal error, we can ignore checking the ST and SCT bits. Also when device returns NAMESPACE_NOT_READY, driver does not complete the learning cores and this may result in performance degradation as context switch will happen when device interrupts. 2. The resolution of start surprise removal timer is set as 1 second. What are the reasons to set it as 1 second rather than others? [Suman] Generally, reading device register in a PCIe SSD is costy and can degrade the performance. When we implemented this logic, there was no impact in performance when driver reads device register frequently, may be because the register read was in a separate thread and not in IO path. But still we did not want to keep reading the register very frequently and also when we hot remove the device during IO, device should not take more time to be removed from device manager. 
Hence 1 second delay was a trade off between "not reading device register frequently" and "remove the device from device manager as soon as possible". Thanks, Suman ------- Original Message ------- Sender : Alex Chang> Date : Nov 05, 2014 01:41 (GMT+05:00) Title : RE: Samsung patch for Hot plug fixes Hi Suman, I have couple of questions for you: 1. In NVMeInitCallback function, why we don’t need to check SC and SCT anymore for NVMeWaitOnLearnMapping case? 2. The resolution of start surprise removal timer is set as 1 second. What are the reasons to set it as 1 second rather than others? Thank you! Alex From: SUMAN PRAKASH B [mailto:suman.p at samsung.com] Sent: Tuesday, November 04, 2014 2:50 AM To: nvmewin at lists.openfabrics.org; Alex Chang; judy.brock at ssi.samsung.com Subject: Samsung patch for Hot plug fixes Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Everyone, We have a patch for the Hot plug fixes. Please find attached the source code. The password is samsung123 Please find the change description below - 1) Surprise removal while IOs are in progress. To reproduce this scenario - Connect the disk and execute IOmeter on the disk volume. When IOs are in progress, surprise remove the device. User expects that the device should be removed from device manager immediately and iometer should increase the error count field. 
This does not happen since we don't handle this scenario in OFA driver. Resolution - a. Added a new function IsdeviceRemoved(). This is a recursive function. Compares the values of Version Register values with old value and incase of mismatch complete the outstanding commands with SRB_STATUS_ERROR. (nvmeStd.c/IsDeviceRemoved) b. Start the Timer for IsDeviceRemoved() when the NextDriverState is set to StartComplete.(nvmeStat.c/NVMeRunning) c. Stop the timer for IsDeviceRemoved() incase of ScsiStopAdapter. (nvmeStat.c/NVMeAdapterControl) d. Restart the timer for IsDeviceRemoved() incase of ScsiRestartAdapter. (nvmeStat.c/NVMeAdapterControl) e. Stop the timer for IsDeviceRemoved() incase of SRB_FUNCTION_SHUTDOWN. (nvmeStd.c/NVMeBuildIo) f. If DeviceRemovedDuringIO flag is set to TRUE, complete the SRBs with SRB_STATUS_ERROR for the IOs. This case is to handle the IOs received once the device has been surprise removed. (nvmeStdc/NVMeBuildIo) g. Modified the prototype of NVMeDetectPendingCmds function. When device is surprise removed when IOs are pending, the outstanding IOs has to be completed with SRB_STATUS_ERROR. (nvmeIo.c/NVMeDetectPendingCmds) h. Call the NVMeDetectPendingCmds function with SRB_STATUS_BUS_RESET. (nvmeInit.c/NVMeNormalShutdown, nvmePwrMgmt.c/NVMeAdapterControlPowerDown, nvmeStd.c/RecoveryDpcRoutine) 2) Memory leak issues. To reproduce this scenario - a. Memory leak observed during hot removal in Resource monitor->Non-paged pool. (On Server2012R2 -> Task Manager -> Performance -> Non-paged pool) b. Memory leak observed during disable/enable the NVMe controller in device manager. Resolution - To fix memory leak, in NVMeBuildIo()->SRB_FUNCTION_PNP, when PnPAction is StorRemoveDevice(disable controller) and StorSurpriseRemoval(hot remove device), NVMeAdapterControlPowerDown() is invoked to stop the adapter and then NVMeFreeBuffers is invoked to free the memory. 
At this point, since the ShutdownInProgress is set in NVMeAdapterControlPowerDown(), nothing is done during NVMeAdapterControl() - ScsiStopAdapter -> NVMeAdapterControlPowerDown(). 3) Surprise Removal during Disk Initialization To reproduce this scenario - Hot insert the device and hot remove the device immediately. At this point, our driver might be executing the initialization state machine in NVMePassiveInitialize. The device will not be immediately removed from the device manger. The while loop will be active till passiveTimeout happens, then system BSOD. Resolution - a. Read the Version register. This is used to compare against the value in version register after a surprise removal. (nvmeStd.c/NVMeFindAdapter) b. Read the Version Register and compare with old Version Register value(i.e. value read in NVMeFindAdapter). Mismatch in these values means surprise removal. (nvmeStd.c/NVMePassiveInitialize) c. Set the NextDriverState to NVMeStartComplete and DeviceRemovedDuringIO to TRUE and return TRUE from NVMePassiveInitialize. d. Driver may get commands in NVMeBuildIo, where driver returns SRB_STATUS_ERROR when DeviceRemovedDuringIO is set to TRUE. e. Then NVMeAdapterControl() - ScsiStopAdapter is executed. 4) Delay in removing the device from device manager after hot removal of device. When device is hot removed, the NVMeAdapterControlPowerDown() -> NVMeResetAdapter() -> NVMeWaitForCtrlRDY() is invoked which sets the EN bit to 0 and waits for RDY bit to become 0. Since the device is physically removed, the memory mapped registers will be come all 1's and the RDY bit will never become 0. Hence the while loop in NVMeWaitForCtrlRDY() is active for some time even after device removal and hence device is not removed from device manager immediately. Resolution - Check for the value of CSTS. If its 0xFFFFFFFF, then device has been surprise removed and return TRUE. (nvmeStd.c/NVMeWaitForCtrlRDY) 5) Avoid redundant call of NVMeResetAdapter() a. 
File/Function: nvmeInit.c/NVMeEnableAdapter - Removed the NVMeResetAdapter() function call from NVMeEnableAdapter() as this is redundant. The NVMeResetAdapter() is being invoked in the RecoveryDpcRouitne() and then again its being invoked in the NVMeEnableAdapter. b. In the NVMeInitialize() function the EN and RDY bit are set to 0 before the NVMeEnableAdapter() is being invoked. But NVMeResetAdapter() does again the same functionality. 6) When testing hot insertion with different devices, we observed some devices returned NAMESPACE_NOT_READY for IO commands during learning cores and disk initialization(report luns, inquiry, etc). To address this issue and provide support for these devices in the driver, we have done the following changes. a. During learning cores, driver sends read commands on all the queues to get the core to MSI-x mapping. When the read commands are interrupted, in the NVMeInitCallback(), if the SC and SCT values are not 0, then the learning cores is not completed. This check is not required as driver wants only the core to MSI-x mapping. Since this is not a fatal error, we can skip reading the SC and SCT values, as this will impact the performance. (nvmeInit.c/NVMeInitCallback). b. Following the above, when the initialization state machine is complete and kernel starts sending SCSI commands for disk initialization, and when device returns NAMESPACE_NOT_READY, this has to be translated to the corresponding SCSI sense data so that the commands will be re-tried after some time. (nvmeSnti.c/genericCommandStatusTable[]). Tested the following. - WHCK on Win7 and 2012R2 - Install/Uninstall, Enable/Disable, FS Format - Hibernation/Resume, Sleep/Resume - IOmeter - Hot removal which iometer is running. - Hot removal immediately after hot insertion. - Continous hot insert and remove operations. - Check for device removal after following sequence - Hot insert, system hibernation, Hot remove, system resume. 
- Check for device presence after the following sequence - System hibernation, hot insert, system resume.
- Memory leaks during hot plug operations and disable/enable.

Thanks,
Suman

From raymond.c.robles at intel.com Wed Nov 5 14:09:50 2014
From: raymond.c.robles at intel.com (Robles, Raymond C)
Date: Wed, 5 Nov 2014 22:09:50 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To: <36E8D38D6B771A4BBDB1C0D800158A516B66638F@SSIEXCH-MB3.ssi.samsung.com>
References: <74.1F.15273.6001A545@epcpsbgx1.samsung.com> <36E8D38D6B771A4BBDB1C0D800158A516B66638F@SSIEXCH-MB3.ssi.samsung.com>
Message-ID: <49158E750348AA499168FD41D88983606B5412A1@fmsmsx117.amr.corp.intel.com>

The purpose of learning mode is to map the logical CPU core to the correct MSI-x vector. If there is a media error during the series of learning mode read commands, that does not affect the mapping. However, it does affect the functionality of the driver and software stack. The expectation when we first coded this was that the learning I/O was sent, and any failures would be ignored… with the understanding that as soon as the OS starts sending Report LUNs, Inquiries, Mode Sense, and Read/Writes… any media or SC/SCT errors would be caught during this phase.
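The distinction drawn above, where learning-mode completions only establish the core to MSI-x mapping while errors are caught once regular I/O starts, can be sketched roughly as follows. This is a minimal illustration and not the OFA driver's code: the types, the table layout, and the function names (core_map_t, on_learning_completion, io_completion_succeeded) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CORES 64
#define VECTOR_UNKNOWN 0xFFFFu

/* Hypothetical completion status: only the two fields discussed in the thread. */
typedef struct {
    uint8_t sct; /* Status Code Type */
    uint8_t sc;  /* Status Code */
} completion_status_t;

/* Hypothetical core-to-vector table that learning mode fills in. */
typedef struct {
    uint16_t vector_for_core[MAX_CORES];
    uint32_t cores_learned;
} core_map_t;

void core_map_init(core_map_t *map)
{
    for (uint32_t i = 0; i < MAX_CORES; i++) {
        map->vector_for_core[i] = VECTOR_UNKNOWN;
    }
    map->cores_learned = 0;
}

/*
 * Learning-mode completion: record which vector fired for this core.
 * The status is deliberately not consulted, so a non-fatal status such as
 * "namespace not ready" does not prevent the mapping from being learned.
 */
void on_learning_completion(core_map_t *map, uint32_t core, uint16_t vector,
                            completion_status_t status)
{
    (void)status; /* intentionally ignored during learning */
    assert(core < MAX_CORES);
    if (map->vector_for_core[core] == VECTOR_UNKNOWN) {
        map->cores_learned++;
    }
    map->vector_for_core[core] = vector;
}

/* Regular I/O completion: any non-zero SCT or SC counts as a failure. */
bool io_completion_succeeded(completion_status_t status)
{
    return status.sct == 0 && status.sc == 0;
}
```

In this sketch the same failing status is accepted by the learning path and rejected by the regular I/O path, which is the split the thread describes.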
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Judy Brock-SSI
Sent: Wednesday, November 05, 2014 2:56 PM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] Samsung patch for Hot plug fixes

Hi All,

>> And since this is not a fatal error, we can ignore checking the SC and SCT bits.

And again, just to emphasize, the purpose of learning is not to see if we can access media without errors; the purpose is just to establish the MSI-x/logical processor relationship. This can be confirmed with the authors of the original learning code.

>> Also when device returns NAMESPACE_NOT_READY, driver does not complete the learning cores and this may result in performance degradation as context switch will happen when device interrupts.

To clarify, the above describes the behavior of the driver if it does not ignore the SC and SCT bits. This is not the behavior of the submitted hot-plug patch driver.

Thanks,
Judy

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Wednesday, November 05, 2014 8:03 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] FW: RE: Samsung patch for Hot plug fixes

Dear all,

Please see the reply from Suman and provide your thoughts.

Thanks,
Alex

From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Wednesday, November 05, 2014 3:55 AM
To: Alex Chang; nvmewin at lists.openfabrics.org; judy.brock at ssi.samsung.com
Subject: Re: RE: Samsung patch for Hot plug fixes

Hi Alex,

1. In the NVMeInitCallback function, why don't we need to check SC and SCT anymore for the NVMeWaitOnLearnMapping case?
[Suman] As Judy has mentioned, the namespace may not be ready early on during initialization. We have tested with different devices and observed that one of the devices returns NAMESPACE_NOT_READY. Since this is not a fatal error, we can skip checking the SC and SCT bits. Also, when the device returns NAMESPACE_NOT_READY, the driver does not complete learning cores, and this may result in performance degradation, as a context switch will happen when the device interrupts.

2. The resolution of the surprise removal timer is set to 1 second. What are the reasons for choosing 1 second rather than another value?
[Suman] Generally, reading a device register in a PCIe SSD is costly and can degrade performance. When we implemented this logic, there was no performance impact even when the driver read the device register frequently, maybe because the register read was in a separate thread and not in the IO path. Still, we did not want to read the register very frequently, and when the device is hot removed during IO, it should not take long to be removed from device manager. Hence the 1 second delay was a trade-off between "not reading the device register frequently" and "removing the device from device manager as soon as possible".

Thanks,
Suman

------- Original Message -------
Sender : Alex Chang
Date : Nov 05, 2014 01:41 (GMT+05:00)
Title : RE: Samsung patch for Hot plug fixes

Hi Suman,

I have a couple of questions for you:
1. In the NVMeInitCallback function, why don't we need to check SC and SCT anymore for the NVMeWaitOnLearnMapping case?
2. The resolution of the surprise removal timer is set to 1 second. What are the reasons for choosing 1 second rather than another value?

Thank you!
Alex

From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Tuesday, November 04, 2014 2:50 AM
To: nvmewin at lists.openfabrics.org; Alex Chang; judy.brock at ssi.samsung.com
Subject: Samsung patch for Hot plug fixes

Hi Everyone,

We have a patch for the Hot plug fixes. Please find attached the source code. The password is samsung123

Please find the change description below -

1) Surprise removal while IOs are in progress.
To reproduce this scenario - Connect the disk and execute IOmeter on the disk volume. When IOs are in progress, surprise remove the device. The user expects the device to be removed from device manager immediately and IOmeter to increase the error count field. This does not happen since we don't handle this scenario in the OFA driver.
Resolution -
a. Added a new function IsDeviceRemoved(). This is a recursive function. It compares the value of the Version register with the old value and, in case of a mismatch, completes the outstanding commands with SRB_STATUS_ERROR. (nvmeStd.c/IsDeviceRemoved)
b. Start the timer for IsDeviceRemoved() when NextDriverState is set to NVMeStartComplete. (nvmeStat.c/NVMeRunning)
c. Stop the timer for IsDeviceRemoved() in case of ScsiStopAdapter. (nvmeStat.c/NVMeAdapterControl)
d. Restart the timer for IsDeviceRemoved() in case of ScsiRestartAdapter. (nvmeStat.c/NVMeAdapterControl)
e. Stop the timer for IsDeviceRemoved() in case of SRB_FUNCTION_SHUTDOWN. (nvmeStd.c/NVMeBuildIo)
f. If the DeviceRemovedDuringIO flag is set to TRUE, complete the SRBs with SRB_STATUS_ERROR for the IOs. This case handles the IOs received once the device has been surprise removed. (nvmeStd.c/NVMeBuildIo)
g. Modified the prototype of NVMeDetectPendingCmds function.
When the device is surprise removed while IOs are pending, the outstanding IOs have to be completed with SRB_STATUS_ERROR. (nvmeIo.c/NVMeDetectPendingCmds)
h. Call the NVMeDetectPendingCmds function with SRB_STATUS_BUS_RESET. (nvmeInit.c/NVMeNormalShutdown, nvmePwrMgmt.c/NVMeAdapterControlPowerDown, nvmeStd.c/RecoveryDpcRoutine)

2) Memory leak issues.
To reproduce this scenario -
a. Memory leak observed during hot removal in Resource monitor->Non-paged pool. (On Server2012R2 -> Task Manager -> Performance -> Non-paged pool)
b. Memory leak observed when disabling/enabling the NVMe controller in device manager.
Resolution - To fix the memory leak, in NVMeBuildIo()->SRB_FUNCTION_PNP, when PnPAction is StorRemoveDevice (disable controller) or StorSurpriseRemoval (hot remove device), NVMeAdapterControlPowerDown() is invoked to stop the adapter and then NVMeFreeBuffers is invoked to free the memory. At this point, since ShutdownInProgress is set in NVMeAdapterControlPowerDown(), nothing is done during NVMeAdapterControl() - ScsiStopAdapter -> NVMeAdapterControlPowerDown().
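The two removal checks in the change description (a Version register that no longer matches the cached value, and CSTS reading back as 0xFFFFFFFF while waiting for RDY to clear) can be sketched as below. This is only an illustration under stated assumptions: the register reader callback, the fake register used to exercise it, and all names (wait_for_ready_clear, version_mismatch, fake_reg_t) are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REG_ALL_ONES 0xFFFFFFFFu
#define CSTS_RDY_BIT 0x1u

/* Hypothetical register reader, so the sketch runs without hardware. */
typedef uint32_t (*read_reg_fn)(void *ctx);

typedef enum {
    WAIT_READY_CLEARED,  /* RDY reached 0 */
    WAIT_TIMED_OUT,      /* RDY never cleared within the poll budget */
    WAIT_DEVICE_REMOVED  /* register read back as all ones */
} wait_result_t;

/* A Version register that differs from the cached value implies removal. */
bool version_mismatch(uint32_t cached_version, uint32_t current_version)
{
    return cached_version != current_version;
}

/* Poll CSTS until RDY clears; stop early if the read returns all ones. */
wait_result_t wait_for_ready_clear(read_reg_fn read_csts, void *ctx,
                                   uint32_t max_polls)
{
    assert(read_csts != NULL);
    for (uint32_t i = 0; i < max_polls; i++) {
        uint32_t csts = read_csts(ctx);
        if (csts == REG_ALL_ONES) {
            return WAIT_DEVICE_REMOVED;
        }
        if ((csts & CSTS_RDY_BIT) == 0) {
            return WAIT_READY_CLEARED;
        }
    }
    return WAIT_TIMED_OUT;
}

/* Simulated register: returns scripted values, repeating the last one. */
typedef struct {
    const uint32_t *values;
    uint32_t count;
    uint32_t next;
} fake_reg_t;

uint32_t fake_read(void *ctx)
{
    fake_reg_t *reg = (fake_reg_t *)ctx;
    uint32_t value = reg->values[reg->next];
    if (reg->next + 1 < reg->count) {
        reg->next++;
    }
    return value;
}
```

A three-valued result is used here so that a caller can tell a removed device apart from a normal RDY transition; the patch as described returns TRUE in the removal case instead.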
From judy.brock at ssi.samsung.com Wed Nov 5 15:18:52 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Wed, 5 Nov 2014 23:18:52 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To: <49158E750348AA499168FD41D88983606B5412A1@fmsmsx117.amr.corp.intel.com>
References: <74.1F.15273.6001A545@epcpsbgx1.samsung.com> <36E8D38D6B771A4BBDB1C0D800158A516B66638F@SSIEXCH-MB3.ssi.samsung.com> <49158E750348AA499168FD41D88983606B5412A1@fmsmsx117.amr.corp.intel.com>
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A516B6663E2@SSIEXCH-MB3.ssi.samsung.com>

Hi Ray,

That is consistent with our understanding. That is why we modified the driver to ignore I/O failures only during learning, i.e. during the specific series of learning-mode read commands, per the original intent described below. All other errors, i.e. as soon as the OS starts sending Report LUNs, Inquiries, Mode Sense, and Read/Writes, any media or SC/SCT errors, etc., are always reported.

Thanks,
Judy

From: Robles, Raymond C [mailto:raymond.c.robles at intel.com]
Sent: Wednesday, November 05, 2014 2:10 PM
To: Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] Samsung patch for Hot plug fixes

The purpose of learning mode is to map the logical CPU core to the correct MSI-x vector. If there is a media error during the series of learning mode read commands, that does not affect the mapping. However, it does affect the functionality of the driver and software stack. The expectation when we first coded this was that the learning I/O was sent, and any failures would be ignored… with the understanding that as soon as the OS starts sending Report LUNs, Inquiries, Mode Sense, and Read/Writes… any media or SC/SCT errors would be caught during this phase.

From Alex.Chang at pmcs.com Thu Nov 13 15:43:49 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Thu, 13 Nov 2014 23:43:49 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To: <13088_1415098194_5458AF52_13088_13811_1_FA.8C.22636.84FA8545@epcpsbgx2.samsung.com>
References: <13088_1415098194_5458AF52_13088_13811_1_FA.8C.22636.84FA8545@epcpsbgx2.samsung.com>
Message-ID:

Hi Carolyn and Parag/Rick,

If you approve the patch, please let me know.
Thanks, Alex From: SUMAN PRAKASH B [mailto:suman.p at samsung.com] Sent: Tuesday, November 04, 2014 2:50 AM To: nvmewin at lists.openfabrics.org; Alex Chang; judy.brock at ssi.samsung.com Subject: Samsung patch for Hot plug fixes Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Everyone, We have a patch for the Hot plug fixes. Please find attached the source code. The password is samsung123 Please find the change description below - 1) Surprise removal while IOs are in progress. To reproduce this scenario - Connect the disk and execute IOmeter on the disk volume. When IOs are in progress, surprise remove the device. User expects that the device should be removed from device manager immediately and iometer should increase the error count field. This does not happen since we don't handle this scenario in OFA driver. Resolution - a. Added a new function IsdeviceRemoved(). This is a recursive function. Compares the values of Version Register values with old value and incase of mismatch complete the outstanding commands with SRB_STATUS_ERROR. (nvmeStd.c/IsDeviceRemoved) b. Start the Timer for IsDeviceRemoved() when the NextDriverState is set to StartComplete.(nvmeStat.c/NVMeRunning) c. Stop the timer for IsDeviceRemoved() incase of ScsiStopAdapter. (nvmeStat.c/NVMeAdapterControl) d. 
Restart the timer for IsDeviceRemoved() in case of ScsiRestartAdapter. (nvmeStat.c/NVMeAdapterControl)
e. Stop the timer for IsDeviceRemoved() in case of SRB_FUNCTION_SHUTDOWN. (nvmeStd.c/NVMeBuildIo)
f. If the DeviceRemovedDuringIO flag is set to TRUE, complete the SRBs with SRB_STATUS_ERROR for the IOs. This case handles IOs received after the device has been surprise removed. (nvmeStd.c/NVMeBuildIo)
g. Modified the prototype of the NVMeDetectPendingCmds function. When the device is surprise removed while IOs are pending, the outstanding IOs have to be completed with SRB_STATUS_ERROR. (nvmeIo.c/NVMeDetectPendingCmds)
h. Call the NVMeDetectPendingCmds function with SRB_STATUS_BUS_RESET. (nvmeInit.c/NVMeNormalShutdown, nvmePwrMgmt.c/NVMeAdapterControlPowerDown, nvmeStd.c/RecoveryDpcRoutine)

2) Memory leak issues.

To reproduce this scenario -
a. Memory leak observed during hot removal in Resource monitor -> Non-paged pool. (On Server2012R2 -> Task Manager -> Performance -> Non-paged pool)
b. Memory leak observed when disabling/enabling the NVMe controller in device manager.

Resolution -
To fix the memory leak, in NVMeBuildIo() -> SRB_FUNCTION_PNP, when PnPAction is StorRemoveDevice (disable controller) or StorSurpriseRemoval (hot remove device), NVMeAdapterControlPowerDown() is invoked to stop the adapter and then NVMeFreeBuffers is invoked to free the memory. At this point, since ShutdownInProgress is set in NVMeAdapterControlPowerDown(), nothing is done during NVMeAdapterControl() - ScsiStopAdapter -> NVMeAdapterControlPowerDown().

3) Surprise Removal during Disk Initialization

To reproduce this scenario -
Hot insert the device and hot remove the device immediately. At this point, our driver might be executing the initialization state machine in NVMePassiveInitialize. The device will not be immediately removed from the device manager. The while loop stays active until passiveTimeout expires, and then the system hits a BSOD.

Resolution -
a. Read the Version register. This is used to compare against the value in the Version register after a surprise removal. (nvmeStd.c/NVMeFindAdapter)
b. Read the Version register and compare it with the old Version register value (i.e. the value read in NVMeFindAdapter). A mismatch in these values means surprise removal. (nvmeStd.c/NVMePassiveInitialize)
c. Set NextDriverState to NVMeStartComplete and DeviceRemovedDuringIO to TRUE, and return TRUE from NVMePassiveInitialize.
d. The driver may get commands in NVMeBuildIo, where the driver returns SRB_STATUS_ERROR when DeviceRemovedDuringIO is set to TRUE.
e. Then NVMeAdapterControl() - ScsiStopAdapter is executed.

4) Delay in removing the device from device manager after hot removal of device.

When the device is hot removed, NVMeAdapterControlPowerDown() -> NVMeResetAdapter() -> NVMeWaitForCtrlRDY() is invoked, which sets the EN bit to 0 and waits for the RDY bit to become 0. Since the device is physically removed, the memory mapped registers will read as all 1's and the RDY bit will never become 0. Hence the while loop in NVMeWaitForCtrlRDY() is active for some time even after device removal, and the device is not removed from device manager immediately.

Resolution -
Check the value of CSTS. If it is 0xFFFFFFFF, then the device has been surprise removed and we return TRUE. (nvmeStd.c/NVMeWaitForCtrlRDY)

5) Avoid redundant call of NVMeResetAdapter()
a. File/Function: nvmeInit.c/NVMeEnableAdapter - Removed the NVMeResetAdapter() function call from NVMeEnableAdapter() as this is redundant. NVMeResetAdapter() is invoked in RecoveryDpcRoutine() and then invoked again in NVMeEnableAdapter().
b. In the NVMeInitialize() function the EN and RDY bits are set to 0 before NVMeEnableAdapter() is invoked, but NVMeResetAdapter() then does the same thing again.

6) When testing hot insertion with different devices, we observed that some devices returned NAMESPACE_NOT_READY for IO commands during learning cores and disk initialization (report luns, inquiry, etc). To address this issue and provide support for these devices in the driver, we have made the following changes.
a. During learning cores, the driver sends read commands on all the queues to get the core to MSI-x mapping. When the read commands are interrupted, in NVMeInitCallback(), if the SC and SCT values are not 0, then learning cores is not completed. This check is not required as the driver wants only the core to MSI-x mapping. Since this is not a fatal error, we can skip reading the SC and SCT values, as this will impact the performance. (nvmeInit.c/NVMeInitCallback)
b. Following the above, when the initialization state machine is complete and the kernel starts sending SCSI commands for disk initialization, and the device returns NAMESPACE_NOT_READY, this has to be translated to the corresponding SCSI sense data so that the commands will be retried after some time. (nvmeSnti.c/genericCommandStatusTable[])

Tested the following.
- WHCK on Win7 and 2012R2
- Install/Uninstall, Enable/Disable, FS Format
- Hibernation/Resume, Sleep/Resume
- IOmeter
- Hot removal while IOmeter is running.
- Hot removal immediately after hot insertion.
- Continuous hot insert and remove operations.
- Check for device removal after following sequence - Hot insert, system hibernation, Hot remove, system resume.
- Check for device presence after following sequence - System hibernation, hot insert, system resume.
- Memory leaks during hot plug operations and disable/enable.

Thanks,
Suman
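For readers following item 4 of the change description, the early-exit check in the ready-wait loop might look roughly like the sketch below. This is not the patch code: wait_for_rdy(), fake_csts_t and fake_read_csts() are hypothetical names, the register read is abstracted behind a callback so the logic can run outside a driver, and a three-way result is used here (an assumption of this sketch) instead of the TRUE/FALSE return discussed in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSTS_RDY_BIT 0x1u         /* CSTS.RDY is bit 0 */
#define REG_ALL_ONES 0xFFFFFFFFu  /* reads of a removed device come back as all 1's */

/* Hypothetical result type; the driver itself returns a BOOLEAN. */
typedef enum { WAIT_OK, WAIT_TIMEOUT, WAIT_DEVICE_REMOVED } wait_result_t;

/* Hypothetical accessor standing in for the driver's MMIO read of CSTS. */
typedef uint32_t (*read_csts_fn)(void *ctx);

/* Poll CSTS until RDY matches expected_rdy, the poll budget runs out,
 * or the register reads all 1's (device surprise removed). */
static wait_result_t wait_for_rdy(read_csts_fn read_csts, void *ctx,
                                  uint32_t expected_rdy, unsigned max_polls)
{
    assert(read_csts != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t csts = read_csts(ctx);
        if (csts == REG_ALL_ONES) {
            return WAIT_DEVICE_REMOVED;  /* stop waiting, RDY will never clear */
        }
        if ((csts & CSTS_RDY_BIT) == expected_rdy) {
            return WAIT_OK;
        }
        /* a real driver would stall briefly here between polls */
    }
    return WAIT_TIMEOUT;
}

/* Test double: replays a fixed list of CSTS values, repeating the last one.
 * count must be at least 1. */
typedef struct {
    const uint32_t *values;
    unsigned count;
    unsigned next;
} fake_csts_t;

static uint32_t fake_read_csts(void *ctx)
{
    fake_csts_t *f = ctx;
    unsigned i = (f->next < f->count) ? f->next++ : f->count - 1;
    return f->values[i];
}
```

Keeping "removed" distinct from "timed out" lets the caller decide how to report the condition without overloading a single boolean.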
From parag.sheth at seagate.com Thu Nov 13 17:31:33 2014
From: parag.sheth at seagate.com (Parag Sheth)
Date: Thu, 13 Nov 2014 17:31:33 -0800
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To:
References: <13088_1415098194_5458AF52_13088_13811_1_FA.8C.22636.84FA8545@epcpsbgx2.samsung.com>
Message-ID:

Hi Suman,

Here are my observations

1. In function NVMeWaitForCtrlRDY(), you return TRUE when the device is removed. As per your explanation, this is to avoid a delay in removing the device from device manager. But returning TRUE is not intuitive. This function actually failed and hence we should be returning FALSE. And if this breaks your flow, then the max delay would be 500 milliseconds. I would say that is negligible from a user display point of view.

2. The function brief needs to be updated for NVMeDetectPendingCmds() for the 3rd parameter.

Other than these 2, your changes look good.

Thanks
Parag Sheth

On Thu, Nov 13, 2014 at 3:43 PM, Alex Chang wrote:

> Hi Carolyn and Parag/Rick,
>
> If you approve the patch, please let me know.
>
> Thanks,
>
> Alex
From carolyn.d.foster at intel.com Fri Nov 14 09:12:51 2014
From: carolyn.d.foster at intel.com (Foster, Carolyn D)
Date: Fri, 14 Nov 2014 17:12:51 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To:
References: <13088_1415098194_5458AF52_13088_13811_1_FA.8C.22636.84FA8545@epcpsbgx2.samsung.com>
Message-ID:

Hi Alex, and Suman,

I don’t have any additional feedback for this patch from what Parag has already provided.

Carolyn
From parag.sheth at seagate.com Fri Nov 14 13:55:53 2014
From: parag.sheth at seagate.com (Parag Sheth)
Date: Fri, 14 Nov 2014 13:55:53 -0800
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To:
References:
Message-ID:

Hi Suman,

As long as this change passes all your test cases - I am ok with that.

Thanks
Parag Sheth

On Fri, Nov 14, 2014 at 12:40 AM, SUMAN PRAKASH B wrote:

> Hi Parag,
>
> Thanks for your feedback. Please find my comments inline.
>
> Thanks,
> Suman
>
> ------- Original Message -------
> Sender : Parag Sheth
> Date : Nov 14, 2014 06:31 (GMT+05:00)
> Title : Re: [nvmewin] Samsung patch for Hot plug fixes
>
> Hi Suman,
>
> Here are my observations
>
> 1. In function NVMeWaitForCtrlRDY(), you return TRUE when device is removed. And as per your explanation, this is to avoid delay in removing device from device manager. But returning TRUE is not intuitive. This function actually failed and hence we should be returning FALSE. And if this breaks your flow than the max delay would be 500 milliseconds. I would say that is negligible from user display point of view.
> [Suman] Agreed.
> We initially returned FALSE from NVMeWaitForCtrlRDY() when the device is removed. We changed it to TRUE for the following reason.
> When the device is removed, NVMeBuildIo() -> SRB_FUNCTION_PNP -> NVMeAdapterControlPowerDown() -> NVMeResetAdapter() -> NVMeWaitForCtrlRDY() was returning FALSE, because of which pAE->ShutdownInProgress was not set to TRUE in NVMeAdapterControlPowerDown().
> After this, when NVMeAdapterControl() -> ScsiStopAdapter -> NVMeAdapterControlPowerDown() is invoked, since pAE->ShutdownInProgress was not set to TRUE in NVMeBuildIo(), NVMeDetectPendingCmds() and NVMeResetAdapter() are executed again. They should not be executed a second time.
> To avoid this we returned TRUE from NVMeWaitForCtrlRDY() so that pAE->ShutdownInProgress is set to TRUE.
>
> To resolve this, in NVMeAdapterControlPowerDown(), we can move setting pAE->ShutdownInProgress inside the else part and return FALSE from NVMeWaitForCtrlRDY() when the device is removed, as follows.
> NVMeAdapterControlPowerDown()
> {
>     ...
>     if (pAE->ShutdownInProgress == TRUE) {
>         /* Shutdown */
>         status = TRUE;
>     } else {
>         pAE->ShutdownInProgress = TRUE;
>         /* Hibernate or Sleep - sanity check that there is no cmd pending */
>         if (NVMeDetectPendingCmds(pAE, FALSE, SRB_STATUS_BUS_RESET) == TRUE)
>             return status;
>         /* Stop the controller, but do not free the resources */
>         if (NVMeResetAdapter(pAE) != TRUE) {
>             return (FALSE);
>         }
>     }
>     ...
> }
> Kindly let us know your opinion.
>
> 2. Function brief needs to be updated for NVMeDetectPendingCmds() for the 3rd parameter.
> [Suman] Yes. We will change this.
>
> Other than these 2, your changes look good.
>
> Thanks
> Parag Sheth
>
> On Thu, Nov 13, 2014 at 3:43 PM, Alex Chang wrote:
>
>> Hi Carolyn and Parag/Rick,
>>
>> If you approve the patch, please let me know.
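The ordering change proposed in this exchange (set ShutdownInProgress before attempting the reset, so that a failed reset on a removed device still makes the second power-down call a no-op) can be illustrated with a small user-mode model. This is only a sketch under stated assumptions: mock_adapter_t, mock_reset_adapter() and mock_power_down() are hypothetical stand-ins rather than driver code, and the pending-command check is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the adapter extension; only what the sketch needs. */
typedef struct {
    bool shutdown_in_progress;  /* models pAE->ShutdownInProgress */
    bool device_present;        /* false once the device has been pulled */
    int  reset_attempts;        /* how many times the reset path actually ran */
} mock_adapter_t;

/* Models NVMeResetAdapter(): reports failure when the device is gone. */
static bool mock_reset_adapter(mock_adapter_t *ae)
{
    ae->reset_attempts++;
    return ae->device_present;
}

/* Models the proposed flow: the flag is set before the reset is attempted,
 * so it stays set even if the reset reports failure. */
static bool mock_power_down(mock_adapter_t *ae)
{
    assert(ae != NULL);
    if (ae->shutdown_in_progress) {
        return true;            /* already shut down once, nothing to repeat */
    }
    ae->shutdown_in_progress = true;
    if (!mock_reset_adapter(ae)) {
        return false;           /* reset failed, e.g. device removed */
    }
    return true;
}
```

In this model, calling mock_power_down() twice on an adapter whose device is absent runs the reset path once: the first call fails, and the second returns early because the flag is already set.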
From Alex.Chang at pmcs.com Fri Nov 14 14:34:59 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Fri, 14 Nov 2014 22:34:59 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To:
References: <13088_1415098194_5458AF52_13088_13811_1_FA.8C.22636.84FA8545@epcpsbgx2.samsung.com>
Message-ID:

Thanks, Carolyn. Have a great weekend!
Alex

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Friday, November 14, 2014 9:13 AM
To: Parag Sheth; Alex Chang
Cc: nvmewin at lists.openfabrics.org; suman.p at samsung.com
Subject: RE: [nvmewin] Samsung patch for Hot plug fixes

Hi Alex, and Suman,

I don’t have any additional feedback for this patch beyond what Parag has already provided.

Carolyn

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Parag Sheth
Sent: Thursday, November 13, 2014 6:32 PM
To: Alex Chang
Cc: nvmewin at lists.openfabrics.org; suman.p at samsung.com
Subject: Re: [nvmewin] Samsung patch for Hot plug fixes

Hi Suman,

Here are my observations

1. In function NVMeWaitForCtrlRDY(), you return TRUE when device is removed. And as per your explanation, this is to avoid delay in removing device from device manager. But returning TRUE is not intuitive. This function actually failed and hence we should be returning FALSE. And if this breaks your flow then the max delay would be 500 milliseconds. I would say that is negligible from user display point of view.

2. Function brief needs to be updated for NVMeDetectPendingCmds() for the 3rd parameter.

Other than these 2, your changes look good.

Thanks
Parag Sheth

On Thu, Nov 13, 2014 at 3:43 PM, Alex Chang wrote:

Hi Carolyn and Parag/Rick,

If you approve the patch, please let me know.

Thanks,
Alex

From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Tuesday, November 04, 2014 2:50 AM
To: nvmewin at lists.openfabrics.org; Alex Chang; judy.brock at ssi.samsung.com
Subject: Samsung patch for Hot plug fixes

Hi Everyone,

We have a patch for the Hot plug fixes. Please find attached the source code. The password is samsung123

Please find the change description below -

1) Surprise removal while IOs are in progress.

To reproduce this scenario -
Connect the disk and execute IOmeter on the disk volume. When IOs are in progress, surprise remove the device. The user expects that the device should be removed from device manager immediately and IOmeter should increase the error count field. This does not happen since we don't handle this scenario in the OFA driver.

Resolution -
a. Added a new function IsDeviceRemoved(). This is a recursive function. It compares the Version register value with the old value and, in case of a mismatch, completes the outstanding commands with SRB_STATUS_ERROR. (nvmeStd.c/IsDeviceRemoved)
b. Start the timer for IsDeviceRemoved() when the NextDriverState is set to StartComplete. (nvmeStat.c/NVMeRunning)
c. Stop the timer for IsDeviceRemoved() in case of ScsiStopAdapter. (nvmeStat.c/NVMeAdapterControl)
d. Restart the timer for IsDeviceRemoved() in case of ScsiRestartAdapter. (nvmeStat.c/NVMeAdapterControl)
e. Stop the timer for IsDeviceRemoved() in case of SRB_FUNCTION_SHUTDOWN. (nvmeStd.c/NVMeBuildIo)
f. If the DeviceRemovedDuringIO flag is set to TRUE, complete the SRBs with SRB_STATUS_ERROR for the IOs. This case handles the IOs received once the device has been surprise removed. (nvmeStd.c/NVMeBuildIo)
g. Modified the prototype of the NVMeDetectPendingCmds function. When the device is surprise removed while IOs are pending, the outstanding IOs have to be completed with SRB_STATUS_ERROR. (nvmeIo.c/NVMeDetectPendingCmds)
h. Call the NVMeDetectPendingCmds function with SRB_STATUS_BUS_RESET. (nvmeInit.c/NVMeNormalShutdown, nvmePwrMgmt.c/NVMeAdapterControlPowerDown, nvmeStd.c/RecoveryDpcRoutine)

2) Memory leak issues.

To reproduce this scenario -
a. Memory leak observed during hot removal in Resource monitor->Non-paged pool. (On Server2012R2 -> Task Manager -> Performance -> Non-paged pool)
b. Memory leak observed when disabling/enabling the NVMe controller in device manager.

Resolution -
To fix the memory leak, in NVMeBuildIo()->SRB_FUNCTION_PNP, when PnPAction is StorRemoveDevice (disable controller) or StorSurpriseRemoval (hot remove device), NVMeAdapterControlPowerDown() is invoked to stop the adapter and then NVMeFreeBuffers is invoked to free the memory. At this point, since ShutdownInProgress is set in NVMeAdapterControlPowerDown(), nothing is done during NVMeAdapterControl() - ScsiStopAdapter -> NVMeAdapterControlPowerDown().

3) Surprise Removal during Disk Initialization

To reproduce this scenario -
Hot insert the device and hot remove the device immediately. At this point, our driver might be executing the initialization state machine in NVMePassiveInitialize. The device will not be immediately removed from the device manager. The while loop will be active till passiveTimeout happens, and then the system BSODs.

Resolution -
a. Read the Version register. This value is compared against the value in the Version register after a surprise removal. (nvmeStd.c/NVMeFindAdapter)
b. Read the Version register and compare it with the old Version register value (i.e. the value read in NVMeFindAdapter). A mismatch in these values means surprise removal. (nvmeStd.c/NVMePassiveInitialize)
c. Set NextDriverState to NVMeStartComplete and DeviceRemovedDuringIO to TRUE, and return TRUE from NVMePassiveInitialize.
d. Driver may get commands in NVMeBuildIo, where the driver returns SRB_STATUS_ERROR when DeviceRemovedDuringIO is set to TRUE.
e. Then NVMeAdapterControl() - ScsiStopAdapter is executed.

4) Delay in removing the device from device manager after hot removal of device.

When the device is hot removed, NVMeAdapterControlPowerDown() -> NVMeResetAdapter() -> NVMeWaitForCtrlRDY() is invoked, which sets the EN bit to 0 and waits for the RDY bit to become 0. Since the device is physically removed, the memory mapped registers will become all 1's and the RDY bit will never become 0. Hence the while loop in NVMeWaitForCtrlRDY() stays active for some time even after device removal, and the device is not removed from device manager immediately.

Resolution -
Check the value of CSTS. If it is 0xFFFFFFFF, then the device has been surprise removed; return TRUE. (nvmeStd.c/NVMeWaitForCtrlRDY)

5) Avoid redundant call of NVMeResetAdapter()

a. File/Function: nvmeInit.c/NVMeEnableAdapter - Removed the NVMeResetAdapter() function call from NVMeEnableAdapter() as it is redundant. NVMeResetAdapter() is invoked in RecoveryDpcRoutine() and was then invoked again in NVMeEnableAdapter().
b. In the NVMeInitialize() function, the EN and RDY bits are set to 0 before NVMeEnableAdapter() is invoked, but NVMeResetAdapter() performs the same operation again.

6) When testing hot insertion with different devices, we observed that some devices returned NAMESPACE_NOT_READY for IO commands during learning cores and disk initialization (report luns, inquiry, etc). To address this issue and provide support for these devices in the driver, we have made the following changes.

a. During learning cores, the driver sends read commands on all the queues to get the core to MSI-x mapping. When the read commands are interrupted, in NVMeInitCallback(), if the SC and SCT values are not 0, then learning cores is not completed. This check is not required as the driver wants only the core to MSI-x mapping. Since this is not a fatal error, we can skip reading the SC and SCT values, as this will impact the performance. (nvmeInit.c/NVMeInitCallback)
b. Following the above, when the initialization state machine is complete and the kernel starts sending SCSI commands for disk initialization, and the device returns NAMESPACE_NOT_READY, this has to be translated to the corresponding SCSI sense data so that the commands will be retried after some time. (nvmeSnti.c/genericCommandStatusTable[])

Tested the following.

- WHCK on Win7 and 2012R2
- Install/Uninstall, Enable/Disable, FS Format
- Hibernation/Resume, Sleep/Resume
- IOmeter
- Hot removal while IOmeter is running.
- Hot removal immediately after hot insertion.
- Continuous hot insert and remove operations.
- Check for device removal after the following sequence - Hot insert, system hibernation, hot remove, system resume.
- Check for device presence after the following sequence - System hibernation, hot insert, system resume.
- Memory leaks during hot plug operations and disable/enable.

Thanks,
Suman

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 13168 bytes
Desc: image001.gif
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 823 bytes
Desc: image002.jpg
URL:

From Alex.Chang at pmcs.com Fri Nov 14 14:36:14 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Fri, 14 Nov 2014 22:36:14 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To:
References:
Message-ID:

Hi Suman,

Could you please revise the code, test it, and send it out at your earliest convenience?

Thank you!
Alex

From: Parag Sheth [mailto:parag.sheth at seagate.com]
Sent: Friday, November 14, 2014 1:56 PM
To: suman.p at samsung.com
Cc: Alex Chang; nvmewin at lists.openfabrics.org; judy.brock at ssi.samsung.com
Subject: Re: Re: [nvmewin] Samsung patch for Hot plug fixes

Hi Suman,

As long as this change passes all your test cases - I am ok with that.

Thanks
Parag Sheth

On Fri, Nov 14, 2014 at 12:40 AM, SUMAN PRAKASH B wrote:

Hi Parag,

Thanks for your feedback. Please find my comments inline.

Thanks,
Suman

------- Original Message -------
Sender : Parag Sheth
Date : Nov 14, 2014 06:31 (GMT+05:00)
Title : Re: [nvmewin] Samsung patch for Hot plug fixes

Hi Suman,

Here are my observations

1. In function NVMeWaitForCtrlRDY(), you return TRUE when device is removed. And as per your explanation, this is to avoid delay in removing device from device manager. But returning TRUE is not intuitive. This function actually failed and hence we should be returning FALSE. And if this breaks your flow then the max delay would be 500 milliseconds. I would say that is negligible from user display point of view.

[Suman] Agreed. We initially returned FALSE from NVMeWaitForCtrlRDY() when the device is removed. We changed it to TRUE for the following reason.

When the device is removed, NVMeBuildIo() -> SRB_FUNCTION_PNP -> NVMeAdapterControlPowerDown() -> NVMeResetAdapter() -> NVMeWaitForCtrlRDY() was returning FALSE, because of which pAE->ShutdownInProgress was not set to TRUE in NVMeAdapterControlPowerDown().
After this, when NVMeAdapterControl() -> ScsiStopAdapter -> NVMeAdapterControlPowerDown() is invoked, since pAE->ShutdownInProgress was not set to TRUE in the NVMeBuildIo() path, NVMeDetectPendingCmds() and NVMeResetAdapter() are executed again. They should not be executed a second time.
To avoid this, we returned TRUE from NVMeWaitForCtrlRDY() so that pAE->ShutdownInProgress is set to TRUE.

To resolve this, in NVMeAdapterControlPowerDown(), we can move setting pAE->ShutdownInProgress inside the else part and return FALSE from NVMeWaitForCtrlRDY() when the device is removed, as follows.

    NVMeAdapterControlPowerDown()
    {
        ...
        if (pAE->ShutdownInProgress == TRUE) {
            /* Shutdown */
            status = TRUE;
        } else {
            pAE->ShutdownInProgress = TRUE;

            /* Hibernate or Sleep - sanity check that there is no cmd pending */
            if (NVMeDetectPendingCmds(pAE, FALSE, SRB_STATUS_BUS_RESET) == TRUE)
                return status;

            /* Stop the controller, but do not free the resources */
            if (NVMeResetAdapter(pAE) != TRUE) {
                return (FALSE);
            }
        }
        ...
    }

Kindly let us know your opinion.

2. Function brief needs to be updated for NVMeDetectPendingCmds() for the 3rd parameter.

[Suman] Yes. We will change this.

Other than these 2, your changes look good.

Thanks
Parag Sheth

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ~WRD400.jpg
Type: image/jpeg
Size: 823 bytes
Desc: ~WRD400.jpg
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 13168 bytes
Desc: image001.gif
URL:

From suman.p at samsung.com Mon Nov 17 01:17:22 2014
From: suman.p at samsung.com (SUMAN PRAKASH B)
Date: Mon, 17 Nov 2014 09:17:22 +0000 (GMT)
Subject: [nvmewin] Samsung patch for Hot plug fixes
Message-ID: <5F.98.14702.22DB9645@epcpsbgx3.samsung.com>

An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 201411171448343_T9SZN3WZ.gif
Type: image/gif
Size: 13168 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Samung_Patch3_v2_11172014.zip
Type: application/octet-stream
Size: 188273 bytes
Desc: not available
URL:

From Alex.Chang at pmcs.com Mon Nov 17 08:21:14 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Mon, 17 Nov 2014 16:21:14 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To: <5942_1416215872_5469BD3F_5942_3698_1_AF.98.14702.22DB9645@epcpsbgx3.samsung.com>
References: <5942_1416215872_5469BD3F_5942_3698_1_AF.98.14702.22DB9645@epcpsbgx3.samsung.com>
Message-ID:

Dear all,

Please review/test the revised patch and provide your feedback. I will start to collect approvals on Thursday.

Thanks,
Alex

From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Monday, November 17, 2014 1:17 AM
To: nvmewin at lists.openfabrics.org; Alex Chang; parag.sheth at seagate.com; judy.brock at ssi.samsung.com
Subject: RE: Re: [nvmewin] Samsung patch for Hot plug fixes

Hi Alex,

Please find attached the revised code with the following review comments incorporated. Password is samsung123.

1. In function NVMeWaitForCtrlRDY(), you return TRUE when device is removed. And as per your explanation, this is to avoid delay in removing device from device manager. But returning TRUE is not intuitive. This function actually failed and hence we should be returning FALSE. And if this breaks your flow then the max delay would be 500 milliseconds. I would say that is negligible from user display point of view.

a. Moved the setting of pAE->ShutdownInProgress inside the else part in NVMeAdapterControlPowerDown().
b. Returned FALSE from NVMeWaitForCtrlRDY() when the device is removed.

2. Function brief needs to be updated for NVMeDetectPendingCmds() for the 3rd parameter.

Function brief updated for NVMeDetectPendingCmds() for the 3rd parameter.

Thanks,
Suman

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ~WRD359.jpg
Type: image/jpeg
Size: 823 bytes
Desc: ~WRD359.jpg
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 13168 bytes
Desc: image001.gif
URL:

From parag.sheth at seagate.com Mon Nov 17 11:44:47 2014
From: parag.sheth at seagate.com (Parag Sheth)
Date: Mon, 17 Nov 2014 11:44:47 -0800
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To:
References: <5942_1416215872_5469BD3F_5942_3698_1_AF.98.14702.22DB9645@epcpsbgx3.samsung.com>
Message-ID:

Hi Alex,

These changes look good. We approve the patch.

Thanks
Parag Sheth

On Mon, Nov 17, 2014 at 8:21 AM, Alex Chang wrote:

> Dear all,
>
> Please review/test the revised patch and provide your feedback. I will
> start to collect approvals on Thursday.
>
> Thanks,
> Alex
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 13168 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ~WRD359.jpg
Type: image/jpeg
Size: 823 bytes
Desc: not available
URL:

From carolyn.d.foster at intel.com Mon Nov 17 12:00:32 2014
From: carolyn.d.foster at intel.com (Foster, Carolyn D)
Date: Mon, 17 Nov 2014 20:00:32 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To:
References: <5942_1416215872_5469BD3F_5942_3698_1_AF.98.14702.22DB9645@epcpsbgx3.samsung.com>
Message-ID:

Hi Alex, we approve the patch as well.

Thanks,
Carolyn

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Parag Sheth
Sent: Monday, November 17, 2014 12:45 PM
To: Alex Chang
Cc: nvmewin at lists.openfabrics.org; suman.p at samsung.com
Subject: Re: [nvmewin] Samsung patch for Hot plug fixes

Hi Alex,

These changes look good. we approve the patch.

Thanks
Parag Sheth

On Mon, Nov 17, 2014 at 8:21 AM, Alex Chang wrote:

Dear all,

Please review/test the revised patch and provide your feedback. I will start to collect approvals on Thursday.

Thanks,
Alex

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 13168 bytes
Desc: image001.gif
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 823 bytes
Desc: image002.jpg
URL:

From Alex.Chang at pmcs.com Mon Nov 17 12:03:29 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Mon, 17 Nov 2014 20:03:29 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To:
References: <5942_1416215872_5469BD3F_5942_3698_1_AF.98.14702.22DB9645@epcpsbgx3.samsung.com>
Message-ID:

Great ! Thank you!

Alex

From: Parag Sheth [mailto:parag.sheth at seagate.com]
Sent: Monday, November 17, 2014 11:45 AM
To: Alex Chang
Cc: suman.p at samsung.com; nvmewin at lists.openfabrics.org; judy.brock at ssi.samsung.com
Subject: Re: Re: [nvmewin] Samsung patch for Hot plug fixes

Hi Alex,

These changes look good. we approve the patch.

Thanks
Parag Sheth

On Mon, Nov 17, 2014 at 8:21 AM, Alex Chang wrote:

Dear all,

Please review/test the revised patch and provide your feedback. I will start to collect approvals on Thursday.
Thanks,
Alex

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 13168 bytes
Desc: image001.gif
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 823 bytes
Desc: image002.jpg
URL:

From Alex.Chang at pmcs.com Mon Nov 17 12:03:52 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Mon, 17 Nov 2014 20:03:52 +0000
Subject: [nvmewin] Samsung patch for Hot plug fixes
In-Reply-To:
References: <5942_1416215872_5469BD3F_5942_3698_1_AF.98.14702.22DB9645@epcpsbgx3.samsung.com>
Message-ID:

Great !
Thank you, Carolyn.

Alex

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Monday, November 17, 2014 12:01 PM
To: Parag Sheth; Alex Chang
Cc: nvmewin at lists.openfabrics.org; suman.p at samsung.com
Subject: RE: [nvmewin] Samsung patch for Hot plug fixes

Hi Alex, we approve the patch as well.

Thanks,
Carolyn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 13168 bytes
Desc: image001.gif
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 823 bytes
Desc: image002.jpg
URL:

From Alex.Chang at pmcs.com Mon Nov 17 15:47:43 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Mon, 17 Nov 2014 23:47:43 +0000
Subject: [nvmewin] NVMe Windows DB Is LOCKED - Pushing Samsung Patch For Hot Plug Fixes
Message-ID:

Locking NVMe Windows DB.

Thanks,
Alex

nvmewin mailing list
nvmewin at lists.openfabrics.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Alex.Chang at pmcs.com Mon Nov 17 16:09:02 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Tue, 18 Nov 2014 00:09:02 +0000
Subject: [nvmewin] NVMe Windows DB Is UNLOCKED - Pushing Patch From Samsung For CPU Hot Plug Fixes
Message-ID:

Dear all,

Thank you for reviewing/testing the patch from Samsung. Many thanks to Suman and Judy for contributing the patch. The patch has been pushed into the source base and a new tag called "Patch#31_CPU_Hot_Plug_Fixes" has been created under the "tags" directory.

PMC is scheduled to submit the next patch, for Format NVM/WHQL fixes, after re-basing on the most current sources in the repository. I will send it out for review/test by the end of tomorrow.

Thanks,
Alex

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Alex.Chang at pmcs.com Tue Nov 18 15:30:52 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Tue, 18 Nov 2014 23:30:52 +0000
Subject: [nvmewin] New Patch From PMC Ready For Review/Test
Message-ID:

Dear all,

Please find the attached patch from PMC. Password is pmc123.

The following tests have been completed on Windows 7, 8, 8.1, Windows Server 2012, 2012 R2.
- Disk Formats (Quick and Full formats)
- IOMeter read/write accesses
- SDStress
- NVMe SCSI Compliance
- Install/Uninstall driver
- Disable/Enable driver
- All WHQL tests passed with HCK 8.100.26795 on Windows Server 2012 R2, except the following tests exempted via Errata ID 4693:
  1. NVMe Device Capabilities
  2. NVMe Queue Pause-Resume
  3. NVMe Queue Utilization

The changes in this patch fall into two parts: WHQL related changes and Format NVM changes.

< WHQL related changes >
1. The "Static Tools Logo test" on server systems requires Driver Verification Logs. The most current WDK bundled with VS2013 has a known issue: a conflict in the function prototype definition for StorPortReadRegisterUlong64. Because of the conflict, Code Analysis fails and cannot generate the Code Analysis log. Therefore, we have to fall back to using StorPortReadRegisterUlong for the time being in nvmeInit.c and nvmestd.c.
2. Added function type declarations in nvmeStd.h in order to pass Static Driver Verifier and generate the SDV log.
3. In order to pass the Inquiry command, Device Identification VPD Page (part of the NVMe SCSI Compliance test), the driver needs to report a SCSI Name String Designation Descriptor (Type 8). For the time being, I added reporting the string as "SCSINVMe". You may change it to whatever string you wish. Judy from Samsung is working on the definition of the string as well.

< Format NVM Changes >
1. Added storportdebugprints in nvmesnti.c.
2. In nvmestd.c and nvmestd.h:
- Added NVMeIsReadWritCmd
- In NVMeBuildIo, block Read/Write commands while Format NVM is in progress.
- Added one parameter to NVMeIsNamespaceVisible to allow specifying the target namespaceID.
- Added NVMeFormatNVMHotRemoveNamespace and NVMeFormatNVMHotAddNamespace
- Changed the state machine transitions/processes in NVMeIoctlFormatNVMCallback

Please review/test the patch and provide your feedback at your earliest convenience. Thank you very much.
Regards,
Alex

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PMC_FormatNVM_WHQL_Patch_V1_11182014.zip
Type: application/x-zip-compressed
Size: 190825 bytes
Desc: PMC_FormatNVM_WHQL_Patch_V1_11182014.zip
URL:

From judy.brock at ssi.samsung.com Thu Nov 20 02:35:59 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Thu, 20 Nov 2014 10:35:59 +0000
Subject: [nvmewin] New Patch From PMC Ready For Review/Test
In-Reply-To:
References:
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A516B66A6AB@SSIEXCH-MB3.ssi.samsung.com>

Hi Alex,

>> In order to pass Inquiry command, Device Identification VPD Page (part of NVMe SCSI Compliance test), the driver needs to report SCSI Name String Designation Descriptor (Type 8). For the time being, I added reporting the string as "SCSINVMe". You may change it to whatever strings as you wish. Judy from Samsung is working on definition of the string as well.

As you know, there is a v1.4 draft proposal of the NVMe-to-SCSI translation reference under development in NVMe committee right now. We hope to have a vote for 30 day approval within the next couple of weeks. Below is the Samsung-proposed section on SCSI Name String Designator Descriptor which has been through several rounds of review - maybe the patch should sync up with this definition in anticipation of ratification over the next couple of months:

1.1.1.1 SCSI Name String Designator

Table 6-8: SCSI name string Designation Descriptor

SCSI Name String Descriptor Field    Notes and References
---------------------------------    --------------------
PROTOCOL IDENTIFIER                  Shall be set to 0h. PIV field shall indicate this field is reserved as no specific protocol to be identified.
CODE SET                             Shall be set to 3h indicating associated fields are in UTF-8 format.
PIV                                  Shall be set to 0b indicating PROTOCOL IDENTIFIER field is reserved.
ASSOCIATION                          Shall be set to 00b indicating DESIGNATOR field is associated with logical unit.
DESIGNATOR TYPE                      Shall be set to 8h indicating SCSI name string format and assignment authority.
DESIGNATOR LENGTH                    Shall be set to size of SCSI NAME STRING field.
SCSI NAME STRING                     See section 6.1.4.4.1.

1.1.1.1.1 SCSI NAME STRING field

For NVMe devices compliant with revision 1.1 or later:

If the 64 bit EUI64 field in the NVMe Identify Namespace data structure is used to specify a globally unique namespace identifier when the namespace is created:

Shall be set to a 20 byte UTF-8 character field comprised of the four UTF-8 characters 'eui.' concatenated with UTF-8 representation of the 16 hexadecimal digits corresponding to the 64 bit EUI64 field of the Identify Namespace Data Structure. The first hexadecimal digit shall be the most significant four bits of the first byte (i.e., most significant byte) of the EUI-64 field.

If the 128 bit NGUID field in the Identify Namespace data structure is used to specify a globally unique namespace identifier when the namespace is created:

Shall be set to a 36 byte UTF-8 character field comprised of the four UTF-8 characters 'eui.' concatenated with UTF-8 representation of the 32 hexadecimal digits corresponding to the 128 bit NGUID field of the Identify Namespace Data Structure. The first hexadecimal digit shall be the most significant four bits of the first byte (i.e., most significant byte) of the NGUID field.

For NVMe devices compliant with revision 1.0:

Shall be set to a 68 byte UTF-8 character field comprised of 4 bytes of UTF-8 representation of 2 byte PCI Vendor ID, plus 40 bytes of Model Number, plus 4 bytes of UTF-8 representation of Namespace ID, plus 20 bytes of Serial Number.
Note: the start of the string (MSB) is at the lowest byte offset and the end of the string (LSB) is at the highest byte offset:

· Bytes 67:48: 20 bytes of Serial Number (bytes 23:04 of Identify Controller data structure)
· Bytes 47:44: 4 bytes of Namespace ID (UTF-8 representation)
· Bytes 43:04: 40 bytes of Model Number (bytes 63:24 of Identify Controller data structure)
· Bytes 03:00: 4 bytes of PCI Vendor ID (UTF-8 representation) (bytes 01:00 of Identify Controller converted to 4 UTF-8 characters)

Thanks,
Judy

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From suman.p at samsung.com Thu Nov 20 04:38:10 2014
From: suman.p at samsung.com (SUMAN PRAKASH B)
Date: Thu, 20 Nov 2014 12:38:10 +0000 (GMT)
Subject: [nvmewin] New Patch From PMC Ready For Review/Test
Message-ID:

An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 201411201808957_OC322OJW.gif
Type: image/gif
Size: 13168 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SntiTranslateDeviceIdentificationPage.txt
Type: application/octet-stream
Size: 5856 bytes
Desc: not available
URL:

From Alex.Chang at pmcs.com Thu Nov 20 08:09:52 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Thu, 20 Nov 2014 16:09:52 +0000
Subject: [nvmewin] New Patch From PMC Ready For Review/Test
In-Reply-To: <9F.9D.14702.1B0ED645@epcpsbgx3.samsung.com>
References: <9F.9D.14702.1B0ED645@epcpsbgx3.samsung.com>
Message-ID:

Thank you, Suman and Judy, for your prompt response. I will take a look at it and get back to you.

Alex

From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Thursday, November 20, 2014 4:38 AM
To: Alex Chang; nvmewin at lists.openfabrics.org
Cc: judy.brock at ssi.samsung.com
Subject: Re: Re: New Patch From PMC Ready For Review/Test

Hi Alex,

As Judy has mentioned, the patch is not in sync with the v1.4 draft proposal. Also, the patch has 2 function implementations for SntiTranslateDeviceIdentificationPage() protected with #if, which might not be required.

Please find attached the implementation for SntiTranslateDeviceIdentificationPage() which complies with v1.4 draft, except for the NVMe v1.2 spec NGUID support. Kindly check if this code can be re-used.

Thanks,
Suman

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ~WRD337.jpg
Type: image/jpeg
Size: 823 bytes
Desc: ~WRD337.jpg
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 13168 bytes
Desc: image001.gif
URL: 

From judy.brock at ssi.samsung.com Thu Nov 20 08:20:36 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Thu, 20 Nov 2014 16:20:36 +0000
Subject: [nvmewin] New Patch From PMC Ready For Review/Test
In-Reply-To: 
References: 
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A516B66A83D@SSIEXCH-MB3.ssi.samsung.com>

Just FYI, the NVMe technical workgroup voted this morning to approve the v1.4 draft proposal to start a 30-day review period. If it passes without comment, the next step would be ratification.

Thanks,
Judy

From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Thursday, November 20, 2014 4:38 AM
To: alex.chang at pmcs.com; nvmewin at lists.openfabrics.org
Cc: Judy Brock-SSI
Subject: Re: Re: New Patch From PMC Ready For Review/Test

Hi Alex,

As Judy has mentioned, the patch is not in sync with the v1.4 draft proposal. Also, the patch has 2 function implementations for SntiTranslateDeviceIdentificationPage() protected with #if, which might not be required.

Please find attached the implementation for SntiTranslateDeviceIdentificationPage() which complies with v1.4 draft, except for the NVMe v1.2 spec NGUID support. Kindly check if this code can be re-used.

Thanks,
Suman
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.gif
Type: image/gif
Size: 13168 bytes
Desc: image001.gif
URL: 

From Alex.Chang at pmcs.com Mon Nov 24 14:42:28 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Mon, 24 Nov 2014 22:42:28 +0000
Subject: [nvmewin] Revised PMC Patch
Message-ID: 

Dear all,

It's been a week since I sent out the first version of the PMC patch. Thanks to Suman and Judy for the feedback. Here comes the revised patch (password:pmc123) and the changes include:
1. New members added in Identify Namespace structure (nvme.h)
2. Changed _NVMe_VERSION from struct to union (nvmeReg.h)
3. New translation for Device Identification Page for Inquiry command, which will be approved soon in SCSI-to-NVMe Translation Specification V1.4 (nvmeSntiTypes.h and nvmeSnti.c)
4. Added including stdlib.h in order to compile the changes in nvmeSnti.c (precomp.h)

Please review/test it at your earliest convenience and provide your feedback. I will start collecting approvals early December.

Thank you very much and Happy Thanksgiving!
Alex
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PMC_FormatNVM_WHQL_Patch_V2_11242014.zip
Type: application/x-zip-compressed
Size: 192367 bytes
Desc: PMC_FormatNVM_WHQL_Patch_V2_11242014.zip
URL: 

From thomas.freeman at hgst.com Tue Nov 25 12:21:51 2014
From: thomas.freeman at hgst.com (Thomas Freeman)
Date: Tue, 25 Nov 2014 20:21:51 +0000
Subject: [nvmewin] FW: Revised PMC Patch
In-Reply-To: 
References: 
Message-ID: 

Alex,

I have a question about FORMAT NVM. When the drive is first powered on, the driver looks to see if the namespace is using metadata. If the namespace has metadata, it is not reported to Windows.

    metadataSize = pLunExt->identifyData.LBAFx[flbas].MS;

But, if the user were to format a namespace with an LBAF that supports metadata, the driver would call NVMeFormatNVMHotAddNamespace, making that namespace available to Windows.
This issue existed before your fix, but it seems like now might be a good time to fix it.

Tom Freeman
Software Engineer, Device Manager and Driver Development
HGST, a Western Digital company
Thomas.Freeman at hgst.com
507-322-2311

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Monday, November 24, 2014 4:42 PM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] Revised PMC Patch

Dear all,

It's been a week since I sent out the first version of the PMC patch. Thanks to Suman and Judy for the feedback. Here comes the revised patch (password:pmc123) and the changes include:
1. New members added in Identify Namespace structure (nvme.h)
2. Changed _NVMe_VERSION from struct to union (nvmeReg.h)
3. New translation for Device Identification Page for Inquiry command, which will be approved soon in SCSI-to-NVMe Translation Specification V1.4 (nvmeSntiTypes.h and nvmeSnti.c)
4. Added including stdlib.h in order to compile the changes in nvmeSnti.c (precomp.h)

Please review/test it at your earliest convenience and provide your feedback. I will start collecting approvals early December.

Thank you very much and Happy Thanksgiving!
Alex
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PMC_FormatNVM_WHQL_Patch_V2_11242014.zip
Type: application/x-zip-compressed
Size: 192367 bytes
Desc: PMC_FormatNVM_WHQL_Patch_V2_11242014.zip
URL: 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ATT00001.txt
URL: 

From Alex.Chang at pmcs.com Wed Nov 26 15:01:05 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Wed, 26 Nov 2014 23:01:05 +0000
Subject: [nvmewin] Revised PMC Patch
In-Reply-To: <13178_1416868956_5473B45C_13178_10826_1_E1729D5DBAB9E948BA87B76FDFA1298A398F69F5@BBYEXM01.pmc-sierra.internal>
References: <13178_1416868956_5473B45C_13178_10826_1_E1729D5DBAB9E948BA87B76FDFA1298A398F69F5@BBYEXM01.pmc-sierra.internal>
Message-ID: 

Hi all,

Per the feedback from Thomas Freeman at HGST: since the driver doesn't currently support metadata processing, and to avoid potential buffer corruption, we will not report a namespace that has been formatted with metadata via the Format NVM command. I made the related change in SntiTranslateReportLUNs of nvmeSnti.c and finished all required tests. Please find the revised patch attached; the password is pmc123.

Thank you very much, Tom.

Regards,
Alex

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Monday, November 24, 2014 2:42 PM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] Revised PMC Patch

Dear all,

It's been a week since I sent out the first version of the PMC patch. Thanks to Suman and Judy for the feedback.
Here comes the revised patch (password:pmc123) and the changes include:
1. New members added in Identify Namespace structure (nvme.h)
2. Changed _NVMe_VERSION from struct to union (nvmeReg.h)
3. New translation for Device Identification Page for Inquiry command, which will be approved soon in SCSI-to-NVMe Translation Specification V1.4 (nvmeSntiTypes.h and nvmeSnti.c)
4. Added including stdlib.h in order to compile the changes in nvmeSnti.c (precomp.h)

Please review/test it at your earliest convenience and provide your feedback. I will start collecting approvals early December.

Thank you very much and Happy Thanksgiving!
Alex
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PMC_FormatNVM_WHQL_Patch_V3_11262014.zip
Type: application/x-zip-compressed
Size: 192437 bytes
Desc: PMC_FormatNVM_WHQL_Patch_V3_11262014.zip
URL: 
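
Editorial sketch of the metadata check discussed in the messages above (Tom's metadataSize lookup and Alex's change to stop reporting metadata-formatted namespaces). This is not the code from the patch, which is only available as an attachment. The types and the function names NamespaceUsesMetadata and CollectReportableNamespaces are hypothetical stand-ins; only the LBAFx[flbas].MS lookup comes from the thread. It assumes, per the NVMe Identify Namespace layout, that bits 3:0 of FLBAS select the active LBA format and that a nonzero MS means the format carries metadata.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one LBA format entry. */
typedef struct {
    uint16_t MS;    /* metadata bytes per LBA; 0 means no metadata */
    uint8_t  LBADS; /* LBA data size, expressed as a power of two */
} LbaFormat;

/* Hypothetical, simplified stand-in for the Identify Namespace data. */
typedef struct {
    uint8_t   FLBAS;     /* bits 3:0 select the active entry in LBAFx */
    LbaFormat LBAFx[16];
} IdentifyNamespace;

/* True when the currently selected LBA format carries metadata. */
static bool NamespaceUsesMetadata(const IdentifyNamespace *id)
{
    uint8_t flbas = id->FLBAS & 0x0F; /* drop bit 4 (extended LBA flag) */
    return id->LBAFx[flbas].MS != 0;
}

/*
 * Copy into lunsOut the indices of the namespaces that are safe to
 * report, skipping any namespace whose active format uses metadata.
 * lunsOut must have room for count entries. Returns the number written.
 */
static size_t CollectReportableNamespaces(const IdentifyNamespace *ns,
                                          size_t count,
                                          uint32_t *lunsOut)
{
    size_t reported = 0;
    for (size_t i = 0; i < count; i++) {
        if (!NamespaceUsesMetadata(&ns[i])) {
            lunsOut[reported++] = (uint32_t)i;
        }
    }
    return reported;
}
```

Running the same check both at initial enumeration and after a Format NVM completes would keep the two paths consistent, which is the gap Tom pointed out: a namespace reformatted to an LBA format with metadata would otherwise be hot-added and exposed to Windows.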