From Dharani.Kotte at sandisk.com  Wed Mar  5 09:02:47 2014
From: Dharani.Kotte at sandisk.com (Dharani Kotte)
Date: Wed, 5 Mar 2014 17:02:47 +0000
Subject: [nvmewin] ***UNCHECKED*** [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes
In-Reply-To: <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com>
References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com>
  <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com>
  <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com>
  <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com>
  <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com>
Message-ID: <23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com>

Hi Alex,

The attached patch incorporates the requested changes and the BSOD fix. The BSOD fix is in nvmeIo.c, line 640.

Password: sndk1234

Thanks,
Dharani.

From: Dharani Kotte
Sent: Tuesday, February 25, 2014 3:25 PM
To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,

Sure, I will go over the list below and make the changes next week.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 25, 2014 2:09 PM
To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

I left out one more item: at line 1228 of nvmeInit.c, NVMeWaitOnRdy is now called directly instead of setting NextDriverState to NVMeWaitOnRdy. I think the assignment is still required before the initialization state machine starts.
Alex

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Tuesday, February 25, 2014 2:03 PM
To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn and Dharani,

I basically agree with the suggested changes in HwStorResetBus. However, I have some feedback:

1. At line 2161, NVMeResetAdapter is called, and it returns only after making sure the RDY bit has cleared to 0, so why do we need the 10 ms delay? The identical delay in RecoveryDpcRoutine was added because the original NVMeResetAdapter did not wait until the RDY bit cleared to 0. Given the changes to NVMeResetAdapter, we should remove the 10 ms delay from RecoveryDpcRoutine as well.
2. At line 2216, StorPortPause is called with a 60-second timeout to force Storport to hold requests. I am not sure 60 seconds is a proper assumption. Calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" instead seems a better idea to me.
3. Per the definition of HwStorResetBus, the routine returns TRUE in the successful case. We need to handle the failure cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in the successful case.

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 25, 2014 4:28 AM
To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn,

Line 1384: I can take care of this item.

Line 2219: StorPortSynchronizeAccess. This was requested by Samsung, as suggested by Judy. Her mail is below for reference.

In our testing, we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts, which eventually cause the driver to be called at its HwStorResetBus entry point (NVMeResetBus).
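Alex's point 1 and the HwStorResetBus requirements Judy lists below fit into one control flow: hold new I/O, clear CC.EN and poll CSTS.RDY with a bound instead of sleeping a fixed 10 ms, complete every outstanding request before returning, and release the queues only on success. The C sketch below is a user-mode simulation under stated assumptions: the register model, the status values, and every `sim_` name are hypothetical stand-ins. They are not Storport APIs (the real calls would be StorPortBusy or StorPortPause, and StorPortResume) and not code from the OFA driver or the Sandisk patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-mode simulation of the reset flow discussed in this
 * thread. Register model, status values, and sim_ names are illustrative
 * assumptions; none of this is Storport or OFA driver code. */

enum sim_req_status { SIM_REQ_PENDING, SIM_REQ_SUCCESS, SIM_REQ_BUS_RESET };

#define SIM_MAX_REQS 8

typedef struct {
    bool cc_en;             /* simulated CC.EN bit */
    bool csts_rdy;          /* simulated CSTS.RDY bit */
    int  rdy_lag;           /* polls before RDY follows CC.EN; -1 = never (hung) */
    bool reset_in_progress; /* per-adapter flag checked before starting I/O */
    bool io_held;           /* stand-in for StorPortBusy/StorPortPause in effect */
    enum sim_req_status reqs[SIM_MAX_REQS];
    size_t num_reqs;
} sim_adapter;

/* Simulated CSTS read: RDY moves to match CC.EN after rdy_lag more polls. */
static bool sim_read_rdy(sim_adapter *a)
{
    if (a->csts_rdy != a->cc_en && a->rdy_lag >= 0) {
        if (a->rdy_lag == 0)
            a->csts_rdy = a->cc_en;
        else
            a->rdy_lag--;
    }
    return a->csts_rdy;
}

/* Bounded wait for CSTS.RDY to reach 'want' (covers both 1->0 and 0->1).
 * A real driver would stall between polls and derive the bound from CAP.TO. */
static bool sim_wait_rdy(sim_adapter *a, bool want, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (sim_read_rdy(a) == want)
            return true;
    }
    return false;
}

/* Reset-bus shape: all work is finished before the routine returns. */
static bool sim_reset_bus(sim_adapter *a, int max_polls)
{
    assert(a != NULL);
    a->reset_in_progress = true;  /* gate new I/O inside the driver */
    a->io_held = true;            /* ask the port to hold new requests */

    a->cc_en = false;             /* disable the controller */
    if (!sim_wait_rdy(a, false, max_polls))
        return false;             /* failure: report it and keep I/O held */

    /* Complete every outstanding request now, not from a later DPC. */
    for (size_t i = 0; i < a->num_reqs; i++) {
        if (a->reqs[i] == SIM_REQ_PENDING)
            a->reqs[i] = SIM_REQ_BUS_RESET;
    }

    a->reset_in_progress = false;
    a->io_held = false;           /* release held I/O only on success */
    return true;
}

/* StartIo-side check: refuse to touch hardware while a reset is running. */
static bool sim_can_start_io(const sim_adapter *a)
{
    return !a->reset_in_progress;
}
```

Re-enabling the controller afterward would set CC.EN again and call `sim_wait_rdy(a, true, n)`, the 0->1 direction of the CSTS.RDY item listed later in the thread. The failure branch deliberately leaves `io_held` set, mirroring Alex's point that the resume step belongs only to the successful path.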
I have some feedback on the current architecture of that routine:

1. Among other things, NVMeResetBus schedules a DPC to complete any pending commands. This means that when the entry point returns, commands are still outstanding, and they are not completed until the DPC runs. According to the WDK, this does not appear to be legal; all outstanding commands have to be completed by the HwStorResetBus routine before it returns:

   HwResetBus
   Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure. For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI).

   and

   HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value.

   and

   The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock. A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary.

   Since HwStorResetBus must finish its work before returning, it cannot schedule a DPC to do that work later. The logic that schedules a DPC should be removed.
2. Code should be added to call StorPortPause() to hold off any new requests until StorPortResume() is called.
3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine should also be added to the NVMe driver for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo, since the port driver calls it only after acquiring the StartIo spin lock.
4. We should implement a driver-internal global (per-adapter) flag signifying that we are busy with reset processing and therefore cannot let new I/O requests go through to the hardware.
5. Code should be added to call StorPortResume() when all work is complete.
6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above.

Thanks,
Judy

Thanks,
Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D
Sent: Monday, February 24, 2014 3:51 PM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex and Dharani,

I have been reviewing the code and running some tests, and I have some concerns about this patch.

In nvmeStd.c:

Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands. When the LUN reset comes in, we set CC.EN to 0, and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." The driver accounts for this reset behavior by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. Instead, the new processAbortLun function should be moved under SRB_FUNCTION_ABORT_COMMAND only, and the procesAbortLunReset function should send only one abort rather than aborting all outstanding commands.

Also, during testing I hit a D1 BSOD when I tried to step through the code. I ran I/O and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell.
The I/O should then be timed out by Storport, which will send a LUN reset.

Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with Start IO and the interrupt DPC.

Thanks,
Carolyn

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Wednesday, February 19, 2014 10:06 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you, Dharani.

Hi all,

Please review/test the attached reset fix patch from Sandisk and provide your feedback.

Thank you very much,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, February 19, 2014 9:00 AM
To: Alex Chang
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Alex,

The attached is the patch source for review. I have tested it with I/O running overnight. Areas to focus on when testing this patch:

1. Abort/LUN resets.
2. Chip reset.
3. The format command.
4. The firmware download command.

Password is "sndk1234"

Thanks,
Dharani.
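Carolyn's review separates two cases that the patch merged: an SRB_FUNCTION_ABORT_COMMAND should result in one NVMe Abort command for the one targeted request, while a LUN reset clears CC.EN, after which the controller posts no completions, so outstanding requests are completed by the driver's recovery path rather than aborted one by one. The C sketch below models that split as a way to reason about the abort/LUN-reset test area Dharani lists. The types, fields, and `sim_` names are hypothetical, and `sim_recover_all` only stands in for the driver's RecoveryDpcRoutine without reproducing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of Carolyn's proposal; not OFA driver code. */

enum sim_func { SIM_FUNC_ABORT_COMMAND, SIM_FUNC_RESET_LUN };

typedef struct {
    uint16_t sqid;  /* submission queue the command was posted to */
    uint16_t cid;   /* command identifier within that queue */
    bool done;      /* completed back to the port driver */
} sim_cmd;

typedef struct {
    sim_cmd cmds[8];
    size_t num_cmds;
    bool cc_en;               /* simulated CC.EN */
    int aborts_sent;          /* NVMe Abort admin commands issued */
    uint16_t last_abort_sqid;
    uint16_t last_abort_cid;
} sim_ctrl;

/* An NVMe Abort names one command by (SQID, CID). */
static void sim_send_abort(sim_ctrl *c, const sim_cmd *cmd)
{
    c->aborts_sent++;
    c->last_abort_sqid = cmd->sqid;
    c->last_abort_cid = cmd->cid;
}

/* Stand-in for the recovery path: with CC.EN = 0 the controller posts no
 * completions, so the driver completes every outstanding command itself. */
static size_t sim_recover_all(sim_ctrl *c)
{
    size_t n = 0;
    for (size_t i = 0; i < c->num_cmds; i++) {
        if (!c->cmds[i].done) {
            c->cmds[i].done = true;
            n++;
        }
    }
    return n;
}

/* Returns how many commands this call completed synchronously. */
static size_t sim_handle(sim_ctrl *c, enum sim_func f, size_t target)
{
    assert(c != NULL);
    switch (f) {
    case SIM_FUNC_ABORT_COMMAND:
        assert(target < c->num_cmds);
        sim_send_abort(c, &c->cmds[target]);  /* exactly one abort */
        return 0;  /* completion of the target is not modeled here */
    case SIM_FUNC_RESET_LUN:
        c->cc_en = false;                     /* no per-command aborts */
        return sim_recover_all(c);
    }
    return 0;
}
```

Keeping the LUN-reset branch free of Abort commands follows Carolyn's point that, once CC.EN is 0, the controller will not process them anyway.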
From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:15 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Great!

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 18, 2014 12:14 PM
To: Alex Chang
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

I am testing after merging the code; I should be able to send it tomorrow morning.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:13 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Just a friendly reminder: could you please send out your patch as soon as it's ready?

Many thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, February 14, 2014 10:18 AM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Sure Alex.

Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Friday, February 14, 2014 10:17 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Good morning, Dharani,

As you may know, both the Intel and Huawei patches have been added to the OFA source base. You can now rebase your changes and send a patch out for review/test. Thank you very much for contributing the fixes.

Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, January 15, 2014 2:08 PM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Alex,

The attached is the source for preliminary review. I have run I/O and the SCSI compliance test. I don't have a drive that supports abort/LUN resets, and I am not sure how to test the format command.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:54 AM
To: Dharani Kotte; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Happy Holidays to you all.

Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 11:52 AM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Thank you for the explanation. Sure, I will take a look.

Happy Holidays.
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:44 AM
To: Kwok Kong; Dharani Kotte; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Dharani,

A controller reset can be issued either by the host or by the driver itself. Currently, the driver seems to handle both in the same manner, via a single entry point, "NVMeResetController". In the host case, the driver needs to separate SRB_FUNCTION_RESET_... requests from the NVME_RESET_DEVICE ioctl request in terms of how pending I/Os are handled. In the driver case, the related error recovery code needs to be re-examined as well.
Judy from Samsung suggested referring to the storahci.sys sample driver code for Windows 7/8 for examples of reset bus logic, along with detailed recommendations.

Thank you,
Alex

From: Kwok Kong
Sent: Friday, December 20, 2013 9:08 AM
To: Dharani Kotte; Akshay Mathur; Alex Chang
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Dharani,

Yes, these are the three areas that you have committed to.

Alex,

Please send more details on "Controller reset does not handle all cases" to Dharani.

Thanks
-Kwok

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 9:02 AM
To: Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Kwok,

I think these are the items we are committing to:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- Orphaned requests

Can somebody provide a little more detail on the expectation for the item "Controller reset does not handle all cases"?

Thanks,
Dharani.

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Thursday, December 19, 2013 6:53 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Excellent! Your help is much appreciated.

Dharani,

Please let me know if you have any questions.

Happy holidays to all of you.
-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Thursday, December 19, 2013 6:51 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,

You are welcome. We are pleased to contribute to the community and appreciate you driving it! We will try our best to complete the implementation by the end of January, but we may not be able to complete comprehensive testing by then. This is because of overlaps with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks. Anyway, Dharani will be in touch with you as he makes progress.

Thanks
Akshay

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Tuesday, December 17, 2013 4:21 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Akshay,

Thanks for your willingness to contribute to the driver. I am looking for a patch before the end of January 2014; the earlier the better. Please let me know if Sandisk can commit to that.

Your help is much appreciated.

Thanks
-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Tuesday, December 17, 2013 4:11 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman; Akshay Mathur
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,

I manage the software and driver development team at SanDisk/ESS. We are certainly willing to contribute to fixing the problems listed below, but before we can commit, we would like clarification on the timeline, i.e., by when these fixes are expected to be completed.

Thanks
Akshay Mathur
Sr Software Manager, Enterprise Storage Solutions
951 SanDisk Drive, Building #5 | Milpitas, CA 95035 U.S.A. | Direct +1 408.801.1336 | Cell +1 856.607.7323 | Corporate +1 408.801.1000 | Akshay.Mathur at sandisk.com

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Wednesday, December 11, 2013 18:00
To: Dave Landsman
Cc: Dharani Kotte
Subject: Would you please help to resolve a few OFA NVMe driver problems ?

Dave and Dharani,

There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems. Intel has agreed to work on the following two problems:
- Remove #define for CHATHAM2
- Learning of CPU core to Vector failure handling

I am also asking other companies to work on some of the issues. I wonder if your company can work on the following three problems:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- Orphaned requests

Please let me know if your company can work on these three issues.

Thanks
-Kwok

-------------- next part --------------
A non-text attachment was scrubbed...
Name: source_sndk_03_05_2014.zip Type: application/x-zip-compressed Size: 177048 bytes Desc: source_sndk_03_05_2014.zip URL: From Alex.Chang at pmcs.com Wed Mar 5 10:13:29 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Wed, 5 Mar 2014 18:13:29 +0000 Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes In-Reply-To: <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com> References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com> <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com> <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com> Message-ID: Great! Thank you very much, Dharani, for the quick fixes. Hi all, Please review/test the patch and provide your feedback. If no big changes required, I will start to collect approvals from Intel and LSI early next week. Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, March 05, 2014 9:03 AM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. 
The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Alex, The attached is the patch with the changes incorporated and BSOD fix. The BSOD fix is in nvmeIo.c line 640. Password: sndk1234 Thanks, Dharani. From: Dharani Kotte Sent: Tuesday, February 25, 2014 3:25 PM To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, Sure, I will go over the list below and make changes accordingly next week. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 25, 2014 2:09 PM To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Just left out one more : In Line#1228 of nvmeInit.c, NVMeWaitOnRdy is called to replace assigning NextDriverState as NVMeWaitOnRdy. I think the assignment is still required before the initialization state machine starts. Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, February 25, 2014 2:03 PM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn and Dharani, I basically agree the suggested changes in HwStorResetBus. However, I have some feedback: 1. In Line#2161, NVMeResetAdapter is called and it returns after making sure RDY bit is cleared as 0, why we need 10 ms delay ? 
The exact same delay added in RecoveryDpcRoutine was because original NVMeResetAdater did not wait until RDY bit is cleared as 0. Due to the changes in NVMeResetAdapter, we need to remove the 10 ms delay in RecoveryDpcRoutine as well. 2. In Line#2216, StorPortPause is called with 60 seconds to force Storport hold up requests. I am not sure 60 seconds is proper assumption. Instead, calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" seems better idea to me. 3. In the definition of HwStorResetBus, the routine returns TRUE in successful case. We need to take care of failed cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in successful case. Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 25, 2014 4:28 AM To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn, Line 1384: I can take care of this item. Line 2219: StorPortSynchronizeAccess, This is the request from Samsung suggested by Judy. Below is the reference mail. In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts which eventually causes the driver to be called at it's HwStorResetBus entry point (NVMeResetBus). I have some feedback on the current architecture of that routine: 1. Among other things, NMeResetBus schedules a DPC to complete any pending commands. This creates a situation where upon return from this entry point, there are still cmds outstanding which don't get completed till the DPC runs. According to the WDK, this doesn't appear to be legal - all outstanding cmds have to be completed by the HwStorResetBus routine before it returns: HwResetBus Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. 
This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure. For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI) and HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value. and The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock. A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary Since HwStorResetBus must finish its work before returning; it can't schedule a DPC to do so later on. The logic which schedules a DPC should be removed. 2. Code should be added to call StorPortPause() to hold off any new requests till StorPortResume() is called. 3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine in the NVMe driver should also be added for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo since the port driver calls it only after acquiring the StartIo spinlock. 4. We should implement a driver-internal global (per "adapter") flag signifying we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware. 5. Code should be added to call StorPortResume() when all work is complete. 6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above. Thanks, Judy Thanks, Dharani. 
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Monday, February 24, 2014 3:51 PM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex and Dharni, I have been reviewing the code and performing some tests and I have some concerns about this patch. In nvmeStd.c: Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands. When the LUN reset comes in, we will set CC.EN to 0 and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." This reset behavior has been accounted for in the driver, by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. What should happen here is that the new processAbortLun function should be moved under the SRB_FUNCTION_ABORT_COMMAND only. Then the procesAbortLunReset function should only send one abort and not abort all outstanding commands. Also, during testing, I hit a D1 BSOD when I tried to step through the code. I ran IO and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell. The IO should be timed out by storport, which will then send a reset lun. Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with Start IO and the interrupt DPC. 
Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, February 19, 2014 10:06 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thank you, Dharani. Hi all, Please review/test the attached reset fix patch from Sandisk and provide your feedbacks. Thank you very much, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, February 19, 2014 9:00 AM To: Alex Chang Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Alex, The attached is the patch source for review. I have tested the I/O running over night. Areas need to be focused for test this patch: 1. Test abort/LUN resets. 2. Test chip reset. 3. Test the format command. 4.Test Firmware download command. Password is "sndk1234" Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 18, 2014 12:15 PM To: Dharani Kotte Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Great! 
Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 18, 2014 12:14 PM To: Alex Chang Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Just testing after merging the code; I should be able to send it tomorrow morning. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 18, 2014 12:13 PM To: Dharani Kotte Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Hi Dharani, Just a friendly reminder, could you please send out your patch as soon as it's ready? Many thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, February 14, 2014 10:18 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Sure Alex. Dharani. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Friday, February 14, 2014 10:17 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes Good morning, Dharani, As you may know, both the Intel and Huawei patches have been added to the OFA source base. Now you may rebase your changes and send a patch out for review/test. Thank you very much for contributing the fixes. Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, January 15, 2014 2:08 PM To: Alex Chang; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ? Hi Alex, Attached is the source for the preliminary review. I have tested I/O and the SCSI compliance test. I don't have a drive that supports abort/LUN resets, and I'm not sure how to test the format command. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, December 20, 2013 11:54 AM To: Dharani Kotte; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Happy Holidays to you all. Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, December 20, 2013 11:52 AM To: Alex Chang; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Thank you for the explanation. Sure, I will take a look. Happy Holidays. Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, December 20, 2013 11:44 AM To: Kwok Kong; Dharani Kotte; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Hi Dharani, The controller reset can be issued either from the host or from the driver itself. Currently, the driver seems to handle both in the same manner via a single entry point, "NVMeResetController". In the "from the host" case, the driver needs to separate SRB_FUNCTION_RESET_... requests from the NVME_RESET_DEVICE ioctl request with respect to handling pending I/Os. In the "driver itself" case, the related error recovery code needs to be re-examined as well.
Judy from Samsung suggested referring to the storahci.sys driver sample code for Windows 7/8 for reset bus logic examples and detailed recommendations. Thank you, Alex From: Kwok Kong Sent: Friday, December 20, 2013 9:08 AM To: Dharani Kotte; Akshay Mathur; Alex Chang Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Dharani, Yes, these are the three areas that you are committed to. Alex, Please send more details on the "Controller reset does not handle all cases" item to Dharani. Thanks -Kwok From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, December 20, 2013 9:02 AM To: Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Hi Kwok, I think the items below are the ones we are committing to: - Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset - Controller reset does not handle all cases - orphaned requests Can somebody provide a little more detail on the expectations for the item "Controller reset does not handle all cases"? Thanks, Dharani. From: Kwok Kong [mailto:Kwok.Kong at pmcs.com] Sent: Thursday, December 19, 2013 6:53 PM To: Akshay Mathur Cc: Dharani Kotte; Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Excellent! Your help is much appreciated. Dharani, Please let me know if you have any questions. Happy holidays to all of you. -Kwok From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com] Sent: Thursday, December 19, 2013 6:51 PM To: Kwok Kong Cc: Dharani Kotte; Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Kwok, You are welcome. We are pleased to contribute to the community and appreciate you driving it! We will try our best to complete the implementation by the end of January, but we may not be able to complete comprehensive testing by that time.
This is because of overlaps with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks. Anyway, Dharani will be in touch with you as he makes progress. Thanks Akshay From: Kwok Kong [mailto:Kwok.Kong at pmcs.com] Sent: Tuesday, December 17, 2013 4:21 PM To: Akshay Mathur Cc: Dharani Kotte; Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Akshay, Thanks for your willingness to contribute to the driver. I am looking for a patch before the end of Jan 2014; the earlier the better. Please let me know if Sandisk can commit to that. Your help is much appreciated. Thanks -Kwok From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com] Sent: Tuesday, December 17, 2013 4:11 PM To: Kwok Kong Cc: Dharani Kotte; Dave Landsman; Akshay Mathur Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Kwok, I manage the Software and driver development team at SanDisk/ESS. We are certainly willing to contribute to fixing the problems listed below, but before we can commit, we would like to get clarification on the timeline, i.e. by when these fixes are expected to be completed. Thanks Akshay Mathur Sr Software Manager, Enterprise Storage Solutions 951 SanDisk Drive, Building #5 | Milpitas, CA 95035 U.S.A. | Direct +1 408.801.1336 | Cell +1 856.607.7323 | Corporate +1 408.801.1000 | Akshay.Mathur at sandisk.com From: Kwok Kong [mailto:Kwok.Kong at pmcs.com] Sent: Wednesday, December 11, 2013 18:00 To: Dave Landsman Cc: Dharani Kotte Subject: Would you please help to resolve a few OFA NVMe driver problems ? Dave and Dharani, There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems.
Intel has agreed to work on the following two problems: - remove #define for CHATHAM2 - Learning of CPU core to Vector failure handling I am also asking other companies to work on some of the issues. I wonder if your company can work on the following three problems: - Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset - Controller reset does not handle all cases - orphaned requests Please let me know if your company can work on these three issues. Thanks -Kwok ________________________________ PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
From Dharani.Kotte at sandisk.com Wed Mar 5 10:15:17 2014 From: Dharani.Kotte at sandisk.com (Dharani Kotte) Date: Wed, 5 Mar 2014 18:15:17 +0000 Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes In-Reply-To: References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com> <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com> <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com> Message-ID: <23EC73C80FB59046A6B7B8EB7B3826593DA795AD@SACMBXIP01.sdcorp.global.sandisk.com> It is a minor change in the function NVMeDetectPendingCmds(): we should not check for pSrb == NULL, because internal commands such as Identify have pSrb set to NULL. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Wednesday, March 05, 2014 10:13 AM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Great! Thank you very much, Dharani, for the quick fixes. Hi all, Please review/test the patch and provide your feedback. If no big changes are required, I will start to collect approvals from Intel and LSI early next week.
Regards, Alex
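Dharani's NVMeDetectPendingCmds() change can be illustrated with a simplified command table. This is a minimal sketch, assuming each tracked command records whether it is outstanding plus an SRB pointer that is NULL for driver-internal commands such as Identify; CmdEntry and both counting functions are hypothetical and only show why filtering on pSrb undercounts what is still in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified command-tracking entry. */
typedef struct {
    bool  pending;  /* submitted, completion not yet processed */
    void *pSrb;     /* NULL for driver-internal commands such as Identify */
} CmdEntry;

/* Counts every outstanding command, including internal ones. */
static size_t count_pending_cmds(const CmdEntry *table, size_t n) {
    assert(table != NULL || n == 0);
    size_t pending = 0;
    for (size_t i = 0; i < n; i++) {
        if (table[i].pending) {
            pending++;
        }
    }
    return pending;
}

/* For contrast: also requiring a non-NULL pSrb silently skips internal commands. */
static size_t count_pending_with_srb_only(const CmdEntry *table, size_t n) {
    assert(table != NULL || n == 0);
    size_t pending = 0;
    for (size_t i = 0; i < n; i++) {
        if (table[i].pending && table[i].pSrb != NULL) {
            pending++;
        }
    }
    return pending;
}
```

With one host I/O and one internal Identify outstanding, the first function reports two pending commands while the filtered version reports one, which is the kind of miss the change is meant to avoid.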
From carolyn.d.foster at intel.com Fri Mar 7 14:16:24 2014 From: carolyn.d.foster at intel.com (Foster, Carolyn D) Date: Fri, 7 Mar 2014 22:16:24 +0000 Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes In-Reply-To: References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com> <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com> <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com> Message-ID: Hi Dharani, Unfortunately, I am still seeing a D1 BSOD when I try to step through the new NVMeProcessAbortLunReset code. I was able to spend a little more time looking at this failure and can give you more information about it. On line 1316 of nvmeStd.c, where NVMeIssueAbortCmd is called, the parameter being passed in is the pResetSrbExt that was retrieved from the reset SRB. Then, in the function IssueAbortCmd, the device extension is pulled out of that resetSrbExt. Unfortunately, that SRB extension is all 0s because it was never initialized, so a null pointer is passed into ProcessIo, which causes the BSOD. So far, the changes to HwResetBus seem to be working OK. I will follow up with you separately on the steps I took to reproduce this failure. Carolyn From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Wednesday, March 05, 2014 11:13 AM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Great! Thank you very much, Dharani, for the quick fixes.
Hi all, Please review/test the patch and provide your feedback. If no big changes are required, I will start to collect approvals from Intel and LSI early next week. Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, March 05, 2014 9:03 AM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, Attached is the patch with the changes incorporated and the BSOD fix. The BSOD fix is in nvmeIo.c line 640. Password: sndk1234 Thanks, Dharani. From: Dharani Kotte Sent: Tuesday, February 25, 2014 3:25 PM To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, Sure, I will go over the list below and make changes accordingly next week. Thanks, Dharani.
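Carolyn's Mar 7 analysis above describes a common failure pattern: a helper recovers the adapter from an SRB extension that nobody has initialized, so it reads a zero back-pointer. A minimal sketch of one way to avoid it, assuming the caller already holds the adapter's device extension (Storport passes it as the first argument to HwStorStartIo and HwStorResetBus) and stamps it into the extension before anything reads it. AdapterExt, SrbExt and both functions are hypothetical; this is not the patch's actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical adapter (device) extension. */
typedef struct AdapterExt {
    int queue_count;
} AdapterExt;

/* Hypothetical per-SRB extension with a back-pointer to its adapter. */
typedef struct SrbExt {
    AdapterExt *pDevExt;  /* NULL if this extension was never initialized */
    int         cmd_id;   /* command the abort should target */
} SrbExt;

/* Risky pattern: trusts a back-pointer that may still be zero-filled. */
static AdapterExt *adapter_from_srb_ext(const SrbExt *ext) {
    assert(ext != NULL);
    return ext->pDevExt;
}

/* Safer pattern: take the adapter from the caller, refuse NULLs, and
   initialize the extension before anything reads it. */
static int prepare_abort(AdapterExt *adapter, SrbExt *ext, int target_cmd_id) {
    if (adapter == NULL || ext == NULL) {
        return -1;  /* fail the request instead of dereferencing NULL */
    }
    ext->pDevExt = adapter;
    ext->cmd_id = target_cmd_id;
    return 0;
}
```

The design choice is simply to make the adapter an explicit input rather than something rediscovered from per-request storage whose initialization depends on which code path created the request.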
From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 25, 2014 2:09 PM To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes I left out one more: In line #1228 of nvmeInit.c, NVMeWaitOnRdy is called directly instead of assigning NVMeWaitOnRdy to NextDriverState. I think the assignment is still required before the initialization state machine starts. Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, February 25, 2014 2:03 PM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn and Dharani, I basically agree with the suggested changes in HwStorResetBus. However, I have some feedback: 1. In line #2161, NVMeResetAdapter is called, and it returns only after making sure the RDY bit is cleared to 0, so why do we need the 10 ms delay? The identical delay in RecoveryDpcRoutine was added because the original NVMeResetAdapter did not wait until the RDY bit was cleared to 0. Given the changes in NVMeResetAdapter, we should remove the 10 ms delay from RecoveryDpcRoutine as well. 2. In line #2216, StorPortPause is called with 60 seconds to force Storport to hold requests. I am not sure 60 seconds is a proper assumption. Instead, calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" seems like a better idea to me. 3. In the definition of HwStorResetBus, the routine returns TRUE in the successful case. We need to take care of failure cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in the successful case.
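Alex's three points, together with Judy's list quoted further down, amount to an ordering for HwStorResetBus: stop new I/O, disable the controller and wait for RDY to clear, complete outstanding requests before returning, then either report failure or resume. Below is a minimal sketch of that ordering against a simulated adapter. SimAdapter, the SIM_SRB_* values and all functions are hypothetical stand-ins for Storport calls (StorPortBusy, StorPortResume) and SRB status codes, not their real definitions or the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for SRB status codes (not the real srb.h values). */
enum { SIM_SRB_PENDING = 0, SIM_SRB_BUS_RESET = 1 };

#define SIM_MAX_SRBS 8

typedef struct {
    int    srb_status[SIM_MAX_SRBS]; /* status of each outstanding request */
    size_t outstanding;              /* number of requests still pending */
    bool   reset_in_progress;        /* per-adapter flag checked by StartIo */
    bool   busy_requested;           /* stands in for StorPortBusy(..., STOR_ALL_REQUESTS) */
    bool   resumed;                  /* stands in for StorPortResume() */
    bool   disable_ok;               /* simulated: CC.EN = 0 and CSTS.RDY reached 0 */
    bool   enable_ok;                /* simulated: re-enable and queue setup succeeded */
} SimAdapter;

/* Simulated StartIo gate: refuse new I/O while a reset is running. */
static bool sim_start_io_accepts(const SimAdapter *a) {
    return !a->reset_in_progress;
}

/* Complete every outstanding request with bus-reset status, synchronously. */
static void sim_complete_all_bus_reset(SimAdapter *a) {
    for (size_t i = 0; i < a->outstanding; i++) {
        a->srb_status[i] = SIM_SRB_BUS_RESET;
    }
    a->outstanding = 0;
}

/* Simulated HwStorResetBus: returns true on success, false on failure. */
static bool sim_reset_bus(SimAdapter *a) {
    assert(a != NULL && a->outstanding <= SIM_MAX_SRBS);
    a->reset_in_progress = true;   /* block new I/O at the driver level */
    a->busy_requested = true;      /* ask the port driver to hold requests */

    /* The disable step is assumed to wait for CSTS.RDY == 0 itself,
       so no fixed 10 ms delay follows it. */
    if (!a->disable_ok) {
        return false;              /* failure path: report it, do not resume */
    }

    /* Controller is disabled and posts no completions; finish requests
       here rather than deferring them to a DPC. */
    sim_complete_all_bus_reset(a);

    if (!a->enable_ok) {
        return false;              /* still no resume on failure */
    }

    a->reset_in_progress = false;
    a->resumed = true;             /* resume only on the success path */
    return true;
}
```

Completing requests only after the disable step reflects the NVMe rule Carolyn cites: once CC.EN is 0 the controller no longer posts completions, so buffers can be released safely. What to do with outstanding requests when the disable step itself fails is left open here, as it is in the thread.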
Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 25, 2014 4:28 AM
To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn,

Line 1384: I can take care of this item.

Line 2219: StorPortSynchronizeAccess. This was requested by Samsung, as suggested by Judy. The reference mail is below.

In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts, which eventually cause the driver to be called at its HwStorResetBus entry point (NVMeResetBus). I have some feedback on the current architecture of that routine:

1. Among other things, NVMeResetBus schedules a DPC to complete any pending commands. This means that upon return from this entry point, there are still commands outstanding that don't get completed until the DPC runs. According to the WDK, this doesn't appear to be legal; all outstanding commands have to be completed by the HwStorResetBus routine before it returns:

   HwResetBus
   Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure. For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI).

   and

   HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value.

   and

   The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock.
   A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary.

   Since HwStorResetBus must finish its work before returning, it can't schedule a DPC to do so later on. The logic that schedules a DPC should be removed.

2. Code should be added to call StorPortPause() to hold off any new requests until StorPortResume() is called.
3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine should also be added to the NVMe driver for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo, since the port driver calls it only after acquiring the StartIo spinlock.
4. We should implement a driver-internal global (per "adapter") flag signifying that we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware.
5. Code should be added to call StorPortResume() when all work is complete.
6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above.

Thanks,
Judy

Thanks,
Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D
Sent: Monday, February 24, 2014 3:51 PM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex and Dharani,

I have been reviewing the code and performing some tests, and I have some concerns about this patch.

In nvmeStd.c:

Line 1384: NVMeProcessAbortLunReset. This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands.
When the LUN reset comes in, we will set CC.EN to 0, and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." The driver accounts for this reset behavior by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. What should happen instead is that the new processAbortLun function is moved under SRB_FUNCTION_ABORT_COMMAND only. Then the processAbortLunReset function should send only one abort rather than aborting all outstanding commands.

Also, during testing, I hit a D1 BSOD when I tried to step through the code. I ran IO and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell. The IO should then be timed out by Storport, which will send a LUN reset.

Line 2219: StorPortSynchronizeAccess. I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with Start IO and the interrupt DPC.

Thanks,
Carolyn

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Wednesday, February 19, 2014 10:06 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you, Dharani.

Hi all,

Please review/test the attached reset fix patch from Sandisk and provide your feedback.
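Read together, Judy's list and Carolyn's review suggest a shape for the reset path: a LUN or bus reset completes every outstanding request before returning instead of deferring to a DPC, a per-adapter flag keeps new I/O away from the hardware during the reset, a single SRB_FUNCTION_ABORT_COMMAND produces exactly one NVMe Abort, and the hold on Storport is released only if the reset succeeds. The model below is a minimal user-mode sketch of that shape under those assumptions. Adapter, start_io(), abort_one() and reset_bus() are hypothetical, and the Storport hold is reduced to a `held` flag; in the real API, StorPortPause is released by StorPortResume, while StorPortBusy is released by StorPortReady.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CMDS 8

/* Hypothetical adapter model; field names are illustrative only. */
typedef struct {
    bool pending[MAX_CMDS];   /* requests handed to the controller */
    bool resetInProgress;     /* per-adapter "busy with reset" flag */
    bool held;                /* Storport asked to hold new requests */
    bool controllerOk;        /* injected result of the omitted CC.EN/RDY sequence */
    int  busResetCompletions; /* requests completed as SRB_STATUS_BUS_RESET */
    int  abortsIssued;        /* NVMe Abort commands sent */
} Adapter;

/* New I/O is refused while a reset is running or requests are held. */
static bool start_io(Adapter *a, size_t slot)
{
    assert(a != NULL && slot < MAX_CMDS);
    if (a->resetInProgress || a->held || a->pending[slot])
        return false;
    a->pending[slot] = true;
    return true;
}

/* Abort of a single request: exactly one Abort command. For simplicity the
 * model treats the aborted request as completed at once. */
static bool abort_one(Adapter *a, size_t slot)
{
    assert(a != NULL);
    if (slot >= MAX_CMDS || !a->pending[slot])
        return false;
    a->abortsIssued++;
    a->pending[slot] = false;
    return true;
}

/* LUN/bus reset: hold new requests, complete every outstanding request
 * before returning (no per-command aborts, since with CC.EN = 0 the
 * controller posts no completions), and release the hold only on success. */
static bool reset_bus(Adapter *a)
{
    assert(a != NULL);
    a->held = true;
    a->resetInProgress = true;
    for (size_t i = 0; i < MAX_CMDS; i++) {
        if (a->pending[i]) {
            a->pending[i] = false;
            a->busResetCompletions++;
        }
    }
    a->resetInProgress = false;
    if (!a->controllerOk)
        return false;   /* report failure and keep requests held */
    a->held = false;
    return true;
}
```

In this model an abort touches one request and leaves the rest pending, while a reset drains all of them; on failure the adapter stays held so no new I/O reaches a controller in an unknown state.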
Thank you very much,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, February 19, 2014 9:00 AM
To: Alex Chang
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Alex,

Attached is the patch source for review. I have tested it with I/O running overnight. Areas to focus on when testing this patch:

1. Test abort/LUN resets.
2. Test chip reset.
3. Test the format command.
4. Test the firmware download command.

Password is "sndk1234"

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:15 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Great! Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 18, 2014 12:14 PM
To: Alex Chang
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

I am just testing after merging the code; I should be able to send it tomorrow morning.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:13 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Just a friendly reminder: could you please send out your patch as soon as it's ready?
Many thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, February 14, 2014 10:18 AM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Sure, Alex.

Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Friday, February 14, 2014 10:17 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Good morning, Dharani,

As you may know, both the Intel and Huawei patches have been added to the OFA source base. You may now rebase your changes and send a patch out for review/test. Thank you very much for contributing the fixes.

Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, January 15, 2014 2:08 PM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Alex,

Attached is the source for the preliminary review. I have run the IO and SCSI compliance tests. I don't have a drive that supports abort/LUN resets, and I'm not sure how to test the format command.

Thanks,
Dharani.
From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:54 AM
To: Dharani Kotte; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Happy Holidays to you all.

Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 11:52 AM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Thank you for the explanation. Sure, I will take a look. Happy Holidays.

Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:44 AM
To: Kwok Kong; Dharani Kotte; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Dharani,

A controller reset can be issued either by the host or by the driver itself. Currently, the driver seems to handle both in the same manner via a single entry point, "NVMeResetController". For resets from the host, the driver needs to distinguish SRB_FUNCTION_RESET_... requests from the NVME_RESET_DEVICE ioctl request in terms of how pending IOs are handled. For resets issued by the driver itself, the related error recovery code needs to be re-examined as well. Judy from Samsung suggested referring to the storahci.sys sample driver code for Windows 7/8 for reset bus logic examples and detailed recommendations.

Thank you,
Alex

From: Kwok Kong
Sent: Friday, December 20, 2013 9:08 AM
To: Dharani Kotte; Akshay Mathur; Alex Chang
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Dharani,

Yes, these are the three areas that you are committed to.

Alex,

Please send more details on "Controller reset does not handle all cases" to Dharani.
Thanks
-Kwok

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 9:02 AM
To: Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Kwok,

I think the items below are the ones we are committing to:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- Orphaned requests

Can somebody provide a little more detail on the expectations for the item "Controller reset does not handle all cases"?

Thanks,
Dharani.

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Thursday, December 19, 2013 6:53 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Excellent! Your help is much appreciated.

Dharani, please let me know if you have any questions.

Happy holidays to all of you.
-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Thursday, December 19, 2013 6:51 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,

You are welcome. We are pleased to contribute to the community and appreciate you driving it! We will try our best to complete the implementation by the end of January, but we may not be able to complete comprehensive testing by then. This is because of overlaps with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks. Anyway, Dharani will be in touch with you as he makes progress.

Thanks
Akshay

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Tuesday, December 17, 2013 4:21 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Akshay,

Thanks for your willingness to contribute to the driver. I am looking for a patch before the end of Jan 2014; the earlier the better.
Please let me know if Sandisk can commit to that. Your help is much appreciated.

Thanks
-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Tuesday, December 17, 2013 4:11 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman; Akshay Mathur
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,

I manage the software and driver development team at SanDisk/ESS. We are certainly willing to contribute to fixing the problems listed below, but before we can commit, we would like clarification on the timeline, i.e., by when these fixes are expected to be completed.

Thanks
Akshay Mathur
Sr Software Manager, Enterprise Storage Solutions
951 SanDisk Drive, Building #5 | Milpitas, CA 95035 U.S.A. | Direct +1 408.801.1336 | Cell +1 856.607.7323 | Corporate +1 408.801.1000 | Akshay.Mathur at sandisk.com

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Wednesday, December 11, 2013 18:00
To: Dave Landsman
Cc: Dharani Kotte
Subject: Would you please help to resolve a few OFA NVMe driver problems ?

Dave and Dharani,

There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems. Intel has agreed to work on the following two problems:
- Remove #define for CHATHAM2
- Learning of CPU core to Vector failure handling

I am also asking other companies to work on some of the issues. I wonder if your company can work on the following three problems:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- Orphaned requests

Please let me know if your company can work on these three issues.

Thanks
-Kwok

________________________________
PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above.
If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).

From Alex.Chang at pmcs.com Fri Mar 7 14:40:26 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Fri, 7 Mar 2014 22:40:26 +0000
Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes
In-Reply-To:
References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com>
 <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com>
 <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com>
 <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com>
 <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com>
 <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com>
Message-ID:

Hi Dharani,

I'd suggest initializing srbExtension via NVMeInitSrbExtension for all cases in BuildIO to avoid the BSOD in the future.
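Alex's suggestion addresses the failure Carolyn traces in her message quoted below: the abort path reads the device extension out of the reset SRB's extension, which is all zeros because it was never initialized for that request. A minimal sketch of the fix, assuming a heavily simplified SRB layout: the extension is initialized once, before the switch on the SRB function, so every branch (including reset and abort) sees a valid back-pointer. DevExt, SrbExt, Srb, init_srb_extension(), build_io() and issue_abort_cmd() are hypothetical stand-ins, not the OFA driver's actual types or signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { int adapterId; } DevExt;

/* Hypothetical, trimmed-down SRB extension: only the fields relevant here. */
typedef struct {
    DevExt *pDevExt;   /* back-pointer the abort path dereferences */
    void   *pSrb;      /* owning request */
} SrbExt;

typedef enum { FN_EXECUTE_SCSI, FN_RESET_LOGICAL_UNIT, FN_ABORT_COMMAND } SrbFunction;

typedef struct {
    SrbFunction function;
    SrbExt      ext;     /* Storport would supply this per-request memory */
} Srb;

/* Stand-in for NVMeInitSrbExtension; its real signature is not reproduced. */
static void init_srb_extension(SrbExt *ext, DevExt *dev, Srb *srb)
{
    memset(ext, 0, sizeof(*ext));
    ext->pDevExt = dev;
    ext->pSrb = srb;
}

/* BuildIo sketch: initialize the extension before branching on the function
 * code, so reset and abort SRBs get a valid back-pointer too. The return
 * value only identifies which branch ran. */
static int build_io(DevExt *dev, Srb *srb)
{
    init_srb_extension(&srb->ext, dev, srb);
    switch (srb->function) {
    case FN_EXECUTE_SCSI:       return 1;  /* build the NVMe I/O command */
    case FN_RESET_LOGICAL_UNIT: return 2;  /* reset handling */
    case FN_ABORT_COMMAND:      return 3;  /* single abort */
    }
    return 0;
}

/* Abort path that reads the device extension from the reset SRB's extension,
 * mirroring the reported crash path. */
static int issue_abort_cmd(const SrbExt *resetExt)
{
    assert(resetExt->pDevExt != NULL);  /* NULL if BuildIo skipped the init */
    return resetExt->pDevExt->adapterId;
}
```

Initializing unconditionally costs a small memset per request but removes the dependence on each branch remembering to do it.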
Thanks,
Alex

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Friday, March 07, 2014 2:16 PM
To: Alex Chang; Dharani Kotte; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Unfortunately, I am still seeing a D1 BSOD when I try to step through the new NVMeProcessAbortLunReset code. This time I was able to spend a little more time looking at the failure and can give you more information about it.

On line 1316 of nvmeStd.c, where NVMeIssueAbortCmd is called, the parameter passed in is the pResetSrbExt retrieved from the reset SRB. The function IssueAbortCmd then pulls the device extension out of that resetSrbExt. Unfortunately, that SRB extension is all 0s because it was never initialized, so a null pointer is passed into ProcessIo, which causes the BSOD.

So far, the changes to HwResetBus seem to be working OK. I will follow up with you separately on the steps I took to reproduce this failure.

Carolyn

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Wednesday, March 05, 2014 11:13 AM
To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Great! Thank you very much, Dharani, for the quick fixes.

Hi all,

Please review/test the patch and provide your feedback. If no major changes are required, I will start collecting approvals from Intel and LSI early next week.
Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, March 05, 2014 9:03 AM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Alex, The attached is the patch with the changes incorporated and BSOD fix. The BSOD fix is in nvmeIo.c line 640. Password: sndk1234 Thanks, Dharani. From: Dharani Kotte Sent: Tuesday, February 25, 2014 3:25 PM To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, Sure, I will go over the list below and make changes accordingly next week. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 25, 2014 2:09 PM To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Just left out one more : In Line#1228 of nvmeInit.c, NVMeWaitOnRdy is called to replace assigning NextDriverState as NVMeWaitOnRdy. I think the assignment is still required before the initialization state machine starts. 
Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, February 25, 2014 2:03 PM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn and Dharani, I basically agree the suggested changes in HwStorResetBus. However, I have some feedback: 1. In Line#2161, NVMeResetAdapter is called and it returns after making sure RDY bit is cleared as 0, why we need 10 ms delay ? The exact same delay added in RecoveryDpcRoutine was because original NVMeResetAdater did not wait until RDY bit is cleared as 0. Due to the changes in NVMeResetAdapter, we need to remove the 10 ms delay in RecoveryDpcRoutine as well. 2. In Line#2216, StorPortPause is called with 60 seconds to force Storport hold up requests. I am not sure 60 seconds is proper assumption. Instead, calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" seems better idea to me. 3. In the definition of HwStorResetBus, the routine returns TRUE in successful case. We need to take care of failed cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in successful case. Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 25, 2014 4:28 AM To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn, Line 1384: I can take care of this item. Line 2219: StorPortSynchronizeAccess, This is the request from Samsung suggested by Judy. Below is the reference mail. In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts which eventually causes the driver to be called at it's HwStorResetBus entry point (NVMeResetBus). 
I have some feedback on the current architecture of that routine: 1. Among other things, NMeResetBus schedules a DPC to complete any pending commands. This creates a situation where upon return from this entry point, there are still cmds outstanding which don't get completed till the DPC runs. According to the WDK, this doesn't appear to be legal - all outstanding cmds have to be completed by the HwStorResetBus routine before it returns: HwResetBus Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure. For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI) and HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value. and The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock. A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary Since HwStorResetBus must finish its work before returning; it can't schedule a DPC to do so later on. The logic which schedules a DPC should be removed. 2. Code should be added to call StorPortPause() to hold off any new requests till StorPortResume() is called. 3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine in the NVMe driver should also be added for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo since the port driver calls it only after acquiring the StartIo spinlock. 4. 
We should implement a driver-internal global (per "adapter") flag signifying we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware. 5. Code should be added to call StorPortResume() when all work is complete. 6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above. Thanks, Judy Thanks, Dharani. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Monday, February 24, 2014 3:51 PM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex and Dharni, I have been reviewing the code and performing some tests and I have some concerns about this patch. In nvmeStd.c: Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands. When the LUN reset comes in, we will set CC.EN to 0 and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." This reset behavior has been accounted for in the driver, by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. What should happen here is that the new processAbortLun function should be moved under the SRB_FUNCTION_ABORT_COMMAND only. Then the procesAbortLunReset function should only send one abort and not abort all outstanding commands. Also, during testing, I hit a D1 BSOD when I tried to step through the code. I ran IO and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell. 
The IO should be timed out by storport, which will then send a reset lun. Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with Start IO and the interrupt DPC. Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, February 19, 2014 10:06 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thank you, Dharani. Hi all, Please review/test the attached reset fix patch from Sandisk and provide your feedbacks. Thank you very much, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, February 19, 2014 9:00 AM To: Alex Chang Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Alex, The attached is the patch source for review. I have tested the I/O running over night. Areas need to be focused for test this patch: 1. Test abort/LUN resets. 2. Test chip reset. 3. Test the format command. 4.Test Firmware download command. Password is "sndk1234" Thanks, Dharani. 
From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 18, 2014 12:15 PM To: Dharani Kotte Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Great! Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 18, 2014 12:14 PM To: Alex Chang Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Just testing after merging the code it I should be able to send it tomorrow morning. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 18, 2014 12:13 PM To: Dharani Kotte Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Hi Dharani, Just a friendly reminder, could you please send out your patch as soon as it's ready? Many thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, February 14, 2014 10:18 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Sure Alex. Dharani. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Friday, February 14, 2014 10:17 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes Good morning, Dharani, As you may know, both Intel and Huawei patches had been added into OFA source base. Now, you may re-base your changes and send a patch out for review/test. Thank you very much for contributing the fixes. Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, January 15, 2014 2:08 PM To: Alex Chang; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ? Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. 
This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Alex, The attached is the source for the preliminary review. I have tested the IO and scsi compliance test. I don't have a drive which supports abort/lun resets, not sure how to test the format command. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, December 20, 2013 11:54 AM To: Dharani Kotte; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Happy Holidays to you all. Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, December 20, 2013 11:52 AM To: Alex Chang; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Thank you for the explanation. Sure I will take look. Happy Holidays. Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, December 20, 2013 11:44 AM To: Kwok Kong; Dharani Kotte; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Hi Dharani, The controller reset can be issued from either from the host or the driver itself. Currently, the driver seems handling them in the same manner via single entry "NVMeResetController". In the case of "from the host", the driver needs to separate the cases of SRB_FUNCTION_RESET_... requests from the ioctl request of NVME_RESET_DEVICE in the sense of handling pending IOs. In the case of "the driver itself", needs to re-exam the related error recovery codes as well. 
Judy from Samsung suggested referring the storahci.sys driver sample codes for Windows 7/8 based on reset bus logic examples and detailed recommendations. Thank you, Alex From: Kwok Kong Sent: Friday, December 20, 2013 9:08 AM To: Dharani Kotte; Akshay Mathur; Alex Chang Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Dharani, Yes, these are the three areas that you are committed to. Alex, Please send more details on the "Controller reset does not handle all cases" to Dharani. Thanks -Kwok From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, December 20, 2013 9:02 AM To: Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Hi Kwok, I think the below are the items that we are committing for: - Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset - Controller reset does not handle all cases - orphaned requests Can somebody provide little bit more details on the expectation for the item "Controller reset does not handle all cases". Thanks, Dharani. From: Kwok Kong [mailto:Kwok.Kong at pmcs.com] Sent: Thursday, December 19, 2013 6:53 PM To: Akshay Mathur Cc: Dharani Kotte; Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Excellent! Your help is much appreciated. Dharani, Please let me know if you have any question. Happy holiday to all of you. -Kwok From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com] Sent: Thursday, December 19, 2013 6:51 PM To: Kwok Kong Cc: Dharani Kotte; Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Kwok, You are welcome. We are pleased to contribute to the community and appreciate you driving it! We will try our best to complete the implementation by end of January but we may not be able to complete comprehensive testing by that time. 
This is because of overlap with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks. Anyway, Dharani will be in touch with you as he makes progress. Thanks Akshay From: Kwok Kong [mailto:Kwok.Kong at pmcs.com] Sent: Tuesday, December 17, 2013 4:21 PM To: Akshay Mathur Cc: Dharani Kotte; Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Akshay, Thanks for your willingness to contribute to the driver. I am looking for a patch before the end of Jan 2014; the earlier the better. Please let me know if Sandisk can commit to that. Your help is much appreciated. Thanks -Kwok From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com] Sent: Tuesday, December 17, 2013 4:11 PM To: Kwok Kong Cc: Dharani Kotte; Dave Landsman; Akshay Mathur Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Kwok, I manage the Software and driver development team at SanDisk/ESS. We are certainly willing to contribute to fixing the problems listed below, but before we can commit, we would like clarification on the timeline, i.e., by when these fixes are expected to be completed. Thanks Akshay Mathur Sr Software Manager, Enterprise Storage Solutions 951 SanDisk Drive, Building #5 | Milpitas, CA 95035 U.S.A. | Direct +1 408.801.1336 | Cell +1 856.607.7323 | Corporate +1 408.801.1000 | Akshay.Mathur at sandisk.com From: Kwok Kong [mailto:Kwok.Kong at pmcs.com] Sent: Wednesday, December 11, 2013 18:00 To: Dave Landsman Cc: Dharani Kotte Subject: Would you please help to resolve a few OFA NVMe driver problems ? Dave and Dharani, There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems.
Intel has agreed to work on the following two problems: - remove #define for CHATHAM2 - Learning of CPU core to Vector failure handling I am also asking other companies to work on some of the issues. I wonder if your company can work on the following three problems: - Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset - Controller reset does not handle all cases - orphaned requests Please let me know if your company can work on these three issues. Thanks -Kwok ________________________________ PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
From Dharani.Kotte at sandisk.com Sat Mar 8 11:59:25 2014 From: Dharani.Kotte at sandisk.com (Dharani Kotte) Date: Sat, 8 Mar 2014 19:59:25 +0000 Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes In-Reply-To: References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com> <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com> <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com> Message-ID: <23EC73C80FB59046A6B7B8EB7B3826593DA86AA6@SACMBXIP01.sdcorp.global.sandisk.com> Sure, I will try to reproduce this issue on my system on Monday. Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, March 07, 2014 2:40 PM To: Foster, Carolyn D; Dharani Kotte; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Dharani, I'd suggest initializing the SRB extension via NVMeInitSrbExtension for all cases in BuildIO to avoid this BSOD in the future. Thanks, Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Friday, March 07, 2014 2:16 PM To: Alex Chang; Dharani Kotte; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Dharani, Unfortunately I am still seeing a D1 BSOD when I try to step through the new NVMeProcessAbortLunReset code. This time I was able to spend a little more time looking at this failure and can give you more information about it.
On line 1316 of nvmeStd.c, where NVMeIssueAbortCmd is called, the parameter passed in is the pResetSrbExt that was retrieved from the reset Srb. Then in NVMeIssueAbortCmd, the device extension is pulled out of that resetSrbExt. Unfortunately, that SRB extension is all 0s because it was never initialized, so a null pointer is passed into ProcessIo, which causes the BSOD. So far, the changes to HwResetBus seem to be working ok. I will follow up with you separately on the steps I took to reproduce this failure. Carolyn From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Wednesday, March 05, 2014 11:13 AM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Great! Thank you very much, Dharani, for the quick fixes. Hi all, Please review/test the patch and provide your feedback. If no big changes are required, I will start collecting approvals from Intel and LSI early next week. Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, March 05, 2014 9:03 AM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes
Hi Alex, Attached is the patch with the changes incorporated and the BSOD fix. The BSOD fix is in nvmeIo.c line 640. Password: sndk1234 Thanks, Dharani. From: Dharani Kotte Sent: Tuesday, February 25, 2014 3:25 PM To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, Sure, I will go over the list below and make changes accordingly next week. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 25, 2014 2:09 PM To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes I left out one more: In Line#1228 of nvmeInit.c, NVMeWaitOnRdy is now called directly instead of assigning NVMeWaitOnRdy to NextDriverState. I think the assignment is still required before the initialization state machine starts. Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, February 25, 2014 2:03 PM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn and Dharani, I basically agree with the suggested changes in HwStorResetBus. However, I have some feedback: 1. In Line#2161, NVMeResetAdapter is called, and it returns only after making sure the RDY bit has cleared to 0, so why do we need the 10 ms delay? The identical delay in RecoveryDpcRoutine was added because the original NVMeResetAdapter did not wait until the RDY bit cleared to 0. Due to the changes in NVMeResetAdapter, we need to remove the 10 ms delay in RecoveryDpcRoutine as well. 2.
In Line#2216, StorPortPause is called with 60 seconds to force Storport to hold requests. I am not sure 60 seconds is a proper assumption. Instead, calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" seems a better idea to me. 3. In the definition of HwStorResetBus, the routine returns TRUE in the successful case. We need to take care of the failure cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in the successful case. Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 25, 2014 4:28 AM To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn, Line 1384: I can take care of this item. Line 2219: StorPortSynchronizeAccess: this was requested by Samsung, as suggested by Judy. Below is the reference mail. In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts, which eventually cause the driver to be called at its HwStorResetBus entry point (NVMeResetBus). I have some feedback on the current architecture of that routine: 1. Among other things, NVMeResetBus schedules a DPC to complete any pending commands. This creates a situation where, upon return from this entry point, there are still commands outstanding that don't get completed until the DPC runs. According to the WDK, this doesn't appear to be legal; all outstanding commands have to be completed by the HwStorResetBus routine before it returns: HwResetBus Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure.
For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI). and "HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value." and "The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock. A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary." Since HwStorResetBus must finish its work before returning, it can't schedule a DPC to do so later on. The logic that schedules a DPC should be removed. 2. Code should be added to call StorPortPause() to hold off any new requests until StorPortResume() is called. 3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine in the NVMe driver should also be added for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo since the port driver calls it only after acquiring the StartIo spinlock. 4. We should implement a driver-internal global (per "adapter") flag signifying we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware. 5. Code should be added to call StorPortResume() when all work is complete. 6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above. Thanks, Judy Thanks, Dharani.
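Taken together, Judy's points 1 to 5 and Alex's feedback describe an ordering for the reset-bus routine: block new I/O, reset synchronously, complete every outstanding request with a bus-reset status before returning, and resume only if the reset succeeded. The following is a minimal, portable C sketch of that ordering under stated assumptions. It is not the OFA driver's code and uses no WDK headers: MOCK_ADAPTER, MOCK_SRB, MockResetBus, and SimulatedControllerReset are hypothetical stand-ins, and the `busy`/`resumeCalls` fields only mark where StorPortBusy and StorPortResume would be called.

```c
/* Hypothetical, portable model of an HwStorResetBus-style routine.
 * No WDK types or calls are used; MOCK_* names are stand-ins. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OUTSTANDING 8

typedef enum {
    MOCK_SRB_STATUS_PENDING = 0,
    MOCK_SRB_STATUS_BUS_RESET      /* stands in for SRB_STATUS_BUS_RESET */
} MOCK_SRB_STATUS;

typedef struct {
    bool            inUse;         /* request is outstanding at the hardware */
    MOCK_SRB_STATUS status;
} MOCK_SRB;

typedef struct {
    bool     resetInProgress;      /* per-adapter flag (Judy's point 4) */
    bool     busy;                 /* where StorPortBusy would take effect */
    int      resumeCalls;          /* where StorPortResume would be called */
    bool     controllerResponds;   /* simulation input: does the reset succeed? */
    MOCK_SRB outstanding[MAX_OUTSTANDING];
} MOCK_ADAPTER;

/* Hypothetical placeholder for: CC.EN=0, wait for RDY=0, then re-initialize. */
static bool SimulatedControllerReset(MOCK_ADAPTER *a)
{
    return a->controllerResponds;
}

/* Complete every outstanding request with the bus-reset status (Judy's point 1). */
static int CompleteAllOutstanding(MOCK_ADAPTER *a)
{
    int completed = 0;
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (a->outstanding[i].inUse) {
            a->outstanding[i].status = MOCK_SRB_STATUS_BUS_RESET;
            a->outstanding[i].inUse = false;
            completed++;
        }
    }
    return completed;
}

/* All work is done before returning; nothing is deferred to a DPC. */
static bool MockResetBus(MOCK_ADAPTER *a)
{
    assert(a != NULL);

    a->resetInProgress = true;     /* keep new I/O away from the hardware */
    a->busy = true;                /* Alex's point 2: busy instead of a 60 s pause */

    bool ok = SimulatedControllerReset(a);

    /* Required before returning, whether or not the reset worked. */
    CompleteAllOutstanding(a);

    if (!ok) {
        return false;              /* Alex's point 3: report failure, no resume */
    }
    a->resetInProgress = false;
    a->busy = false;
    a->resumeCalls++;              /* resume only on success */
    return true;
}
```

On the failure path the sketch still completes the outstanding requests, because the WDK text quoted above requires that before HwStorResetBus returns, but it leaves I/O blocked and skips the resume, which is one reading of Alex's point 3.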
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Monday, February 24, 2014 3:51 PM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex and Dharani, I have been reviewing the code and performing some tests, and I have some concerns about this patch. In nvmeStd.c: Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands. When the LUN reset comes in, we will set CC.EN to 0, and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." This reset behavior has been accounted for in the driver, by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. What should happen here is that the new NVMeProcessAbortLunReset function should be moved under SRB_FUNCTION_ABORT_COMMAND only, and it should then send only one abort rather than aborting all outstanding commands. Also, during testing, I hit a D1 BSOD when I tried to step through the code. I ran IO and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell. Storport should then time out the I/O and send a LUN reset. Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with StartIo and the interrupt DPC.
Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, February 19, 2014 10:06 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thank you, Dharani. Hi all, Please review/test the attached reset fix patch from Sandisk and provide your feedback. Thank you very much, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, February 19, 2014 9:00 AM To: Alex Chang Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Hi Alex, Attached is the patch source for review. I have tested it with I/O running overnight. Areas to focus on when testing this patch: 1. Test abort/LUN resets. 2. Test chip reset. 3. Test the format command. 4. Test the firmware download command. Password is "sndk1234". Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 18, 2014 12:15 PM To: Dharani Kotte Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Great!
Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 18, 2014 12:14 PM To: Alex Chang Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes I'm just testing after merging the code; I should be able to send it tomorrow morning. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 18, 2014 12:13 PM To: Dharani Kotte Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Hi Dharani, Just a friendly reminder, could you please send out your patch as soon as it's ready? Many thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, February 14, 2014 10:18 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Sure, Alex. Dharani. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Friday, February 14, 2014 10:17 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes Good morning, Dharani, As you may know, both the Intel and Huawei patches have been added to the OFA source base. You may now rebase your changes and send out a patch for review/test. Thank you very much for contributing the fixes. Regards, Alex
From Dharani.Kotte at sandisk.com Wed Mar 12 08:59:39 2014 From: Dharani.Kotte at sandisk.com (Dharani Kotte) Date: Wed, 12 Mar 2014 15:59:39 +0000 Subject: [nvmewin] ***UNCHECKED*** [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes In-Reply-To: References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com> <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com> <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com> Message-ID: <23EC73C80FB59046A6B7B8EB7B3826593DA8A9A6@SACMBXIP01.sdcorp.global.sandisk.com> Hi Alex, Attached is the patch with the changes incorporated; it tested well yesterday. Password: sndk1234 Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, March 07, 2014 2:40 PM To: Foster, Carolyn D; Dharani Kotte; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Dharani, I'd suggest initializing the SRB extension via NVMeInitSrbExtension for all cases in BuildIO to avoid this BSOD in the future. Thanks, Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Friday, March 07, 2014 2:16 PM To: Alex Chang; Dharani Kotte; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Dharani, Unfortunately I am still seeing a D1 BSOD when I try to step through the new NVMeProcessAbortLunReset code.
This time I was able to spend a little more time looking at this failure and can give you more information about it. On line 1316 of nvmeStd.c, where NVMeIssueAbortCmd is called, the parameter passed in is the pResetSrbExt that was retrieved from the reset Srb. Then in NVMeIssueAbortCmd, the device extension is pulled out of that resetSrbExt. Unfortunately, that SRB extension is all 0s because it was never initialized, so a null pointer is passed into ProcessIo, which causes the BSOD. So far, the changes to HwResetBus seem to be working ok. I will follow up with you separately on the steps I took to reproduce this failure. Carolyn From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Wednesday, March 05, 2014 11:13 AM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Great! Thank you very much, Dharani, for the quick fixes. Hi all, Please review/test the patch and provide your feedback. If no big changes are required, I will start collecting approvals from Intel and LSI early next week. Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, March 05, 2014 9:03 AM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes
Hi Alex, Attached is the patch with the changes incorporated and the BSOD fix. The BSOD fix is in nvmeIo.c line 640. Password: sndk1234 Thanks, Dharani. From: Dharani Kotte Sent: Tuesday, February 25, 2014 3:25 PM To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, Sure, I will go over the list below and make changes accordingly next week. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 25, 2014 2:09 PM To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes I left out one more: In Line#1228 of nvmeInit.c, NVMeWaitOnRdy is now called directly instead of assigning NVMeWaitOnRdy to NextDriverState. I think the assignment is still required before the initialization state machine starts. Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, February 25, 2014 2:03 PM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn and Dharani, I basically agree with the suggested changes in HwStorResetBus. However, I have some feedback: 1. In Line#2161, NVMeResetAdapter is called, and it returns only after making sure the RDY bit has cleared to 0, so why do we need the 10 ms delay? The identical delay in RecoveryDpcRoutine was added because the original NVMeResetAdapter did not wait until the RDY bit cleared to 0. Due to the changes in NVMeResetAdapter, we need to remove the 10 ms delay in RecoveryDpcRoutine as well. 2.
In Line#2216, StorPortPause is called with 60 seconds to force Storport to hold requests. I am not sure 60 seconds is a proper assumption. Instead, calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" seems a better idea to me. 3. In the definition of HwStorResetBus, the routine returns TRUE in the successful case. We need to take care of the failure cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in the successful case. Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 25, 2014 4:28 AM To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn, Line 1384: I can take care of this item. Line 2219: StorPortSynchronizeAccess: this was requested by Samsung, as suggested by Judy. Below is the reference mail. In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts, which eventually cause the driver to be called at its HwStorResetBus entry point (NVMeResetBus). I have some feedback on the current architecture of that routine: 1. Among other things, NVMeResetBus schedules a DPC to complete any pending commands. This creates a situation where, upon return from this entry point, there are still commands outstanding that don't get completed until the DPC runs. According to the WDK, this doesn't appear to be legal; all outstanding commands have to be completed by the HwStorResetBus routine before it returns: HwResetBus Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure.
For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI) and HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value. and The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock. A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary Since HwStorResetBus must finish its work before returning; it can't schedule a DPC to do so later on. The logic which schedules a DPC should be removed. 2. Code should be added to call StorPortPause() to hold off any new requests till StorPortResume() is called. 3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine in the NVMe driver should also be added for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo since the port driver calls it only after acquiring the StartIo spinlock. 4. We should implement a driver-internal global (per "adapter") flag signifying we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware. 5. Code should be added to call StorPortResume() when all work is complete. 6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above. Thanks, Judy Thanks, Dharani. 
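Taken together, Alex's feedback and Judy's list describe a reset handler that waits for CSTS.RDY to read 0 itself (so no fixed 10 ms delay is needed afterwards), holds off new I/O while the reset is in progress, completes every outstanding request with a bus-reset status before returning instead of deferring that work to a DPC, and resumes only on success. The user-mode C model below is a sketch of that control flow under loudly stated assumptions: the `adapter` structure, the simulated CC.EN/CSTS.RDY fields and all function names are hypothetical stand-ins, not StorPort routines or code from the patch, and the StorPortPause/StorPortBusy/StorPortResume calls discussed in the thread appear only as comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of these names are StorPort or driver APIs. */
enum req_status { REQ_PENDING, REQ_SUCCESS, REQ_BUS_RESET };

#define MAX_OUTSTANDING 8

struct adapter {
    bool cc_en;               /* simulated CC.EN bit */
    bool csts_rdy;            /* simulated CSTS.RDY bit */
    int  rdy_clear_delay;     /* polls before the "device" drops RDY; -1 = never */
    bool reset_in_progress;   /* per-adapter "busy with reset" flag (Judy's item 4) */
    enum req_status outstanding[MAX_OUTSTANDING];
    size_t n_outstanding;
};

/* Simulated CSTS read: once CC.EN is 0, RDY drops after rdy_clear_delay polls. */
static bool read_csts_rdy(struct adapter *a)
{
    if (!a->cc_en && a->csts_rdy && a->rdy_clear_delay >= 0) {
        if (a->rdy_clear_delay == 0)
            a->csts_rdy = false;
        else
            a->rdy_clear_delay--;
    }
    return a->csts_rdy;
}

/* Clear CC.EN, then poll until CSTS.RDY reads 0 or the poll budget runs out.
 * Because the wait happens here, callers need no extra fixed delay. */
static bool disable_and_wait_not_ready(struct adapter *a, int max_polls)
{
    a->cc_en = false;
    for (int i = 0; i < max_polls; i++) {
        if (!read_csts_rdy(a))
            return true;
        /* a real miniport would stall briefly between polls */
    }
    return false;
}

/* New I/O is refused while a reset is in progress (caller reports "busy"). */
static bool submit_io(struct adapter *a)
{
    if (a->reset_in_progress || a->n_outstanding == MAX_OUTSTANDING)
        return false;
    a->outstanding[a->n_outstanding++] = REQ_PENDING;
    return true;
}

/* Model of a reset-bus handler: all outstanding requests are completed with a
 * bus-reset status before returning; nothing is deferred to a later DPC. */
static bool reset_bus(struct adapter *a, int max_polls, size_t *completed)
{
    *completed = 0;
    a->reset_in_progress = true;            /* stands in for pause/busy */

    if (!disable_and_wait_not_ready(a, max_polls))
        return false;                        /* failure: no resume; requests untouched */

    for (size_t i = 0; i < a->n_outstanding; i++) {
        if (a->outstanding[i] == REQ_PENDING) {
            a->outstanding[i] = REQ_BUS_RESET;
            (*completed)++;
        }
    }
    a->n_outstanding = 0;                    /* completed slots are released */

    a->cc_en = true;                         /* re-enable; RDY returns at once here */
    a->csts_rdy = true;
    a->reset_in_progress = false;            /* stands in for resume, success only */
    return true;
}
```

Keeping the RDY poll inside the reset helper is what lets both the reset-bus path and the recovery path drop a fixed delay. The poll budget here is an assumed parameter; a real driver would more plausibly bound the wait by the controller's CAP.TO value from the NVMe specification. What the driver should do with outstanding requests when the reset fails is not settled in the thread, so the model simply leaves them pending and keeps new I/O held off.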
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D
Sent: Monday, February 24, 2014 3:51 PM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex and Dharani,

I have been reviewing the code and performing some tests, and I have some concerns about this patch.

In nvmeStd.c:

Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands. When the LUN reset comes in, we will set CC.EN to 0, and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." This reset behavior has been accounted for in the driver by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. What should happen here is that the new NVMeProcessAbortLunReset function is moved under SRB_FUNCTION_ABORT_COMMAND only, and then it should send only one abort rather than aborting all outstanding commands.

Also, during testing, I hit a D1 BSOD when I tried to step through the code. I ran IO and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell. The IO should be timed out by Storport, which will then send a LUN reset.

Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with StartIo and the interrupt DPC.

Thanks,
Carolyn

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Wednesday, February 19, 2014 10:06 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you, Dharani.

Hi all,

Please review/test the attached reset fix patch from Sandisk and provide your feedback.

Thank you very much,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, February 19, 2014 9:00 AM
To: Alex Chang
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Alex,

The attached is the patch source for review. I have tested it with I/O running overnight. Areas to focus on when testing this patch:

1. Test abort/LUN resets.
2. Test chip reset.
3. Test the format command.
4. Test the firmware download command.

Password is "sndk1234"

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:15 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Great!

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 18, 2014 12:14 PM
To: Alex Chang
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Just testing after merging the code; I should be able to send it tomorrow morning.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:13 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Just a friendly reminder: could you please send out your patch as soon as it's ready?

Many thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, February 14, 2014 10:18 AM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Sure Alex.

Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Friday, February 14, 2014 10:17 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Good morning, Dharani,

As you may know, both the Intel and Huawei patches have been added to the OFA source base. Now you may rebase your changes and send a patch out for review/test. Thank you very much for contributing the fixes.

Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, January 15, 2014 2:08 PM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Alex,

The attached is the source for preliminary review. I have run the IO and SCSI compliance tests. I don't have a drive which supports abort/LUN resets, and I'm not sure how to test the format command.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:54 AM
To: Dharani Kotte; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Happy Holidays to you all.

Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 11:52 AM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Thank you for the explanation. Sure, I will take a look. Happy Holidays.

Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:44 AM
To: Kwok Kong; Dharani Kotte; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Dharani,

A controller reset can be issued either by the host or by the driver itself. Currently, the driver seems to handle both in the same manner via a single entry point, "NVMeResetController". In the "from the host" case, the driver needs to separate the SRB_FUNCTION_RESET_... requests from the NVME_RESET_DEVICE ioctl request in terms of how pending IOs are handled. In the "driver itself" case, the related error recovery code needs to be re-examined as well. Judy from Samsung suggested referring to the storahci.sys driver sample code for Windows 7/8 for reset bus logic examples and detailed recommendations.

Thank you,
Alex

From: Kwok Kong
Sent: Friday, December 20, 2013 9:08 AM
To: Dharani Kotte; Akshay Mathur; Alex Chang
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Dharani,

Yes, these are the three areas that you are committed to.

Alex,

Please send more details on "Controller reset does not handle all cases" to Dharani.

Thanks
-Kwok

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 9:02 AM
To: Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Kwok,

I think the items below are the ones we are committing to:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- Orphaned requests

Can somebody provide a little more detail on the expectation for the item "Controller reset does not handle all cases"?

Thanks,
Dharani.

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Thursday, December 19, 2013 6:53 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Excellent! Your help is much appreciated.

Dharani,

Please let me know if you have any questions.

Happy holidays to all of you.

-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Thursday, December 19, 2013 6:51 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,

You are welcome. We are pleased to contribute to the community and appreciate you driving it! We will try our best to complete the implementation by the end of January, but we may not be able to complete comprehensive testing by then. This is because of overlaps with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks. Anyway, Dharani will be in touch with you as he makes progress.

Thanks
Akshay

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Tuesday, December 17, 2013 4:21 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Akshay,

Thanks for your willingness to contribute to the driver. I am looking for a patch before the end of Jan 2014; the earlier the better. Please let me know if Sandisk can commit to that. Your help is much appreciated.

Thanks
-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Tuesday, December 17, 2013 4:11 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman; Akshay Mathur
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,

I manage the software and driver development team at SanDisk/ESS. We are certainly willing to contribute to fixing the problems listed below, but before we can commit, we would like clarification on the timeline, i.e., by when these fixes are expected to be completed.

Thanks
Akshay Mathur
Sr Software Manager, Enterprise Storage Solutions
951 SanDisk Drive, Building #5 | Milpitas, CA 95035 U.S.A. | Direct +1 408.801.1336 | Cell +1 856.607.7323 | Corporate +1 408.801.1000 | Akshay.Mathur at sandisk.com

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Wednesday, December 11, 2013 18:00
To: Dave Landsman
Cc: Dharani Kotte
Subject: Would you please help to resolve a few OFA NVMe driver problems ?

Dave and Dharani,

There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems. Intel has agreed to work on the following two problems:
- Remove #define for CHATHAM2
- Learning of CPU core to vector failure handling

I am also asking other companies to work on some of the issues. I wonder if your company can work on the following three problems:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- Orphaned requests

Please let me know if your company can work on these three issues.

Thanks
-Kwok

________________________________
PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 9449 bytes
Desc: image001.jpg
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: source_sndk_03_12_2014.zip
Type: application/x-zip-compressed
Size: 177065 bytes
Desc: source_sndk_03_12_2014.zip
URL:

From Alex.Chang at pmcs.com Wed Mar 12 09:49:24 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Wed, 12 Mar 2014 16:49:24 +0000
Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes
In-Reply-To: <11860_1394639990_53208476_11860_6208_1_23EC73C80FB59046A6B7B8EB7B3826593DA8A9A6@SACMBXIP01.sdcorp.global.sandisk.com>
References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com> <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com> <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com> <11860_1394639990_53208476_11860_6208_1_23EC73C80FB59046A6B7B8EB7B3826593DA8A9A6@SACMBXIP01.sdcorp.global.sandisk.com>
Message-ID:

Thank you very much, Dharani.

Hi all,

Please review/test the revised patch as soon as possible. We need to speed up wrapping up this patch.

Thank you!
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, March 12, 2014 9:00 AM
To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,

The attached is the patch with the fixes incorporated, and it tested well yesterday.

Password: sndk1234

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, March 07, 2014 2:40 PM
To: Foster, Carolyn D; Dharani Kotte; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

I'd suggest initializing the SRB extension via NVMeInitSrbExtension for all cases in BuildIO to avoid this BSOD in the future.

Thanks,
Alex

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Friday, March 07, 2014 2:16 PM
To: Alex Chang; Dharani Kotte; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Unfortunately I am still seeing a D1 BSOD when I try to step through the new NVMeProcessAbortLunReset code. I was able to spend a little more time looking at this failure this time and can give you more information about it. On line 1316 of nvmeStd.c, where NVMeIssueAbortCmd is called, the parameter passed in is the pResetSrbExt that was retrieved from the reset SRB. Then, in IssueAbortCmd, the device extension is pulled out of that resetSrbExt. Unfortunately, that SRB extension is all 0s because it was never initialized, so a null pointer is passed into ProcessIo, which causes the BSOD.

So far, the changes to HwResetBus seem to be working OK. I will follow up with you separately on the steps I took to reproduce this failure.

Carolyn

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Wednesday, March 05, 2014 11:13 AM
To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Great! Thank you very much, Dharani, for the quick fixes.

Hi all,

Please review/test the patch and provide your feedback. If no big changes are required, I will start to collect approvals from Intel and LSI early next week.

Regards,
Alex
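Carolyn's diagnosis and Alex's suggestion in this thread come down to one ordering rule: every SRB extension should be initialized in the build-I/O path before any function-specific code, including the reset and abort handling, reads from it. The portable C model below is a sketch of that rule under stated assumptions: `struct srb`, `struct srb_ext`, `struct dev_ext`, `init_srb_ext()`, `build_io()` and `issue_abort()` are all hypothetical stand-ins. The thread names NVMeInitSrbExtension but does not show its signature, so this is not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the driver's structures. */
struct dev_ext { int id; };

struct srb;                      /* forward declaration */

struct srb_ext {
    struct dev_ext *dev;         /* back-pointer later I/O paths rely on */
    struct srb *srb;             /* owning request */
};

struct srb {
    int function;                /* request type; values are arbitrary in this model */
    struct srb_ext ext;          /* handed to the miniport zero-filled */
};

/* Loosely modeled on the NVMeInitSrbExtension idea from the thread. */
static void init_srb_ext(struct srb_ext *ext, struct dev_ext *dev, struct srb *srb)
{
    memset(ext, 0, sizeof *ext);
    ext->dev = dev;
    ext->srb = srb;
}

/* Build-I/O model: initialize the extension unconditionally, for every
 * request type, before any function-specific preparation. */
static void build_io(struct dev_ext *dev, struct srb *srb)
{
    init_srb_ext(&srb->ext, dev, srb);
    /* function-specific preparation (read/write, reset, abort, ...) follows */
}

/* Abort path that, like the code Carolyn describes, takes the device
 * extension from the reset request's SRB extension. The NULL check only
 * makes the failure visible in this model; the fix being illustrated is
 * the unconditional initialization in build_io(). */
static int issue_abort(const struct srb_ext *reset_ext)
{
    if (reset_ext->dev == NULL)
        return -1;
    return reset_ext->dev->id;
}
```

Initializing at the top of the build-I/O path, rather than per request type, means a reset or abort request carries a valid back-pointer no matter which handler later consumes its extension.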
From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 25, 2014 2:09 PM To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Just left out one more : In Line#1228 of nvmeInit.c, NVMeWaitOnRdy is called to replace assigning NextDriverState as NVMeWaitOnRdy. I think the assignment is still required before the initialization state machine starts. Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, February 25, 2014 2:03 PM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn and Dharani, I basically agree the suggested changes in HwStorResetBus. However, I have some feedback: 1. In Line#2161, NVMeResetAdapter is called and it returns after making sure RDY bit is cleared as 0, why we need 10 ms delay ? The exact same delay added in RecoveryDpcRoutine was because original NVMeResetAdater did not wait until RDY bit is cleared as 0. Due to the changes in NVMeResetAdapter, we need to remove the 10 ms delay in RecoveryDpcRoutine as well. 2. In Line#2216, StorPortPause is called with 60 seconds to force Storport hold up requests. I am not sure 60 seconds is proper assumption. Instead, calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" seems better idea to me. 3. In the definition of HwStorResetBus, the routine returns TRUE in successful case. We need to take care of failed cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in successful case. 
Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 25, 2014 4:28 AM To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn, Line 1384: I can take care of this item. Line 2219: StorPortSynchronizeAccess, This is the request from Samsung suggested by Judy. Below is the reference mail. In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts which eventually causes the driver to be called at it's HwStorResetBus entry point (NVMeResetBus). I have some feedback on the current architecture of that routine: 1. Among other things, NMeResetBus schedules a DPC to complete any pending commands. This creates a situation where upon return from this entry point, there are still cmds outstanding which don't get completed till the DPC runs. According to the WDK, this doesn't appear to be legal - all outstanding cmds have to be completed by the HwStorResetBus routine before it returns: HwResetBus Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure. For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI) and HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value. and The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock. 
A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary Since HwStorResetBus must finish its work before returning; it can't schedule a DPC to do so later on. The logic which schedules a DPC should be removed. 2. Code should be added to call StorPortPause() to hold off any new requests till StorPortResume() is called. 3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine in the NVMe driver should also be added for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo since the port driver calls it only after acquiring the StartIo spinlock. 4. We should implement a driver-internal global (per "adapter") flag signifying we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware. 5. Code should be added to call StorPortResume() when all work is complete. 6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above. Thanks, Judy Thanks, Dharani. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Monday, February 24, 2014 3:51 PM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex and Dharni, I have been reviewing the code and performing some tests and I have some concerns about this patch. In nvmeStd.c: Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands. 
When the LUN reset comes in, we will set CC.EN to 0 and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." This reset behavior has been accounted for in the driver, by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. What should happen here is that the new processAbortLun function should be moved under the SRB_FUNCTION_ABORT_COMMAND only. Then the procesAbortLunReset function should only send one abort and not abort all outstanding commands. Also, during testing, I hit a D1 BSOD when I tried to step through the code. I ran IO and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell. The IO should be timed out by storport, which will then send a reset lun. Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with Start IO and the interrupt DPC. Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, February 19, 2014 10:06 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thank you, Dharani. Hi all, Please review/test the attached reset fix patch from Sandisk and provide your feedbacks. 
Thank you very much, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, February 19, 2014 9:00 AM To: Alex Chang Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Alex, The attached is the patch source for review. I have tested the I/O running over night. Areas need to be focused for test this patch: 1. Test abort/LUN resets. 2. Test chip reset. 3. Test the format command. 4.Test Firmware download command. Password is "sndk1234" Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 18, 2014 12:15 PM To: Dharani Kotte Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Great! Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 18, 2014 12:14 PM To: Alex Chang Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Just testing after merging the code it I should be able to send it tomorrow morning. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 18, 2014 12:13 PM To: Dharani Kotte Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Hi Dharani, Just a friendly reminder, could you please send out your patch as soon as it's ready? 
Many thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, February 14, 2014 10:18 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Sure Alex. Dharani. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Friday, February 14, 2014 10:17 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes Good morning, Dharani, As you may know, both Intel and Huawei patches had been added into OFA source base. Now, you may re-base your changes and send a patch out for review/test. Thank you very much for contributing the fixes. Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, January 15, 2014 2:08 PM To: Alex Chang; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ? Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Alex, The attached is the source for the preliminary review. I have tested the IO and scsi compliance test. I don't have a drive which supports abort/lun resets, not sure how to test the format command. Thanks, Dharani. 
From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, December 20, 2013 11:54 AM To: Dharani Kotte; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Happy Holidays to you all. Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, December 20, 2013 11:52 AM To: Alex Chang; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Thank you for the explanation. Sure I will take look. Happy Holidays. Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, December 20, 2013 11:44 AM To: Kwok Kong; Dharani Kotte; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Hi Dharani, The controller reset can be issued from either from the host or the driver itself. Currently, the driver seems handling them in the same manner via single entry "NVMeResetController". In the case of "from the host", the driver needs to separate the cases of SRB_FUNCTION_RESET_... requests from the ioctl request of NVME_RESET_DEVICE in the sense of handling pending IOs. In the case of "the driver itself", needs to re-exam the related error recovery codes as well. Judy from Samsung suggested referring the storahci.sys driver sample codes for Windows 7/8 based on reset bus logic examples and detailed recommendations. Thank you, Alex From: Kwok Kong Sent: Friday, December 20, 2013 9:08 AM To: Dharani Kotte; Akshay Mathur; Alex Chang Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Dharani, Yes, these are the three areas that you are committed to. Alex, Please send more details on the "Controller reset does not handle all cases" to Dharani. 
Thanks -Kwok From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, December 20, 2013 9:02 AM To: Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Hi Kwok, I think the following are the items that we are committing to: - Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset - Controller reset does not handle all cases - orphaned requests Can somebody provide a little more detail on the expectation for the item "Controller reset does not handle all cases"? Thanks, Dharani. From: Kwok Kong [mailto:Kwok.Kong at pmcs.com] Sent: Thursday, December 19, 2013 6:53 PM To: Akshay Mathur Cc: Dharani Kotte; Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Excellent! Your help is much appreciated. Dharani, Please let me know if you have any questions. Happy holidays to all of you. -Kwok From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com] Sent: Thursday, December 19, 2013 6:51 PM To: Kwok Kong Cc: Dharani Kotte; Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Kwok, You are welcome. We are pleased to contribute to the community and appreciate you driving it! We will try our best to complete the implementation by the end of January, but we may not be able to complete comprehensive testing by that time. This is because of overlaps with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks. Anyway, Dharani will be in touch with you as he makes progress. Thanks Akshay From: Kwok Kong [mailto:Kwok.Kong at pmcs.com] Sent: Tuesday, December 17, 2013 4:21 PM To: Akshay Mathur Cc: Dharani Kotte; Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Akshay, Thanks for your willingness to contribute to the driver. I am looking for a patch before the end of Jan 2014, the earlier the better.
Please let me know if Sandisk can commit to that. Your help is much appreciated. Thanks -Kwok From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com] Sent: Tuesday, December 17, 2013 4:11 PM To: Kwok Kong Cc: Dharani Kotte; Dave Landsman; Akshay Mathur Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Kwok, I manage the Software and driver development team at SanDisk/ESS. We are certainly willing to contribute to fixing the problems listed below, but before we can commit, we would like to get clarification on the timeline, i.e., by when these fixes are expected to be completed. Thanks Akshay Mathur Sr Software Manager, Enterprise Storage Solutions 951 SanDisk Drive, Building #5 | Milpitas, CA 95035 U.S.A. | Direct +1 408.801.1336 | Cell +1 856.607.7323 | Corporate +1 408.801.1000 | Akshay.Mathur at sandisk.com From: Kwok Kong [mailto:Kwok.Kong at pmcs.com] Sent: Wednesday, December 11, 2013 18:00 To: Dave Landsman Cc: Dharani Kotte Subject: Would you please help to resolve a few OFA NVMe driver problems ? Dave and Dharani, There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems. Intel has agreed to work on the following two problems: - remove #define for CHATHAM2 - Learning of CPU core to Vector failure handling I am also making requests to other companies to work on some of the issues. I wonder if your company can work on the following three problems: - Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset - Controller reset does not handle all cases - orphaned requests Please let me know if your company can work on these issues. Thanks -Kwok
From judy.brock at ssi.samsung.com Sun Mar 16 07:42:32 2014 From: judy.brock at ssi.samsung.com (Judy Brock-SSI) Date: Sun, 16 Mar 2014 14:42:32 +0000 Subject: [nvmewin] Static Driver Verification In-Reply-To: References: Message-ID: <36E8D38D6B771A4BBDB1C0D800158A516B61093F@SSIEXCH-MB3.ssi.samsung.com> >> To work around this, I went into sdv_storport.h (WindowsKits/8.1/Tools/sdv/osmodel/storport). I made the StorPortReadRegisterUlong64 prototype there match the prototype in storport.h. I've reported this to Microsoft and they are working on a permanent fix. We hit this some time back too. Rather than modify an MS header file, we commented out the use of StorPortReadRegisterUlong64 (as highlighted below), since SDV needs to be error-free for the reasons mentioned below. I think we should consider changing the driver to do the same to avoid the problem.
#if 0 //#if (NTDDI_VERSION > NTDDI_WIN7) && defined(_WIN64)
    CAP.AsUlonglong = StorPortReadRegisterUlong64(pAE, (PULONG64)(&pAE->pCtrlRegister->CAP));
#else
    CAP.HighPart = StorPortReadRegisterUlong(pAE, (PULONG)(&pAE->pCtrlRegister->CAP.HighPart));
    CAP.LowPart = StorPortReadRegisterUlong(pAE, (PULONG)(&pAE->pCtrlRegister->CAP.LowPart));
#endif
Thanks, Judy -----Original Message----- From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Thomas.Freeman at hgst.com Sent: Tuesday, February 04, 2014 6:54 AM To: Speer, Kenny Cc: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Static Driver Verification I'm working with code from revision_1.2. Tom Freeman SSD Device Drivers/Firmware HGST, a Western Digital company Thomas.Freeman at HGST.com 507-322-2311/232311 From: "Speer, Kenny" To: "Thomas.Freeman at hgst.com" Date: 02/03/2014 06:15 PM cc: "nvmewin at lists.openfabrics.org" Subject: RE: [nvmewin] Static Driver Verification I did briefly but ran into other issues. Which branch are you building from? -----Original Message----- From: Thomas.Freeman at hgst.com [mailto:Thomas.Freeman at hgst.com] Sent: Monday, February 3, 2014 4:12 PM To: Speer, Kenny Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] Static Driver Verification Kenny.... Not sure if you've tried to run this yet.... I ran into a problem running the SDV in a Windows 8/x64 config. I get a message saying that there aren't enough parms being passed into StorPortReadRegisterUlong64. To work around this, I went into sdv_storport.h (WindowsKits/8.1/Tools/sdv/osmodel/storport). I made the StorPortReadRegisterUlong64 prototype there match the prototype in storport.h. I've reported this to Microsoft and they are working on a permanent fix.
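The #else branch that Judy keeps assembles the 64-bit CAP value from two 32-bit reads, which sidesteps the StorPortReadRegisterUlong64 prototype mismatch under SDV. As a hedged illustration of why the two forms are equivalent, here is a minimal user-mode sketch. MOCK_CTRL_REGS, MockReadRegisterUlong, ReadCapSplit and CapTimeoutMs are hypothetical names standing in for pAE->pCtrlRegister and StorPortReadRegisterUlong; they are not code from the driver. The sketch assumes the little-endian register layout and relies on CAP being a read-only capabilities register, so its two halves should not change between the reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock of the first 8 bytes of the NVMe register map:
   CAP sits at offset 0x00, low dword first. */
typedef struct {
    volatile uint32_t CapLow;   /* CAP bits 31:0  */
    volatile uint32_t CapHigh;  /* CAP bits 63:32 */
} MOCK_CTRL_REGS;

/* Hypothetical stand-in for one 32-bit MMIO read
   (StorPortReadRegisterUlong in the real driver). */
static uint32_t MockReadRegisterUlong(const volatile uint32_t *reg)
{
    return *reg;
}

/* Assemble the 64-bit CAP value from two 32-bit reads, high half first,
   in the same order as the #else branch of the quoted code. */
static uint64_t ReadCapSplit(const MOCK_CTRL_REGS *regs)
{
    uint32_t high;
    uint32_t low;

    assert(regs != NULL);
    high = MockReadRegisterUlong(&regs->CapHigh);
    low = MockReadRegisterUlong(&regs->CapLow);
    return ((uint64_t)high << 32) | low;
}

/* Example field decode: CAP.TO (bits 31:24) is the worst-case time,
   in 500 ms units, to wait for CSTS.RDY to change state. */
static uint32_t CapTimeoutMs(uint64_t cap)
{
    return (uint32_t)((cap >> 24) & 0xFFu) * 500u;
}
```

The CAP.TO decode is included only because the reset discussion in this thread depends on how long to wait for CSTS.RDY; in the driver, the two reads would go through StorPortReadRegisterUlong exactly as in the quoted code.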
Tom Freeman SSD Device Drivers/Firmware HGST, a Western Digital company Thomas.Freeman at HGST.com 507-322-2311/232311 From: "Speer, Kenny" To: "Thomas.Freeman at hgst.com", "nvmewin at lists.openfabrics.org" Date: 02/01/2014 05:37 PM Subject: RE: [nvmewin] Static Driver Verification Yes, and IMHO those need to be fixed and the tools should run error/warning free. The majority of the warnings are due to annotations which exist in the definitions of the functions but not in their declarations. It's good practice, again IMO, to annotate, and to do it in both. See http://msdn.microsoft.com/en-us/library/windows/desktop/aa383701(v=vs.85).aspx for more info on SAL annotations. Haven't run SDV yet; I'll do that now. Not sure how long it takes on this solution, but on my other projects it takes 8 hours on a 4-core i7 with 32 GB of memory. ~kenny -----Original Message----- From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Thomas.Freeman at hgst.com Sent: Saturday, February 1, 2014 2:02 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] Static Driver Verification I'm new to HCK testing, but I understand part of the testing requires a Driver Verification Log (which contains the results from Code Analysis and Static Driver Verifier). This is required for the Static Tools Logo test (under the Device.Fundamentals category). When I run those tools, I'm getting a large number of errors. The code analysis shows over 50 warnings, and the Static Driver Verifier fails because the callback role types (sp_DRIVER_INITIALIZE, HW_INITIALIZE, etc.) are not defined. Fixing the role types allows a new set of errors to appear. I'm running with VS 2013 and WDK 8.1. Am I missing something? Has anyone else run these tools against the code?
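Kenny's advice to annotate both declarations and definitions, and Tom's note that SDV needs the callback role types, can be illustrated with a small portable sketch. It rests on assumptions: MOCK_HW_INITIALIZE, MockHwInitialize and MOCK_DEVICE_EXTENSION are hypothetical names. In the miniport itself the role types come from storport.h (HW_INITIALIZE and the others Tom lists) and _In_ comes from sal.h; the empty fallback for _In_ exists only so the sketch compiles outside the WDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outside the WDK, make the SAL annotation a no-op so this compiles;
   with the WDK, sal.h supplies the real _In_ annotation. */
#ifndef _In_
#define _In_
#endif

/* Hypothetical device extension used only by this sketch. */
typedef struct {
    bool Initialized;
} MOCK_DEVICE_EXTENSION;

/* Hypothetical role type in the style of storport.h: a function type
   (not a pointer type) that carries the parameter annotation. */
typedef bool MOCK_HW_INITIALIZE(_In_ void *DeviceExtension);

/* Declare the callback through its role type ... */
MOCK_HW_INITIALIZE MockHwInitialize;

/* ... and repeat the same annotation on the definition, so the
   declaration and definition agree. */
bool MockHwInitialize(_In_ void *DeviceExtension)
{
    MOCK_DEVICE_EXTENSION *ext = (MOCK_DEVICE_EXTENSION *)DeviceExtension;

    assert(ext != NULL);
    ext->Initialized = true;
    return true;
}
```

Declaring through a function type lets the compiler check that the definition's signature matches the role, which is the kind of declaration a role-based checker such as SDV relies on.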
Tom Freeman SSD Device Drivers/Firmware HGST, a Western Digital company Thomas.Freeman at HGST.com 507-322-2311/232311 _______________________________________________ nvmewin mailing list nvmewin at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin From judy.brock at ssi.samsung.com Sun Mar 16 10:36:14 2014 From: judy.brock at ssi.samsung.com (Judy Brock-SSI) Date: Sun, 16 Mar 2014 17:36:14 +0000 Subject: [nvmewin] PMC New Patch In-Reply-To: References: Message-ID: <36E8D38D6B771A4BBDB1C0D800158A516B6109B3@SSIEXCH-MB3.ssi.samsung.com> Hi Alex, I was wondering why the following highlighted lines, which are present in rev 83, have been deleted in the patch:
VOID SntiBuildFlushCmd(
    PNVME_SRB_EXTENSION pSrbExt
)
{
    /* Set up common portions of the NVMe WRITE command */
    memset(&pSrbExt->nvmeSqeUnit, 0, sizeof(NVMe_COMMAND));

    /* this Cmd is currently called not specific to one particular, but all namespaces */
    pSrbExt->nvmeSqeUnit.NSID = 0xFFFFFFFF;
    pSrbExt->nvmeSqeUnit.CDW0.OPC = NVM_FLUSH;
Thanks, Judy From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Thursday, February 06, 2014 6:18 PM To: nvmewin at lists.openfabrics.org; Truong Nguyen-SSI Subject: RE: PMC New Patch Hi all, Thanks to Carolyn for quick feedback that CoreNum is no longer defined in the Core Table, which causes a compile error in the debug build (Line# 1893 of nvmeinit.c). Sorry about the confusion and any inconvenience it may have caused. Please find the attached revised patch; the password remains the same.
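Judy's question concerns an assignment that a rebase can silently drop. A unit-level check along these lines could catch that kind of regression. This is a sketch under stated assumptions: MOCK_NVME_SQE is a simplified, hypothetical 64-byte submission entry rather than the driver's NVMe_COMMAND, MOCK_NVM_FLUSH uses the NVM command set Flush opcode (00h), and MOCK_NSID_ALL is the broadcast namespace ID FFFFFFFFh used in the quoted code. Whether a particular controller accepts the broadcast NSID for Flush depends on the controller and the NVMe revision it implements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_NVM_FLUSH 0x00u       /* NVM command set Flush opcode */
#define MOCK_NSID_ALL  0xFFFFFFFFu /* broadcast namespace ID */

/* Hypothetical, simplified 64-byte submission queue entry. */
typedef struct {
    uint8_t  Opc;        /* CDW0 bits 7:0   */
    uint8_t  Flags;      /* CDW0 bits 15:8  */
    uint16_t Cid;        /* CDW0 bits 31:16 */
    uint32_t Nsid;       /* CDW1            */
    uint32_t Rest[14];   /* CDW2..CDW15     */
} MOCK_NVME_SQE;

/* Build a Flush: clear the whole entry first (the SRB extension may be
   reused), then set the opcode and the namespace ID. */
static void BuildFlushCmd(MOCK_NVME_SQE *sqe, uint32_t nsid)
{
    assert(sqe != NULL);
    memset(sqe, 0, sizeof(*sqe));
    sqe->Opc = MOCK_NVM_FLUSH;
    sqe->Nsid = nsid;
}
```

Filling the entry with garbage before calling the builder, as a test would, checks both that stale fields are cleared and that the NSID survives.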
Thanks, Alex From: Alex Chang Sent: Friday, January 31, 2014 3:43 PM To: nvmewin at lists.openfabrics.org; Truong Nguyen-SSI (tru.nguyen at ssi.samsung.com) Subject: PMC New Patch Hi all, Please find the attached patch from PMC-Sierra. The password is pmc123. In order to speed up the entire process and meet our next release date, please review the changes and provide feedback as soon as possible. For each outstanding patch, we plan to collect feedback for about a week after it is sent out. A revised patch shall be sent out to include the feedback. I will follow up for approval after a week or so to allow more testing and reviewing if necessary. Summary of changes: 1. SRB Extension support for Windows 8 and up. Files changed: nvmeStd.c, nvmeSnti.c, nvmeStat.c, nvmePwrMgmt.c, nvmeInit.c and the related header files. 2. PRP list building for IOCTL and internal requests. Files changed: nvmeStd.c, nvmeInit.c and nvmestd.h. 3. Performance issue in Windows 8/Server 2012. File changed: nvmeStd.c (removed the StorPortGetUncachedExtension call in NVMeFindAdapter) 4. NVMeInitAdminQueues return value. File changed: nvmeStd.c (instead of returning TRUE/FALSE, return a Storport-defined status) 5. Paramlist length problem. Files changed: nvmeSnti.c and nvmeSntiTypes.h (program DW10 of the submission entry in DWORDs for Write Buffer command translation) 6. Stopped using mask bits as the core index to allocate/identify core tables. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 7. Implemented logical processor groups as defined by Windows. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 8. NVMe reset handling issue. File changed: nvmeStd.c (need to wait until the RDY bit is cleared to 0 after changing the EN bit from 1 to 0) 9. Core-MSI vector-Queue mapping, CMD_ENTRY synchronization and FreeQList access issues are related to using core mask bits as the core index (#6) and the lack of support for logical processor groups (#7). Platforms tested: 1. Windows 7 64-bit 2.
Windows Server 2008 R2 3. Windows 8 64-bit 4. Windows Server 2012 Tests run; 1. Installation(clean and update)/Un-Installation/Enable/Disable. 2. IOMeter 4K Read/write combining in random/sequential manners. 3. SCSC Compliance. 4. SDStress 5. Quick/full disk formats Thanks, Alex From Alex.Chang at pmcs.com Mon Mar 17 09:09:33 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Mon, 17 Mar 2014 16:09:33 +0000 Subject: [nvmewin] PMC New Patch In-Reply-To: <36E8D38D6B771A4BBDB1C0D800158A516B6109B3@SSIEXCH-MB3.ssi.samsung.com> References: <36E8D38D6B771A4BBDB1C0D800158A516B6109B3@SSIEXCH-MB3.ssi.samsung.com> Message-ID: Thank you, Judy. I added changes into nvmesnti.c on top of revision 73. The highlighted codes below was added in revision 82 by Yong, which was pushed after I sent out the patch. I certainly will re-base again after pushing Sandisk's patch and send it out for review/test. Regards, Alex From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com] Sent: Sunday, March 16, 2014 10:36 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Hi Alex, I was wondering why the following highlighted lines which are present in rev 83 have been deleted in the patch: VOID SntiBuildFlushCmd( PNVME_SRB_EXTENSION pSrbExt ) { /* Set up common portions of the NVMe WRITE command */ memset(&pSrbExt->nvmeSqeUnit, 0, sizeof(NVMe_COMMAND)); /* this Cmd is currently called not specific to one particular, but all namespaces */ pSrbExt->nvmeSqeUnit.NSID = 0xFFFFFFFF; pSrbExt->nvmeSqeUnit.CDW0.OPC = NVM_FLUSH; Thanks, Judy From Alex.Chang at pmcs.com Mon Mar 17 16:46:38 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Mon, 17 Mar 2014 23:46:38 +0000 Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes In-Reply-To: <6446_1395099377_532786F1_6446_19570_1_23EC73C80FB59046A6B7B8EB7B3826593DA8E18E@SACMBXIP01.sdcorp.global.sandisk.com> References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com> <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com> <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com> <11860_1394639990_53208476_11860_6208_1_23EC73C80FB59046A6B7B8EB7B3826593DA8A9A6@SACMBXIP01.sdcorp.global.sandisk.com> <8dcf97355cf44d88b0f91fd636740955@DM2PR07MB285.namprd07.prod.outlook.com> <6446_1395099377_532786F1_6446_19570_1_23EC73C80FB59046A6B7B8EB7B3826593DA8E18E@SACMBXIP01.sdcorp.global.sandisk.com> Message-ID: Thank you, Dharani, for the effort in preparing another revision of the patch. Hi all, Please review/test it and provide your feedback at your earliest convenience.
Once I receive the approvals from LSI and Intel, I will push the patch right away. Hi Rick and Carolyn, If you approve the patch, please let me know as well. Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Monday, March 17, 2014 4:36 PM To: Knoblaugh, Rick; Alex Chang; Foster, Carolyn D Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Rick/Alex, Attached is the code with the changes incorporated according to Rick's suggestions: removed the flags relevant to LUN reset, renamed the function to NVMeProcessAbortCmd(), and tested as suggested by Carolyn. Password: sndk1234 Thanks, Dharani. From: Knoblaugh, Rick [mailto:Rick.Knoblaugh at lsi.com] Sent: Friday, March 14, 2014 3:23 PM To: Alex Chang; Dharani Kotte; Foster, Carolyn D Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Dharani, It looks like NVMeProcessAbortLunReset is called only when SRB_FUNCTION_ABORT_COMMAND is received. As such, it would probably be good to remove the case statements in that routine that also look for SRB_FUNCTION_RESET_LOGICAL_UNIT, and the related flag can be removed, etc.
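Rick's suggestion, together with Carolyn's comment elsewhere in this thread that the abort path should send one abort rather than abort every outstanding command, amounts to building a single NVMe Abort for the one command named by SRB_FUNCTION_ABORT_COMMAND. The following is a minimal sketch of that encoding, assuming a hypothetical simplified command structure (MOCK_ADMIN_CMD, BuildAbortCmd). It follows the NVMe Abort admin command: opcode 08h, CDW10 bits 31:16 carry the command ID to abort and bits 15:00 the submission queue ID. How NVMeProcessAbortCmd looks up the SQID and CID is not described in the thread, so the caller supplies them here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_ADMIN_ABORT 0x08u   /* NVMe admin Abort opcode */

/* Hypothetical, simplified admin command: only the fields Abort uses. */
typedef struct {
    uint8_t  Opc;     /* CDW0 opcode */
    uint16_t Cid;     /* command ID of the Abort itself */
    uint32_t Nsid;    /* not used by Abort; left as 0 */
    uint32_t Cdw10;   /* bits 31:16 = CID to abort, bits 15:0 = SQID */
} MOCK_ADMIN_CMD;

/* Build exactly one Abort for the single command identified by
   (sqid, cidToAbort), rather than one per outstanding command. */
static void BuildAbortCmd(MOCK_ADMIN_CMD *cmd, uint16_t abortCid,
                          uint16_t sqid, uint16_t cidToAbort)
{
    assert(cmd != NULL);
    memset(cmd, 0, sizeof(*cmd));
    cmd->Opc = MOCK_ADMIN_ABORT;
    cmd->Cid = abortCid;
    cmd->Cdw10 = ((uint32_t)cidToAbort << 16) | sqid;
}
```

Note that an Abort is best-effort in NVMe; the targeted command may still complete normally, so the completion path has to tolerate either outcome.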
Also, I was wondering in what cases you see SRB_FUNCTION_RESET_LOGICAL_UNIT get sent into the driver. (I haven't seen any other drivers out there that actually take specific command abort actions for this. The class Lsi_u3 sample in the WDK does what we had originally done, which is to lump this request in with SRB_FUNCTION_RESET_DEVICE, etc. Drivers such as the MSFT Win 8 Storport AHCI miniport ignore SRB_FUNCTION_ABORT_COMMAND.) Thanks, -Rick From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, March 14, 2014 2:08 PM To: Dharani Kotte; Foster, Carolyn D; Knoblaugh, Rick Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn and Rick, I am about to finish reviewing and testing the revised patch by the end of today. I'd appreciate it if you could provide your approval early next week if you don't have any more feedback. Thank you and have a great weekend! Alex From: Alex Chang Sent: Wednesday, March 12, 2014 9:49 AM To: 'Dharani Kotte'; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thank you very much, Dharani. Hi all, Please review/test the revised patch as soon as possible. We need to speed up wrapping up this patch. Thank you! Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, March 12, 2014 9:00 AM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, Attached is the patch with the changes and the fix incorporated; it tested well yesterday. Password: sndk1234 Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, March 07, 2014 2:40 PM To: Foster, Carolyn D; Dharani Kotte; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Dharani, I'd suggest initializing the SRB extension via NVMeInitSrbExtension for all cases in BuildIO to avoid the BSOD in the future. Thanks, Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Friday, March 07, 2014 2:16 PM To: Alex Chang; Dharani Kotte; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Dharani, Unfortunately I am still seeing a D1 BSOD when I try to step through the new NVMeProcessAbortLunReset code. I was able to spend a little more time looking at this failure this time and can give you more information about it. On line 1316 of nvmeStd.c, where NVMeIssueAbortCmd is called, the parameter being passed in is the pResetSrbExt that was retrieved from the reset Srb. Then in IssueAbortCmd, the device extension is pulled out of that resetSrbExt. Unfortunately, that SRB extension is all 0s because it was never initialized, so a null pointer is passed into ProcessIo, which causes the BSOD. So far, the changes to HwResetBus seem to be working OK. I will follow up with you separately on the steps I took to reproduce this failure.
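Carolyn's analysis traces the D1 bugcheck to an SRB extension that was never initialized, so the device extension pointer taken from it is NULL; Alex's suggestion is to initialize the extension for every request in BuildIO. A minimal sketch of both halves of that idea follows, with hypothetical types (MOCK_DEV_EXT, MOCK_SRB, MOCK_SRB_EXT) and functions (MockInitSrbExtension, MockIssueAbort). The real NVMeInitSrbExtension signature is not shown in the thread and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the device extension, the SRB and the
   per-request SRB extension. */
typedef struct { int SubmittedAborts; } MOCK_DEV_EXT;
typedef struct { int Function; } MOCK_SRB;
typedef struct {
    MOCK_DEV_EXT *pDevExt;   /* back-pointer the I/O path relies on */
    MOCK_SRB     *pSrb;
} MOCK_SRB_EXT;

/* Initialize the SRB extension for every request in BuildIo,
   regardless of the SRB function, so no later path sees all zeros. */
static void MockInitSrbExtension(MOCK_SRB_EXT *ext, MOCK_DEV_EXT *devExt,
                                 MOCK_SRB *srb)
{
    assert(ext != NULL);
    memset(ext, 0, sizeof(*ext));
    ext->pDevExt = devExt;
    ext->pSrb = srb;
}

/* Stand-in for the abort submission path: reject an uninitialized
   extension instead of dereferencing a NULL device extension. */
static bool MockIssueAbort(MOCK_SRB_EXT *ext)
{
    if (ext == NULL || ext->pDevExt == NULL)
        return false;
    ext->pDevExt->SubmittedAborts++;
    return true;
}
```

The NULL check is only a safety net; the substantive fix is the unconditional initialization, which removes the all-zero state in the first place.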
Carolyn From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Wednesday, March 05, 2014 11:13 AM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Great! Thank you very much, Dharani, for the quick fixes. Hi all, Please review/test the patch and provide your feedback. If no big changes are required, I will start to collect approvals from Intel and LSI early next week. Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, March 05, 2014 9:03 AM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, Attached is the patch with the changes incorporated and the BSOD fix. The BSOD fix is in nvmeIo.c line 640. Password: sndk1234 Thanks, Dharani. From: Dharani Kotte Sent: Tuesday, February 25, 2014 3:25 PM To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, Sure, I will go over the list below and make changes accordingly next week. Thanks, Dharani.
From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 25, 2014 2:09 PM To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes I left out one more: in Line#1228 of nvmeInit.c, NVMeWaitOnRdy is called instead of assigning NVMeWaitOnRdy to NextDriverState. I think the assignment is still required before the initialization state machine starts. Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, February 25, 2014 2:03 PM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn and Dharani, I basically agree with the suggested changes in HwStorResetBus. However, I have some feedback: 1. In Line#2161, NVMeResetAdapter is called, and it returns only after making sure the RDY bit is cleared to 0, so why do we need a 10 ms delay? The exact same delay was added in RecoveryDpcRoutine because the original NVMeResetAdapter did not wait until the RDY bit was cleared to 0. Due to the changes in NVMeResetAdapter, we need to remove the 10 ms delay in RecoveryDpcRoutine as well. 2. In Line#2216, StorPortPause is called with 60 seconds to force Storport to hold up requests. I am not sure 60 seconds is a proper assumption. Instead, calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" seems a better idea to me. 3. In the definition of HwStorResetBus, the routine returns TRUE in the successful case. We need to take care of the failed cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in the successful case.
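Alex's first and third points come down to waiting on CSTS.RDY instead of a fixed 10 ms delay, and returning failure, without resuming, when the controller does not respond. A minimal sketch of that control flow follows, assuming a hypothetical mock controller whose RDY bit follows CC.EN after a configurable number of reads. MOCK_CTRL, MockReadRdy, WaitOnRdy and MockResetAdapter are illustrative names, not the driver's NVMeResetAdapter or NVMeWaitOnRdy; the poll budget stands in for a timeout derived from CAP.TO, and full re-initialization is collapsed into setting EN back to 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mock controller: CSTS.RDY follows CC.EN only after
   PollsToSettle further reads; -1 models a controller that never
   changes RDY (hung). */
typedef struct {
    uint32_t CcEn;          /* CC.EN as last written by the host */
    uint32_t CstsRdy;       /* CSTS.RDY as reported by the device */
    int      PollsToSettle;
    bool     Resumed;       /* set when request flow is resumed */
} MOCK_CTRL;

static uint32_t MockReadRdy(MOCK_CTRL *c)
{
    if (c->CstsRdy != c->CcEn && c->PollsToSettle >= 0) {
        if (c->PollsToSettle == 0)
            c->CstsRdy = c->CcEn;   /* transition completes on this read */
        else
            c->PollsToSettle--;
    }
    return c->CstsRdy;
}

/* Poll CSTS.RDY until it equals 'expected' or the budget runs out.
   A real driver would size the budget from CAP.TO and stall between reads. */
static bool WaitOnRdy(MOCK_CTRL *c, uint32_t expected, int maxPolls)
{
    assert(maxPolls > 0);
    for (int i = 0; i < maxPolls; i++) {
        if (MockReadRdy(c) == expected)
            return true;
    }
    return false;
}

/* Reset sketch: wait for RDY 1->0 after clearing EN (no fixed delay),
   then for RDY 0->1 after setting EN. Any timeout returns false and
   skips the resume; only the success path resumes request flow. */
static bool MockResetAdapter(MOCK_CTRL *c, int maxPolls)
{
    c->CcEn = 0;
    if (!WaitOnRdy(c, 0, maxPolls))
        return false;
    c->CcEn = 1;
    if (!WaitOnRdy(c, 1, maxPolls))
        return false;
    c->Resumed = true;
    return true;
}
```

The design choice is that the caller, not the wait loop, decides what a timeout means: a HwStorResetBus built along these lines would return FALSE on either timeout and would resume request flow only on the success path.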
Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 25, 2014 4:28 AM To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn, Line 1384: I can take care of this item. Line 2219: StorPortSynchronizeAccess: this is the request from Samsung, suggested by Judy. Below is the reference mail. In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts which eventually cause the driver to be called at its HwStorResetBus entry point (NVMeResetBus). I have some feedback on the current architecture of that routine: 1. Among other things, NVMeResetBus schedules a DPC to complete any pending commands. This creates a situation where, upon return from this entry point, there are still cmds outstanding which don't get completed till the DPC runs. According to the WDK, this doesn't appear to be legal; all outstanding cmds have to be completed by the HwStorResetBus routine before it returns: HwResetBus Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure. For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI). and HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value. and The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock.
A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary. Since HwStorResetBus must finish its work before returning, it can't schedule a DPC to do so later on. The logic which schedules a DPC should be removed. 2. Code should be added to call StorPortPause() to hold off any new requests until StorPortResume() is called. 3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine in the NVMe driver should also be added for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo since the port driver calls it only after acquiring the StartIo spinlock. 4. We should implement a driver-internal global (per "adapter") flag signifying we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware. 5. Code should be added to call StorPortResume() when all work is complete. 6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above. Thanks, Judy Thanks, Dharani. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Monday, February 24, 2014 3:51 PM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex and Dharani, I have been reviewing the code and performing some tests, and I have some concerns about this patch. In nvmeStd.c: Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands.
When the LUN reset comes in, we will set CC.EN to 0, and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." This reset behavior has been accounted for in the driver, by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. What should happen here is that the new NVMeProcessAbortLunReset function should be moved under SRB_FUNCTION_ABORT_COMMAND only. Then the function should only send one abort and not abort all outstanding commands. Also, during testing, I hit a D1 BSOD when I tried to step through the code. I ran I/O and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell. The I/O should be timed out by Storport, which will then send a LUN reset. Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with StartIo and the interrupt DPC. Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, February 19, 2014 10:06 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thank you, Dharani. Hi all, Please review/test the attached reset fix patch from Sandisk and provide your feedback.
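Carolyn's argument is that once CC.EN is cleared the controller posts no further completions, so the reset path itself has to finish every outstanding request; Judy's WDK excerpt adds that this must happen before HwStorResetBus returns, with SRB_STATUS_BUS_RESET. A minimal single-threaded sketch of that bookkeeping follows, assuming a hypothetical fixed-size table of outstanding requests. MOCK_ADAPTER, MockSubmit, MockResetBus and the MOCK_STATUS_* values are illustrative, not Storport or driver definitions. Locking is not modeled: in the driver, the port driver holds the StartIo spin lock around HwStorResetBus, and interrupt synchronization is the separate StorPortSynchronizeAccess question raised in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_CMDS         8
#define MOCK_STATUS_PENDING   0u   /* hypothetical; not the srb.h values */
#define MOCK_STATUS_BUS_RESET 1u   /* stands in for SRB_STATUS_BUS_RESET */

typedef struct {
    bool     InUse;
    unsigned Status;
} MOCK_SRB;

typedef struct {
    MOCK_SRB *Outstanding[MOCK_MAX_CMDS]; /* requests handed to the controller */
    bool      ResetInProgress;            /* per-adapter "busy with reset" flag */
} MOCK_ADAPTER;

/* New I/O is refused while a reset is in progress. */
static bool MockSubmit(MOCK_ADAPTER *ae, MOCK_SRB *srb)
{
    if (ae->ResetInProgress)
        return false;
    for (size_t i = 0; i < MOCK_MAX_CMDS; i++) {
        if (ae->Outstanding[i] == NULL) {
            srb->InUse = true;
            srb->Status = MOCK_STATUS_PENDING;
            ae->Outstanding[i] = srb;
            return true;
        }
    }
    return false;
}

/* Once CC.EN is 0 the controller posts no more completions, so every
   outstanding request is completed here, before returning, with the
   bus-reset status; no per-command Abort is sent and no DPC is deferred. */
static size_t MockResetBus(MOCK_ADAPTER *ae)
{
    size_t completed = 0;

    assert(ae != NULL);
    ae->ResetInProgress = true;
    /* controller disable and CSTS.RDY wait would go here */
    for (size_t i = 0; i < MOCK_MAX_CMDS; i++) {
        MOCK_SRB *srb = ae->Outstanding[i];
        if (srb != NULL) {
            srb->Status = MOCK_STATUS_BUS_RESET;
            srb->InUse = false;
            ae->Outstanding[i] = NULL;
            completed++;
        }
    }
    /* re-initialization would go here; only then is new I/O allowed */
    ae->ResetInProgress = false;
    return completed;
}
```

Because the loop runs inside the reset routine, nothing is left outstanding when MockResetBus returns, which is the property the WDK text requires and which a deferred DPC cannot guarantee.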
Thank you very much,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, February 19, 2014 9:00 AM
To: Alex Chang
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Alex,

The attached is the patch source for review. I have tested it with I/O running overnight. Areas to focus on when testing this patch:
1. Test abort/LUN resets.
2. Test chip reset.
3. Test the format command.
4. Test the firmware download command.

Password is "sndk1234"

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:15 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Great!

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 18, 2014 12:14 PM
To: Alex Chang
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Just testing after merging the code; I should be able to send it tomorrow morning.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:13 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Just a friendly reminder: could you please send out your patch as soon as it's ready?
Many thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, February 14, 2014 10:18 AM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Sure, Alex.

Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Friday, February 14, 2014 10:17 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Good morning, Dharani,

As you may know, both the Intel and Huawei patches have been added to the OFA source base. You may now rebase your changes and send a patch out for review/test. Thank you very much for contributing the fixes.

Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, January 15, 2014 2:08 PM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Alex,

The attached is the source for the preliminary review. I have tested I/O and the SCSI compliance test. I don't have a drive which supports abort/LUN resets, and I'm not sure how to test the format command.

Thanks,
Dharani.
From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:54 AM
To: Dharani Kotte; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Happy Holidays to you all.

Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 11:52 AM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Thank you for the explanation. Sure, I will take a look. Happy Holidays.

Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:44 AM
To: Kwok Kong; Dharani Kotte; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Dharani,

A controller reset can be issued either by the host or by the driver itself. Currently, the driver seems to handle both in the same manner via a single entry point, "NVMeResetController". In the "from the host" case, the driver needs to separate the SRB_FUNCTION_RESET_... requests from the NVME_RESET_DEVICE ioctl request in terms of how pending IOs are handled. In the "driver itself" case, the related error recovery code needs to be re-examined as well. Judy from Samsung suggested referring to the storahci.sys sample driver code for Windows 7/8 for reset bus logic examples and detailed recommendations.

Thank you,
Alex

From: Kwok Kong
Sent: Friday, December 20, 2013 9:08 AM
To: Dharani Kotte; Akshay Mathur; Alex Chang
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Dharani,

Yes, these are the three areas that you are committed to.

Alex,

Please send more details on "Controller reset does not handle all cases" to Dharani.
Thanks
-Kwok

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 9:02 AM
To: Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Kwok,

I think the items below are the ones we are committing to:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- Orphaned requests

Can somebody provide a little more detail on the expectations for the item "Controller reset does not handle all cases"?

Thanks,
Dharani.

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Thursday, December 19, 2013 6:53 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Excellent! Your help is much appreciated.

Dharani, please let me know if you have any questions.

Happy holidays to all of you.

-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Thursday, December 19, 2013 6:51 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,

You are welcome. We are pleased to contribute to the community and appreciate you driving it! We will try our best to complete the implementation by the end of January, but we may not be able to complete comprehensive testing by then because of overlaps with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks. Anyway, Dharani will be in touch with you as he makes progress.

Thanks
Akshay

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Tuesday, December 17, 2013 4:21 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Akshay,

Thanks for your willingness to contribute to the driver. I am looking for a patch before the end of Jan 2014; the earlier the better.
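The CSTS.RDY item in Dharani's list above refers to the enable/disable handshake in the NVMe specification. After software writes CC.EN, it waits for CSTS.RDY to follow (1->0 on disable, 0->1 on enable), bounded by CAP.TO, which the controller reports in 500 ms units. Below is a minimal C sketch that assumes simulated registers and a simulated 1 ms stall. The struct and function names are hypothetical, and this is not the OFA driver's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulated controller; register fields follow the NVMe spec:
 * CC.EN = CC bit 0, CSTS.RDY = CSTS bit 0, CAP.TO = CAP bits 31:24 (500 ms units). */
typedef struct {
    uint32_t cap_lo;       /* low dword of CAP */
    uint32_t cc;
    uint32_t csts;
    uint32_t rdy_delay_ms; /* how long the fake device takes to reflect CC.EN */
    uint32_t elapsed_ms;   /* simulated time since the last CC write */
} sim_ctrl;

#define CC_EN    0x1u
#define CSTS_RDY 0x1u

static uint32_t cap_timeout_ms(const sim_ctrl *c)
{
    return ((c->cap_lo >> 24) & 0xFFu) * 500u;
}

/* Simulated 1 ms stall: once rdy_delay_ms has passed, CSTS.RDY mirrors CC.EN. */
static void sim_stall_1ms(sim_ctrl *c)
{
    c->elapsed_ms++;
    if (c->elapsed_ms >= c->rdy_delay_ms)
        c->csts = (c->csts & ~CSTS_RDY) | (c->cc & CC_EN);
}

/* Write CC.EN and wait for CSTS.RDY to follow (1->0 on disable, 0->1 on enable).
 * Returns false if CAP.TO expires first, so the caller can fail the reset. */
bool sim_set_enable_and_wait(sim_ctrl *c, bool enable)
{
    assert(c != NULL);
    uint32_t want  = enable ? CSTS_RDY : 0u;
    uint32_t limit = cap_timeout_ms(c);

    c->cc = enable ? (c->cc | CC_EN) : (c->cc & ~CC_EN);
    c->elapsed_ms = 0;

    for (uint32_t waited = 0; waited <= limit; waited++) {
        if ((c->csts & CSTS_RDY) == want)
            return true;
        sim_stall_1ms(c);
    }
    return false;
}
```

Returning a failure instead of spinning forever lets a reset path report the error to its caller rather than hanging at DISPATCH_LEVEL.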
Please let me know if Sandisk can commit to that. Your help is much appreciated.

Thanks
-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Tuesday, December 17, 2013 4:11 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman; Akshay Mathur
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,

I manage the software and driver development team at SanDisk/ESS. We are certainly willing to contribute to fixing the problems listed below, but before we can commit, we would like clarification on the timeline, i.e., by when these fixes are expected to be completed.

Thanks
Akshay Mathur
Sr Software Manager, Enterprise Storage Solutions
951 SanDisk Drive, Building #5 | Milpitas, CA 95035 U.S.A. | Direct +1 408.801.1336 | Cell +1 856.607.7323 | Corporate +1 408.801.1000 | Akshay.Mathur at sandisk.com

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Wednesday, December 11, 2013 18:00
To: Dave Landsman
Cc: Dharani Kotte
Subject: Would you please help to resolve a few OFA NVMe driver problems ?

Dave and Dharani,

There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems. Intel has agreed to work on the following two problems:
- Remove #define for CHATHAM2
- Learning of CPU core to Vector failure handling

I am also asking other companies to work on some of the issues. I wonder if your company can work on the following three problems:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- Orphaned requests

Please let me know if your company can work on these three issues.

Thanks
-Kwok
________________________________
PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above.
If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 9449 bytes
Desc: image001.jpg
URL: 

From Alex.Chang at pmcs.com Thu Mar 20 08:27:38 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Thu, 20 Mar 2014 15:27:38 +0000
Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes
References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com>
 <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com>
 <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com>
 <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com>
 <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com>
 <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com>
 <11860_1394639990_53208476_11860_6208_1_23EC73C80FB59046A6B7B8EB7B3826593DA8A9A6@SACMBXIP01.sdcorp.global.sandisk.com>
 <8dcf97355cf44d88b0f91fd636740955@DM2PR07MB285.namprd07.prod.outlook.com>
 <6446_1395099377_532786F1_6446_19570_1_23EC73C80FB59046A6B7B8EB7B3826593DA8E18E@SACMBXIP01.sdcorp.global.sandisk.com>
Message-ID: 

Hi Rick and Carolyn,

Just a friendly reminder: please let me know if you approve the patch.
Thanks,
Alex

From: Alex Chang
Sent: Monday, March 17, 2014 4:47 PM
To: 'Dharani Kotte'; Knoblaugh, Rick; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you, Dharani, for the effort in preparing another revision of the patch.

Hi all,

Please review/test it and provide your feedback at your earliest convenience. Once I receive the approvals from LSI and Intel, I will push the patch right away.

Hi Rick and Carolyn,

If you approve the patch, please let me know as well.

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Monday, March 17, 2014 4:36 PM
To: Knoblaugh, Rick; Alex Chang; Foster, Carolyn D
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Rick/Alex,

Attached is the code with the changes incorporated according to Rick's suggestions:
- Removed the flags relevant to LUN reset.
- Renamed the function to NVMeProcessAbortCmd().
- Tested as suggested by Carolyn.

Password: sndk1234

Thanks,
Dharani.
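Dharani's revision above (LUN-reset flags removed, routine renamed NVMeProcessAbortCmd()) follows the review feedback in this thread: an explicit abort targets a single outstanding command, while LUN, device, and bus resets go through full controller recovery, in which the driver itself completes everything outstanding. Below is a hypothetical C sketch of that split. The enum values stand in for the real SRB_FUNCTION_* constants, and none of the names come from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SRB function codes discussed in the thread. */
typedef enum {
    SIM_FN_EXECUTE_IO,
    SIM_FN_ABORT_COMMAND,
    SIM_FN_RESET_LOGICAL_UNIT,
    SIM_FN_RESET_DEVICE,
    SIM_FN_RESET_BUS
} sim_srb_function;

typedef enum {
    SIM_ACTION_SUBMIT_IO,
    SIM_ACTION_ABORT_ONE,    /* one NVMe Abort for the targeted command only */
    SIM_ACTION_FULL_RECOVERY /* CC.EN = 0, driver completes all outstanding I/O */
} sim_action;

/* Only an explicit abort targets a single command; every reset flavor is
 * lumped into the same full-recovery path. */
sim_action sim_dispatch(sim_srb_function fn)
{
    switch (fn) {
    case SIM_FN_ABORT_COMMAND:
        return SIM_ACTION_ABORT_ONE;
    case SIM_FN_RESET_LOGICAL_UNIT:
    case SIM_FN_RESET_DEVICE:
    case SIM_FN_RESET_BUS:
        return SIM_ACTION_FULL_RECOVERY;
    case SIM_FN_EXECUTE_IO:
    default:
        return SIM_ACTION_SUBMIT_IO;
    }
}

/* Marks only the command whose ID equals target_cid as aborted (-1) and
 * returns the number of aborts issued (0 or 1); it never sweeps the queue. */
size_t sim_abort_one(int *pending_cids, size_t n, int target_cid)
{
    assert(pending_cids != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pending_cids[i] == target_cid) {
            pending_cids[i] = -1;
            return 1;
        }
    }
    return 0;
}
```

Keeping the abort path narrow means a single timed-out command does not cost every other in-flight request, while the reset paths still rely on the recovery routine to complete everything.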
From: Knoblaugh, Rick [mailto:Rick.Knoblaugh at lsi.com]
Sent: Friday, March 14, 2014 3:23 PM
To: Alex Chang; Dharani Kotte; Foster, Carolyn D
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

It looks like NVMeProcessAbortLunReset is called only when SRB_FUNCTION_ABORT_COMMAND is received. As such, it would probably be good to remove the case statements in that routine that also look for SRB_FUNCTION_RESET_LOGICAL_UNIT; the related flag can then be removed, etc.

Also, I was wondering in what cases you see SRB_FUNCTION_RESET_LOGICAL_UNIT get sent to the driver. (I haven't seen any other drivers out there that actually take specific command abort actions for this. The class Lsi_u3 sample in the WDK does what we had originally done, which is to lump this request in with SRB_FUNCTION_RESET_DEVICE, etc. Drivers such as the MSFT Win 8 Storport AHCI miniport ignore SRB_FUNCTION_ABORT_COMMAND.)

Thanks,
-Rick

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, March 14, 2014 2:08 PM
To: Dharani Kotte; Foster, Carolyn D; Knoblaugh, Rick
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn and Rick,

I expect to finish reviewing and testing the revised patch by the end of today. I'd appreciate it if you could provide your approval early next week if you don't have any more feedback.

Thank you, and have a great weekend!

Alex

From: Alex Chang
Sent: Wednesday, March 12, 2014 9:49 AM
To: 'Dharani Kotte'; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you very much, Dharani.

Hi all,

Please review/test the revised patch as soon as possible. We need to speed up wrapping up this patch.

Thank you!
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, March 12, 2014 9:00 AM
To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,

Attached is the patch with the changes and the fix incorporated; it tested well yesterday.

Password: sndk1234

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, March 07, 2014 2:40 PM
To: Foster, Carolyn D; Dharani Kotte; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

I'd suggest initializing the SRB extension via NVMeInitSrbExtension for all cases in BuildIO to avoid the BSOD in the future.

Thanks,
Alex

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Friday, March 07, 2014 2:16 PM
To: Alex Chang; Dharani Kotte; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Unfortunately, I am still seeing a D1 BSOD when I try to step through the new NVMeProcessAbortLunReset code. I was able to spend a little more time looking at this failure this time and can give you more information about it.
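Alex's suggestion to initialize the SRB extension via NVMeInitSrbExtension for every request in BuildIO guards against the failure Carolyn traces in the next paragraph, where an extension that is still all zeros yields a NULL device-extension pointer. The sketch below is hypothetical. sim_srb_ext, sim_init_srb_ext, and sim_issue_abort are stand-ins, not the driver's structures or the real signature of NVMeInitSrbExtension. It shows two safeguards: initializing unconditionally, and rejecting an uninitialized extension instead of dereferencing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { int id; } sim_dev_ext;   /* hypothetical device extension */

/* Hypothetical per-request scratch area. Its contents cannot be trusted until
 * the miniport initializes it (in Carolyn's trace it was all zeros). */
typedef struct {
    sim_dev_ext *dev_ext;     /* back-pointer used later by the I/O path */
    void        *srb;
    bool         initialized;
} sim_srb_ext;

/* Called for every request at the top of the build-I/O path,
 * regardless of the request's function code. */
void sim_init_srb_ext(sim_srb_ext *ext, sim_dev_ext *dev, void *srb)
{
    assert(ext != NULL);
    memset(ext, 0, sizeof(*ext));
    ext->dev_ext = dev;
    ext->srb = srb;
    ext->initialized = true;
}

/* Issuing an abort on behalf of an abort/reset request reads the device
 * extension out of the SRB extension; refuse rather than dereference NULL. */
bool sim_issue_abort(const sim_srb_ext *ext)
{
    if (ext == NULL || !ext->initialized || ext->dev_ext == NULL)
        return false;
    return ext->dev_ext->id >= 0; /* placeholder for building the Abort command */
}
```

The unconditional initialization is the actual fix. The NULL check in sim_issue_abort is only a defensive backstop and would mask, not solve, a request that skipped initialization.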
On line 1316 of nvmeStd.c where the call is for NVMeIssueAbortCmd, the parameter being passed in is the pResetSrbExt that was retrieved from the reset Srb. Then in the function IssueAbortCmd, the device extension is pulled out of that resetSrbExt. Unfortunately that SRB extension is all 0s as it was never initialized. So it's passing a null pointer into ProcessIo, which is causing the BSOD. So far, the changes to HwResetBus seem to be working ok. I will follow up with you separately on the steps I took to reproduce this failure. Carolyn From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Wednesday, March 05, 2014 11:13 AM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Great! Thank you very much, Dharani, for the quick fixes. Hi all, Please review/test the patch and provide your feedback. If no big changes required, I will start to collect approvals from Intel and LSI early next week. Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, March 05, 2014 9:03 AM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. 
Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Alex, The attached is the patch with the changes incorporated and BSOD fix. The BSOD fix is in nvmeIo.c line 640. Password: sndk1234 Thanks, Dharani. From: Dharani Kotte Sent: Tuesday, February 25, 2014 3:25 PM To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, Sure, I will go over the list below and make changes accordingly next week. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 25, 2014 2:09 PM To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Just left out one more : In Line#1228 of nvmeInit.c, NVMeWaitOnRdy is called to replace assigning NextDriverState as NVMeWaitOnRdy. I think the assignment is still required before the initialization state machine starts. Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, February 25, 2014 2:03 PM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn and Dharani, I basically agree the suggested changes in HwStorResetBus. However, I have some feedback: 1. In Line#2161, NVMeResetAdapter is called and it returns after making sure RDY bit is cleared as 0, why we need 10 ms delay ? The exact same delay added in RecoveryDpcRoutine was because original NVMeResetAdater did not wait until RDY bit is cleared as 0. Due to the changes in NVMeResetAdapter, we need to remove the 10 ms delay in RecoveryDpcRoutine as well. 2. 
In Line#2216, StorPortPause is called with 60 seconds to force Storport hold up requests. I am not sure 60 seconds is proper assumption. Instead, calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" seems better idea to me. 3. In the definition of HwStorResetBus, the routine returns TRUE in successful case. We need to take care of failed cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in successful case. Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 25, 2014 4:28 AM To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn, Line 1384: I can take care of this item. Line 2219: StorPortSynchronizeAccess, This is the request from Samsung suggested by Judy. Below is the reference mail. In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts which eventually causes the driver to be called at it's HwStorResetBus entry point (NVMeResetBus). I have some feedback on the current architecture of that routine: 1. Among other things, NMeResetBus schedules a DPC to complete any pending commands. This creates a situation where upon return from this entry point, there are still cmds outstanding which don't get completed till the DPC runs. According to the WDK, this doesn't appear to be legal - all outstanding cmds have to be completed by the HwStorResetBus routine before it returns: HwResetBus Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure. 
For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI) and HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value. and The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock. A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary Since HwStorResetBus must finish its work before returning; it can't schedule a DPC to do so later on. The logic which schedules a DPC should be removed. 2. Code should be added to call StorPortPause() to hold off any new requests till StorPortResume() is called. 3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine in the NVMe driver should also be added for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo since the port driver calls it only after acquiring the StartIo spinlock. 4. We should implement a driver-internal global (per "adapter") flag signifying we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware. 5. Code should be added to call StorPortResume() when all work is complete. 6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above. Thanks, Judy Thanks, Dharani. 
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Monday, February 24, 2014 3:51 PM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex and Dharni, I have been reviewing the code and performing some tests and I have some concerns about this patch. In nvmeStd.c: Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands. When the LUN reset comes in, we will set CC.EN to 0 and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." This reset behavior has been accounted for in the driver, by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. What should happen here is that the new processAbortLun function should be moved under the SRB_FUNCTION_ABORT_COMMAND only. Then the procesAbortLunReset function should only send one abort and not abort all outstanding commands. Also, during testing, I hit a D1 BSOD when I tried to step through the code. I ran IO and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell. The IO should be timed out by storport, which will then send a reset lun. Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with Start IO and the interrupt DPC. 
Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, February 19, 2014 10:06 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thank you, Dharani. Hi all, Please review/test the attached reset fix patch from Sandisk and provide your feedbacks. Thank you very much, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, February 19, 2014 9:00 AM To: Alex Chang Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Alex, The attached is the patch source for review. I have tested the I/O running over night. Areas need to be focused for test this patch: 1. Test abort/LUN resets. 2. Test chip reset. 3. Test the format command. 4.Test Firmware download command. Password is "sndk1234" Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 18, 2014 12:15 PM To: Dharani Kotte Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Great! 
Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 18, 2014 12:14 PM To: Alex Chang Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Just testing after merging the code it I should be able to send it tomorrow morning. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 18, 2014 12:13 PM To: Dharani Kotte Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Hi Dharani, Just a friendly reminder, could you please send out your patch as soon as it's ready? Many thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, February 14, 2014 10:18 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Sure Alex. Dharani. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Friday, February 14, 2014 10:17 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes Good morning, Dharani, As you may know, both Intel and Huawei patches had been added into OFA source base. Now, you may re-base your changes and send a patch out for review/test. Thank you very much for contributing the fixes. Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, January 15, 2014 2:08 PM To: Alex Chang; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ? Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. 
The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Alex, The attached is the source for the preliminary review. I have tested the IO and scsi compliance test. I don't have a drive which supports abort/lun resets, not sure how to test the format command. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, December 20, 2013 11:54 AM To: Dharani Kotte; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Happy Holidays to you all. Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, December 20, 2013 11:52 AM To: Alex Chang; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Thank you for the explanation. Sure I will take look. Happy Holidays. Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, December 20, 2013 11:44 AM To: Kwok Kong; Dharani Kotte; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Hi Dharani, The controller reset can be issued from either from the host or the driver itself. Currently, the driver seems handling them in the same manner via single entry "NVMeResetController". In the case of "from the host", the driver needs to separate the cases of SRB_FUNCTION_RESET_... requests from the ioctl request of NVME_RESET_DEVICE in the sense of handling pending IOs. In the case of "the driver itself", needs to re-exam the related error recovery codes as well. 
Judy from Samsung suggested referring the storahci.sys driver sample codes for Windows 7/8 based on reset bus logic examples and detailed recommendations. Thank you, Alex From: Kwok Kong Sent: Friday, December 20, 2013 9:08 AM To: Dharani Kotte; Akshay Mathur; Alex Chang Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Dharani, Yes, these are the three areas that you are committed to. Alex, Please send more details on the "Controller reset does not handle all cases" to Dharani. Thanks -Kwok From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, December 20, 2013 9:02 AM To: Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Hi Kwok, I think the below are the items that we are committing for: - Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset - Controller reset does not handle all cases - orphaned requests Can somebody provide little bit more details on the expectation for the item "Controller reset does not handle all cases". Thanks, Dharani. From: Kwok Kong [mailto:Kwok.Kong at pmcs.com] Sent: Thursday, December 19, 2013 6:53 PM To: Akshay Mathur Cc: Dharani Kotte; Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Excellent! Your help is much appreciated. Dharani, Please let me know if you have any question. Happy holiday to all of you. -Kwok From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com] Sent: Thursday, December 19, 2013 6:51 PM To: Kwok Kong Cc: Dharani Kotte; Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Kwok, You are welcome. We are pleased to contribute to the community and appreciate you driving it! We will try our best to complete the implementation by end of January but we may not be able to complete comprehensive testing by that time. 
This is because of overlaps with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks. Anyway, Dharani will be in touch with you as he makes progress. Thanks Akshay From: Kwok Kong [mailto:Kwok.Kong at pmcs.com] Sent: Tuesday, December 17, 2013 4:21 PM To: Akshay Mathur Cc: Dharani Kotte; Dave Landsman Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Akshay, Thanks for your willingness to contribute to the driver. I am looking for a patch before the end of Jan 2014; the earlier the better. Please let me know if Sandisk can commit to that. Your help is much appreciated. Thanks -Kwok From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com] Sent: Tuesday, December 17, 2013 4:11 PM To: Kwok Kong Cc: Dharani Kotte; Dave Landsman; Akshay Mathur Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ? Kwok, I manage the Software and driver development team at SanDisk/ESS. We are certainly willing to contribute to fixing the problems listed below, but before we can commit, we would like to get clarification on the timeline, i.e. by when these fixes are expected to be completed. Thanks Akshay Mathur Sr Software Manager, Enterprise Storage Solutions 951 SanDisk Drive, Building #5 | Milpitas, CA 95035 U.S.A. | Direct +1 408.801.1336 | Cell +1 856.607.7323 | Corporate +1 408.801.1000 | Akshay.Mathur at sandisk.com From: Kwok Kong [mailto:Kwok.Kong at pmcs.com] Sent: Wednesday, December 11, 2013 18:00 To: Dave Landsman Cc: Dharani Kotte Subject: Would you please help to resolve a few OFA NVMe driver problems ? Dave and Dharani, There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems.
Intel has agreed to work on the following two problems: - remove #define for CHATHAM2 - Learning of CPU core to Vector failure handling I am also asking other companies to work on some of the issues. I wonder if your company can work on the following three problems: - Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset - Controller reset does not handle all cases - orphaned requests Please let me know if your company can work on these three issues. Thanks -Kwok ________________________________ PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
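Kwok's first item, handling CSTS.RDY (1->0 and 0->1) properly on reset, comes down to not touching the controller again until CSTS.RDY has actually followed CC.EN. The following is a minimal sketch of that handshake against a simulated register pair, not the OFA driver's code: MockController, write_cc_en, read_csts_rdy, wait_on_rdy and reset_controller are hypothetical names. A real miniport would read CSTS through its mapped registers, stall between polls, and size the poll budget from CAP.TO, which the NVMe specification expresses in 500 ms units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulation of the CC.EN / CSTS.RDY pair. A real
 * miniport reads and writes these through the controller's mapped
 * registers; nothing here is the OFA driver's actual code. */
typedef struct {
    uint32_t cc_en;      /* CC.EN as last written by the host             */
    uint32_t csts_rdy;   /* CSTS.RDY as reported by the controller        */
    uint32_t lag_polls;  /* reads before RDY follows EN (simulated)       */
    uint32_t pending;    /* reads left in the current transition          */
    bool     hung;       /* true: RDY never changes (unresponsive device) */
} MockController;

static void write_cc_en(MockController *c, uint32_t en)
{
    c->cc_en = en;
    c->pending = c->lag_polls;  /* the controller needs time to react */
}

/* One simulated register read: RDY catches up with EN after the lag. */
static uint32_t read_csts_rdy(MockController *c)
{
    if (!c->hung && c->csts_rdy != c->cc_en) {
        if (c->pending == 0)
            c->csts_rdy = c->cc_en;
        else
            c->pending--;
    }
    return c->csts_rdy;
}

/* Poll until CSTS.RDY == expected or the budget is used up. A driver
 * would stall between reads and derive the budget from CAP.TO. */
static bool wait_on_rdy(MockController *c, uint32_t expected,
                        uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (read_csts_rdy(c) == expected)
            return true;
    }
    return false;
}

/* Disable, confirm RDY 1->0, re-enable, confirm RDY 0->1. This sketch
 * relies on the observed bit rather than on a fixed delay. */
static bool reset_controller(MockController *c, uint32_t max_polls)
{
    assert(c != NULL);
    write_cc_en(c, 0);
    if (!wait_on_rdy(c, 0, max_polls))
        return false;            /* RDY stuck at 1: report failure */
    write_cc_en(c, 1);
    return wait_on_rdy(c, 1, max_polls);
}
```

With lag_polls = 3 and a budget of 10 polls, reset_controller returns true; with a budget of 3 it returns false with CC.EN left at 0, which is the failure case a caller would have to report rather than paper over with a fixed delay.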
From Rick.Knoblaugh at lsi.com Thu Mar 20 11:07:08 2014 From: Rick.Knoblaugh at lsi.com (Knoblaugh, Rick) Date: Thu, 20 Mar 2014 18:07:08 +0000 Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes In-Reply-To: References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com> <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com> <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com> <11860_1394639990_53208476_11860_6208_1_23EC73C80FB59046A6B7B8EB7B3826593DA8A9A6@SACMBXIP01.sdcorp.global.sandisk.com> <8dcf97355cf44d88b0f91fd636740955@DM2PR07MB285.namprd07.prod.outlook.com> <6446_1395099377_532786F1_6446_19570_1_23EC73C80FB59046A6B7B8EB7B3826593DA8E18E@SACMBXIP01.sdcorp.global.sandisk.com> Message-ID: <9e8453dc345f46d39f9cc38bb1541d4c@DM2PR07MB285.namprd07.prod.outlook.com> Hi Alex, We are good with this patch. Also, I just wanted to mention that I will be on vacation next week -- Parag will be here if anything is needed from us. Thanks, -Rick From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Thursday, March 20, 2014 8:28 AM To: Dharani Kotte; Knoblaugh, Rick; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Rick and Carolyn, Just a friendly reminder, please let me know if you approve the patch.
Thanks, Alex From: Alex Chang Sent: Monday, March 17, 2014 4:47 PM To: 'Dharani Kotte'; Knoblaugh, Rick; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thank you, Dharani, for the effort in preparing another revision of the patch. Hi all, Please review/test it and provide your feedback at your earliest convenience. Once I receive the approvals from LSI and Intel, I will push the patch right away. Hi Rick and Carolyn, If you approve the patch, please let me know as well. Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Monday, March 17, 2014 4:36 PM To: Knoblaugh, Rick; Alex Chang; Foster, Carolyn D Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Rick/Alex, Attached is the code with the changes incorporated according to Rick's suggestions: - Removed the flags relevant to LUN reset. - Renamed the function to NVMeProcessAbortCmd(). - Tested as suggested by Carolyn. Password: sndk1234 Thanks, Dharani.
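The revision described above narrows the abort routine to SRB_FUNCTION_ABORT_COMMAND, in line with Rick's and Carolyn's comments further down this thread: following Carolyn's suggestion, an abort targets one command, while a LUN or device reset takes the controller-reset path, since clearing CC.EN stops the controller from processing or completing anything. Alex's suggestion to initialize the SRB extension for every request in BuildIO addresses the D1 BSOD Carolyn traced to a zeroed extension. A minimal sketch combining both points follows; the enum values, structs, init_srb_ext and build_io are hypothetical stand-ins (the real codes live in srb.h, and the real routines are the driver's BuildIo handler and NVMeInitSrbExtension), so treat it as an illustration of the dispatch shape only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the SRB_FUNCTION_* codes discussed in the
 * thread; the real values come from srb.h. */
typedef enum {
    FN_EXECUTE_SCSI,
    FN_ABORT_COMMAND,
    FN_RESET_LOGICAL_UNIT,
    FN_RESET_DEVICE
} SrbFunction;

/* Hypothetical adapter extension; counters record what was issued. */
typedef struct {
    int aborts_issued;
    int controller_resets;
} AdapterExt;

/* Hypothetical per-request SRB extension. */
typedef struct {
    AdapterExt  *pDevExt;   /* back-pointer the I/O path dereferences */
    SrbFunction  function;
} SrbExt;

typedef enum { ACT_IO, ACT_ABORT_ONE, ACT_CONTROLLER_RESET } Action;

/* Initialize every extension up front, so no later path (abort, reset,
 * normal I/O) can pick up a zeroed pDevExt. */
static void init_srb_ext(SrbExt *ext, AdapterExt *dev, SrbFunction fn)
{
    memset(ext, 0, sizeof(*ext));
    ext->pDevExt = dev;
    ext->function = fn;
}

static Action build_io(AdapterExt *dev, SrbExt *ext, SrbFunction fn)
{
    init_srb_ext(ext, dev, fn);
    assert(ext->pDevExt != NULL);

    switch (fn) {
    case FN_ABORT_COMMAND:
        /* One NVMe Abort for the single targeted command only. */
        ext->pDevExt->aborts_issued++;
        return ACT_ABORT_ONE;
    case FN_RESET_LOGICAL_UNIT:
    case FN_RESET_DEVICE:
        /* Clearing CC.EN stops all command processing, so outstanding
         * requests are completed by the reset/recovery path instead of
         * being aborted one at a time. */
        ext->pDevExt->controller_resets++;
        return ACT_CONTROLLER_RESET;
    default:
        return ACT_IO;
    }
}
```

Lumping the LUN reset in with the device reset mirrors what Rick describes the WDK Lsi_u3 sample doing; the only per-command work left is the single abort.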
From: Knoblaugh, Rick [mailto:Rick.Knoblaugh at lsi.com] Sent: Friday, March 14, 2014 3:23 PM To: Alex Chang; Dharani Kotte; Foster, Carolyn D Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Dharani, It looks like NVMeProcessAbortLunReset is called only when SRB_FUNCTION_ABORT_COMMAND is received. As such, it would probably be good to remove the case statements in that routine that also look for SRB_FUNCTION_RESET_LOGICAL_UNIT, and the related flag can be removed, etc. Also, I was wondering in what cases you see SRB_FUNCTION_RESET_LOGICAL_UNIT get sent to the driver. (I haven't seen any other drivers out there that actually take specific command abort actions for this. The class Lsi_u3 sample in the WDK does what we had originally done, which is to lump this request in with SRB_FUNCTION_RESET_DEVICE, etc. Drivers such as the MSFT Win 8 Storport AHCI miniport ignore SRB_FUNCTION_ABORT_COMMAND.) Thanks, -Rick From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, March 14, 2014 2:08 PM To: Dharani Kotte; Foster, Carolyn D; Knoblaugh, Rick Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn and Rick, I expect to finish reviewing and testing the revised patch by the end of today. I'd appreciate it if you could provide your approval early next week if you don't have any more feedback. Thank you and have a great weekend! Alex From: Alex Chang Sent: Wednesday, March 12, 2014 9:49 AM To: 'Dharani Kotte'; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thank you very much, Dharani. Hi all, Please review/test the revised patch as soon as possible. We need to speed up wrapping up this patch. Thank you!
Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, March 12, 2014 9:00 AM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, Attached is the patch with the fix incorporated; I tested it yesterday and it worked well. Password: sndk1234 Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, March 07, 2014 2:40 PM To: Foster, Carolyn D; Dharani Kotte; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Dharani, I'd suggest initializing srbExtension via NVMeInitSrbExtension for all cases in BuildIO to avoid the BSOD in the future. Thanks, Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Friday, March 07, 2014 2:16 PM To: Alex Chang; Dharani Kotte; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Dharani, Unfortunately, I am still seeing a D1 BSOD when I try to step through the new NVMeProcessAbortLunReset code. I was able to spend a little more time looking at this failure this time and can give you more information about it.
On line 1316 of nvmeStd.c, where NVMeIssueAbortCmd is called, the parameter being passed in is the pResetSrbExt that was retrieved from the reset Srb. Then, in the function IssueAbortCmd, the device extension is pulled out of that resetSrbExt. Unfortunately, that SRB extension is all zeros because it was never initialized, so a null pointer is passed into ProcessIo, which causes the BSOD. So far, the changes to HwResetBus seem to be working OK. I will follow up with you separately on the steps I took to reproduce this failure. Carolyn From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Wednesday, March 05, 2014 11:13 AM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Great! Thank you very much, Dharani, for the quick fixes. Hi all, Please review/test the patch and provide your feedback. If no big changes are required, I will start to collect approvals from Intel and LSI early next week. Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, March 05, 2014 9:03 AM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes
Hi Alex, Attached is the patch with the changes incorporated and the BSOD fix. The BSOD fix is in nvmeIo.c line 640. Password: sndk1234 Thanks, Dharani. From: Dharani Kotte Sent: Tuesday, February 25, 2014 3:25 PM To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, Sure, I will go over the list below and make changes accordingly next week. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 25, 2014 2:09 PM To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes I left out one more: in Line#1228 of nvmeInit.c, NVMeWaitOnRdy is called instead of setting NextDriverState to NVMeWaitOnRdy. I think the assignment is still required before the initialization state machine starts. Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, February 25, 2014 2:03 PM To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn and Dharani, I basically agree with the suggested changes in HwStorResetBus. However, I have some feedback: 1. In Line#2161, NVMeResetAdapter is called, and it returns after making sure the RDY bit has been cleared to 0, so why do we need the 10 ms delay? The identical delay in RecoveryDpcRoutine was added because the original NVMeResetAdapter did not wait until the RDY bit was cleared to 0. Given the changes in NVMeResetAdapter, we need to remove the 10 ms delay in RecoveryDpcRoutine as well. 2.
In Line#2216, StorPortPause is called with 60 seconds to force Storport to hold up requests. I am not sure 60 seconds is a proper assumption. Instead, calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" seems a better idea to me. 3. In the definition of HwStorResetBus, the routine returns TRUE in the successful case. We need to take care of failed cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in the successful case. Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 25, 2014 4:28 AM To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn, Line 1384: I can take care of this item. Line 2219: StorPortSynchronizeAccess was requested by Samsung, as suggested by Judy. Below is the reference mail. In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts, which eventually cause the driver to be called at its HwStorResetBus entry point (NVMeResetBus). I have some feedback on the current architecture of that routine: 1. Among other things, NVMeResetBus schedules a DPC to complete any pending commands. This creates a situation where, upon return from this entry point, there are still cmds outstanding which don't get completed until the DPC runs. According to the WDK, this doesn't appear to be legal - all outstanding cmds have to be completed by the HwStorResetBus routine before it returns: HwResetBus Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure.
For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI) and HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value. and The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock. A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary. Since HwStorResetBus must finish its work before returning, it can't schedule a DPC to do so later on. The logic which schedules a DPC should be removed. 2. Code should be added to call StorPortPause() to hold off any new requests until StorPortResume() is called. 3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine in the NVMe driver should also be added for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo since the port driver calls it only after acquiring the StartIo spinlock. 4. We should implement a driver-internal global (per "adapter") flag signifying we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware. 5. Code should be added to call StorPortResume() when all work is complete. 6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above. Thanks, Judy Thanks, Dharani.
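Judy's list, together with Alex's feedback on the revised routine (hold requests with StorPortBusy rather than a 60-second StorPortPause, handle failures, resume only on success), amounts to a fixed ordering inside HwStorResetBus. A minimal simulation of that ordering is sketched below under stated assumptions: Adapter, Request, start_io, reset_bus and the two controller stubs are hypothetical; queuesHeld stands in for StorPortBusy or StorPortPause, clearing it stands in for StorPortReady or StorPortResume, and the StorPortSynchronizeAccess step (item 3) is not modeled. It is not the patch's NVMeResetBus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OUTSTANDING 8

typedef enum { ST_PENDING, ST_BUS_RESET, ST_BUSY } ReqStatus;

typedef struct { ReqStatus status; } Request;   /* hypothetical SRB */

/* Hypothetical adapter state; none of these are Storport structures. */
typedef struct {
    Request *outstanding[MAX_OUTSTANDING];
    size_t   count;
    bool     resetInProgress; /* per-adapter "busy resetting" flag     */
    bool     queuesHeld;      /* stands in for StorPortBusy/Pause      */
    bool     hwResponds;      /* simulation: does the controller obey? */
} Adapter;

/* Start-I/O path: refuse new work while a reset is in progress. */
static bool start_io(Adapter *a, Request *r)
{
    if (a->resetInProgress || a->count == MAX_OUTSTANDING) {
        r->status = ST_BUSY;
        return false;
    }
    r->status = ST_PENDING;
    a->outstanding[a->count++] = r;
    return true;
}

/* Stand-ins for CC.EN=0 + wait CSTS.RDY=0, and for CC.EN=1 +
 * wait CSTS.RDY=1 + re-creating the queues. */
static bool disable_controller(Adapter *a) { return a->hwResponds; }
static bool enable_and_reinit(Adapter *a)  { return a->hwResponds; }

/* Everything completes before this returns; nothing is left to a DPC. */
static bool reset_bus(Adapter *a)
{
    assert(a != NULL);
    a->resetInProgress = true;
    a->queuesHeld = true;

    bool ok = disable_controller(a);

    /* Complete every outstanding request with bus-reset status. */
    for (size_t i = 0; i < a->count; i++)
        a->outstanding[i]->status = ST_BUS_RESET;
    a->count = 0;

    if (ok)
        ok = enable_and_reinit(a);
    if (!ok)
        return false;          /* failure: leave queues held, no resume */

    a->resetInProgress = false;
    a->queuesHeld = false;     /* resume only after a successful reset */
    return true;
}
```

Completing requests after the disable step rather than before avoids racing a completion the controller might still post; leaving the adapter held when re-initialization fails follows Alex's point that resuming belongs only on the success path.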
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Monday, February 24, 2014 3:51 PM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex and Dharani, I have been reviewing the code and performing some tests, and I have some concerns about this patch. In nvmeStd.c: Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands. When the LUN reset comes in, we will set CC.EN to 0, and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." This reset behavior has been accounted for in the driver, by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. What should happen here is that the new processAbortLun function should be moved under SRB_FUNCTION_ABORT_COMMAND only. Then the processAbortLunReset function should send only one abort and not abort all outstanding commands. Also, during testing, I hit a D1 BSOD when I tried to step through the code. I ran IO and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell. The IO should then be timed out by Storport, which will send a LUN reset. Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with Start IO and the interrupt DPC.
Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, February 19, 2014 10:06 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thank you, Dharani. Hi all, Please review/test the attached reset fix patch from Sandisk and provide your feedback. Thank you very much, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, February 19, 2014 9:00 AM To: Alex Chang Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Hi Alex, Attached is the patch source for review. I have tested it with I/O running overnight. Areas to focus on when testing this patch: 1. Test abort/LUN resets. 2. Test chip reset. 3. Test the format command. 4. Test the firmware download command. Password is "sndk1234" Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 18, 2014 12:15 PM To: Dharani Kotte Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Great!
Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Tuesday, February 18, 2014 12:14 PM To: Alex Chang Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes I am just testing after merging the code; I should be able to send it tomorrow morning. Thanks, Dharani. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, February 18, 2014 12:13 PM To: Dharani Kotte Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Hi Dharani, Just a friendly reminder: could you please send out your patch as soon as it's ready? Many thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Friday, February 14, 2014 10:18 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes Sure Alex. Dharani. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Friday, February 14, 2014 10:17 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes Good morning, Dharani, As you may know, both the Intel and Huawei patches have been added to the OFA source base. You can now rebase your changes and send out a patch for review/test. Thank you very much for contributing the fixes. Regards, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, January 15, 2014 2:08 PM To: Alex Chang; Kwok Kong; Akshay Mathur Cc: Dave Landsman Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ?
From Alex.Chang at pmcs.com Thu Mar 20 11:12:17 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Thu, 20 Mar 2014 18:12:17 +0000 Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes In-Reply-To: <9e8453dc345f46d39f9cc38bb1541d4c@DM2PR07MB285.namprd07.prod.outlook.com> References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com> <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com> <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com> <11860_1394639990_53208476_11860_6208_1_23EC73C80FB59046A6B7B8EB7B3826593DA8A9A6@SACMBXIP01.sdcorp.global.sandisk.com> <8dcf97355cf44d88b0f91fd636740955@DM2PR07MB285.namprd07.prod.outlook.com> <6446_1395099377_532786F1_6446_19570_1_23EC73C80FB59046A6B7B8EB7B3826593DA8E18E@SACMBXIP01.sdcorp.global.sandisk.com> <9e8453dc345f46d39f9cc38bb1541d4c@DM2PR07MB285.namprd07.prod.outlook.com> Message-ID: Thanks a lot, Rick. Have a great vacation! Alex From: Knoblaugh, Rick [mailto:Rick.Knoblaugh at lsi.com] Sent: Thursday, March 20, 2014 11:07 AM To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Cc: Sheth, Parag Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, We are good with this patch. Also, I just wanted to mention that I will be on vacation next week -- Parag will be here if anything is needed from us.
Thanks,
-Rick

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Thursday, March 20, 2014 8:28 AM
To: Dharani Kotte; Knoblaugh, Rick; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Rick and Carolyn,

Just a friendly reminder: please let me know if you approve the patch.

Thanks,
Alex

From: Alex Chang
Sent: Monday, March 17, 2014 4:47 PM
To: 'Dharani Kotte'; Knoblaugh, Rick; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you, Dharani, for the effort in preparing another revision of the patch.

Hi all,

Please review/test it and provide your feedback at your earliest convenience. Once I receive the approvals from LSI and Intel, I will push the patch right away.

Hi Rick and Carolyn,

If you approve the patch, please let me know as well.

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Monday, March 17, 2014 4:36 PM
To: Knoblaugh, Rick; Alex Chang; Foster, Carolyn D
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Rick/Alex,

Attached is the code with the changes incorporated according to Rick's suggestions:
- Removed the flags relevant to LUN reset.
- Renamed the function to NVMeProcessAbortCmd().
- Tested as suggested by Carolyn.

Password: sndk1234

Thanks,
Dharani.

From: Knoblaugh, Rick [mailto:Rick.Knoblaugh at lsi.com]
Sent: Friday, March 14, 2014 3:23 PM
To: Alex Chang; Dharani Kotte; Foster, Carolyn D
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

It looks like NVMeProcessAbortLunReset is called only when SRB_FUNCTION_ABORT_COMMAND is received. As such, it would probably be good to remove the case statements in that routine that also look for SRB_FUNCTION_RESET_LOGICAL_UNIT, and the related flag can be removed, etc.

Also, I was wondering in what cases you see SRB_FUNCTION_RESET_LOGICAL_UNIT get sent to the driver. (I haven't seen any other drivers out there that actually take specific command abort actions for this. The class Lsi_u3 sample in the WDK does what we had originally done, which is to lump this request in with SRB_FUNCTION_RESET_DEVICE, etc. Drivers such as the MSFT Win 8 Storport AHCI miniport ignore SRB_FUNCTION_ABORT_COMMAND.)

Thanks,
-Rick

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, March 14, 2014 2:08 PM
To: Dharani Kotte; Foster, Carolyn D; Knoblaugh, Rick
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn and Rick,

I expect to finish reviewing and testing the revised patch by the end of today. I'd appreciate it if you could provide your approval early next week if you don't have any more feedback.

Thank you and have a great weekend!
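Rick's suggestion, and the NVMeProcessAbortCmd rename that came out of it, split requests into two kinds. An abort request targets exactly one command. LUN and device resets share the recovery path.

The C sketch below shows that split under stated assumptions:
- The enum values are stand-ins, not the real srb.h SRB_FUNCTION_* codes.
- classify and build_abort are hypothetical helpers, not nvmewin routines.
- The Abort command fields follow the NVMe specification: admin opcode 08h, with CDW10 holding the command identifier in bits 31:16 and the submission queue identifier in bits 15:0.

```c
#include <assert.h>
#include <stdint.h>

/* NVMe Abort is admin opcode 08h. */
#define NVME_ADMIN_ABORT 0x08u

/* Stand-ins for the SRB_FUNCTION_* codes in srb.h (values are NOT the real ones). */
typedef enum { FN_ABORT_COMMAND, FN_RESET_LOGICAL_UNIT, FN_RESET_DEVICE, FN_OTHER } srb_fn;

typedef enum { ACT_SEND_ONE_ABORT, ACT_RUN_RECOVERY, ACT_IGNORE } reset_action;

typedef struct {
    uint32_t cdw0;   /* bits 7:0 = opcode (the CID is filled in at submission) */
    uint32_t cdw10;  /* bits 31:16 = CID to abort, bits 15:0 = SQID */
} abort_cmd;

/* Build exactly one Abort for the single command named by (sqid, cid). */
static abort_cmd build_abort(uint16_t sqid, uint16_t cid) {
    abort_cmd c;
    c.cdw0 = NVME_ADMIN_ABORT;
    c.cdw10 = ((uint32_t)cid << 16) | sqid;
    return c;
}

/* An abort targets one command; LUN and device resets go through the
   recovery path, which disables the controller and completes everything. */
static reset_action classify(srb_fn fn) {
    switch (fn) {
    case FN_ABORT_COMMAND:
        return ACT_SEND_ONE_ABORT;
    case FN_RESET_LOGICAL_UNIT:
    case FN_RESET_DEVICE:
        return ACT_RUN_RECOVERY;
    default:
        return ACT_IGNORE;
    }
}
```

For example, build_abort(1, 0x1234) yields CDW10 = 0x12340001. Ignoring other functions (ACT_IGNORE) is an assumption made for brevity.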
Alex

From: Alex Chang
Sent: Wednesday, March 12, 2014 9:49 AM
To: 'Dharani Kotte'; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you very much, Dharani.

Hi all,

Please review/test the revised patch as soon as possible. We need to speed up wrapping up this patch. Thank you!

Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, March 12, 2014 9:00 AM
To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,

Attached is the patch with the changes and the fix incorporated; it was tested well yesterday.

Password: sndk1234

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, March 07, 2014 2:40 PM
To: Foster, Carolyn D; Dharani Kotte; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

I'd suggest initializing the SRB extension via NVMeInitSrbExtension for all cases in BuildIO to avoid the BSOD in the future.
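Alex's suggestion targets the failure Carolyn describes in her March 7 message: an abort path pulled the device extension out of an SRB extension that was still all zeros.

Below is a minimal C sketch of the idea, under these assumptions:
- dev_ext and srb_ext are hypothetical, trimmed-down types.
- init_srb_ext is a hypothetical stand-in for NVMeInitSrbExtension.
- The real nvmewin structures and signatures differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for the driver's extensions. */
typedef struct { int controller_ready; } dev_ext;

typedef struct {
    dev_ext *pDevExt;  /* back-pointer later used by abort/reset paths */
    void    *pSrb;     /* SRB that owns this extension */
} srb_ext;

/* Stand-in for NVMeInitSrbExtension: called for every SRB in BuildIo,
   before any switch on the SRB function, so no path sees a zeroed extension. */
static void init_srb_ext(srb_ext *ext, dev_ext *dev, void *srb) {
    memset(ext, 0, sizeof(*ext));
    ext->pDevExt = dev;
    ext->pSrb = srb;
}

/* A consumer such as an abort path: it refuses a zeroed extension
   instead of dereferencing a NULL device extension. */
static int issue_abort(const srb_ext *ext) {
    if (ext->pDevExt == NULL)
        return -1;
    return ext->pDevExt->controller_ready ? 0 : -2;
}
```

Initializing unconditionally at the top of BuildIo means no individual code path has to remember to do it. The NULL check in the consumer is a second line of defense, not a substitute.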
Thanks,
Alex

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Friday, March 07, 2014 2:16 PM
To: Alex Chang; Dharani Kotte; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Unfortunately I am still seeing a D1 BSOD when I try to step through the new NVMeProcessAbortLunReset code. I was able to spend a little more time looking at this failure this time and can give you more information about it.

On line 1316 of nvmeStd.c, where NVMeIssueAbortCmd is called, the parameter being passed in is the pResetSrbExt that was retrieved from the reset Srb. Then, in the function IssueAbortCmd, the device extension is pulled out of that resetSrbExt. Unfortunately that SRB extension is all 0s, as it was never initialized. So it's passing a null pointer into ProcessIo, which is causing the BSOD.

So far, the changes to HwResetBus seem to be working OK. I will follow up with you separately on the steps I took to reproduce this failure.

Carolyn

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Wednesday, March 05, 2014 11:13 AM
To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Great! Thank you very much, Dharani, for the quick fixes.

Hi all,

Please review/test the patch and provide your feedback. If no big changes are required, I will start to collect approvals from Intel and LSI early next week.
Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, March 05, 2014 9:03 AM
To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,

Attached is the patch with the changes incorporated and the BSOD fix. The BSOD fix is in nvmeIo.c line 640.

Password: sndk1234

Thanks,
Dharani.

From: Dharani Kotte
Sent: Tuesday, February 25, 2014 3:25 PM
To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,

Sure, I will go over the list below and make changes accordingly next week.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 25, 2014 2:09 PM
To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

I left out one more: at line 1228 of nvmeInit.c, NVMeWaitOnRdy is called directly instead of assigning NVMeWaitOnRdy to NextDriverState. I think the assignment is still required before the initialization state machine starts.
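Alex's point is about ordering. A state machine that runs whatever NextDriverState names will start from a stale state unless that field is assigned first.

The C sketch below models this. All of it is hypothetical: the states, init_ctx, and the helper functions are illustrative names, not nvmewin's actual definitions.

```c
#include <assert.h>

/* Hypothetical states; nvmewin's real driver states differ. */
typedef enum { ST_WAIT_ON_RDY, ST_IDENTIFY_CTRL, ST_START_COMPLETE } drv_state;

typedef struct {
    drv_state next_state;  /* stands in for NextDriverState */
    int       rdy;         /* simulated CSTS.RDY */
    int       waited;      /* set once the RDY-wait state actually ran */
} init_ctx;

/* Each call runs the state recorded in next_state, as a timer-driven
   state machine would; it never chooses the starting state itself. */
static void run_one_step(init_ctx *ctx) {
    switch (ctx->next_state) {
    case ST_WAIT_ON_RDY:
        ctx->waited = 1;
        if (ctx->rdy)
            ctx->next_state = ST_IDENTIFY_CTRL;
        break;
    case ST_IDENTIFY_CTRL:
        ctx->next_state = ST_START_COMPLETE;
        break;
    case ST_START_COMPLETE:
        break;
    }
}

static drv_state run_machine(init_ctx *ctx, int max_steps) {
    for (int i = 0; i < max_steps && ctx->next_state != ST_START_COMPLETE; i++)
        run_one_step(ctx);
    return ctx->next_state;
}

/* Re-initialization (e.g. after a reset): assign the first state explicitly,
   then let the machine run, rather than calling the wait routine directly. */
static drv_state restart_init(init_ctx *ctx, int max_steps) {
    ctx->next_state = ST_WAIT_ON_RDY;
    ctx->waited = 0;
    return run_machine(ctx, max_steps);
}
```

If next_state is left at ST_START_COMPLETE from a previous run, run_machine returns immediately without ever waiting on RDY. restart_init avoids this by assigning the state first.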
Alex

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Tuesday, February 25, 2014 2:03 PM
To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn and Dharani,

I basically agree with the suggested changes in HwStorResetBus. However, I have some feedback:

1. At line 2161, NVMeResetAdapter is called, and it returns only after making sure the RDY bit has cleared to 0, so why do we need the 10 ms delay? The exact same delay was added in RecoveryDpcRoutine because the original NVMeResetAdapter did not wait until the RDY bit cleared to 0. Given the changes in NVMeResetAdapter, we need to remove the 10 ms delay in RecoveryDpcRoutine as well.

2. At line 2216, StorPortPause is called with 60 seconds to force Storport to hold requests. I am not sure 60 seconds is a proper assumption. Instead, calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" seems a better idea to me.

3. As defined, HwStorResetBus returns TRUE in the successful case. We need to take care of the failure cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in the successful case.

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 25, 2014 4:28 AM
To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn,

Line 1384: I can take care of this item.

Line 2219: StorPortSynchronizeAccess. This was requested by Samsung, as suggested by Judy. Below is the reference mail.

In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts, which eventually cause the driver to be called at its HwStorResetBus entry point (NVMeResetBus).
I have some feedback on the current architecture of that routine:

1. Among other things, NVMeResetBus schedules a DPC to complete any pending commands. This creates a situation where, upon return from this entry point, there are still commands outstanding which don't get completed until the DPC runs. According to the WDK, this doesn't appear to be legal - all outstanding commands have to be completed by the HwStorResetBus routine before it returns:

    HwResetBus
    Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure. For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI).

and

    HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value.

and

    The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock. A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary.

Since HwStorResetBus must finish its work before returning, it can't schedule a DPC to do so later on. The logic which schedules a DPC should be removed.

2. Code should be added to call StorPortPause() to hold off any new requests until StorPortResume() is called.

3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine in the NVMe driver should also be added for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo, since the port driver calls it only after acquiring the StartIo spinlock.

4. We should implement a driver-internal global (per "adapter") flag signifying that we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware.

5. Code should be added to call StorPortResume() when all work is complete.

6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above.

Thanks,
Judy

Thanks,
Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D
Sent: Monday, February 24, 2014 3:51 PM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex and Dharani,

I have been reviewing the code and performing some tests, and I have some concerns about this patch.

In nvmeStd.c:

Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands. When the LUN reset comes in, we will set CC.EN to 0, and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." This reset behavior has been accounted for in the driver, by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. What should happen here is that the new processAbortLun function should be moved under SRB_FUNCTION_ABORT_COMMAND only. Then the processAbortLunReset function should send only one abort and not abort all outstanding commands.

Also, during testing, I hit a D1 BSOD when I tried to step through the code. I ran IO and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell.
The IO should be timed out by Storport, which will then send a LUN reset.

Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with StartIo and the interrupt DPC.

Thanks,
Carolyn

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Wednesday, February 19, 2014 10:06 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you, Dharani.

Hi all,

Please review/test the attached reset fix patch from Sandisk and provide your feedback.

Thank you very much,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, February 19, 2014 9:00 AM
To: Alex Chang
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Alex,

Attached is the patch source for review. I have tested I/O running overnight. Areas to focus on when testing this patch:
1. Test abort/LUN resets.
2. Test chip reset.
3. Test the format command.
4. Test the firmware download command.

Password is "sndk1234"

Thanks,
Dharani.
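Pulling together Judy's points 1-5, Alex's points 2-3, and Carolyn's note that a disabled controller posts no completions, a bus-reset handler has four duties:
- hold off new requests;
- reset the hardware;
- complete every outstanding request before returning;
- resume only if the reset succeeded.

The C sketch below is a minimal model of that flow. The adapter struct and the simulated hw_reset are hypothetical, and the sketch makes no Storport calls. A real routine would hold requests with StorPortBusy or StorPortPause, complete SRBs with SRB_STATUS_BUS_RESET, and call StorPortResume, as discussed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_OUTSTANDING 8

/* Hypothetical model of the adapter state touched by a bus reset. */
typedef enum { REQ_FREE, REQ_PENDING, REQ_DONE_BUS_RESET } req_state;

typedef struct {
    req_state reqs[MAX_OUTSTANDING];
    bool held;       /* new requests held off (StorPortBusy/StorPortPause in a driver) */
    bool in_reset;   /* driver-internal "reset in progress" flag (Judy's point 4) */
    bool sim_hw_ok;  /* simulated outcome of the hardware reset */
} adapter;

/* Stand-in for disabling/re-enabling the controller with RDY waits. */
static bool hw_reset(adapter *a) { return a->sim_hw_ok; }

/* Once CC.EN is cleared the controller posts no completions, so every
   pending request is completed here, before the handler returns. */
static void complete_all_outstanding(adapter *a) {
    for (size_t i = 0; i < MAX_OUTSTANDING; i++)
        if (a->reqs[i] == REQ_PENDING)
            a->reqs[i] = REQ_DONE_BUS_RESET;
}

static bool reset_bus(adapter *a) {
    assert(a != NULL);
    a->held = true;
    a->in_reset = true;
    bool ok = hw_reset(a);
    complete_all_outstanding(a);  /* on success and on failure alike */
    a->in_reset = false;
    if (ok)
        a->held = false;          /* resume only if the reset worked */
    return ok;                    /* report failure instead of always TRUE */
}
```

Outstanding requests are completed on both paths because the routine must not return with work left over. Only the resume step is conditional.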
From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:15 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Great!

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 18, 2014 12:14 PM
To: Alex Chang
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Just testing after merging the code; I should be able to send it tomorrow morning.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:13 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Just a friendly reminder: could you please send out your patch as soon as it's ready?

Many thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, February 14, 2014 10:18 AM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Sure Alex.

Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Friday, February 14, 2014 10:17 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Good morning, Dharani,

As you may know, both the Intel and Huawei patches have been added to the OFA source base. Now you may rebase your changes and send a patch out for review/test. Thank you very much for contributing the fixes.

Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, January 15, 2014 2:08 PM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Alex,

Attached is the source for preliminary review. I have tested I/O and the SCSI compliance test. I don't have a drive that supports abort/LUN resets, and I am not sure how to test the format command.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:54 AM
To: Dharani Kotte; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Happy Holidays to you all.

Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 11:52 AM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Thank you for the explanation. Sure, I will take a look.

Happy Holidays.
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:44 AM
To: Kwok Kong; Dharani Kotte; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Dharani,

The controller reset can be issued either by the host or by the driver itself. Currently, the driver seems to handle both in the same manner via a single entry point, "NVMeResetController". In the case of "from the host", the driver needs to separate SRB_FUNCTION_RESET_... requests from the NVME_RESET_DEVICE ioctl request with respect to handling pending IOs. In the case of "the driver itself", the related error recovery code needs to be re-examined as well.
Judy from Samsung suggested referring to the storahci.sys driver sample code for Windows 7/8 for reset bus logic examples and detailed recommendations.

Thank you,
Alex

From: Kwok Kong
Sent: Friday, December 20, 2013 9:08 AM
To: Dharani Kotte; Akshay Mathur; Alex Chang
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Dharani,

Yes, these are the three areas that you are committed to.

Alex,

Please send more details on "Controller reset does not handle all cases" to Dharani.

Thanks
-Kwok

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 9:02 AM
To: Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Kwok,

I think the items below are the ones we are committing to:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- orphaned requests

Can somebody provide a little more detail on the expectation for the item "Controller reset does not handle all cases"?

Thanks,
Dharani.

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Thursday, December 19, 2013 6:53 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Excellent! Your help is much appreciated.

Dharani,

Please let me know if you have any questions.

Happy holidays to all of you.
-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Thursday, December 19, 2013 6:51 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,

You are welcome. We are pleased to contribute to the community and appreciate you driving it! We will try our best to complete the implementation by the end of January, but we may not be able to complete comprehensive testing by that time.
This is because of overlaps with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks. Anyway, Dharani will be in touch with you as he makes progress.

Thanks
Akshay

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Tuesday, December 17, 2013 4:21 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Akshay,

Thanks for your willingness to contribute to the driver. I am looking for a patch before the end of Jan 2014; the earlier the better. Please let me know if Sandisk can commit to that. Your help is much appreciated.

Thanks
-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Tuesday, December 17, 2013 4:11 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman; Akshay Mathur
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,

I manage the software and driver development team at SanDisk/ESS. We are certainly willing to contribute to fixing the problems listed below, but before we can commit, we would like to get clarification on the timeline, i.e., by when these fixes are expected to be completed.

Thanks
Akshay Mathur
Sr Software Manager, Enterprise Storage Solutions
951 SanDisk Drive, Building #5 | Milpitas, CA 95035 U.S.A. | Direct +1 408.801.1336 | Cell +1 856.607.7323 | Corporate +1 408.801.1000 | Akshay.Mathur at sandisk.com

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Wednesday, December 11, 2013 18:00
To: Dave Landsman
Cc: Dharani Kotte
Subject: Would you please help to resolve a few OFA NVMe driver problems ?

Dave and Dharani,

There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems.
Intel has agreed to work on the following two problems:
- remove #define for CHATHAM2
- Learning of CPU core to Vector failure handling

I am also asking other companies to work on some of the issues. I wonder if your company can work on the following three problems:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- orphaned requests

Please let me know if your company can work on these three issues.

Thanks
-Kwok

From saikrishna.ravikanti at gmail.com Fri Mar 21 08:57:46 2014
From: saikrishna.ravikanti at gmail.com (Saikrishna Ravikanti)
Date: Fri, 21 Mar 2014 21:27:46 +0530
Subject: [nvmewin] Handling NVMe Passthrough IOCTLs
Message-ID:

Hi Team,

Referring to PT_IOCTL.Doc and the WDK SPTI sample, I am developing an application to send IOCTLs using the _NVME_PASS_THROUGH_IOCTL structure.

I am facing a problem: the DeviceIoControl routine returns error code 1 (Incorrect Function).
Code shown below:

    PNVME_PASS_THROUGH_IOCTL pInBuffer = NULL;
    PNVME_PASS_THROUGH_IOCTL pOutBuffer = NULL;
    ULONG ByteSizeTX = 4096;

    /* Allocate input buffer to accommodate size of NVME_PASS_THROUGH_IOCTL only */
    InputBufLen = sizeof(NVME_PASS_THROUGH_IOCTL);
    pInBuffer = (PNVME_PASS_THROUGH_IOCTL) malloc(InputBufLen);

    /* Allocate output buffer to accommodate size of NVME_PASS_THROUGH_IOCTL and data */
    OutputBufLen = sizeof(NVME_PASS_THROUGH_IOCTL) + ByteSizeTX - 1;
    pOutBuffer = (PNVME_PASS_THROUGH_IOCTL) malloc(OutputBufLen);
    if (pInBuffer == NULL || pOutBuffer == NULL)
        return;

    /* Zero out the buffers */
    memset(pInBuffer, 0, InputBufLen);
    memset(pOutBuffer, 0, OutputBufLen);

    pInBuffer->SrbIoCtrl.HeaderLength = sizeof(SRB_IO_CONTROL);
    memcpy((UCHAR*)(&pInBuffer->SrbIoCtrl.Signature[0]), NVME_SIG_STR, NVME_SIG_STR_LEN);
    pInBuffer->SrbIoCtrl.Timeout = NVME_PT_TIMEOUT;
    pInBuffer->SrbIoCtrl.ControlCode = (ULONG)NVME_PASS_THROUGH_SRB_IO_CODE;
    pInBuffer->SrbIoCtrl.Length = InputBufLen - sizeof(SRB_IO_CONTROL);

    pInBuffer->NVMeCmd[0] = ADMIN_IDENTIFY;
    pInBuffer->NVMeCmd[1] = 0;
    pInBuffer->NVMeCmd[10] = 1; //Return corresponding controller data structure

    pInBuffer->DataBufferLen = 0;
    pInBuffer->ReturnBufferLen = sizeof(NVME_PASS_THROUGH_IOCTL) + ByteSizeTX - 1;

    status = DeviceIoControl(
        fileHandle,           /* Handle to \\.\scsi device via CreateFile */
        IOCTL_SCSI_MINIPORT,  /* IO control function to a miniport driver */
        pInBuffer,            /* Input buffer with data sent to driver */
        InputBufLen,          /* Length of data sent to driver (in bytes) */
        pOutBuffer,           /* Output buffer with data received from driver */
        OutputBufLen,         /* Length of data received from driver */
        &Count,               /* Bytes placed in DataBuffer */
        NULL);

Kindly let me know your valuable inputs about this behavior.

Note: I am getting a proper handle from CreateFile for my NVMe device.

Regards,
Sai
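The sizing arithmetic in this code is worth explaining, separately from the handle question answered in the reply. The pass-through structure ends in a one-byte DataBuffer array, and sizeof already counts that first byte, hence the "- 1".

The C sketch below illustrates the arithmetic and the Identify fields set in the code above. It uses a hypothetical, simplified pt_ioctl layout (not the driver's real NVME_PASS_THROUGH_IOCTL definition) and hypothetical helper names. The Identify opcode (06h) and CNS value (1 = controller data structure) come from the NVMe specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified layout: what matters is the trailing one-byte
   DataBuffer; see the driver's header for the real structure. */
typedef struct {
    uint32_t NVMeCmd[16];    /* command dwords CDW0..CDW15 */
    uint32_t DataBufferLen;
    uint32_t ReturnBufferLen;
    uint8_t  DataBuffer[1];  /* first byte of the variable-length data */
} pt_ioctl;

#define NVME_ADMIN_IDENTIFY     0x06u  /* Identify admin opcode */
#define IDENTIFY_CNS_CONTROLLER 0x01u  /* CDW10.CNS = 1: controller data */

/* sizeof already counts DataBuffer[0], hence the "- 1". Struct padding can
   make the result a few bytes larger than strictly needed, never smaller. */
static size_t pt_buffer_len(size_t data_len) {
    return sizeof(pt_ioctl) + data_len - 1;
}

static pt_ioctl *build_identify_ctrl(size_t data_len, size_t *out_len) {
    size_t len = pt_buffer_len(data_len);
    pt_ioctl *p = malloc(len);
    if (p == NULL)
        return NULL;
    memset(p, 0, len);
    p->NVMeCmd[0] = NVME_ADMIN_IDENTIFY;       /* CDW0 bits 7:0 = opcode */
    p->NVMeCmd[1] = 0;                         /* NSID: not used for CNS 1 */
    p->NVMeCmd[10] = IDENTIFY_CNS_CONTROLLER;  /* CDW10 */
    p->ReturnBufferLen = (uint32_t)len;
    *out_len = len;
    return p;
}
```

For a 4096-byte Identify payload, pt_buffer_len(4096) is always at least offsetof(pt_ioctl, DataBuffer) + 4096, so the returned data fits.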
From akyros000 at gmail.com Fri Mar 21 09:09:13 2014
From: akyros000 at gmail.com (Jeff Glass)
Date: Fri, 21 Mar 2014 09:09:13 -0700
Subject: [nvmewin] Handling NVMe Passthrough IOCTLs
In-Reply-To:
References:
Message-ID: <532C6429.1040204@gmail.com>

You said "Note: I am getting proper handle from CreateFile for my NVMe Device." Did you open \\.\PhysicalDriveXxxx or \\.\ScsiXxxx? You need to open a handle to the adapter (i.e. Scsi) and not the disk (i.e. PhysicalDrive) for IOCTL_SCSI_MINIPORT. That's the most common mistake I've seen that causes this error.
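Jeff's point is that IOCTL_SCSI_MINIPORT must be sent to the adapter device, \\.\ScsiN:, rather than to the disk, \\.\PhysicalDriveN.

Below is a small, portable C sketch for building and sanity-checking that path. The helper names are hypothetical. N is the SCSI port number of the NVMe adapter, which this sketch does not discover.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the adapter device path "\\.\ScsiN:" for SCSI port N.
   Returns 0 on success, -1 if the buffer is too small. */
static int scsi_adapter_path(char *buf, size_t size, unsigned port) {
    int n = snprintf(buf, size, "\\\\.\\Scsi%u:", port);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}

/* Disk paths such as "\\.\PhysicalDrive0" are the wrong target for
   IOCTL_SCSI_MINIPORT; accept only adapter paths. */
static int is_adapter_path(const char *path) {
    return strncmp(path, "\\\\.\\Scsi", 8) == 0;
}
```

On Windows, the resulting string would be passed to CreateFileA with GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE and OPEN_EXISTING. The returned handle is then the one used for DeviceIoControl.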
From carolyn.d.foster at intel.com Fri Mar 21 11:49:21 2014
From: carolyn.d.foster at intel.com (Foster, Carolyn D)
Date: Fri, 21 Mar 2014 18:49:21 +0000
Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes
In-Reply-To:
References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com>
 <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com>
 <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com>
 <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com>
 <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com>
 <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com>
 <11860_1394639990_53208476_11860_6208_1_23EC73C80FB59046A6B7B8EB7B3826593DA8A9A6@SACMBXIP01.sdcorp.global.sandisk.com>
 <8dcf97355cf44d88b0f91fd636740955@DM2PR07MB285.namprd07.prod.outlook.com>
 <6446_1395099377_532786F1_6446_19570_1_23EC73C80FB59046A6B7B8EB7B3826593DA8E18E@SACMBXIP01.sdcorp.global.sandisk.com>
 <9e8453dc345f46d39f9cc38bb1541d4c@DM2PR07MB285.namprd07.prod.outlook.com>
Message-ID:

I approve the patch as well.

Thanks,
Carolyn

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Thursday, March 20, 2014 11:12 AM
To: Knoblaugh, Rick; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Cc: Sheth, Parag
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thanks a lot, Rick. Have a great vacation!

Alex

From: Knoblaugh, Rick [mailto:Rick.Knoblaugh at lsi.com]
Sent: Thursday, March 20, 2014 11:07 AM
To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Cc: Sheth, Parag
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,

We are good with this patch.
Also, just wanted to mention that I will be on vacation next week -- Parag will be here if anything is needed from us.

Thanks,
-Rick

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Thursday, March 20, 2014 8:28 AM
To: Dharani Kotte; Knoblaugh, Rick; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Rick and Carolyn,
Just a friendly reminder: please let me know if you approve the patch.

Thanks,
Alex

From: Alex Chang
Sent: Monday, March 17, 2014 4:47 PM
To: 'Dharani Kotte'; Knoblaugh, Rick; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you, Dharani, for the effort in preparing another revision of the patch.

Hi all,
Please review/test it and provide your feedback at your earliest convenience. Once I receive the approvals from LSI and Intel, I will push the patch right away.

Hi Rick and Carolyn,
If you approve the patch, please let me know as well.

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Monday, March 17, 2014 4:36 PM
To: Knoblaugh, Rick; Alex Chang; Foster, Carolyn D
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Rick/Alex,
Attached is the code with the changes incorporated according to Rick's suggestions:
- Removed the flags relevant to LUN reset
- Renamed the function to NVMeProcessAbortCmd()
- Tested as suggested by Carolyn
Password: sndk1234

Thanks,
Dharani.

From: Knoblaugh, Rick [mailto:Rick.Knoblaugh at lsi.com]
Sent: Friday, March 14, 2014 3:23 PM
To: Alex Chang; Dharani Kotte; Foster, Carolyn D
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,
Looks like NVMeProcessAbortLunReset is called only in the case of receiving SRB_FUNCTION_ABORT_COMMAND. As such, it would probably be good to remove the case statements in that routine that are also looking for SRB_FUNCTION_RESET_LOGICAL_UNIT, and the related flag can be removed, etc.
Also, I was wondering in what cases you see SRB_FUNCTION_RESET_LOGICAL_UNIT get sent into the driver. (I haven't seen any other drivers out there that actually take specific command abort actions for this. The class Lsi_u3 sample in the WDK does what we had originally done, which is to lump this request in with SRB_FUNCTION_RESET_DEVICE, etc. Drivers such as the MSFT Win 8 Storport AHCI miniport ignore SRB_FUNCTION_ABORT_COMMAND.)

Thanks,
-Rick

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, March 14, 2014 2:08 PM
To: Dharani Kotte; Foster, Carolyn D; Knoblaugh, Rick
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn and Rick,
I expect to finish reviewing and testing the revised patch by the end of today. It'd be appreciated if you could provide your approval early next week if you don't have any more feedback.
Thank you and have a great weekend!
Alex

From: Alex Chang
Sent: Wednesday, March 12, 2014 9:49 AM
To: 'Dharani Kotte'; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you very much, Dharani.

Hi all,
Please review/test the revised patch as soon as possible. We need to speed up wrapping up this patch.

Thank you!
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, March 12, 2014 9:00 AM
To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,
Attached is the patch with the changes and the fix incorporated; I tested it well yesterday.
Password: sndk1234

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, March 07, 2014 2:40 PM
To: Foster, Carolyn D; Dharani Kotte; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,
I'd suggest that you initialize srbExtension via NVMeInitSrbExtension for all cases in BuildIO to avoid the BSOD in the future.
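Alex's suggestion targets the failure Carolyn reports in her March 7 message: the abort path reads the device-extension pointer out of an SRB extension that was never initialized, so NULL reaches the I/O path. A minimal sketch of the idea, assuming hypothetical `DEV_EXT_SKETCH`/`SRB_EXT_SKETCH` types and helpers (not the driver's real `NVMeInitSrbExtension` or extension layout): initialize every extension unconditionally at the top of the BuildIo path, and have the abort path refuse an extension that lacks a back-pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real driver types differ. */
typedef struct { int AdminQueueDepth; } DEV_EXT_SKETCH;

typedef struct {
    DEV_EXT_SKETCH *pNvmeDevExt;  /* back-pointer used by later stages */
    void           *pSrb;         /* owning request */
    unsigned        CmdId;        /* command identifier once submitted */
} SRB_EXT_SKETCH;

/* Called for every request at the top of the BuildIo path, before any
 * function-specific handling, so no later stage sees a zeroed extension. */
static void init_srb_ext(SRB_EXT_SKETCH *ext, DEV_EXT_SKETCH *dev, void *srb)
{
    memset(ext, 0, sizeof(*ext));
    ext->pNvmeDevExt = dev;
    ext->pSrb = srb;
}

/* Defensive check before issuing an abort on behalf of an abort/reset SRB. */
static bool issue_abort_sketch(const SRB_EXT_SKETCH *resetExt)
{
    if (resetExt == NULL || resetExt->pNvmeDevExt == NULL)
        return false;  /* otherwise a NULL device extension is dereferenced */
    /* ... build and submit the abort on the admin queue ... */
    return true;
}
```

Doing the initialization in one place for all SRB functions, rather than per case, is what keeps newly added cases (such as the abort handling) from reintroducing the bug.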
Thanks,
Alex

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Friday, March 07, 2014 2:16 PM
To: Alex Chang; Dharani Kotte; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,
Unfortunately I am still seeing a D1 BSOD when I try to step through the new NVMeProcessAbortLunReset code. I was able to spend a little more time looking at this failure this time and can give you more information about it.
On line 1316 of nvmeStd.c, where NVMeIssueAbortCmd is called, the parameter being passed in is the pResetSrbExt that was retrieved from the reset Srb. Then in IssueAbortCmd, the device extension is pulled out of that resetSrbExt. Unfortunately, that SRB extension is all 0s because it was never initialized, so a null pointer is passed into ProcessIo, which causes the BSOD.
So far, the changes to HwResetBus seem to be working OK. I will follow up with you separately on the steps I took to reproduce this failure.

Carolyn

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Wednesday, March 05, 2014 11:13 AM
To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Great! Thank you very much, Dharani, for the quick fixes.

Hi all,
Please review/test the patch and provide your feedback. If no big changes are required, I will start to collect approvals from Intel and LSI early next week.
Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, March 05, 2014 9:03 AM
To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,
Attached is the patch with the changes incorporated and the BSOD fix. The BSOD fix is in nvmeIo.c line 640.
Password: sndk1234

Thanks,
Dharani.

From: Dharani Kotte
Sent: Tuesday, February 25, 2014 3:25 PM
To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,
Sure, I will go over the list below and make changes accordingly next week.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 25, 2014 2:09 PM
To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Just left out one more: in line #1228 of nvmeInit.c, NVMeWaitOnRdy is called in place of assigning NVMeWaitOnRdy to NextDriverState. I think the assignment is still required before the initialization state machine starts.
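Alex's point is about ordering: the initialization state machine dispatches on `NextDriverState`, so the first state has to be stored before the machine starts, rather than the wait routine being called directly. The sketch below illustrates that with hypothetical state names and a plain loop standing in for the driver's timer-driven dispatch; none of these identifiers come from the OFA source.

```c
#include <stdbool.h>

/* Hypothetical subset of init states; not the driver's real enum. */
typedef enum {
    STATE_START = 0,
    STATE_WAIT_ON_RDY,
    STATE_WAIT_ON_IDENTIFY_CTRL,
    STATE_INIT_DONE
} INIT_STATE;

typedef struct {
    INIT_STATE NextDriverState;
    bool       RdySet;   /* simulated CSTS.RDY */
    int        Steps;    /* dispatch passes executed */
} INIT_CTX;

/* One dispatch pass: act on NextDriverState and advance it. */
static void run_one_step(INIT_CTX *ctx)
{
    ctx->Steps++;
    switch (ctx->NextDriverState) {
    case STATE_WAIT_ON_RDY:
        if (ctx->RdySet)  /* otherwise stay here and poll again */
            ctx->NextDriverState = STATE_WAIT_ON_IDENTIFY_CTRL;
        break;
    case STATE_WAIT_ON_IDENTIFY_CTRL:
        ctx->NextDriverState = STATE_INIT_DONE;
        break;
    default:
        /* STATE_START is never advanced: if the first state was not
         * assigned, the machine would sit here forever. */
        break;
    }
}

/* Start the machine: store the first state, then drive it. */
static bool start_init(INIT_CTX *ctx, int maxSteps)
{
    ctx->NextDriverState = STATE_WAIT_ON_RDY;  /* the assignment being kept */
    while (ctx->NextDriverState != STATE_INIT_DONE && ctx->Steps < maxSteps)
        run_one_step(ctx);
    return ctx->NextDriverState == STATE_INIT_DONE;
}
```

The step budget plays the role of a timeout, so a controller that never reports RDY leaves the context in the wait state and `start_init` reports failure instead of spinning.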
Alex

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Tuesday, February 25, 2014 2:03 PM
To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn and Dharani,
I basically agree with the suggested changes in HwStorResetBus. However, I have some feedback:
1. In line #2161, NVMeResetAdapter is called, and it returns only after making sure the RDY bit is cleared to 0, so why do we need the 10 ms delay? The exact same delay was added in RecoveryDpcRoutine because the original NVMeResetAdapter did not wait until the RDY bit was cleared to 0. Due to the changes in NVMeResetAdapter, we need to remove the 10 ms delay in RecoveryDpcRoutine as well.
2. In line #2216, StorPortPause is called with 60 seconds to force Storport to hold up requests. I am not sure 60 seconds is a proper assumption. Instead, calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" seems a better idea to me.
3. In the definition of HwStorResetBus, the routine returns TRUE in the successful case. We need to take care of failed cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in the successful case.

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 25, 2014 4:28 AM
To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn,
Line 1384: I can take care of this item.
Line 2219: StorPortSynchronizeAccess was requested by Samsung, as suggested by Judy. Below is the reference mail.

In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts, which eventually cause the driver to be called at its HwStorResetBus entry point (NVMeResetBus).
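Alex's three points, together with the WDK requirement quoted in Judy's reference mail that outstanding requests be completed before HwStorResetBus returns, suggest a reset routine shaped roughly as follows. This is a simulation only: `ADAPTER_SKETCH`, the status constants, and the `controller_reset` callback are hypothetical stand-ins for holding off requests (StorPortBusy or StorPortPause, the open question in point 2), NVMeResetAdapter, StorPortCompleteRequest and StorPortResume. No Storport calls are made.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OUTSTANDING 8

/* Hypothetical status codes; real SRB_STATUS_* values live in the WDK. */
enum { SRB_PENDING_SK = 0, SRB_BUS_RESET_SK = 1 };

typedef struct {
    bool ResetInProgress;               /* driver-internal "busy resetting" flag */
    bool HoldRequests;                  /* stands in for StorPortBusy/StorPortPause */
    bool Resumed;                       /* stands in for StorPortResume */
    int  Status[MAX_OUTSTANDING];       /* per-command completion status */
    bool Outstanding[MAX_OUTSTANDING];  /* command still owned by the driver */
    bool (*controller_reset)(void);     /* assumed to wait for CSTS.RDY == 0 itself */
} ADAPTER_SKETCH;

/* Complete every outstanding command now, instead of deferring to a DPC. */
static size_t complete_all_outstanding(ADAPTER_SKETCH *a)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (a->Outstanding[i]) {
            a->Status[i] = SRB_BUS_RESET_SK;
            a->Outstanding[i] = false;
            n++;
        }
    }
    return n;
}

/* Returns true only if the controller came back; resumes only then. */
static bool reset_bus_sketch(ADAPTER_SKETCH *a)
{
    a->ResetInProgress = true;
    a->HoldRequests = true;            /* hold off new requests */

    /* No fixed delay afterwards: the reset already waited on RDY (point 1). */
    bool ok = a->controller_reset();

    /* A disabled controller posts no completions, so finish them here. */
    complete_all_outstanding(a);

    a->ResetInProgress = false;
    if (ok) {                          /* point 3: resume only on success */
        a->HoldRequests = false;
        a->Resumed = true;
    }
    return ok;                         /* point 3: report failure too */
}

/* StartIo-style gate: no new I/O reaches hardware during or after a failed reset. */
static bool start_io_allowed(const ADAPTER_SKETCH *a)
{
    return !a->ResetInProgress && !a->HoldRequests;
}
```

Outstanding commands are completed on both paths because the completion rule applies whether or not the controller comes back; only the resume step is conditional.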
I have some feedback on the current architecture of that routine:

1. Among other things, NVMeResetBus schedules a DPC to complete any pending commands. This creates a situation where, upon return from this entry point, there are still commands outstanding that don't get completed until the DPC runs. According to the WDK, this doesn't appear to be legal; all outstanding commands have to be completed by the HwStorResetBus routine before it returns:

HwResetBus
Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure. For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI) and

HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value.

and

The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock. A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary.

Since HwStorResetBus must finish its work before returning, it can't schedule a DPC to do so later on. The logic which schedules a DPC should be removed.

2. Code should be added to call StorPortPause() to hold off any new requests until StorPortResume() is called.

3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine in the NVMe driver should also be added for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo since the port driver calls it only after acquiring the StartIo spinlock.

4. We should implement a driver-internal global (per "adapter") flag signifying we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware.

5. Code should be added to call StorPortResume() when all work is complete.

6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above.

Thanks,
Judy

Thanks,
Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D
Sent: Monday, February 24, 2014 3:51 PM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex and Dharani,
I have been reviewing the code and performing some tests, and I have some concerns about this patch.

In nvmeStd.c:
Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands. When the LUN reset comes in, we will set CC.EN to 0, and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." This reset behavior has been accounted for in the driver, by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. What should happen here is that the new processAbortLun function should be moved under SRB_FUNCTION_ABORT_COMMAND only. Then the processAbortLunReset function should send only one abort and not abort all outstanding commands.
Also, during testing, I hit a D1 BSOD when I tried to step through the code. I ran IO and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell. The IO should be timed out by storport, which will then send a reset lun.

Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with Start IO and the interrupt DPC.

Thanks,
Carolyn

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Wednesday, February 19, 2014 10:06 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you, Dharani.

Hi all,
Please review/test the attached reset fix patch from Sandisk and provide your feedback.

Thank you very much,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, February 19, 2014 9:00 AM
To: Alex Chang
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Alex,
Attached is the patch source for review. I have tested with I/O running overnight.
Areas to focus on when testing this patch:
1. Test abort/LUN resets.
2. Test chip reset.
3. Test the format command.
4. Test the firmware download command.
Password is "sndk1234"

Thanks,
Dharani.
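Carolyn's proposal, and Rick's later request to drop the LUN-reset cases, separate two paths: SRB_FUNCTION_ABORT_COMMAND results in a single NVMe Abort for the targeted command, while a LUN reset joins the device/bus reset path that disables the controller and completes everything. A minimal sketch of that split follows. The enums and helpers are hypothetical; the Abort encoding follows the NVMe specification (admin opcode 08h; CDW10 bits 31:16 hold the command identifier to abort, bits 15:00 the submission queue identifier).

```c
#include <stdint.h>

/* Hypothetical subset of SRB functions handled by the sketch. */
typedef enum {
    FN_ABORT_COMMAND,
    FN_RESET_LOGICAL_UNIT,
    FN_RESET_DEVICE,
    FN_RESET_BUS
} SRB_FN_SKETCH;

typedef enum {
    ACTION_SEND_ONE_ABORT,  /* abort only the targeted command */
    ACTION_RECOVERY_PATH    /* CC.EN = 0, then complete all outstanding */
} RESET_ACTION;

static RESET_ACTION choose_action(SRB_FN_SKETCH fn)
{
    switch (fn) {
    case FN_ABORT_COMMAND:
        return ACTION_SEND_ONE_ABORT;
    default:
        /* LUN reset is lumped with device/bus reset: once the controller
         * is disabled it posts no completions, so per-command aborts
         * would accomplish nothing. */
        return ACTION_RECOVERY_PATH;
    }
}

/* Minimal 16-dword submission queue entry. */
typedef struct { uint32_t cdw[16]; } SQE_SKETCH;

#define NVME_ADMIN_OPC_ABORT 0x08u

/* Build one Abort for command 'cid' on submission queue 'sqid';
 * 'myCid' is the identifier of the Abort command itself. */
static SQE_SKETCH build_abort(uint16_t myCid, uint16_t sqid, uint16_t cid)
{
    SQE_SKETCH e = {{0}};
    e.cdw[0]  = NVME_ADMIN_OPC_ABORT | ((uint32_t)myCid << 16); /* CDW0: opcode, CID */
    e.cdw[10] = ((uint32_t)cid << 16) | sqid;                    /* CDW10: CID, SQID */
    return e;
}
```

Keeping the per-command abort out of the reset path also avoids building admin commands while the controller is being disabled.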
From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:15 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Great!
Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 18, 2014 12:14 PM
To: Alex Chang
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Just testing after merging the code; I should be able to send it tomorrow morning.
Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:13 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Dharani,
Just a friendly reminder: could you please send out your patch as soon as it's ready?
Many thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, February 14, 2014 10:18 AM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Sure Alex.
Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Friday, February 14, 2014 10:17 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Good morning, Dharani,
As you may know, both the Intel and Huawei patches have been added to the OFA source base. Now you may rebase your changes and send a patch out for review/test. Thank you very much for contributing the fixes.

Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, January 15, 2014 2:08 PM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Alex,
Attached is the source for the preliminary review. I have tested the IO and run the SCSI compliance test. I don't have a drive which supports abort/LUN resets, and I am not sure how to test the format command.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:54 AM
To: Dharani Kotte; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Happy Holidays to you all.
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 11:52 AM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Thank you for the explanation. Sure, I will take a look.
Happy Holidays.
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:44 AM
To: Kwok Kong; Dharani Kotte; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Dharani,
The controller reset can be issued either from the host or from the driver itself. Currently, the driver seems to handle both in the same manner via a single entry point, "NVMeResetController".
In the "from the host" case, the driver needs to separate the SRB_FUNCTION_RESET_... requests from the NVME_RESET_DEVICE ioctl request in terms of how pending IOs are handled. In the "driver itself" case, the related error recovery code needs to be re-examined as well.
Judy from Samsung suggested referring to the storahci.sys driver sample code for Windows 7/8 for reset bus logic examples and detailed recommendations.

Thank you,
Alex

From: Kwok Kong
Sent: Friday, December 20, 2013 9:08 AM
To: Dharani Kotte; Akshay Mathur; Alex Chang
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Dharani,
Yes, these are the three areas that you are committed to.

Alex,
Please send more details on the "Controller reset does not handle all cases" item to Dharani.

Thanks
-Kwok

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 9:02 AM
To: Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Kwok,
I think the items below are the ones we are committing to:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- Orphaned requests
Can somebody provide a little more detail on the expectation for the item "Controller reset does not handle all cases"?

Thanks,
Dharani.

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Thursday, December 19, 2013 6:53 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Excellent! Your help is much appreciated.
Dharani, please let me know if you have any questions.
Happy holidays to all of you.
-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Thursday, December 19, 2013 6:51 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,
You are welcome. We are pleased to contribute to the community and appreciate you driving it! We will try our best to complete the implementation by the end of January, but we may not be able to complete comprehensive testing by that time.
This is because of overlaps with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks. Anyway, Dharani will be in touch with you as he makes progress.

Thanks
Akshay

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Tuesday, December 17, 2013 4:21 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Akshay,
Thanks for your willingness to contribute to the driver. I am looking for a patch before the end of Jan 2014; the earlier the better. Please let me know if Sandisk can commit to that. Your help is much appreciated.

Thanks
-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Tuesday, December 17, 2013 4:11 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman; Akshay Mathur
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,
I manage the Software and driver development team at SanDisk/ESS. We are certainly willing to contribute to fixing the problems listed below, but before we can commit, we would like to get clarification on the timeline, i.e., by when these fixes are expected to be completed.

Thanks
Akshay Mathur
Sr Software Manager, Enterprise Storage Solutions
951 SanDisk Drive, Building #5 | Milpitas, CA 95035 U.S.A.
Direct +1 408.801.1336 | Cell +1 856.607.7323 | Corporate +1 408.801.1000 | Akshay.Mathur at sandisk.com

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Wednesday, December 11, 2013 18:00
To: Dave Landsman
Cc: Dharani Kotte
Subject: Would you please help to resolve a few OFA NVMe driver problems ?

Dave and Dharani,
There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems.
Intel has agreed to work on the following two problems:
- remove #define for CHATHAM2
- Learning of CPU core to Vector failure handling
I am also asking other companies to work on some of the issues. I wonder if your company can work on the following three problems:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- orphaned requests
Please let me know if your company can work on these three issues.

Thanks
-Kwok
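The first item on Kwok's list comes down to never assuming the controller has changed state just because CC.EN was written: after clearing CC.EN the driver must observe CSTS.RDY go 1->0, and after setting it, 0->1. A minimal sketch against a simulated register pair; `REGS_SKETCH`, its lag model, and the poll budget are assumptions for illustration. Real code would read the controller's BAR registers, delay between polls, and bound each wait by CAP.TO (expressed in 500 ms units).

```c
#include <stdbool.h>
#include <stdint.h>

#define CC_EN    0x1u  /* CC bit 0 */
#define CSTS_RDY 0x1u  /* CSTS bit 0 */

/* Simulated controller: RDY follows EN only after 'lag' polls. */
typedef struct {
    uint32_t cc;
    uint32_t csts;
    int      lag;       /* polls remaining before RDY tracks EN */
    int      lagReset;  /* lag applied on every CC write */
} REGS_SKETCH;

static void write_cc(REGS_SKETCH *r, uint32_t cc)
{
    r->cc = cc;
    r->lag = r->lagReset;
}

static uint32_t read_csts(REGS_SKETCH *r)
{
    if (r->lag > 0)
        r->lag--;  /* controller has not reacted yet */
    else
        r->csts = (r->cc & CC_EN) ? (r->csts | CSTS_RDY) : (r->csts & ~CSTS_RDY);
    return r->csts;
}

/* Poll until CSTS.RDY equals 'want' or the budget runs out. */
static bool wait_on_rdy(REGS_SKETCH *r, bool want, int maxPolls)
{
    for (int i = 0; i < maxPolls; i++) {
        if (((read_csts(r) & CSTS_RDY) != 0) == want)
            return true;
        /* real code: stall between polls */
    }
    return false;
}

/* Disable and wait for RDY 1->0, then re-enable and wait for RDY 0->1. */
static bool reset_controller_sketch(REGS_SKETCH *r, int maxPolls)
{
    write_cc(r, r->cc & ~CC_EN);
    if (!wait_on_rdy(r, false, maxPolls))
        return false;
    /* admin queue registers would be reprogrammed here */
    write_cc(r, r->cc | CC_EN);
    return wait_on_rdy(r, true, maxPolls);
}
```

Bounding each wait is what lets a reset routine report failure instead of hanging, which is also what the "take care of failed cases" feedback in the thread depends on.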
Name: image001.jpg Type: image/jpeg Size: 9449 bytes Desc: image001.jpg URL: From Alex.Chang at pmcs.com Fri Mar 21 11:51:42 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Fri, 21 Mar 2014 18:51:42 +0000 Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes In-Reply-To: References: <23EC73C80FB59046A6B7B8EB7B3826593BDF0B5C@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593BDF134C@SACMBXIP01.sdcorp.global.sandisk.com> <26455_1392829222_5304E326_26455_6404_1_23EC73C80FB59046A6B7B8EB7B3826593BDF1488@SACMBXIP01.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6D6E0@SACMBXIP02.sdcorp.global.sandisk.com> <23EC73C80FB59046A6B7B8EB7B3826593DA6F8B5@SACMBXIP02.sdcorp.global.sandisk.com> <26920_1394039063_53175917_26920_9432_1_23EC73C80FB59046A6B7B8EB7B3826593DA794AC@SACMBXIP01.sdcorp.global.sandisk.com> <11860_1394639990_53208476_11860_6208_1_23EC73C80FB59046A6B7B8EB7B3826593DA8A9A6@SACMBXIP01.sdcorp.global.sandisk.com> <8dcf97355cf44d88b0f91fd636740955@DM2PR07MB285.namprd07.prod.outlook.com> <6446_1395099377_532786F1_6446_19570_1_23EC73C80FB59046A6B7B8EB7B3826593DA8E18E@SACMBXIP01.sdcorp.global.sandisk.com> <9e8453dc345f46d39f9cc38bb1541d4c@DM2PR07MB285.namprd07.prod.outlook.com> Message-ID: Thank you, Carolyn. Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Friday, March 21, 2014 11:49 AM To: Alex Chang; Knoblaugh, Rick; Dharani Kotte; nvmewin at lists.openfabrics.org Cc: Sheth, Parag Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes I approve the patch as well. Thanks, Carolyn From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Thursday, March 20, 2014 11:12 AM To: Knoblaugh, Rick; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Cc: Sheth, Parag Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thanks a lot, Rick. Have a great vacation! 
Alex From: Knoblaugh, Rick [mailto:Rick.Knoblaugh at lsi.com] Sent: Thursday, March 20, 2014 11:07 AM To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org Cc: Sheth, Parag Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Alex, We are good with this patch. Also, just wanted to mention that I will be on vacation next week -- Parag will be here if anything needed from us. Thanks, -Rick From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Thursday, March 20, 2014 8:28 AM To: Dharani Kotte; Knoblaugh, Rick; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Rick and Carolyn, Just a friendly reminder, please let me know if you approve the patch. Thanks, Alex From: Alex Chang Sent: Monday, March 17, 2014 4:47 PM To: 'Dharani Kotte'; Knoblaugh, Rick; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thank you, Dharani, for the effort in preparing another revision of the patch. Hi all, Please review/test it and provide your feedback at your earliest convenience. Once I receive the approvals from LSI and Intel, I will push the patch right away. Hi Rick and Carolyn, If you approve the patch, please let me know as well. Thanks, Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Monday, March 17, 2014 4:36 PM To: Knoblaugh, Rick; Alex Chang; Foster, Carolyn D Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. 
The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Rick/Alex, Attached is the code with the changes incorporated according to Rick's suggestions. Removed the flags relevant to LUN reset Renamed the function name to NVMeProcessAbortCmd() Tested as suggested by Carolyn. Password: sndk1234 Thanks, Dharani. From: Knoblaugh, Rick [mailto:Rick.Knoblaugh at lsi.com] Sent: Friday, March 14, 2014 3:23 PM To: Alex Chang; Dharani Kotte; Foster, Carolyn D Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Dharani, Looks like NVMeProcessAbortLunReset is called only in the case of receiving SRB_FUNCTION_ABORT_COMMAND. As such, it would probably be good to remove the case statements in that routine that are also looking for SRB_FUNCTION_RESET_LOGICAL_UNIT - and the related flag can be removed etc. Also, I was wondering in what cases you see the SRB_FUNCTION_RESET_LOGICAL_UNIT get sent into the driver. (I haven't seen any other drivers out there that actually take specific command abort actions for this. The class Lsi_u3 sample in the WDK does what we had originally done which is to lump this request in with SRB_FUNCTION_RESET_DEVICE, etc. Drivers such as the MSFT Win 8 Storport AHCI miniport ignore SRB_FUNCTION_ABORT_COMMAND.) Thanks, -Rick From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Friday, March 14, 2014 2:08 PM To: Dharani Kotte; Foster, Carolyn D; Knoblaugh, Rick Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Hi Carolyn and Rick, I am about to finish reviewing and testing the revised patch by the end of today. 
It'd be appreciated if you can provide your approval early next week if you don't have any more feedback. Thank you and have a great weekend! Alex From: Alex Chang Sent: Wednesday, March 12, 2014 9:49 AM To: 'Dharani Kotte'; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Thank you very much, Dharani. Hi all, Please review/test the revised patch as soon as possible. We need to speed up in wrapping up this patch. Thank you! Alex From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com] Sent: Wednesday, March 12, 2014 9:00 AM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Hi Alex, The attached is the patch with the changes incorporated fix and tested well yesterday. Password: sndk1234 Thanks, Dharani. 
From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, March 07, 2014 2:40 PM
To: Foster, Carolyn D; Dharani Kotte; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

I'd suggest initializing the SRB extension via NVMeInitSrbExtension for all cases in BuildIO to avoid the BSOD in the future.

Thanks,
Alex

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Friday, March 07, 2014 2:16 PM
To: Alex Chang; Dharani Kotte; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Unfortunately, I am still seeing a D1 BSOD when I try to step through the new NVMeProcessAbortLunReset code. I was able to spend a little more time looking at this failure this time and can give you more information about it.

On line 1316 of nvmeStd.c, where NVMeIssueAbortCmd is called, the parameter passed in is the pResetSrbExt that was retrieved from the reset SRB. Then, in NVMeIssueAbortCmd, the device extension is pulled out of that pResetSrbExt. Unfortunately, that SRB extension is all 0s because it was never initialized, so a NULL pointer is passed into ProcessIo, which causes the BSOD.

So far, the changes to HwResetBus seem to be working OK. I will follow up with you separately on the steps I took to reproduce this failure.

Carolyn

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Wednesday, March 05, 2014 11:13 AM
To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Great! Thank you very much, Dharani, for the quick fixes.

Hi all,

Please review/test the patch and provide your feedback. If no big changes are required, I will start collecting approvals from Intel and LSI early next week.
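Carolyn's trace and Alex's suggestion come down to one ordering rule: an SRB extension has to be initialized before any path, including the abort and LUN-reset paths, reads the device-extension pointer stored in it. The user-mode sketch below illustrates that rule under stated assumptions: device_ext_t, srb_ext_t, init_srb_ext, build_io and issue_abort are hypothetical, simplified stand-ins for the driver's device extension, NVMe SRB extension, NVMeInitSrbExtension, BuildIo handling and NVMeIssueAbortCmd, not their real definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
typedef struct device_ext {
    int admin_queue_id;            /* placeholder for real adapter state */
} device_ext_t;

typedef struct srb_ext {
    device_ext_t *dev_ext;         /* back-pointer; NULL if never initialized */
    void *srb;                     /* owning request */
} srb_ext_t;

typedef enum { FN_EXECUTE_SCSI, FN_ABORT_COMMAND, FN_RESET_LUN, FN_OTHER } srb_fn_t;

/* Modeled on the idea behind NVMeInitSrbExtension: clear the extension and
 * record the back-pointers before anything else looks at it. */
static void init_srb_ext(srb_ext_t *ext, device_ext_t *dev, void *srb)
{
    memset(ext, 0, sizeof(*ext));
    ext->dev_ext = dev;
    ext->srb = srb;
}

/* BuildIo sketch: initialize unconditionally, then dispatch. Initializing
 * only on the I/O path would leave abort/reset extensions zero-filled. */
static void build_io(device_ext_t *dev, srb_ext_t *ext, void *srb, srb_fn_t fn)
{
    init_srb_ext(ext, dev, srb);
    switch (fn) {
    case FN_EXECUTE_SCSI:
        /* translate the SCSI command here */
        break;
    default:
        /* abort, reset and other functions also see a valid ext->dev_ext */
        break;
    }
}

/* Stand-in for the abort path: it dereferences the device extension taken
 * from the SRB extension, which is where the D1 bugcheck came from. The
 * NULL check is a defensive extra, not a substitute for initialization. */
static int issue_abort(const srb_ext_t *ext)
{
    if (ext->dev_ext == NULL)
        return -1;                 /* would have been a NULL dereference */
    return ext->dev_ext->admin_queue_id;
}
```

Initializing unconditionally at the top of BuildIo is what Alex proposes; the NULL check in issue_abort only turns a crash into an error and would still leave the abort unsent.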
Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, March 05, 2014 9:03 AM
To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,

Attached is the patch with the changes incorporated and the BSOD fix. The BSOD fix is in nvmeIo.c, line 640.

Password: sndk1234

Thanks,
Dharani.

From: Dharani Kotte
Sent: Tuesday, February 25, 2014 3:25 PM
To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,

Sure, I will go over the list below and make changes accordingly next week.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 25, 2014 2:09 PM
To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

I left out one more: at line 1228 of nvmeInit.c, NVMeWaitOnRdy is now called in place of setting NextDriverState to NVMeWaitOnRdy. I think the assignment is still required before the initialization state machine starts.
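Alex's note here, his 2:03 PM point about the 10 ms delay, and the CSTS.RDY item on Kwok's list all depend on the same handshake: after CC.EN changes, software polls CSTS.RDY until it follows, bounded by CAP.TO. The sketch below shows that poll in user-mode C. The CC.EN, CSTS.RDY and CAP.TO bit positions come from the NVMe specification; the nvme_regs_t hook table and the function names are hypothetical, chosen so the logic can be exercised without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register bits from the NVMe specification: CC.EN and CSTS.RDY are both
 * bit 0 of their registers; CAP.TO (CAP bits 31:24) is the worst-case time,
 * in 500 ms units, to wait for CSTS.RDY to follow a change of CC.EN. */
#define NVME_CC_EN    (1u << 0)
#define NVME_CSTS_RDY (1u << 0)

/* Hypothetical register-access hooks so the logic can run in user mode. A
 * miniport would access the mapped controller registers (for example with
 * StorPortReadRegisterUlong) and stall with StorPortStallExecution. */
typedef struct {
    uint32_t (*read_cc)(void *ctx);
    void     (*write_cc)(void *ctx, uint32_t value);
    uint32_t (*read_csts)(void *ctx);
    void     (*stall_ms)(void *ctx, uint32_t ms);
    void *ctx;
} nvme_regs_t;

/* Poll until CSTS.RDY equals want_ready (covers both the 1->0 and 0->1
 * transitions) or CAP.TO expires. Returns true once the bit matches. */
static bool nvme_wait_rdy(const nvme_regs_t *r, uint64_t cap, bool want_ready)
{
    uint32_t timeout_ms = (uint32_t)((cap >> 24) & 0xFFu) * 500u;
    uint32_t waited_ms = 0;

    for (;;) {
        bool ready = (r->read_csts(r->ctx) & NVME_CSTS_RDY) != 0;
        if (ready == want_ready)
            return true;
        if (waited_ms >= timeout_ms)
            return false;          /* RDY never followed CC.EN */
        r->stall_ms(r->ctx, 1);
        waited_ms += 1;
    }
}

/* Clear CC.EN, then wait for CSTS.RDY to drop to 0. */
static bool nvme_disable(const nvme_regs_t *r, uint64_t cap)
{
    r->write_cc(r->ctx, r->read_cc(r->ctx) & ~NVME_CC_EN);
    return nvme_wait_rdy(r, cap, false);
}
```

Because nvme_disable returns only after RDY reads 0 (or reports a timeout), a caller that checks its result has no need for an additional fixed delay, which is the reasoning behind Alex's first point.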
Alex

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Tuesday, February 25, 2014 2:03 PM
To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn and Dharani,

I basically agree with the suggested changes in HwStorResetBus. However, I have some feedback:

1. At line 2161, NVMeResetAdapter is called, and it returns only after making sure the RDY bit has cleared to 0, so why do we need the 10 ms delay? The identical delay in RecoveryDpcRoutine was added because the original NVMeResetAdapter did not wait until the RDY bit cleared to 0. Given the changes to NVMeResetAdapter, we should remove the 10 ms delay in RecoveryDpcRoutine as well.
2. At line 2216, StorPortPause is called with 60 seconds to force Storport to hold requests. I am not sure 60 seconds is a proper assumption. Instead, calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" seems a better idea to me.
3. As defined, HwStorResetBus returns TRUE in the successful case. We need to handle the failure cases as well, i.e., any failures within NVMeSynchronizeReset. And StorPortResume should be called only in the successful case.

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 25, 2014 4:28 AM
To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn,

Line 1384: I can take care of this item.

Line 2219: StorPortSynchronizeAccess was requested by Samsung, as suggested by Judy. The reference mail is below.

In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding. This results in I/O request timeouts, which eventually cause the driver to be called at its HwStorResetBus entry point (NVMeResetBus).
I have some feedback on the current architecture of that routine:

1. Among other things, NVMeResetBus schedules a DPC to complete any pending commands. This creates a situation where, upon return from this entry point, commands are still outstanding and don't get completed until the DPC runs. According to the WDK, this doesn't appear to be legal; all outstanding commands have to be completed by the HwStorResetBus routine before it returns:

"HwResetBus: Pointer to the miniport driver's HwStorResetBus routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure. For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI)"

and

"HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value."

and

"The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock. A miniport driver is responsible for completing SRBs received by HwStorStartIo for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary"

Since HwStorResetBus must finish its work before returning, it can't schedule a DPC to do so later on. The logic that schedules a DPC should be removed.

2. Code should be added to call StorPortPause() to hold off any new requests until StorPortResume() is called.

3. Code should be added to call StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine should also be added to the NVMe driver for NVMeResetBus to do its synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo, since the port driver calls it only after acquiring the StartIo spinlock.

4. We should implement a driver-internal global (per "adapter") flag signifying that we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware.

5. Code should be added to call StorPortResume() when all work is complete.

6. We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above.

Thanks,
Judy

Thanks,
Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D
Sent: Monday, February 24, 2014 3:51 PM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex and Dharani,

I have been reviewing the code and performing some tests, and I have some concerns about this patch.

In nvmeStd.c:

Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine. This change concerns me the most. During a reset there is no need to send individual abort requests for outstanding commands. When the LUN reset comes in, we set CC.EN to 0, and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue." This reset behavior has been accounted for in the driver, by design. In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands. What should happen here is that the new NVMeProcessAbortLunReset function is called under SRB_FUNCTION_ABORT_COMMAND only, and that it sends only one abort instead of aborting all outstanding commands.

Also, during testing, I hit a D1 BSOD when I tried to step through the code. I ran I/O and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell.
The I/O should then be timed out by Storport, which will then send a LUN reset.

Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed. The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with Start IO and the interrupt DPC.

Thanks,
Carolyn

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Wednesday, February 19, 2014 10:06 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you, Dharani.

Hi all,

Please review/test the attached reset fix patch from Sandisk and provide your feedback.

Thank you very much,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, February 19, 2014 9:00 AM
To: Alex Chang
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Alex,

Attached is the patch source for review. I have tested I/O running overnight. Areas to focus on when testing this patch:
1. Test abort/LUN resets.
2. Test chip reset.
3. Test the format command.
4. Test the firmware download command.

Password is "sndk1234"

Thanks,
Dharani.
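Carolyn's review, Judy's recommendations and Alex's follow-up together separate the two request types: an abort should produce exactly one NVMe Abort for the command in question, while a bus or LUN reset disables the controller and completes every outstanding request with SRB_STATUS_BUS_RESET before returning, reporting failure (and not resuming) if the reset fails. The user-mode model below sketches that split. The Abort encoding (admin opcode 08h; CDW10 holding the command identifier in bits 31:16 and the submission queue identifier in bits 15:0) follows the NVMe specification; adapter_t, the status values and the disable_ctrl hook are hypothetical, and in the real miniport the held flag would correspond to StorPortBusy or StorPortPause/StorPortResume and the completions would go through Storport.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NVME_ADMIN_ABORT 0x08u          /* Abort admin command opcode */

typedef struct {
    uint32_t cdw0;                      /* opcode bits 7:0, CID bits 31:16 */
    uint32_t cdw10;                     /* SQID bits 15:0, CID to abort 31:16 */
} nvme_abort_cmd_t;

/* One Abort for one command, as suggested for SRB_FUNCTION_ABORT_COMMAND. */
static nvme_abort_cmd_t build_abort(uint16_t admin_cid, uint16_t sqid, uint16_t cid)
{
    nvme_abort_cmd_t c;
    c.cdw0 = NVME_ADMIN_ABORT | ((uint32_t)admin_cid << 16);
    c.cdw10 = ((uint32_t)cid << 16) | sqid;
    return c;
}

/* Hypothetical model of the adapter's outstanding requests. */
enum { ST_PENDING, ST_BUS_RESET };      /* ST_BUS_RESET ~ SRB_STATUS_BUS_RESET */
#define MAX_REQS 8

typedef struct {
    int status[MAX_REQS];
    size_t count;                       /* number of outstanding requests */
    bool resetting;                     /* Judy's per-adapter "reset busy" flag */
    bool held;                          /* models StorPortBusy/StorPortPause */
    bool (*disable_ctrl)(void *ctx);    /* CC.EN = 0 and wait for RDY = 0 */
    void *ctx;
} adapter_t;

/* Reset-bus model: nothing is left for a DPC to finish after return.
 * Controller re-initialization is omitted. */
static bool reset_bus(adapter_t *a)
{
    a->held = true;                     /* hold off new requests */
    a->resetting = true;
    if (!a->disable_ctrl(a->ctx))
        return false;                   /* failure: report it, stay held */
    for (size_t i = 0; i < a->count; i++)
        a->status[i] = ST_BUS_RESET;    /* complete everything before returning */
    a->count = 0;
    a->resetting = false;
    a->held = false;                    /* resume only on success */
    return true;
}
```

The model deliberately leaves out the StorPortSynchronizeAccess callback; it only shows that nothing is left for a DPC to finish once reset_bus returns, and that a failed reset is reported rather than resumed.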
From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:15 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Great!

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 18, 2014 12:14 PM
To: Alex Chang
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

I'm just testing after merging the code; I should be able to send it tomorrow morning.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:13 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Just a friendly reminder: could you please send out your patch as soon as it's ready?

Many thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, February 14, 2014 10:18 AM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Sure, Alex.

Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Friday, February 14, 2014 10:17 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Good morning, Dharani,

As you may know, both the Intel and Huawei patches have been added to the OFA source base. You may now rebase your changes and send a patch out for review/test. Thank you very much for contributing the fixes.

Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, January 15, 2014 2:08 PM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Alex,

Attached is the source for preliminary review. I have tested I/O and run the SCSI compliance test. I don't have a drive that supports abort/LUN resets, and I'm not sure how to test the format command.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:54 AM
To: Dharani Kotte; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Happy Holidays to you all.

Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 11:52 AM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Thank you for the explanation. Sure, I will take a look. Happy Holidays.

Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:44 AM
To: Kwok Kong; Dharani Kotte; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Dharani,

A controller reset can be issued either by the host or by the driver itself. Currently, the driver seems to handle both in the same manner via a single entry point, "NVMeResetController". In the host-initiated case, the driver needs to distinguish SRB_FUNCTION_RESET_... requests from the NVME_RESET_DEVICE ioctl request when handling pending I/Os. In the driver-initiated case, the related error recovery code needs to be re-examined as well.
Judy from Samsung suggested referring to the storahci.sys sample driver code for Windows 7/8 for reset bus logic examples and detailed recommendations.

Thank you,
Alex

From: Kwok Kong
Sent: Friday, December 20, 2013 9:08 AM
To: Dharani Kotte; Akshay Mathur; Alex Chang
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Dharani,

Yes, these are the three areas that you are committed to.

Alex,

Please send Dharani more details on "Controller reset does not handle all cases".

Thanks
-Kwok

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 9:02 AM
To: Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Kwok,

I think these are the items that we are committing to:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- orphaned requests

Can somebody provide a little more detail on the expectation for the item "Controller reset does not handle all cases"?

Thanks,
Dharani.

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Thursday, December 19, 2013 6:53 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Excellent! Your help is much appreciated.

Dharani, please let me know if you have any questions.

Happy holidays to all of you.
-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Thursday, December 19, 2013 6:51 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,

You are welcome. We are pleased to contribute to the community and appreciate you driving it! We will try our best to complete the implementation by the end of January, but we may not be able to complete comprehensive testing by that time.
This is because of overlaps with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks. Anyway, Dharani will be in touch with you as he makes progress.

Thanks
Akshay

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Tuesday, December 17, 2013 4:21 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Akshay,

Thanks for your willingness to contribute to the driver. I am looking for a patch before the end of Jan 2014; the earlier the better. Please let me know if Sandisk can commit to that. Your help is much appreciated.

Thanks
-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Tuesday, December 17, 2013 4:11 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman; Akshay Mathur
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,

I manage the software and driver development team at SanDisk/ESS. We are certainly willing to contribute to fixing the problems listed below, but before we can commit, we would like clarification on the timeline, i.e., by when these fixes are expected to be completed.

Thanks
Akshay Mathur
Sr Software Manager, Enterprise Storage Solutions
951 SanDisk Drive, Building #5 | Milpitas, CA 95035 U.S.A. | Direct +1 408.801.1336 | Cell +1 856.607.7323 | Corporate +1 408.801.1000 | Akshay.Mathur at sandisk.com

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Wednesday, December 11, 2013 18:00
To: Dave Landsman
Cc: Dharani Kotte
Subject: Would you please help to resolve a few OFA NVMe driver problems ?

Dave and Dharani,

There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems.
Intel has agreed to work on the following two problems:
- remove #define for CHATHAM2
- Learning of CPU core to Vector failure handling

I am also asking other companies to work on some of the issues. I wonder if your company can work on the following three problems:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- orphaned requests

Please let me know if your company can work on these three issues.

Thanks
-Kwok

________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).

From raymond.c.robles at intel.com Fri Mar 21 13:11:16 2014
From: raymond.c.robles at intel.com (Robles, Raymond C)
Date: Fri, 21 Mar 2014 20:11:16 +0000
Subject: [nvmewin] Handling NVMe Passthrough IOCTLs
In-Reply-To: <532C6429.1040204@gmail.com>
References: <532C6429.1040204@gmail.com>
Message-ID: <49158E750348AA499168FD41D88983606269B8A2@FMSMSX105.amr.corp.intel.com>

Agreed with Jeff... but you will also want to make sure you are opening the correct ScsiXxx handle. This varies depending on the system.
Many times it's Scsi1 or Scsi2, but essentially you'll have to walk through opening handles and, upon opening one successfully, check that the Identify command succeeds.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Jeff Glass
Sent: Friday, March 21, 2014 9:09 AM
To: nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] Handling NVMe Passthrough IOCTLs

You said "Note: I am getting proper handle from CreateFile for my NVMe Device." Did you open \\.\PhysicalDriveXxxx or \\.\ScsiXxxx?

You need to open a handle to the adapter (i.e., Scsi) and not the disk (i.e., PhysicalDrive) for IOCTL_SCSI_MINIPORT. That's the most common mistake I've seen that causes this error.

On 3/21/2014 8:57 AM, Saikrishna Ravikanti wrote:

Hi Team,

Referring to PT_IOCTL.Doc and the WDK SPTI sample, I am developing an application to send IOCTLs using the _NVME_PASS_THROUGH_IOCTL structure.

I am facing a problem: the DeviceIoControl routine returns error code 1 (Incorrect Function).
Code shown below:

    PNVME_PASS_THROUGH_IOCTL pInBuffer = NULL;
    PNVME_PASS_THROUGH_IOCTL pOutBuffer = NULL;
    ULONG ByteSizeTX = 4096;

    /* Allocate input buffer to accommodate size of NVME_PASS_THROUGH_IOCTL only */
    InputBufLen = sizeof(NVME_PASS_THROUGH_IOCTL);
    pInBuffer = (PNVME_PASS_THROUGH_IOCTL) malloc(InputBufLen);

    /* Allocate output buffer to accommodate size of NVME_PASS_THROUGH_IOCTL and data */
    OutputBufLen = sizeof(NVME_PASS_THROUGH_IOCTL) + ByteSizeTX - 1;
    pOutBuffer = (PNVME_PASS_THROUGH_IOCTL) malloc(OutputBufLen);
    if (pInBuffer == NULL || pOutBuffer == NULL)
        return;

    /* Zero out the buffers */
    memset(pInBuffer, 0, InputBufLen);
    memset(pOutBuffer, 0, OutputBufLen);

    pInBuffer->SrbIoCtrl.HeaderLength = sizeof(SRB_IO_CONTROL);
    memcpy((UCHAR*)(&pInBuffer->SrbIoCtrl.Signature[0]), NVME_SIG_STR, NVME_SIG_STR_LEN);
    pInBuffer->SrbIoCtrl.Timeout = NVME_PT_TIMEOUT;
    pInBuffer->SrbIoCtrl.ControlCode = (ULONG)NVME_PASS_THROUGH_SRB_IO_CODE;
    pInBuffer->SrbIoCtrl.Length = InputBufLen - sizeof(SRB_IO_CONTROL);

    pInBuffer->NVMeCmd[0] = ADMIN_IDENTIFY;
    pInBuffer->NVMeCmd[1] = 0;
    pInBuffer->NVMeCmd[10] = 1;  // CNS = 1: return the controller data structure

    pInBuffer->DataBufferLen = 0;
    pInBuffer->ReturnBufferLen = sizeof(NVME_PASS_THROUGH_IOCTL) + ByteSizeTX - 1;

    status = DeviceIoControl(
        fileHandle,           /* Handle to \\.\scsi device via CreateFile */
        IOCTL_SCSI_MINIPORT,  /* IO control function to a miniport driver */
        pInBuffer,            /* Input buffer with data sent to driver */
        InputBufLen,          /* Length of data sent to driver (in bytes) */
        pOutBuffer,           /* Output buffer with data received from driver */
        OutputBufLen,         /* Length of data received from driver */
        &Count,               /* Bytes placed in DataBuffer */
        NULL);

Kindly let me know your inputs on this behavior.

Note: I am getting proper handle from CreateFile for my NVMe Device.
Regards,
Sai

_______________________________________________
nvmewin mailing list
nvmewin at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin

From akyros000 at gmail.com Fri Mar 21 13:57:54 2014
From: akyros000 at gmail.com (Jeff Glass)
Date: Fri, 21 Mar 2014 13:57:54 -0700
Subject: [nvmewin] Handling NVMe Passthrough IOCTLs
In-Reply-To: <49158E750348AA499168FD41D88983606269B8A2@FMSMSX105.amr.corp.intel.com>
References: <532C6429.1040204@gmail.com> <49158E750348AA499168FD41D88983606269B8A2@FMSMSX105.amr.corp.intel.com>
Message-ID: <532CA7D2.70300@gmail.com>

If he has a handle to the disk device, he should be able to send IOCTL_SCSI_GET_ADDRESS to determine the ScsiXxx name of the HBA device. The "PortNumber" field of the returned SCSI_ADDRESS structure is the Xxx portion of \\.\ScsiXxx.
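Jeff's suggestion can be sketched as follows. The portable helper formats the adapter path; the Windows-only part sends IOCTL_SCSI_GET_ADDRESS to an existing disk handle and opens the adapter named by the returned PortNumber. The \\.\ScsiN: form with a trailing colon is an assumption based on common usage and worth verifying on the target system; adapter_path and open_adapter_for_disk are hypothetical helper names.

```c
#include <stddef.h>
#include <stdio.h>

/* Portable part: build the adapter path "\\.\ScsiN:" from a port number.
 * Returns 0 on success, -1 if the buffer is too small. */
static int adapter_path(unsigned port, char *buf, size_t len)
{
    int n = snprintf(buf, len, "\\\\.\\Scsi%u:", port);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

#ifdef _WIN32
#include <windows.h>
#include <ntddscsi.h>

/* Open the HBA that owns an already-opened disk handle, so that
 * IOCTL_SCSI_MINIPORT requests go to the adapter rather than to the
 * disk device (the mistake Jeff identifies as the usual cause of
 * error 1, ERROR_INVALID_FUNCTION). */
static HANDLE open_adapter_for_disk(HANDLE disk)
{
    SCSI_ADDRESS addr;
    DWORD bytes = 0;
    char path[32];

    if (!DeviceIoControl(disk, IOCTL_SCSI_GET_ADDRESS, NULL, 0,
                         &addr, sizeof(addr), &bytes, NULL))
        return INVALID_HANDLE_VALUE;
    if (adapter_path(addr.PortNumber, path, sizeof(path)) != 0)
        return INVALID_HANDLE_VALUE;
    return CreateFileA(path, GENERIC_READ | GENERIC_WRITE,
                       FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                       OPEN_EXISTING, 0, NULL);
}
#endif
```

Where no disk handle is available, Raymond's approach of trying \\.\Scsi0:, \\.\Scsi1:, and so on, and keeping the adapter on which the Identify pass-through succeeds, remains the fallback.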
From Alex.Chang at pmcs.com Fri Mar 21 16:12:18 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Fri, 21 Mar 2014 23:12:18 +0000
Subject: [nvmewin] NVMe Windows DB Is LOCKED - Pushing Patch From Sandisk For Reset Changes
Message-ID:

Locking NVMe Windows DB.

Thanks,
Alex

From Alex.Chang at pmcs.com Fri Mar 21 16:46:27 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Fri, 21 Mar 2014 23:46:27 +0000
Subject: [nvmewin] NVMe Windows DB Is UNLOCKED - Pushing Patch From Sandisk For Reset Fixes
Message-ID:

Hi all,

Thank you, Dharani, for the effort. The patch has been merged into the source base, and a new tag called "Patch#23_Reset_Fix" has been created under the tags directory. I will rebase the PMC changes and send them out for review/test early next week. Should you have any questions, please reply to the email listed below.

Thanks,
Alex

nvmewin mailing list
nvmewin at lists.openfabrics.org
URL: From saikrishna.ravikanti at gmail.com Fri Mar 21 17:16:47 2014 From: saikrishna.ravikanti at gmail.com (Saikrishna Ravikanti) Date: Sat, 22 Mar 2014 05:46:47 +0530 Subject: [nvmewin] Handling NVMe Passthrough IOCTLs In-Reply-To: <532CA7D2.70300@gmail.com> References: <532C6429.1040204@gmail.com> <49158E750348AA499168FD41D88983606269B8A2@FMSMSX105.amr.corp.intel.com> <532CA7D2.70300@gmail.com> Message-ID: Thanks for your inputs. I am opening the handle to disk. I'll try with \\.\ScsiXxx and get back to you. Regards, Sai On Sat, Mar 22, 2014 at 2:27 AM, Jeff Glass wrote: > If he has a handle to the disk device he should be able to send > IOCTL_SCSI_GET_ADDRESS to determine the ScsiXxx name of the HBA device. > The "PortNumber" field of the returned SCSI_ADDRESS structure is the Xxx > portion of \\.\ScsiXxx. > > > > > On 3/21/2014 1:11 PM, Robles, Raymond C wrote: > > Agreed with Jeff… but you will also want to make sure you are opening > the correct ScsiXxx handle. This varies depending on the system. Many times > it’s Scsi1or Scsi2, but essentially, you’ll have to walk through opening > handles and upon opening one successfully, seeing the Identify succeed. > > > > *From:* nvmewin-bounces at lists.openfabrics.org [ > mailto:nvmewin-bounces at lists.openfabrics.org] > *On Behalf Of *Jeff Glass > *Sent:* Friday, March 21, 2014 9:09 AM > *To:* nvmewin at lists.openfabrics.org > *Subject:* Re: [nvmewin] Handling NVMe Passthrough IOCTLs > > > > You said "Note: I am getting proper handle from CreateFile for my NVMe > Device." Did you open \\.\PhysicalDriveXxxx or \\.\ScsiXxxx? > > You need to open a handle to the adapter (i.e. Scsi) and not the disk > (i.e. PhysicalDrive) for IOCTL_SCSI_MINIPORT. That's the most common > mistake I've seen that causes this error. 
> > On 3/21/2014 8:57 AM, Saikrishna Ravikanti wrote:
> > > Hi Team,
> > >
> > > By referring to PT_IOCTL.Doc and the WDK SPTI sample, I am developing an
> > > application to send IOCTLs using the _NVME_PASS_THROUGH_IOCTL structure.
> > >
> > > I am facing a problem: the DeviceIoControl routine returns error code 1
> > > (Incorrect Function).
> > >
> > > The code is shown below:
> > >
> > > PNVME_PASS_THROUGH_IOCTL pInBuffer = NULL;
> > > PNVME_PASS_THROUGH_IOCTL pOutBuffer = NULL;
> > > ULONG ByteSizeTX = 4096;
> > >
> > > /* Allocate input buffer to accommodate size of NVME_PASS_THROUGH_IOCTL only */
> > > InputBufLen = sizeof(NVME_PASS_THROUGH_IOCTL);
> > > pInBuffer = (PNVME_PASS_THROUGH_IOCTL) malloc(InputBufLen);
> > > /* Allocate output buffer to accommodate size of NVME_PASS_THROUGH_IOCTL and data */
> > > OutputBufLen = sizeof(NVME_PASS_THROUGH_IOCTL) + ByteSizeTX - 1;
> > > pOutBuffer = (PNVME_PASS_THROUGH_IOCTL) malloc(OutputBufLen);
> > > if (pInBuffer == NULL || pOutBuffer == NULL)
> > >     return;
> > >
> > > /* Zero out the buffers */
> > > memset(pInBuffer, 0, InputBufLen);
> > > memset(pOutBuffer, 0, OutputBufLen);
> > >
> > > pInBuffer->SrbIoCtrl.HeaderLength = sizeof(SRB_IO_CONTROL);
> > > memcpy((UCHAR*)(&pInBuffer->SrbIoCtrl.Signature[0]), NVME_SIG_STR, NVME_SIG_STR_LEN);
> > > pInBuffer->SrbIoCtrl.Timeout = NVME_PT_TIMEOUT;
> > > pInBuffer->SrbIoCtrl.ControlCode = (ULONG)NVME_PASS_THROUGH_SRB_IO_CODE;
> > > pInBuffer->SrbIoCtrl.Length = InputBufLen - sizeof(SRB_IO_CONTROL);
> > >
> > > pInBuffer->NVMeCmd[0] = ADMIN_IDENTIFY;
> > > pInBuffer->NVMeCmd[1] = 0;
> > > pInBuffer->NVMeCmd[10] = 1; // Return corresponding controller data structure
> > >
> > > pInBuffer->DataBufferLen = 0;
> > > pInBuffer->ReturnBufferLen = sizeof(NVME_PASS_THROUGH_IOCTL) + ByteSizeTX - 1;
> > >
> > > status = DeviceIoControl(
> > >     fileHandle,          /* Handle to \\.\scsi device via CreateFile */
> > >     IOCTL_SCSI_MINIPORT, /* IO control function to a miniport driver */
> > >     pInBuffer,           /* Input buffer with data sent to driver */
> > >     InputBufLen,         /* Length of data sent to driver (in bytes) */
> > >     pOutBuffer,          /* Output buffer with data received from driver */
> > >     OutputBufLen,        /* Length of data received from driver */
> > >     &Count,              /* Bytes placed in DataBuffer */
> > >     NULL);
> > >
> > > Kindly let me know your inputs about this behavior.
> > >
> > > Note: I am getting proper handle from CreateFile for my NVMe Device.
> > >
> > > Regards,
> > > Sai
> > >
> > > _______________________________________________
> > > nvmewin mailing list
> > > nvmewin at lists.openfabrics.org
> > > http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin

From Alex.Chang at pmcs.com Mon Mar 24 16:29:36 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Mon, 24 Mar 2014 23:29:36 +0000
Subject: [nvmewin] ***UNCHECKED*** PMC New Patch

Hi all,

Please find the attached patch from PMC-Sierra. The password is pmc123.

In order to speed up the entire process and meet our next release date, please review the changes and provide feedback as soon as possible. For each outstanding patch, we collect feedback for about a week after it is sent out, and a revised patch incorporating the feedback is then sent out. I will follow up for approval after a week or so to allow more testing and review if necessary.

Summary of changes:
1. SRB Extension support for Windows 8 and up.
   Files changed: nvmeStd.c, nvmeSnti.c, nvmeStat.c, nvmePwrMgmt.c, nvmeInit.c and the related header files.
2. PRP list building for IOCTL and internal requests.
   Files changed: nvmeStd.c, nvmeInit.c and nvmestd.h.
3. Performance issue in Windows 8/Server 2012.
File changed: nvmeStd.c (removed StorPortGetUncachedExtension calling in NVMeFindAdapter) 4. NVMeInitAdminQueues return value. File changed: nvmeStd.c (Instead of returning TRUE/FALSE, return Storport defined status) 5. Non-contiguous Namespace ID support. Files changed: nvmeStat.c and nvmeInit.c (When fetching Namespace Structure with an invalid Namespace ID (which is less than value of NN field of Controller Structure), driver moves on to next Namespace ID as long as it's not larger than the value of NN field) 6. Removal of using mask bits as core index to allocate/identify core tables. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 7. Implemented logical processor group defined by Windows. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 8. Core-MSI vector-Queue mapping, CMD_ENTRY synchronization and FreeQList access issues are related to using core mask bits as core index (#6) and no support for logical processor group (#7). Platforms tested: 1. Windows 7 64-bit 2. Windows Server 2008 R2 3. Windows 8 64-bit 4. Windows Server 2012 Tests run; 1. Installation(clean and update)/Un-Installation/Enable/Disable/hibernation and resume. 2. IOMeter 4K Read/write combining in random/sequential manners. 3. SCSC Compliance. 4. SDStress. 5. Quick/full disk formats. 6. Non-contiguous Namespace IDs. Thanks, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pmc_patch_v2_0324_2014.zip Type: application/x-zip-compressed Size: 177742 bytes Desc: pmc_patch_v2_0324_2014.zip URL: From carolyn.d.foster at intel.com Thu Mar 27 11:18:19 2014 From: carolyn.d.foster at intel.com (Foster, Carolyn D) Date: Thu, 27 Mar 2014 18:18:19 +0000 Subject: [nvmewin] PMC New Patch In-Reply-To: References: Message-ID: Hi Alex, Were you able to test S4 as a boot device? I am seeing some issues with the IO during hiber driver execution. 
The hiber driver enumeration and initialization seems to complete with no issues, but after the first call to start io for the inquiry, I'm not seeing any more IO happen. I will try to debug further, but is this something you can look into? Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Monday, March 24, 2014 4:30 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** PMC New Patch Hi all, Please find the attached patch from PMC-Sierra. The password is pmc123. In order to speed up the entire process and meet our next release date, please review the changes and provide feedbacks as soon as possible. For each outstanding patch, we collect feedbacks for about a week after it is being sent out. A revised patch shall be sent out to include the feedbacks. I will follow up for approval after a week or so to allow more testing and reviewing if necessary. Summary of changes: 1. SRB Extension support for Windows 8 and up. Files changed: nvmeStd.c, nvmeSnti.c, nvmeStat.c, nvmePwrMgmt.c, nvmeInit.c and the related header files. 2. PRP list building for IOCTL and internal requests. Files changed: nvmeStd.c, nvmeInit.c and nvmestd.h. 3. Performance issue in Windows 8/Server 2012. File changed: nvmeStd.c (removed StorPortGetUncachedExtension calling in NVMeFindAdapter) 4. NVMeInitAdminQueues return value. File changed: nvmeStd.c (Instead of returning TRUE/FALSE, return Storport defined status) 5. Non-contiguous Namespace ID support. Files changed: nvmeStat.c and nvmeInit.c (When fetching Namespace Structure with an invalid Namespace ID (which is less than value of NN field of Controller Structure), driver moves on to next Namespace ID as long as it's not larger than the value of NN field) 6. Removal of using mask bits as core index to allocate/identify core tables. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 7. 
Implemented logical processor group defined by Windows. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 8. Core-MSI vector-Queue mapping, CMD_ENTRY synchronization and FreeQList access issues are related to using core mask bits as core index (#6) and no support for logical processor group (#7). Platforms tested: 1. Windows 7 64-bit 2. Windows Server 2008 R2 3. Windows 8 64-bit 4. Windows Server 2012 Tests run; 1. Installation(clean and update)/Un-Installation/Enable/Disable/hibernation and resume. 2. IOMeter 4K Read/write combining in random/sequential manners. 3. SCSC Compliance. 4. SDStress. 5. Quick/full disk formats. 6. Non-contiguous Namespace IDs. Thanks, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alex.Chang at pmcs.com Thu Mar 27 11:25:58 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Thu, 27 Mar 2014 18:25:58 +0000 Subject: [nvmewin] PMC New Patch In-Reply-To: References: Message-ID: Sure. I will look into it... Thanks, Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Thursday, March 27, 2014 11:18 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Hi Alex, Were you able to test S4 as a boot device? I am seeing some issues with the IO during hiber driver execution. The hiber driver enumeration and initialization seems to complete with no issues, but after the first call to start io for the inquiry, I'm not seeing any more IO happen. I will try to debug further, but is this something you can look into? Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Monday, March 24, 2014 4:30 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** PMC New Patch Hi all, Please find the attached patch from PMC-Sierra. The password is pmc123. 
In order to speed up the entire process and meet our next release date, please review the changes and provide feedbacks as soon as possible. For each outstanding patch, we collect feedbacks for about a week after it is being sent out. A revised patch shall be sent out to include the feedbacks. I will follow up for approval after a week or so to allow more testing and reviewing if necessary. Summary of changes: 1. SRB Extension support for Windows 8 and up. Files changed: nvmeStd.c, nvmeSnti.c, nvmeStat.c, nvmePwrMgmt.c, nvmeInit.c and the related header files. 2. PRP list building for IOCTL and internal requests. Files changed: nvmeStd.c, nvmeInit.c and nvmestd.h. 3. Performance issue in Windows 8/Server 2012. File changed: nvmeStd.c (removed StorPortGetUncachedExtension calling in NVMeFindAdapter) 4. NVMeInitAdminQueues return value. File changed: nvmeStd.c (Instead of returning TRUE/FALSE, return Storport defined status) 5. Non-contiguous Namespace ID support. Files changed: nvmeStat.c and nvmeInit.c (When fetching Namespace Structure with an invalid Namespace ID (which is less than value of NN field of Controller Structure), driver moves on to next Namespace ID as long as it's not larger than the value of NN field) 6. Removal of using mask bits as core index to allocate/identify core tables. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 7. Implemented logical processor group defined by Windows. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 8. Core-MSI vector-Queue mapping, CMD_ENTRY synchronization and FreeQList access issues are related to using core mask bits as core index (#6) and no support for logical processor group (#7). Platforms tested: 1. Windows 7 64-bit 2. Windows Server 2008 R2 3. Windows 8 64-bit 4. Windows Server 2012 Tests run; 1. Installation(clean and update)/Un-Installation/Enable/Disable/hibernation and resume. 2. IOMeter 4K Read/write combining in random/sequential manners. 3. SCSC Compliance. 4. 
SDStress. 5. Quick/full disk formats. 6. Non-contiguous Namespace IDs. Thanks, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alex.Chang at pmcs.com Thu Mar 27 12:14:15 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Thu, 27 Mar 2014 19:14:15 +0000 Subject: [nvmewin] PMC New Patch In-Reply-To: References: Message-ID: Hi Carolyn, Since I don't think my changes will introduce the problem, I replaced the driver with tag "Patch#22_Hibernation_Support", used our device as secondary drive and ran IOMeter to issue IOs to the drive. I've seen IOmeter reporting errors after the system/our device came back from hibernation properly. If no IO accesses, S4 works fine as either boot drive or secondary drive. Could you please verify that as well in your side? Once it's confirmed as a known issue, we need to decide when to fix it. Regards, Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Thursday, March 27, 2014 11:18 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Hi Alex, Were you able to test S4 as a boot device? I am seeing some issues with the IO during hiber driver execution. The hiber driver enumeration and initialization seems to complete with no issues, but after the first call to start io for the inquiry, I'm not seeing any more IO happen. I will try to debug further, but is this something you can look into? Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Monday, March 24, 2014 4:30 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** PMC New Patch Hi all, Please find the attached patch from PMC-Sierra. The password is pmc123. In order to speed up the entire process and meet our next release date, please review the changes and provide feedbacks as soon as possible. 
For each outstanding patch, we collect feedbacks for about a week after it is being sent out. A revised patch shall be sent out to include the feedbacks. I will follow up for approval after a week or so to allow more testing and reviewing if necessary. Summary of changes: 1. SRB Extension support for Windows 8 and up. Files changed: nvmeStd.c, nvmeSnti.c, nvmeStat.c, nvmePwrMgmt.c, nvmeInit.c and the related header files. 2. PRP list building for IOCTL and internal requests. Files changed: nvmeStd.c, nvmeInit.c and nvmestd.h. 3. Performance issue in Windows 8/Server 2012. File changed: nvmeStd.c (removed StorPortGetUncachedExtension calling in NVMeFindAdapter) 4. NVMeInitAdminQueues return value. File changed: nvmeStd.c (Instead of returning TRUE/FALSE, return Storport defined status) 5. Non-contiguous Namespace ID support. Files changed: nvmeStat.c and nvmeInit.c (When fetching Namespace Structure with an invalid Namespace ID (which is less than value of NN field of Controller Structure), driver moves on to next Namespace ID as long as it's not larger than the value of NN field) 6. Removal of using mask bits as core index to allocate/identify core tables. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 7. Implemented logical processor group defined by Windows. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 8. Core-MSI vector-Queue mapping, CMD_ENTRY synchronization and FreeQList access issues are related to using core mask bits as core index (#6) and no support for logical processor group (#7). Platforms tested: 1. Windows 7 64-bit 2. Windows Server 2008 R2 3. Windows 8 64-bit 4. Windows Server 2012 Tests run; 1. Installation(clean and update)/Un-Installation/Enable/Disable/hibernation and resume. 2. IOMeter 4K Read/write combining in random/sequential manners. 3. SCSC Compliance. 4. SDStress. 5. Quick/full disk formats. 6. Non-contiguous Namespace IDs. 
Thanks, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.c.robles at intel.com Thu Mar 27 12:18:01 2014 From: raymond.c.robles at intel.com (Robles, Raymond C) Date: Thu, 27 Mar 2014 19:18:01 +0000 Subject: [nvmewin] PMC New Patch In-Reply-To: References: Message-ID: <49158E750348AA499168FD41D88983606269EFB4@FMSMSX105.amr.corp.intel.com> Shouldn't S4 work as a boot and data device after the hibernation support patch? I/O generated during the hiber driver by the OS (to write out the hiber-file) should work regardless of any IOMeter workloads. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Thursday, March 27, 2014 12:14 PM To: Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] PMC New Patch Hi Carolyn, Since I don't think my changes will introduce the problem, I replaced the driver with tag "Patch#22_Hibernation_Support", used our device as secondary drive and ran IOMeter to issue IOs to the drive. I've seen IOmeter reporting errors after the system/our device came back from hibernation properly. If no IO accesses, S4 works fine as either boot drive or secondary drive. Could you please verify that as well in your side? Once it's confirmed as a known issue, we need to decide when to fix it. Regards, Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Thursday, March 27, 2014 11:18 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Hi Alex, Were you able to test S4 as a boot device? I am seeing some issues with the IO during hiber driver execution. The hiber driver enumeration and initialization seems to complete with no issues, but after the first call to start io for the inquiry, I'm not seeing any more IO happen. I will try to debug further, but is this something you can look into? 
Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Monday, March 24, 2014 4:30 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** PMC New Patch Hi all, Please find the attached patch from PMC-Sierra. The password is pmc123. In order to speed up the entire process and meet our next release date, please review the changes and provide feedbacks as soon as possible. For each outstanding patch, we collect feedbacks for about a week after it is being sent out. A revised patch shall be sent out to include the feedbacks. I will follow up for approval after a week or so to allow more testing and reviewing if necessary. Summary of changes: 1. SRB Extension support for Windows 8 and up. Files changed: nvmeStd.c, nvmeSnti.c, nvmeStat.c, nvmePwrMgmt.c, nvmeInit.c and the related header files. 2. PRP list building for IOCTL and internal requests. Files changed: nvmeStd.c, nvmeInit.c and nvmestd.h. 3. Performance issue in Windows 8/Server 2012. File changed: nvmeStd.c (removed StorPortGetUncachedExtension calling in NVMeFindAdapter) 4. NVMeInitAdminQueues return value. File changed: nvmeStd.c (Instead of returning TRUE/FALSE, return Storport defined status) 5. Non-contiguous Namespace ID support. Files changed: nvmeStat.c and nvmeInit.c (When fetching Namespace Structure with an invalid Namespace ID (which is less than value of NN field of Controller Structure), driver moves on to next Namespace ID as long as it's not larger than the value of NN field) 6. Removal of using mask bits as core index to allocate/identify core tables. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 7. Implemented logical processor group defined by Windows. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 8. 
Core-MSI vector-Queue mapping, CMD_ENTRY synchronization and FreeQList access issues are related to using core mask bits as core index (#6) and no support for logical processor group (#7). Platforms tested: 1. Windows 7 64-bit 2. Windows Server 2008 R2 3. Windows 8 64-bit 4. Windows Server 2012 Tests run; 1. Installation(clean and update)/Un-Installation/Enable/Disable/hibernation and resume. 2. IOMeter 4K Read/write combining in random/sequential manners. 3. SCSC Compliance. 4. SDStress. 5. Quick/full disk formats. 6. Non-contiguous Namespace IDs. Thanks, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alex.Chang at pmcs.com Thu Mar 27 12:27:24 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Thu, 27 Mar 2014 19:27:24 +0000 Subject: [nvmewin] PMC New Patch In-Reply-To: <49158E750348AA499168FD41D88983606269EFB4@FMSMSX105.amr.corp.intel.com> References: <49158E750348AA499168FD41D88983606269EFB4@FMSMSX105.amr.corp.intel.com> Message-ID: Hi Ray, That's what I thought, too. For some reasons, after coming back from S4, IOMeter discontinues and prompts out error messages. After terminating it and re-launching IOMeter, it works fine. Regards, Alex From: Robles, Raymond C [mailto:raymond.c.robles at intel.com] Sent: Thursday, March 27, 2014 12:18 PM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Shouldn't S4 work as a boot and data device after the hibernation support patch? I/O generated during the hiber driver by the OS (to write out the hiber-file) should work regardless of any IOMeter workloads. 
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Thursday, March 27, 2014 12:14 PM To: Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] PMC New Patch Hi Carolyn, Since I don't think my changes will introduce the problem, I replaced the driver with tag "Patch#22_Hibernation_Support", used our device as secondary drive and ran IOMeter to issue IOs to the drive. I've seen IOmeter reporting errors after the system/our device came back from hibernation properly. If no IO accesses, S4 works fine as either boot drive or secondary drive. Could you please verify that as well in your side? Once it's confirmed as a known issue, we need to decide when to fix it. Regards, Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Thursday, March 27, 2014 11:18 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Hi Alex, Were you able to test S4 as a boot device? I am seeing some issues with the IO during hiber driver execution. The hiber driver enumeration and initialization seems to complete with no issues, but after the first call to start io for the inquiry, I'm not seeing any more IO happen. I will try to debug further, but is this something you can look into? Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Monday, March 24, 2014 4:30 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** PMC New Patch Hi all, Please find the attached patch from PMC-Sierra. The password is pmc123. In order to speed up the entire process and meet our next release date, please review the changes and provide feedbacks as soon as possible. For each outstanding patch, we collect feedbacks for about a week after it is being sent out. A revised patch shall be sent out to include the feedbacks. 
I will follow up for approval after a week or so to allow more testing and reviewing if necessary. Summary of changes: 1. SRB Extension support for Windows 8 and up. Files changed: nvmeStd.c, nvmeSnti.c, nvmeStat.c, nvmePwrMgmt.c, nvmeInit.c and the related header files. 2. PRP list building for IOCTL and internal requests. Files changed: nvmeStd.c, nvmeInit.c and nvmestd.h. 3. Performance issue in Windows 8/Server 2012. File changed: nvmeStd.c (removed StorPortGetUncachedExtension calling in NVMeFindAdapter) 4. NVMeInitAdminQueues return value. File changed: nvmeStd.c (Instead of returning TRUE/FALSE, return Storport defined status) 5. Non-contiguous Namespace ID support. Files changed: nvmeStat.c and nvmeInit.c (When fetching Namespace Structure with an invalid Namespace ID (which is less than value of NN field of Controller Structure), driver moves on to next Namespace ID as long as it's not larger than the value of NN field) 6. Removal of using mask bits as core index to allocate/identify core tables. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 7. Implemented logical processor group defined by Windows. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 8. Core-MSI vector-Queue mapping, CMD_ENTRY synchronization and FreeQList access issues are related to using core mask bits as core index (#6) and no support for logical processor group (#7). Platforms tested: 1. Windows 7 64-bit 2. Windows Server 2008 R2 3. Windows 8 64-bit 4. Windows Server 2012 Tests run; 1. Installation(clean and update)/Un-Installation/Enable/Disable/hibernation and resume. 2. IOMeter 4K Read/write combining in random/sequential manners. 3. SCSC Compliance. 4. SDStress. 5. Quick/full disk formats. 6. Non-contiguous Namespace IDs. Thanks, Alex -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From raymond.c.robles at intel.com Thu Mar 27 12:33:24 2014 From: raymond.c.robles at intel.com (Robles, Raymond C) Date: Thu, 27 Mar 2014 19:33:24 +0000 Subject: [nvmewin] PMC New Patch In-Reply-To: References: <49158E750348AA499168FD41D88983606269EFB4@FMSMSX105.amr.corp.intel.com> Message-ID: <49158E750348AA499168FD41D88983606269F0EA@FMSMSX105.amr.corp.intel.com> Understood. However, the I/O that is not working is when entering S4 when the hiber-driver is loaded, as a boot device. This needs to work regardless of any other issues being seen, otherwise S4 as a boot device is not functional in the OFA driver. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Thursday, March 27, 2014 12:27 PM To: Robles, Raymond C; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Hi Ray, That's what I thought, too. For some reasons, after coming back from S4, IOMeter discontinues and prompts out error messages. After terminating it and re-launching IOMeter, it works fine. Regards, Alex From: Robles, Raymond C [mailto:raymond.c.robles at intel.com] Sent: Thursday, March 27, 2014 12:18 PM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Shouldn't S4 work as a boot and data device after the hibernation support patch? I/O generated during the hiber driver by the OS (to write out the hiber-file) should work regardless of any IOMeter workloads. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Thursday, March 27, 2014 12:14 PM To: Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] PMC New Patch Hi Carolyn, Since I don't think my changes will introduce the problem, I replaced the driver with tag "Patch#22_Hibernation_Support", used our device as secondary drive and ran IOMeter to issue IOs to the drive. I've seen IOmeter reporting errors after the system/our device came back from hibernation properly. 
If no IO accesses, S4 works fine as either boot drive or secondary drive. Could you please verify that as well in your side? Once it's confirmed as a known issue, we need to decide when to fix it. Regards, Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Thursday, March 27, 2014 11:18 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Hi Alex, Were you able to test S4 as a boot device? I am seeing some issues with the IO during hiber driver execution. The hiber driver enumeration and initialization seems to complete with no issues, but after the first call to start io for the inquiry, I'm not seeing any more IO happen. I will try to debug further, but is this something you can look into? Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Monday, March 24, 2014 4:30 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** PMC New Patch Hi all, Please find the attached patch from PMC-Sierra. The password is pmc123. In order to speed up the entire process and meet our next release date, please review the changes and provide feedbacks as soon as possible. For each outstanding patch, we collect feedbacks for about a week after it is being sent out. A revised patch shall be sent out to include the feedbacks. I will follow up for approval after a week or so to allow more testing and reviewing if necessary. Summary of changes: 1. SRB Extension support for Windows 8 and up. Files changed: nvmeStd.c, nvmeSnti.c, nvmeStat.c, nvmePwrMgmt.c, nvmeInit.c and the related header files. 2. PRP list building for IOCTL and internal requests. Files changed: nvmeStd.c, nvmeInit.c and nvmestd.h. 3. Performance issue in Windows 8/Server 2012. File changed: nvmeStd.c (removed StorPortGetUncachedExtension calling in NVMeFindAdapter) 4. NVMeInitAdminQueues return value. 
File changed: nvmeStd.c (Instead of returning TRUE/FALSE, return Storport defined status) 5. Non-contiguous Namespace ID support. Files changed: nvmeStat.c and nvmeInit.c (When fetching Namespace Structure with an invalid Namespace ID (which is less than value of NN field of Controller Structure), driver moves on to next Namespace ID as long as it's not larger than the value of NN field) 6. Removal of using mask bits as core index to allocate/identify core tables. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 7. Implemented logical processor group defined by Windows. Files changed: nvmeStd.c, nvmeInit.c and the related header files. 8. Core-MSI vector-Queue mapping, CMD_ENTRY synchronization and FreeQList access issues are related to using core mask bits as core index (#6) and no support for logical processor group (#7). Platforms tested: 1. Windows 7 64-bit 2. Windows Server 2008 R2 3. Windows 8 64-bit 4. Windows Server 2012 Tests run; 1. Installation(clean and update)/Un-Installation/Enable/Disable/hibernation and resume. 2. IOMeter 4K Read/write combining in random/sequential manners. 3. SCSC Compliance. 4. SDStress. 5. Quick/full disk formats. 6. Non-contiguous Namespace IDs. Thanks, Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alex.Chang at pmcs.com Thu Mar 27 17:56:13 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Fri, 28 Mar 2014 00:56:13 +0000 Subject: [nvmewin] PMC New Patch In-Reply-To: <49158E750348AA499168FD41D88983606269F0EA@FMSMSX105.amr.corp.intel.com> References: <49158E750348AA499168FD41D88983606269EFB4@FMSMSX105.amr.corp.intel.com> <49158E750348AA499168FD41D88983606269F0EA@FMSMSX105.amr.corp.intel.com> Message-ID: Hi Ray and Carolyn, Just let you know that I retested it as boot driver/hibernation with Patch#22, #23 and the patch I sent out. They are all working properly. 
Regards, Alex From: Robles, Raymond C [mailto:raymond.c.robles at intel.com] Sent: Thursday, March 27, 2014 12:33 PM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Understood. However, the I/O that is not working is when entering S4 when the hiber-driver is loaded, as a boot device. This needs to work regardless of any other issues being seen, otherwise S4 as a boot device is not functional in the OFA driver. From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Thursday, March 27, 2014 12:27 PM To: Robles, Raymond C; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Hi Ray, That's what I thought, too. For some reasons, after coming back from S4, IOMeter discontinues and prompts out error messages. After terminating it and re-launching IOMeter, it works fine. Regards, Alex From: Robles, Raymond C [mailto:raymond.c.robles at intel.com] Sent: Thursday, March 27, 2014 12:18 PM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Shouldn't S4 work as a boot and data device after the hibernation support patch? I/O generated during the hiber driver by the OS (to write out the hiber-file) should work regardless of any IOMeter workloads. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Thursday, March 27, 2014 12:14 PM To: Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] PMC New Patch Hi Carolyn, Since I don't think my changes will introduce the problem, I replaced the driver with tag "Patch#22_Hibernation_Support", used our device as secondary drive and ran IOMeter to issue IOs to the drive. I've seen IOmeter reporting errors after the system/our device came back from hibernation properly. If no IO accesses, S4 works fine as either boot drive or secondary drive. Could you please verify that as well in your side? 
Once it's confirmed as a known issue, we need to decide when to fix it. Regards, Alex From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Thursday, March 27, 2014 11:18 AM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: PMC New Patch Hi Alex, Were you able to test S4 as a boot device? I am seeing some issues with the IO during hiber driver execution. The hiber driver enumeration and initialization seems to complete with no issues, but after the first call to start io for the inquiry, I'm not seeing any more IO happen. I will try to debug further, but is this something you can look into? Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Monday, March 24, 2014 4:30 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** PMC New Patch Hi all, Please find the attached patch from PMC-Sierra. The password is pmc123. In order to speed up the entire process and meet our next release date, please review the changes and provide feedbacks as soon as possible. For each outstanding patch, we collect feedbacks for about a week after it is being sent out. A revised patch shall be sent out to include the feedbacks. I will follow up for approval after a week or so to allow more testing and reviewing if necessary. Summary of changes: 1. SRB Extension support for Windows 8 and up. Files changed: nvmeStd.c, nvmeSnti.c, nvmeStat.c, nvmePwrMgmt.c, nvmeInit.c and the related header files. 2. PRP list building for IOCTL and internal requests. Files changed: nvmeStd.c, nvmeInit.c and nvmestd.h. 3. Performance issue in Windows 8/Server 2012. File changed: nvmeStd.c (removed StorPortGetUncachedExtension calling in NVMeFindAdapter) 4. NVMeInitAdminQueues return value. File changed: nvmeStd.c (Instead of returning TRUE/FALSE, return Storport defined status) 5. Non-contiguous Namespace ID support. 
   Files changed: nvmeStat.c and nvmeInit.c (when fetching the Namespace structure for an invalid Namespace ID that is less than the value of the NN field in the Controller structure, the driver moves on to the next Namespace ID, as long as it does not exceed the value of NN).
6. Removed the use of mask bits as the core index when allocating/identifying core tables.
   Files changed: nvmeStd.c, nvmeInit.c and the related header files.
7. Implemented support for the logical processor groups defined by Windows.
   Files changed: nvmeStd.c, nvmeInit.c and the related header files.
8. Core/MSI vector/queue mapping, CMD_ENTRY synchronization and FreeQList access issues, which were related to using core mask bits as the core index (#6) and the lack of logical processor group support (#7).

Platforms tested:
1. Windows 7 64-bit
2. Windows Server 2008 R2
3. Windows 8 64-bit
4. Windows Server 2012

Tests run:
1. Installation (clean and update), uninstallation, enable/disable, hibernation and resume.
2. IOMeter 4K reads/writes, combined in random and sequential patterns.
3. SCSI compliance.
4. SDStress.
5. Quick/full disk formats.
6. Non-contiguous Namespace IDs.

Thanks,
Alex

From Alex.Chang at pmcs.com Fri Mar 28 17:24:36 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Sat, 29 Mar 2014 00:24:36 +0000
Subject: [nvmewin] PMC New Patch
In-Reply-To:
References: <49158E750348AA499168FD41D88983606269EFB4@FMSMSX105.amr.corp.intel.com> <49158E750348AA499168FD41D88983606269F0EA@FMSMSX105.amr.corp.intel.com>
Message-ID:

Thank you, Carolyn. Your suggestion makes sense. I will modify the code accordingly and test again.

Have a great weekend!
Alex

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Friday, March 28, 2014 4:01 PM
To: Alex Chang; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: RE: PMC New Patch

Thank you Alex, I think it's a configuration issue on my part.
The only feedback I really have on this patch concerns NVMeAllocateMem in nvmeInit.c. On line 184, if the initial allocation attempt fails, we try to allocate from node 0. I'd like to see this changed to MM_ANY_NODE_OK instead of hard-coding node 0. I know this isn't specific to your patch, but I think it would be a bit more generic and flexible. I have one or two more tests I'd like to wrap up on Monday, but the patch is looking good so far.

Thanks!
Carolyn

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Thursday, March 27, 2014 5:56 PM
To: Robles, Raymond C; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: PMC New Patch

Hi Ray and Carolyn,

Just to let you know, I retested the driver as the boot driver, including hibernation, with Patch#22, Patch#23 and the patch I sent out. They all work properly.

Regards,
Alex
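Carolyn's suggestion for NVMeAllocateMem amounts to a two-step allocation: try the preferred NUMA node first and, if that fails, let the memory manager choose any node rather than retrying on a hard-coded node 0. The portable C sketch below models only that control flow; it does not call Storport. The names node_alloc_fn, allocate_with_fallback, simulated_alloc and SKETCH_ANY_NODE_OK are hypothetical, and SKETCH_ANY_NODE_OK is a local stand-in for the WDK's MM_ANY_NODE_OK (0x80000000 in wdm.h), assuming the node-aware allocation call inside NVMeAllocateMem accepts that value as Carolyn suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-in for the WDK's MM_ANY_NODE_OK (0x80000000 in wdm.h). */
#define SKETCH_ANY_NODE_OK 0x80000000u

/* Hypothetical signature for a NUMA-aware allocator that takes a
 * preferred node number (or SKETCH_ANY_NODE_OK). */
typedef void *(*node_alloc_fn)(size_t size, uint32_t node);

/* Try the preferred node first; if that fails, let the allocator pick
 * any node instead of retrying on a hard-coded node 0. */
static void *allocate_with_fallback(size_t size, uint32_t preferred_node,
                                    node_alloc_fn alloc)
{
    assert(alloc != NULL);
    void *p = alloc(size, preferred_node);
    if (p == NULL && preferred_node != SKETCH_ANY_NODE_OK)
        p = alloc(size, SKETCH_ANY_NODE_OK);
    return p;
}

/* Simulated allocator for illustration only: node 1 is "exhausted" and
 * every other request is served from the heap. It records the last node
 * it was asked for so the fallback path can be observed. */
static uint32_t g_last_node_tried;

static void *simulated_alloc(size_t size, uint32_t node)
{
    g_last_node_tried = node;
    if (node == 1)
        return NULL;
    return malloc(size);
}
```

For example, allocate_with_fallback(64, 1, simulated_alloc) fails on node 1 and then succeeds on the second call, leaving g_last_node_tried equal to SKETCH_ANY_NODE_OK, while a request for node 0 is satisfied on the first attempt. Passing "any node" on the retry keeps the preferred-node optimization without tying the fallback to one particular node.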
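Item 5 of the PMC patch summary describes how the driver walks Namespace IDs when they are not contiguous: an invalid ID no longer ends the scan, and the driver moves on to the next ID until it passes the NN value from the Identify Controller data. The portable C sketch below shows that loop under stated assumptions. The controller query is hidden behind a hypothetical callback (identify_ns_fn); treating a failed query or a zero namespace size (NSZE) as "invalid" is an assumption of this sketch, not necessarily how nvmeStat.c or nvmeInit.c detect it; and simulated_identify is a made-up controller with a gap at NSIDs 3 and 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for an Identify Namespace command.
 * Returns false if the controller rejects the NSID; otherwise fills
 * *nsze with the namespace size (0 is treated as inactive). */
typedef bool (*identify_ns_fn)(uint32_t nsid, uint64_t *nsze);

/* Scan NSIDs 1..nn (nn = NN field from Identify Controller), skipping
 * invalid IDs instead of stopping at the first gap. Stores up to
 * max_out active NSIDs in out[] and returns how many were stored. */
static uint32_t enumerate_namespaces(uint32_t nn, identify_ns_fn identify,
                                     uint32_t *out, uint32_t max_out)
{
    uint32_t found = 0;
    assert(identify != NULL && (out != NULL || max_out == 0));
    /* 64-bit counter so nn == UINT32_MAX cannot wrap the loop. */
    for (uint64_t nsid = 1; nsid <= nn && found < max_out; nsid++) {
        uint64_t nsze = 0;
        if (!identify((uint32_t)nsid, &nsze) || nsze == 0)
            continue; /* gap in the ID space: try the next NSID */
        out[found++] = (uint32_t)nsid;
    }
    return found;
}

/* Simulated controller: NN = 6, namespaces 3 and 4 have been deleted. */
static bool simulated_identify(uint32_t nsid, uint64_t *nsze)
{
    if (nsid == 3 || nsid == 4)
        return false;
    *nsze = 2048u * (uint64_t)nsid;
    return true;
}
```

With NN = 6, enumerate_namespaces(6, simulated_identify, out, 8) returns 4 and fills out with 1, 2, 5 and 6. Continuing past a gap, rather than breaking out of the loop, is what allows namespaces 5 and 6 to be discovered at all.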
From judy.brock at ssi.samsung.com Mon Mar 31 03:27:05 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Mon, 31 Mar 2014 10:27:05 +0000
Subject: [nvmewin] broken OFA NVMe driver development and SVN repo links
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A516B615506@SSIEXCH-MB3.ssi.samsung.com>

Hi,

The following links have suddenly stopped working:

http://www.openfabrics.org/svnrepo/nvmewin/
https://www.openfabrics.org/resources/developer-tools/nvme-windows-development.html

Thanks,
Judy

From kens at flatbed.openfabrics.org Mon Mar 31 08:12:57 2014
From: kens at flatbed.openfabrics.org (kens at flatbed.openfabrics.org)
Date: Mon, 31 Mar 2014 08:12:57 -0700 (PDT)
Subject: [nvmewin] links and such
Message-ID: <1396278777.57364@flatbed.openfabrics.org>

We are migrating all web services to new hardware. Some links and URLs are not working yet, but I am diligently trying to solve the issues. The website, list server and mail server are running. Bugzilla is at bugs.openfabrics.org/bugzilla/. The git daemon is running, but its web interface is not up yet. SVN is available through a client at svn://flatbed.openfabrics.org; its web interface is not up yet either. My goal is to have them running today.

Thanks for your patience. And thanks to Vladimir for help in getting the git daemon running.
Ken

From swise at opengridcomputing.com Mon Mar 31 08:42:41 2014
From: swise at opengridcomputing.com (Steve Wise)
Date: Mon, 31 Mar 2014 10:42:41 -0500
Subject: [nvmewin] [ewg] links and such
In-Reply-To: <1396278777.57364@flatbed.openfabrics.org>
References: <1396278777.57364@flatbed.openfabrics.org>
Message-ID: <001501cf4cf7$da2a44b0$8e7ece10$@opengridcomputing.com>

FYI: This URL isn't working:

t4:~ # wget www.openfabrics.org/downloads/OFED/ofed-3.12-daily/latest.tgz
--2014-03-31 11:00:39-- http://www.openfabrics.org/downloads/OFED/ofed-3.12-daily/latest.tgz
Resolving www.openfabrics.org... 69.55.231.74
Connecting to www.openfabrics.org|69.55.231.74|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2014-03-31 11:00:40 ERROR 404: Not Found.

> -----Original Message-----
> From: ewg-bounces at lists.openfabrics.org [mailto:ewg-bounces at lists.openfabrics.org] On
> Behalf Of kens at flatbed.openfabrics.org
> Sent: Monday, March 31, 2014 10:13 AM
> To: nvmewin at openfabrics.org; ewg at openfabrics.org
> Subject: [ewg] links and such
>
> We are migrating all web service to hardware. Some links and urls are not yet working, but I
> diligently trying to solve the issues. The web site, lists server, and mail server are running.
> Bugs are bugs.openfabrics.org/bugzilla/. The git daemon is running, but the web interface is
> not yet up. SVN is available through a client at svn://flatbed.openfabrics.org. The web
> interface is not up yet. My goal is to have them running today.
>
> Thanks for your patience. And thanks to Vladimir for help in getting the git daemon running.
>
> Ken