[nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Alex Chang Alex.Chang at pmcs.com
Wed Mar 5 10:13:29 PST 2014


Great! Thank you very much, Dharani, for the quick fixes.

Hi all,

Please review/test the patch and provide your feedback. If no big changes are required, I will start collecting approvals from Intel and LSI early next week.

Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, March 05, 2014 9:03 AM
To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes


Content-Type: text/plain; charset=UTF-8

Content-Transfer-Encoding: 8bit

Date: %%SENT_DATE%%

Subject: Suspect Message Quarantined







WARNING: The virus scanner was unable to scan an attachment in an email message sent to you.  This attachment could possibly contain viruses or other malicious programs.  The attachment could not be scanned for the following reasons:



%%DESC%%



The full message and the attachment have been stored in the quarantine.



The identifier for this message is '%%QID%%'.



Access the quarantine at:

https://puremessage.pmc-sierra.bc.ca:28443/



For more information on PMC's Anti-Spam system:

http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ



IT Services

PureMessage Admin


Hi Alex,

Attached is the patch with the requested changes incorporated, plus the BSOD fix. The BSOD fix is in nvmeIo.c, line 640.

Password: sndk1234

Thanks,
Dharani.

From: Dharani Kotte
Sent: Tuesday, February 25, 2014 3:25 PM
To: 'Alex Chang'; Foster, Carolyn D; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex,

Sure, I will go over the list below and make changes accordingly next week.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 25, 2014 2:09 PM
To: Alex Chang; Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

I left out one more item:
At line 1228 of nvmeInit.c, NVMeWaitOnRdy is now called directly instead of NVMeWaitOnRdy being assigned to NextDriverState. I think the assignment is still required before the initialization state machine starts.
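
A minimal sketch of why the assignment matters, assuming a simplified state machine that dispatches on NextDriverState. Only NVMeWaitOnRdy and NextDriverState are names from this thread; every other state, type, and function below is hypothetical and is not the actual nvmeInit.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified states. Only NVMeWaitOnRdy is a name from the thread. */
typedef enum {
    NVMeWaitOnRdy,       /* waiting for CSTS.RDY to become 1 */
    NVMeWaitOnIdentify,  /* hypothetical follow-on state */
    NVMeStartComplete,   /* hypothetical terminal state */
    NVMeStateFailed      /* hypothetical failure state */
} DriverStateSketch;

typedef struct {
    DriverStateSketch NextDriverState;
    bool ReadyBit;   /* stand-in for CSTS.RDY */
    int  StepsRun;   /* how many state handlers were dispatched */
} StateMachineSketch;

/* One pass of the state machine: the handler is chosen by NextDriverState. */
void RunOneStep(StateMachineSketch *sm)
{
    assert(sm != NULL);
    sm->StepsRun++;
    switch (sm->NextDriverState) {
    case NVMeWaitOnRdy:
        sm->NextDriverState = sm->ReadyBit ? NVMeWaitOnIdentify : NVMeStateFailed;
        break;
    case NVMeWaitOnIdentify:
        sm->NextDriverState = NVMeStartComplete;
        break;
    default:
        break;
    }
}

/* Assign the first state, then let the machine run. Without the assignment,
 * a stale NextDriverState from an earlier run would decide what happens. */
DriverStateSketch StartStateMachine(StateMachineSketch *sm)
{
    assert(sm != NULL);
    sm->NextDriverState = NVMeWaitOnRdy;
    sm->StepsRun = 0;
    while (sm->NextDriverState != NVMeStartComplete &&
           sm->NextDriverState != NVMeStateFailed) {
        RunOneStep(sm);
    }
    return sm->NextDriverState;
}
```

In this sketch, a machine left in a stale terminal state from a previous run would dispatch no handlers at all if the assignment to NextDriverState were dropped.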

Alex

From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Tuesday, February 25, 2014 2:03 PM
To: Dharani Kotte; Foster, Carolyn D; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn and Dharani,

I basically agree with the suggested changes in HwStorResetBus. However, I have some feedback:

1.       At line 2161, NVMeResetAdapter is called, and it returns only after making sure the RDY bit has cleared to 0, so why do we need the 10 ms delay? The identical delay was added in RecoveryDpcRoutine because the original NVMeResetAdapter did not wait until the RDY bit cleared to 0. Given the changes in NVMeResetAdapter, we need to remove the 10 ms delay in RecoveryDpcRoutine as well.

2.       At line 2216, StorPortPause is called with a 60-second timeout to force Storport to hold requests. I am not sure 60 seconds is a proper assumption. Calling "StorPortBusy(pAdapterExtension, STOR_ALL_REQUESTS);" instead seems like a better idea to me.

3.       By definition, HwStorResetBus returns TRUE in the successful case. We need to handle the failure cases as well, i.e., any failures within NVMeSynchronizeReset. Also, StorPortResume should be called only in the successful case.
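
A rough sketch of points 1 and 3, written against a simulated controller rather than real hardware or the Storport API. All types and function names below are hypothetical. The RequestsHeld flag merely stands in for whichever of StorPortPause or StorPortBusy is chosen, and for StorPortResume on release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated adapter. A real driver would read and write MMIO registers. */
typedef struct {
    bool     CcEnable;            /* stand-in for CC.EN */
    bool     CstsReady;           /* stand-in for CSTS.RDY */
    unsigned PollsUntilNotReady;  /* simulation knob: polls before RDY drops */
    bool     RequestsHeld;        /* true while the port driver holds requests */
} ResetAdapterSim;

/* Simulated register read: RDY clears some polls after EN is cleared. */
bool ReadReadySim(ResetAdapterSim *a)
{
    assert(a != NULL);
    if (!a->CcEnable && a->CstsReady) {
        if (a->PollsUntilNotReady == 0) {
            a->CstsReady = false;
        } else {
            a->PollsUntilNotReady--;
        }
    }
    return a->CstsReady;
}

/* Clear EN, then poll until RDY is 0 or the poll budget runs out.
 * Returns true only when RDY was actually observed as 0. */
bool ResetAdapterSketch(ResetAdapterSim *a, unsigned maxPolls)
{
    a->CcEnable = false;
    for (unsigned i = 0; i < maxPolls; i++) {
        if (!ReadReadySim(a)) {
            return true;
        }
        /* A real driver would stall briefly between polls here. */
    }
    return false;
}

/* Reset entry point: report failure, and release held requests only on success. */
bool ResetBusSketch(ResetAdapterSim *a, unsigned maxPolls)
{
    a->RequestsHeld = true;          /* stands in for StorPortPause/StorPortBusy */
    if (!ResetAdapterSketch(a, maxPolls)) {
        return false;                /* failed: requests stay held */
    }
    /* RDY is already known to be 0, so no extra fixed delay is needed.
     * Controller re-initialization is elided and simulated as instantaneous. */
    a->CcEnable = true;
    a->CstsReady = true;
    a->RequestsHeld = false;         /* stands in for StorPortResume */
    return true;
}
```

Because the reset helper returns only after observing RDY as 0 (or giving up), the caller has no need for an additional fixed delay, and the failure path is visible to the caller through the return value.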

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 25, 2014 4:28 AM
To: Foster, Carolyn D; Alex Chang; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn,

Line 1384: I can take care of this item.

Line 2219: StorPortSynchronizeAccess. This was requested by Samsung, as suggested by Judy. Her mail is below for reference.

In our testing, we create a situation where we put the NVMe driver under heavy I/O load with Iometer and then cause the device to stop responding.  This results in I/O request timeouts, which eventually cause the driver to be called at its HwStorResetBus entry point (NVMeResetBus).  I have some feedback on the current architecture of that routine:


1.       Among other things, NVMeResetBus schedules a DPC to complete any pending commands. This creates a situation where, upon return from this entry point, there are still commands outstanding that don't get completed until the DPC runs.  According to the WDK, this doesn't appear to be legal: all outstanding commands have to be completed by the HwStorResetBus routine before it returns:

HwResetBus

Pointer to the miniport driver's HwStorResetBus<ms-help://MS.WDK.v10.7600.091201/Storage_r/hh/Storage_r/stormini_b3051379-4caa-4502-9492-a21672cfbf0d.xml.htm> routine, which is a required entry point for all miniport drivers. This member has the same meaning for the Storport version of the HW_INITIALIZATION_DATA structure as it does for the SCSI Port version of the structure. For more information, see the HwResetBus member of HW_INITIALIZATION_DATA (SCSI)
and
HwScsiResetBus must complete any outstanding requests by calling ScsiPortCompleteRequest with the SrbStatus value SRB_STATUS_BUS_RESET or, for individual SRBs, ScsiPortNotification with this status value.
and
The port driver pauses all device IO queues for the adapter and then calls the HwStorResetBus routine at IRQL DISPATCH_LEVEL after acquiring the StartIo spin lock. A miniport driver is responsible for completing SRBs received by HwStorStartIo<http://msdn.microsoft.com/en-us/library/windows/hardware/ff557423(v=vs.85).aspx> for PathId during this routine and setting their status to SRB_STATUS_BUS_RESET if necessary

Since HwStorResetBus must finish its work before returning, it can't schedule a DPC to do so later on. The logic that schedules a DPC should be removed.


2.       Code should be added to call StorPortPause() to hold off any new requests till StorPortResume() is called.


3.       Code should be added to call  StorPortSynchronizeAccess() in order to synchronize with HwStorInterrupt. A callback routine in the NVMe driver should also be added for NVMeResetBus to do the synchronized work in. HwStorResetBus is already synchronized with HwStorStartIo since the port driver calls it only after acquiring the StartIo spinlock.



4.       We should implement a driver-internal global (per "adapter") flag signifying we are busy with reset processing and thus can't allow new I/O requests to go through to the hardware.



5.       Code should be added to call StorPortResume() when all work is complete.



6.       We should refer to the WDK-supplied LSI parallel SCSI StorPort miniport sample driver for an example of all of the above.
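
The shape of points 1 and 4 can be sketched roughly as follows. This is a single-threaded simulation with hypothetical names and status values, not Storport code. A real driver would complete SRBs with SRB_STATUS_BUS_RESET and would rely on StorPortPause, StorPortSynchronizeAccess and StorPortResume for the actual synchronization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_OUTSTANDING_SKETCH = 8 };

/* Hypothetical status values standing in for SRB status codes. */
typedef enum { StatusPending, StatusBusy, StatusBusReset } SrbStatusSketch;

typedef struct {
    bool            ResetInProgress;  /* per-adapter "busy with reset" flag */
    SrbStatusSketch Outstanding[MAX_OUTSTANDING_SKETCH];
    size_t          OutstandingCount;
} ResetGateAdapter;

/* StartIo path: refuse new work while a reset is being processed. */
SrbStatusSketch StartIoSketch(ResetGateAdapter *a)
{
    assert(a != NULL);
    if (a->ResetInProgress || a->OutstandingCount == MAX_OUTSTANDING_SKETCH) {
        return StatusBusy;
    }
    a->Outstanding[a->OutstandingCount++] = StatusPending;
    return StatusPending;
}

/* Reset path: every outstanding request is completed before returning,
 * so nothing is deferred to a DPC. Returns the number completed. */
size_t CompleteAllOnResetSketch(ResetGateAdapter *a)
{
    size_t completed = 0;
    assert(a != NULL);
    a->ResetInProgress = true;
    for (size_t i = 0; i < a->OutstandingCount; i++) {
        a->Outstanding[i] = StatusBusReset;
        completed++;
    }
    a->OutstandingCount = 0;
    a->ResetInProgress = false;   /* a real driver would resume the port here */
    return completed;
}
```

The key property is that nothing remains outstanding when the reset routine returns, and the flag gives the StartIo path a cheap way to turn away new requests in the meantime.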



Thanks,
Judy


Thanks,
Dharani.


From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D
Sent: Monday, February 24, 2014 3:51 PM
To: Alex Chang; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex and Dharani,

I have been reviewing the code and performing some tests and I have some concerns about this patch.

In nvmeStd.c:
Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine.  This change concerns me the most.  During a reset there is no need to send individual abort requests for outstanding commands.  When the LUN reset comes in, we will set CC.EN to 0 and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue."  This reset behavior has been accounted for in the driver, by design.  In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands.

What should happen here is that the new NVMeProcessAbortLunReset function should be moved under SRB_FUNCTION_ABORT_COMMAND only.  Then NVMeProcessAbortLunReset should send only one abort and not abort all outstanding commands.
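
A small sketch of the dispatch being suggested, using counters in place of real hardware commands. The enum values, structure, and function names are hypothetical stand-ins for the SRB function codes and driver routines discussed above, not the actual nvmeStd.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for SRB_FUNCTION_ABORT_COMMAND and
 * SRB_FUNCTION_RESET_LOGICAL_UNIT. */
typedef enum { SketchAbortCommand, SketchResetLogicalUnit } SrbFunctionSketch;

typedef struct {
    unsigned Outstanding;          /* commands currently pending */
    unsigned AbortsSent;           /* NVMe Abort commands issued */
    unsigned RecoveryRuns;         /* times the recovery routine ran */
    unsigned CompletedByRecovery;  /* commands completed by recovery */
} DispatchAdapterSketch;

void HandleSrbSketch(DispatchAdapterSketch *a, SrbFunctionSketch fn)
{
    assert(a != NULL);
    switch (fn) {
    case SketchAbortCommand:
        /* Abort only the single request named by the SRB. */
        if (a->Outstanding > 0) {
            a->AbortsSent++;
        }
        break;
    case SketchResetLogicalUnit:
        /* No per-command aborts: the controller reset (CC.EN = 0) stops
         * command processing, and recovery completes everything pending. */
        a->RecoveryRuns++;
        a->CompletedByRecovery += a->Outstanding;
        a->Outstanding = 0;
        break;
    }
}
```

With five commands pending, an abort request issues one abort and leaves the rest alone, while a LUN reset issues no aborts and lets the recovery path complete all five.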

Also, during testing, I hit a D1 BSOD when I tried to step through the code.  I ran IO and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell.  The IO should be timed out by Storport, which will then send a LUN reset.

Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed.  The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with Start IO and the interrupt DPC.

Thanks,
Carolyn


From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Wednesday, February 19, 2014 10:06 AM
To: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you, Dharani.

Hi all,

Please review/test the attached reset fix patch from Sandisk and provide your feedback.

Thank you very much,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, February 19, 2014 9:00 AM
To: Alex Chang
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes



Hi Alex,

Attached is the patch source for review. I have tested it with I/O running overnight.

Areas to focus on when testing this patch:
1. Test abort/LUN resets.
2. Test chip reset.
3. Test the format command.
4. Test the firmware download command.

Password is "sndk1234"

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:15 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Great!

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 18, 2014 12:14 PM
To: Alex Chang
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

I am just testing after merging the code. I should be able to send it tomorrow morning.
Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:13 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Just a friendly reminder, could you please send out your patch as soon as it's ready?

Many thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, February 14, 2014 10:18 AM
To: Alex Chang; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Sure Alex.
Dharani.

From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Friday, February 14, 2014 10:17 AM
To: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Good morning, Dharani,

As you may know, both the Intel and Huawei patches have been added to the OFA source base. You may now rebase your changes and send a patch out for review/test. Thank you very much for contributing the fixes.

Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, January 15, 2014 2:08 PM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ?



Hi Alex,

Attached is the source for preliminary review. I have run the IO and SCSI compliance tests. I don't have a drive that supports abort/LUN resets, and I am not sure how to test the format command.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:54 AM
To: Dharani Kotte; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Happy Holidays to you all.
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 11:52 AM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Thank you for the explanation. Sure, I will take a look.
Happy Holidays.
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:44 AM
To: Kwok Kong; Dharani Kotte; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Dharani,

A controller reset can be issued either by the host or by the driver itself. Currently, the driver seems to handle both in the same manner via the single entry point "NVMeResetController". In the "from the host" case, the driver needs to separate SRB_FUNCTION_RESET_... requests from the NVME_RESET_DEVICE ioctl request with respect to how pending IOs are handled. In the "driver itself" case, we need to re-examine the related error recovery code as well.
Judy from Samsung suggested referring to the storahci.sys driver sample code for Windows 7/8 for reset bus logic examples and detailed recommendations.

Thank you,
Alex


From: Kwok Kong
Sent: Friday, December 20, 2013 9:08 AM
To: Dharani Kotte; Akshay Mathur; Alex Chang
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Dharani,

Yes, these are the three areas that you are committed to.

Alex,

Please send more details on the "Controller reset does not handle all cases"  to Dharani.

Thanks

-Kwok

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 9:02 AM
To: Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Kwok,

I think the items below are the ones we are committing to:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- orphaned requests

Can somebody provide a few more details on what is expected for the item "Controller reset does not handle all cases"?

Thanks,
Dharani.


From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Thursday, December 19, 2013 6:53 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Excellent! Your help is much appreciated.

Dharani,

Please let me know if you have any question.

Happy holiday to all of you.

-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Thursday, December 19, 2013 6:51 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,
You are welcome. We are pleased to contribute to the community and appreciate you driving it!

We will try our best to complete the implementation by the end of January, but we may not be able to complete comprehensive testing by that time. This is because of overlaps with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks.

Anyway, Dharani will be in touch with you as he makes progress.
Thanks
Akshay

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Tuesday, December 17, 2013 4:21 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Akshay,

Thanks for your willingness to contribute to the driver.   I am looking for a patch before end of Jan 2014, the earlier the better.
Please let me know if Sandisk can commit to that.

Your help is much appreciated.

Thanks

-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Tuesday, December 17, 2013 4:11 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman; Akshay Mathur
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,
I manage the Software and driver development team at SanDisk/ESS.
We are certainly willing to contribute to fixing the problems listed below but before we can commit, we would like to get clarification on the timeline i.e. by when these fixes are expected to be completed.
Thanks
Akshay Mathur
Sr Software Manager, Enterprise Storage Solutions
951 SanDisk Drive, Building #5  |  Milpitas, CA 95035 U.S.A.  |  Direct  +1 408.801.1336  |
Cell +1 856.607.7323  |  Corporate +1 408.801.1000  |  Akshay.Mathur at sandisk.com<mailto:Akshay.Mathur at sandisk.com>


From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Wednesday, December 11, 2013 18:00
To: Dave Landsman
Cc: Dharani Kotte
Subject: Would you please help to resolve a few OFA NVMe driver problems ?

Dave and Dharani,

There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems. Intel has agreed to work on the following two problems:
- remove #define for CHATHAM2
- Learning of CPU core to Vector failure handling

I am also making request to other companies to work on some of the issues.

I wonder if your company can work on the following three problems:
                - Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
                - Controller reset does not handle all cases
                - orphaned requests

Please let me know if your company can work on these three issues.

Thanks

-Kwok



________________________________

PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).










