[nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Alex Chang Alex.Chang at pmcs.com
Mon Feb 24 18:13:34 PST 2014


Hi Dharani,

Your patch is meant to address the reset issues. The bug check now seems to happen after a forced timeout, which later triggers reset requests from Storport. Could you please replicate it the way Carolyn did? I will do some similar testing as well.

Thanks,
Alex

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Monday, February 24, 2014 4:06 PM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Unfortunately I did not have time to debug the failure.  I saw that we did get the reset from Storport, and that it called into the new function.  But I'm not sure where exactly after that it crashed.

Carolyn

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Monday, February 24, 2014 4:59 PM
To: Foster, Carolyn D; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Carolyn,

Did you root-cause why the D1 BSOD hit when you forced a timeout?

Thanks,
Alex

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Monday, February 24, 2014 3:51 PM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Hi Alex and Dharani,

I have been reviewing the code and performing some tests and I have some concerns about this patch.

In nvmeStd.c:
Line 1384: NVMeProcessAbortLunReset - This change will now send abort commands for all pending requests when a RESET_LOGICAL_UNIT request comes in, instead of issuing the RecoveryDpc routine.  This change concerns me the most.  During a reset there is no need to send individual abort requests for outstanding commands.  When the LUN reset comes in, we will set CC.EN to 0 and the spec clearly states that "the controller shall not process commands nor post completion queue entries to the completion queue."  This reset behavior has been accounted for in the driver, by design.  In the LUN reset case, we should continue to issue the recovery DPC routine, which will complete all outstanding commands.

What should happen here is that the new NVMeProcessAbortLunReset function should be called only for SRB_FUNCTION_ABORT_COMMAND. The function should then send a single abort rather than aborting all outstanding commands.
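
The split described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not code from the patch or the OFA driver: the enum stands in for the SRB function codes that a real miniport takes from the WDK headers, and pending_table, abort_one, recovery_complete_all, and handle_srb are hypothetical names. An abort request completes only the one tagged command, while a LUN reset takes the recovery path that completes everything outstanding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Stand-ins for the Storport SRB function codes (assumption: a real
 * driver uses the definitions from the WDK headers, not these). */
enum srb_function {
    SRB_FUNCTION_ABORT_COMMAND,
    SRB_FUNCTION_RESET_LOGICAL_UNIT
};

/* Hypothetical bookkeeping: one flag per outstanding command tag. */
struct pending_table {
    bool outstanding[MAX_PENDING];
};

/* Abort path: complete at most the single command named by tag. */
static size_t abort_one(struct pending_table *t, size_t tag)
{
    assert(tag < MAX_PENDING);
    if (!t->outstanding[tag])
        return 0;
    t->outstanding[tag] = false;
    return 1;
}

/* Reset path: models the recovery routine, which completes every
 * outstanding command in one pass with no per-command aborts. */
static size_t recovery_complete_all(struct pending_table *t)
{
    size_t completed = 0;
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->outstanding[i]) {
            t->outstanding[i] = false;
            completed++;
        }
    }
    return completed;
}

/* Returns the number of commands completed for this request. */
size_t handle_srb(struct pending_table *t, enum srb_function fn, size_t tag)
{
    switch (fn) {
    case SRB_FUNCTION_ABORT_COMMAND:
        return abort_one(t, tag);
    case SRB_FUNCTION_RESET_LOGICAL_UNIT:
        return recovery_complete_all(t);
    }
    return 0;
}
```

With three commands outstanding, an abort for tag 1 completes one command and leaves the other two pending; a following LUN reset completes the remaining two.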

Also, during testing, I hit a D1 BSOD (bug check 0xD1, DRIVER_IRQL_NOT_LESS_OR_EQUAL) when I tried to step through the code.  I ran I/O and forced a timeout by using the debugger to skip over the line of code that rings the submission queue doorbell.  Storport should then time out the I/O and send a LUN reset.
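
Why skipping the doorbell write forces a timeout can be shown with a small simulation. It is a sketch, not driver code: sim_sq, sq_submit, and ctrl_fetch are hypothetical names, and the queue depth is arbitrary. The one fact it relies on is that an NVMe controller fetches submission queue entries only up to the tail value last written to the doorbell, so a command placed in the queue without a doorbell write is never seen and never completes.

```c
#include <stdbool.h>
#include <stdint.h>

#define SQ_DEPTH 4u

/* Hypothetical model of one submission queue. */
struct sim_sq {
    uint16_t host_tail; /* slot where the driver places the next command */
    uint16_t doorbell;  /* last tail value written to the doorbell */
    uint16_t ctrl_head; /* next slot the controller will fetch from */
};

/* Driver side: queue one command, optionally skipping the doorbell
 * write (the step that was stepped over in the debugger). */
void sq_submit(struct sim_sq *sq, bool ring_doorbell)
{
    sq->host_tail = (uint16_t)((sq->host_tail + 1u) % SQ_DEPTH);
    if (ring_doorbell)
        sq->doorbell = sq->host_tail;
}

/* Controller side: fetch everything up to the doorbell value and
 * return how many commands were picked up. */
unsigned ctrl_fetch(struct sim_sq *sq)
{
    unsigned fetched = 0;
    while (sq->ctrl_head != sq->doorbell) {
        sq->ctrl_head = (uint16_t)((sq->ctrl_head + 1u) % SQ_DEPTH);
        fetched++;
    }
    return fetched;
}
```

A submit with the doorbell write is fetched on the next pass. A submit without it leaves host_tail ahead of doorbell, ctrl_fetch returns 0, and the command stays pending until the upper layer times it out.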

Line 2219: StorPortSynchronizeAccess - I don't understand why this is needed.  The SynchronizeReset function looks very much like the recovery DPC routine, which should already be synchronized with Start IO and the interrupt DPC.

Thanks,
Carolyn


From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Wednesday, February 19, 2014 10:06 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] ***UNCHECKED*** FW: Re-send Sandisk Patch For Reset Fixes

Thank you, Dharani.

Hi all,

Please review/test the attached reset fix patch from Sandisk and provide your feedback.

Thank you very much,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, February 19, 2014 9:00 AM
To: Alex Chang
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes


Hi Alex,

Attached is the patch source for review. I have tested it with I/O running overnight.

Areas to focus on when testing this patch:
1. Test abort/LUN resets.
2. Test chip reset.
3. Test the format command.
4. Test the firmware download command.

Password is "sndk1234"

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:15 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Great!

Thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Tuesday, February 18, 2014 12:14 PM
To: Alex Chang
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

I am just testing after merging the code. I should be able to send it tomorrow morning.
Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, February 18, 2014 12:13 PM
To: Dharani Kotte
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Hi Dharani,

Just a friendly reminder, could you please send out your patch as soon as it's ready?

Many thanks,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, February 14, 2014 10:18 AM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Sure Alex.
Dharani.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Friday, February 14, 2014 10:17 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] Re-send Sandisk Patch For Reset Fixes

Good morning, Dharani,

As you may know, both the Intel and Huawei patches have been added to the OFA source base. You may now rebase your changes and send a patch out for review/test. Thank you very much for contributing the fixes.

Regards,
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Wednesday, January 15, 2014 2:08 PM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: [WARNING - ENCRYPTED ATTACHMENT NOT VIRUS SCANNED] RE: Would you please help to resolve a few OFA NVMe driver problems ?


Hi Alex,

Attached is the source for preliminary review. I have run I/O and the SCSI compliance test. I don't have a drive that supports abort/LUN resets, and I am not sure how to test the format command.

Thanks,
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:54 AM
To: Dharani Kotte; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Happy Holidays to you all.
Alex

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 11:52 AM
To: Alex Chang; Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Thank you for the explanation. Sure, I will take a look.
Happy Holidays.
Dharani.

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Friday, December 20, 2013 11:44 AM
To: Kwok Kong; Dharani Kotte; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Dharani,

A controller reset can be issued either by the host or by the driver itself. Currently, the driver seems to handle both in the same manner via the single entry point "NVMeResetController". In the host-issued case, the driver needs to distinguish SRB_FUNCTION_RESET_... requests from the NVME_RESET_DEVICE ioctl request in how it handles pending I/Os. In the driver-issued case, the related error recovery code needs to be re-examined as well.
Judy from Samsung suggested referring to the storahci.sys driver sample code for Windows 7/8 for reset bus logic examples and detailed recommendations.

Thank you,
Alex


From: Kwok Kong
Sent: Friday, December 20, 2013 9:08 AM
To: Dharani Kotte; Akshay Mathur; Alex Chang
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Dharani,

Yes, these are the three areas that you are committed to.

Alex,

Please send more details on the "Controller reset does not handle all cases"  to Dharani.

Thanks

-Kwok

From: Dharani Kotte [mailto:Dharani.Kotte at sandisk.com]
Sent: Friday, December 20, 2013 9:02 AM
To: Kwok Kong; Akshay Mathur
Cc: Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Hi Kwok,

I think the items below are the ones we are committing to:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- orphaned requests
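
For the first item, the NVMe specification ties the two CSTS.RDY transitions to CC.EN: after CC.EN is cleared the host waits for CSTS.RDY to go from 1 to 0, and after CC.EN is set it waits for CSTS.RDY to go from 0 to 1, in both cases bounded by the timeout advertised in CAP.TO. A minimal sketch of such a bounded wait follows. It is not taken from the patch; wait_for_rdy, fake_ctrl, and fake_read_csts are hypothetical names, the register read is abstracted behind a callback, and the poll count stands in for a real time limit.

```c
#include <stdbool.h>
#include <stdint.h>

#define CSTS_RDY 0x1u /* CSTS.RDY is bit 0 of the CSTS register */

/* Callback that returns the current CSTS value (hypothetical). */
typedef uint32_t (*read_csts_fn)(void *ctx);

/* Poll CSTS until RDY equals want_ready or the poll budget runs out.
 * want_ready = false covers the 1->0 transition after CC.EN is cleared;
 * want_ready = true covers the 0->1 transition after CC.EN is set. */
bool wait_for_rdy(read_csts_fn read_csts, void *ctx,
                  bool want_ready, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        bool ready = (read_csts(ctx) & CSTS_RDY) != 0;
        if (ready == want_ready)
            return true;
        /* A real driver would stall briefly here between reads. */
    }
    return false; /* the caller should treat this as a failed reset */
}

/* Fake controller for illustration: RDY flips on the Nth read, or
 * never flips when reads_until_flip starts at 0. */
struct fake_ctrl {
    uint32_t csts;
    unsigned reads_until_flip;
};

uint32_t fake_read_csts(void *ctx)
{
    struct fake_ctrl *c = (struct fake_ctrl *)ctx;
    if (c->reads_until_flip > 0 && --c->reads_until_flip == 0)
        c->csts ^= CSTS_RDY;
    return c->csts;
}
```

Returning false when the budget is exhausted lets the caller fail the reset explicitly instead of spinning on a controller that never changes state.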

Can somebody provide a little more detail on what is expected for the item "Controller reset does not handle all cases"?

Thanks,
Dharani.


From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Thursday, December 19, 2013 6:53 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Excellent! Your help is much appreciated.

Dharani,

Please let me know if you have any questions.

Happy holidays to all of you.

-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Thursday, December 19, 2013 6:51 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,
You are welcome. We are pleased to contribute to the community and appreciate you driving it!

We will try our best to complete the implementation by the end of January, but we may not be able to complete comprehensive testing by that time. This is because of overlaps with a few internal business deliverables and a company-wide shutdown for the next 1.5 weeks.

Anyway, Dharani will be in touch with you as he makes progress.
Thanks
Akshay

From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Tuesday, December 17, 2013 4:21 PM
To: Akshay Mathur
Cc: Dharani Kotte; Dave Landsman
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Akshay,

Thanks for your willingness to contribute to the driver. I am looking for a patch before the end of Jan 2014, the earlier the better.
Please let me know if Sandisk can commit to that.

Your help is much appreciated.

Thanks

-Kwok

From: Akshay Mathur [mailto:Akshay.Mathur at sandisk.com]
Sent: Tuesday, December 17, 2013 4:11 PM
To: Kwok Kong
Cc: Dharani Kotte; Dave Landsman; Akshay Mathur
Subject: RE: Would you please help to resolve a few OFA NVMe driver problems ?

Kwok,
I manage the Software and driver development team at SanDisk/ESS.
We are certainly willing to contribute to fixing the problems listed below, but before we can commit, we would like clarification on the timeline, i.e. by when these fixes are expected to be completed.
Thanks
Akshay Mathur
Sr Software Manager, Enterprise Storage Solutions
951 SanDisk Drive, Building #5  |  Milpitas, CA 95035 U.S.A.  |  Direct  +1 408.801.1336  |
Cell +1 856.607.7323  |  Corporate +1 408.801.1000  |  Akshay.Mathur at sandisk.com


From: Kwok Kong [mailto:Kwok.Kong at pmcs.com]
Sent: Wednesday, December 11, 2013 18:00
To: Dave Landsman
Cc: Dharani Kotte
Subject: Would you please help to resolve a few OFA NVMe driver problems ?

Dave and Dharani,

There are some issues with the current OFA driver that need to be fixed. PMC is working on resolving some of the problems. Intel has agreed to work on the following two problems:
- remove #define for CHATHAM2
- Learning of CPU core to Vector failure handling

I am also asking other companies to work on some of the issues.

I wonder if your company can work on the following three problems:
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset
- Controller reset does not handle all cases
- orphaned requests

Please let me know if your company can work on these three issues.

Thanks

-Kwok


