[nvmewin] Samsung patch for Hot plug fixes

Judy Brock-SSI judy.brock at ssi.samsung.com
Tue Nov 4 17:02:03 PST 2014


Hi Alex,
Suman may have more to say about this but let me take a crack at answering your questions:

1.      In the NVMeInitCallback function, why don't we need to check SC and SCT anymore for the NVMeWaitOnLearnMapping case?
The read cmds we send during learning are not necessarily expected to succeed. In fact, we expect they may well fail: it's early in initialization, so there is a good likelihood that namespaces are not ready at that point. We don't care about getting NAMESPACE_NOT_READY status back. We just want to get a cmd out to each submission queue so we can note which logical processor is running when the completion for that command occurs.
Since each queue pair is tied to a particular MSI-X vector, this allows us to "learn" what the optimal logical processor-to-queue pair relationship should be. We adjust our internal tables accordingly so that IOs are launched and completed on the same logical processor, avoiding the overhead of processor context switching, etc.
That is the relationship we are trying to "learn" about. Again, we don't care whether the read cmds complete with success or error status; we send them only as a mechanism to discover which logical processor is associated with the completion queue paired with a given submission queue.
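
To make the learning idea concrete, here is a minimal sketch in plain C. It is not the driver's code: core_table_t, learn_on_completion() and MAX_CORES are hypothetical names invented for illustration, and the current logical processor is passed in as a parameter rather than queried from the OS. The point it shows is that the completion status is deliberately ignored and only the (queue, processor) pairing is recorded.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CORES 8u                     /* hypothetical limit for this sketch */
#define NVME_SC_NAMESPACE_NOT_READY 0x82u /* generic status code in the NVMe spec */

/* Hypothetical table: which queue pair each logical processor should use. */
typedef struct {
    uint16_t queue_for_core[MAX_CORES];
    uint32_t learned_count;
} core_table_t;

/*
 * Called when a learning read completes on queue `queue_id`.
 * `current_core` is the logical processor running the completion.
 * `status_code` is accepted but intentionally unused: a failed read
 * teaches us the mapping just as well as a successful one.
 */
static void learn_on_completion(core_table_t *table, uint16_t queue_id,
                                uint32_t current_core, uint8_t status_code)
{
    (void)status_code;
    assert(table != 0 && current_core < MAX_CORES);
    table->queue_for_core[current_core] = queue_id;
    table->learned_count++;
}
```

In this sketch a completion carrying NVME_SC_NAMESPACE_NOT_READY updates the table exactly like a successful one, which is the behavior described above.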
2.  The resolution of the surprise removal timer is set to 1 second. Why 1 second rather than some other value?
Suman can speak to the decision to use this exact number but I believe one of the reasons is to ensure that when a device is surprise removed, it disappears from Device Manager right away.
Thanks,
Judy

From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, November 04, 2014 12:41 PM
To: suman.p at samsung.com; nvmewin at lists.openfabrics.org; Judy Brock-SSI
Subject: RE: Samsung patch for Hot plug fixes

Hi Suman,
I have a couple of questions for you:

1.      In the NVMeInitCallback function, why don't we need to check SC and SCT anymore for the NVMeWaitOnLearnMapping case?

2.      The resolution of the surprise removal timer is set to 1 second. Why 1 second rather than some other value?
Thank you!
Alex

From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Tuesday, November 04, 2014 2:50 AM
To: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>; Alex Chang; judy.brock at ssi.samsung.com<mailto:judy.brock at ssi.samsung.com>
Subject: Samsung patch for Hot plug fixes


Hi Everyone,

We have a patch for the Hot plug fixes.

Please find attached the source code. The password is samsung123



Please find the change description below -



1) Surprise removal while IOs are in progress.

To reproduce this scenario -
Connect the disk and run IOmeter on the disk volume. While IOs are in progress, surprise remove the device. The user expects the device to be removed from Device Manager immediately and IOmeter to increment its error count field. This does not happen because the OFA driver does not handle this scenario.

Resolution -
a. Added a new function IsDeviceRemoved(). This is a recursive function. It compares the current value of the Version register with the old value and, in case of a mismatch, completes the outstanding commands with SRB_STATUS_ERROR. (nvmeStd.c/IsDeviceRemoved)
b. Start the timer for IsDeviceRemoved() when NextDriverState is set to StartComplete. (nvmeStat.c/NVMeRunning)
c. Stop the timer for IsDeviceRemoved() in case of ScsiStopAdapter. (nvmeStat.c/NVMeAdapterControl)
d. Restart the timer for IsDeviceRemoved() in case of ScsiRestartAdapter. (nvmeStat.c/NVMeAdapterControl)
e. Stop the timer for IsDeviceRemoved() in case of SRB_FUNCTION_SHUTDOWN. (nvmeStd.c/NVMeBuildIo)
f. If the DeviceRemovedDuringIO flag is set to TRUE, complete the SRBs with SRB_STATUS_ERROR for the IOs. This case handles IOs received after the device has been surprise removed. (nvmeStd.c/NVMeBuildIo)
g. Modified the prototype of the NVMeDetectPendingCmds function. When the device is surprise removed while IOs are pending, the outstanding IOs have to be completed with SRB_STATUS_ERROR. (nvmeIo.c/NVMeDetectPendingCmds)
h. Call the NVMeDetectPendingCmds function with SRB_STATUS_BUS_RESET. (nvmeInit.c/NVMeNormalShutdown, nvmePwrMgmt.c/NVMeAdapterControlPowerDown, nvmeStd.c/RecoveryDpcRoutine)
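
The detection scheme in steps a through f can be sketched roughly as follows. This is a standalone illustration, not the patch itself: adapter_t, check_device_removed() and accept_io() are hypothetical names, the Version register is modeled as a plain pointer, and completing outstanding commands is reduced to moving a counter. It assumes only what the description states: a periodic check compares the Version register against a saved value, a mismatch fails the in-flight IOs, and later IOs are rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified adapter state for this sketch. */
typedef struct {
    volatile uint32_t *version_reg;  /* mapped Version register */
    uint32_t saved_version;          /* value captured at adapter discovery */
    bool     removed_during_io;      /* models DeviceRemovedDuringIO */
    uint32_t outstanding;            /* commands currently in flight */
    uint32_t completed_with_error;   /* commands failed back to the caller */
} adapter_t;

/*
 * Periodic timer callback. Returns true if the timer should be re-armed
 * (device still present), false once removal has been detected.
 */
static bool check_device_removed(adapter_t *a)
{
    assert(a != 0 && a->version_reg != 0);
    if (*a->version_reg == a->saved_version)
        return true;                       /* still there: check again later */

    a->removed_during_io = true;
    a->completed_with_error += a->outstanding;  /* fail everything in flight */
    a->outstanding = 0;
    return false;
}

/* I/O entry point: returns false if the request must be failed immediately. */
static bool accept_io(adapter_t *a)
{
    assert(a != 0);
    if (a->removed_during_io)
        return false;                      /* device is gone: report an error */
    a->outstanding++;
    return true;
}
```

A removed PCIe device typically reads back as all 1's, so the saved Version value no longer matches and the next timer tick takes the removal path.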



2) Memory leak issues.

To reproduce this scenario -
a. Memory leak observed during hot removal in Resource monitor->Non-paged pool. (On Server2012R2 -> Task Manager -> Performance -> Non-paged pool)
b. Memory leak observed during disable/enable the NVMe controller in device manager.

Resolution -
To fix the memory leak, in NVMeBuildIo() -> SRB_FUNCTION_PNP, when PnPAction is StorRemoveDevice (disable controller) or StorSurpriseRemoval (hot remove device), NVMeAdapterControlPowerDown() is invoked to stop the adapter and then NVMeFreeBuffers() is invoked to free the memory. Since ShutdownInProgress is set in NVMeAdapterControlPowerDown(), nothing is done afterwards in NVMeAdapterControl() - ScsiStopAdapter -> NVMeAdapterControlPowerDown().



3) Surprise Removal during Disk Initialization

To reproduce this scenario -
Hot insert the device and hot remove it immediately. At this point, our driver might be executing the initialization state machine in NVMePassiveInitialize. The device will not be immediately removed from Device Manager. The while loop stays active until passiveTimeout expires, and then the system hits a BSOD.

Resolution -
a. Read the Version register. This value is compared against the value read from the Version register after a surprise removal. (nvmeStd.c/NVMeFindAdapter)
b. Read the Version register and compare it with the old value (i.e. the value read in NVMeFindAdapter). A mismatch between these values means surprise removal. (nvmeStd.c/NVMePassiveInitialize)
c. Set the NextDriverState to NVMeStartComplete and DeviceRemovedDuringIO to TRUE and return TRUE from NVMePassiveInitialize.
d. Driver may get commands in NVMeBuildIo, where driver returns SRB_STATUS_ERROR when DeviceRemovedDuringIO is set to TRUE.
e. Then NVMeAdapterControl() - ScsiStopAdapter is executed.



4) Delay in removing the device from device manager after hot removal of device.

When the device is hot removed, NVMeAdapterControlPowerDown() -> NVMeResetAdapter() -> NVMeWaitForCtrlRDY() is invoked, which sets the EN bit to 0 and waits for the RDY bit to become 0. Since the device is physically removed, the memory mapped registers will become all 1's and the RDY bit will never become 0. Hence the while loop in NVMeWaitForCtrlRDY() stays active for some time even after device removal, and the device is not removed from Device Manager immediately.

Resolution -
Check the value of CSTS. If it is 0xFFFFFFFF, then the device has been surprise removed, so return TRUE. (nvmeStd.c/NVMeWaitForCtrlRDY)
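
A rough sketch of such a wait loop is shown here. It is an assumption-laden illustration rather than the actual NVMeWaitForCtrlRDY(): wait_for_rdy(), wait_result_t, read_fake_csts() and the poll-count timeout are all hypothetical, and the register read is injected through a function pointer so the logic can run outside a driver. Where the patch description returns TRUE on removal, this sketch returns a distinct result only to make the three outcomes visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CSTS_RDY      0x1u         /* RDY is bit 0 of CSTS */
#define REG_ALL_ONES  0xFFFFFFFFu  /* what a removed device reads back as */

typedef enum { WAIT_OK, WAIT_REMOVED, WAIT_TIMEOUT } wait_result_t;

/* Hypothetical register accessor so the loop can be exercised in isolation. */
typedef uint32_t (*read_csts_fn)(void *ctx);

/* Stand-in for a memory mapped register read: returns the value at ctx. */
static uint32_t read_fake_csts(void *ctx)
{
    return *(uint32_t *)ctx;
}

/* Poll CSTS until RDY matches want_ready, the device vanishes, or we give up. */
static wait_result_t wait_for_rdy(read_csts_fn read_csts, void *ctx,
                                  bool want_ready, uint32_t max_polls)
{
    assert(read_csts != 0);
    for (uint32_t i = 0; i < max_polls; i++) {
        uint32_t csts = read_csts(ctx);

        if (csts == REG_ALL_ONES)
            return WAIT_REMOVED;   /* device is gone: stop waiting at once */
        if (((csts & CSTS_RDY) != 0) == want_ready)
            return WAIT_OK;
        /* a real driver would stall for a short interval here */
    }
    return WAIT_TIMEOUT;
}
```

Without the all 1's check, a removed device would look like RDY stuck at 1 and the loop would run until the timeout, which is the delay described above.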



5) Avoid redundant call of NVMeResetAdapter()
a. File/Function: nvmeInit.c/NVMeEnableAdapter - Removed the NVMeResetAdapter() function call from NVMeEnableAdapter() as it is redundant. NVMeResetAdapter() is invoked in RecoveryDpcRoutine() and then invoked again in NVMeEnableAdapter().
b. In the NVMeInitialize() function, the EN and RDY bits are set to 0 before NVMeEnableAdapter() is invoked, but NVMeResetAdapter() then does the same thing again.



6) When testing hot insertion with different devices, we observed that some devices returned NAMESPACE_NOT_READY for IO commands during learning cores and disk initialization (report luns, inquiry, etc). To address this issue and support these devices in the driver, we have made the following changes.
a. During learning cores, the driver sends read commands on all the queues to get the core to MSI-X mapping. When the read commands complete, NVMeInitCallback() treats learning cores as not completed if the SC and SCT values are not 0. This check is not required, as the driver wants only the core to MSI-X mapping. Since this is not a fatal error, we can skip reading the SC and SCT values, as this will impact the performance. (nvmeInit.c/NVMeInitCallback)
b. Following the above, when the initialization state machine is complete and the kernel starts sending SCSI commands for disk initialization, and the device returns NAMESPACE_NOT_READY, this has to be translated to the corresponding SCSI sense data so that the commands will be retried after some time. (nvmeSnti.c/genericCommandStatusTable[])
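
As an illustration of step b, a status translation table might look like the sketch below. It is not the contents of genericCommandStatusTable[]: status_map_t, generic_status_map, translate_generic_status() and fill_fixed_sense() are hypothetical names, and the chosen mapping (CHECK CONDITION, sense key NOT READY, ASC 0x04 "logical unit not ready", ASCQ 0x00) is an assumption based on common NVMe to SCSI translation practice, not something confirmed by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NVME_SC_SUCCESS              0x00u
#define NVME_SC_NAMESPACE_NOT_READY  0x82u  /* generic command status */

#define SCSI_STATUS_GOOD             0x00u
#define SCSI_STATUS_CHECK_CONDITION  0x02u
#define SCSI_SENSE_KEY_NOT_READY     0x02u
#define SCSI_ASC_LUN_NOT_READY       0x04u

/* Hypothetical table entry: one NVMe status code and its SCSI equivalent. */
typedef struct {
    uint8_t nvme_sc, scsi_status, sense_key, asc, ascq;
} status_map_t;

static const status_map_t generic_status_map[] = {
    { NVME_SC_SUCCESS, SCSI_STATUS_GOOD, 0x00, 0x00, 0x00 },
    /* Assumed mapping: NOT READY sense tells the initiator to retry later. */
    { NVME_SC_NAMESPACE_NOT_READY, SCSI_STATUS_CHECK_CONDITION,
      SCSI_SENSE_KEY_NOT_READY, SCSI_ASC_LUN_NOT_READY, 0x00 },
};

/* Look up an NVMe generic status code; returns NULL if it is not in the table. */
static const status_map_t *translate_generic_status(uint8_t nvme_sc)
{
    size_t n = sizeof generic_status_map / sizeof generic_status_map[0];
    for (size_t i = 0; i < n; i++) {
        if (generic_status_map[i].nvme_sc == nvme_sc)
            return &generic_status_map[i];
    }
    return NULL;
}

/* Fill an 18-byte fixed-format SCSI sense buffer from a table entry. */
static void fill_fixed_sense(const status_map_t *m, uint8_t *sense, size_t len)
{
    assert(m != NULL && sense != NULL && len >= 18);
    memset(sense, 0, len);
    sense[0]  = 0x70;          /* current error, fixed format */
    sense[2]  = m->sense_key;
    sense[7]  = 10;            /* additional sense length: bytes 8..17 */
    sense[12] = m->asc;
    sense[13] = m->ascq;
}
```

With a mapping of this kind, a NAMESPACE_NOT_READY completion surfaces to the SCSI layer as a retryable "not ready" condition instead of a hard failure.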


Tested the following.

- WHCK on Win7 and 2012R2
- Install/Uninstall, Enable/Disable, FS Format
- Hibernation/Resume, Sleep/Resume
- IOmeter
- Hot removal while IOmeter is running.
- Hot removal immediately after hot insertion.
- Continuous hot insert and remove operations.
- Check for device removal after the following sequence - Hot insert, system hibernation, hot remove, system resume.
- Check for device presence after the following sequence - System hibernation, hot insert, system resume.
- Memory leaks during hot plug operations and disable/enable.



Thanks,
Suman


