[nvmewin] Patch with changes for disk Read only support

Thomas Freeman thomas.freeman at hgst.com
Thu Apr 21 09:18:08 PDT 2016


Hi Suman,
Thank you for the quick response. I agree with your comments.
Thank you,
Tom Freeman
Software Engineer, Device Manager and Driver Development
Western Digital Corporation
e.  Thomas.freeman at hgst.com
o.  +1-507-322-2311


From: SUMAN PRAKASH B [mailto:suman.p at samsung.com]
Sent: Thursday, April 21, 2016 9:55 AM
To: Thomas Freeman <thomas.freeman at hgst.com>; nvmewin at lists.openfabrics.org
Cc: anshul at samsung.com; prakash.v at samsung.com; MANOJ THAPLIYAL <m.thapliyal at samsung.com>
Subject: Re: RE: [nvmewin] Patch with changes for disk Read only support


Hi Tom,



Thanks for the comments. Please find my replies below:



1. For mode sense with 0x3f, you set the WP in the response header. Shouldn't WP also be set if any of those pages (0x8, 0xa, 0x1a, 0x1c) are individually requested?

[Suman] Below are our observations:

a. During driver install, device enable from Device Manager, or hot insert, mode pages 0x8 and 0x3f are requested.

b. During online/offline of the disk, only 0x3f is requested.

So, as per our understanding, not all mode pages are requested during disk initialization, but the driver receives a request for mode page 0x3f every time. Also, for detection during run time, when the driver returns SCSI_SENSE_DATA_PROTECT in the sense data, the driver consistently receives a request for mode page 0x3f. So we feel that setting the WP bit for mode page 0x3f will suffice.



2. snti.c:8242

[Suman] Agreed. We will use pLunExt->IsNamespaceReadOnly = TRUE;



3. Along with the new member, IsNamespaceReadOnly, the nvme_lun_extension also has ReadOnly. It seems like the setting of WP should take into account the value of both members.

[Suman] The OFA driver supports only the first LBA range type for a namespace, though the spec supports 64 LBA range types per namespace. This has to be corrected first.

Also, we have to decide whether the disk should be exposed as Read Only if any of the LBA range types is read only, or only if LBA 0 is read only.

I feel this should be handled as a separate patch, since it involves too many changes.



4. If the NVMe command Get Log Page fails, (SCT != Generic_command_status || SC != Successful completion), the buffer pSrbExt->pDatBuffer is not freed. This corresponds to the allocation at snti.c:6530.

[Suman] Agreed.



5. snti.c:6539: I think the following can be eliminated. The same copy occurs during SntiTranslateModeSenseResponse - snti.c:8272.

[Suman] Agreed.
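
For item 4, the cleanup could look roughly like the sketch below. It is portable C rather than driver code: `struct srb_ext`, `release_log_buffer` and `complete_get_log_page` are hypothetical stand-ins, and `malloc`/`free` replace whatever allocator the driver actually uses. The only point it illustrates is releasing the Get Log Page buffer on the failure path as well as the success path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the SRB extension; not the driver's real layout. */
struct srb_ext {
    void *data_buffer;   /* buffer allocated for the Get Log Page data */
};

#define SCT_GENERIC  0x0u  /* NVMe Status Code Type: Generic Command Status */
#define SC_SUCCESS   0x0u  /* NVMe Status Code: Successful Completion */

/* Release the log page buffer and clear the pointer so it cannot be reused. */
static void release_log_buffer(struct srb_ext *ext)
{
    assert(ext != NULL);
    free(ext->data_buffer);
    ext->data_buffer = NULL;
}

/*
 * Completion handler sketch: returns true when the log page was usable.
 * The buffer is freed whether or not the command succeeded.
 */
static bool complete_get_log_page(struct srb_ext *ext, uint8_t sct, uint8_t sc)
{
    bool ok = (sct == SCT_GENERIC && sc == SC_SUCCESS);

    if (ok) {
        /* ... translate the SMART data into the mode sense response ... */
    }

    release_log_buffer(ext);
    return ok;
}
```

Routing both paths through a single release helper keeps the failure case from being forgotten when the success path changes later.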



Please let us know your opinion.



Regards,

Suman



------- Original Message -------

Sender : Thomas Freeman<thomas.freeman at hgst.com>

Date : Apr 20, 2016 21:59 (GMT+05:30)

Title : RE: [nvmewin] Patch with changes for disk Read only support


Hi Suman,
After reviewing the code, I have a few questions/comments:


1. For mode sense with 0x3f, you set the WP in the response header. Shouldn't WP also be set if any of those pages (0x8, 0xa, 0x1a, 0x1c) are individually requested?


2. snti.c:8242
Lun = pLunExt->namespaceId - 1;
pDevExt->pLunExtensionTable[Lun]->IsNamespaceReadOnly = TRUE;

These 2 lines can be replaced with
pLunExt->IsNamespaceReadOnly = TRUE;

Also, the original code is not a reliable way to determine the LUN id.
Here is an example where this doesn't work. The device has attached namespaces 1, 3 & 4 and existing namespaces 1, 2, 3 & 4. LUNs 0-3 will correspond to namespaces 1, 3, 4, 2. For namespace 3, the calculation NSID-1=lun will incorrectly give you a LUN id of 2.

3. Along with the new member, IsNamespaceReadOnly, the nvme_lun_extension also has ReadOnly. It seems like the setting of WP should take into account the value of both members.

4. If the NVMe command Get Log Page fails, (SCT != Generic_command_status || SC != Successful completion), the buffer pSrbExt->pDatBuffer is not freed. This corresponds to the allocation at snti.c:6530.

5. snti.c:6539: I think the following can be eliminated. The same copy occurs during SntiTranslateModeSenseResponse - snti.c:8272.
if (GET_DATA_BUFFER(pSrb) != NULL) {
    StorPortCopyMemory((PVOID)GET_DATA_BUFFER(pSrb),
        (PVOID)(pSrbExt->modeSenseBuf), GET_DATA_LENGTH(pSrb));
}
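
To make the LUN example in item 2 concrete, here is a small standalone sketch. `struct lun_ext`, `lun_for_nsid` and `example_table` are hypothetical, simplified stand-ins for the driver's LUN extension table; they only model the namespace ID of each LUN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified LUN extension: only the namespace ID matters here. */
struct lun_ext {
    uint32_t namespace_id;
};

/* Find the LUN index that actually holds the given NSID; -1 if not found. */
static int lun_for_nsid(const struct lun_ext *table, size_t count, uint32_t nsid)
{
    assert(table != NULL);
    for (size_t i = 0; i < count; i++) {
        if (table[i].namespace_id == nsid) {
            return (int)i;
        }
    }
    return -1;
}

/* LUNs 0-3 map to namespaces 1, 3, 4, 2, as in the example from item 2. */
static const struct lun_ext example_table[] = { {1}, {3}, {4}, {2} };
```

With this table, `lun_for_nsid(example_table, 4, 3)` finds namespace 3 at LUN 1, whereas indexing with NSID-1 (that is, 2) lands on the entry for namespace 4. Using the `pLunExt` pointer that is already in hand avoids the lookup altogether.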
Let me know if you have questions,
Tom Freeman
Software Engineer, Device Manager and Driver Development
Western Digital Corporation
e.  Thomas.freeman at hgst.com
o.  +1-507-322-2311


From: nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of SUMAN PRAKASH B
Sent: Tuesday, April 19, 2016 8:45 AM
To: nvmewin at lists.openfabrics.org
Cc: sukka.kim at samsung.com; prakash.v at samsung.com; anshul at samsung.com; MANOJ THAPLIYAL <m.thapliyal at samsung.com>; tru.nguyen at ssi.samsung.com
Subject: [nvmewin] Patch with changes for disk Read only support


Hi all,

This patch includes changes for supporting NVMe Disk read only mode.

I have written a detailed overview of the changes in the attached doc file (the contents are also copied below), and the attached zip file contains the source code.

Password is samsungnvme

Please let me know if you have any questions.

Thanks,
Suman



******************



NVMe Disk End of Life support:

Whenever an NVMe disk exhausts its P/E cycles, the disk becomes Read only (reaches End Of Life). In this case, the user should still be able to read the data from the disk for backup or migration purposes. To achieve this, the driver should inform the kernel that the disk has become Read only. If the driver does not inform the kernel, the disk will be unusable from Windows.

The device has to be detected as Read only in the following 2 scenarios:

a. Detection during device hot plug

When a Read only device is hot inserted, the kernel should be able to enumerate the device as Read only and alert the user accordingly. When the SSD is hot inserted, as part of the disk initialization process, the kernel issues a SCSI mode sense command with page code 'Return all pages' (0x3f). The mode sense response has a mode parameter header, which has a WP bit in the 'Device Specific Parameter' field that indicates whether the device is write protected for some reason. We can make use of this field to report to the kernel that the device has become Read only.

When the miniport driver receives this request, the NVM Express command Get Log Page is built with log identifier 'SMART / Health Information' (0x2) and sent to the device. The SMART data has a 'Critical Warning' field in which the 'MediaInReadOnlyMode' bit is set whenever the media becomes Read only. So if the device returns SMART data with this bit set, the miniport driver sets the WP bit of the Device Specific Parameter in the mode parameter header and completes the command.
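
As a rough illustration of the Get Log Page request described above, the sketch below fills in a simplified submission queue entry. `struct nvme_sqe` and `build_smart_get_log_page` are hypothetical names, not the driver's own types. The data pointer (PRP) fields are left out, and the global NSID is an assumption; the actual patch may address a specific namespace instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NVME_ADMIN_GET_LOG_PAGE  0x02u        /* admin command opcode */
#define NVME_LOG_SMART_HEALTH    0x02u        /* log identifier: SMART / Health */
#define NVME_SMART_LOG_BYTES     512u         /* size of the SMART / Health log */
#define NVME_NSID_GLOBAL         0xFFFFFFFFu  /* controller-wide log (assumption) */

/* Simplified 64-byte submission queue entry viewed as 16 dwords. */
struct nvme_sqe {
    uint32_t dw[16];
};

/* Fill in opcode, NSID and CDW10 for a SMART / Health Get Log Page. */
static void build_smart_get_log_page(struct nvme_sqe *cmd)
{
    assert(cmd != NULL);

    /* NUMD is a 0's based count of dwords to transfer. */
    uint32_t numd = NVME_SMART_LOG_BYTES / 4u - 1u;

    memset(cmd, 0, sizeof(*cmd));
    cmd->dw[0]  = NVME_ADMIN_GET_LOG_PAGE;               /* CDW0 bits 7:0: opcode */
    cmd->dw[1]  = NVME_NSID_GLOBAL;                      /* NSID */
    cmd->dw[10] = NVME_LOG_SMART_HEALTH | (numd << 16);  /* LID 7:0, NUMD from bit 16 */
}
```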

When the WP bit is set in the mode parameter header, the kernel will understand that the device is write protected and hence the kernel will not send any more write requests.

b. Detection during run time

When the device is in use, its write endurance is exhausted, and it becomes Read only, the kernel has to immediately report to the user that the device has become write protected. To achieve this, whenever the device receives an NVMe Write request after it has become Read only, the device sets SCT to Command Specific Status and SC to 'Attempted Write to Read Only Range' in response to the write command. For this status, the driver returns the following sense data for the corresponding SCSI write command.

Sense data – SCSI_SENSE_DATA_PROTECT, ASC – SCSI_ADSENSE_WRITE_PROTECT and ASCQ – SCSI_ADSENSE_NO_SENSE.

With this sense data, the kernel will understand that the device is in a write protected state, and it will send the Mode sense command with mode page 'Return all pages' to the device. Again, using the NVM Express Get Log Page (SMART) command, the miniport driver will set the mode sense 'Device Specific Parameter' accordingly.
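
A minimal sketch of the status-to-sense mapping described in this section is shown below. `map_write_protect_sense` is a hypothetical helper, not the driver's SntiMapCompletionStatus(). The numeric values are the standard ones for the DATA PROTECT sense key (0x07) and the WRITE PROTECTED additional sense code (0x27), written into an 18-byte fixed-format sense buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NVME_SCT_COMMAND_SPECIFIC  0x1u   /* Status Code Type: Command Specific */
#define NVME_SC_WRITE_TO_RO_RANGE  0x82u  /* Attempted Write to Read Only Range */
#define SENSE_KEY_DATA_PROTECT     0x07u  /* SCSI_SENSE_DATA_PROTECT */
#define ASC_WRITE_PROTECT          0x27u  /* SCSI_ADSENSE_WRITE_PROTECT */
#define ASCQ_NONE                  0x00u  /* no additional qualifier */
#define FIXED_SENSE_LEN            18u    /* fixed-format sense data length */

/*
 * Fill fixed-format sense data for a write rejected because the media is
 * read only. Returns false (and leaves the buffer alone) for other statuses.
 */
static bool map_write_protect_sense(uint8_t sct, uint8_t sc, uint8_t *sense)
{
    assert(sense != NULL);
    if (sct != NVME_SCT_COMMAND_SPECIFIC || sc != NVME_SC_WRITE_TO_RO_RANGE) {
        return false;
    }

    memset(sense, 0, FIXED_SENSE_LEN);
    sense[0]  = 0x70;                    /* current error, fixed format */
    sense[2]  = SENSE_KEY_DATA_PROTECT;  /* sense key */
    sense[7]  = FIXED_SENSE_LEN - 8u;    /* additional sense length */
    sense[12] = ASC_WRITE_PROTECT;       /* ASC */
    sense[13] = ASCQ_NONE;               /* ASCQ */
    return true;
}
```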



Code changes:

1. In SntiReturnAllModePages(), build a Get Log Page command for SMART/Health Information and send it to the device.

2. In SntiTranslateModeSenseResponse(), for mode page MODE_SENSE_RETURN_ALL, set the Write Protect bit in the device specific parameter of the mode header based on the 'media in read only mode' bit (bit 03) of the Critical Warning field returned in the SMART/Health log page.

3. The check for volatile write cache is moved from SntiReturnAllModePages() to SntiTranslateModeSenseResponse(), so that it runs after successful completion of the Get Log Page command.



We have tested the following:

a. On a Read only NVMe SSD, install the OFA driver with these changes. In the disk management tool, the status of the disk is shown as Read Only. Please find attached “DiskMgmt.jpg” (this sometimes requires a system restart after driver installation).

b. Hot insert a RO NVMe SSD and observe the status as Read Only in the disk management tool.

c. On an NVMe SSD with a low percentage of available spare (for example 10%), run the Iometer tool with write commands. When available spare reaches 0%, the error count in Iometer starts increasing (i.e. write commands fail with the sense data explained in the sections above), and the status becomes Read Only in the disk management tool.

d. After the disk becomes RO, when we try to copy files to the RO drive, Windows shows the message "The disk is write protected". Please find attached “FileCopy.jpg”.

Note:

a. As per NVMe spec 1.2, section 5.10.1.2, "There is not namespace specific information defined in the SMART / Health log page in this revision, thus the global log page and namespaces specific log page contain identical information". So when testing with multiple namespaces, when 1 namespace becomes RO, all the namespaces will become RO. The spec would have to define separate SMART / Health data per namespace to change this.

b. For testing, if there is no NVMe SSD which is in RO state, the following changes can be made in the driver to test this feature:

    1. In SntiTranslateModeSenseResponse(), hardcode pNvmeLogPage->CriticalWarning.MediaInReadOnlyMode to 1, before checking for the value. This can also be done per namespace.

    2. In SntiMapCompletionStatus(), for the NVMe write command, hardcode statusCodeType to COMMAND_SPECIFIC_ERRORS and statusCode to 0x82. This can also be done per namespace.




Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:

This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.




