From judy.brock at ssi.samsung.com  Mon Jan  6 20:08:39 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Tue, 7 Jan 2014 04:08:39 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com>

All,

Many months ago, I initiated a thread (see attached) in which I argued that there were some holes in the current implementation of the Format NVM ioctl pass-through, and in which I advocated for, among other things, adding logic to make sure pseudo-SCSI bus re-enumeration had fully taken place in the driver (such that Storport was notified that no "luns" were present) before the actual Format NVM op was launched.

I intuitively understand - and up to this point have unquestioningly agreed with - the basic assumption that the reason the namespaces must be removed / "luns" made to disappear prior to formatting is that "we cannot format a namespace while the OS is aware of its presence and could be potentially sending I/O to a stale namespace config (i.e. changing LBA/sector size)" (excerpt from the attached thread).

The question has recently arisen in internal discussion, however, as to whether we really have to do this. It was pointed out that real SCSI devices are capable of receiving IO commands while a SCSI format command is in progress. They will return the following error:

SCSI status = CHECK CONDITION, sense key = NOT READY (0x2), additional sense code = LUN NOT READY (0x04), additional sense code qualifier = FORMAT IN PROGRESS (0x04)

Why then, instead of removing namespaces/luns, can our Windows driver not return the same error status a real SCSI drive would return in such a situation? One would assume that the upper layers of the storage stack have plenty of years of experience in knowing what to do when they see that error.

As a point of comparison, there is no standard I am aware of which specifies that Storport miniports which support real SCSI devices, if they happen to provide a proprietary pass-through that allows a SCSI format command to go through to a device, must cause all LUNs to appear offline prior to formatting.

One could even argue (and they have!) that these IO commands could even be allowed to go through to the NVMe device itself (as in the real SCSI case): NVMe 1.1 Technical Proposal 005 has defined a new format-in-progress status code that NVMe firmware will be able to return at some point in the future; current firmware could easily return NAMESPACE_NOT_READY and the driver could translate it to the above SCSI sense data, etc.

So ... here I stand, devil's advocate hat in hand, hoping to find out:

a) what the "back story" is on how this decision was ultimately made (the attached thread said a lot of discussion took place on the subject)
b) whether or not the diametrically-opposed alternative I am discussing above was thoroughly considered and, if so, why it was rejected
c) whether the topic bears reconsidering at this point.

Thanks in advance for your collective consideration,

Judy
-------------- next part --------------
An embedded message was scrubbed...
From: Judy Brock-SSI
Subject: FW: [nvmewin] Handling pending commands when processing Format
Date: Tue, 7 Jan 2014 03:04:30 +0000
Size: 176130
URL:

From barrett.n.mayes at intel.com  Mon Jan  6 21:15:24 2014
From: barrett.n.mayes at intel.com (Mayes, Barrett N)
Date: Tue, 7 Jan 2014 05:15:24 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
In-Reply-To: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com>
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com>
Message-ID:

What problem do you want to solve by keeping the block device around during a format and allowing IO through so it can be failed with a check condition?

Namespace not ready can't generically be translated to SCSI check condition / not ready / Format In Progress. The driver would need to know that a format command is outstanding so it could translate that correctly (for 1.0-based device support), since the namespace could be not ready for reasons other than a format in progress. If the driver already has to know a format is in progress, it could just fail commands without sending them to the device (so no need for the new failure code). But if that's the case, why _not_ hide the LUN until the format is complete? By hiding the LUN and bringing it back when the format is complete, you don't have to worry about handling IO, and you also take care of the re-enumeration that has to happen when the format is complete (in the case of changing the LBA Format).

-Barrett
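For concreteness, the alternative being weighed here (the driver tracks that a Format NVM is outstanding and fails incoming I/O itself) could look roughly like the sketch below in a Storport miniport. The routine name and the FormatInProgress flag, and how the driver tracks it per namespace, are hypothetical placeholders rather than actual OFA nvmewin code; the sense bytes are the NOT READY / LUN NOT READY / FORMAT IN PROGRESS combination Judy quotes.

/* Hypothetical sketch: short-circuit I/O for a namespace whose Format NVM is
 * still outstanding, completing the SRB with CHECK CONDITION and
 * NOT READY / LUN NOT READY / FORMAT IN PROGRESS sense data.               */
#include <storport.h>

BOOLEAN NVMeFailIoIfFormatInProgress(PVOID HwDeviceExtension,
                                     PSCSI_REQUEST_BLOCK Srb,
                                     BOOLEAN FormatInProgress)  /* tracked per namespace by the driver */
{
    PUCHAR sense = (PUCHAR)Srb->SenseInfoBuffer;
    UCHAR  i;

    if (!FormatInProgress) {
        return FALSE;                       /* nothing outstanding; send the command on as usual */
    }

    if (sense != NULL && Srb->SenseInfoBufferLength >= 18) {
        for (i = 0; i < Srb->SenseInfoBufferLength; i++) {
            sense[i] = 0;
        }
        sense[0]  = 0x70;                   /* fixed format, current error     */
        sense[2]  = 0x02;                   /* sense key: NOT READY            */
        sense[7]  = 0x0A;                   /* additional sense length         */
        sense[12] = 0x04;                   /* ASC:  LUN NOT READY             */
        sense[13] = 0x04;                   /* ASCQ: FORMAT IN PROGRESS        */
        Srb->SrbStatus = SRB_STATUS_ERROR | SRB_STATUS_AUTOSENSE_VALID;
    } else {
        Srb->SrbStatus = SRB_STATUS_ERROR;
    }

    Srb->ScsiStatus = 0x02;                 /* CHECK CONDITION                 */
    StorPortNotification(RequestComplete, HwDeviceExtension, Srb);
    return TRUE;                            /* completed locally; never reached the device */
}

Either way, as noted above, the driver itself has to know the format is in progress; a check like this does not depend on the device being able to return the new TP005 format-in-progress status code.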
From ngalbo at micron.com  Tue Jan  7 06:26:56 2014
From: ngalbo at micron.com (Neal Galbo (ngalbo))
Date: Tue, 7 Jan 2014 14:26:56 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
In-Reply-To:
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com>
Message-ID: <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com>

The conditions you describe would never happen with a SCSI device. In general, LUs and LUNs never disappear in SCSI; they are not transient. They are not dynamic. They are static. They either exist or they don't. They don't hide once enumerated/attached/located - unlike namespaces.

The media, backing storage, or provisioning can change, but the LU would always be available for communication (commands). Other LUs in the same device would not be affected either - they are independent entities relative to each other.

-Neal
From paul.e.luse at intel.com  Tue Jan  7 08:44:28 2014
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Tue, 7 Jan 2014 16:44:28 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
In-Reply-To: <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com>
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com> <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com>
Message-ID: <82C9F782B054C94B9FC04A331649C77A44748DB4@FMSMSX112.amr.corp.intel.com>

With respect to Judy's question (a): I believe the original concern that drove us to the implementation was that with NVMe the block size can be changed with a format, whereas that can't happen with a SCSI format. I could be mistaken, but on a quick scan of the email threads I didn't see that point mentioned. We felt the only way to get the upper layers to discover the potentially changed block size was to tear it down / bring it back up.

Thx
Paul
From Kwok.Kong at pmcs.com  Tue Jan  7 09:57:11 2014
From: Kwok.Kong at pmcs.com (Kwok Kong)
Date: Tue, 7 Jan 2014 17:57:11 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
Message-ID: <03D88B383FA04244AA514AA931F7B1290D26B138@BBYEXM01.pmc-sierra.internal>

I would agree with Barrett. What problem do you want to solve here? Why do you want to send IO to a device while you are doing an NVMe format?
Since an NVMe format may change the sector size and all data are gone, an NVMe format should behave as if a drive had been removed and a new drive was added. I think the simplest procedure for an NVMe format is to:

- offline the LUN (or remove the LUN from the system)
- NVMe format
- online the LUN

Do you see any problem with this simple approach? Do you have a use case where this procedure does not work?

Thanks

-Kwok
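A rough miniport-level sketch of that offline / format / online sequence is below. The state structures, the helper routines (IssueFormatNvm, RefreshIdentifyNamespace), and the exact notification arguments are assumptions for illustration, not the OFA driver's actual implementation.

#include <storport.h>

#define MAX_NAMESPACES 16

/* Hypothetical per-namespace and per-controller state; the controller
 * structure stands in for the HwDeviceExtension registered with Storport. */
typedef struct _NS_STATE {
    BOOLEAN Visible;             /* reported in INQUIRY / REPORT LUNS         */
    BOOLEAN FormatInProgress;
} NS_STATE;

typedef struct _CTRL_STATE {
    NS_STATE Ns[MAX_NAMESPACES];
} CTRL_STATE, *PCTRL_STATE;

/* Assumed helpers: submit Format NVM on the admin queue, and re-issue
 * Identify Namespace so the cached LBA format / capacity is refreshed.      */
VOID IssueFormatNvm(PCTRL_STATE Ctrl, ULONG Nsid, ULONG LbaFormat);
VOID RefreshIdentifyNamespace(PCTRL_STATE Ctrl, ULONG Nsid);

VOID NvmeBeginFormat(PCTRL_STATE Ctrl, ULONG Nsid, ULONG LbaFormat)
{
    NS_STATE *ns = &Ctrl->Ns[Nsid - 1];

    /* 1. Offline the LUN: stop reporting it and fail any new I/O locally.  */
    ns->Visible = FALSE;
    ns->FormatInProgress = TRUE;
    StorPortNotification(BusChangeDetected, Ctrl, 0);  /* rescan no longer sees the LUN */

    /* 2. Format NVM (asynchronous; completion lands in NvmeFormatDone).    */
    IssueFormatNvm(Ctrl, Nsid, LbaFormat);
}

VOID NvmeFormatDone(PCTRL_STATE Ctrl, ULONG Nsid, USHORT NvmeStatus)
{
    NS_STATE *ns = &Ctrl->Ns[Nsid - 1];

    /* 3. Pick up the possibly changed LBA format and capacity.             */
    RefreshIdentifyNamespace(Ctrl, Nsid);

    /* 4. Online the LUN again and let Storport re-enumerate it.            */
    ns->FormatInProgress = FALSE;
    ns->Visible = (NvmeStatus == 0);                   /* keep hidden if the format failed */
    StorPortNotification(BusChangeDetected, Ctrl, 0);
}

Whether step 1 is genuinely required, or whether failing I/O with FORMAT IN PROGRESS sense data (as in the earlier sketch) would be enough, is exactly the question Judy raises.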
From judy.brock at ssi.samsung.com  Tue Jan  7 18:41:49 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Wed, 8 Jan 2014 02:41:49 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
In-Reply-To: <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com>
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com> <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com>
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A514E478A8C@SSIEXCH-MB3.ssi.samsung.com>

>> In general, LU's and LUN's never disappear in SCSI; they are not transient. They are not dynamic. They are static. They either exist or they don't. They don't hide, once enumerated/attached/located. Unlike namespaces.

Like SCSI LUNs, NVMe namespaces are also not transient, and I now submit (having reversed positions :)) that they should not hide either. A Format NVM command does not change the number of namespaces that existed before the operation. While namespaces can be formatted with a different LBA Format than previously, they are still static in terms of their existence/non-existence.

Additionally, the current code is not achieving what it intended to do, since it does not actually hide any LUNs before launching the Format NVM op. That is, it does not wait until the existing LUNs are "gone" (i.e. until the miniport fails to report them in a subsequent inquiry) before it starts the operation.

Thanks,
Judy
From judy.brock at ssi.samsung.com  Tue Jan  7 19:55:47 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Wed, 8 Jan 2014 03:55:47 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A44748DB4@FMSMSX112.amr.corp.intel.com>
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com> <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com> <82C9F782B054C94B9FC04A331649C77A44748DB4@FMSMSX112.amr.corp.intel.com>
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A514E478ABD@SSIEXCH-MB3.ssi.samsung.com>

>> I believe the original concern that drove us to the implementation was that w/NVMe the block size can be changed with a format whereas that can't happen with a SCSI format

The block size on a SCSI device can change with a SCSI MODE SELECT command via a mode parameter block descriptor. That would result in sense key = UNIT ATTENTION and additional sense code = MODE PARAMETERS CHANGED. We could return this combination at the end of a Format NVM op that did in fact change the block size attribute of one or more namespaces.

But once the Format NVM op finishes, wouldn't indicating that a bus change has occurred / re-enumeration is required more than cover the base of getting the upper layers to discover the potentially changed block size, since that notifies Storport that it needs to do discovery all over again? Just curious why this was thought to be insufficient.

Thanks,
Judy
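For reference, the combination described above would be sense key 06h (UNIT ATTENTION) with ASC/ASCQ 2Ah/01h (MODE PARAMETERS CHANGED). As fixed-format sense bytes it might be kept as a constant like the following (hypothetical name; layout per SPC fixed-format sense):

/* UNIT ATTENTION / MODE PARAMETERS CHANGED, fixed-format sense (18 bytes). */
static const unsigned char ModeParametersChangedSense[18] = {
    0x70,                    /* byte 0:  response code, current error, fixed format */
    0x00,
    0x06,                    /* byte 2:  sense key UNIT ATTENTION                    */
    0x00, 0x00, 0x00, 0x00,
    0x0A,                    /* byte 7:  additional sense length                     */
    0x00, 0x00, 0x00, 0x00,
    0x2A,                    /* byte 12: ASC  MODE PARAMETERS CHANGED                */
    0x01,                    /* byte 13: ASCQ                                        */
    0x00, 0x00, 0x00, 0x00
};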
From akyros000 at gmail.com  Tue Jan  7 20:12:32 2014
From: akyros000 at gmail.com (Jeff Glass)
Date: Tue, 07 Jan 2014 20:12:32 -0800
Subject: [nvmewin] Handling IO when Format NVM op is in progress
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A44748DB4@FMSMSX112.amr.corp.intel.com>
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com> <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com> <82C9F782B054C94B9FC04A331649C77A44748DB4@FMSMSX112.amr.corp.intel.com>
Message-ID: <52CCD030.8080804@gmail.com>

Unfortunately the Windows storage stack does not recognize the UNIT ATTENTION code (capacity data has changed) - at least it didn't in Server 2012 - that could be used to report a change in device capacity, so to get the system to rescan the device and determine that the capacity has changed, a BusChangeDetected notification needs to be reported. In my experience, SCSI RAID controller or HBA drivers don't do anything to get the O/S to recognize the change, which in turn requires the user to manually disable and re-enable the device to get Windows to recognize that the device's capacity has changed. The NVMe driver is in a position to provide a better experience that is consistent across hardware from all manufacturers by eliminating the need for manual user intervention.

Jeff
From judy.brock at ssi.samsung.com  Tue Jan  7 20:19:24 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Wed, 8 Jan 2014 04:19:24 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
In-Reply-To: <52CCD030.8080804@gmail.com>
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com> <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com> <82C9F782B054C94B9FC04A331649C77A44748DB4@FMSMSX112.amr.corp.intel.com> <52CCD030.8080804@gmail.com>
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A514E478AE9@SSIEXCH-MB3.ssi.samsung.com>

>> Unfortunately the Windows storage stack does not recognize the unit attention code (capacity data has changed) (at least it didn't in Server 2012) that could be used to report a change in device capacity

Ah. Given this fact, please ignore the first part of the last email I just sent out, which referred to this method as a possible mechanism to alert the storage stack to the potential change in device capacity.

>> so to get the system to rescan the device to determine the capacity has changed a BusChangeDetected needs to be reported.
>> The NVMe driver is in a position to provide a better experience that is consistent across hardware for all manufacturers by eliminating the need for manual user intervention.
So it would seem that reporting BusChangeDetected at the conclusion of a Format NVM op is a mechanism sufficient in and of itself to get the system to rescan and flag capacity changes, etc. - no additional need to "hide" LUNs. Is that incorrect?

Thanks,
Judy
From barrett.n.mayes at intel.com  Tue Jan  7 20:28:41 2014
From: barrett.n.mayes at intel.com (Mayes, Barrett N)
Date: Wed, 8 Jan 2014 04:28:41 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
In-Reply-To: <36E8D38D6B771A4BBDB1C0D800158A514E478A8C@SSIEXCH-MB3.ssi.samsung.com>
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com> <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com> <36E8D38D6B771A4BBDB1C0D800158A514E478A8C@SSIEXCH-MB3.ssi.samsung.com>
Message-ID:

There is no definition in the NVMe spec or the SCSI to NVMe translation reference doc that defines namespaces as either static or transient. I would argue they are transient because they can be created, destroyed, resized, and have their physical properties changed, but I think it is fair to say it is undefined. There is no requirement in the current specs that a Format NVM command not change the number of namespaces. Given that namespace management in 1.0 and 1.1 is vendor specific, it's conceivable a device might leverage secure erase to reset namespaces to a default/factory config. In NVMe, the adapter/controller is the static object, and management commands are directed to the admin queue that is associated with that controller.

If the current code isn't working as intended, it is a bug and should be fixed. If there is a compelling reason to change the intended behavior, let's discuss.
Other LU's in the same device would not be affected either - they are independent entities relative to each other. -Neal
From barrett.n.mayes at intel.com Tue Jan 7 20:41:45 2014
From: barrett.n.mayes at intel.com (Mayes, Barrett N)
Date: Wed, 8 Jan 2014 04:41:45 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
In-Reply-To: <52CCD030.8080804@gmail.com>
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com> <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com> <82C9F782B054C94B9FC04A331649C77A44748DB4@FMSMSX112.amr.corp.intel.com> <52CCD030.8080804@gmail.com>
Message-ID:

It is possible to get partmgr, disk and upper storage layers (e.g. VDS) to recognize capacity changes (and, I assume, physical geometry changes, though I have never actually tested that) to a LUN using IOCTL_DISK_UPDATE_PROPERTIES and IOCTL_DISK_UPDATE_DRIVE_SIZE. From a storage driver point of view, they need to be sent to the top of the driver stack, and getting the necessary PDOs in a Storport miniport requires some extra work. They can more easily be sent from a user-mode app, such as the one that might initiate the format in the first place. But I agree, it is desirable to provide a consistent experience across devices, and that would be more difficult if you had to coordinate with various 3rd-party tools.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Jeff Glass Sent: Tuesday, January 07, 2014 8:13 PM To: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Handling IO when Format NVM op is in progress

Unfortunately the Windows storage stack does not recognize the unit attention code (capacity data has changed) that could be used to report a change in device capacity (at least it didn't in Server 2012), so to get the system to rescan the device and discover that the capacity has changed, a BusChangeDetected needs to be reported. In my experience, SCSI RAID controller or HBA drivers don't do anything to get the O/S to recognize the change, which in turn requires the user to manually disable and re-enable the device to get Windows to recognize that the device's capacity has changed. The NVMe driver is in a position to provide a better experience that is consistent across hardware for all manufacturers by eliminating the need for manual user intervention. Jeff

On 1/7/2014 8:44 AM, Luse, Paul E wrote: Wrt Judy's (a) below, I believe the original concern that drove us to the implementation was that with NVMe the block size can be changed with a format, whereas that can't happen with a SCSI format... I could be mistaken, but on a quick scan of the email threads I didn't see that point mentioned.
We felt like the only way to get the upper layers to discover the potentially changed block size was to tear it down/bring it back up. Thx Paul
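(Editorial sketch, not part of the driver: a minimal user-mode illustration of the IOCTL route Barrett describes above, issued by whatever utility initiates the format pass-through. The helper name, the drive-number parameter, and the reduced error handling are assumptions for illustration only; Barrett also mentions IOCTL_DISK_UPDATE_DRIVE_SIZE, which is not shown here.)

/*
 * Ask partmgr/disk to re-read the properties of a physical drive after a
 * Format NVM pass-through has completed. Hypothetical user-mode helper.
 */
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

static BOOL RefreshDiskProperties(int driveNumber)
{
    char  path[64];
    DWORD bytes = 0;
    HANDLE h;
    BOOL  ok;

    sprintf_s(path, sizeof(path), "\\\\.\\PhysicalDrive%d", driveNumber);
    h = CreateFileA(path, GENERIC_READ | GENERIC_WRITE,
                    FILE_SHARE_READ | FILE_SHARE_WRITE,
                    NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return FALSE;

    /* No input or output buffer is required for this IOCTL; it simply tells
       the disk stack to invalidate its cached view and re-read the drive. */
    ok = DeviceIoControl(h, IOCTL_DISK_UPDATE_PROPERTIES,
                         NULL, 0, NULL, 0, &bytes, NULL);
    CloseHandle(h);
    return ok;
}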
From judy.brock at ssi.samsung.com Tue Jan 7 20:39:22 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Wed, 8 Jan 2014 04:39:22 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
In-Reply-To: <03D88B383FA04244AA514AA931F7B1290D26B138@BBYEXM01.pmc-sierra.internal>
References: <03D88B383FA04244AA514AA931F7B1290D26B138@BBYEXM01.pmc-sierra.internal>
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A514E478B17@SSIEXCH-MB3.ssi.samsung.com>

Hi, Responses inline below in red - thanks.

-----Original Message----- From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Kwok Kong Sent: Tuesday, January 07, 2014 9:57 AM To: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Handling IO when Format NVM op is in progress

I would agree with Barrett. What problem do you want to solve here?

1) Bug-fixing the current logic, which does not in fact hide any LUNs before initiating the Format NVM op. As per the thread I attached at the beginning of this revived thread, it has been confirmed that this logic is missing. So right now we are not providing any protection against the scenario we claim to want to guard against.

2) Simplification of the Windows driver.
It is simpler logic to merely return an error during the format op, like a real SCSI device would, than it is to wait for all the inquiries to come back down - in the meantime returning errors on new IOs that come in after the Format NVM ioctl is received but before the entire bus rescan has completed (we have no control over how long it will take Storport to do that, btw) and thus before the existing LUNs have been offlined, etc. The whole fencing state machine is a lot more complicated if part of it depends on actions to be taken by Storport on the front end.

3) Compatibility with SCSI behavior - this is a SCSI miniport, and thus inputs/outputs should behave pretty much the same as for other SCSI miniports.

Why do you want to send IO to a device while you are doing an NVMe format?

1) I don't necessarily want to. What I am saying is that the driver should not be in the business of orchestrating special behavior that is inconsistent with other SCSI-to-NVMe translations, emulations, etc. I am curious as to what the open-source Linux driver does. Also, I am not necessarily saying IOs need to go all the way through to the device. Since our driver knows when a Format NVM op is in progress, it could easily cook the sense data to be returned. Either way we'd be closer to "normal" SCSI device behavior.

Since an NVMe format may change the sector size and all data are gone, an NVMe format should behave as if a drive had been removed and a new drive was added. It can accomplish that by signaling a Bus Change at the end of the format op. I think the simplest procedure to do an NVMe format is to:
- offline the LUN (or remove the LUN from the system)
- NVMe format
- online the LUN
Do you see any problem with this simple approach?

1. The offlining is overkill - returning NOT READY while the format is in progress and a Bus Change Detected at the end of the format op will accomplish what is desired.
2. The offlining is not taking place before the NVM format is launched in the current driver - see the originally attached thread. So the current driver is broken any way we look at it - it's just a matter of the easiest way to fix it, IMO.

Do you have a use case in which this procedure does not work? Thanks -Kwok
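(Editorial sketch of the behavior Judy proposes: fail IO with CHECK CONDITION / NOT READY / FORMAT IN PROGRESS while a Format NVM is outstanding, then report a bus change when it completes. The FormatNvmInProgress flag, the MY_ADAPTER_EXT type, and the function names are hypothetical stand-ins for whatever the driver's device extension and completion path actually look like; this is not the driver's current code.)

#include <storport.h>   /* brings in the SRB, sense-data and notification definitions used below */

typedef struct _MY_ADAPTER_EXT {        /* hypothetical, stands in for the driver's device extension */
    BOOLEAN FormatNvmInProgress;        /* assumed to be set by the Format NVM ioctl pass-through path */
} MY_ADAPTER_EXT, *PMY_ADAPTER_EXT;

/* Returns TRUE if the SRB was completed here with the fencing error. */
BOOLEAN FailSrbIfFormatInProgress(PVOID AdapterExtension, PSCSI_REQUEST_BLOCK Srb)
{
    PMY_ADAPTER_EXT pAdapterExt = (PMY_ADAPTER_EXT)AdapterExtension;
    PSENSE_DATA senseData = (PSENSE_DATA)Srb->SenseInfoBuffer;

    if (!pAdapterExt->FormatNvmInProgress)
        return FALSE;                                   /* let normal processing continue */

    Srb->ScsiStatus = SCSISTAT_CHECK_CONDITION;
    Srb->SrbStatus  = SRB_STATUS_ERROR;

    if (senseData != NULL && Srb->SenseInfoBufferLength >= sizeof(SENSE_DATA)) {
        RtlZeroMemory(senseData, Srb->SenseInfoBufferLength);
        senseData->ErrorCode = 0x70;                            /* current error, fixed format */
        senseData->SenseKey = SCSI_SENSE_NOT_READY;             /* 0x02 */
        senseData->AdditionalSenseCode = SCSI_ADSENSE_LUN_NOT_READY;   /* 0x04 */
        senseData->AdditionalSenseCodeQualifier = 0x04;         /* FORMAT IN PROGRESS */
        senseData->AdditionalSenseLength = 0x0A;
        Srb->SrbStatus |= SRB_STATUS_AUTOSENSE_VALID;
    }

    StorPortNotification(RequestComplete, AdapterExtension, Srb);
    return TRUE;
}

/* On Format NVM completion: clear the flag and ask Storport to re-enumerate so
   the upper layers rediscover the (possibly changed) LBA format and capacity. */
VOID OnFormatNvmComplete(PVOID AdapterExtension)
{
    PMY_ADAPTER_EXT pAdapterExt = (PMY_ADAPTER_EXT)AdapterExtension;

    pAdapterExt->FormatNvmInProgress = FALSE;
    StorPortNotification(BusChangeDetected, AdapterExtension, 0);
}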
From judy.brock at ssi.samsung.com Tue Jan 7 20:47:41 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Wed, 8 Jan 2014 04:47:41 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com> <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com> <36E8D38D6B771A4BBDB1C0D800158A514E478A8C@SSIEXCH-MB3.ssi.samsung.com>
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A514E478B4D@SSIEXCH-MB3.ssi.samsung.com>

Hi Barrett, You make good points below.

>> If current code isn't working as intended, it is a bug and should be fixed.

It is - and it is partially in revisiting the work involved in fixing the current code that the discussion took the turn it did internally :).

>> If there is a compelling reason to change the intended behavior, let's discuss.

I think we're all in agreement on the goal - to alert the upper layers to potential geometry changes, etc. It's just a matter of the simplest way to get there. I think it would be good to rediscuss, as we've started to do here. We need to fix the code in any case, so we may conclude that the best way to do it is the current design (as it was intended to work), or we may not. Thanks, Judy
From ngalbo at micron.com Tue Jan 7 21:48:11 2014
From: ngalbo at micron.com (Neal Galbo (ngalbo))
Date: Wed, 8 Jan 2014 05:48:11 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
In-Reply-To: <36E8D38D6B771A4BBDB1C0D800158A514E478A8C@SSIEXCH-MB3.ssi.samsung.com>
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com> <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com> <36E8D38D6B771A4BBDB1C0D800158A514E478A8C@SSIEXCH-MB3.ssi.samsung.com>
Message-ID: <5E6652408D93274292C8F06C9A4A2600C99F8488@NTXBOIMBX05.micron.com>

1) Regarding one of the previous emails - there is no restriction preventing the SCSI block size from changing during format. Yes, it can! Unlikely, but not forbidden; hence the READ CAPACITY command, which gives you everything you need to discover this information. There were/are so-called "soft sectored" drives where the LBA block can be resized during format. Whether they are in wide use anymore ... not sure. But no restrictions in SCSI.

2) Namespaces absolutely can disappear ... and reappear ... and be renamed. A device can even have different namespaces referencing exactly the same storage area as other namespaces ... or overlapping areas. Namespaces are Wild West - anything goes. And they must be "instantiated", i.e. Create and Delete Namespace. Yes, it is not specified in the NVMe spec how that is done (it is outside the scope), but the spec mentions that this must happen. Point of reference: there are companies developing drives that do exactly this, in addition to allowing a variable, programmable number of namespaces to be instantiated within. You may be sensing I'm not a fan. Yes, I have a problem with the namespace concept, or rather the lack of ground rules regarding it. But this is what we've got.
Now, how do we get it to work consistently, while at the same time mapping to the SCSI LU concept? Yes, we can lay down rules - but where does one go to reference those? Who's the keeper of the rules? And do they change by manufacturer? OS? Platform?

3) SCSI LU's and therefore LUN's (the number/address) ARE NEVER instantiated. They all exist in the device and are constant from the moment of power-on. Internally, LU's usually have a physical relationship to hardware, i.e. if a device has N LU's, then there are usually N data paths, N buffers and N storage areas for N independent units. That is, in SCSI parlance, there are N Device Servers - one per LUN (see the SCSI Architecture Model or SPC-4, etc.).

4) Someone mentioned this earlier - yes, you can continue sending commands to a SCSI device WHILE it is formatting. It is still online! It will report BUSY in status or NOT READY in the SENSE KEY. SCSI can also produce a UNIT ATTENTION condition after formatting - meaning, "host, check with me, something radical has changed". The NVMe specification does not speak to these conditions. Yes, we can make something up ... but again, who's the keeper of that info?

a. Also, using SAM, there is a TASK MANAGEMENT function that can return information indicating 0-100% completion of a Format (or any long-duration command).

Bottom line, IMHO - the NVMe specification has serious omissions. The specification should be the SINGLE point of reference for these issues. It needs to address these issues and have defined protocols on how to handle them. But that ship has sailed. Regards, Neal
From judy.brock at ssi.samsung.com Tue Jan 7 21:33:54 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Wed, 8 Jan 2014 05:33:54 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com> <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com> <36E8D38D6B771A4BBDB1C0D800158A514E478A8C@SSIEXCH-MB3.ssi.samsung.com> <36E8D38D6B771A4BBDB1C0D800158A514E478B4D@SSIEXCH-MB3.ssi.samsung.com>
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A514E478BA8@SSIEXCH-MB3.ssi.samsung.com>

>> For 1, why not just fail IO with CHECK_CONDITION NOT_READY FORMAT_IN_PROGRESS

That is my precise proposal :)

>> For 2, can you just use IoInvalidateDeviceRelations() after the format and allow PNP to re-enumerate the bus and all devices?

The Storport miniport should confine itself to using Storport APIs. Also, that API is designed for use by bus drivers such as PCI.SYS, the Windows PCI bus driver. The Windows NVMe Storport miniport is not a bus driver. Thanks, Judy

From: Speer, Kenny [mailto:Kenny.Speer at netapp.com] Sent: Tuesday, January 07, 2014 9:21 PM To: Judy Brock-SSI; Mayes, Barrett N; Neal Galbo (ngalbo); nvmewin at lists.openfabrics.org Subject: RE: Handling IO when Format NVM op is in progress

I'm not involved in this project yet, but am tracking it, so forgive my ignorance. It seems to me you have two goals:
1. Fail IO that is sent during format (it doesn't seem you've agreed on the method here), since you can't control what an application may attempt
2. Notify Windows that the device geometry has changed
For 1, why not just fail IO with CHECK_CONDITION NOT_READY FORMAT_IN_PROGRESS. For 2, can you just use IoInvalidateDeviceRelations() after the format and allow PNP to re-enumerate the bus and all devices? Alternatively, the idea of removing the device while it is inaccessible is not a bad one, and SCSI devices are transient in some scenarios (VSS use cases, for instance). Somebody mentioned READ_CAP_DATA_CHANGED not working in 2012; while off topic, I have not seen that issue in other environments.
From judy.brock at ssi.samsung.com Wed Jan 8 16:24:45 2014
From: judy.brock at ssi.samsung.com (Judy Brock-SSI)
Date: Thu, 9 Jan 2014 00:24:45 +0000
Subject: [nvmewin] Handling IO when Format NVM op is in progress
References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com> <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com> <82C9F782B054C94B9FC04A331649C77A44748DB4@FMSMSX112.amr.corp.intel.com> <52CCD030.8080804@gmail.com>
Message-ID: <36E8D38D6B771A4BBDB1C0D800158A514E478DD2@SSIEXCH-MB3.ssi.samsung.com>

Hello, Would returning NOT READY/FORMAT IN PROGRESS while the format op is underway, and then notifying Storport that a bus change has occurred (and thus re-enumeration is required), be enough to signal to the upper storage-stack layers that they need to refresh their view of the relevant block devices and their properties (capacity, initialized/uninitialized, etc.)? I'm wondering if we know that for sure at this point. What if no IOs come down during the format op? If not, can someone elaborate on how the miniport would implement the following?

[Barrett wrote]: It is possible to get partmgr, disk and upper storage layers (ex. VDS) to recognize capacity (and I assume physical geometry but have never actually tested it) changes to a LUN using IOCTL_DISK_UPDATE_PROPERTIES and IOCTL_DISK_UPDATE_DRIVE_SIZE. From a storage driver point of view, they need to be sent to the top of the driver stack and getting the necessary PDOs in a storport miniport requires some extra work.

Thanks, Judy
Thanks in advance for your collective consideration, Judy _______________________________________________ nvmewin mailing list nvmewin at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin -------------- next part -------------- An HTML attachment was scrubbed... URL: From Yong.sc.Chen at huawei.com Wed Jan 8 19:53:55 2014 From: Yong.sc.Chen at huawei.com (Yong Chen) Date: Thu, 9 Jan 2014 03:53:55 +0000 Subject: [nvmewin] code review: crash dump & hibernation support In-Reply-To: <02EC085151D99A469E06988E94FEBCDB1CE5F8B5@sjceml501-mbs.china.huawei.com> References: <02EC085151D99A469E06988E94FEBCDB1CE52822@sjceml501-mbs.china.huawei.com> <645bf3af.00001a88.00000007@n9228> <02EC085151D99A469E06988E94FEBCDB1CE53382@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C349@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE537B2@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C70B@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE53AA3@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE53C99@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5EDD4@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5F8B5@sjceml501-mbs.china.huawei.com> Message-ID: <02EC085151D99A469E06988E94FEBCDB1CE6D9FC@SJCEML701-CHM.china.huawei.com> Hi, Alex and all, Where are we now on this review? Alex do you need another explicit approval from gatekeepers? Thanks, Yong From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Thursday, December 19, 2013 4:26 PM To: Alex Chang; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support Hi, all, I finished x86 tests on Win 7 & 8. I hope we are wrapping up the review soon? The revision to previous changes is very minor. Alex, What is your release schedule for 1.3? I will be mostly oof during the next 2 weeks. Thanks, Yong From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Monday, December 16, 2013 6:33 PM To: Yong Chen; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi all, I am forwarding the patch Yong sent out today. Please find them in the attachment. Should you have any questions/feedbacks, please rely this message. Please review the changes and test it with your devices as well. Thanks, Alex From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Monday, December 16, 2013 3:36 PM To: Foster, Carolyn D; Alex Chang Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi, Alex and all, I have uploaded the changes to the web and Alex will reply back with zipped source code when received it. Thank you, Alex! The recursive callstack is now fixed, which was a serious offense. Thanks Carolyn, for having brought it up. I also have addressed all points raised by Alex or Rick. The other major topic is memory usage in dump mode: I have reserved 5x 64KB, the actually usage is about 2.2x 64KB. The bulk of them is used by LunExt (0x11000=68KB), almost half. The variable size of all IO queues are very tiny, along other normal usage. 4x 64KB is probably enough, but to double this 2.2 to 5 is more prudent choice, IMO and that is the value that has been tested all along. I have finished with SDStress and ioMeter tests (including win7 x86) and pwrtest. 
Two more tests remain: A: installing Win8 now; B: Due to the destructive nature of the SCSICompliance test, I will leave that to the last because I have only 1 controller to test with. I don't expect any changes from the initialization refactoring to affect SCSI, so I am sending out now so you guys can review while I am finishing up the remaining tests. Please let me or Alex know if you need the binaries as well. Thanks, Yong From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Tuesday, November 26, 2013 2:13 PM To: Foster, Carolyn D; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support See inline. Thanks, Yong From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Tuesday, November 26, 2013 12:09 PM To: Yong Chen; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support See my comments below in green. Also I found two more issues: According to MSDN, StorportEnablePassiveInitialization will fail if the system does not support DPCs, which is the case in crashdump mode. We should have a check for ntldrDump in NVMeInitialize and call NVMePassiveInitialize directly if it's true. [Yong>] are you talking about this block? You are right, that is exactly the case. /* * When Crashdump/Hibernation driver is being loaded, need to complete the * entire initialization here. In the case of normal driver loading, enable * passive initialization and let NVMePassiveInitialization handle the rest * of the initialization */ if (pAE->ntldrDump == FALSE) { ... /* Call StorPortPassiveInitialization to enable passive init */ StorPortEnablePassiveInitialization(pAE, NVMePassiveInitialize); The other potential issue is that in NVMeCallArbiter, in ntldrDump mode we will call NVMeRunning. This could cause a very deep call stack since NVMeCallArbiter is also called from NVMeRunning. In hibernate mode we have limited memory and this could cause issues. I suggest making modifications to NVMeRunningStartAttempt around the NVMeRunning call. It could be a while-loop that would call NVMeRunning if ntldrDump is true, with a stall execution, that would loop until the nextDriverState is start complete, or failed. [Yong>] you are right about the call stack. In the while loop we could create a new separate mini state-machine for the dump mode initialization. Given more time I can experiment with it. Carolyn From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Monday, November 25, 2013 11:23 PM To: Foster, Carolyn D; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi, Carolyn, Thanks for the input. 1. There is no DPC or interrupt in dump mode. It is the polling method that calls the same routine. Could you elaborate why you don't expect DPCs to work properly on Win7? I couldn't find where polling mode is set in the hibernate path, and I was also expecting to see the DPC initialization calls wrapped in ntldrDump checks. Specifically in NVMePassiveInitialize. [Yong>] in dump mode the driver model behaves as if in polling mode. Not something explicit to set. NVMePassiveInitialize (and the DPC initialization calls) won't be issued in dump mode. See the first comment. 2. The small buffer is reserved during normal driver load. In dump mode you can't allocate anything (the doc says a paltry 64KB).
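Circling back to the NVMeCallArbiter recursion issue above: purely as an illustration (not the actual patch), the dump-mode polling loop Carolyn suggests for NVMeRunningStartAttempt could be shaped roughly like the sketch below. The state names, timeout constant and the exact fields consulted are assumptions drawn from this thread and from Judy's snippet further down, not the checked-in code.

/*
 * Sketch only: in ntldrDump mode, drive the init state machine by polling
 * instead of recursing through NVMeCallArbiter, so the call stack stays
 * flat in the memory-constrained crashdump/hibernation environment.
 */
if (pAE->ntldrDump == TRUE) {
    ULONG waited_us = 0;

    NVMeRunning(pAE);                          /* kick off the first state */
    while ((pAE->DriverState.NextDriverState != NVMeStartComplete) &&
           (pAE->DriverState.NextDriverState != NVMeStartFailed)) {
        NVMeStallExecution(pAE, MAX_STATE_STALL_us);
        waited_us += MAX_STATE_STALL_us;
        if (waited_us > pAE->uSecCrtlTimeout) {
            break;                             /* give up rather than hang the dump */
        }
        NVMeRunning(pAE);                      /* advance the state machine */
    }
    return (pAE->DriverState.NextDriverState == NVMeStartComplete);
}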
That dump buffer is guaranteed when entering dump mode. Regardless, the same code path would fail, not BSOD, if BUFFER is NULL, as in any other case. I'm specifically asking about line 189 in nvmeInit. It seems possible for that line of code, the StorportAllocateContiguousMemory call, to be reached in the crashdump path. Can you confirm that function call in crashdump/hibernate mode simply fails? If it does then I agree that a null buffer here will likely not crash the system. [Yong>] I see. This new API call is a recently merged change. I have not hit this failure case. From my earlier experience with other allocation functions (not this new API), all these APIs simply fail with NULL returned. 3. I added the code because for a long time I was and still am dealing with engineering-sample hardware, not finished products. After several revisions, they are much more mature now than earlier this year. This is how I put them in a consistent ready state. The rationale for cycling the CC.EN bit when resuming from hibernation is just to mirror the normal initialization step. The timeout is the predefined value STORPORT_TIMER_CB_us. For malfunctioning hardware, the same logic would already have experienced the same problems in NVMeInitialize() at raised level. The best way to decide is to test on different implementations of NVMe devices from various vendors and see whether we need to tune these values. My concern here is that the RecoveryDpc routine is not just called during hibernate, it is called during runtime if Windows needs to reset the controller. I'm concerned with how these changes impact the normal runtime driver. Did you test this function during runtime? What happens if the maximum time is spent in it? [Yong>] yes, it will be called by Windows if hardware misbehaves, trying to reset the hardware. I didn't try to simulate this scenario; it is not easy with real hardware. With this change, this same reset is being exercised every time in resuming from hibernation. We can find out by turning on fault injection; however, it is not something we have been running regularly. Hope these help, Thanks, Yong From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Monday, November 25, 2013 3:25 PM To: Judy Brock-SSI; Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi Yong, I also have some feedback and questions about the code changes. 1. I am surprised that there are no ntldrDump checks around the DPC initialization and calls. I wouldn't have expected the DPCs to work properly on the Windows 7 systems. 2. In function NVMeAllocateMem, the ntldrDump check is wrapped such that if no buffer is allocated from the DumpBuffer, the code path could end up calling StorPortAllocateContiguousMemory in ntldrDump mode. Will this cause a BSOD? Or will it just fail? 3. In RecoveryDpcRoutine new code has been added above the reset adapter call, not related to ntldrDump. If the controller isn't responding, this additional delay time could cause a DPC watchdog timeout bugcheck if the maximum time allowed for a DPC is exceeded. I have some concerns about this new code; what was your reasoning for adding it?
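On Carolyn's question 2, the guard being discussed might look roughly like the sketch below. It is illustrative only; the field names, the size constant and the helper shape are assumptions. The point is simply that in ntldrDump mode the allocation either comes out of the pre-reserved dump buffer or fails cleanly, and never falls through to StorPortAllocateContiguousMemory.

/*
 * Sketch only: dump-mode carve-out inside an NVMeAllocateMem-style helper.
 * Field and constant names are assumptions, not the driver's actual code.
 */
if (pAE->ntldrDump == TRUE) {
    /* Hand out a slice of the buffer reserved during normal driver load. */
    if ((pAE->DumpBuffer != NULL) &&
        ((pAE->DumpBufferBytesUsed + Size) <= DUMP_BUFFER_SIZE)) {
        pBuf = pAE->DumpBuffer + pAE->DumpBufferBytesUsed;
        pAE->DumpBufferBytesUsed += Size;
    } else {
        /* Fail cleanly; never fall through to StorPortAllocateContiguousMemory. */
        pBuf = NULL;
    }
    return pBuf;
}
/* Normal (non-dump) path continues with the regular Storport allocation. */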
Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Judy Brock-SSI Sent: Monday, November 25, 2013 3:18 PM To: Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support Hi Yong, Ensuring CSTS.RDY == 0 before setting CC.EN to '1' is required by the spec - it is a necessary part of enabling the adapter. Therefore it is not overloading the function and should be kept together. It is a sanity check that needs to take place immediately before writing CC.EN. They should not be separated by other blocks of code. That is not flexibility, it is a design flaw in my opinion. I don't see how it could possibly result in any destabilization of major features to make sure the RDY bit is not on before setting the EN bit in the routine which is dedicated to enabling the controller. If you are worried about removing other checks for CSTS.RDY == 0, then by all means, leave them in. It doesn't hurt a thing to have those extra check points in the two non-runtime paths you mentioned. Conversely, it does potentially hurt to not have an explicit check in NVMeEnableAdapter itself. As I mentioned previously, there is no check in the PassiveInitialization path for CSTS.RDY == 0 before calling NVMeEnableAdapter in the current code; so we are still violating the spec if we don't enhance your current changes one way or the other. I say, put the fix in - it's fairly trivial. Thanks, Judy From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Monday, November 25, 2013 10:53 AM To: Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi, Judy, Thanks for your input. I agree with what you are trying to achieve. I also think that block of cycling CC.EN 1->0 can be further refactored into one standalone helper function, to be called by RecoveryDpcRoutine() and NVMeInitialize(). Embedding it into NVMeEnableAdapter() would make that function do more than its name is meant to convey, and lose the flexibility. Plus they are separated by other blocks of code, so this would materially change the code, currently for no obvious reason. I would trust the next person to do the right thing; it is always hard to fix something like this if it ever happens in the future. Unless the testing is completely redone for this check-in, I would delay this further refactoring improvement to next time, to keep it separate from, and avoid destabilizing, this major feature work. What do other folks think? Thanks, Yong From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com] Sent: Sunday, November 24, 2013 8:20 AM To: Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi Yong, I suggest you change the function NVMeEnableAdapter to check for CSTS.RDY == 0 before setting CC.EN to '1'. You added a check for this some lines above the call in NVMeInitialize. But I think we should avoid decoupling the check for CSTS.RDY == 0 from the controller enable itself. If it is not in the same function, it can be overlooked. For example, there is another call to NVMeEnableAdapter in PassiveInitialize that doesn't check before calling.
I would modify NVMeEnableAdapter as below (additions/changes in highlight), change the prototype, and have callers check for success or failure:

 * @return BOOLEAN
 *     TRUE - If Adapter is enabled correctly
 *     FALSE - If anything goes wrong
 ******************************************************************************/
BOOLEAN NVMeEnableAdapter(
    PNVME_DEVICE_EXTENSION pAE
)
{
    PQUEUE_INFO pQI = &pAE->QueueInfo;
    NVMe_CONTROLLER_CONFIGURATION CC = {0};
    NVMe_CONTROLLER_STATUS CSTS = {0};
    ULONG PollMax = pAE->uSecCrtlTimeout / MAX_STATE_STALL_us;
    ULONG PollCount;

    /*
     * Program Admin queue registers before enabling the adapter:
     * Admin Queue Attributes
     */
    StorPortWriteRegisterUlong(
        pAE,
        (PULONG)(&pAE->pCtrlRegister->AQA),
        (pQI->pSubQueueInfo->SubQEntries - 1) +
        ((pQI->pCplQueueInfo->CplQEntries - 1) << NVME_AQA_CQS_LSB));

    . . . (further down):

    StorPortDebugPrint(INFO, "NVMeEnableAdapter: Setting EN...\n");

    /*
     * Set up Controller Configuration Register
     */
    /* After reset, we must wait for CSTS.RDY == 0 before setting CC.EN to 1 */
    for (PollCount = 0; PollCount < PollMax; PollCount++) {
        CSTS.AsUlong = StorPortReadRegisterUlong(pAE,
            (PULONG)(&pAE->pCtrlRegister->CSTS.AsUlong));
        if (CSTS.RDY == 0) {
            /* Move on if RDY bit is cleared */
            break;
        }
        NVMeStallExecution(pAE, MAX_STATE_STALL_us);
    }
    if (CSTS.RDY != 0) {
        /* If RDY bit won't clear we can't enable the adapter */
        return FALSE;
    }

    CC.EN = 1;
    CC.CSS = NVME_CC_NVM_CMD;
    CC.MPS = (PAGE_SIZE >> NVME_MEM_PAGE_SIZE_SHIFT);
    CC.AMS = NVME_CC_ROUND_ROBIN;
    CC.SHN = NVME_CC_SHUTDOWN_NONE;
    CC.IOSQES = NVME_CC_IOSQES;
    CC.IOCQES = NVME_CC_IOCQES;

    StorPortWriteRegisterUlong(pAE, (PULONG)(&pAE->pCtrlRegister->CC), CC.AsUlong);

    return TRUE;
} /* NVMeEnableAdapter */

Thanks, Judy From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Friday, November 22, 2013 1:48 PM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support Hi, everyone, I hope many are busy testing the changes on your devices. If you have any feedback to share, I'd appreciate it very much. The holiday is upon us and we'd like to wrap up this much-delayed work soon. Thanks, Yong From: Yong Chen Sent: Wednesday, November 20, 2013 12:52 AM To: 'Alex Chang'; Uma Parepalli Cc: 'nvmewin at lists.openfabrics.org' Subject: RE: [nvmewin] code review: crash dump & hibernation support Object #1: crash dump when a blue screen occurs or when manually triggered, for all SKUs (server or client). Object #2: hibernate and then resume on all client SKUs. + Minor cleaning up and fixes along the way. High-level Summary: The major change is to enable ntldrDump mode so that during crash dump or hibernation, the system memory can be dumped to pre-allocated block locations (the MEMORY.DMP or HIBERFIL.SYS file). The same nvme.sys driver will be reloaded as another image into a strict, dumbed-down environment where the usual APIs are not available anymore. The next challenge is to re-initialize the controller properly after having resumed from the hibernation image and to continue serving as the system boot disk. I need to give credit to earlier contributors (Intel, LSI, IDT and others) for having laid the solid building blocks needed for the dump mode. This change solved the buffer issue and introduced a different code path for IOs in dump mode. Detailed Briefs:
- nvmeInit.c changes to manage buffers and IO queues in dump modes.
- nvmeIo.c minor tweak in dump mode where only the boot CPU is available.
- nvmeSnti.c fix an existing bug where the FLUSH cmd should include the NSID (all NSs in this case).
- nvmeStat.c helper function change due to some timer-related APIs not being available in dump mode.
- nvmeStd.c A: refactored code into NVMeWaitForCtrlRDY() for waiting after setting CC.EN = 1. B: introduced the same waiting logic after clearing CC.EN = 0. C: during power up, Reset will issue a DPC to call RecoveryDpcRoutine() to re-initialize the driver; similarly, the above A+B steps are introduced there.
Using the trunk version code, the hardware I have always timed out on initialization. I have had this fix since this spring. I think it is the same issue listed in Kwok's laundry list. But I would need Judy to verify whether the issue she found is fixed or not by this change. (Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset.) From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, November 19, 2013 5:04 PM To: Uma Parepalli; Yong Chen Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi Yong, Could you please summarize the changes you made? Normally, we list the changes under each file as high-level briefs. Thanks, Alex From: Uma Parepalli [mailto:uma.parepalli at skhms.com] Sent: Tuesday, November 19, 2013 4:05 PM To: Alex Chang Subject: RE: [nvmewin] code review: crash dump & hibernation support Is there a change log file or something that explains what changes are made without looking at the code? Uma From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, November 19, 2013 4:05 PM To: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support Hi all, Please find the attached code changes made by Yong Chen from Huawei. Please review the changes, test them accordingly and provide your feedback. Thanks, Alex From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Tuesday, November 19, 2013 3:59 PM To: nvmewin at lists.openfabrics.org; Alex Chang Subject: RE: code review: crash dump & hibernation support Hi, all, Please download the source code from the link in the attached email (you need to have Silverlight installed). Or, to save the trouble for everyone... Alex, could you reply back with the code change you downloaded? The test log is attached, and see below for the list of tests. Thanks, Yong From: Yong Chen Sent: Monday, November 18, 2013 4:13 PM To: 'nvmewin at lists.openfabrics.org'; 'Alex Chang' Hi, Alex and all, Here is the code change to support crash dump and hibernation. Please review. Hopefully we can wrap up by this week before the meeting. Using the trunk version I had a problem with the initialization as well. The trunk version would time out on me. I think it is the same CSTS.RDY issue Judy raised. I refactored a bit and fixed it, at least for the hardware I have. Thanks, Yong Tests that I have gone through:
1. manual crash dump: KD> .crash, then reboot and KD -z -v MEMORY.DMP, do "!process -1 f"
2. manual hibernation or pwrtest /sleep /c:10 /d:30 /p:30 /s:4
3. SCSICompliance (log attached).
4. stresses: iostress, sdstress (log attached)
5. hibernation has been tested on win8.0 as well, but not extensively.
6. Hibernation has also been tested with both the bootable OptionROM and the newly released UEFI driver.
7. All tests were conducted on x64 platforms, involving 3 different pieces of hardware, plus another Intel MB which can't do hibernation (no S4).
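As a companion to the NVMeWaitForCtrlRDY() refactoring mentioned in the briefs above, the helper might be shaped roughly like the sketch below. The signature and the timeout handling are assumptions; the actual checked-in code may differ.

/*
 * Sketch only: poll CSTS.RDY until it reaches the desired value or the
 * controller timeout expires. Mirrors the loop in Judy's suggested
 * NVMeEnableAdapter change; names are assumptions, not the final code.
 */
BOOLEAN NVMeWaitForCtrlRDY(
    PNVME_DEVICE_EXTENSION pAE,
    ULONG expectedRdy          /* 1 after setting CC.EN, 0 after clearing it */
)
{
    NVMe_CONTROLLER_STATUS CSTS = {0};
    ULONG PollMax = pAE->uSecCrtlTimeout / MAX_STATE_STALL_us;
    ULONG PollCount;

    for (PollCount = 0; PollCount < PollMax; PollCount++) {
        CSTS.AsUlong = StorPortReadRegisterUlong(pAE,
            (PULONG)(&pAE->pCtrlRegister->CSTS.AsUlong));
        if (CSTS.RDY == expectedRdy) {
            return TRUE;                    /* controller reached the expected state */
        }
        NVMeStallExecution(pAE, MAX_STATE_STALL_us);
    }
    return FALSE;                           /* timed out waiting for CSTS.RDY */
}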
________________________________ Yong Chen Storage Architect Huawei Technologies Co., Ltd Office: 408-330-5482 Mobile: 425-922-0658 Email: yong.sc.chen at huawei.com 2330 Central Expressway Santa Clara, CA 95050 http://www.huawei.com This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 6737 bytes Desc: image002.jpg URL: From barrett.n.mayes at intel.com Wed Jan 8 22:19:56 2014 From: barrett.n.mayes at intel.com (Mayes, Barrett N) Date: Thu, 9 Jan 2014 06:19:56 +0000 Subject: [nvmewin] Handling IO when Format NVM op is in progress In-Reply-To: <36E8D38D6B771A4BBDB1C0D800158A514E478DD2@SSIEXCH-MB3.ssi.samsung.com> References: <36E8D38D6B771A4BBDB1C0D800158A514E478620@SSIEXCH-MB3.ssi.samsung.com> <5E6652408D93274292C8F06C9A4A2600C99F729A@NTXBOIMBX05.micron.com> <82C9F782B054C94B9FC04A331649C77A44748DB4@FMSMSX112.amr.corp.intel.com> <52CCD030.8080804@gmail.com> <36E8D38D6B771A4BBDB1C0D800158A514E478DD2@SSIEXCH-MB3.ssi.samsung.com> Message-ID: On a BusChangeNotification, storport sends standard inquiry to each B/T/L. If the response is the same as the previous inquiry, I don't believe there is any further re-enumeration. Since capacity and physical geometry changes aren't reported in standard inquiry, issuing a BusChangeNotification isn't enough. Perhaps the driver could modify the vendor/productID/rev ID to make a modified NS look different on an inquiry following a format, but that would be difficult to keep consistent across subsequent inquiries without some persistent metadata or deterministic way to recreate the inquiry data. From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com] Sent: Wednesday, January 08, 2014 4:25 PM To: Mayes, Barrett N; Jeff Glass; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] Handling IO when Format NVM op is in progress Hello, Would (I'm wondering if we know for sure or not at this point) returning NOT READY/FORMAT IN PROGRESS while the format op is underway and then notifying Storport that a bus change has occurred (and thus re-enumeration is required) be enough to signal to the upper storage stack layers that they need to refresh their view of relevant block devices and their properties (capacity, initialized/uninitialized, etc)? What if no IOs come down during the format op? If not, can someone elaborate on how the miniport would implement the following? [Barrett wrote]: It is possible to get partmgr, disk and upper storage layers (ex. VDS) to recognize capacity (and I assume physical geometry but have never actually tested it) changes to a LUN using IOCTL_DISK_UPDATE_PROPERTIES and IOCTL_DISK_UPDATE_DRIVE_SIZE.
From a storage driver point of view, they need to be sent to the top of the driver stack and getting the necessary PDOs in a storport miniport requires some extra work. Thanks, Judy From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Mayes, Barrett N Sent: Tuesday, January 07, 2014 8:42 PM To: Jeff Glass; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Handling IO when Format NVM op is in progress It is possible to get partmgr, disk and upper storage layers (ex. VDS) to recognize capacity (and I assume physical geometry but have never actually tested it) changes to a LUN using IOCTL_DISK_UPDATE_PROPERTIES and IOCTL_DISK_UPDATE_DRIVE_SIZE. From a storage driver point of view, they need to be sent to the top of the driver stack and getting the necessary PDOs in a storport miniport requires some extra work. They can be more easily be sent from a user-mode app such as the one that might initiate the format in the first place. But I agree, it is desirable to provide a consistent experience across devices and that would be more difficult if you had to coordinate with various 3rd party tools.
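As a concrete illustration of the user-mode path Barrett describes, a minimal sketch follows. The drive number, access flags and the omission of IOCTL_DISK_UPDATE_DRIVE_SIZE are assumptions made for the example, not part of the driver or of any agreed-upon tool.

/*
 * Sketch only: after a format completes, a user-mode utility can nudge
 * partmgr/disk to re-read disk properties via IOCTL_DISK_UPDATE_PROPERTIES.
 * The physical drive number below is a placeholder.
 */
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int main(void)
{
    DWORD bytes = 0;
    /* Placeholder target; a real tool would first locate the NVMe disk. */
    HANDLE h = CreateFileW(L"\\\\.\\PhysicalDrive1",
                           GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        printf("open failed: %lu\n", GetLastError());
        return 1;
    }
    /* No input or output buffers are needed for this IOCTL. */
    if (!DeviceIoControl(h, IOCTL_DISK_UPDATE_PROPERTIES,
                         NULL, 0, NULL, 0, &bytes, NULL)) {
        printf("IOCTL_DISK_UPDATE_PROPERTIES failed: %lu\n", GetLastError());
    }
    CloseHandle(h);
    return 0;
}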
_______________________________________________ nvmewin mailing list nvmewin at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alex.Chang at pmcs.com Thu Jan 9 09:16:21 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Thu, 9 Jan 2014 17:16:21 +0000 Subject: [nvmewin] code review: crash dump & hibernation support In-Reply-To: <02EC085151D99A469E06988E94FEBCDB1CE6D9FC@SJCEML701-CHM.china.huawei.com> References: <02EC085151D99A469E06988E94FEBCDB1CE52822@sjceml501-mbs.china.huawei.com> <645bf3af.00001a88.00000007@n9228> <02EC085151D99A469E06988E94FEBCDB1CE53382@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C349@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE537B2@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C70B@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE53AA3@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE53C99@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5EDD4@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5F8B5@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE6D9FC@SJCEML701-CHM.china.huawei.com> Message-ID: Hi Yong, I am still testing it. Hopefully, wrap it up by the end of this week. Thanks, Alex
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 6737 bytes Desc: image001.jpg URL: From Rick.Knoblaugh at lsi.com Thu Jan 9 11:15:48 2014 From: Rick.Knoblaugh at lsi.com (Knoblaugh, Rick) Date: Thu, 9 Jan 2014 19:15:48 +0000 Subject: [nvmewin] code review: crash dump & hibernation support In-Reply-To: <02EC085151D99A469E06988E94FEBCDB1CE6D9FC@SJCEML701-CHM.china.huawei.com> References: <02EC085151D99A469E06988E94FEBCDB1CE52822@sjceml501-mbs.china.huawei.com> <645bf3af.00001a88.00000007@n9228> <02EC085151D99A469E06988E94FEBCDB1CE53382@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C349@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE537B2@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C70B@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE53AA3@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE53C99@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5EDD4@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5F8B5@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE6D9FC@SJCEML701-CHM.china.huawei.com> Message-ID: <95e3ca489fb541e981a852b86ea45fc4@DM2PR07MB285.namprd07.prod.outlook.com> Hi Yong, I have been checking it out. No issues so far. In your minor changes, please put Dumpbuffersize #define in uppercase. Thanks, -Rick From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Wednesday, January 08, 2014 7:54 PM To: Alex Chang; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support Hi, Alex and all, Where are we now on this review? Alex do you need another explicit approval from gatekeepers?
Thanks, Yong From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Thursday, December 19, 2013 4:26 PM To: Alex Chang; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support Hi, all, I finished x86 tests on Win 7 & 8. I hope we are wrapping up the review soon? The revision to previous changes is very minor. Alex, What is your release schedule for 1.3? I will be mostly oof during the next 2 weeks. Thanks, Yong From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Monday, December 16, 2013 6:33 PM To: Yong Chen; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi all, I am forwarding the patch Yong sent out today. Please find them in the attachment. Should you have any questions/feedbacks, please rely this message. Please review the changes and test it with your devices as well. Thanks, Alex From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Monday, December 16, 2013 3:36 PM To: Foster, Carolyn D; Alex Chang Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi, Alex and all, I have uploaded the changes to the web and Alex will reply back with zipped source code when received it. Thank you, Alex! The recursive callstack is now fixed, which was a serious offense. Thanks Carolyn, for having brought it up. I also have addressed all points raised by Alex or Rick. The other major topic is memory usage in dump mode: I have reserved 5x 64KB, the actually usage is about 2.2x 64KB. The bulk of them is used by LunExt (0x11000=68KB), almost half. The variable size of all IO queues are very tiny, along other normal usage. 4x 64KB is probably enough, but to double this 2.2 to 5 is more prudent choice, IMO and that is the value that has been tested all along. I have finished with SDStress and ioMeter tests (including win7 x86) and pwrtest. Two more tests remain: A: installing Win8 now; B: Due to the destructive-nature of SCSICompliance test, I will leave that to the last because I have only 1 controller to test with. I don��t expect any changes from the initialization refactoring would affect SCSI so I am sending out now so you guys can review while I am finishing up the remaining tests. Please let me or Alex know if you need the binaries as well. Thanks, Yong From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Tuesday, November 26, 2013 2:13 PM To: Foster, Carolyn D; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support See inline. Thanks, Yong From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Tuesday, November 26, 2013 12:09 PM To: Yong Chen; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support See my comments below in green. Also I found two more issues: According to MSDN StorportEnablePassiveInitialization will fail if the system does not support DPCs, which is the case in crashdump mode. We should have a check for ntldrDump in NVMeInitialize and call NVMePassiveInitialize directly if it��s true. [Yong>] are you talking about this block? You are right, that is exactly the case. /* * When Crashdump/Hibernation driver is being loaded, need to complete the * entire initialization here. 
    /*
     * When Crashdump/Hibernation driver is being loaded, need to complete the
     * entire initialization here. In the case of normal driver loading, enable
     * passive initialization and let NVMePassiveInitialization handle the rest
     * of the initialization
     */
    if (pAE->ntldrDump == FALSE) {
        ...
        /* Call StorPortPassiveInitialization to enable passive init */
        StorPortEnablePassiveInitialization(pAE, NVMePassiveInitialize);

The other potential issue is that in NVMeCallArbiter, in ntldrDump mode we will call NVMeRunning. This could cause a very deep call stack since NVMeCallArbiter is also called from NVMeRunning. In hibernate mode we have limited memory and this could cause issues. I suggest making modifications to NVMeRunningStartAttempt around the NVMeRunning call. It could be a while-loop that would call NVMeRunning if ntldrDump is true, with a stall execution, that would loop until the nextDriverState is start complete, or failed.
[Yong>] you are right about the call stack. In the while loop we could create a new separate mini state-machine for the dump mode initialization. Given more time I can experiment with it.
Carolyn
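As a purely illustrative aside, a minimal sketch of the dump-mode polling loop proposed above might look like the following. The signature is simplified, and the field and state names (ntldrDump, DriverState.NextDriverState, NVMeStartComplete, NVMeStartFailed, MAX_STATE_STALL_us) are taken from this thread rather than from the actual patch; it also assumes NVMeCallArbiter simply returns in dump mode, so the loop below is the only thing driving the state machine there.

    /* Sketch only - not the actual OFA driver code */
    BOOLEAN NVMeRunningStartAttempt(PNVME_DEVICE_EXTENSION pAE)
    {
        if (pAE->ntldrDump == FALSE) {
            /* Normal runtime path: DPCs and the arbiter drive the state transitions */
            NVMeRunning(pAE);
            return TRUE;
        }

        /*
         * Crashdump/hibernation path: no DPCs and very little memory, so drive
         * the initialization state machine with a flat polling loop instead of
         * letting NVMeCallArbiter re-enter NVMeRunning recursively (assumes the
         * arbiter returns immediately when ntldrDump is TRUE).
         */
        while ((pAE->DriverState.NextDriverState != NVMeStartComplete) &&
               (pAE->DriverState.NextDriverState != NVMeStartFailed)) {
            NVMeRunning(pAE);                              /* advance one state */
            NVMeStallExecution(pAE, MAX_STATE_STALL_us);   /* stall between polls */
        }

        return (pAE->DriverState.NextDriverState == NVMeStartComplete);
    }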
From: Yong Chen [mailto:Yong.sc.Chen at huawei.com]
Sent: Monday, November 25, 2013 11:23 PM
To: Foster, Carolyn D; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] code review: crash dump & hibernation support

Hi, Carolyn,
Thanks for the input.
1. There is no DPC or interrupt in dump mode. The polling method is used to call the same routine. Could you elaborate why you don't expect DPCs to work properly on Win7? I couldn't find where polling mode is set in the hibernate path, and I was also expecting to see the DPC initialization calls wrapped in ntldrDump checks. Specifically in NVMePassiveInitialize. [Yong>] in dump mode the driver model behaves as if in polling mode. It is not something set explicitly. NVMePassiveInitialize (and the DPC initialization calls) won't be issued in dump mode. See the first comment.
2. The small buffer is reserved during normal driver load. In dump mode you can't allocate anything (the documentation says a paltry 64KB). That dump buffer is guaranteed when entering dump mode. Regardless, the same code path would fail, not BSOD, if BUFFER is NULL, as in any other case. I'm specifically asking about line 189 in nvmeInit. It seems possible for that line of code, the StorportAllocateContiguousMemory, to be called in the crashdump path. Can you confirm that function call simply fails in crashdump/hibernate mode? If it does, then I agree that a null buffer here will likely not crash the system. [Yong>] I see. This new API call is a recently merged change. I have not hit this failure case. From my earlier experience with other allocation functions (not this new API), all these APIs simply fail with NULL returned.
3. I added the code because for a long time I was, and still am, dealing with engineering-sample hardware, not finished products. After several revisions, they are much more mature now than earlier this year. This is how I get them into a consistent ready state. The rationale for cycling the CC.EN bit when resuming from hibernation is just to mirror the normal initialization steps. The timeout is the predefined value STORPORT_TIMER_CB_us. For malfunctioning hardware, the same logic would already have hit the same problems in NVMeInitialize() at raised IRQL. The best way to decide is to test on different NVMe device implementations from various vendors and see whether we need to tune these values. My concern here is that the RecoveryDpc routine is not just called during hibernate; it is called during runtime if Windows needs to reset the controller. I'm concerned with how these changes impact the normal runtime driver. Did you test this function during runtime? What happens if the maximum time is spent in it? [Yong>] yes, it will be called by Windows if the hardware misbehaves, trying to reset the hardware. I didn't try to simulate this scenario; it is not easy with real hardware. With this change, this same reset is being exercised every time we resume from hibernation. We can find out by turning on fault injection; however it is not something we have been running regularly.
Hope these help,
Thanks,
Yong

From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com]
Sent: Monday, November 25, 2013 3:25 PM
To: Judy Brock-SSI; Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] code review: crash dump & hibernation support

Hi Yong,
I also have some feedback and questions about the code changes.
1. I am surprised that there are no ntldrDump checks around the DPC initialization and calls. I wouldn't have expected the DPCs to work properly on the Windows 7 systems.
2. In function NVMeAllocateMem, the ntldrDump check is wrapped such that if no buffer is allocated from the DumpBuffer, the code path could end up calling StorPortAllocateContiguousMemory in ntldrDump mode. Will this cause a BSOD? Or will it just fail?
3. In RecoveryDpcRoutine, new code has been added above the reset adapter call that is not related to ntldrDump. If the controller isn't responding, this additional delay time could cause a DPC watchdog timeout bugcheck if the maximum time allowed for a DPC is exceeded. I have some concerns about this new code; what was your reasoning for adding it?
Thanks,
Carolyn

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Judy Brock-SSI
Sent: Monday, November 25, 2013 3:18 PM
To: Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] code review: crash dump & hibernation support

Hi Yong,
Ensuring CSTS.RDY == 0 before setting CC.EN to '1' is required by the spec - it is a necessary part of enabling the adapter. Therefore it is not overloading the function, and the two should be kept together. It is a sanity check that needs to take place immediately before writing CC.EN. They should not be separated by other blocks of code. That is not flexibility; it is a design flaw in my opinion. I don't see how it could possibly result in any destabilization of major features to make sure the RDY bit is not on before setting the EN bit in the routine which is dedicated to enabling the controller. If you are worried about removing other checks for CSTS.RDY == 0, then by all means leave them in. It doesn't hurt a thing to have those extra check points in the two non-runtime paths you mentioned. Conversely, it does potentially hurt not to have an explicit check in NVMeEnableAdapter itself. As I mentioned previously, there is no check in the PassiveInitialization path for CSTS.RDY == 0 before calling NVMeEnableAdapter in the current code, so we are still violating the spec if we don't enhance your current changes one way or the other. I say, put the fix in - it's fairly trivial.
Thanks,
Judy
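Elsewhere in this thread Yong describes refactoring exactly this wait into a helper, NVMeWaitForCtrlRDY(), used both after setting CC.EN = 1 and after clearing it to 0. A minimal sketch of what such a helper could look like is below; the parameter and return conventions are assumptions for illustration, not the actual patch.

    /* Sketch only - polls CSTS.RDY until it reaches the expected value or times out */
    BOOLEAN NVMeWaitForCtrlRDY(
        PNVME_DEVICE_EXTENSION pAE,
        ULONG expectedRdy          /* 1 after setting CC.EN, 0 after clearing it */
    )
    {
        NVMe_CONTROLLER_STATUS CSTS = {0};
        ULONG PollMax = pAE->uSecCrtlTimeout / MAX_STATE_STALL_us;
        ULONG PollCount;

        for (PollCount = 0; PollCount < PollMax; PollCount++) {
            CSTS.AsUlong = StorPortReadRegisterUlong(pAE,
                (PULONG)(&pAE->pCtrlRegister->CSTS.AsUlong));
            if (CSTS.RDY == expectedRdy) {
                return TRUE;                   /* controller reached the desired state */
            }
            NVMeStallExecution(pAE, MAX_STATE_STALL_us);
        }
        return FALSE;                          /* timed out waiting for CSTS.RDY */
    }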
From: Yong Chen [mailto:Yong.sc.Chen at huawei.com]
Sent: Monday, November 25, 2013 10:53 AM
To: Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] code review: crash dump & hibernation support

Hi, Judy,
Thanks for your input. I agree with what you are trying to achieve. I also think that block of cycling CC.EN 1->0 can be further refactored into one standalone helper function, to be called by RecoveryDpcRoutine() and NVMeInitialize(). Embedding it into NVMeEnableAdapter() would overload that function beyond what its name says it is meant to do, and lose flexibility. Plus, the two are separated by other blocks of code, so combining them would materially change the code, currently for no obvious reason. I would trust the next person to do the right thing; it is always hard to fix something like this if it ever happens in the future. Unless the testing is completely restarted for this check-in, I would defer this further refactoring to next time, to keep it separate from this major feature work and avoid destabilizing it. What do other folks think?
Thanks,
Yong

From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com]
Sent: Sunday, November 24, 2013 8:20 AM
To: Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] code review: crash dump & hibernation support

Hi Yong,
I suggest you change the function NVMeEnableAdapter to check for CSTS.RDY == 0 before setting CC.EN to '1'. You added a check for this some lines above the call in NVMeInitialize. But I think we should avoid decoupling the check for CSTS.RDY == 0 from the controller enable itself. If it is not in the same function, it can be overlooked. For example, there is another call to NVMeEnableAdapter in PassiveInitialize that doesn't check before calling. I would modify NVMeEnableAdapter as below (additions/changes in highlight), change the prototype, and have callers check for success or failure:

 * @return BOOLEAN
 *     TRUE - If Adapter is enabled correctly
 *     FALSE - If anything goes wrong
 ******************************************************************************/
BOOLEAN NVMeEnableAdapter(
    PNVME_DEVICE_EXTENSION pAE
)
{
    PQUEUE_INFO pQI = &pAE->QueueInfo;
    NVMe_CONTROLLER_CONFIGURATION CC = {0};
    NVMe_CONTROLLER_STATUS CSTS = {0};
    ULONG PollMax = pAE->uSecCrtlTimeout / MAX_STATE_STALL_us;
    ULONG PollCount;

    /*
     * Program Admin queue registers before enabling the adapter:
     * Admin Queue Attributes
     */
    StorPortWriteRegisterUlong(
        pAE,
        (PULONG)(&pAE->pCtrlRegister->AQA),
        (pQI->pSubQueueInfo->SubQEntries - 1) +
        ((pQI->pCplQueueInfo->CplQEntries - 1) << NVME_AQA_CQS_LSB));

    . . . (further down):

    StorPortDebugPrint(INFO, "NVMeEnableAdapter: Setting EN...\n");

    /*
     * Set up Controller Configuration Register
     */
    /* After reset, we must wait for CSTS.RDY == 0 before setting CC.EN to 1 */
    for (PollCount = 0; PollCount < PollMax; PollCount++) {
        CSTS.AsUlong = StorPortReadRegisterUlong(pAE,
            (PULONG)(&pAE->pCtrlRegister->CSTS.AsUlong));
        if (CSTS.RDY == 0) {
            /* Move on if RDY bit is cleared */
            break;
        }
        NVMeStallExecution(pAE, MAX_STATE_STALL_us);
    }
    if (CSTS.RDY != 0) {
        /* If RDY bit won't clear we can't enable the adapter */
        return FALSE;
    }

    CC.EN = 1;
    CC.CSS = NVME_CC_NVM_CMD;
    CC.MPS = (PAGE_SIZE >> NVME_MEM_PAGE_SIZE_SHIFT);
    CC.AMS = NVME_CC_ROUND_ROBIN;
    CC.SHN = NVME_CC_SHUTDOWN_NONE;
    CC.IOSQES = NVME_CC_IOSQES;
    CC.IOCQES = NVME_CC_IOCQES;

    StorPortWriteRegisterUlong(pAE,
        (PULONG)(&pAE->pCtrlRegister->CC),
        CC.AsUlong);

    return TRUE;
} /* NVMeEnableAdapter */

Thanks,
Judy

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen
Sent: Friday, November 22, 2013 1:48 PM
To: Alex Chang; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] code review: crash dump & hibernation support

Hi, everyone,
I hope many of you are busy testing the changes on your devices.
If you have any feedback to share, I'd very much appreciate it. The holidays are upon us, and we'd like to wrap up this much-delayed work soon.
Thanks,
Yong

From: Yong Chen
Sent: Wednesday, November 20, 2013 12:52 AM
To: 'Alex Chang'; Uma Parepalli
Cc: 'nvmewin at lists.openfabrics.org'
Subject: RE: [nvmewin] code review: crash dump & hibernation support

Objective #1: crash dump when a blue screen occurs or when manually triggered, for all SKUs (server or client). Objective #2: hibernate and then resume, on all client SKUs. Plus minor cleanup and fixes along the way.

High-level Summary: The major change is to enable ntldrDump mode so that during crash dump or hibernation, system memory can be dumped to pre-allocated block locations (the MEMORY.DMP or HIBERFIL.SYS file). The same nvme.sys driver is reloaded as another image into a strictly dumbed-down environment where the usual APIs are no longer available. The next challenge is to re-initialize the controller properly after resuming from the hibernation image and to continue serving as the system boot disk. I need to give credit to earlier contributors (Intel, LSI, IDT and others) for having laid the solid building blocks needed for dump mode. This change solves the buffer issue and introduces a different code path for IOs in dump mode.

Detailed Briefs:
- nvmeInit.c: changes to manage buffers and IO queues in dump mode.
- nvmeIo.c: minor tweak for dump mode, where only the boot CPU is available.
- nvmeSnti.c: fix an existing bug where the FLUSH cmd should include an NSID (all NSs in this case).
- nvmeStat.c: helper function change because some timer-related APIs are not available in dump mode.
- nvmeStd.c: A: refactored code into NVMeWaitForCtrlRDY() for waiting after setting CC.EN = 1. B: introduced the same waiting logic after clearing CC.EN = 0. C: during power up, Reset will issue a DPC to call RecoveryDpcRoutine() to re-initialize the driver; the above A+B steps are introduced there similarly.

Using the trunk version of the code, the hardware I have always timed out on initialization. I have had this fix since this spring. I think it is the same issue listed in Kwok's laundry list, but I would need Judy to verify whether the issue she found is fixed by this change. (Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset.)
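As a side note on the nvmeSnti.c item above: the NVMe specification defines 0xFFFFFFFF as the broadcast namespace ID, so a flush meant to cover all namespaces presumably ends up being built along these lines. The structure and constant names here are assumed for illustration only, not taken from the actual patch:

    /* Sketch only: Flush that targets every namespace on the controller */
    pNvmeCmd->CDW0.OPC = NVM_CMD_FLUSH;    /* NVM command set Flush opcode (0x00); name assumed */
    pNvmeCmd->NSID     = 0xFFFFFFFF;       /* broadcast NSID: apply to all namespaces */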
From: Alex Chang [mailto:Alex.Chang at pmcs.com]
Sent: Tuesday, November 19, 2013 5:04 PM
To: Uma Parepalli; Yong Chen
Subject: RE: [nvmewin] code review: crash dump & hibernation support

Hi Yong,
Could you please summarize the changes you made? Normally, we list the changes under each file as high-level briefs.
Thanks,
Alex

From: Uma Parepalli [mailto:uma.parepalli at skhms.com]
Sent: Tuesday, November 19, 2013 4:05 PM
To: Alex Chang
Subject: RE: [nvmewin] code review: crash dump & hibernation support

Is there a change log file or something that explains what changes are made without looking at the code?
Uma

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang
Sent: Tuesday, November 19, 2013 4:05 PM
To: nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] code review: crash dump & hibernation support

Hi all,
Please find the attached code changes made by Yong Chen from Huawei. Please review the changes, test them accordingly, and provide your feedback.
Thanks,
Alex

From: Yong Chen [mailto:Yong.sc.Chen at huawei.com]
Sent: Tuesday, November 19, 2013 3:59 PM
To: nvmewin at lists.openfabrics.org; Alex Chang
Subject: RE: code review: crash dump & hibernation support

Hi, all,
Please download the source code from the link in the attached email (you need to have Silverlight installed). Or, to save the trouble for everyone... Alex, could you reply back with the code change you downloaded? The test log is attached; see below for the list of tests.
Thanks,
Yong

From: Yong Chen
Sent: Monday, November 18, 2013 4:13 PM
To: 'nvmewin at lists.openfabrics.org'; 'Alex Chang'

Hi, Alex and all,
Here is the code change to support crash dump and hibernation. Please review. Hopefully we can wrap up this week, before the meeting. Using the trunk version I had problems with initialization as well; the trunk version would time out on me. I think it is the same CSTS.RDY issue Judy raised. I refactored a bit and fixed it, at least for the hardware I have.
Thanks,
Yong

Tests that I have gone thru:
1. manual crash dump. KD>.crash and then reboot and KD -z -v MEMORY.DMP, do "!process -1 f"
2. manual hibernation or pwrtest /sleep /c:10 /d:30 /p:30 /s:4
3. SCSICompliance (log attached).
4. stresses: iostress, sdstress (log attached)
5. hibernation has been tested on win8.0 as well, but not extensively.
6. Hibernation has also been tested with both the bootable OptionROM and the newly released UEFI driver.
7. All tests were conducted on x64 platforms, involving 3 different pieces of hardware, plus another Intel MB which can't do hibernation (no S4).

________________________________
Yong Chen
Storage Architect
Huawei Technologies Co., Ltd
Office: 408-330-5482 Mobile: 425-922-0658
Email: yong.sc.chen at huawei.com
2330 Central Expressway Santa Clara, CA 95050
http://www.huawei.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg Type: image/jpeg Size: 6737 bytes Desc: image001.jpg URL:

From Yong.sc.Chen at huawei.com Thu Jan 9 16:46:36 2014
From: Yong.sc.Chen at huawei.com (Yong Chen)
Date: Fri, 10 Jan 2014 00:46:36 +0000
Subject: [nvmewin] code review: crash dump & hibernation support
In-Reply-To: <95e3ca489fb541e981a852b86ea45fc4@DM2PR07MB285.namprd07.prod.outlook.com>
References: <02EC085151D99A469E06988E94FEBCDB1CE52822@sjceml501-mbs.china.huawei.com> <645bf3af.00001a88.00000007@n9228> <02EC085151D99A469E06988E94FEBCDB1CE53382@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C349@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE537B2@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C70B@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE53AA3@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE53C99@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5EDD4@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5F8B5@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE6D9FC@SJCEML701-CHM.china.huawei.com> <95e3ca489fb541e981a852b86ea45fc4@DM2PR07MB285.namprd07.prod.outlook.com>
Message-ID: <02EC085151D99A469E06988E94FEBCDB1CE6DD0B@SJCEML701-CHM.china.huawei.com>

Hi, Rick,
Sorry, I missed that. Alex, once final, let's uppercase this macro definition DumpBufferSize. Maybe connected with underscores?
Thanks,
Yong

From: Knoblaugh, Rick [mailto:Rick.Knoblaugh at lsi.com]
Sent: Thursday, January 09, 2014 11:16 AM
To: Yong Chen; Alex Chang; Foster, Carolyn D
Cc: nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] code review: crash dump & hibernation support

Hi Yong,
I have been checking it out. No issues so far. In your minor changes, please put the Dumpbuffersize #define in uppercase.
Thanks,
-Rick
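For what it's worth, the uppercase, underscore-separated rename being discussed might end up looking roughly like this; the exact name and the 5 x 64KB value (taken from Yong's sizing note earlier in the thread) are illustrative, not the final patch:

    /* Hypothetical rename of the DumpBufferSize macro discussed above */
    #define DUMP_BUFFER_SIZE    (5 * 64 * 1024)    /* 5 x 64KB reserved for dump/hibernation mode */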
From Alex.Chang at pmcs.com Thu Jan 9 16:48:55 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Fri, 10 Jan 2014 00:48:55 +0000
Subject: [nvmewin] code review: crash dump & hibernation support
In-Reply-To: <02EC085151D99A469E06988E94FEBCDB1CE6DD0B@SJCEML701-CHM.china.huawei.com>
References: <02EC085151D99A469E06988E94FEBCDB1CE52822@sjceml501-mbs.china.huawei.com> <645bf3af.00001a88.00000007@n9228> <02EC085151D99A469E06988E94FEBCDB1CE53382@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C349@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE537B2@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C70B@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE53AA3@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE53C99@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5EDD4@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5F8B5@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE6D9FC@SJCEML701-CHM.china.huawei.com> <95e3ca489fb541e981a852b86ea45fc4@DM2PR07MB285.namprd07.prod.outlook.com> <02EC085151D99A469E06988E94FEBCDB1CE6DD0B@SJCEML701-CHM.china.huawei.com>
Message-ID:

Sure, I can make the modification later.
Alex

From: Yong Chen [mailto:Yong.sc.Chen at huawei.com]
Sent: Thursday, January 09, 2014 4:47 PM
To: Knoblaugh, Rick; Alex Chang; Foster, Carolyn D
Cc: nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] code review: crash dump & hibernation support

Hi, Rick,
Sorry, I missed that. Alex, once final, let's uppercase this macro definition DumpBufferSize. Maybe connected with underscores?
Thanks,
Yong
Thanks, Yong From: Knoblaugh, Rick [mailto:Rick.Knoblaugh at lsi.com] Sent: Thursday, January 09, 2014 11:16 AM To: Yong Chen; Alex Chang; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi Yong, I have been checking it out. No issues so far. In your minor changes, please put Dumpbuffersize #define in uppercase. Thanks, -Rick From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Wednesday, January 08, 2014 7:54 PM To: Alex Chang; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support Hi, Alex and all, Where are we now on this review? Alex do you need another explicit approval from gatekeepers? Thanks, Yong From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Thursday, December 19, 2013 4:26 PM To: Alex Chang; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support Hi, all, I finished x86 tests on Win 7 & 8. I hope we are wrapping up the review soon? The revision to previous changes is very minor. Alex, What is your release schedule for 1.3? I will be mostly oof during the next 2 weeks. Thanks, Yong From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Monday, December 16, 2013 6:33 PM To: Yong Chen; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi all, I am forwarding the patch Yong sent out today. Please find them in the attachment. Should you have any questions/feedbacks, please rely this message. Please review the changes and test it with your devices as well. Thanks, Alex From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Monday, December 16, 2013 3:36 PM To: Foster, Carolyn D; Alex Chang Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi, Alex and all, I have uploaded the changes to the web and Alex will reply back with zipped source code when received it. Thank you, Alex! The recursive callstack is now fixed, which was a serious offense. Thanks Carolyn, for having brought it up. I also have addressed all points raised by Alex or Rick. The other major topic is memory usage in dump mode: I have reserved 5x 64KB, the actually usage is about 2.2x 64KB. The bulk of them is used by LunExt (0x11000=68KB), almost half. The variable size of all IO queues are very tiny, along other normal usage. 4x 64KB is probably enough, but to double this 2.2 to 5 is more prudent choice, IMO and that is the value that has been tested all along. I have finished with SDStress and ioMeter tests (including win7 x86) and pwrtest. Two more tests remain: A: installing Win8 now; B: Due to the destructive-nature of SCSICompliance test, I will leave that to the last because I have only 1 controller to test with. I don��t expect any changes from the initialization refactoring would affect SCSI so I am sending out now so you guys can review while I am finishing up the remaining tests. Please let me or Alex know if you need the binaries as well. 
Thanks, Yong From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Tuesday, November 26, 2013 2:13 PM To: Foster, Carolyn D; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support See inline. Thanks, Yong From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Tuesday, November 26, 2013 12:09 PM To: Yong Chen; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support See my comments below in green. Also I found two more issues: According to MSDN StorportEnablePassiveInitialization will fail if the system does not support DPCs, which is the case in crashdump mode. We should have a check for ntldrDump in NVMeInitialize and call NVMePassiveInitialize directly if it��s true. [Yong>] are you talking about this block? You are right, that is exactly the case. /* * When Crashdump/Hibernation driver is being loaded, need to complete the * entire initialization here. In the case of normal driver loading, enable * passive initialization and let NVMePassiveInitialization handle the rest * of the initialization */ if (pAE->ntldrDump == FALSE) { �� /* Call StorPortPassiveInitialization to enable passive init */ StorPortEnablePassiveInitialization(pAE, NVMePassiveInitialize); The other potential issue is that in NVMeCallArbiter, in ntldrDump mode we will call NVMeRunning. This could cause a very deep call stack since NVMeCallArbiter is also called from NVMeRunning. In hibernate mode we have limited memory and this could cause issues. I suggest making modifications to NVMeRunningStartAttempt around the NVMeRunning call. It could be a while-loop that would call NVMeRunning if ntldrDump is true, with a stall execution, that would loop until the nextDriverState is start complete, or failed. [Yong>] you are right about the call stack. In the while loop we could create a new separate mini state-machine for the dump mode initialization. Given more time I can experiment with it. Carolyn From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Monday, November 25, 2013 11:23 PM To: Foster, Carolyn D; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi, Carolyn, Thanks for the input. 1. There is no DPC or interrupt in dump mode. It is the polling method to call the same routine. Could you elaborate why you don��t expect DPCs to work properly on Win7? I couldn��t find where polling mode is set in the hibernate path, and I was also expecting to see the DPC initialization calls wrapped in ntldrDump checks. Specifically in NVMePassiveInitialize. [Yong>] in dump mode the driver model behaves as if in polling mode. Not something explicit to set. NVMePassiveInitialize (&DPC initialization calls) won��t be issued in dump mode. see first comment. 2. The small buffer is reserved during normal driver load. In dump mode you can��t allocate anything (doc says paltry 64KB). That dump buffer is guaranteed when entering dump mode. Regardless, the same code path would fail not BSOD if BUFFER is NULL as in any other cases. I��m specifically asking about line 189 in nvmeInit. It seems possible for that line of code, the StorportAllocateContiguousMemory to be called in the crashdump path. Can you confirm that function call in crashdump/hibernate mode simply fails? 
If it does then I agree that a null buffer here will likely not crash the system. [Yong>] I see. This new API call is recently merged change. I have not got into this failure case. From my earlier experience with other allocation function (not this new API), all these APIs simply fail with NULL returned. 3. I added the code because for a long time I was and still am dealing with engineer-sample hardware not finished products. After several revisions, they are much more mature now than earlier this year. This is how I make them in consistent ready mode. The rationale for cycling of CC.EN bit during resuming from hibernation is just to mirror normal Initialization step. The timeout is predefined value STORPORT_TIMER_CB_us. For mal-function hardware, the same logic would already have experienced same problems in NVMeInitialize() at raised level. The best way to decide is to test on different implementation of NVMe devices from various vendors and see whether we need to tune these values. My concern here is that the RecoveryDpc routine is not just called during hibernate, it is called during runtime if windows needs to reset the controller. I��m concerned with how these changes impact the normal runtime driver. Did you test this function during runtime? What happens if the maximum time is spent in it? [Yong>] yes, it will be called by windows if hardware misbehaves, trying to reset the hardware. I didn��t try to simulate this scenario, not easy with real hardware. With this change, this same reset is being exercised every time in resuming from hibernation. we can find out by turning on fault injections, however it is not something we have been running regularly. Hope these help, Thanks, Yong From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Monday, November 25, 2013 3:25 PM To: Judy Brock-SSI; Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi Yong, I also have some feedback and questions about the code changes. 1. I am surprised that there are no ntldrDump checks around the DPC initialization and calls. I wouldn��t have expected the DPCs to work properly on the Windows 7 systems. 2. In function NVMeAllocateMem, the ntldrDump check is wrapped such that if no buffer is allocated from the DumpBuffer, the code path could end up calling StorPortAllocateContiguousMemory in ntldrDump mode. Will this cause a BSOD? Or will it just fail? 3. In RecoveryDpcRoutine new code has been added above the reset adapter call not related to ntldrDump. If the controller isn��t responding, this additional delay time could cause a DPC watchdog timeout bugcheck if the maximum time allowed for a DPC is exceeded. I have some concerns about this new code, what was your reasoning for adding it? Thanks, Carolyn From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Judy Brock-SSI Sent: Monday, November 25, 2013 3:18 PM To: Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support Hi Yong, Ensuring CSTS.RDY == 0 before setting CC.EN to ��1�� is required by the spec �C it is a necessary part of enabling the adapter. Therefore it is not overloading the function and should be kept together. It is a sanity check that needs to take place immediately before writing CC.EN. They should not be separated by other blocks of code. That is not flexibility, it is a design flaw in my opinion. 
I don��t see how it could possibly result in any destabilization of major features to make sure the RDY bit is not on before setting EN bit in the routine which is dedicated to enabling the controller. If you are worried about removing other checks for CSTS.RDY == 0, then by all means, leave them in. It doesn��t a hurt a thing to have those xtra check points in the two non-runtime paths you mentioned. Conversely, it does potentially hurt to not have an explicit check in the NVMeEnableAdapter itself. As I mentioned previously, there is no check in the PassiveInitialization path for CSTS.RDY == 0 before calling NVMeEnableAdapter in the current code; so we are still violating the spec if we don��t enhance your current changes one way or the other. I say, put the fix in �C it��s fairly trivial. Thanks, Judy From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Monday, November 25, 2013 10:53 AM To: Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi, Judy, Thanks for your input. I agreed with what you trying to achieve. I also think that block of cycling CC.EN 1->0 can be further refactored into one standalone helper function, to be called by RecoveryDpcRoutine() and NVMeInitialize(). Embedding into NVMeEnableAdapter() would make that function overloaded more than its name & meant to do, and losing the flexibility. Plus they are separated by other blocks of code and would materially change the code, currently for no obvious reason. I would try the next guy to the right thing, it is always hard to fix something, if ever happened in the future. Unless the test is completely reset for this check-in, I would delay this further improvement of refactoring to next time, to keep it separate from and avoid destabilizing this major feature work. What do other folks think? Thanks, Yong From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com] Sent: Sunday, November 24, 2013 8:20 AM To: Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support Hi Yong, I suggest you change the function NVMeEnableAdapter to check for CSTS.RDY == 0 before setting CC.EN to ��1��. You added a check for this some lines above the call in NVMeInitialize. But I think we should avoid decoupling the check for CSTS.RDY == 0 from the controller enable itself. If it is not in the same function, it can be overlooked. For exampley, there is another call to NVMeEnableAdapter in PassiveIntialize that doesn��t check before calling. I would modify NVMeEnableAdapter as below (additions/changes in highlight), change the prototype, and have callers check for success or failure: * @return BOOLEAN * TRUE - If Adapter is enabled correctly * FALSE - If anything goes wrong ******************************************************************************/ BOOLEAN NVMeEnableAdapter( PNVME_DEVICE_EXTENSION pAE ) { PQUEUE_INFO pQI = &pAE->QueueInfo; NVMe_CONTROLLER_CONFIGURATION CC = {0}; NVMe_CONTROLLER_STATUS CSTS = {0}; ULONG PollMax = pAE->uSecCrtlTimeout / MAX_STATE_STALL_us; ULONG PollCount; /* * Program Admin queue registers before enabling the adapter: * Admin Queue Attributes */ StorPortWriteRegisterUlong( pAE, (PULONG)(&pAE->pCtrlRegister->AQA), (pQI->pSubQueueInfo->SubQEntries - 1) + ((pQI->pCplQueueInfo->CplQEntries - 1) << NVME_AQA_CQS_LSB)); . . . 
(further down): StorPortDebugPrint(INFO, "NVMeEnableAdapter: Setting EN...\n"); /* * Set up Controller Configuration Register */ /* After reset, we must wait for CSTS.RDY == 0 before setting CC.EN to 1 */ for (PollCount = 0; PollCount < PollMax; PollCount++) { CSTS.AsUlong = StorPortReadRegisterUlong(pAE, (PULONG)(&pAE->pCtrlRegister->CSTS.AsUlong)); if (CSTS.RDY == 0) { /* Move on if RDY bit is cleared */ break; } NVMeStallExecution(pAE, MAX_STATE_STALL_us); } if (CSTS.RDY != 0) { /* If RDY bit won't clear we can't enable the adapter */ return FALSE; } CC.EN = 1; CC.CSS = NVME_CC_NVM_CMD; CC.MPS = (PAGE_SIZE >> NVME_MEM_PAGE_SIZE_SHIFT); CC.AMS = NVME_CC_ROUND_ROBIN; CC.SHN = NVME_CC_SHUTDOWN_NONE; CC.IOSQES = NVME_CC_IOSQES; CC.IOCQES = NVME_CC_IOCQES; StorPortWriteRegisterUlong(pAE, (PULONG)(&pAE->pCtrlRegister->CC), CC.AsUlong); return TRUE; } /* NVMeEnableAdapter */ Thanks, Judy From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Friday, November 22, 2013 1:48 PM To: Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support Hi, everyone, I hope many are busy testing the changes on your devices. If you have any feedback to share, I��d very appreciate it. Holiday is upon us and we��d like to wrap up this much delayed soon. Thanks, Yong From: Yong Chen Sent: Wednesday, November 20, 2013 12:52 AM To: 'Alex Chang'; Uma Parepalli Cc: 'nvmewin at lists.openfabrics.org' Subject: RE: [nvmewin] code review: crash dump & hibernation support Object #1: crash dump when blue screen or manual triggered, for all SKUs (server or client). Object #2: hibernate and then resume on all client SKUs. + Minor cleaning up and fixes along the way. High-level Summary: The major change is to enable ntldrDump mode so that during crash dump or hibernation, the system memory can be dumped to pre-allocated block locations (MEMORY.DMP or HIBERFIL.SYS file). The same nvme.sys driver will be reloaded as another image into strict dumbed-down environment where usual APIs are not available anymore. The next challenge is to re-initialize the controller properly after having resumed from hibernation image and to continue serve as system boot disk. I need to give credits to earlier contributors (Intel, LSI, IDT and others) for having laid solid building blocks needed for the dump mode. This change solved the buffer issue and introduced a different code path for IOs in dump mode. Detailed Briefs: �� nvmeInit.c changes to manage buffers and IO queues in dump modes. �� nvmeIo.c minor tweak in dump mode where only boot-CPU is available �� nvmeSnti.c fix existing bug where FLUSH cmd should include NSID (all NSs in this case). �� nvmeStat.c helper function change due to some timer related API not available in dump mode. �� nvmeStd.c A: refactored code into NVMeWaitForCtrlRDY() for waiting after setting CC.EN = 1. B: introduced same waiting logic after clearing CC.EN = 0. C: during power up, Reset will issue DPC to call RecoveryDpcRoutine() to re-initialize the driver, similarly the above A+B steps are introduced. Using trunk version code, the hardware I have always timed out on initialization. I have had this fix since this spring. I think it is the same issue listed in Kwok��s laundry list. But I would need Judy to verify whether the issue she found is fixed or not by this change. (Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset.) 
From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Tuesday, November 19, 2013 5:04 PM To: Uma Parepalli; Yong Chen Subject: RE: [nvmewin] code review: crash dump & hibernation support
Hi Yong, Could you please summarize the changes you made? Normally, we list the changes under each file as high-level briefs. Thanks, Alex
From: Uma Parepalli [mailto:uma.parepalli at skhms.com] Sent: Tuesday, November 19, 2013 4:05 PM To: Alex Chang Subject: RE: [nvmewin] code review: crash dump & hibernation support
Is there a change log file or something that explains what changes are made without looking at the code? Uma
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Tuesday, November 19, 2013 4:05 PM To: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support
Hi all, Please find the attached code changes made by Yong Chen from Huawei. Please review the changes, test them accordingly, and provide your feedback. Thanks, Alex
From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Tuesday, November 19, 2013 3:59 PM To: nvmewin at lists.openfabrics.org; Alex Chang Subject: RE: code review: crash dump & hibernation support
Hi, all, Please download the source code from the link in the attached email (you need to have Silverlight installed). Or, to save the trouble for everyone: Alex, could you reply back with the code change you downloaded? The test log is attached; see below for the list of tests. Thanks, Yong
From: Yong Chen Sent: Monday, November 18, 2013 4:13 PM To: 'nvmewin at lists.openfabrics.org'; 'Alex Chang'
Hi, Alex and all, Here is the code change to support crash dump and hibernation. Please review. Hopefully we can wrap up by this week before the meeting. Using the trunk version I had a problem with the initialization as well. The trunk version would time out on me. I think it is the same CSTS.RDY issue Judy raised. I refactored a bit and fixed it, at least for the hardware I have. Thanks, Yong
Tests that I have gone through:
1. Manual crash dump: KD> .crash, then reboot and KD -z -v MEMORY.DMP, do "!process -1 f".
2. Manual hibernation, or pwrtest /sleep /c:10 /d:30 /p:30 /s:4.
3. SCSICompliance (log attached).
4. Stresses: iostress, sdstress (log attached).
5. Hibernation has been tested on Win 8.0 as well, but not extensively.
6. Hibernation has also been tested with both the bootable OptionROM and the newly released UEFI driver.
7. All tests were conducted on x64 platforms, involving 3 different pieces of hardware, plus another Intel MB which can't do hibernation (no S4).
________________________________
Yong Chen Storage Architect Huawei Technologies Co., Ltd [Company_logo] Office: 408-330-5482 Mobile: 425-922-0658 Email: yong.sc.chen at huawei.com 2330 Central Expressway Santa Clara, CA 95050 http://www.huawei.com
This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!
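As a rough illustration of the nvmeSnti.c item in the change summary above (the FLUSH translation should carry a Namespace ID), a hedged sketch follows. The structure and field names are assumptions modeled on common NVMe command layouts, not the driver's actual types, and whether the fix used the broadcast NSID or the mapped namespace's ID is likewise an assumption:

#include <string.h>

#define NVME_FLUSH_OPCODE    0x00         /* Flush opcode in the NVM command set */
#define NVME_ALL_NAMESPACES  0xFFFFFFFFu  /* conventional broadcast NSID */

/* Hypothetical command layout, for the sketch only */
typedef struct {
    unsigned char Opcode;
    unsigned int  NSID;
} FLUSH_CMD_SKETCH;

static void BuildFlushSketch(FLUSH_CMD_SKETCH *cmd)
{
    memset(cmd, 0, sizeof(*cmd));
    cmd->Opcode = NVME_FLUSH_OPCODE;
    cmd->NSID = NVME_ALL_NAMESPACES;      /* "all NSs in this case", per the brief */
}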
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 6737 bytes Desc: image001.jpg URL: 
From Kwok.Kong at pmcs.com Fri Jan 10 13:26:03 2014 From: Kwok.Kong at pmcs.com (Kwok Kong) Date: Fri, 10 Jan 2014 21:26:03 +0000 Subject: [nvmewin] OFA NVMe Driver 2014 Planning meeting Message-ID: <03D88B383FA04244AA514AA931F7B1290D27052E@BBYEXM01.pmc-sierra.internal>
Agenda 1-29-14:
- Review 1.3 release status
- Plan for the 2014 release
- Review action items
- AOB
Release 1.3 features/fixes:
- Hibernation (Yong Chen - Huawei)
- NUMA group support in core enumeration (Alex Chang - PMC)
- Core-MSI vector queue mapping issues
- CMD_ENTRY synchronization issues
- Remove using mask bits as core index to allocate core tables (Alex Chang - PMC)
- Paramlist length problem (Alex Chang - PMC)
- NVMeInitAdminQueues return value (Alex Chang - PMC)
- Performance issue in Windows 2012 and Windows 8 (Alex Chang - PMC)
- freeQList access (Alex Chang - PMC)
- PRP list building problem (Alex Chang - PMC)
- Remove #define for CHATHAM2 (Carolyn)
- Learning of CPU core to Vector failure handling (Carolyn)
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset (Dharani @ SanDisk)
- Controller reset does not handle all cases (Dharani @ SanDisk)
- Orphaned requests (Dharani @ SanDisk)
- BUILDIO (Rick)
The following features have been deferred to 2014:
- End-to-end protection support
- Driver tracing feature - Robert Randall (Micron)
- Migrate to VS2013, WDK 8.1
- WHQL Certification
Features that are not supported currently (NVMe 1.1 support):
- Multi-path
- SGL
- Get/Set Features update
- Autonomous power state transition
- Host Identifier
- Reservation Notification Mask
- Reservation Persistence
- Identify structure update
- Write Zeroes command
Actions:
1. Check with the Intel team to remove #define for CHATHAM2 in the source code - Done, OK to remove - Carolyn Foster (Intel)
2. Set up a meeting in Jan 2014 to discuss the release plan for 2014 - This meeting - Kwok Kong (PMC)
3. Send out test requirements for submitting a patch - Done - Kwok Kong (PMC)
4. Send out memory/core dump procedure to debug the driver - Yong Chen (Huawei)
-+-----+-----+-----+-----+-----+-----+-----+-----+-
Kwok Kong has invited you to attend an online meeting using Microsoft® Office Communications Server. Join the meeting Make sure the Office Live Meeting client is installed before the meeting: * I am connecting from inside the PMC-Sierra network * I am connecting from outside the PMC-Sierra network AUDIO INFORMATION To join a meeting from your phone, dial in using the following information: Phone: Burnaby Ext 6026 [English, French] Phone: +1 (888) 828-7722 [English, Spanish, French] Phone: +1 (604) 415-6026 [English, French] Find a local phone number for your region Conference ID: 8012327 Passcode: Passcode is not required. Note: If you have an account on this corporate network, use your PIN to join. Have you set your PIN? TROUBLESHOOTING Unable to join the meeting?
Start Office Live Meeting and join the meeting with the following information: Meeting ID: 991022eb640047fc81c90c2e770c3a14 Entry Code: 3284 Location: meet:sip:kwok.kong at pmcs.com;gruu;opaque=app:conf:focus:id:991022eb640047fc81c90c2e770c3a14%3Fconf-key=3284 If you still cannot enter the meeting, contact support: * Inside the PMC-Sierra network * Outside the PMC-Sierra network NOTICE Office Live Meeting can be used to record meetings. By participating in this meeting, you agree that your communications may be monitored or recorded at any time during the meeting. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 5732 bytes Desc: not available URL: 
From james.p.freyensee at intel.com Tue Jan 14 17:07:42 2014 From: james.p.freyensee at intel.com (Freyensee, James P) Date: Wed, 15 Jan 2014 01:07:42 +0000 Subject: [nvmewin] Set Features- LBA Range Type- Mandatory or optional? Message-ID: <2D98093777D3FD46A36253F35FE9D6938917615E@ORSMSX109.amr.corp.intel.com>
This question is based on an NVMe 1.0.e spec and code base. I have what I believe is a 1.0.e code base because about 6 months ago I asked this list how to retrieve such a code base, and went ahead and pulled the code based on the feedback. My question concerns "Set Features- LBA Range Type". This is an optional command according to the spec. However, the code base I retrieved acts as if it is a mandatory command for an NVMe device set up with 1 namespace. The NVMe driver will issue an "NVMeDriverFatalError" if the device does not implement it well enough to at least return 0 successfully for "Number of LBA Ranges" in the NVMe completion packet. I'm curious as to why this is and, if this is an optional command, why the NVMe driver:
* issues it in the first place, or
* in the event of an error, like Invalid Command Opcode, or in the event the command is not implemented, why the driver cannot be a bit more flexible and move on instead of stopping.
Thanks! Jay
-------------- next part -------------- An HTML attachment was scrubbed... URL: 
From carolyn.d.foster at intel.com Tue Jan 28 09:07:27 2014 From: carolyn.d.foster at intel.com (Foster, Carolyn D) Date: Tue, 28 Jan 2014 17:07:27 +0000 Subject: [nvmewin] code review: crash dump & hibernation support In-Reply-To: References: <02EC085151D99A469E06988E94FEBCDB1CE52822@sjceml501-mbs.china.huawei.com> <645bf3af.00001a88.00000007@n9228> <02EC085151D99A469E06988E94FEBCDB1CE53382@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C349@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE537B2@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C70B@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE53AA3@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE53C99@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5EDD4@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5F8B5@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE6D9FC@SJCEML701-CHM.china.huawei.com> <95e3ca489fb541e981a852b86ea45fc4@DM2PR07MB285.namprd07.prod.outlook.com> <02EC085151D99A469E06988E94FEBCDB1CE6DD0B@SJCEML701-CHM.china.huawei.com> Message-ID:
I am ok with the changes, if the testing is all complete.
From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Thursday, January 09, 2014 5:49 PM To: Yong Chen; Knoblaugh, Rick; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support
Sure, I can make the modification later. Alex
From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Thursday, January 09, 2014 4:47 PM To: Knoblaugh, Rick; Alex Chang; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support
Hi, Rick, Sorry, I missed that. Alex, once this is final, let's uppercase this macro definition DumpBufferSize. Maybe connected with underscores? Thanks, Yong
From: Knoblaugh, Rick [mailto:Rick.Knoblaugh at lsi.com] Sent: Thursday, January 09, 2014 11:16 AM To: Yong Chen; Alex Chang; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support
Hi Yong, I have been checking it out. No issues so far. In your minor changes, please put the Dumpbuffersize #define in uppercase. Thanks, -Rick
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Wednesday, January 08, 2014 7:54 PM To: Alex Chang; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support
Hi, Alex and all, Where are we now on this review? Alex, do you need another explicit approval from the gatekeepers? Thanks, Yong
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Thursday, December 19, 2013 4:26 PM To: Alex Chang; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support
Hi, all, I finished the x86 tests on Win 7 & 8. I hope we are wrapping up the review soon? The revision to the previous changes is very minor. Alex, what is your release schedule for 1.3? I will be mostly OOF during the next 2 weeks. Thanks, Yong
From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Monday, December 16, 2013 6:33 PM To: Yong Chen; Foster, Carolyn D Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support
Hi all, I am forwarding the patch Yong sent out today. Please find it in the attachment. Should you have any questions/feedback, please reply to this message. Please review the changes and test them with your devices as well. Thanks, Alex
From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Monday, December 16, 2013 3:36 PM To: Foster, Carolyn D; Alex Chang Cc: nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support
Hi, Alex and all, I have uploaded the changes to the web, and Alex will reply back with the zipped source code once he has received it. Thank you, Alex! The recursive call stack is now fixed, which was a serious offense. Thanks, Carolyn, for having brought it up. I have also addressed all the points raised by Alex or Rick. The other major topic is memory usage in dump mode: I have reserved 5x 64KB; the actual usage is about 2.2x 64KB. The bulk of it is used by LunExt (0x11000 = 68KB), almost half. The variable-size IO queues are all very tiny, along with other normal usage. 4x 64KB is probably enough, but doubling the 2.2 to 5 is the more prudent choice, IMO, and that is the value that has been tested all along. I have finished the SDStress and ioMeter tests (including Win7 x86) and pwrtest.
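As a side note to the memory-usage numbers above, the reserve-then-carve approach can be sketched roughly as follows. This is illustrative only: the macro name DUMP_BUFFER_SIZE just reflects Rick's request to uppercase DumpBufferSize, and the helper name and device-extension fields here are hypothetical, not the driver's actual code:

/*
 * Sketch only: a buffer reserved at normal load time is carved up in
 * ntldrDump mode, where StorPort allocations are no longer available.
 * pDumpBuffer (PUCHAR) and DumpBufferUsed are assumed fields.
 */
#define DUMP_BUFFER_SIZE (5 * 64 * 1024)   /* 5 x 64KB, per the discussion above */

PVOID DumpModeAlloc(PNVME_DEVICE_EXTENSION pAE, ULONG size)
{
    PUCHAR p;

    if (pAE->DumpBufferUsed + size > DUMP_BUFFER_SIZE) {
        return NULL;                       /* fail cleanly rather than BSOD */
    }
    p = pAE->pDumpBuffer + pAE->DumpBufferUsed;
    pAE->DumpBufferUsed += size;
    return p;
}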
Two more tests remain: A: installing Win8 now; B: due to the destructive nature of the SCSICompliance test, I will leave that for last because I have only 1 controller to test with. I don't expect that any changes from the initialization refactoring would affect SCSI, so I am sending this out now so you guys can review while I finish up the remaining tests. Please let me or Alex know if you need the binaries as well. Thanks, Yong
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Yong Chen Sent: Tuesday, November 26, 2013 2:13 PM To: Foster, Carolyn D; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support
See inline. Thanks, Yong
From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Tuesday, November 26, 2013 12:09 PM To: Yong Chen; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support
See my comments below in green. Also I found two more issues: According to MSDN, StorPortEnablePassiveInitialization will fail if the system does not support DPCs, which is the case in crashdump mode. We should have a check for ntldrDump in NVMeInitialize and call NVMePassiveInitialize directly if it's true. [Yong>] Are you talking about this block? You are right, that is exactly the case.
    /*
     * When Crashdump/Hibernation driver is being loaded, need to complete the
     * entire initialization here. In the case of normal driver loading, enable
     * passive initialization and let NVMePassiveInitialization handle the rest
     * of the initialization
     */
    if (pAE->ntldrDump == FALSE) {
        ...
        /* Call StorPortPassiveInitialization to enable passive init */
        StorPortEnablePassiveInitialization(pAE, NVMePassiveInitialize);
The other potential issue is that in NVMeCallArbiter, in ntldrDump mode we will call NVMeRunning. This could cause a very deep call stack, since NVMeCallArbiter is also called from NVMeRunning. In hibernate mode we have limited memory and this could cause issues. I suggest making modifications to NVMeRunningStartAttempt around the NVMeRunning call. It could be a while-loop that would call NVMeRunning if ntldrDump is true, with a stall execution, and that would loop until the nextDriverState is start complete or failed. [Yong>] You are right about the call stack. In the while loop we could create a new, separate mini state machine for the dump-mode initialization. Given more time I can experiment with it. Carolyn
From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Monday, November 25, 2013 11:23 PM To: Foster, Carolyn D; Judy Brock-SSI; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support
Hi, Carolyn, Thanks for the input. 1. There is no DPC or interrupt in dump mode. It is the polling method that calls the same routine. Could you elaborate on why you don't expect DPCs to work properly on Win7? I couldn't find where polling mode is set in the hibernate path, and I was also expecting to see the DPC initialization calls wrapped in ntldrDump checks, specifically in NVMePassiveInitialize. [Yong>] In dump mode the driver model behaves as if in polling mode. It is not something explicit to set. NVMePassiveInitialize (and the DPC initialization calls) won't be issued in dump mode; see the first comment. 2. The small buffer is reserved during normal driver load. In dump mode you can't allocate anything (the doc says a paltry 64KB).
That dump buffer is guaranteed when entering dump mode. Regardless, the same code path would fail, not BSOD, if the buffer is NULL, as in any other case. I'm specifically asking about line 189 in nvmeInit. It seems possible for that line of code, the StorPortAllocateContiguousMemory call, to be reached in the crashdump path. Can you confirm that this function call in crashdump/hibernate mode simply fails? If it does, then I agree that a null buffer here will likely not crash the system. [Yong>] I see. This new API call is a recently merged change. I have not run into this failure case. From my earlier experience with other allocation functions (not this new API), all these APIs simply fail with NULL returned. 3. I added the code because for a long time I was, and still am, dealing with engineering-sample hardware, not finished products. After several revisions, they are much more mature now than earlier this year. This is how I put them into a consistently ready state. The rationale for cycling the CC.EN bit while resuming from hibernation is just to mirror the normal initialization steps. The timeout is the predefined value STORPORT_TIMER_CB_us. For malfunctioning hardware, the same logic would already have hit the same problems in NVMeInitialize() at raised level. The best way to decide is to test on different implementations of NVMe devices from various vendors and see whether we need to tune these values. My concern here is that the RecoveryDpc routine is not just called during hibernate; it is called during runtime if Windows needs to reset the controller. I'm concerned with how these changes impact the normal runtime driver. Did you test this function during runtime? What happens if the maximum time is spent in it? [Yong>] Yes, it will be called by Windows if the hardware misbehaves, to try to reset the hardware. I didn't try to simulate this scenario; it is not easy with real hardware. With this change, this same reset is being exercised every time we resume from hibernation. We can find out by turning on fault injection; however, that is not something we have been running regularly. Hope these help, Thanks, Yong
From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Monday, November 25, 2013 3:25 PM To: Judy Brock-SSI; Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: [nvmewin] code review: crash dump & hibernation support
Hi Yong, I also have some feedback and questions about the code changes. 1. I am surprised that there are no ntldrDump checks around the DPC initialization and calls. I wouldn't have expected the DPCs to work properly on the Windows 7 systems. 2. In the function NVMeAllocateMem, the ntldrDump check is wrapped such that if no buffer is allocated from the DumpBuffer, the code path could end up calling StorPortAllocateContiguousMemory in ntldrDump mode. Will this cause a BSOD? Or will it just fail? 3. In RecoveryDpcRoutine, new code has been added above the reset-adapter call that is not related to ntldrDump. If the controller isn't responding, this additional delay time could cause a DPC watchdog timeout bugcheck if the maximum time allowed for a DPC is exceeded. I have some concerns about this new code; what was your reasoning for adding it?
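Carolyn's suggestion above about NVMeRunningStartAttempt amounts to driving the initialization state machine by polling when ntldrDump is TRUE, instead of re-entering it through NVMeCallArbiter and DPCs. A rough sketch of that idea follows; the field path DriverState.NextDriverState and the state names NVMeStartComplete and NVMeStartFailed are placeholders assumed for illustration, not necessarily the driver's actual identifiers:

/*
 * Sketch only: in dump mode, loop and poll the init state machine until the
 * start sequence completes or fails, instead of relying on DPC callbacks.
 */
if (pAE->ntldrDump == TRUE) {
    while ((pAE->DriverState.NextDriverState != NVMeStartComplete) &&
           (pAE->DriverState.NextDriverState != NVMeStartFailed)) {
        NVMeRunning(pAE);                        /* advance one init step */
        NVMeStallExecution(pAE, MAX_STATE_STALL_us);
    }
} else {
    NVMeRunning(pAE);                            /* normal path: arbiter/DPC driven */
}

This keeps the call stack flat in the memory-constrained hibernate/crashdump environment, which is the concern raised above.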
Thanks, Carolyn
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Judy Brock-SSI Sent: Monday, November 25, 2013 3:18 PM To: Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] code review: crash dump & hibernation support
Hi Yong, Ensuring CSTS.RDY == 0 before setting CC.EN to '1' is required by the spec - it is a necessary part of enabling the adapter. Therefore it is not overloading the function and should be kept together. It is a sanity check that needs to take place immediately before writing CC.EN. They should not be separated by other blocks of code. That is not flexibility, it is a design flaw in my opinion.
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.jpg Type: image/jpeg Size: 6737 bytes Desc: image002.jpg URL: 
From Alex.Chang at pmcs.com Tue Jan 28 13:47:57 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Tue, 28 Jan 2014 21:47:57 +0000 Subject: [nvmewin] code review: crash dump & hibernation support In-Reply-To: References: <02EC085151D99A469E06988E94FEBCDB1CE52822@sjceml501-mbs.china.huawei.com> <645bf3af.00001a88.00000007@n9228> <02EC085151D99A469E06988E94FEBCDB1CE53382@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C349@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE537B2@sjceml501-mbs.china.huawei.com> <36E8D38D6B771A4BBDB1C0D800158A514E46C70B@SSIEXCH-MB3.ssi.samsung.com> <02EC085151D99A469E06988E94FEBCDB1CE53AA3@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE53C99@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5EDD4@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE5F8B5@sjceml501-mbs.china.huawei.com> <02EC085151D99A469E06988E94FEBCDB1CE6D9FC@SJCEML701-CHM.china.huawei.com> <95e3ca489fb541e981a852b86ea45fc4@DM2PR07MB285.namprd07.prod.outlook.com> <02EC085151D99A469E06988E94FEBCDB1CE6DD0B@SJCEML701-CHM.china.huawei.com> Message-ID:
Thank you very much, Carolyn. Alex
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.jpg Type: image/jpeg Size: 6737 bytes Desc: image001.jpg URL: 
From carolyn.d.foster at intel.com Wed Jan 29 14:45:03 2014 From: carolyn.d.foster at intel.com (Foster, Carolyn D) Date: Wed, 29 Jan 2014 22:45:03 +0000 Subject: [nvmewin] ***UNCHECKED*** LBA Range Type Patch Message-ID:
Overview: The LBA Range Type feature is defined as optional in the spec, but the driver currently fails to complete enumeration if the Get Features request for LBA Range Type fails. Based on the serious nature of the failure, this could be a problem at the upcoming plugfest. There will likely be devices at plugfest that do not support LBA Range Type, thus causing the OFA driver to not load on these devices. Files Modified: In nvmeInit.c, NVMeSetFeaturesCompletion(), the LBA Range Type Get Features request is treated as mandatory and will cause enumeration to fail if the Get Features command is not successful. This change looks at the status codes and will allow enumeration to continue if the device returns Invalid. Password: intel123 Feedback requested by Feb. 5th. Unit Tests: Cold boot Reboot Reset while running in the OS SdStress SCSI Compliance Test (I did see what I believe are known failures in Read Capacity, Mode Select, and Write (10)) Driver Update using INF Carolyn Dase Foster Intel Corp. NVM Solutions Group Internal SSD Engineering Phone: 480-554-2421
-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: LBARangeType.zip Type: application/x-zip-compressed Size: 174861 bytes Desc: LBARangeType.zip URL: 
From Alex.Chang at pmcs.com Wed Jan 29 14:48:45 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Wed, 29 Jan 2014 22:48:45 +0000 Subject: [nvmewin] LBA Range Type Patch In-Reply-To: <6419_1391035517_52E9847D_6419_15986_1_B3A485AFDDB1DD4598621E85E8EB67A83AAF0B53@FMSMSX105.amr.corp.intel.com> References: <6419_1391035517_52E9847D_6419_15986_1_B3A485AFDDB1DD4598621E85E8EB67A83AAF0B53@FMSMSX105.amr.corp.intel.com> Message-ID:
Thank you very much, Carolyn. Hi all, Please review/test the patch and provide your feedback(s) as soon as possible. Thanks, Alex
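A rough sketch of the tolerance Carolyn describes for NVMeSetFeaturesCompletion follows, for illustration only: it checks whether a failed Get Features (LBA Range Type) completion carries one of the NVMe generic status codes that simply mean the optional feature is not implemented. The completion-entry field names (DW3.SF.SCT / DW3.SF.SC) and the exact codes tested are assumptions based on the NVMe 1.0 generic command status values, not necessarily what the actual patch checks:

/*
 * Sketch only: decide whether a failed Get Features (LBA Range Type) should
 * abort enumeration.  SCT 0 is the generic command status type; SC 0x01 is
 * Invalid Command Opcode and SC 0x02 is Invalid Field in Command.
 */
BOOLEAN IsLbaRangeTypeFailureFatal(PNVMe_COMPLETION_QUEUE_ENTRY pCplEntry)
{
    UCHAR sct = (UCHAR)pCplEntry->DW3.SF.SCT;   /* status code type */
    UCHAR sc  = (UCHAR)pCplEntry->DW3.SF.SC;    /* status code */

    if (sct == 0 && (sc == 0x01 || sc == 0x02)) {
        /* The device simply does not implement the optional feature:
         * record zero LBA ranges and let enumeration continue. */
        return FALSE;
    }
    return TRUE;   /* any other failure is still treated as fatal */
}

Under an approach like this, the existing NVMeDriverFatalError path would only be taken when the helper returns TRUE.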
Based on the serious nature of the failure, this could be a problem at the upcoming plugfest. There will likely be devices at plugfest that do not support LBA Range Type, thus causing the OFA driver to not load on these devices. Files Modified: In nvmeInit.c, NVMeSetFeaturesCompletion(), the LBA Range Type Get Features request is treated as mandatory and will cause enumeration to fail if the Get Features command is not successful. This change looks at the status codes and will allow enumeration to continue if the device returns Invalid. Password: intel123 Feedback requested by Feb. 5th. Unit Tests: Cold boot Reboot Reset while running in the OS SdStress SCSI Compliance Test (I did see what I believe are known failures in Read Capacity, Mode Select, and Write (10)) Driver Update using INF Carolyn Dase Foster Intel Corp. NVM Solutions Group Internal SSD Engineering Phone: 480-554-2421 -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.p.freyensee at intel.com Wed Jan 29 15:34:58 2014 From: james.p.freyensee at intel.com (Freyensee, James P) Date: Wed, 29 Jan 2014 23:34:58 +0000 Subject: [nvmewin] LBA Range Type Patch In-Reply-To: References: <6419_1391035517_52E9847D_6419_15986_1_B3A485AFDDB1DD4598621E85E8EB67A83AAF0B53@FMSMSX105.amr.corp.intel.com> Message-ID: <2D98093777D3FD46A36253F35FE9D693891A0BA0@ORSMSX109.amr.corp.intel.com> May I ask a novice question? It just occurred to me I was comparing the changes on this patch to the IDT_NVMe_1_0_e_compliance branch, which is what I am working with and care about. Do you want this patch compared with the latest code? Or is comparison with IDT_NVMe_1_0_ecompliance sufficient? Thanks! From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, January 29, 2014 2:49 PM To: Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] LBA Range Type Patch Thank you very much, Carolyn. Hi all, Please review/test the patch and provide your feedback(s) as soon as possible. Thanks, Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Wednesday, January 29, 2014 2:45 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** LBA Range Type Patch Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Overview: The LBA Range Type feature is defined as optional in the spec, but the driver currently fails to complete enumeration if the Get Features request for LBA Range Type fails. Based on the serious nature of the failure, this could be a problem at the upcoming plugfest. There will likely be devices at plugfest that do not support LBA Range Type, thus causing the OFA driver to not load on these devices. 
Files Modified: In nvmeInit.c, NVMeSetFeaturesCompletion(), the LBA Range Type Get Features request is treated as mandatory and will cause enumeration to fail if the Get Features command is not successful. This change looks at the status codes and will allow enumeration to continue if the device returns Invalid. Password: intel123 Feedback requested by Feb. 5th. Unit Tests: Cold boot Reboot Reset while running in the OS SdStress SCSI Compliance Test (I did see what I believe are known failures in Read Capacity, Mode Select, and Write (10)) Driver Update using INF Carolyn Dase Foster Intel Corp. NVM Solutions Group Internal SSD Engineering Phone: 480-554-2421 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alex.Chang at pmcs.com Wed Jan 29 15:37:16 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Wed, 29 Jan 2014 23:37:16 +0000 Subject: [nvmewin] LBA Range Type Patch In-Reply-To: <2D98093777D3FD46A36253F35FE9D693891A0BA0@ORSMSX109.amr.corp.intel.com> References: <6419_1391035517_52E9847D_6419_15986_1_B3A485AFDDB1DD4598621E85E8EB67A83AAF0B53@FMSMSX105.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0BA0@ORSMSX109.amr.corp.intel.com> Message-ID: The changes of the patch need to be based on the latest code base. Thanks, Alex From: Freyensee, James P [mailto:james.p.freyensee at intel.com] Sent: Wednesday, January 29, 2014 3:35 PM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: LBA Range Type Patch May I ask a novice question? It just occurred to me I was comparing the changes on this patch to the IDT_NVMe_1_0_e_compliance branch, which is what I am working with and care about. Do you want this patch compared with the latest code? Or is comparison with IDT_NVMe_1_0_ecompliance sufficient? Thanks! From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, January 29, 2014 2:49 PM To: Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] LBA Range Type Patch Thank you very much, Carolyn. Hi all, Please review/test the patch and provide your feedback(s) as soon as possible. Thanks, Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Wednesday, January 29, 2014 2:45 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** LBA Range Type Patch Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Overview: The LBA Range Type feature is defined as optional in the spec, but the driver currently fails to complete enumeration if the Get Features request for LBA Range Type fails. Based on the serious nature of the failure, this could be a problem at the upcoming plugfest. 
There will likely be devices at plugfest that do not support LBA Range Type, thus causing the OFA driver to not load on these devices. Files Modified: In nvmeInit.c, NVMeSetFeaturesCompletion(), the LBA Range Type Get Features request is treated as mandatory and will cause enumeration to fail if the Get Features command is not successful. This change looks at the status codes and will allow enumeration to continue if the device returns Invalid. Password: intel123 Feedback requested by Feb. 5th. Unit Tests: Cold boot Reboot Reset while running in the OS SdStress SCSI Compliance Test (I did see what I believe are known failures in Read Capacity, Mode Select, and Write (10)) Driver Update using INF Carolyn Dase Foster Intel Corp. NVM Solutions Group Internal SSD Engineering Phone: 480-554-2421 -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.p.freyensee at intel.com Wed Jan 29 17:19:17 2014 From: james.p.freyensee at intel.com (Freyensee, James P) Date: Thu, 30 Jan 2014 01:19:17 +0000 Subject: [nvmewin] Versioning proposal for trunk/ Message-ID: <2D98093777D3FD46A36253F35FE9D693891A0CD4@ORSMSX109.amr.corp.intel.com> May I propose a versioning change with respect to nvme.inf file and trunk/? When I pull code from trunk/, I always would like to know what code I am getting with respect to the NVMe spec standard, without having to ask people or ask the email list what the NVMe standard target is. I would also like to be able to easily tell what version of a compiled Open-source NVMe driver is running on a Windows OS. What I would like to propose is for the .inf file to maintain the versioning in the following manner: If trunk/ is targeting the NVMe 1.0.e standard (which I assume it is), then 'DriverVer' in the .inf file is set in the following manner: DriverVer=1/29/2014,1.0.e.36 where "1.0.e" is the NVMe standard being targeted, and '36' is the 36th time code in trunk/ has been changed for the NVMe standard target (1.0.e in this case). Thus, when the open-source team is ready to target the NVMe 1.1 standard, the last version of the 1.0.e code in trunk/ will go to a 1.0.e branch, and in trunk/ the new value for "DriverVer" in nvme.inf would be: DriverVer=1/29/2014,1.1.0 When a first revision of code is done in trunk/ with respect to the NVMe 1.1 standard, the DriverVer in the .inf file would be: DriverVer=1/29/2014,1.1.1 And for a 2nd revision in trunk/ it would be: DriverVer=1/29/2014,1.1.2 etc., etc. I welcome alternative ideas if this is not doable or not simplistic enough where when someone pulls code down from trunk/ they do not recognize exactly what standard the code is targeting. Regardless, I want the goal to be a simple solution such that a person not familiar with Windows NVMe open-source development and maintenance instantly recognize what NVMe spec standard their copy of the nvme source-code/compiled-driver is targeting. Thanks! Jay Freyensee -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From james.p.freyensee at intel.com Wed Jan 29 18:01:31 2014 From: james.p.freyensee at intel.com (Freyensee, James P) Date: Thu, 30 Jan 2014 02:01:31 +0000 Subject: [nvmewin] Versioning proposal for trunk/ In-Reply-To: References: <2D98093777D3FD46A36253F35FE9D693891A0CD4@ORSMSX109.amr.corp.intel.com> Message-ID: <2D98093777D3FD46A36253F35FE9D693891A0D67@ORSMSX109.amr.corp.intel.com> May be sufficient but associating 'e' as a hex value and making it '14' would not be as intuitive to the non-nvme open-source developer/maintainer, which is a goal I would like to meet. Maybe using: COMMNvmeChat.DeviceDesc = "NVMe 1.0.e open-source driver, version 36" Would be good instead? The current value is: COMMNvmeChat.DeviceDesc = "Intel Chatham Prototype Hardware" which I would think the goals of this project is beyond Intel Chatham Prototypes now? From: Speer, Kenny [mailto:Kenny.Speer at netapp.com] Sent: Wednesday, January 29, 2014 5:47 PM To: Freyensee, James P; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ The only issue is that Windows driver versioning is of the format Major.Minor.Release.Build using numerics in decimal only. I've never tried to just munge the .inf version directly and have it differ from the driver but I suspect VS2013 will flag it. Perhaps just use 1.0.14.36 to represent 1.0.e.36 ... From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 5:19 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] Versioning proposal for trunk/ May I propose a versioning change with respect to nvme.inf file and trunk/? When I pull code from trunk/, I always would like to know what code I am getting with respect to the NVMe spec standard, without having to ask people or ask the email list what the NVMe standard target is. I would also like to be able to easily tell what version of a compiled Open-source NVMe driver is running on a Windows OS. What I would like to propose is for the .inf file to maintain the versioning in the following manner: If trunk/ is targeting the NVMe 1.0.e standard (which I assume it is), then 'DriverVer' in the .inf file is set in the following manner: DriverVer=1/29/2014,1.0.e.36 where "1.0.e" is the NVMe standard being targeted, and '36' is the 36th time code in trunk/ has been changed for the NVMe standard target (1.0.e in this case). Thus, when the open-source team is ready to target the NVMe 1.1 standard, the last version of the 1.0.e code in trunk/ will go to a 1.0.e branch, and in trunk/ the new value for "DriverVer" in nvme.inf would be: DriverVer=1/29/2014,1.1.0 When a first revision of code is done in trunk/ with respect to the NVMe 1.1 standard, the DriverVer in the .inf file would be: DriverVer=1/29/2014,1.1.1 And for a 2nd revision in trunk/ it would be: DriverVer=1/29/2014,1.1.2 etc., etc. I welcome alternative ideas if this is not doable or not simplistic enough where when someone pulls code down from trunk/ they do not recognize exactly what standard the code is targeting. Regardless, I want the goal to be a simple solution such that a person not familiar with Windows NVMe open-source development and maintenance instantly recognize what NVMe spec standard their copy of the nvme source-code/compiled-driver is targeting. Thanks! Jay Freyensee -------------- next part -------------- An HTML attachment was scrubbed... 
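For readers following the version-number discussion, the hex reading of the revision letter ('e' taken as 14) can be made concrete with a tiny stand-alone C sketch. The helper below is hypothetical, not part of the driver, the .inf, or any build tooling; it only demonstrates the mapping Kenny describes (1.0.e becoming 10 and 14, or a combined 1014 field).

/*
 * Illustrative sketch only: encode an NVMe spec target such as "1.0.e" into the
 * numeric-only fields Windows driver versioning allows. The helper name is made up.
 */
#include <stdio.h>

/* Map a spec target like major=1, minor=0, rev='e' to two numeric version fields. */
static void spec_to_numeric(int major, int minor, char rev,
                            int *release_field, int *rev_field)
{
    *release_field = major * 10 + minor;                              /* "1.0" -> 10, "1.1" -> 11 */
    *rev_field = (rev >= 'a' && rev <= 'f') ? (rev - 'a' + 10) : 0;   /* 'e' -> 14 (hex value)    */
}

int main(void)
{
    int rel, rev;
    spec_to_numeric(1, 0, 'e', &rel, &rev);
    /* With change number 36 this gives 1.0.14.36 in the simple variant, or a
       combined 1014 release field (rel * 100 + rev) in the 1.3.1014.36 variant. */
    printf("spec 1.0.e -> fields %d and %d\n", rel, rev);
    return 0;
}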
URL: From Kwok.Kong at pmcs.com Wed Jan 29 18:19:14 2014 From: Kwok.Kong at pmcs.com (Kwok Kong) Date: Thu, 30 Jan 2014 02:19:14 +0000 Subject: [nvmewin] OFA Windows driver meeting notes 1-29-14 Message-ID: <03D88B383FA04244AA514AA931F7B1290D28037F@BBYEXM01.pmc-sierra.internal>

NVMe OFA Windows Driver Meeting Note (Jan 29, 2014)

Meeting Status
==============

1. LBA Range Type is an optional feature in NVMe. The current version of the driver fails if LBA Range Type is not supported by an NVMe device. This is going to cause problems for some NVMe devices at the NVMe plugfest. Intel requested that this be fixed ASAP before the plugfest, and the team agreed.

2. The following features/fixes are (or will be) completed in the next few weeks:
- Hibernation (Yong Chen - Huawei)
- NUMA group support in core enumeration - (Alex Chang - PMC)
- Core-MSI vector queue mapping issues
- CMD_ENTRY synchronization issues
- Remove using mask bits as core index to allocate core tables - (Alex Chang - PMC)
- Paramlist length problem - (Alex Chang - PMC)
- NVMeInitAdminQueues return value - (Alex Chang - PMC)
- Performance issue in Windows 2012 and Windows 8 - (Alex Chang - PMC)
- freeQList access - (Alex Chang - PMC)
- PRP list building problem - (Alex Chang - PMC)
- Extended SRB - (Alex Chang - PMC)
- Remove #define for CHATHAM2 (Carolyn)
- Learning of CPU core to vector failure handling (Carolyn)
- LBA Range Type fix (Carolyn)
- Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset (Dharani @ SanDisk)
- Controller reset does not handle all cases (Dharani @ SanDisk)
- Orphaned requests (Dharani @ SanDisk)
- BUILDIO (Rick) - Not a problem. No fix is required.

3. The patch commit order has been agreed as follows:
1. Carolyn at Intel (2-3-14) - LBA Range Type fix
2. Yong Chen at Huawei (2-10-14) - Hibernation
3. Dharani at SanDisk (2-15-14) - Not handling CSTS.RDY status (from 1->0 and 0->1) properly on NVMe reset - Controller reset does not handle all cases - Orphaned requests
4. Alex Chang at PMC (2-22-14) - NUMA group support in core enumeration - Core-MSI vector queue mapping issues - CMD_ENTRY synchronization issues - Remove using mask bits as core index to allocate core tables - Paramlist length problem - NVMeInitAdminQueues return value - Performance issue in Windows 2012 and Windows 8 - freeQList access - PRP list building problem - Extended SRB
5. Carolyn at Intel (2-29-14) - Remove #define for CHATHAM2 - Learning of CPU core to vector failure handling

4. The team agreed on the following features for the 1.4 release:
- Driver trace feature (WPP tracing) - Tom Freeman at HGST
- WMI commands processing - Tom Freeman at HGST
- Migrate to VS2013, WDK 8.1 - Tom Freeman at HGST
- WHQL certification test - Alex Chang at PMC
- Fix NVMe Format - Alex Chang at PMC

5. 2014 release schedule: 1.3 is expected to be released by end of February/early March. 1.4 is expected to be released by October.

Features that are not supported currently
=========================================
NVMe 1.1 support:
- multi-path
- SGL
- Get/Set Features update
- Autonomous power state transition
- Host Identifier
- Reservation Notification Mask
- Reservation Persistence
- Identify structure update
- Write Zeroes command
End-to-End Protection

-------------- next part -------------- An HTML attachment was scrubbed... 
URL: From james.p.freyensee at intel.com Wed Jan 29 18:24:44 2014 From: james.p.freyensee at intel.com (Freyensee, James P) Date: Thu, 30 Jan 2014 02:24:44 +0000 Subject: [nvmewin] Versioning proposal for trunk/ In-Reply-To: References: <2D98093777D3FD46A36253F35FE9D693891A0CD4@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0D67@ORSMSX109.amr.corp.intel.com> Message-ID: <2D98093777D3FD46A36253F35FE9D693891A0DCF@ORSMSX109.amr.corp.intel.com> Oh ok, I thought COMMNvmeChat.DeviceDesc was the name/driver-title that shows up when you look at the driver under "Device Manager" in the device tree, but if CommNvme.DeviceDesc does it, then that's fine by me :). I think that is a good idea, to have the numeric version match what is in COMMNvmeChat.DeviceDesc/ CommNvme.DeviceDesc. From: Speer, Kenny [mailto:Kenny.Speer at netapp.com] Sent: Wednesday, January 29, 2014 6:18 PM To: Freyensee, James P; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ Yes, that makes more sense, in fact the last version I grabbed shows: CommNvme.DeviceDesc = "Community NVME Storport Miniport" ;COMMNvmeChat.DeviceDesc = "Intel Chatham Prototype Hardware" So it's already been updated to be more generic, just add the specific version you want. I do also suggest having the numeric version match for clarity. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 6:02 PM To: Speer, Kenny; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Versioning proposal for trunk/ May be sufficient but associating 'e' as a hex value and making it '14' would not be as intuitive to the non-nvme open-source developer/maintainer, which is a goal I would like to meet. Maybe using: COMMNvmeChat.DeviceDesc = "NVMe 1.0.e open-source driver, version 36" Would be good instead? The current value is: COMMNvmeChat.DeviceDesc = "Intel Chatham Prototype Hardware" which I would think the goals of this project is beyond Intel Chatham Prototypes now? From: Speer, Kenny [mailto:Kenny.Speer at netapp.com] Sent: Wednesday, January 29, 2014 5:47 PM To: Freyensee, James P; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ The only issue is that Windows driver versioning is of the format Major.Minor.Release.Build using numerics in decimal only. I've never tried to just munge the .inf version directly and have it differ from the driver but I suspect VS2013 will flag it. Perhaps just use 1.0.14.36 to represent 1.0.e.36 ... From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 5:19 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] Versioning proposal for trunk/ May I propose a versioning change with respect to nvme.inf file and trunk/? When I pull code from trunk/, I always would like to know what code I am getting with respect to the NVMe spec standard, without having to ask people or ask the email list what the NVMe standard target is. I would also like to be able to easily tell what version of a compiled Open-source NVMe driver is running on a Windows OS. 
What I would like to propose is for the .inf file to maintain the versioning in the following manner: If trunk/ is targeting the NVMe 1.0.e standard (which I assume it is), then 'DriverVer' in the .inf file is set in the following manner: DriverVer=1/29/2014,1.0.e.36 where "1.0.e" is the NVMe standard being targeted, and '36' is the 36th time code in trunk/ has been changed for the NVMe standard target (1.0.e in this case). Thus, when the open-source team is ready to target the NVMe 1.1 standard, the last version of the 1.0.e code in trunk/ will go to a 1.0.e branch, and in trunk/ the new value for "DriverVer" in nvme.inf would be: DriverVer=1/29/2014,1.1.0 When a first revision of code is done in trunk/ with respect to the NVMe 1.1 standard, the DriverVer in the .inf file would be: DriverVer=1/29/2014,1.1.1 And for a 2nd revision in trunk/ it would be: DriverVer=1/29/2014,1.1.2 etc., etc. I welcome alternative ideas if this is not doable or not simplistic enough where when someone pulls code down from trunk/ they do not recognize exactly what standard the code is targeting. Regardless, I want the goal to be a simple solution such that a person not familiar with Windows NVMe open-source development and maintenance instantly recognize what NVMe spec standard their copy of the nvme source-code/compiled-driver is targeting. Thanks! Jay Freyensee -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alex.Chang at pmcs.com Wed Jan 29 18:42:37 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Thu, 30 Jan 2014 02:42:37 +0000 Subject: [nvmewin] Versioning proposal for trunk/ In-Reply-To: <2D98093777D3FD46A36253F35FE9D693891A0DCF@ORSMSX109.amr.corp.intel.com> References: <2D98093777D3FD46A36253F35FE9D693891A0CD4@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0D67@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0DCF@ORSMSX109.amr.corp.intel.com> Message-ID: Hi James, Yes, the driver displays its name in Device Manager via COMMNvme.DeviceDesc. As for the driver revision number in .inf file, the last release is 1.2.0.0 and we mean to keep the first two numbers as release numbers for the future to comply with Windows versioning. I am open to any proposals using the last two numbers. Thanks, Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 6:25 PM To: Speer, Kenny; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Versioning proposal for trunk/ Oh ok, I thought COMMNvmeChat.DeviceDesc was the name/driver-title that shows up when you look at the driver under "Device Manager" in the device tree, but if CommNvme.DeviceDesc does it, then that's fine by me :). I think that is a good idea, to have the numeric version match what is in COMMNvmeChat.DeviceDesc/ CommNvme.DeviceDesc. From: Speer, Kenny [mailto:Kenny.Speer at netapp.com] Sent: Wednesday, January 29, 2014 6:18 PM To: Freyensee, James P; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ Yes, that makes more sense, in fact the last version I grabbed shows: CommNvme.DeviceDesc = "Community NVME Storport Miniport" ;COMMNvmeChat.DeviceDesc = "Intel Chatham Prototype Hardware" So it's already been updated to be more generic, just add the specific version you want. I do also suggest having the numeric version match for clarity. 
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 6:02 PM To: Speer, Kenny; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Versioning proposal for trunk/ May be sufficient but associating 'e' as a hex value and making it '14' would not be as intuitive to the non-nvme open-source developer/maintainer, which is a goal I would like to meet. Maybe using: COMMNvmeChat.DeviceDesc = "NVMe 1.0.e open-source driver, version 36" Would be good instead? The current value is: COMMNvmeChat.DeviceDesc = "Intel Chatham Prototype Hardware" which I would think the goals of this project is beyond Intel Chatham Prototypes now? From: Speer, Kenny [mailto:Kenny.Speer at netapp.com] Sent: Wednesday, January 29, 2014 5:47 PM To: Freyensee, James P; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ The only issue is that Windows driver versioning is of the format Major.Minor.Release.Build using numerics in decimal only. I've never tried to just munge the .inf version directly and have it differ from the driver but I suspect VS2013 will flag it. Perhaps just use 1.0.14.36 to represent 1.0.e.36 ... From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 5:19 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] Versioning proposal for trunk/ May I propose a versioning change with respect to nvme.inf file and trunk/? When I pull code from trunk/, I always would like to know what code I am getting with respect to the NVMe spec standard, without having to ask people or ask the email list what the NVMe standard target is. I would also like to be able to easily tell what version of a compiled Open-source NVMe driver is running on a Windows OS. What I would like to propose is for the .inf file to maintain the versioning in the following manner: If trunk/ is targeting the NVMe 1.0.e standard (which I assume it is), then 'DriverVer' in the .inf file is set in the following manner: DriverVer=1/29/2014,1.0.e.36 where "1.0.e" is the NVMe standard being targeted, and '36' is the 36th time code in trunk/ has been changed for the NVMe standard target (1.0.e in this case). Thus, when the open-source team is ready to target the NVMe 1.1 standard, the last version of the 1.0.e code in trunk/ will go to a 1.0.e branch, and in trunk/ the new value for "DriverVer" in nvme.inf would be: DriverVer=1/29/2014,1.1.0 When a first revision of code is done in trunk/ with respect to the NVMe 1.1 standard, the DriverVer in the .inf file would be: DriverVer=1/29/2014,1.1.1 And for a 2nd revision in trunk/ it would be: DriverVer=1/29/2014,1.1.2 etc., etc. I welcome alternative ideas if this is not doable or not simplistic enough where when someone pulls code down from trunk/ they do not recognize exactly what standard the code is targeting. Regardless, I want the goal to be a simple solution such that a person not familiar with Windows NVMe open-source development and maintenance instantly recognize what NVMe spec standard their copy of the nvme source-code/compiled-driver is targeting. Thanks! Jay Freyensee -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From james.p.freyensee at intel.com Wed Jan 29 18:48:51 2014 From: james.p.freyensee at intel.com (Freyensee, James P) Date: Thu, 30 Jan 2014 02:48:51 +0000 Subject: [nvmewin] Versioning proposal for trunk/ In-Reply-To: References: <2D98093777D3FD46A36253F35FE9D693891A0CD4@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0D67@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0DCF@ORSMSX109.amr.corp.intel.com> Message-ID: <2D98093777D3FD46A36253F35FE9D693891A0E21@ORSMSX109.amr.corp.intel.com> So then why not have the NVMe standard target version in the COMMNvme.DeviceDesc and change this target version when trunk/ is focusing on the next NVMe standard? Then when this happens the driver revision number goes from: 1.2.X.Y (NVMe 1.0.e version, reflected in COMMNvme.DeviceDesc) --> 1.3.0.0 (NVMe 1.1 version, reflected in COMMNvme.DeviceDesc) ?? From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Wednesday, January 29, 2014 6:43 PM To: Freyensee, James P; Speer, Kenny; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ Hi James, Yes, the driver displays its name in Device Manager via COMMNvme.DeviceDesc. As for the driver revision number in .inf file, the last release is 1.2.0.0 and we mean to keep the first two numbers as release numbers for the future to comply with Windows versioning. I am open to any proposals using the last two numbers. Thanks, Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 6:25 PM To: Speer, Kenny; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Versioning proposal for trunk/ Oh ok, I thought COMMNvmeChat.DeviceDesc was the name/driver-title that shows up when you look at the driver under "Device Manager" in the device tree, but if CommNvme.DeviceDesc does it, then that's fine by me :). I think that is a good idea, to have the numeric version match what is in COMMNvmeChat.DeviceDesc/ CommNvme.DeviceDesc. From: Speer, Kenny [mailto:Kenny.Speer at netapp.com] Sent: Wednesday, January 29, 2014 6:18 PM To: Freyensee, James P; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ Yes, that makes more sense, in fact the last version I grabbed shows: CommNvme.DeviceDesc = "Community NVME Storport Miniport" ;COMMNvmeChat.DeviceDesc = "Intel Chatham Prototype Hardware" So it's already been updated to be more generic, just add the specific version you want. I do also suggest having the numeric version match for clarity. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 6:02 PM To: Speer, Kenny; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Versioning proposal for trunk/ May be sufficient but associating 'e' as a hex value and making it '14' would not be as intuitive to the non-nvme open-source developer/maintainer, which is a goal I would like to meet. Maybe using: COMMNvmeChat.DeviceDesc = "NVMe 1.0.e open-source driver, version 36" Would be good instead? The current value is: COMMNvmeChat.DeviceDesc = "Intel Chatham Prototype Hardware" which I would think the goals of this project is beyond Intel Chatham Prototypes now? 
From: Speer, Kenny [mailto:Kenny.Speer at netapp.com] Sent: Wednesday, January 29, 2014 5:47 PM To: Freyensee, James P; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ The only issue is that Windows driver versioning is of the format Major.Minor.Release.Build using numerics in decimal only. I've never tried to just munge the .inf version directly and have it differ from the driver but I suspect VS2013 will flag it. Perhaps just use 1.0.14.36 to represent 1.0.e.36 ... From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 5:19 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] Versioning proposal for trunk/ May I propose a versioning change with respect to nvme.inf file and trunk/? When I pull code from trunk/, I always would like to know what code I am getting with respect to the NVMe spec standard, without having to ask people or ask the email list what the NVMe standard target is. I would also like to be able to easily tell what version of a compiled Open-source NVMe driver is running on a Windows OS. What I would like to propose is for the .inf file to maintain the versioning in the following manner: If trunk/ is targeting the NVMe 1.0.e standard (which I assume it is), then 'DriverVer' in the .inf file is set in the following manner: DriverVer=1/29/2014,1.0.e.36 where "1.0.e" is the NVMe standard being targeted, and '36' is the 36th time code in trunk/ has been changed for the NVMe standard target (1.0.e in this case). Thus, when the open-source team is ready to target the NVMe 1.1 standard, the last version of the 1.0.e code in trunk/ will go to a 1.0.e branch, and in trunk/ the new value for "DriverVer" in nvme.inf would be: DriverVer=1/29/2014,1.1.0 When a first revision of code is done in trunk/ with respect to the NVMe 1.1 standard, the DriverVer in the .inf file would be: DriverVer=1/29/2014,1.1.1 And for a 2nd revision in trunk/ it would be: DriverVer=1/29/2014,1.1.2 etc., etc. I welcome alternative ideas if this is not doable or not simplistic enough where when someone pulls code down from trunk/ they do not recognize exactly what standard the code is targeting. Regardless, I want the goal to be a simple solution such that a person not familiar with Windows NVMe open-source development and maintenance instantly recognize what NVMe spec standard their copy of the nvme source-code/compiled-driver is targeting. Thanks! Jay Freyensee -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.p.freyensee at intel.com Wed Jan 29 19:10:35 2014 From: james.p.freyensee at intel.com (Freyensee, James P) Date: Thu, 30 Jan 2014 03:10:35 +0000 Subject: [nvmewin] LBA Range Type Patch In-Reply-To: References: <6419_1391035517_52E9847D_6419_15986_1_B3A485AFDDB1DD4598621E85E8EB67A83AAF0B53@FMSMSX105.amr.corp.intel.com> Message-ID: <2D98093777D3FD46A36253F35FE9D693891A0E79@ORSMSX109.amr.corp.intel.com> 1. line 1407: Couldn't Invalid Command Opcode also be a reasonable status code for "Get/Set Features- LBA Range Type" being optional? It does look like "INVALID_FIELD_IN_COMMAND" should be the more exact return type, but I was thinking "Invalid command opcode" could be an acceptable error status code type as well? My reasoning is that since LBA Range Type is an optional command in the NVMe spec, the driver should not be too strict on what error status codes it receives that will make it an optional command. 
In fact, maybe the code should only be checking for "successful completion", and any error makes LBA Range Type optional (and the driver will not fail) 2. Will the driver ever attempt a re-try on "Get/Set Features- LBA Range Type" if it gets an error? It does not look like the case, but if there is a chance it does, maybe it should check the "do not retry" bit in case HW sets it? If this bit is set, then it's pretty certain the HW does not support it. But if it is not set, maybe the driver needs to try "Get/Set Features- LBA Range Type" again before it makes it an optional command? Thanks Jay From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, January 29, 2014 2:49 PM To: Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] LBA Range Type Patch Thank you very much, Carolyn. Hi all, Please review/test the patch and provide your feedback(s) as soon as possible. Thanks, Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Wednesday, January 29, 2014 2:45 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** LBA Range Type Patch Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Overview: The LBA Range Type feature is defined as optional in the spec, but the driver currently fails to complete enumeration if the Get Features request for LBA Range Type fails. Based on the serious nature of the failure, this could be a problem at the upcoming plugfest. There will likely be devices at plugfest that do not support LBA Range Type, thus causing the OFA driver to not load on these devices. Files Modified: In nvmeInit.c, NVMeSetFeaturesCompletion(), the LBA Range Type Get Features request is treated as mandatory and will cause enumeration to fail if the Get Features command is not successful. This change looks at the status codes and will allow enumeration to continue if the device returns Invalid. Password: intel123 Feedback requested by Feb. 5th. Unit Tests: Cold boot Reboot Reset while running in the OS SdStress SCSI Compliance Test (I did see what I believe are known failures in Read Capacity, Mode Select, and Write (10)) Driver Update using INF Carolyn Dase Foster Intel Corp. NVM Solutions Group Internal SSD Engineering Phone: 480-554-2421 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Yong.sc.Chen at huawei.com Wed Jan 29 22:50:10 2014 From: Yong.sc.Chen at huawei.com (Yong Chen) Date: Thu, 30 Jan 2014 06:50:10 +0000 Subject: [nvmewin] LBA Range Type Patch In-Reply-To: References: <6419_1391035517_52E9847D_6419_15986_1_B3A485AFDDB1DD4598621E85E8EB67A83AAF0B53@FMSMSX105.amr.corp.intel.com> Message-ID: <02EC085151D99A469E06988E94FEBCDB1CE71EAD@SJCEML701-CHM.china.huawei.com> 22 LOC starting from #1424 is identical to LOC starting from #1537, including verbose comments...candidate to refactor/reuse. If not a strong case, then minor Copy&Paste mistake : On #1419 pAE->DriverState.ConfigLbaRangeNeeded = FALSE; On #1440 block, we don't have a TRUE case anymore, can simply set state to NVMeWaitOnIdentifyNS; Thanks, Yong From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, January 29, 2014 2:49 PM To: Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] LBA Range Type Patch Thank you very much, Carolyn. Hi all, Please review/test the patch and provide your feedback(s) as soon as possible. Thanks, Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Wednesday, January 29, 2014 2:45 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** LBA Range Type Patch Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Overview: The LBA Range Type feature is defined as optional in the spec, but the driver currently fails to complete enumeration if the Get Features request for LBA Range Type fails. Based on the serious nature of the failure, this could be a problem at the upcoming plugfest. There will likely be devices at plugfest that do not support LBA Range Type, thus causing the OFA driver to not load on these devices. Files Modified: In nvmeInit.c, NVMeSetFeaturesCompletion(), the LBA Range Type Get Features request is treated as mandatory and will cause enumeration to fail if the Get Features command is not successful. This change looks at the status codes and will allow enumeration to continue if the device returns Invalid. Password: intel123 Feedback requested by Feb. 5th. Unit Tests: Cold boot Reboot Reset while running in the OS SdStress SCSI Compliance Test (I did see what I believe are known failures in Read Capacity, Mode Select, and Write (10)) Driver Update using INF Carolyn Dase Foster Intel Corp. NVM Solutions Group Internal SSD Engineering Phone: 480-554-2421 -------------- next part -------------- An HTML attachment was scrubbed... 
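To make the patch discussion easier to follow, here is a small self-contained C sketch of the decision being described for NVMeSetFeaturesCompletion(): on a failed Get Features for the optional LBA Range Type feature, clear ConfigLbaRangeNeeded and advance to NVMeWaitOnIdentifyNS rather than failing enumeration. The struct, enums, and helper are illustrative stand-ins, not the OFA driver's real definitions; only the identifier names come from this thread and the generic status code values from the NVMe 1.0e spec.

/*
 * Self-contained sketch of the decision described for NVMeSetFeaturesCompletion():
 * a failed Get Features (LBA Range Type) is treated as "feature not supported"
 * and enumeration continues, instead of failing driver initialization.
 */
#include <stdbool.h>
#include <stdio.h>

enum drv_state { WaitOnSetFeatures, NVMeWaitOnIdentifyNS, StartComplete, StateFailed };

/* Generic Command Status values of interest (NVMe 1.0e numbering). */
enum { SC_SUCCESS = 0x0, SC_INVALID_OPCODE = 0x1, SC_INVALID_FIELD = 0x2 };

struct driver_state {
    enum drv_state NextDriverState;
    bool ConfigLbaRangeNeeded;
};

/* Decide the next state after the LBA Range Type Get Features completes. */
static void lba_range_get_features_done(struct driver_state *st, int status_code)
{
    if (status_code == SC_SUCCESS) {
        /* Feature supported: proceed with the existing handling. */
        st->NextDriverState = NVMeWaitOnIdentifyNS;
    } else if (status_code == SC_INVALID_FIELD || status_code == SC_INVALID_OPCODE) {
        /* Optional feature not supported: skip it and keep enumerating,
           with no separate TRUE/FALSE branch needed on the next state. */
        st->ConfigLbaRangeNeeded = false;
        st->NextDriverState = NVMeWaitOnIdentifyNS;
    } else {
        /* Any other error still fails initialization in this sketch. */
        st->NextDriverState = StateFailed;
    }
}

int main(void)
{
    struct driver_state st = { WaitOnSetFeatures, true };
    lba_range_get_features_done(&st, SC_INVALID_FIELD);
    printf("next state: %d, config needed: %d\n", st.NextDriverState, st.ConfigLbaRangeNeeded);
    return 0;
}

Whether additional status codes (such as Invalid Command Opcode) or any error at all should take the "skip the feature" path, as Jay suggests, only changes the middle condition above; the rest of the flow stays the same.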
URL: From carolyn.d.foster at intel.com Thu Jan 30 10:24:56 2014 From: carolyn.d.foster at intel.com (Foster, Carolyn D) Date: Thu, 30 Jan 2014 18:24:56 +0000 Subject: [nvmewin] LBA Range Type Patch In-Reply-To: <02EC085151D99A469E06988E94FEBCDB1CE71EAD@SJCEML701-CHM.china.huawei.com> References: <6419_1391035517_52E9847D_6419_15986_1_B3A485AFDDB1DD4598621E85E8EB67A83AAF0B53@FMSMSX105.amr.corp.intel.com> <02EC085151D99A469E06988E94FEBCDB1CE71EAD@SJCEML701-CHM.china.huawei.com> Message-ID: Hi Yong, The 22 LOC are similar, but only half of that is actual code, and the rest are comments. The effort to reuse such a small amount likely would outweigh the benefits. I don't have a problem with removing the unnecessary code you outlined on line 1440. Please let me know if this is acceptable. Thanks, Carolyn From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Wednesday, January 29, 2014 11:50 PM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: LBA Range Type Patch 22 LOC starting from #1424 is identical to LOC starting from #1537, including verbose comments...candidate to refactor/reuse. If not a strong case, then minor Copy&Paste mistake : On #1419 pAE->DriverState.ConfigLbaRangeNeeded = FALSE; On #1440 block, we don't have a TRUE case anymore, can simply set state to NVMeWaitOnIdentifyNS; Thanks, Yong From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, January 29, 2014 2:49 PM To: Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] LBA Range Type Patch Thank you very much, Carolyn. Hi all, Please review/test the patch and provide your feedback(s) as soon as possible. Thanks, Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Wednesday, January 29, 2014 2:45 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** LBA Range Type Patch Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Overview: The LBA Range Type feature is defined as optional in the spec, but the driver currently fails to complete enumeration if the Get Features request for LBA Range Type fails. Based on the serious nature of the failure, this could be a problem at the upcoming plugfest. There will likely be devices at plugfest that do not support LBA Range Type, thus causing the OFA driver to not load on these devices. Files Modified: In nvmeInit.c, NVMeSetFeaturesCompletion(), the LBA Range Type Get Features request is treated as mandatory and will cause enumeration to fail if the Get Features command is not successful. This change looks at the status codes and will allow enumeration to continue if the device returns Invalid. Password: intel123 Feedback requested by Feb. 5th. 
Unit Tests: Cold boot Reboot Reset while running in the OS SdStress SCSI Compliance Test (I did see what I believe are known failures in Read Capacity, Mode Select, and Write (10)) Driver Update using INF Carolyn Dase Foster Intel Corp. NVM Solutions Group Internal SSD Engineering Phone: 480-554-2421 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Yong.sc.Chen at huawei.com Thu Jan 30 11:08:16 2014 From: Yong.sc.Chen at huawei.com (Yong Chen) Date: Thu, 30 Jan 2014 19:08:16 +0000 Subject: [nvmewin] LBA Range Type Patch In-Reply-To: References: <6419_1391035517_52E9847D_6419_15986_1_B3A485AFDDB1DD4598621E85E8EB67A83AAF0B53@FMSMSX105.amr.corp.intel.com> <02EC085151D99A469E06988E94FEBCDB1CE71EAD@SJCEML701-CHM.china.huawei.com> Message-ID: <02EC085151D99A469E06988E94FEBCDB1CE72187@SJCEML701-CHM.china.huawei.com> Agreed. That is what I would do. From: Foster, Carolyn D [mailto:carolyn.d.foster at intel.com] Sent: Thursday, January 30, 2014 10:25 AM To: Yong Chen; Alex Chang; nvmewin at lists.openfabrics.org Subject: RE: LBA Range Type Patch Hi Yong, The 22 LOC are similar, but only half of that is actual code, and the rest are comments. The effort to reuse such a small amount likely would outweigh the benefits. I don't have a problem with removing the unnecessary code you outlined on line 1440. Please let me know if this is acceptable. Thanks, Carolyn From: Yong Chen [mailto:Yong.sc.Chen at huawei.com] Sent: Wednesday, January 29, 2014 11:50 PM To: Alex Chang; Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: RE: LBA Range Type Patch 22 LOC starting from #1424 is identical to LOC starting from #1537, including verbose comments...candidate to refactor/reuse. If not a strong case, then minor Copy&Paste mistake : On #1419 pAE->DriverState.ConfigLbaRangeNeeded = FALSE; On #1440 block, we don't have a TRUE case anymore, can simply set state to NVMeWaitOnIdentifyNS; Thanks, Yong From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Alex Chang Sent: Wednesday, January 29, 2014 2:49 PM To: Foster, Carolyn D; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] LBA Range Type Patch Thank you very much, Carolyn. Hi all, Please review/test the patch and provide your feedback(s) as soon as possible. Thanks, Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Foster, Carolyn D Sent: Wednesday, January 29, 2014 2:45 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** LBA Range Type Patch Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Date: %%SENT_DATE%% Subject: Suspect Message Quarantined WARNING: The virus scanner was unable to scan an attachment in an email message sent to you. This attachment could possibly contain viruses or other malicious programs. The attachment could not be scanned for the following reasons: %%DESC%% The full message and the attachment have been stored in the quarantine. The identifier for this message is '%%QID%%'. Access the quarantine at: https://puremessage.pmc-sierra.bc.ca:28443/ For more information on PMC's Anti-Spam system: http://pmc-intranet/wiki/index.php/Outlook:Anti-Spam_FAQ IT Services PureMessage Admin Overview: The LBA Range Type feature is defined as optional in the spec, but the driver currently fails to complete enumeration if the Get Features request for LBA Range Type fails. 
Based on the serious nature of the failure, this could be a problem at the upcoming plugfest. There will likely be devices at plugfest that do not support LBA Range Type, thus causing the OFA driver to not load on these devices. Files Modified: In nvmeInit.c, NVMeSetFeaturesCompletion(), the LBA Range Type Get Features request is treated as mandatory and will cause enumeration to fail if the Get Features command is not successful. This change looks at the status codes and will allow enumeration to continue if the device returns Invalid. Password: intel123 Feedback requested by Feb. 5th. Unit Tests: Cold boot Reboot Reset while running in the OS SdStress SCSI Compliance Test (I did see what I believe are known failures in Read Capacity, Mode Select, and Write (10)) Driver Update using INF Carolyn Dase Foster Intel Corp. NVM Solutions Group Internal SSD Engineering Phone: 480-554-2421 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alex.Chang at pmcs.com Thu Jan 30 12:22:09 2014 From: Alex.Chang at pmcs.com (Alex Chang) Date: Thu, 30 Jan 2014 20:22:09 +0000 Subject: [nvmewin] Versioning proposal for trunk/ In-Reply-To: <2D98093777D3FD46A36253F35FE9D693891A0E21@ORSMSX109.amr.corp.intel.com> References: <2D98093777D3FD46A36253F35FE9D693891A0CD4@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0D67@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0DCF@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0E21@ORSMSX109.amr.corp.intel.com> Message-ID: I wouldn't be against reflecting NVMe specification version in the COMMNvme.DeviceDesc. However, the up-coming release will be versioned as 1.3.0.0 supporting up to 1.0e. Please provide your thoughts if you have any. Thanks, Alex From: Freyensee, James P [mailto:james.p.freyensee at intel.com] Sent: Wednesday, January 29, 2014 6:49 PM To: Alex Chang; Speer, Kenny; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ So then why not have the NVMe standard target version in the COMMNvme.DeviceDesc and change this target version when trunk/ is focusing on the next NVMe standard? Then when this happens the driver revision number goes from: 1.2.X.Y (NVMe 1.0.e version, reflected in COMMNvme.DeviceDesc) --> 1.3.0.0 (NVMe 1.1 version, reflected in COMMNvme.DeviceDesc) ?? From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Wednesday, January 29, 2014 6:43 PM To: Freyensee, James P; Speer, Kenny; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ Hi James, Yes, the driver displays its name in Device Manager via COMMNvme.DeviceDesc. As for the driver revision number in .inf file, the last release is 1.2.0.0 and we mean to keep the first two numbers as release numbers for the future to comply with Windows versioning. I am open to any proposals using the last two numbers. Thanks, Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 6:25 PM To: Speer, Kenny; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Versioning proposal for trunk/ Oh ok, I thought COMMNvmeChat.DeviceDesc was the name/driver-title that shows up when you look at the driver under "Device Manager" in the device tree, but if CommNvme.DeviceDesc does it, then that's fine by me :). I think that is a good idea, to have the numeric version match what is in COMMNvmeChat.DeviceDesc/ CommNvme.DeviceDesc. 
From: Speer, Kenny [mailto:Kenny.Speer at netapp.com] Sent: Wednesday, January 29, 2014 6:18 PM To: Freyensee, James P; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ Yes, that makes more sense, in fact the last version I grabbed shows: CommNvme.DeviceDesc = "Community NVME Storport Miniport" ;COMMNvmeChat.DeviceDesc = "Intel Chatham Prototype Hardware" So it's already been updated to be more generic, just add the specific version you want. I do also suggest having the numeric version match for clarity. From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 6:02 PM To: Speer, Kenny; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Versioning proposal for trunk/ May be sufficient but associating 'e' as a hex value and making it '14' would not be as intuitive to the non-nvme open-source developer/maintainer, which is a goal I would like to meet. Maybe using: COMMNvmeChat.DeviceDesc = "NVMe 1.0.e open-source driver, version 36" Would be good instead? The current value is: COMMNvmeChat.DeviceDesc = "Intel Chatham Prototype Hardware" which I would think the goals of this project is beyond Intel Chatham Prototypes now? From: Speer, Kenny [mailto:Kenny.Speer at netapp.com] Sent: Wednesday, January 29, 2014 5:47 PM To: Freyensee, James P; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ The only issue is that Windows driver versioning is of the format Major.Minor.Release.Build using numerics in decimal only. I've never tried to just munge the .inf version directly and have it differ from the driver but I suspect VS2013 will flag it. Perhaps just use 1.0.14.36 to represent 1.0.e.36 ... From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 5:19 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] Versioning proposal for trunk/ May I propose a versioning change with respect to nvme.inf file and trunk/? When I pull code from trunk/, I always would like to know what code I am getting with respect to the NVMe spec standard, without having to ask people or ask the email list what the NVMe standard target is. I would also like to be able to easily tell what version of a compiled Open-source NVMe driver is running on a Windows OS. What I would like to propose is for the .inf file to maintain the versioning in the following manner: If trunk/ is targeting the NVMe 1.0.e standard (which I assume it is), then 'DriverVer' in the .inf file is set in the following manner: DriverVer=1/29/2014,1.0.e.36 where "1.0.e" is the NVMe standard being targeted, and '36' is the 36th time code in trunk/ has been changed for the NVMe standard target (1.0.e in this case). Thus, when the open-source team is ready to target the NVMe 1.1 standard, the last version of the 1.0.e code in trunk/ will go to a 1.0.e branch, and in trunk/ the new value for "DriverVer" in nvme.inf would be: DriverVer=1/29/2014,1.1.0 When a first revision of code is done in trunk/ with respect to the NVMe 1.1 standard, the DriverVer in the .inf file would be: DriverVer=1/29/2014,1.1.1 And for a 2nd revision in trunk/ it would be: DriverVer=1/29/2014,1.1.2 etc., etc. I welcome alternative ideas if this is not doable or not simplistic enough where when someone pulls code down from trunk/ they do not recognize exactly what standard the code is targeting. 
Regardless, I want the goal to be a simple solution such that a person not familiar with Windows NVMe open-source development and maintenance instantly recognize what NVMe spec standard their copy of the nvme source-code/compiled-driver is targeting. Thanks! Jay Freyensee -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.p.freyensee at intel.com Thu Jan 30 12:28:09 2014 From: james.p.freyensee at intel.com (Freyensee, James P) Date: Thu, 30 Jan 2014 20:28:09 +0000 Subject: [nvmewin] Versioning proposal for trunk/ In-Reply-To: References: <2D98093777D3FD46A36253F35FE9D693891A0CD4@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0D67@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0DCF@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0E21@ORSMSX109.amr.corp.intel.com> Message-ID: <2D98093777D3FD46A36253F35FE9D693891A42D2@ORSMSX109.amr.corp.intel.com> What would the version be when trunk/ work started for NVMe 1.1? 2.0.0.0? From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Thursday, January 30, 2014 12:22 PM To: Freyensee, James P; Speer, Kenny; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ I wouldn't be against reflecting NVMe specification version in the COMMNvme.DeviceDesc. However, the up-coming release will be versioned as 1.3.0.0 supporting up to 1.0e. Please provide your thoughts if you have any. Thanks, Alex From: Freyensee, James P [mailto:james.p.freyensee at intel.com] Sent: Wednesday, January 29, 2014 6:49 PM To: Alex Chang; Speer, Kenny; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ So then why not have the NVMe standard target version in the COMMNvme.DeviceDesc and change this target version when trunk/ is focusing on the next NVMe standard? Then when this happens the driver revision number goes from: 1.2.X.Y (NVMe 1.0.e version, reflected in COMMNvme.DeviceDesc) --> 1.3.0.0 (NVMe 1.1 version, reflected in COMMNvme.DeviceDesc) ?? From: Alex Chang [mailto:Alex.Chang at pmcs.com] Sent: Wednesday, January 29, 2014 6:43 PM To: Freyensee, James P; Speer, Kenny; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/ Hi James, Yes, the driver displays its name in Device Manager via COMMNvme.DeviceDesc. As for the driver revision number in .inf file, the last release is 1.2.0.0 and we mean to keep the first two numbers as release numbers for the future to comply with Windows versioning. I am open to any proposals using the last two numbers. Thanks, Alex From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Wednesday, January 29, 2014 6:25 PM To: Speer, Kenny; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Versioning proposal for trunk/ Oh ok, I thought COMMNvmeChat.DeviceDesc was the name/driver-title that shows up when you look at the driver under "Device Manager" in the device tree, but if CommNvme.DeviceDesc does it, then that's fine by me :). I think that is a good idea, to have the numeric version match what is in COMMNvmeChat.DeviceDesc/ CommNvme.DeviceDesc. 
From james.p.freyensee at intel.com Thu Jan 30 12:38:47 2014
From: james.p.freyensee at intel.com (Freyensee, James P)
Date: Thu, 30 Jan 2014 20:38:47 +0000
Subject: [nvmewin] Versioning proposal for trunk/
In-Reply-To:
References: <2D98093777D3FD46A36253F35FE9D693891A0CD4@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0D67@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0DCF@ORSMSX109.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D693891A0E21@ORSMSX109.amr.corp.intel.com>
Message-ID: <2D98093777D3FD46A36253F35FE9D693891A4315@ORSMSX109.amr.corp.intel.com>

I like this versioning idea :). I think this, along with adding the NVMe spec version to COMMNvme.DeviceDesc, makes it very clear what code version someone is pulling from trunk/ and when trunk/ will reflect the next NVMe spec target.

From: Speer, Kenny [mailto:Kenny.Speer at netapp.com] Sent: Thursday, January 30, 2014 12:32 PM To: Alex Chang; Freyensee, James P; nvmewin at lists.openfabrics.org Subject: RE: Versioning proposal for trunk/

It's clear that you not only want to track the NVMe spec that the current version implements, but also want to track driver-specific changes (such as implementing new features). A suggestion would be to continue using Major.Minor to reflect driver versions and to use the Release.Build fields to describe the spec and revision. The Release field could contain a 4-digit number where the first two digits represent the major spec version and the last two represent the revision. For instance, in your example of 1.3.0.0 supporting up to 1.0.e, the full version would be:

1.3.1014.36

where 10 = NVMe 1.0, 14 = rev e (hex 0xE), and 36 = change number. This way you could release a 1.4.1014.36 that contains only driver infrastructure changes (e.g. WPP tracing) without affecting the NVMe spec that it adheres to. ~kenny
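[To make Kenny's encoding concrete, here is a small illustration of how the Release field would be unpacked. The helper below is purely hypothetical and not part of the OFA driver; only the "10 = NVMe 1.0, 14 = rev e" mapping comes from the mail above, and it assumes revision letters a-f are stored as their hex values 10-15.]

#include <stdio.h>

/* Decode the proposed Release field (e.g. 1014) into an NVMe spec string.
 * First two decimal digits = spec major.minor (10 -> "1.0"); last two
 * digits = revision letter stored as its hex value (14 = 0xE -> 'e'). */
static void DecodeReleaseField(unsigned int release, char *buf, size_t len)
{
    unsigned int spec = release / 100;   /* 1014 -> 10 */
    unsigned int rev  = release % 100;   /* 1014 -> 14 */

    if (rev >= 10 && rev <= 15)
        snprintf(buf, len, "%u.%u rev %c", spec / 10, spec % 10,
                 (char)('a' + rev - 10));
    else
        snprintf(buf, len, "%u.%u", spec / 10, spec % 10);
}

int main(void)
{
    char spec[16];
    DecodeReleaseField(1014, spec, sizeof(spec));
    printf("Release 1014 -> NVMe %s\n", spec);  /* prints "NVMe 1.0 rev e" */
    return 0;
}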
From vinayvl at gmail.com Thu Jan 30 23:18:46 2014
From: vinayvl at gmail.com (Vinay V.L)
Date: Fri, 31 Jan 2014 12:48:46 +0530
Subject: [nvmewin] Intel Inbox Driver-StorNVMe.inf - ID Command failed
Message-ID:

Hi, I am trying to send an NVMe Admin command (Identify Controller) through an IOCTL_SCSI_MINIPORT packet. I followed the OFA driver document to build the packet. It failed with error code 1117 (I/O device error). I am curious to find out why this is failing.
1. Does the driver support IOCTL_SCSI_MINIPORT packets, or only IOCTL_SCSI_PASS_THROUGH?
2. I have tried the signature codes "WINOWS NT", "stornvme" and "StorNVMe"; all led to the same error. Is the driver rejecting the requests based on this?
Test Configuration:
OS: Windows 8.1, 64-bit
Driver: Inbox driver, StorNVMe, version 6.3.9431.0
Single NVMe disk connected to a PCIe slot.
Thanks, Vinay
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From Parag.Sheth at lsi.com Thu Jan 30 23:42:12 2014
From: Parag.Sheth at lsi.com (Sheth, Parag)
Date: Fri, 31 Jan 2014 07:42:12 +0000
Subject: [nvmewin] Intel Inbox Driver-StorNVMe.inf - ID Command failed
In-Reply-To:
References:
Message-ID:

Hi Vinay, Windows 8.1 has Microsoft's own inbox driver in it. That driver is different from the OFA driver. Thanks Parag Sheth Sent from my Windows Phone
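[For anyone building this kind of request against the OFA community driver (as Parag notes, the inbox StorNVMe driver is a different driver, so the OFA IOCTL format does not apply to it), the general shape of an IOCTL_SCSI_MINIPORT call is sketched below. SRB_IO_CONTROL and DeviceIoControl are standard Windows interfaces; the signature string, control code, payload size, and device path used here are placeholders only, the real values come from the OFA driver's IOCTL header and design document.]

#include <windows.h>
#include <ntddscsi.h>   /* SRB_IO_CONTROL, IOCTL_SCSI_MINIPORT */
#include <stdio.h>
#include <string.h>

#define NVME_SIG      "NvmeMini"     /* placeholder: must match the miniport's signature */
#define NVME_PT_CODE  0xE0000000     /* placeholder: pass-through control code */
#define NVME_PT_SIZE  4096           /* placeholder: pass-through payload size */

int main(void)
{
    /* Placeholder path: the request is sent to the SCSI port (or disk) device. */
    HANDLE h = CreateFileW(L"\\\\.\\Scsi0:", GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                           OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        printf("open failed: %lu\n", GetLastError());
        return 1;
    }

    BYTE buf[sizeof(SRB_IO_CONTROL) + NVME_PT_SIZE] = {0};
    SRB_IO_CONTROL *srb = (SRB_IO_CONTROL *)buf;

    srb->HeaderLength = sizeof(SRB_IO_CONTROL);
    memcpy(srb->Signature, NVME_SIG, strlen(NVME_SIG)); /* rejected if it doesn't match */
    srb->Timeout      = 30;
    srb->ControlCode  = NVME_PT_CODE;
    srb->Length       = NVME_PT_SIZE;
    /* ... fill in the NVMe pass-through structure (Identify Controller, etc.)
     *     immediately after the SRB_IO_CONTROL header, per the OFA document ... */

    DWORD ret = 0;
    if (!DeviceIoControl(h, IOCTL_SCSI_MINIPORT, buf, sizeof(buf),
                         buf, sizeof(buf), &ret, NULL)) {
        printf("DeviceIoControl failed: %lu\n", GetLastError()); /* 1117 = ERROR_IO_DEVICE */
    } else {
        printf("miniport returned %lu bytes, ReturnCode=%lu\n", ret, srb->ReturnCode);
    }

    CloseHandle(h);
    return 0;
}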
From Alex.Chang at pmcs.com Fri Jan 31 15:43:27 2014
From: Alex.Chang at pmcs.com (Alex Chang)
Date: Fri, 31 Jan 2014 23:43:27 +0000
Subject: [nvmewin] ***UNCHECKED*** PMC New Patch
Message-ID:

Hi all, Please find the attached patch from PMC-Sierra. The password is pmc123. In order to speed up the entire process and meet our next release date, please review the changes and provide feedback as soon as possible. For each outstanding patch, we plan to collect feedback for about a week after it is sent out. A revised patch shall then be sent out incorporating the feedback. I will follow up for approval after a week or so, to allow more testing and reviewing if necessary.

Summary of changes:
1. SRB extension support for Windows 8 and up. Files changed: nvmeStd.c, nvmeSnti.c, nvmeStat.c, nvmePwrMgmt.c, nvmeInit.c and the related header files.
2. PRP list building for IOCTL and internal requests. Files changed: nvmeStd.c, nvmeInit.c and nvmestd.h.
3. Performance issue in Windows 8/Server 2012. File changed: nvmeStd.c (removed the StorPortGetUncachedExtension call in NVMeFindAdapter).
4. NVMeInitAdminQueues return value. File changed: nvmeStd.c (return Storport-defined status values instead of TRUE/FALSE).
5. Parameter list length problem. Files changed: nvmeSnti.c and nvmeSntiTypes.h (program DW10 of the submission entry in DWORDs for the Write Buffer command translation).
6. Stopped using core mask bits as the core index when allocating/identifying core tables. Files changed: nvmeStd.c, nvmeInit.c and the related header files.
7. Implemented support for Windows-defined logical processor groups. Files changed: nvmeStd.c, nvmeInit.c and the related header files.
8. NVMe reset handling issue. File changed: nvmeStd.c (need to wait until the RDY bit in CSTS is cleared to 0 after changing the EN bit in CC from 1 to 0; see the sketch below).
9. Core/MSI-vector/queue mapping, CMD_ENTRY synchronization and FreeQList access issues; these were related to using core mask bits as the core index (#6) and to the missing logical processor group support (#7).

Platforms tested:
1. Windows 7 64-bit
2. Windows Server 2008 R2
3. Windows 8 64-bit
4. Windows Server 2012

Tests run:
1. Installation (clean and update), uninstallation, enable/disable.
2. IOMeter 4K read/write mixes, in random and sequential patterns.
3. SCSI Compliance.
4. SDStress.
5. Quick/full disk formats.

Thanks, Alex
-------------- next part -------------- An HTML attachment was scrubbed... URL:
-------------- next part -------------- A non-text attachment was scrubbed... Name: pmc_patch.zip Type: application/x-zip-compressed Size: 174479 bytes Desc: pmc_patch.zip URL:
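[Item 8 above refers to the controller shutdown/reset rule in the NVMe spec: after clearing CC.EN, the driver must wait for CSTS.RDY to read back 0 before re-enabling the controller. A minimal sketch of that wait, assuming plain MMIO access to BAR0, is shown below; the function name, accessors, and timeout handling are illustrative and are not the OFA driver's actual code, which would use Storport register and stall routines instead.]

#include <stdbool.h>
#include <stdint.h>

/* NVMe controller register offsets and bits (from the register map). */
#define NVME_REG_CC    0x14u             /* Controller Configuration */
#define NVME_REG_CSTS  0x1Cu             /* Controller Status */
#define NVME_CC_EN     (1u << 0)
#define NVME_CSTS_RDY  (1u << 0)

static uint32_t RegRead32(volatile uint8_t *bar0, uint32_t offset)
{
    return *(volatile uint32_t *)(bar0 + offset);
}

static void RegWrite32(volatile uint8_t *bar0, uint32_t offset, uint32_t value)
{
    *(volatile uint32_t *)(bar0 + offset) = value;
}

/* Disable the controller and wait for CSTS.RDY to clear (item 8 above). */
bool NvmeDisableController(volatile uint8_t *bar0, uint32_t max_polls)
{
    uint32_t cc = RegRead32(bar0, NVME_REG_CC);
    RegWrite32(bar0, NVME_REG_CC, cc & ~NVME_CC_EN);   /* EN: 1 -> 0 */

    while (RegRead32(bar0, NVME_REG_CSTS) & NVME_CSTS_RDY) {
        if (max_polls-- == 0)
            return false;   /* controller never cleared RDY within the budget */
        /* in a real miniport: StorPortStallExecution(100); between polls */
    }
    return true;            /* now safe to set EN back to 1 */
}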