From Kwok.Kong at idt.com Mon Jul 2 11:32:24 2012 From: Kwok.Kong at idt.com (Kong, Kwok) Date: Mon, 2 Jul 2012 18:32:24 +0000 Subject: [nvmewin] NVMe driver behaviour on LBA Range Message-ID: <05CD7821AE397547A01AC160FBC2314704A798@corpmail1.na.ads.idt.com> Paul and Matthew, Both Windows and Linux driver should behave the same with LBA range data. Before we add the support for the LBA range in Windows driver, I would like to get your opinion and agreement on what we should be doing in the driver. This is my understanding and please let me know if you agree: 1. By default (when a Set Feature - LBA Range has not been issued to a drive), a get feature - LBA range should return - Number of LBA Range (NUM) = 0 (means 1 entry) - Type = 0x00 (reserved) - Attributes - 0x01 (Read/writeable, not hidden from the OS) - Starting LBA = 0 - Number of Logical Blocks (NLB) = total number of logical blocks in this namespace. - This should have the same size as the Namespace Size (NSZE) as returned by Identify Namespace. - Unique Identifier (GUID) = ??? what should this be ? Should the driver care ? 2. When the driver get the default LBA range, it "exports" "NSZE" of LBA to the OS. 3. What happen if the total size LBA as reported by LBA range does not match the Namespace size as reported by Identify Namespace ? Should the driver "export" the size as reported by Identify Namespace (NSZE) or LBA Range ? I think it should be "NSZE" and not as reported by LBA range. What do you think ? 4. When there are multiple entries in the LBA range, the driver still exports this namespace with size "NSZE" as a single "LUN" with size as reported in "NSZE" except when there are ranges with "Hidden" attribute. ECN 25 describes the handling the hidden LBA. "The host storage driver should expose all LBA ranges that are not set to be hidden from the OS / EFI / BIOS in the Attributes field. All LBA ranges that follow a hidden range shall also be hidden; the host storage driver should not expose subsequent LBA ranges that follow a hidden LBA range" The number of logical blocks that are hidden from the OS must be deducted from "NSZE" before exporting this namespace to the OS. In this case, the size is smaller than "NSZE". 5. When there are one or more ranges with attribute = 0 (Read Only), the driver needs to keep track of these ranges internally. The driver must return an error when there is a write request to these Read only LBA ranges. Please let me know what you think. Thanks -Kwok From paul.e.luse at intel.com Mon Jul 2 12:36:17 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Mon, 2 Jul 2012 19:36:17 +0000 Subject: [nvmewin] NVMe driver behaviour on LBA Range In-Reply-To: <05CD7821AE397547A01AC160FBC2314704A798@corpmail1.na.ads.idt.com> References: <05CD7821AE397547A01AC160FBC2314704A798@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A028E95CC@FMSMSX106.amr.corp.intel.com> Kwok- See below for my thoughts Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 02, 2012 11:32 AM To: Luse, Paul E; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: NVMe driver behaviour on LBA Range Paul and Matthew, Both Windows and Linux driver should behave the same with LBA range data. Before we add the support for the LBA range in Windows driver, I would like to get your opinion and agreement on what we should be doing in the driver. PL> For sure. 
Currently I believe the windows driver is handling this incorrectly and am coincidentally (to your email) working on fixing that now and will propose a patch shortly (that can be discussed of course if we don't like where its heading). This is my understanding and please let me know if you agree: 1. By default (when a Set Feature - LBA Range has not been issued to a drive), a get feature - LBA range should return - Number of LBA Range (NUM) = 0 (means 1 entry) - Type = 0x00 (reserved) - Attributes - 0x01 (Read/writeable, not hidden from the OS) - Starting LBA = 0 - Number of Logical Blocks (NLB) = total number of logical blocks in this namespace. - This should have the same size as the Namespace Size (NSZE) as returned by Identify Namespace. - Unique Identifier (GUID) = ??? what should this be ? Should the driver care ? PL> The definition of "by default" totally depends on the manufacturer of the device. The case you mention above is what I would call the "reserved" case where the driver should not do anything with the LBA range. It should not expose it to upper layers and it should not send any more commands to it. At this point the manageability tools provided by whomever should be relied upon to have the smarts to use PT commands to determine that the LBA range needs to be configured and configure it accordingly. Once that's done, the driver will see it as 'configured' the next time (whether the tool submits an IOCTL to rescan, requires a reboot, whatever). 2. When the driver get the default LBA range, it "exports" "NSZE" of LBA to the OS. PL> See above 3. What happen if the total size LBA as reported by LBA range does not match the Namespace size as reported by Identify Namespace ? Should the driver "export" the size as reported by Identify Namespace (NSZE) or LBA Range ? I think it should be "NSZE" and not as reported by LBA range. What do you think ? PL> I believe the correct driver operation is to report the size reported by the LBA range and not the NS. The reason is because it (LBA range # blocks) refers to the actual LBA range and the NSZE refers to the entire NS. Perhaps the vendor has a reason for not exposing the entire NSZE to the host and thus has defined an LBA range (single) that is smaller than the NSZE. 4. When there are multiple entries in the LBA range, the driver still exports this namespace with size "NSZE" as a single "LUN" with size as reported in "NSZE" except when there are ranges with "Hidden" attribute. PL> So I'm not quite sure what you are asking or proposing on this one. Currently the windows driver doesn't support multiple ranges per NS. If we want to change that it sounds like you are proposing (or the ECN is) that we report each LBA range as its own tgt or are you saying that all non-hidden LNUS should be exposed as a single tgt?? Sorry, I'm confused :) ECN 25 describes the handling the hidden LBA. "The host storage driver should expose all LBA ranges that are not set to be hidden from the OS / EFI / BIOS in the Attributes field. All LBA ranges that follow a hidden range shall also be hidden; the host storage driver should not expose subsequent LBA ranges that follow a hidden LBA range" The number of logical blocks that are hidden from the OS must be deducted from "NSZE" before exporting this namespace to the OS. In this case, the size is smaller than "NSZE". 5. When there are one or more ranges with attribute = 0 (Read Only), the driver needs to keep track of these ranges internally. 
The driver must return an error when there is a write request to these Read only LBA ranges. PL> Agree Please let me know what you think. Thanks -Kwok From Kwok.Kong at idt.com Mon Jul 2 13:48:06 2012 From: Kwok.Kong at idt.com (Kong, Kwok) Date: Mon, 2 Jul 2012 20:48:06 +0000 Subject: [nvmewin] NVMe driver behaviour on LBA Range In-Reply-To: <82C9F782B054C94B9FC04A331649C77A028E95CC@FMSMSX106.amr.corp.intel.com> References: <05CD7821AE397547A01AC160FBC2314704A798@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028E95CC@FMSMSX106.amr.corp.intel.com> Message-ID: <05CD7821AE397547A01AC160FBC2314704A8AA@corpmail1.na.ads.idt.com> Paul, Please see below for my comment -----Original Message----- From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Monday, July 02, 2012 12:36 PM To: Kong, Kwok; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: RE: NVMe driver behaviour on LBA Range Kwok- See below for my thoughts Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 02, 2012 11:32 AM To: Luse, Paul E; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: NVMe driver behaviour on LBA Range Paul and Matthew, Both Windows and Linux driver should behave the same with LBA range data. Before we add the support for the LBA range in Windows driver, I would like to get your opinion and agreement on what we should be doing in the driver. PL> For sure. Currently I believe the windows driver is handling this incorrectly and am coincidentally (to your email) working on fixing that now and will propose a patch shortly (that can be discussed of course if we don't like where its heading). This is my understanding and please let me know if you agree: 1. By default (when a Set Feature - LBA Range has not been issued to a drive), a get feature - LBA range should return - Number of LBA Range (NUM) = 0 (means 1 entry) - Type = 0x00 (reserved) - Attributes - 0x01 (Read/writeable, not hidden from the OS) - Starting LBA = 0 - Number of Logical Blocks (NLB) = total number of logical blocks in this namespace. - This should have the same size as the Namespace Size (NSZE) as returned by Identify Namespace. - Unique Identifier (GUID) = ??? what should this be ? Should the driver care ? PL> The definition of "by default" totally depends on the manufacturer of the device. The case you mention above is what I would call the "reserved" case where the driver should not do anything with the LBA range. It should not expose it to upper layers and it should not send any more commands to it. At this point the manageability tools provided by whomever should be relied upon to have the smarts to use PT commands to determine that the LBA range needs to be configured and configure it accordingly. Once that's done, the driver will see it as 'configured' the next time (whether the tool submits an IOCTL to rescan, requires a reboot, whatever). KK> My understanding is that the "standard" driver is not going to interpret the LBA type. It is not going to do anything special whether it is "Reserved", "Filesystem", "RAID" or others. The standard driver only looks at the attributes for Read only or hidden. I think the driver should export all ranges. I think we probably should have allowed the device to report 0 entry before any set - LBA range command. When there is no entry, the LBA range is not used. 2. When the driver get the default LBA range, it "exports" "NSZE" of LBA to the OS. 
PL> See above 3. What happen if the total size LBA as reported by LBA range does not match the Namespace size as reported by Identify Namespace ? Should the driver "export" the size as reported by Identify Namespace (NSZE) or LBA Range ? I think it should be "NSZE" and not as reported by LBA range. What do you think ? PL> I believe the correct driver operation is to report the size reported by the LBA range and not the NS. The reason is because it (LBA range # blocks) refers to the actual LBA range and the NSZE refers to the entire NS. Perhaps the vendor has a reason for not exposing the entire NSZE to the host and thus has defined an LBA range (single) that is smaller than the NSZE. KK> I still think the driver should report the size "NSZE". 4. When there are multiple entries in the LBA range, the driver still exports this namespace with size "NSZE" as a single "LUN" with size as reported in "NSZE" except when there are ranges with "Hidden" attribute. PL> So I'm not quite sure what you are asking or proposing on this one. Currently the windows driver doesn't support multiple ranges per NS. If we want to change that it sounds like you are proposing (or the ECN is) that we report each LBA range as its own tgt or are you saying that all non-hidden LNUS should be exposed as a single tgt?? Sorry, I'm confused :) KK> The driver should only export a single tgt for a Namespace. If there are multiple LBA ranges, then the driver still export a single tgt. I understand that current Windows driver does not support multiple ranges per NS. What should the driver do if there are multiple ranges ? ECN 25 describes the handling the hidden LBA. "The host storage driver should expose all LBA ranges that are not set to be hidden from the OS / EFI / BIOS in the Attributes field. All LBA ranges that follow a hidden range shall also be hidden; the host storage driver should not expose subsequent LBA ranges that follow a hidden LBA range" The number of logical blocks that are hidden from the OS must be deducted from "NSZE" before exporting this namespace to the OS. In this case, the size is smaller than "NSZE". 5. When there are one or more ranges with attribute = 0 (Read Only), the driver needs to keep track of these ranges internally. The driver must return an error when there is a write request to these Read only LBA ranges. PL> Agree KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? I think we should set up a call to discuss this. Should we raise this in the NVMe WG to ask the expected behavior ? I would also like to get Matthew's opinion such that both the Windows and Linux driver behave the same. Please let me know what you think. Thanks -Kwok From Kwok.Kong at idt.com Mon Jul 2 14:08:30 2012 From: Kwok.Kong at idt.com (Kong, Kwok) Date: Mon, 2 Jul 2012 21:08:30 +0000 Subject: [nvmewin] NVMe driver behaviour on LBA Range In-Reply-To: References: <05CD7821AE397547A01AC160FBC2314704A798@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028E95CC@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704A8AA@corpmail1.na.ads.idt.com> Message-ID: <05CD7821AE397547A01AC160FBC2314704A8CD@corpmail1.na.ads.idt.com> Keith, I agree that overlapping LBA ranges would be a spec violation and is an error. What should the driver do when there are overlapping LBA ranges ? 
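One thing a driver can do cheaply is detect that condition before picking a policy. Below is a minimal C sketch, assuming a simplified view of the Get Features - LBA Range Type entries and treating NLB as a plain block count (both assumptions for illustration, not taken from the spec):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Simplified view of one LBA Range Type entry; illustrative only. */
    typedef struct {
        uint64_t slba;  /* Starting LBA of the range */
        uint64_t nlb;   /* Number of Logical Blocks in the range */
    } lba_range;

    /* Returns true if any two of the n reported ranges overlap.  n is
     * small (the feature carries at most 64 entries), so O(n^2) is fine. */
    static bool lba_ranges_overlap(const lba_range *r, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            for (size_t j = i + 1; j < n; j++) {
                uint64_t end_i = r[i].slba + r[i].nlb;  /* exclusive end */
                uint64_t end_j = r[j].slba + r[j].nlb;
                if (r[i].slba < end_j && r[j].slba < end_i)
                    return true;
            }
        }
        return false;
    }

Whether the driver then ignores the feature data, falls back to exposing NSZE, or declines to expose the namespace is exactly the policy question being asked here.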
-Kwok -----Original Message----- From: Busch, Keith [mailto:keith.busch at intel.com] Sent: Monday, July 02, 2012 2:04 PM To: Kong, Kwok; Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? Overlapping LBA ranges would be a spec violation. -----Original Message----- From: linux-nvme-bounces at lists.infradead.org [mailto:linux-nvme-bounces at lists.infradead.org] On Behalf Of Kong, Kwok Sent: Monday, July 02, 2012 2:48 PM To: Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range Paul, Please see below for my comment -----Original Message----- From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Monday, July 02, 2012 12:36 PM To: Kong, Kwok; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: RE: NVMe driver behaviour on LBA Range Kwok- See below for my thoughts Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 02, 2012 11:32 AM To: Luse, Paul E; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: NVMe driver behaviour on LBA Range Paul and Matthew, Both Windows and Linux driver should behave the same with LBA range data. Before we add the support for the LBA range in Windows driver, I would like to get your opinion and agreement on what we should be doing in the driver. PL> For sure. Currently I believe the windows driver is handling this incorrectly and am coincidentally (to your email) working on fixing that now and will propose a patch shortly (that can be discussed of course if we don't like where its heading). This is my understanding and please let me know if you agree: 1. By default (when a Set Feature - LBA Range has not been issued to a drive), a get feature - LBA range should return - Number of LBA Range (NUM) = 0 (means 1 entry) - Type = 0x00 (reserved) - Attributes - 0x01 (Read/writeable, not hidden from the OS) - Starting LBA = 0 - Number of Logical Blocks (NLB) = total number of logical blocks in this namespace. - This should have the same size as the Namespace Size (NSZE) as returned by Identify Namespace. - Unique Identifier (GUID) = ??? what should this be ? Should the driver care ? PL> The definition of "by default" totally depends on the manufacturer of the device. The case you mention above is what I would call the "reserved" case where the driver should not do anything with the LBA range. It should not expose it to upper layers and it should not send any more commands to it. At this point the manageability tools provided by whomever should be relied upon to have the smarts to use PT commands to determine that the LBA range needs to be configured and configure it accordingly. Once that's done, the driver will see it as 'configured' the next time (whether the tool submits an IOCTL to rescan, requires a reboot, whatever). KK> My understanding is that the "standard" driver is not going to interpret the LBA type. It is not going to do anything special whether it is "Reserved", "Filesystem", "RAID" or others. The standard driver only looks at the attributes for Read only or hidden. I think the driver should export all ranges. 
I think we probably should have allowed the device to report 0 entry before any set - LBA range command. When there is no entry, the LBA range is not used. 2. When the driver get the default LBA range, it "exports" "NSZE" of LBA to the OS. PL> See above 3. What happen if the total size LBA as reported by LBA range does not match the Namespace size as reported by Identify Namespace ? Should the driver "export" the size as reported by Identify Namespace (NSZE) or LBA Range ? I think it should be "NSZE" and not as reported by LBA range. What do you think ? PL> I believe the correct driver operation is to report the size reported by the LBA range and not the NS. The reason is because it (LBA range # blocks) refers to the actual LBA range and the NSZE refers to the entire NS. Perhaps the vendor has a reason for not exposing the entire NSZE to the host and thus has defined an LBA range (single) that is smaller than the NSZE. KK> I still think the driver should report the size "NSZE". 4. When there are multiple entries in the LBA range, the driver still exports this namespace with size "NSZE" as a single "LUN" with size as reported in "NSZE" except when there are ranges with "Hidden" attribute. PL> So I'm not quite sure what you are asking or proposing on this one. Currently the windows driver doesn't support multiple ranges per NS. If we want to change that it sounds like you are proposing (or the ECN is) that we report each LBA range as its own tgt or are you saying that all non-hidden LNUS should be exposed as a single tgt?? Sorry, I'm confused :) KK> The driver should only export a single tgt for a Namespace. If there are multiple LBA ranges, then the driver still export a single tgt. I understand that current Windows driver does not support multiple ranges per NS. What should the driver do if there are multiple ranges ? ECN 25 describes the handling the hidden LBA. "The host storage driver should expose all LBA ranges that are not set to be hidden from the OS / EFI / BIOS in the Attributes field. All LBA ranges that follow a hidden range shall also be hidden; the host storage driver should not expose subsequent LBA ranges that follow a hidden LBA range" The number of logical blocks that are hidden from the OS must be deducted from "NSZE" before exporting this namespace to the OS. In this case, the size is smaller than "NSZE". 5. When there are one or more ranges with attribute = 0 (Read Only), the driver needs to keep track of these ranges internally. The driver must return an error when there is a write request to these Read only LBA ranges. PL> Agree KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? I think we should set up a call to discuss this. Should we raise this in the NVMe WG to ask the expected behavior ? I would also like to get Matthew's opinion such that both the Windows and Linux driver behave the same. Please let me know what you think. 
Thanks -Kwok _______________________________________________ Linux-nvme mailing list Linux-nvme at lists.infradead.org http://merlin.infradead.org/mailman/listinfo/linux-nvme From paul.e.luse at intel.com Mon Jul 2 14:10:45 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Mon, 2 Jul 2012 21:10:45 +0000 Subject: [nvmewin] NVMe driver behaviour on LBA Range In-Reply-To: <05CD7821AE397547A01AC160FBC2314704A8CD@corpmail1.na.ads.idt.com> References: <05CD7821AE397547A01AC160FBC2314704A798@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028E95CC@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704A8AA@corpmail1.na.ads.idt.com> <05CD7821AE397547A01AC160FBC2314704A8CD@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A028E9A5B@FMSMSX106.amr.corp.intel.com> I'm thinking email is going to be a pretty inefficient way to handle this now that we've traded a few. Kwok, maybe you can write up a list of use cases along with a section/column for spec interpretation and one for expected behavior of the standard drivers. Then we can setup a call to discuss each one as there will be lots of discussion on each item that really requires real time conversation I think. Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 02, 2012 2:09 PM To: Busch, Keith; Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range Keith, I agree that overlapping LBA ranges would be a spec violation and is an error. What should the driver do when there are overlapping LBA ranges ? -Kwok -----Original Message----- From: Busch, Keith [mailto:keith.busch at intel.com] Sent: Monday, July 02, 2012 2:04 PM To: Kong, Kwok; Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? Overlapping LBA ranges would be a spec violation. -----Original Message----- From: linux-nvme-bounces at lists.infradead.org [mailto:linux-nvme-bounces at lists.infradead.org] On Behalf Of Kong, Kwok Sent: Monday, July 02, 2012 2:48 PM To: Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range Paul, Please see below for my comment -----Original Message----- From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Monday, July 02, 2012 12:36 PM To: Kong, Kwok; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: RE: NVMe driver behaviour on LBA Range Kwok- See below for my thoughts Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 02, 2012 11:32 AM To: Luse, Paul E; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: NVMe driver behaviour on LBA Range Paul and Matthew, Both Windows and Linux driver should behave the same with LBA range data. Before we add the support for the LBA range in Windows driver, I would like to get your opinion and agreement on what we should be doing in the driver. PL> For sure. Currently I believe the windows driver is handling this incorrectly and am coincidentally (to your email) working on fixing that now and will propose a patch shortly (that can be discussed of course if we don't like where its heading). 
This is my understanding and please let me know if you agree: 1. By default (when a Set Feature - LBA Range has not been issued to a drive), a get feature - LBA range should return - Number of LBA Range (NUM) = 0 (means 1 entry) - Type = 0x00 (reserved) - Attributes - 0x01 (Read/writeable, not hidden from the OS) - Starting LBA = 0 - Number of Logical Blocks (NLB) = total number of logical blocks in this namespace. - This should have the same size as the Namespace Size (NSZE) as returned by Identify Namespace. - Unique Identifier (GUID) = ??? what should this be ? Should the driver care ? PL> The definition of "by default" totally depends on the manufacturer of the device. The case you mention above is what I would call the "reserved" case where the driver should not do anything with the LBA range. It should not expose it to upper layers and it should not send any more commands to it. At this point the manageability tools provided by whomever should be relied upon to have the smarts to use PT commands to determine that the LBA range needs to be configured and configure it accordingly. Once that's done, the driver will see it as 'configured' the next time (whether the tool submits an IOCTL to rescan, requires a reboot, whatever). KK> My understanding is that the "standard" driver is not going to interpret the LBA type. It is not going to do anything special whether it is "Reserved", "Filesystem", "RAID" or others. The standard driver only looks at the attributes for Read only or hidden. I think the driver should export all ranges. I think we probably should have allowed the device to report 0 entry before any set - LBA range command. When there is no entry, the LBA range is not used. 2. When the driver get the default LBA range, it "exports" "NSZE" of LBA to the OS. PL> See above 3. What happen if the total size LBA as reported by LBA range does not match the Namespace size as reported by Identify Namespace ? Should the driver "export" the size as reported by Identify Namespace (NSZE) or LBA Range ? I think it should be "NSZE" and not as reported by LBA range. What do you think ? PL> I believe the correct driver operation is to report the size reported by the LBA range and not the NS. The reason is because it (LBA range # blocks) refers to the actual LBA range and the NSZE refers to the entire NS. Perhaps the vendor has a reason for not exposing the entire NSZE to the host and thus has defined an LBA range (single) that is smaller than the NSZE. KK> I still think the driver should report the size "NSZE". 4. When there are multiple entries in the LBA range, the driver still exports this namespace with size "NSZE" as a single "LUN" with size as reported in "NSZE" except when there are ranges with "Hidden" attribute. PL> So I'm not quite sure what you are asking or proposing on this one. Currently the windows driver doesn't support multiple ranges per NS. If we want to change that it sounds like you are proposing (or the ECN is) that we report each LBA range as its own tgt or are you saying that all non-hidden LNUS should be exposed as a single tgt?? Sorry, I'm confused :) KK> The driver should only export a single tgt for a Namespace. If there are multiple LBA ranges, then the driver still export a single tgt. I understand that current Windows driver does not support multiple ranges per NS. What should the driver do if there are multiple ranges ? ECN 25 describes the handling the hidden LBA. 
"The host storage driver should expose all LBA ranges that are not set to be hidden from the OS / EFI / BIOS in the Attributes field. All LBA ranges that follow a hidden range shall also be hidden; the host storage driver should not expose subsequent LBA ranges that follow a hidden LBA range" The number of logical blocks that are hidden from the OS must be deducted from "NSZE" before exporting this namespace to the OS. In this case, the size is smaller than "NSZE". 5. When there are one or more ranges with attribute = 0 (Read Only), the driver needs to keep track of these ranges internally. The driver must return an error when there is a write request to these Read only LBA ranges. PL> Agree KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? I think we should set up a call to discuss this. Should we raise this in the NVMe WG to ask the expected behavior ? I would also like to get Matthew's opinion such that both the Windows and Linux driver behave the same. Please let me know what you think. Thanks -Kwok _______________________________________________ Linux-nvme mailing list Linux-nvme at lists.infradead.org http://merlin.infradead.org/mailman/listinfo/linux-nvme From Kwok.Kong at idt.com Mon Jul 2 14:35:47 2012 From: Kwok.Kong at idt.com (Kong, Kwok) Date: Mon, 2 Jul 2012 21:35:47 +0000 Subject: [nvmewin] NVMe driver behaviour on LBA Range In-Reply-To: <82C9F782B054C94B9FC04A331649C77A028E9A5B@FMSMSX106.amr.corp.intel.com> References: <05CD7821AE397547A01AC160FBC2314704A798@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028E95CC@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704A8AA@corpmail1.na.ads.idt.com> <05CD7821AE397547A01AC160FBC2314704A8CD@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028E9A5B@FMSMSX106.amr.corp.intel.com> Message-ID: <05CD7821AE397547A01AC160FBC2314704A8F7@corpmail1.na.ads.idt.com> Sure. -Kwok -----Original Message----- From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Monday, July 02, 2012 2:11 PM To: Kong, Kwok; Busch, Keith; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range I'm thinking email is going to be a pretty inefficient way to handle this now that we've traded a few. Kwok, maybe you can write up a list of use cases along with a section/column for spec interpretation and one for expected behavior of the standard drivers. Then we can setup a call to discuss each one as there will be lots of discussion on each item that really requires real time conversation I think. Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 02, 2012 2:09 PM To: Busch, Keith; Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range Keith, I agree that overlapping LBA ranges would be a spec violation and is an error. What should the driver do when there are overlapping LBA ranges ? -Kwok -----Original Message----- From: Busch, Keith [mailto:keith.busch at intel.com] Sent: Monday, July 02, 2012 2:04 PM To: Kong, Kwok; Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? Overlapping LBA ranges would be a spec violation. 
-----Original Message----- From: linux-nvme-bounces at lists.infradead.org [mailto:linux-nvme-bounces at lists.infradead.org] On Behalf Of Kong, Kwok Sent: Monday, July 02, 2012 2:48 PM To: Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range Paul, Please see below for my comment -----Original Message----- From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Monday, July 02, 2012 12:36 PM To: Kong, Kwok; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: RE: NVMe driver behaviour on LBA Range Kwok- See below for my thoughts Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 02, 2012 11:32 AM To: Luse, Paul E; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: NVMe driver behaviour on LBA Range Paul and Matthew, Both Windows and Linux driver should behave the same with LBA range data. Before we add the support for the LBA range in Windows driver, I would like to get your opinion and agreement on what we should be doing in the driver. PL> For sure. Currently I believe the windows driver is handling this incorrectly and am coincidentally (to your email) working on fixing that now and will propose a patch shortly (that can be discussed of course if we don't like where its heading). This is my understanding and please let me know if you agree: 1. By default (when a Set Feature - LBA Range has not been issued to a drive), a get feature - LBA range should return - Number of LBA Range (NUM) = 0 (means 1 entry) - Type = 0x00 (reserved) - Attributes - 0x01 (Read/writeable, not hidden from the OS) - Starting LBA = 0 - Number of Logical Blocks (NLB) = total number of logical blocks in this namespace. - This should have the same size as the Namespace Size (NSZE) as returned by Identify Namespace. - Unique Identifier (GUID) = ??? what should this be ? Should the driver care ? PL> The definition of "by default" totally depends on the manufacturer of the device. The case you mention above is what I would call the "reserved" case where the driver should not do anything with the LBA range. It should not expose it to upper layers and it should not send any more commands to it. At this point the manageability tools provided by whomever should be relied upon to have the smarts to use PT commands to determine that the LBA range needs to be configured and configure it accordingly. Once that's done, the driver will see it as 'configured' the next time (whether the tool submits an IOCTL to rescan, requires a reboot, whatever). KK> My understanding is that the "standard" driver is not going to interpret the LBA type. It is not going to do anything special whether it is "Reserved", "Filesystem", "RAID" or others. The standard driver only looks at the attributes for Read only or hidden. I think the driver should export all ranges. I think we probably should have allowed the device to report 0 entry before any set - LBA range command. When there is no entry, the LBA range is not used. 2. When the driver get the default LBA range, it "exports" "NSZE" of LBA to the OS. PL> See above 3. What happen if the total size LBA as reported by LBA range does not match the Namespace size as reported by Identify Namespace ? Should the driver "export" the size as reported by Identify Namespace (NSZE) or LBA Range ? I think it should be "NSZE" and not as reported by LBA range. 
What do you think ? PL> I believe the correct driver operation is to report the size reported by the LBA range and not the NS. The reason is because it (LBA range # blocks) refers to the actual LBA range and the NSZE refers to the entire NS. Perhaps the vendor has a reason for not exposing the entire NSZE to the host and thus has defined an LBA range (single) that is smaller than the NSZE. KK> I still think the driver should report the size "NSZE". 4. When there are multiple entries in the LBA range, the driver still exports this namespace with size "NSZE" as a single "LUN" with size as reported in "NSZE" except when there are ranges with "Hidden" attribute. PL> So I'm not quite sure what you are asking or proposing on this one. Currently the windows driver doesn't support multiple ranges per NS. If we want to change that it sounds like you are proposing (or the ECN is) that we report each LBA range as its own tgt or are you saying that all non-hidden LNUS should be exposed as a single tgt?? Sorry, I'm confused :) KK> The driver should only export a single tgt for a Namespace. If there are multiple LBA ranges, then the driver still export a single tgt. I understand that current Windows driver does not support multiple ranges per NS. What should the driver do if there are multiple ranges ? ECN 25 describes the handling the hidden LBA. "The host storage driver should expose all LBA ranges that are not set to be hidden from the OS / EFI / BIOS in the Attributes field. All LBA ranges that follow a hidden range shall also be hidden; the host storage driver should not expose subsequent LBA ranges that follow a hidden LBA range" The number of logical blocks that are hidden from the OS must be deducted from "NSZE" before exporting this namespace to the OS. In this case, the size is smaller than "NSZE". 5. When there are one or more ranges with attribute = 0 (Read Only), the driver needs to keep track of these ranges internally. The driver must return an error when there is a write request to these Read only LBA ranges. PL> Agree KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? I think we should set up a call to discuss this. Should we raise this in the NVMe WG to ask the expected behavior ? I would also like to get Matthew's opinion such that both the Windows and Linux driver behave the same. Please let me know what you think. Thanks -Kwok _______________________________________________ Linux-nvme mailing list Linux-nvme at lists.infradead.org http://merlin.infradead.org/mailman/listinfo/linux-nvme From paul.e.luse at intel.com Tue Jul 3 10:41:59 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Tue, 3 Jul 2012 17:41:59 +0000 Subject: [nvmewin] pending windows patch Message-ID: <82C9F782B054C94B9FC04A331649C77A028EA9CD@FMSMSX106.amr.corp.intel.com> I've got another set of updates to apply to the windows driver, I can either wait and have the previous patch applied and send another or, if Alex/Rick/Aprit haven't started reviewing the last one we can disregard that one and I'll send out a new one in its place. Alex/Rick/Aprit, what's your preference? Thx Paul PS: the updates are a series of bug fixes and/or updates associated with namespace enumeration including, by coincidence, some supplication around LBA range handling until we sort out exactly how we want the driver to handle it. ____________________________________ Paul Luse Sr. 
Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kwok.Kong at idt.com Tue Jul 3 11:18:55 2012 From: Kwok.Kong at idt.com (Kong, Kwok) Date: Tue, 3 Jul 2012 18:18:55 +0000 Subject: [nvmewin] NVMe driver behaviour on LBA Range In-Reply-To: <82C9F782B054C94B9FC04A331649C77A028E9A5B@FMSMSX106.amr.corp.intel.com> References: <05CD7821AE397547A01AC160FBC2314704A798@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028E95CC@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704A8AA@corpmail1.na.ads.idt.com> <05CD7821AE397547A01AC160FBC2314704A8CD@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028E9A5B@FMSMSX106.amr.corp.intel.com> Message-ID: <05CD7821AE397547A01AC160FBC2314704ACD5@corpmail1.na.ads.idt.com> All, Have you any use cases for LBA range ? I would like to understand your use case while I am creating my proposal. I personally don't have any good use case for the LBA range. Thanks -Kwok -----Original Message----- From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Monday, July 02, 2012 2:11 PM To: Kong, Kwok; Busch, Keith; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range I'm thinking email is going to be a pretty inefficient way to handle this now that we've traded a few. Kwok, maybe you can write up a list of use cases along with a section/column for spec interpretation and one for expected behavior of the standard drivers. Then we can setup a call to discuss each one as there will be lots of discussion on each item that really requires real time conversation I think. Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 02, 2012 2:09 PM To: Busch, Keith; Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range Keith, I agree that overlapping LBA ranges would be a spec violation and is an error. What should the driver do when there are overlapping LBA ranges ? -Kwok -----Original Message----- From: Busch, Keith [mailto:keith.busch at intel.com] Sent: Monday, July 02, 2012 2:04 PM To: Kong, Kwok; Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? Overlapping LBA ranges would be a spec violation. 
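Since the ECN 25 wording keeps coming up in this thread, one concrete reading of it in code may help. This is a minimal C sketch, assuming the attribute bit positions quoted earlier (bit 1 = hidden from the OS) and an abbreviated entry layout; both are assumptions for illustration, not the driver's actual data structures:

    #include <stddef.h>
    #include <stdint.h>

    #define LBA_RANGE_ATTR_HIDDEN 0x02  /* bit 1: range hidden from the OS */

    /* Simplified view of one LBA Range Type entry; illustrative only. */
    typedef struct {
        uint8_t  type;        /* not interpreted by the driver */
        uint8_t  attributes;
        uint64_t slba;        /* Starting LBA of the range */
        uint64_t nlb;         /* Number of Logical Blocks in the range */
    } lba_range;

    /* Per ECN 25: expose ranges only up to (not including) the first
     * hidden one; everything from that point on stays hidden.  Returns
     * how many leading ranges to expose and, via *exposed_blocks, how
     * many logical blocks they cover -- i.e. the size to report in place
     * of NSZE when part of the namespace is hidden. */
    static size_t exposed_range_count(const lba_range *r, size_t n,
                                      uint64_t *exposed_blocks)
    {
        size_t i;

        *exposed_blocks = 0;
        for (i = 0; i < n; i++) {
            if (r[i].attributes & LBA_RANGE_ATTR_HIDDEN)
                break;
            *exposed_blocks += r[i].nlb;
        }
        return i;
    }

When the ranges cover the whole namespace this is the same as deducting the hidden blocks from NSZE; when they do not, which size to report is part of the open NSZE-versus-LBA-range debate above.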
-----Original Message----- From: linux-nvme-bounces at lists.infradead.org [mailto:linux-nvme-bounces at lists.infradead.org] On Behalf Of Kong, Kwok Sent: Monday, July 02, 2012 2:48 PM To: Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range Paul, Please see below for my comment -----Original Message----- From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Monday, July 02, 2012 12:36 PM To: Kong, Kwok; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: RE: NVMe driver behaviour on LBA Range Kwok- See below for my thoughts Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 02, 2012 11:32 AM To: Luse, Paul E; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: NVMe driver behaviour on LBA Range Paul and Matthew, Both Windows and Linux driver should behave the same with LBA range data. Before we add the support for the LBA range in Windows driver, I would like to get your opinion and agreement on what we should be doing in the driver. PL> For sure. Currently I believe the windows driver is handling this incorrectly and am coincidentally (to your email) working on fixing that now and will propose a patch shortly (that can be discussed of course if we don't like where its heading). This is my understanding and please let me know if you agree: 1. By default (when a Set Feature - LBA Range has not been issued to a drive), a get feature - LBA range should return - Number of LBA Range (NUM) = 0 (means 1 entry) - Type = 0x00 (reserved) - Attributes - 0x01 (Read/writeable, not hidden from the OS) - Starting LBA = 0 - Number of Logical Blocks (NLB) = total number of logical blocks in this namespace. - This should have the same size as the Namespace Size (NSZE) as returned by Identify Namespace. - Unique Identifier (GUID) = ??? what should this be ? Should the driver care ? PL> The definition of "by default" totally depends on the manufacturer of the device. The case you mention above is what I would call the "reserved" case where the driver should not do anything with the LBA range. It should not expose it to upper layers and it should not send any more commands to it. At this point the manageability tools provided by whomever should be relied upon to have the smarts to use PT commands to determine that the LBA range needs to be configured and configure it accordingly. Once that's done, the driver will see it as 'configured' the next time (whether the tool submits an IOCTL to rescan, requires a reboot, whatever). KK> My understanding is that the "standard" driver is not going to interpret the LBA type. It is not going to do anything special whether it is "Reserved", "Filesystem", "RAID" or others. The standard driver only looks at the attributes for Read only or hidden. I think the driver should export all ranges. I think we probably should have allowed the device to report 0 entry before any set - LBA range command. When there is no entry, the LBA range is not used. 2. When the driver get the default LBA range, it "exports" "NSZE" of LBA to the OS. PL> See above 3. What happen if the total size LBA as reported by LBA range does not match the Namespace size as reported by Identify Namespace ? Should the driver "export" the size as reported by Identify Namespace (NSZE) or LBA Range ? I think it should be "NSZE" and not as reported by LBA range. 
What do you think ? PL> I believe the correct driver operation is to report the size reported by the LBA range and not the NS. The reason is because it (LBA range # blocks) refers to the actual LBA range and the NSZE refers to the entire NS. Perhaps the vendor has a reason for not exposing the entire NSZE to the host and thus has defined an LBA range (single) that is smaller than the NSZE. KK> I still think the driver should report the size "NSZE". 4. When there are multiple entries in the LBA range, the driver still exports this namespace with size "NSZE" as a single "LUN" with size as reported in "NSZE" except when there are ranges with "Hidden" attribute. PL> So I'm not quite sure what you are asking or proposing on this one. Currently the windows driver doesn't support multiple ranges per NS. If we want to change that it sounds like you are proposing (or the ECN is) that we report each LBA range as its own tgt or are you saying that all non-hidden LNUS should be exposed as a single tgt?? Sorry, I'm confused :) KK> The driver should only export a single tgt for a Namespace. If there are multiple LBA ranges, then the driver still export a single tgt. I understand that current Windows driver does not support multiple ranges per NS. What should the driver do if there are multiple ranges ? ECN 25 describes the handling the hidden LBA. "The host storage driver should expose all LBA ranges that are not set to be hidden from the OS / EFI / BIOS in the Attributes field. All LBA ranges that follow a hidden range shall also be hidden; the host storage driver should not expose subsequent LBA ranges that follow a hidden LBA range" The number of logical blocks that are hidden from the OS must be deducted from "NSZE" before exporting this namespace to the OS. In this case, the size is smaller than "NSZE". 5. When there are one or more ranges with attribute = 0 (Read Only), the driver needs to keep track of these ranges internally. The driver must return an error when there is a write request to these Read only LBA ranges. PL> Agree KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? I think we should set up a call to discuss this. Should we raise this in the NVMe WG to ask the expected behavior ? I would also like to get Matthew's opinion such that both the Windows and Linux driver behave the same. Please let me know what you think. Thanks -Kwok _______________________________________________ Linux-nvme mailing list Linux-nvme at lists.infradead.org http://merlin.infradead.org/mailman/listinfo/linux-nvme From paul.e.luse at intel.com Tue Jul 3 11:27:52 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Tue, 3 Jul 2012 18:27:52 +0000 Subject: [nvmewin] NVMe driver behaviour on LBA Range In-Reply-To: <05CD7821AE397547A01AC160FBC2314704ACD5@corpmail1.na.ads.idt.com> References: <05CD7821AE397547A01AC160FBC2314704A798@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028E95CC@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704A8AA@corpmail1.na.ads.idt.com> <05CD7821AE397547A01AC160FBC2314704A8CD@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028E9A5B@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704ACD5@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A028EAB4F@FMSMSX106.amr.corp.intel.com> The only value I see at the moment is the attributes and that's only on a per namespace basis (one LBA range covering the whole namespace). 
Specifically just the hidden and read-only bits... I'm not sure I see the value in breaking a NS into ranges given the support for multiple namespaces in the spec (other than to take advantage of the properties I just mentioned which are not available per NS). Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Tuesday, July 03, 2012 11:19 AM To: Luse, Paul E; Busch, Keith; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range All, Have you any use cases for LBA range ? I would like to understand your use case while I am creating my proposal. I personally don't have any good use case for the LBA range. Thanks -Kwok -----Original Message----- From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Monday, July 02, 2012 2:11 PM To: Kong, Kwok; Busch, Keith; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range I'm thinking email is going to be a pretty inefficient way to handle this now that we've traded a few. Kwok, maybe you can write up a list of use cases along with a section/column for spec interpretation and one for expected behavior of the standard drivers. Then we can setup a call to discuss each one as there will be lots of discussion on each item that really requires real time conversation I think. Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 02, 2012 2:09 PM To: Busch, Keith; Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range Keith, I agree that overlapping LBA ranges would be a spec violation and is an error. What should the driver do when there are overlapping LBA ranges ? -Kwok -----Original Message----- From: Busch, Keith [mailto:keith.busch at intel.com] Sent: Monday, July 02, 2012 2:04 PM To: Kong, Kwok; Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? Overlapping LBA ranges would be a spec violation. -----Original Message----- From: linux-nvme-bounces at lists.infradead.org [mailto:linux-nvme-bounces at lists.infradead.org] On Behalf Of Kong, Kwok Sent: Monday, July 02, 2012 2:48 PM To: Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range Paul, Please see below for my comment -----Original Message----- From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Monday, July 02, 2012 12:36 PM To: Kong, Kwok; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: RE: NVMe driver behaviour on LBA Range Kwok- See below for my thoughts Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 02, 2012 11:32 AM To: Luse, Paul E; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: NVMe driver behaviour on LBA Range Paul and Matthew, Both Windows and Linux driver should behave the same with LBA range data. Before we add the support for the LBA range in Windows driver, I would like to get your opinion and agreement on what we should be doing in the driver. PL> For sure. 
Currently I believe the windows driver is handling this incorrectly and am coincidentally (to your email) working on fixing that now and will propose a patch shortly (that can be discussed of course if we don't like where its heading). This is my understanding and please let me know if you agree: 1. By default (when a Set Feature - LBA Range has not been issued to a drive), a get feature - LBA range should return - Number of LBA Range (NUM) = 0 (means 1 entry) - Type = 0x00 (reserved) - Attributes - 0x01 (Read/writeable, not hidden from the OS) - Starting LBA = 0 - Number of Logical Blocks (NLB) = total number of logical blocks in this namespace. - This should have the same size as the Namespace Size (NSZE) as returned by Identify Namespace. - Unique Identifier (GUID) = ??? what should this be ? Should the driver care ? PL> The definition of "by default" totally depends on the manufacturer of the device. The case you mention above is what I would call the "reserved" case where the driver should not do anything with the LBA range. It should not expose it to upper layers and it should not send any more commands to it. At this point the manageability tools provided by whomever should be relied upon to have the smarts to use PT commands to determine that the LBA range needs to be configured and configure it accordingly. Once that's done, the driver will see it as 'configured' the next time (whether the tool submits an IOCTL to rescan, requires a reboot, whatever). KK> My understanding is that the "standard" driver is not going to interpret the LBA type. It is not going to do anything special whether it is "Reserved", "Filesystem", "RAID" or others. The standard driver only looks at the attributes for Read only or hidden. I think the driver should export all ranges. I think we probably should have allowed the device to report 0 entry before any set - LBA range command. When there is no entry, the LBA range is not used. 2. When the driver get the default LBA range, it "exports" "NSZE" of LBA to the OS. PL> See above 3. What happen if the total size LBA as reported by LBA range does not match the Namespace size as reported by Identify Namespace ? Should the driver "export" the size as reported by Identify Namespace (NSZE) or LBA Range ? I think it should be "NSZE" and not as reported by LBA range. What do you think ? PL> I believe the correct driver operation is to report the size reported by the LBA range and not the NS. The reason is because it (LBA range # blocks) refers to the actual LBA range and the NSZE refers to the entire NS. Perhaps the vendor has a reason for not exposing the entire NSZE to the host and thus has defined an LBA range (single) that is smaller than the NSZE. KK> I still think the driver should report the size "NSZE". 4. When there are multiple entries in the LBA range, the driver still exports this namespace with size "NSZE" as a single "LUN" with size as reported in "NSZE" except when there are ranges with "Hidden" attribute. PL> So I'm not quite sure what you are asking or proposing on this one. Currently the windows driver doesn't support multiple ranges per NS. If we want to change that it sounds like you are proposing (or the ECN is) that we report each LBA range as its own tgt or are you saying that all non-hidden LNUS should be exposed as a single tgt?? Sorry, I'm confused :) KK> The driver should only export a single tgt for a Namespace. If there are multiple LBA ranges, then the driver still export a single tgt. 
I understand that current Windows driver does not support multiple ranges per NS. What should the driver do if there are multiple ranges ? ECN 25 describes the handling the hidden LBA. "The host storage driver should expose all LBA ranges that are not set to be hidden from the OS / EFI / BIOS in the Attributes field. All LBA ranges that follow a hidden range shall also be hidden; the host storage driver should not expose subsequent LBA ranges that follow a hidden LBA range" The number of logical blocks that are hidden from the OS must be deducted from "NSZE" before exporting this namespace to the OS. In this case, the size is smaller than "NSZE". 5. When there are one or more ranges with attribute = 0 (Read Only), the driver needs to keep track of these ranges internally. The driver must return an error when there is a write request to these Read only LBA ranges. PL> Agree KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? I think we should set up a call to discuss this. Should we raise this in the NVMe WG to ask the expected behavior ? I would also like to get Matthew's opinion such that both the Windows and Linux driver behave the same. Please let me know what you think. Thanks -Kwok _______________________________________________ Linux-nvme mailing list Linux-nvme at lists.infradead.org http://merlin.infradead.org/mailman/listinfo/linux-nvme From Alex.Chang at idt.com Tue Jul 3 11:40:08 2012 From: Alex.Chang at idt.com (Chang, Alex) Date: Tue, 3 Jul 2012 18:40:08 +0000 Subject: [nvmewin] pending windows patch In-Reply-To: <82C9F782B054C94B9FC04A331649C77A028EA9CD@FMSMSX106.amr.corp.intel.com> References: <82C9F782B054C94B9FC04A331649C77A028EA9CD@FMSMSX106.amr.corp.intel.com> Message-ID: <548C5470AAD9DA4A85D259B663190D3601BB60@corpmail1.na.ads.idt.com> Hi Paul, I am in the middle of testing your current patch. However, I don't mind you put both into one. Thanks, Alex ________________________________ From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Tuesday, July 03, 2012 10:42 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] pending windows patch I've got another set of updates to apply to the windows driver, I can either wait and have the previous patch applied and send another or, if Alex/Rick/Aprit haven't started reviewing the last one we can disregard that one and I'll send out a new one in its place. Alex/Rick/Aprit, what's your preference? Thx Paul PS: the updates are a series of bug fixes and/or updates associated with namespace enumeration including, by coincidence, some supplication around LBA range handling until we sort out exactly how we want the driver to handle it. ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Tue Jul 3 12:29:07 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Tue, 3 Jul 2012 19:29:07 +0000 Subject: [nvmewin] pending windows patch In-Reply-To: <548C5470AAD9DA4A85D259B663190D3601BB60@corpmail1.na.ads.idt.com> References: <82C9F782B054C94B9FC04A331649C77A028EA9CD@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D3601BB60@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A028EABEB@FMSMSX106.amr.corp.intel.com> OK, cool. 
Unless Arpit or Rick objects, I'll roll the changes into once patch and send out probably late this week. Your testing will still be relevant, I only fixes 2 small things in the current patch, the other stuff is in other areas. Thx Paul PS: 2 small things I fixed: - I used a ^ operator as power-of so I changed to 1 << instead but it wasn't being used for anything critical - made an extra copy of CAP and put it in the devExt not realizing we had a * to the ctrl registers in there already so I removed that From: Chang, Alex [mailto:Alex.Chang at idt.com] Sent: Tuesday, July 03, 2012 11:40 AM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: pending windows patch Hi Paul, I am in the middle of testing your current patch. However, I don't mind you put both into one. Thanks, Alex ________________________________ From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Tuesday, July 03, 2012 10:42 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] pending windows patch I've got another set of updates to apply to the windows driver, I can either wait and have the previous patch applied and send another or, if Alex/Rick/Aprit haven't started reviewing the last one we can disregard that one and I'll send out a new one in its place. Alex/Rick/Aprit, what's your preference? Thx Paul PS: the updates are a series of bug fixes and/or updates associated with namespace enumeration including, by coincidence, some supplication around LBA range handling until we sort out exactly how we want the driver to handle it. ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Rick.Knoblaugh at lsi.com Tue Jul 3 13:52:14 2012 From: Rick.Knoblaugh at lsi.com (Knoblaugh, Rick) Date: Tue, 3 Jul 2012 14:52:14 -0600 Subject: [nvmewin] pending windows patch In-Reply-To: <82C9F782B054C94B9FC04A331649C77A028EABEB@FMSMSX106.amr.corp.intel.com> References: <82C9F782B054C94B9FC04A331649C77A028EA9CD@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D3601BB60@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028EABEB@FMSMSX106.amr.corp.intel.com> Message-ID: <4565AEA676113A449269C2F3A549520FCF74BF38@cosmail03.lsi.com> Hi Paul, Sounds good to us. Thanks, -Rick From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Tuesday, July 03, 2012 12:29 PM To: Chang, Alex; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] pending windows patch OK, cool. Unless Arpit or Rick objects, I'll roll the changes into once patch and send out probably late this week. Your testing will still be relevant, I only fixes 2 small things in the current patch, the other stuff is in other areas. Thx Paul PS: 2 small things I fixed: - I used a ^ operator as power-of so I changed to 1 << instead but it wasn't being used for anything critical - made an extra copy of CAP and put it in the devExt not realizing we had a * to the ctrl registers in there already so I removed that From: Chang, Alex [mailto:Alex.Chang at idt.com] Sent: Tuesday, July 03, 2012 11:40 AM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: pending windows patch Hi Paul, I am in the middle of testing your current patch. However, I don't mind you put both into one. 
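For reference, the operator fix Paul mentions in the PS comes down to ^ being bitwise XOR in C rather than exponentiation; a hypothetical before/after (made-up names, not the driver's real variables) looks like:

    /* Hypothetical illustration of the class of bug described above. */
    static unsigned int QueueEntrySize(unsigned int entrySizeShift)
    {
        /* unsigned int wrong = 2 ^ entrySizeShift;   BUG: ^ is bitwise XOR in C   */
        return 1u << entrySizeShift;                /* correct power-of-two value  */
    }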
Thanks, Alex ________________________________ From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Tuesday, July 03, 2012 10:42 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] pending windows patch I've got another set of updates to apply to the windows driver, I can either wait and have the previous patch applied and send another or, if Alex/Rick/Aprit haven't started reviewing the last one we can disregard that one and I'll send out a new one in its place. Alex/Rick/Aprit, what's your preference? Thx Paul PS: the updates are a series of bug fixes and/or updates associated with namespace enumeration including, by coincidence, some supplication around LBA range handling until we sort out exactly how we want the driver to handle it. ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kwok.Kong at idt.com Thu Jul 5 12:28:15 2012 From: Kwok.Kong at idt.com (Kong, Kwok) Date: Thu, 5 Jul 2012 19:28:15 +0000 Subject: [nvmewin] NVMe driver behaviour on LBA Range - new Proposal In-Reply-To: <82C9F782B054C94B9FC04A331649C77A028EAB4F@FMSMSX106.amr.corp.intel.com> References: <05CD7821AE397547A01AC160FBC2314704A798@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028E95CC@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704A8AA@corpmail1.na.ads.idt.com> <05CD7821AE397547A01AC160FBC2314704A8CD@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028E9A5B@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704ACD5@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028EAB4F@FMSMSX106.amr.corp.intel.com> Message-ID: <05CD7821AE397547A01AC160FBC2314704B4C3@corpmail1.na.ads.idt.com> All, I have reviewed the NVMe working group meeting minutes and have had more discussions on the usage of LBA Range with others. I believe this is the direction from the NVMe WG based on the discussion on 12-15-11 and ECN 25. Here is ECN 25: "The host storage driver should expose all LBA ranges that are not set to be hidden from the OS / EFI / BIOS in the Attributes field. All LBA ranges that follow a hidden range shall also be hidden; the host storage driver should not expose subsequent LBA ranges that follow a hidden LBA range" 1. An NVMe driver is not going to look at the LBA Range type (i.e. "Filesystem", "RAID" or "Cache" ...etc). It is expected that the layer above the NVMe driver may use the LBA Range type. i.e. A "RAID" driver or "Cache" driver may sit above the NVMe driver. A "Cache" driver may use the LBA Range type to decide how much space it can use for the caching function. How this is implemented is beyond the scope of the NVMe Spec. 2. An NVMe driver only looks at the attributes. If a range is "hidden", the NVMe driver should not expose this range to the OS/EFI/BIOS. All LBA ranges that follow a hidden range shall also be hidden. 3. An NVMe driver exposes all Read only (area should not be overwritten) LBA Ranges to the OS/EFI/BIOS but marks these areas as read only in the NVMe driver. The NVMe driver returns an error when a write request is received for an LBA range that is within the read only area. 4. An NVMe driver exposes the total namespace size "NSZE" as reported by Identify Namespace minus the hidden LBA ranges to the OS/EFI/BIOS. 5. When there are overlapping LBA ranges, the most restrictive attribute wins. 
- Check for hidden LBA ranges - Check for Read only LBA ranges for those LBA ranges that are not hidden - The remaining LBA ranges are read/writeable 6. This is very simple for the NVMe driver to implement. It matches the NVMe spec and ECN 25. Please let me know if you have any comment or different understanding of the NVMe spec and ECN 25. We need an agreement from the Windows driver team (Windows driver) and Matthew (Linux driver) before we can add the support of LBA Range to the NVMe drivers. Thanks -Kwok We can set up a call to discuss this and reach an agreement. From paul.e.luse at intel.com Sun Jul 8 12:10:20 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Sun, 8 Jul 2012 19:10:20 +0000 Subject: [nvmewin] patch status Message-ID: <82C9F782B054C94B9FC04A331649C77A028EF20D@FMSMSX106.amr.corp.intel.com> Wanted to provide a quick update: I'm nearly ready with the latest patch and due to timing it's going to include another key feature - the ability to handle NS creations and deletions. The mechanism by which a NS comes and goes is beyond the scope of the spec so I won't be including the details of what we're doing at Intel to implement that however the driver framework to handle a dynamic set of NS's is independent of that mechanism and there's no reason not to push that to the community so we can all use it. It consists of modifications to the LunExt array and the associated counters and components that read/write/use it. Details will be in the write-up when I send it out. I have some travel this week so once again I'll target later in the week for sending it. Following its publication some changes will be needed (or should be done I should say) for the format PT IOCTL to take advantage of the new capabilities but rather than bundle even more into this patch I'll hold off on that for another time.... -Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kwok.Kong at idt.com Mon Jul 9 09:25:51 2012 From: Kwok.Kong at idt.com (Kong, Kwok) Date: Mon, 9 Jul 2012 16:25:51 +0000 Subject: [nvmewin] patch status In-Reply-To: <82C9F782B054C94B9FC04A331649C77A028EF20D@FMSMSX106.amr.corp.intel.com> References: <82C9F782B054C94B9FC04A331649C77A028EF20D@FMSMSX106.amr.corp.intel.com> Message-ID: <05CD7821AE397547A01AC160FBC2314704D06B@corpmail1.na.ads.idt.com> Paul, The ability to handle NS creations and deletions is a new feature to the driver. I would like to separate this feature from the rest of the patch. Would you please send out your write-up on how you want to handle the NS creation and deletion such that we can review it first before you send out the patch? It is a good idea to support the NS creation and deletion but we should review your proposal first before the patch. Thanks -Kwok From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Sunday, July 08, 2012 12:10 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] patch status Wanted to provide a quick update: I'm nearly ready with the latest patch and due to timing its going to include another key feature - the ability to handle NS creations and deletions. 
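Returning to rules 3 and 5 of the LBA Range proposal above (read-only ranges are exposed but writes into them must fail, and on overlap the most restrictive attribute wins), one way a driver could implement the write-time check is sketched below; the type and function names are made up for illustration and are not the OFA driver's code:

    /* Illustrative only. readOnly[] holds the read-only ranges the driver
     * recorded while parsing the LBA Range feature data. */
    #include <stdint.h>
    #include <stdbool.h>

    typedef struct { uint64_t Slba; uint64_t Nlb; } ReadOnlyRange;

    static bool WriteTouchesReadOnly(uint64_t writeLba, uint32_t writeBlocks,
                                     const ReadOnlyRange *readOnly, uint32_t count)
    {
        uint64_t writeEnd = writeLba + writeBlocks;   /* exclusive end */

        for (uint32_t i = 0; i < count; i++) {
            uint64_t roEnd = readOnly[i].Slba + readOnly[i].Nlb;
            if (writeLba < roEnd && readOnly[i].Slba < writeEnd)
                return true;   /* caller fails the write, e.g. a data-protect style error */
        }
        return false;
    }

Because the test asks whether the write touches any recorded read-only range, a block that happens to sit in both a writeable and a read-only range is automatically treated as read-only, which is the most-restrictive-wins behaviour.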
The mechanism by which a NS comes and goes is beyond the scope of the spec so I won't be including the details of what we're doing at Intel to implement that however the driver framework to handle a dynamic set of NS's is independent of that mechanism and there's no reason not to push that to the community so we can all use it. It consists of modifications to the LunExt array and the associated counters and components that read/write/use it. Details will be in the right up when I send it out. I have some travel this week so once again I'll target later in the week for sending it. Following its publication some changes will be needed (or should be done I should say) for the format PT IOCTL to take advantage of the new capabilities but rather than bundle even more into this patch I'll hold off on that for another time.... -Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Mon Jul 9 09:29:13 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Mon, 9 Jul 2012 16:29:13 +0000 Subject: [nvmewin] patch status In-Reply-To: <05CD7821AE397547A01AC160FBC2314704D06B@corpmail1.na.ads.idt.com> References: <82C9F782B054C94B9FC04A331649C77A028EF20D@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704D06B@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A028EFA23@FMSMSX106.amr.corp.intel.com> Hi Kwok- To be clear, I'm just talking about having the driver logical map (the lun extension array) support the ability to tolerate create and delete, I'm not going to propose the actual create and delete functions themselves as those are out of scope of the spec. Given that, I'll send the patch out w/the changes included and if anyone is uncomfortable with them I can remove them fairly easily. I know this isn't what you're asking for but with limited time and the changes not being very extensive I'd like to proceed this way. Again, I'll pull them if the group doesn't like what they see but I'm nearly done right now as it is J Thx Paul From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 09, 2012 9:26 AM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: patch status Paul, The ability to handle NS creations and deletions is a new feature to the driver. I would like to separate this features from the rest of the patch. Would you please send out your write up on how you want to handle the NS creation and deletion such that we can review it first before you send out the patch? It is a good idea to support the NS creation and deletion but we should review your proposal first before the patch. Thanks -Kwok From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Sunday, July 08, 2012 12:10 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] patch status Wanted to provide a quick update: I'm nearly ready with the latest patch and due to timing its going to include another key feature - the ability to handle NS creations and deletions. The mechanism by which a NS comes and goes is beyond the scope of the spec so I won't be including the details of what we're doing at Intel to implement that however the driver framework to handle a dynamic set of NS's is independent of that mechanism and there's no reason not to push that to the community so we can all use it. 
It consists of modifications to the LunExt array and the associated counters and components that read/write/use it. Details will be in the right up when I send it out. I have some travel this week so once again I'll target later in the week for sending it. Following its publication some changes will be needed (or should be done I should say) for the format PT IOCTL to take advantage of the new capabilities but rather than bundle even more into this patch I'll hold off on that for another time.... -Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kwok.Kong at idt.com Mon Jul 9 09:35:16 2012 From: Kwok.Kong at idt.com (Kong, Kwok) Date: Mon, 9 Jul 2012 16:35:16 +0000 Subject: [nvmewin] patch status In-Reply-To: <82C9F782B054C94B9FC04A331649C77A028EFA23@FMSMSX106.amr.corp.intel.com> References: <82C9F782B054C94B9FC04A331649C77A028EF20D@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704D06B@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028EFA23@FMSMSX106.amr.corp.intel.com> Message-ID: <05CD7821AE397547A01AC160FBC2314704D0A4@corpmail1.na.ads.idt.com> Paul, OK. We can review the patch but I still would like to understand how to use the logical map to support the ability to tolerate create and delete. Please have a write up on this when you send out the patch. Thanks -Kwok From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Monday, July 09, 2012 9:29 AM To: Kong, Kwok; nvmewin at lists.openfabrics.org Subject: RE: patch status Hi Kwok- To be clear, I'm just talking about having the driver logical map (the lun extension array) support the ability to tolerate create and delete, I'm not going to propose the actual create and delete functions themselves as those are out of scope of the spec. Given that, I'll send the patch out w/the changes included and if anyone is uncomfortable with them I can remove them fairly easily. I know this isn't what you're asking for but with limited time and the changes not being very extensive I'd like to proceed this way. Again, I'll pull them if the group doesn't like what they see but I'm nearly done right now as it is :) Thx Paul From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 09, 2012 9:26 AM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: patch status Paul, The ability to handle NS creations and deletions is a new feature to the driver. I would like to separate this features from the rest of the patch. Would you please send out your write up on how you want to handle the NS creation and deletion such that we can review it first before you send out the patch? It is a good idea to support the NS creation and deletion but we should review your proposal first before the patch. Thanks -Kwok From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Sunday, July 08, 2012 12:10 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] patch status Wanted to provide a quick update: I'm nearly ready with the latest patch and due to timing its going to include another key feature - the ability to handle NS creations and deletions. 
The mechanism by which a NS comes and goes is beyond the scope of the spec so I won't be including the details of what we're doing at Intel to implement that however the driver framework to handle a dynamic set of NS's is independent of that mechanism and there's no reason not to push that to the community so we can all use it. It consists of modifications to the LunExt array and the associated counters and components that read/write/use it. Details will be in the right up when I send it out. I have some travel this week so once again I'll target later in the week for sending it. Following its publication some changes will be needed (or should be done I should say) for the format PT IOCTL to take advantage of the new capabilities but rather than bundle even more into this patch I'll hold off on that for another time.... -Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Mon Jul 9 09:41:46 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Mon, 9 Jul 2012 16:41:46 +0000 Subject: [nvmewin] patch status In-Reply-To: <05CD7821AE397547A01AC160FBC2314704D0A4@corpmail1.na.ads.idt.com> References: <82C9F782B054C94B9FC04A331649C77A028EF20D@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704D06B@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028EFA23@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704D0A4@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A028EFAA4@FMSMSX106.amr.corp.intel.com> You got it Kwok, will do. Thanks! Paul From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 09, 2012 9:35 AM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: patch status Paul, OK. We can review the patch but I still would like to understand how to use the logical map to support the ability to tolerate create and delete. Please have a write up on this when you send out the patch. Thanks -Kwok From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Monday, July 09, 2012 9:29 AM To: Kong, Kwok; nvmewin at lists.openfabrics.org Subject: RE: patch status Hi Kwok- To be clear, I'm just talking about having the driver logical map (the lun extension array) support the ability to tolerate create and delete, I'm not going to propose the actual create and delete functions themselves as those are out of scope of the spec. Given that, I'll send the patch out w/the changes included and if anyone is uncomfortable with them I can remove them fairly easily. I know this isn't what you're asking for but with limited time and the changes not being very extensive I'd like to proceed this way. Again, I'll pull them if the group doesn't like what they see but I'm nearly done right now as it is :) Thx Paul From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 09, 2012 9:26 AM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: patch status Paul, The ability to handle NS creations and deletions is a new feature to the driver. I would like to separate this features from the rest of the patch. Would you please send out your write up on how you want to handle the NS creation and deletion such that we can review it first before you send out the patch? It is a good idea to support the NS creation and deletion but we should review your proposal first before the patch. 
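The "tolerate create and delete" idea for the logical map can be pictured with a small sketch; the names below (LUN_SLOT, GetLunSlot, MAX_LUNS) are hypothetical, though the ONLINE/OFFLINE/FREE states mirror the ones described in the patch notes later in this thread:

    /* Sketch of the idea only, not the driver's real types. */
    #include <stdint.h>
    #include <stddef.h>

    typedef enum _LUN_SLOT_STATE {
        LUN_SLOT_FREE = 0,   /* no namespace behind this LUN id (a "hole")      */
        LUN_SLOT_ONLINE,     /* namespace attached and exposed to the OS        */
        LUN_SLOT_OFFLINE     /* id reserved (e.g. during format), not exposed   */
    } LUN_SLOT_STATE;

    typedef struct _LUN_SLOT {
        LUN_SLOT_STATE State;
        uint32_t       NamespaceId;   /* NSID backing this slot when not FREE   */
    } LUN_SLOT;

    #define MAX_LUNS 16

    /* Returns the slot for a LUN id, or NULL if the id is a hole (FREE) or
     * temporarily reserved (OFFLINE). A create simply claims a FREE slot and a
     * delete returns it to FREE without renumbering the others. */
    static LUN_SLOT *GetLunSlot(LUN_SLOT table[MAX_LUNS], uint32_t lunId)
    {
        if (lunId >= MAX_LUNS || table[lunId].State != LUN_SLOT_ONLINE)
            return NULL;
        return &table[lunId];
    }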
Thanks -Kwok From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Sunday, July 08, 2012 12:10 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] patch status Wanted to provide a quick update: I'm nearly ready with the latest patch and due to timing its going to include another key feature - the ability to handle NS creations and deletions. The mechanism by which a NS comes and goes is beyond the scope of the spec so I won't be including the details of what we're doing at Intel to implement that however the driver framework to handle a dynamic set of NS's is independent of that mechanism and there's no reason not to push that to the community so we can all use it. It consists of modifications to the LunExt array and the associated counters and components that read/write/use it. Details will be in the right up when I send it out. I have some travel this week so once again I'll target later in the week for sending it. Following its publication some changes will be needed (or should be done I should say) for the format PT IOCTL to take advantage of the new capabilities but rather than bundle even more into this patch I'll hold off on that for another time.... -Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.p.freyensee at intel.com Thu Jul 12 16:01:39 2012 From: james.p.freyensee at intel.com (Freyensee, James P) Date: Thu, 12 Jul 2012 23:01:39 +0000 Subject: [nvmewin] Question on PciBar pointer... Message-ID: <2D98093777D3FD46A36253F35FE9D69346BAEDAB@ORSMSX101.amr.corp.intel.com> What is: PVOID pPciBar[MAX_PCI_BAR]; Supposed to be used for in the driver? I did a search on this variable and I only find it declared and not used. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Fri Jul 13 10:06:35 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Fri, 13 Jul 2012 17:06:35 +0000 Subject: [nvmewin] ***UNCHECKED*** latest patch Message-ID: <82C9F782B054C94B9FC04A331649C77A028F6510@FMSMSX106.amr.corp.intel.com> Here's the latest patch (cumulative with the last one) pw is intel123. Again, many minor changes to keep things clean and consistent with our coding guidelines so don't let the # of changes frighten you. I'm going to ask for faster turnaround time on this because that's one of the reasons I had to put a patch on top of patch, right now we're moving quickly on my end (making updates) so if I wait two weeks I'm just pilling up more stuff for next time. So, review board folks, please make an effort to provide feedback before the end of next week. I'll schedule a meeting to cover Q&A for Fri sometime. Wrt preparing for create/delete NS, there's actually very little that ended up in here. We've done our own implementation at Intel (since its not defined in the spec) and really all of the complexity will remain in our private branch which is good news for the OFA driver. It's a shame that the spec doesn't cover NS management but when it does we'll tackle it. For now, here's the high level of what changed in this patch and then below are the details: Preparing for create/delete: - report luns was changed to handle the case where lun IDs might not be sequential. 
This does no harm for the driver without create/delete NS but may/may not be needed depending on how create/delete NS is implemented - during enum, and in a few other places, the lunExt array is not indexed by a sequential count but instead a count of the number of NS discovered that are not hidden (in other words, we're not putting hidden NS's in the lunExt array) - changed init state machine to do get NS, get/set features together before moving to the next NS Other: - fixed a timeout issue in passive init - fixed a missing startio sync lock when DPC routines are trying to do IO - removed an un-needed MSI spinlock in the DPC handler - removed extra CAP register from the devExt - fixed small math bug in findAdapter - addition of history buffers (inline trace) for debug - perf opt for the case where the device has fewer queues than there are cores in the machine Todo: - the format NVM command was not tested and is likely broken right now (slightly). We'll be updating that code to leverage some of the new mechanisms added here in the next few weeks and, Alex, we may be consulting with you on this one. Details Nvmeinit.c - changed retValue of NVMeMapCore2Queue to be consistent with other functions called there - misc variable name changes (per coding guidelines) - had previously stored an extra copy of the CAP reg in the devExt, not needed because the ctrl registers are already stored so there are updates where that was used - in NVMeSetFeaturesCompletion: - some name changes - removed all of the code where we processed range types and left it for now so that we only pay attention to visibility and readOnly (not rangetype) and we never issue a setFeatures - there's also an init state machine change in here that will be needed by anyone wanting to support create/delete NS in the future. Before we did all of the ID NS commands and then all of the gets which isn't enough info to build our map because after ID NS you don't know if you need to save the NS or not and if the create/delete scheme results in holes (it may or may not, it depends on the scheme that the spec chooses) then you're in trouble. The solution is to have the machine perform ID NS, then get/set for that NS and then move on to the next ID NS and so on. Small change, long explanation :) There are also changes to support this in the InitCallback - update to learning mode callback not to consider a read error on learning mode as fatal, let the driver attempt to continue in less than optimal mode and let a host driven read result in driver failure - new parm added to procession calls, covered later - small change(s) where the LunExt table is indexed by NSID, changed to index it by a new devExt variable that counts the number of visible namespaces - again needed in the event that holes are introduced in the NSID map in the future - in NVMeAllocIoQueues() we used to drop to one QP in the event that there were more cores than we had QPs based on what the device supported. It was a small change to make this condition less performance limiting, see the code for details as it's fairly straightforward. Makes us more flexible and also shortens the routine a bit. Also fixed a bug here where some dbl buffer memory wasn't being freed in the error case - moved NVMeIssueCmd() from this file to the io file (makes a lot more sense over there) Nvmeio.c - NVMeIssueCmd moved to here. Also, new parm added to determine if the startio spinlock is needed or not. Before we had 6 or so callers from DPC that didn't sync with startIO which was a bug. 
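For readers less familiar with Storport, the synchronization pattern being described is roughly the following; IssueCmdSketch and the calledFromDpc flag are hypothetical names, while StorPortAcquireSpinLock, StorPortReleaseSpinLock and StartIoLock are the documented Storport primitives under discussion (signatures paraphrased, check storport.h; assumes the Storport headers):

    /* Sketch of the locking pattern only, not the driver's NVMeIssueCmd. */
    VOID IssueCmdSketch(PVOID devExt, BOOLEAN calledFromDpc)
    {
        STOR_LOCK_HANDLE lockHandle = { 0 };

        if (calledFromDpc) {
            /* DPC-context callers must take the StartIo lock themselves;
             * callers already running in HwStartIo are serialized by Storport. */
            StorPortAcquireSpinLock(devExt, StartIoLock, NULL, &lockHandle);
        }

        /* ... build the submission queue entry and write the doorbell here ... */

        if (calledFromDpc) {
            StorPortReleaseSpinLock(devExt, &lockHandle);
        }
    }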
Callers from DPC needing to send IO must synchronize with startIO - Implemented try/finally clause in NVMeIssueCmd simply because mechanically it made sense given the new startIO lock being taken at the top and the number of exits we have in the middle of the routine. Nvmesnti.c - in SntiTranslateInquiry we don't check the exposenamespace element of the lunExt anymore as no hidden namespaces will ever be there anyway - SntiTranslateReportLuns - changes to support potential holes in the LunExt in the event that they are introduced as part of a create/delete NS scheme. Instead of listing how many we have sequentially, we now look through the entire list and only include those that exist in the return map - GetLunExtension - now checks the slot status of the lun being looked at. New status introduced of ONLINE, OFFLINE and FREE. Online and free should be clear, OFFLINE can be used for things like format where we need to make sure that lunID doesn't get "taken" by a potential addition while we're using it so it's sorta like a reservation on the lun ID Nvmestat.c - besides name changes and a few prints, the only change here is in one of the criteria for learning mode, since we're allowing a core/QP mismatch now the check to see if queues == 1 isn't valid anymore, we need to disable learning if it's less than the # of active cores (learning makes no sense in that case, queues are being shared by multiple cores) Nvmestd.c - introduction of HISTORY_BUFFERS. For debug only and can be extremely handy. Self explanatory - basically inline trace capability. - NVMeFindAdapter: removed the extra CAP in the devExt, fixed an operator bug (^ was being used as a power-of operator) - NVMePassiveInitialize: the gross timeout value here didn't account for the fact that the wait on RDY in the state machine itself has a variable timeout based on what the card tells us in the CAP register. So, the passiveInit timeout has to be longer than that for sure, so changed it so that it is - IoCompletionDpcRoutine: must have been a misunderstanding in the original coding of this (I recall there was some confusion) but we do not need to take the MSI lock here, it raised us back to DIRQL! The MSI lock is taken by storport for us when it calls the ISR and then released when the ISR returns. These APIs to get it are for cases where we need to sync with the ISR, it's not meant to be called from the DPC as a regular thing. We are protected by the DPC spinlock - Windows will synchronize DPCs running on the same core and taking the lock synchronizes our DPC cross-core. Taking the MSI lock was just a performance impact with no value add. - fixed bug in RecoveryDpcRoutine, we were missing a call to unfreeze storport queues ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: source.zip Type: application/x-zip-compressed Size: 162984 bytes Desc: source.zip URL: From paul.e.luse at intel.com Fri Jul 13 10:09:08 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Fri, 13 Jul 2012 17:09:08 +0000 Subject: [nvmewin] Q&A for latest patch - approval for pushing it Message-ID: <82C9F782B054C94B9FC04A331649C77A028F6581@FMSMSX106.amr.corp.intel.com> Friday, July 20, 2012, 08:00 AM US Arizona Time 916-356-2663, 8-356-2663, Bridge: 3, Passcode: 2773570 Live Meeting: https://webjoin.intel.com/?passcode=2773570 Speed dialer: inteldialer://3,2773570 | Learn more -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 1696 bytes Desc: not available URL: From paul.e.luse at intel.com Wed Jul 18 05:43:41 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Wed, 18 Jul 2012 12:43:41 +0000 Subject: [nvmewin] IO queue memory Message-ID: <82C9F782B054C94B9FC04A331649C77A02901BD5@FMSMSX106.amr.corp.intel.com> Discussion point I wanted to get some input on: Memory type: When we designed this, we chose cached memory for our IO queues because we don't have to worry about DMA coherency with IA anymore however the implication here is that our queues can now be paged out which I don't think we want for performance reasons. Also, if we don't decide to switch to non-paged for that reason we need to rework (minor) our shutdown code which is touching IO queue memory at DIRQL which, of course, you can't do. I think for the paging reason alone we should consider non cached allocations for the IO queues. Other thoughts? We may want to also think about a different strategy for IO queue sizing as well, if we switch to non cached, to be a little more accurate/conservative with how much memory we're using based on the current config. Right now, for example, on a 32 core system we'll use 2MB of memory just for IO queues. Thx Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Wed Jul 18 06:36:11 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Wed, 18 Jul 2012 13:36:11 +0000 Subject: [nvmewin] IO queue memory Message-ID: <82C9F782B054C94B9FC04A331649C77A02903C2F@FMSMSX106.amr.corp.intel.com> So apparently with this API non-cached does not imply non-paged which I suppose makes sense as the cache mode here refers to CPU cache. I don't see a way to get non-paged NUMA node specific contiguous memory at all actually, anyone? From: Luse, Paul E Sent: Wednesday, July 18, 2012 5:44 AM To: nvmewin at lists.openfabrics.org Subject: IO queue memory Discussion point I wanted to get some input on: Memory type: When we designed this, we chose cached memory for our IO queues because we don't have to worry about DMA coherency with IA anymore however the implication here is that our queues can now be paged out which I don't think we want for performance reasons. Also, if we don't decide to switch to non-paged for that reason we need to rework (minor) our shutdown code which is touching IO queue memory at DIRQL which, of course, you can't do. I think for the paging reason alone we should consider non cached allocations for the IO queues. Other thoughts? 
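As a rough cross-check of the 2MB figure above, assuming one submission/completion queue pair per core, a queue depth of 1,024, 64-byte submission entries and 16-byte completion entries (the actual depths used by the driver may differ): 32 SQs x 1,024 entries x 64 bytes = 2 MB for the submission queues alone, and the matching completion queues would add roughly 32 x 1,024 x 16 bytes = 512 KB on top of that.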
We may want to also think about a different strategy for IO queue sizing as well, if we switch to non cached, to be a little more accurate/conservative with how much memory we're using based on the current config. Right now, for example, on a 32 core system we'll use 2MB of memory just for IO queues. Thx Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.p.freyensee at intel.com Thu Jul 19 13:04:34 2012 From: james.p.freyensee at intel.com (Freyensee, James P) Date: Thu, 19 Jul 2012 20:04:34 +0000 Subject: [nvmewin] Question on PciBar pointer... In-Reply-To: <2D98093777D3FD46A36253F35FE9D69346BAEDAB@ORSMSX101.amr.corp.intel.com> References: <2D98093777D3FD46A36253F35FE9D69346BAEDAB@ORSMSX101.amr.corp.intel.com> Message-ID: <2D98093777D3FD46A36253F35FE9D69346BC278F@ORSMSX101.amr.corp.intel.com> So I am assuming the silence means the code pointed out in the last email (bottom of email) is in fact, useless dead-code? I at least believe this is the case because in NVMeFindAdapter() it looks like one can access the BAR memory-mapped space via (example): pMM_Range = NULL; pMM_Range = &(*(pPCI->AccessRanges))[0]; if (pMM_Range == NULL) { return (SP_RETURN_NOT_FOUND); } /* Mapping BAR memory to the virtual address of Control registers */ pAE->pCtrlRegister = (PNVMe_CONTROLLER_REGISTERS)StorPortGetDeviceBase(pAE, pPCI->AdapterInterfaceType, pPCI->SystemIoBusNumber, pMM_Range->RangeStart, pMM_Range->RangeLength, FALSE); So I am assuming if one were to add code for a BAR2-BAR5, this code shown above is basically how it is going to be initialized. Also a new member variable in pAE will be needed to point to this new BAR. Right? From: Freyensee, James P Sent: Thursday, July 12, 2012 4:02 PM To: nvmewin at lists.openfabrics.org Subject: Question on PciBar pointer... What is: PVOID pPciBar[MAX_PCI_BAR]; Supposed to be used for in the driver? I did a search on this variable and I only find it declared and not used. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Tue Jul 24 13:29:58 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Tue, 24 Jul 2012 20:29:58 +0000 Subject: [nvmewin] IO queue memory In-Reply-To: <82C9F782B054C94B9FC04A331649C77A02901BD5@FMSMSX106.amr.corp.intel.com> References: <82C9F782B054C94B9FC04A331649C77A02901BD5@FMSMSX106.amr.corp.intel.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A0290F720@FMSMSX106.amr.corp.intel.com> Took a week for this to make it out to the list.... odd. We talked about this already and I'm still investigating (time has not permitted) the scenario where it appears we're getting paged pool memory - have confirmed w/Msft contact that we shouldn't be From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Wednesday, July 18, 2012 5:44 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] IO queue memory Discussion point I wanted to get some input on: Memory type: When we designed this, we chose cached memory for our IO queues because we don't have to worry about DMA coherency with IA anymore however the implication here is that our queues can now be paged out which I don't think we want for performance reasons. 
Also, if we don't decide to switch to non-paged for that reason we need to rework (minor) our shutdown code which is touching IO queue memory at DIRQL which, of course, you can't do. I think for the paging reason alone we should consider non cached allocations for the IO queues. Other thoughts? We may want to also think about a different strategy for IO queue sizing as well, if we switch to non cached, to be a little more accurate/conservative with how much memory we're using based on the current config. Right now, for example, on a 32 core system we'll use 2MB of memory just for IO queues. Thx Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Tue Jul 24 13:32:44 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Tue, 24 Jul 2012 20:32:44 +0000 Subject: [nvmewin] Question on PciBar pointer... In-Reply-To: <2D98093777D3FD46A36253F35FE9D69346BC278F@ORSMSX101.amr.corp.intel.com> References: <2D98093777D3FD46A36253F35FE9D69346BAEDAB@ORSMSX101.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D69346BC278F@ORSMSX101.amr.corp.intel.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A0290F783@FMSMSX106.amr.corp.intel.com> Something seems to be wrong with the mail server - lots of email just now showing up. Yes, that last line down there was not used and is removed in the pending patch. Yes, you're in the right place to add mapping of addt'l BARs if your hw has them. Let me know if you have problems getting it working - as NVMe doesn't use the other BARs we wouldn't be looking for that code to be contributed back of course... Paul From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Thursday, July 19, 2012 1:05 PM To: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Question on PciBar pointer... So I am assuming the silence means the code pointed out in the last email (bottom of email) is in fact, useless dead-code? I at least believe this is the case because in NVMeFindAdapter() it looks like one can access the BAR memory-mapped space via (example): pMM_Range = NULL; pMM_Range = &(*(pPCI->AccessRanges))[0]; if (pMM_Range == NULL) { return (SP_RETURN_NOT_FOUND); } /* Mapping BAR memory to the virtual address of Control registers */ pAE->pCtrlRegister = (PNVMe_CONTROLLER_REGISTERS)StorPortGetDeviceBase(pAE, pPCI->AdapterInterfaceType, pPCI->SystemIoBusNumber, pMM_Range->RangeStart, pMM_Range->RangeLength, FALSE); So I am assuming if one were to add code for a BAR2-BAR5, this code shown above is basically how it is going to be initialized. Also a new member variable in pAE will be needed to point to this new BAR. Right? From: Freyensee, James P Sent: Thursday, July 12, 2012 4:02 PM To: nvmewin at lists.openfabrics.org Subject: Question on PciBar pointer... What is: PVOID pPciBar[MAX_PCI_BAR]; Supposed to be used for in the driver? I did a search on this variable and I only find it declared and not used. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.p.freyensee at intel.com Tue Jul 24 14:42:41 2012 From: james.p.freyensee at intel.com (Freyensee, James P) Date: Tue, 24 Jul 2012 21:42:41 +0000 Subject: [nvmewin] Question on PciBar pointer... 
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A0290F783@FMSMSX106.amr.corp.intel.com> References: <2D98093777D3FD46A36253F35FE9D69346BAEDAB@ORSMSX101.amr.corp.intel.com> <2D98093777D3FD46A36253F35FE9D69346BC278F@ORSMSX101.amr.corp.intel.com> <82C9F782B054C94B9FC04A331649C77A0290F783@FMSMSX106.amr.corp.intel.com> Message-ID: <2D98093777D3FD46A36253F35FE9D69346BC7B80@ORSMSX101.amr.corp.intel.com> Thanks for the reply...yah, I sent this email a while ago and just now got it...strange... From: Luse, Paul E Sent: Tuesday, July 24, 2012 1:33 PM To: Freyensee, James P; nvmewin at lists.openfabrics.org Subject: RE: Question on PciBar pointer... Something seems to be wrong with the mail server - lots of email just now showing up. Yes, that last line down there was not used and is removed in the pending patch. Yes, you're in the right place to add mapping of addt'l BARs if your hw has them. Let me know if you have problems getting it working - as NVMe doesn't use the other BARs we wouldn't be looking for that code to be contributed back of course... Paul From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P Sent: Thursday, July 19, 2012 1:05 PM To: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Question on PciBar pointer... So I am assuming the silence means the code pointed out in the last email (bottom of email) is in fact, useless dead-code? I at least believe this is the case because in NVMeFindAdapter() it looks like one can access the BAR memory-mapped space via (example): pMM_Range = NULL; pMM_Range = &(*(pPCI->AccessRanges))[0]; if (pMM_Range == NULL) { return (SP_RETURN_NOT_FOUND); } /* Mapping BAR memory to the virtual address of Control registers */ pAE->pCtrlRegister = (PNVMe_CONTROLLER_REGISTERS)StorPortGetDeviceBase(pAE, pPCI->AdapterInterfaceType, pPCI->SystemIoBusNumber, pMM_Range->RangeStart, pMM_Range->RangeLength, FALSE); So I am assuming if one were to add code for a BAR2-BAR5, this code shown above is basically how it is going to be initialized. Also a new member variable in pAE will be needed to point to this new BAR. Right? From: Freyensee, James P Sent: Thursday, July 12, 2012 4:02 PM To: nvmewin at lists.openfabrics.org Subject: Question on PciBar pointer... What is: PVOID pPciBar[MAX_PCI_BAR]; Supposed to be used for in the driver? I did a search on this variable and I only find it declared and not used. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alex.Chang at idt.com Tue Jul 24 15:39:01 2012 From: Alex.Chang at idt.com (Chang, Alex) Date: Tue, 24 Jul 2012 22:39:01 +0000 Subject: [nvmewin] IO queue memory In-Reply-To: <82C9F782B054C94B9FC04A331649C77A02901BD5@FMSMSX106.amr.corp.intel.com> References: <82C9F782B054C94B9FC04A331649C77A02901BD5@FMSMSX106.amr.corp.intel.com> Message-ID: <548C5470AAD9DA4A85D259B663190D3601C58A@corpmail1.na.ads.idt.com> Hi Paul, Have you confirm that the IO queue memory the driver allocates can be paged out? You brought the issue up last week and nobody could confirm that. According to the link below, StorPortAllocateContiguousMemorySpecifyCacheNode allocates a range of physically contiguous, noncached, nonpaged memory. 
http://msdn.microsoft.com/en-us/library/ff567027(v=vs.85) Thanks, Alex ________________________________ From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Wednesday, July 18, 2012 5:44 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] IO queue memory Discussion point I wanted to get some input on: Memory type: When we designed this, we chose cached memory for our IO queues because we don't have to worry about DMA coherency with IA anymore however the implication here is that our queues can now be paged out which I don't think we want for performance reasons. Also, if we don't decide to switch to non-paged for that reason we need to rework (minor) our shutdown code which is touching IO queue memory at DIRQL which, of course, you can't do. I think for the paging reason alone we should consider non cached allocations for the IO queues. Other thoughts? We may want to also think about a different strategy for IO queue sizing as well, if we switch to non cached, to be a little more accurate/conservative with how much memory we're using based on the current config. Right now, for example, on a 32 core system we'll use 2MB of memory just for IO queues. Thx Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Tue Jul 24 15:44:38 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Tue, 24 Jul 2012 22:44:38 +0000 Subject: [nvmewin] IO queue memory In-Reply-To: <548C5470AAD9DA4A85D259B663190D3601C58A@corpmail1.na.ads.idt.com> References: <82C9F782B054C94B9FC04A331649C77A02901BD5@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D3601C58A@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A0290F98C@FMSMSX106.amr.corp.intel.com> Hi Alex, I'm aware of what the docs say but I have a BSOD (with verifier on) that claims that an address in the range of our large chunk of queue memory is paged memory. Its accessed when we look for pending commands when being shutdown via the adapterControl entry which runs at DIRQL. With verifier on, its supposed to throw regardless of whether the memory is actually paged out or not but simply based on whether its capable of being paged out. I have a request from Msft to send them the DMP which I plan on doing this week. Will keep you all posted. Thx Paul From: Chang, Alex [mailto:Alex.Chang at idt.com] Sent: Tuesday, July 24, 2012 3:39 PM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: IO queue memory Hi Paul, Have you confirm that the IO queue memory the driver allocates can be paged out? You brought the issue up last week and nobody could confirm that. According to the link below, StorPortAllocateContiguousMemorySpecifyCacheNode allocates a range of physically contiguous, noncached, nonpaged memory. 
http://msdn.microsoft.com/en-us/library/ff567027(v=vs.85) Thanks, Alex ________________________________ From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Wednesday, July 18, 2012 5:44 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] IO queue memory Discussion point I wanted to get some input on: Memory type: When we designed this, we chose cached memory for our IO queues because we don't have to worry about DMA coherency with IA anymore however the implication here is that our queues can now be paged out which I don't think we want for performance reasons. Also, if we don't decide to switch to non-paged for that reason we need to rework (minor) our shutdown code which is touching IO queue memory at DIRQL which, of course, you can't do. I think for the paging reason alone we should consider non cached allocations for the IO queues. Other thoughts? We may want to also think about a different strategy for IO queue sizing as well, if we switch to non cached, to be a little more accurate/conservative with how much memory we're using based on the current config. Right now, for example, on a 32 core system we'll use 2MB of memory just for IO queues. Thx Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alex.Chang at idt.com Tue Jul 24 15:58:30 2012 From: Alex.Chang at idt.com (Chang, Alex) Date: Tue, 24 Jul 2012 22:58:30 +0000 Subject: [nvmewin] IO queue memory In-Reply-To: <82C9F782B054C94B9FC04A331649C77A0290F98C@FMSMSX106.amr.corp.intel.com> References: <82C9F782B054C94B9FC04A331649C77A02901BD5@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D3601C58A@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A0290F98C@FMSMSX106.amr.corp.intel.com> Message-ID: <548C5470AAD9DA4A85D259B663190D3601C5A5@corpmail1.na.ads.idt.com> Hi Paul, Thanks for the explanation. When being shutdown, I am not sure if it is really necessary to look for pending commands in "normal" shutdown cases. Alex ________________________________ From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Tuesday, July 24, 2012 3:45 PM To: Chang, Alex; nvmewin at lists.openfabrics.org Subject: RE: IO queue memory Hi Alex, I'm aware of what the docs say but I have a BSOD (with verifier on) that claims that an address in the range of our large chunk of queue memory is paged memory. Its accessed when we look for pending commands when being shutdown via the adapterControl entry which runs at DIRQL. With verifier on, its supposed to throw regardless of whether the memory is actually paged out or not but simply based on whether its capable of being paged out. I have a request from Msft to send them the DMP which I plan on doing this week. Will keep you all posted. Thx Paul From: Chang, Alex [mailto:Alex.Chang at idt.com] Sent: Tuesday, July 24, 2012 3:39 PM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: IO queue memory Hi Paul, Have you confirm that the IO queue memory the driver allocates can be paged out? You brought the issue up last week and nobody could confirm that. According to the link below, StorPortAllocateContiguousMemorySpecifyCacheNode allocates a range of physically contiguous, noncached, nonpaged memory. 
http://msdn.microsoft.com/en-us/library/ff567027(v=vs.85) Thanks, Alex ________________________________ From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Wednesday, July 18, 2012 5:44 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] IO queue memory Discussion point I wanted to get some input on: Memory type: When we designed this, we chose cached memory for our IO queues because we don't have to worry about DMA coherency with IA anymore however the implication here is that our queues can now be paged out which I don't think we want for performance reasons. Also, if we don't decide to switch to non-paged for that reason we need to rework (minor) our shutdown code which is touching IO queue memory at DIRQL which, of course, you can't do. I think for the paging reason alone we should consider non cached allocations for the IO queues. Other thoughts? We may want to also think about a different strategy for IO queue sizing as well, if we switch to non cached, to be a little more accurate/conservative with how much memory we're using based on the current config. Right now, for example, on a 32 core system we'll use 2MB of memory just for IO queues. Thx Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Tue Jul 24 16:09:42 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Tue, 24 Jul 2012 23:09:42 +0000 Subject: [nvmewin] IO queue memory In-Reply-To: <548C5470AAD9DA4A85D259B663190D3601C5A5@corpmail1.na.ads.idt.com> References: <82C9F782B054C94B9FC04A331649C77A02901BD5@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D3601C58A@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A0290F98C@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D3601C5A5@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A0290FA8E@FMSMSX106.amr.corp.intel.com> Agree but that's not the focus, just how I got to this BSOD - the point is if the memory is actually paged pool then we have serious problems as the device DMAs in/out of this area.... From: Chang, Alex [mailto:Alex.Chang at idt.com] Sent: Tuesday, July 24, 2012 3:59 PM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: IO queue memory Hi Paul, Thanks for the explanation. When being shutdown, I am not sure if it is really necessary to look for pending commands in "normal" shutdown cases. Alex ________________________________ From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Tuesday, July 24, 2012 3:45 PM To: Chang, Alex; nvmewin at lists.openfabrics.org Subject: RE: IO queue memory Hi Alex, I'm aware of what the docs say but I have a BSOD (with verifier on) that claims that an address in the range of our large chunk of queue memory is paged memory. Its accessed when we look for pending commands when being shutdown via the adapterControl entry which runs at DIRQL. With verifier on, its supposed to throw regardless of whether the memory is actually paged out or not but simply based on whether its capable of being paged out. I have a request from Msft to send them the DMP which I plan on doing this week. Will keep you all posted. 
Thx Paul From: Chang, Alex [mailto:Alex.Chang at idt.com] Sent: Tuesday, July 24, 2012 3:39 PM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: IO queue memory Hi Paul, Have you confirm that the IO queue memory the driver allocates can be paged out? You brought the issue up last week and nobody could confirm that. According to the link below, StorPortAllocateContiguousMemorySpecifyCacheNode allocates a range of physically contiguous, noncached, nonpaged memory. http://msdn.microsoft.com/en-us/library/ff567027(v=vs.85) Thanks, Alex ________________________________ From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Wednesday, July 18, 2012 5:44 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] IO queue memory Discussion point I wanted to get some input on: Memory type: When we designed this, we chose cached memory for our IO queues because we don't have to worry about DMA coherency with IA anymore however the implication here is that our queues can now be paged out which I don't think we want for performance reasons. Also, if we don't decide to switch to non-paged for that reason we need to rework (minor) our shutdown code which is touching IO queue memory at DIRQL which, of course, you can't do. I think for the paging reason alone we should consider non cached allocations for the IO queues. Other thoughts? We may want to also think about a different strategy for IO queue sizing as well, if we switch to non cached, to be a little more accurate/conservative with how much memory we're using based on the current config. Right now, for example, on a 32 core system we'll use 2MB of memory just for IO queues. Thx Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Tue Jul 24 16:57:40 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Tue, 24 Jul 2012 23:57:40 +0000 Subject: [nvmewin] paged pool access - IO queues Message-ID: <82C9F782B054C94B9FC04A331649C77A0290FBE5@FMSMSX106.amr.corp.intel.com> Can't say as I have a full explanation for this but the scenario I described earlier appears to only happen when using the platform emulation code where we have the device simulator that we used for early development (qemu). When I run the same thing on real hardware I don't see the issue. I don't plan on following up on this further unless it shows up later on real hardware. Thx Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From keith.busch at intel.com Mon Jul 2 14:07:30 2012 From: keith.busch at intel.com (Busch, Keith) Date: Mon, 02 Jul 2012 21:07:30 -0000 Subject: [nvmewin] NVMe driver behaviour on LBA Range In-Reply-To: <05CD7821AE397547A01AC160FBC2314704A8AA@corpmail1.na.ads.idt.com> References: <05CD7821AE397547A01AC160FBC2314704A798@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A028E95CC@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC2314704A8AA@corpmail1.na.ads.idt.com> Message-ID: KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? Overlapping LBA ranges would be a spec violation. 
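Even though overlapping ranges are a spec violation, a driver that wants to defend against them could run a simple pairwise check over the entries it reads back; the LbaRange type and function names below are made up for illustration and are not the OFA driver's code:

    /* Minimal overlap sanity check over the LBA Range feature data. */
    #include <stdint.h>
    #include <stdbool.h>

    typedef struct { uint64_t Slba; uint64_t Nlb; } LbaRange;

    static bool RangesOverlap(const LbaRange *a, const LbaRange *b)
    {
        /* [Slba, Slba + Nlb) intersect when each starts before the other ends. */
        return a->Slba < b->Slba + b->Nlb && b->Slba < a->Slba + a->Nlb;
    }

    /* Returns true if any pair of the numRanges entries overlaps, which per
     * the discussion above would be a spec violation worth logging. */
    static bool AnyRangesOverlap(const LbaRange *ranges, uint32_t numRanges)
    {
        for (uint32_t i = 0; i < numRanges; i++)
            for (uint32_t j = i + 1; j < numRanges; j++)
                if (RangesOverlap(&ranges[i], &ranges[j]))
                    return true;
        return false;
    }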
-----Original Message----- From: linux-nvme-bounces at lists.infradead.org [mailto:linux-nvme-bounces at lists.infradead.org] On Behalf Of Kong, Kwok Sent: Monday, July 02, 2012 2:48 PM To: Luse, Paul E; Matthew Wilcox Cc: nvmewin at lists.openfabrics.org; linux-nvme at lists.infradead.org Subject: RE: NVMe driver behaviour on LBA Range Paul, Please see below for my comment -----Original Message----- From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Monday, July 02, 2012 12:36 PM To: Kong, Kwok; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: RE: NVMe driver behaviour on LBA Range Kwok- See below for my thoughts Thx Paul -----Original Message----- From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Monday, July 02, 2012 11:32 AM To: Luse, Paul E; Matthew Wilcox Cc: linux-nvme at lists.infradead.org; nvmewin at lists.openfabrics.org Subject: NVMe driver behaviour on LBA Range Paul and Matthew, Both Windows and Linux driver should behave the same with LBA range data. Before we add the support for the LBA range in Windows driver, I would like to get your opinion and agreement on what we should be doing in the driver. PL> For sure. Currently I believe the windows driver is handling this incorrectly and am coincidentally (to your email) working on fixing that now and will propose a patch shortly (that can be discussed of course if we don't like where its heading). This is my understanding and please let me know if you agree: 1. By default (when a Set Feature - LBA Range has not been issued to a drive), a get feature - LBA range should return - Number of LBA Range (NUM) = 0 (means 1 entry) - Type = 0x00 (reserved) - Attributes - 0x01 (Read/writeable, not hidden from the OS) - Starting LBA = 0 - Number of Logical Blocks (NLB) = total number of logical blocks in this namespace. - This should have the same size as the Namespace Size (NSZE) as returned by Identify Namespace. - Unique Identifier (GUID) = ??? what should this be ? Should the driver care ? PL> The definition of "by default" totally depends on the manufacturer of the device. The case you mention above is what I would call the "reserved" case where the driver should not do anything with the LBA range. It should not expose it to upper layers and it should not send any more commands to it. At this point the manageability tools provided by whomever should be relied upon to have the smarts to use PT commands to determine that the LBA range needs to be configured and configure it accordingly. Once that's done, the driver will see it as 'configured' the next time (whether the tool submits an IOCTL to rescan, requires a reboot, whatever). KK> My understanding is that the "standard" driver is not going to interpret the LBA type. It is not going to do anything special whether it is "Reserved", "Filesystem", "RAID" or others. The standard driver only looks at the attributes for Read only or hidden. I think the driver should export all ranges. I think we probably should have allowed the device to report 0 entry before any set - LBA range command. When there is no entry, the LBA range is not used. 2. When the driver get the default LBA range, it "exports" "NSZE" of LBA to the OS. PL> See above 3. What happen if the total size LBA as reported by LBA range does not match the Namespace size as reported by Identify Namespace ? Should the driver "export" the size as reported by Identify Namespace (NSZE) or LBA Range ? I think it should be "NSZE" and not as reported by LBA range. 
What do you think ? PL> I believe the correct driver operation is to report the size reported by the LBA range and not the NS. The reason is because it (LBA range # blocks) refers to the actual LBA range and the NSZE refers to the entire NS. Perhaps the vendor has a reason for not exposing the entire NSZE to the host and thus has defined an LBA range (single) that is smaller than the NSZE. KK> I still think the driver should report the size "NSZE". 4. When there are multiple entries in the LBA range, the driver still exports this namespace with size "NSZE" as a single "LUN" with size as reported in "NSZE" except when there are ranges with "Hidden" attribute. PL> So I'm not quite sure what you are asking or proposing on this one. Currently the windows driver doesn't support multiple ranges per NS. If we want to change that it sounds like you are proposing (or the ECN is) that we report each LBA range as its own tgt or are you saying that all non-hidden LNUS should be exposed as a single tgt?? Sorry, I'm confused :) KK> The driver should only export a single tgt for a Namespace. If there are multiple LBA ranges, then the driver still export a single tgt. I understand that current Windows driver does not support multiple ranges per NS. What should the driver do if there are multiple ranges ? ECN 25 describes the handling the hidden LBA. "The host storage driver should expose all LBA ranges that are not set to be hidden from the OS / EFI / BIOS in the Attributes field. All LBA ranges that follow a hidden range shall also be hidden; the host storage driver should not expose subsequent LBA ranges that follow a hidden LBA range" The number of logical blocks that are hidden from the OS must be deducted from "NSZE" before exporting this namespace to the OS. In this case, the size is smaller than "NSZE". 5. When there are one or more ranges with attribute = 0 (Read Only), the driver needs to keep track of these ranges internally. The driver must return an error when there is a write request to these Read only LBA ranges. PL> Agree KK> I think the LBA Range causes a lot of confusions. What happen if there are multiple ranges and the ranges overlaps ? I think we should set up a call to discuss this. Should we raise this in the NVMe WG to ask the expected behavior ? I would also like to get Matthew's opinion such that both the Windows and Linux driver behave the same. Please let me know what you think. Thanks -Kwok _______________________________________________ Linux-nvme mailing list Linux-nvme at lists.infradead.org http://merlin.infradead.org/mailman/listinfo/linux-nvme From raymond.c.robles at intel.com Fri Jul 27 14:22:04 2012 From: raymond.c.robles at intel.com (Robles, Raymond C) Date: Fri, 27 Jul 2012 21:22:04 +0000 Subject: [nvmewin] NVMe Windows DB is LOCKED - Pushing latest patch from Paul Luse (misc. bug fixes and enum fixes) Message-ID: <49158E750348AA499168FD41D88983600F1FA549@FMSMSX105.amr.corp.intel.com> Locking the NVMe Windows DB. Thanks, Ray [Description: cid:image001.png at 01CB3870.4BB88E70] Raymond C. Robles Attached Platform Storage Software Datacenter Software Division Intel Corporation Desk: 480.554.2600 Mobile: 480.399.0645 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image001.png Type: image/png Size: 1756 bytes Desc: image001.png URL: From raymond.c.robles at intel.com Fri Jul 27 14:40:48 2012 From: raymond.c.robles at intel.com (Robles, Raymond C) Date: Fri, 27 Jul 2012 21:40:48 +0000 Subject: [nvmewin] NVMe Windows DB is UNLOCKED - Pushing latest patch from Paul Luse (misc. bug fixes and enum fixes) Message-ID: <49158E750348AA499168FD41D88983600F1FA5AD@FMSMSX105.amr.corp.intel.com> Latest patch by Intel has been pushed to the trunk. And, as always, I've created a tag for the latest push (misc_bug_fixes_and_enum_fixes). If anyone has any questions, please feel free to contact me. Thanks, Ray From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Robles, Raymond C Sent: Friday, July 27, 2012 2:22 PM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] NVMe Windows DB is LOCKED - Pushing latest patch from Paul Luse (misc. bug fixes and enum fixes) Locking the NVMe Windows DB. Thanks, Ray [Description: cid:image001.png at 01CB3870.4BB88E70] Raymond C. Robles Attached Platform Storage Software Datacenter Software Division Intel Corporation Desk: 480.554.2600 Mobile: 480.399.0645 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 1756 bytes Desc: image001.png URL: