From raymond.c.robles at intel.com  Thu Sep  6 17:25:05 2012
From: raymond.c.robles at intel.com (Robles, Raymond C)
Date: Fri, 7 Sep 2012 00:25:05 +0000
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and
 others) --- Intel Feedback
Message-ID: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>

Alex,

Here is Intel's feedback on your patch.  Let us know if you need any more info on our comments/questions.


- nvme.h:

    o ADMIN_SET_FEATURES_LBA_COMMAND_RANGE_TYPE_ENTRY Structure: GUID field (changed from ULONGLONG to UCHAR [16]) - what was the reason for this change?

    o General Comment: With the 1.0c changes, will the driver be backward compatible with 1.0b? If not, do we need a mechanism to do so or have you thought about what we should be doing in this case?  Did you attempt any testing of this?


- nvmeInit.c:

    o NVMeResetAdapter:

        * What is the use case for having a check for RDY already being 0 (we can never have nested resets so it would seem this would never be the case but not totally sure)?

    o NVMeNormalShutdown:

        * Same comment as above for reset adapter (same check is performed here).

        * The comment on line 2469 states that the code is waiting for all queues to be deleted, but really you are just checking to see that the RDY bit has been set to 0 indicating the transition of the EN bit from 1 to 0.

    o NvmeCheckPendingCpl:

        * "unsigned int" is used for a variable declaration. We always use typedef types... should be ULONG.

        * General Comment: There is already a function to detect if commands are pending... NVMeDetectPendingCmds in nvmeIo.c. This was done as part of the S3/S4 work that Rick/Arpit (LSI) did. Did you take a look at this function and see if it was similar to your new function? Was there something specific that you needed differently than what was already coded in the existing function?


- nvmeStd.c:

    o Line 1657: Paul removed all support for CHATHAM in a previous patch (but left in CHATHAM2 support). Please remove the CHATHAM check from the code.

Thanks,
Ray

Raymond C. Robles
Attached Platform Storage Software
Datacenter Software Division
Intel Corporation
Desk: 480.554.2600
Mobile: 480.399.0645


From Rick.Knoblaugh at lsi.com  Thu Sep  6 18:00:15 2012
From: Rick.Knoblaugh at lsi.com (Knoblaugh, Rick)
Date: Thu, 6 Sep 2012 19:00:15 -0600
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
Message-ID: <4565AEA676113A449269C2F3A549520FCFC07DCF@cosmail03.lsi.com>

Hi Alex,

We were also curious about the structure change Ray mentioned. No worries on the NvmeCheckPendingCpl, as I see you are checking completion queue for newly completed entries -- our routine is for a different purpose, as it checks submission queue for commands that are pending.

Thanks,

-Rick
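
The completion-queue side of the distinction Rick describes can be sketched as follows. This is a minimal illustration, not the driver's code: the types cpl_entry_t and cpl_queue_t and both function names are hypothetical, and the only protocol fact relied on is that the phase tag is the lowest bit of the 16-bit status half of a completion entry and flips on each pass through the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified 16-byte completion queue entry. */
typedef struct {
    uint32_t dw0;
    uint32_t dw1;
    uint16_t sq_head;
    uint16_t sq_id;
    uint16_t cid;
    uint16_t status;          /* bit 0 is the phase tag */
} cpl_entry_t;

/* Hypothetical host-side bookkeeping for one completion queue. */
typedef struct {
    const cpl_entry_t *entries;
    uint16_t head;            /* next entry the host will consume */
    uint16_t expected_phase;  /* 1 on the first pass, toggles on wrap */
} cpl_queue_t;

/* A new completion is waiting if the head entry carries the expected phase. */
static bool cq_has_pending(const cpl_queue_t *cq)
{
    assert(cq != NULL && cq->entries != NULL);
    return (cq->entries[cq->head].status & 1u) == cq->expected_phase;
}

/* True if any of the given completion queues holds an unconsumed entry. */
static bool any_cq_has_pending(const cpl_queue_t *queues, int count)
{
    for (int i = 0; i < count; i++) {
        if (cq_has_pending(&queues[i])) {
            return true;
        }
    }
    return false;
}
```

A check of outstanding submissions would instead look at the host's own record of issued commands, which is why the two routines are not interchangeable.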



From Alex.Chang at idt.com  Thu Sep  6 18:06:46 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Fri, 7 Sep 2012 01:06:46 +0000
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>

Hi Raymond,

Please see my comments in red...

Thanks,
Alex

________________________________
From: Robles, Raymond C [mailto:raymond.c.robles at intel.com]
Sent: Thursday, September 06, 2012 5:25 PM
To: nvmewin at lists.openfabrics.org; Chang, Alex
Subject: ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Alex,

Here is Intel's feedback on your patch.  Let us know if you need any more info on our comments/questions.


- nvme.h:

    o ADMIN_SET_FEATURES_LBA_COMMAND_RANGE_TYPE_ENTRY Structure: GUID field (changed from ULONGLONG to UCHAR [16]) - what was the reason for this change?

<Alex> To match the size of the GUID defined in the NVMe spec, which is 16 bytes in length. If I understand correctly, ULONGLONG is only 8 bytes long.

    o General Comment: With the 1.0c changes, will the driver be backward compatible with 1.0b? If not, do we need a mechanism to do so or have you thought about what we should be doing in this case?  Did you attempt any testing of this?

<Alex> No, I don't think it's backward compatible with 1.0b. The only compatibility issue I can think of is the 0's based NUMD of the Firmware Image Download and Get Log Page commands. In 1.0b, the spec did not indicate it clearly. Now, 1.0c clarifies it. I don't mind adding an ifdef to differentiate them.


- nvmeInit.c:

    o NVMeResetAdapter:

        * What is the use case for having a check for RDY already being 0 (we can never have nested resets so it would seem this would never be the case but not totally sure)?

<Alex> The code checks the RDY bit to find out whether the controller has already been reset. If so, there is no point in writing 0 to the EN bit of the CC register again.

    o NVMeNormalShutdown:

        * Same comment as above for reset adapter (same check is performed here).

        * The comment on line 2469 states that the code is waiting for all queues to be deleted, but really you are just checking to see that the RDY bit has been set to 0 indicating the transition of the EN bit from 1 to 0.

<Alex> Per the NVMe specification, when the RDY bit becomes 0 due to a reset, it indicates that the created queues have been deleted.

    o NvmeCheckPendingCpl:

        * "unsigned int" is used for a variable declaration. We always use typedef types... should be ULONG.

<Alex> Will change it.

        * General Comment: There is already a function to detect if commands are pending... NVMeDetectPendingCmds in nvmeIo.c. This was done as part of the S3/S4 work that Rick/Arpit (LSI) did. Did you take a look at this function and see if it was similar to your new function? Was there something specific that you needed differently than what was already coded in the existing function?

<Alex> I think they are for different purposes. NVMeDetectPendingCmds is called to ensure there is no pending IO before entering power saving modes. NVMeCheckPendingCpl is called to see if we have any pending completed entries in any of the created completion queues, to determine whether we own the INTx interrupt. In other words, a pending IO has not necessarily just completed when the INTx interrupt happens.


- nvmeStd.c:

    o Line 1657: Paul removed all support for CHATHAM in a previous patch (but left in CHATHAM2 support). Please remove the CHATHAM check from the code.

<Alex> Will do it.
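
The RDY check discussed for NVMeResetAdapter and NVMeNormalShutdown can be sketched roughly as below. This is only an illustration under stated assumptions: ctrl_regs_t, reset_controller, poll_hook_t and fake_controller_step are hypothetical names, the registers are plain memory rather than a mapped BAR, and the poll limit stands in for the timeout a real driver would derive from the controller's capabilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CC_EN    0x1u   /* CC.EN: bit 0 of Controller Configuration */
#define CSTS_RDY 0x1u   /* CSTS.RDY: bit 0 of Controller Status */

/* Hypothetical register view; a real driver maps these from the device. */
typedef struct {
    volatile uint32_t cc;
    volatile uint32_t csts;
} ctrl_regs_t;

/* Hypothetical hook called between polls (a stall in a real driver). */
typedef void (*poll_hook_t)(ctrl_regs_t *regs);

/* Clear CC.EN and wait for CSTS.RDY to drop, unless RDY is already 0. */
static bool reset_controller(ctrl_regs_t *regs, poll_hook_t wait, int max_polls)
{
    assert(regs != NULL);
    if ((regs->csts & CSTS_RDY) == 0) {
        return true;               /* already reset: skip the CC.EN write */
    }
    regs->cc &= ~CC_EN;            /* request the reset */
    for (int i = 0; i < max_polls; i++) {
        if ((regs->csts & CSTS_RDY) == 0) {
            return true;           /* controller reports not ready */
        }
        if (wait != NULL) {
            wait(regs);
        }
    }
    return false;                  /* RDY never cleared within the limit */
}

/* Stand-in for hardware, for experimentation only: drops RDY once EN is 0. */
static void fake_controller_step(ctrl_regs_t *regs)
{
    if ((regs->cc & CC_EN) == 0) {
        regs->csts &= ~CSTS_RDY;
    }
}
```

The early return is the behaviour Ray asked about: when RDY already reads 0, the routine reports success without touching CC.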

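On the GUID question at the top of this reply, the size argument can be checked at compile time. The structure below is a hypothetical mirror of a 64-byte LBA Range Type entry with illustrative field names, not the definition in nvme.h; the byte offsets follow my reading of the spec layout (GUID in bytes 32 to 47) and should be treated as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one LBA Range Type entry; names are illustrative. */
typedef struct {
    uint8_t  type;            /* byte 0 */
    uint8_t  attributes;      /* byte 1 */
    uint8_t  reserved1[14];   /* bytes 2..15 */
    uint64_t slba;            /* bytes 16..23: starting LBA */
    uint64_t nlb;             /* bytes 24..31: number of logical blocks */
    uint8_t  guid[16];        /* bytes 32..47: 16-byte GUID */
    uint8_t  reserved2[16];   /* bytes 48..63 */
} lba_range_entry_t;

/* An 8-byte integer cannot hold the GUID; a 16-byte array can. */
static_assert(sizeof(uint64_t) == 8, "64-bit integer is 8 bytes");
static_assert(sizeof(((lba_range_entry_t *)0)->guid) == 16, "GUID spans 16 bytes");
static_assert(offsetof(lba_range_entry_t, guid) == 32, "GUID starts at byte 32");
static_assert(sizeof(lba_range_entry_t) == 64, "entry is 64 bytes");
```

With an 8-byte GUID member, the entry would either come up short or push the trailing reserved bytes to the wrong offsets.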

From Alex.Chang at idt.com  Thu Sep  6 18:09:21 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Fri, 7 Sep 2012 01:09:21 +0000
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <4565AEA676113A449269C2F3A549520FCFC07DCF@cosmail03.lsi.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<4565AEA676113A449269C2F3A549520FCFC07DCF@cosmail03.lsi.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D3602B806D7@corpmail1.na.ads.idt.com>

Hi Rick,

I have commented on Raymond's message. Thanks a lot for your understanding about NVMeCheckPendingCpl.

Regards,
Alex


From Kwok.Kong at idt.com  Thu Sep  6 18:28:05 2012
From: Kwok.Kong at idt.com (Kong, Kwok)
Date: Fri, 7 Sep 2012 01:28:05 +0000
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
Message-ID: <05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>

Alex,

Please see my embedded comment ...

    o General Comment: With the 1.0c changes, will the driver be backward compatible with 1.0b? If not, do we need a mechanism to do so or have you thought about what we should be doing in this case?  Did you attempt any testing of this?

<Alex> No, I don't think it's backward compatible with 1.0b. The only compatibility issue I can think of is the 0's based NUMD of the Firmware Image Download and Get Log Page commands. In 1.0b, the spec did not indicate it clearly. Now, 1.0c clarifies it. I don't mind adding an ifdef to differentiate them.
<Kwok> I think you meant it is backward compatible with 1.0b. The 0's based NUMD was not clearly indicated in 1.0b. We may have misinterpreted it, but if so, that was a bug in the driver. It is a bug fix, then, and not a compatibility problem with 1.0b.



From Alex.Chang at idt.com  Fri Sep  7 08:58:35 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Fri, 7 Sep 2012 15:58:35 +0000
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
	<05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D3602B8071D@corpmail1.na.ads.idt.com>

Thanks a lot, Kwok, for addressing the issue in the specification. The other changes in 1.0c are new features, such as ECN 23/29. The sizes of some fields changed, but I kept the same naming to avoid problems. I think we are fine.

Thanks,
Alex
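
The 0's based NUMD interpretation discussed in this sub-thread can be illustrated with a small conversion pair. This is a sketch with hypothetical helper names, not code from the patch, and it ignores the per-command width of the NUMD field; the point is only that a value of 0 means one dword.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a transfer length in bytes to a 0's based dword count.
 * Fails if the length is zero or not a whole number of dwords. */
static bool bytes_to_numd(uint32_t byte_len, uint32_t *numd)
{
    assert(numd != NULL);
    if (byte_len == 0 || (byte_len % 4u) != 0) {
        return false;
    }
    *numd = byte_len / 4u - 1u;    /* 0 encodes one dword */
    return true;
}

/* Inverse mapping: a 0's based dword count back to a byte length. */
static uint64_t numd_to_bytes(uint32_t numd)
{
    return ((uint64_t)numd + 1u) * 4u;
}
```

Under this reading a 4096-byte transfer is encoded as 1023, whereas a 1's based reading would have sent 1024 and asked for one dword too many.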


From Arpit.Patel at lsi.com  Fri Sep  7 12:07:00 2012
From: Arpit.Patel at lsi.com (Patel, Arpit)
Date: Fri, 7 Sep 2012 13:07:00 -0600
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
	<05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
Message-ID: <217BF3CF80E93540B3049F95A676F09D015F875058@cosmail01.lsi.com>

Hi Guys,
Another thing I noticed, and this is not part of Alex's changes but has been there from the early days, is the following in NVMeAERCompletionRoutine:
The logPage value is derived using AssociatedLogPage; shouldn't it instead be the value returned by the AsynchronousEventType field, as defined in Fig 30 of the 1.0c spec? Also, the #defines for these event types seem to be confused with the Log Page Identifier field of Get Log Page.
Am I misinterpreting the spec or is it a bug?

Thanks
Arpit.
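
To make the two fields in question concrete, here is a sketch of decoding dword 0 of an Asynchronous Event Request completion. The type and function names are hypothetical and the bit positions (event type in bits 2:0, event information in bits 15:8, associated log page in bits 23:16) are my reading of the completion entry layout, so treat them as an assumption; the sketch does not settle which field the driver should use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of AER completion dword 0. */
typedef struct {
    uint8_t event_type;   /* bits 2:0   */
    uint8_t event_info;   /* bits 15:8  */
    uint8_t log_page;     /* bits 23:16 */
} aer_result_t;

/* Split completion dword 0 into its three fields. */
static void decode_aer_dw0(uint32_t dw0, aer_result_t *out)
{
    assert(out != NULL);
    out->event_type = (uint8_t)(dw0 & 0x7u);
    out->event_info = (uint8_t)((dw0 >> 8) & 0xFFu);
    out->log_page   = (uint8_t)((dw0 >> 16) & 0xFFu);
}
```

The event type and the associated log page occupy separate bit ranges, so constants defined for one should not be compared against the other.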


From paul.e.luse at intel.com  Fri Sep  7 13:35:33 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Fri, 7 Sep 2012 20:35:33 +0000
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <217BF3CF80E93540B3049F95A676F09D015F875058@cosmail01.lsi.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
	<05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
	<217BF3CF80E93540B3049F95A676F09D015F875058@cosmail01.lsi.com>
Message-ID: <82C9F782B054C94B9FC04A331649C77A07B5100C@FMSMSX106.amr.corp.intel.com>

We just finished testing the AER code (recall it was never tested before) and it required several updates.  After Alex's patch is applied, we'll push that next.

Thx
Paul

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Patel, Arpit
Sent: Friday, September 07, 2012 12:07 PM
To: Kong, Kwok; Chang, Alex; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Hi Guys,
Another thing I noticed, which is not part of Alex's changes but has been there from the early days, is the following in NVMeAERCompletionRoutine -
The logPage value is derived using AssociatedLogPage; shouldn't it instead be the value returned in the AsynchronousEventType field as defined in Fig 30 of the 1.0c spec? Also, the #defines for these event types appear to be confused with the Log Page Identifier field of Get Log Page.
Am I misinterpreting the spec or is it a bug?
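
To make the question concrete, here is a minimal sketch of splitting Dword 0 of an Asynchronous Event Request completion into its fields. The bit positions (event type in bits 2:0, event information in bits 15:8, associated log page in bits 23:16) are an assumption about the figure referenced above and should be checked against the 1.0c spec; the struct and function names are hypothetical and are not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical decoded view of AER completion Dword 0 (not the driver's type). */
typedef struct {
    uint8_t eventType;          /* assumed bits 2:0   */
    uint8_t eventInfo;          /* assumed bits 15:8  */
    uint8_t associatedLogPage;  /* assumed bits 23:16 */
} AER_COMPLETION_INFO;

/* Extract the three fields from the raw Dword 0 value. */
static AER_COMPLETION_INFO DecodeAerDword0(uint32_t dw0)
{
    AER_COMPLETION_INFO info;

    info.eventType         = (uint8_t)(dw0 & 0x7u);
    info.eventInfo         = (uint8_t)((dw0 >> 8) & 0xFFu);
    info.associatedLogPage = (uint8_t)((dw0 >> 16) & 0xFFu);
    return info;
}
```

Keeping the event type and the log page identifier as separate fields makes it harder to mix up their #define values.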

Thanks
Arpit.

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Kong, Kwok
Sent: Thursday, September 06, 2012 6:28 PM
To: Chang, Alex; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Alex,

Please see my embedded comment ...

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Chang, Alex
Sent: Thursday, September 06, 2012 6:07 PM
To: Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Hi Raymond,

Please see my comments in red...

Thanks,
Alex

________________________________
From: Robles, Raymond C [mailto:raymond.c.robles at intel.com]
Sent: Thursday, September 06, 2012 5:25 PM
To: nvmewin at lists.openfabrics.org; Chang, Alex
Subject: ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback
Alex,

Here is Intel's feedback on your patch.  Let us know if you need any more info on our comments/questions.


-          nvme.h:

o    ADMIN_SET_FEATURES_LBA_COMMAND_RANGE_TYPE_ENTRY Structure: GUID field (changed from ULONGLONG to UCHAR [16]) - what was the reason for this change?

To match the size of the GUID defined in the NVMe spec, which is 16 bytes in length. If I understand correctly, ULONGLONG is only 8 bytes long.
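
A minimal sketch of the size argument, assuming a C11 compiler. The typedefs stand in for the Windows ones, and RANGE_TYPE_ENTRY_SKETCH is a hypothetical, trimmed-down struct holding only the field under discussion, not the real ADMIN_SET_FEATURES_LBA_COMMAND_RANGE_TYPE_ENTRY layout.

```c
#include <stddef.h>

/* Stand-ins for the Windows typedefs so the sketch builds anywhere. */
typedef unsigned char      UCHAR;
typedef unsigned long long ULONGLONG;

/* Hypothetical struct: only the GUID field, declared as a 16-byte array. */
typedef struct {
    UCHAR GUID[16];
} RANGE_TYPE_ENTRY_SKETCH;

/* Compile-time checks: a single ULONGLONG covers only half of the field. */
_Static_assert(sizeof(ULONGLONG) == 8, "ULONGLONG holds only 8 bytes");
_Static_assert(sizeof(((RANGE_TYPE_ENTRY_SKETCH *)0)->GUID) == 16,
               "GUID field must be 16 bytes");

/* Size of the GUID field in bytes, for run-time inspection. */
static size_t GuidFieldSize(void)
{
    return sizeof(((RANGE_TYPE_ENTRY_SKETCH *)0)->GUID);
}
```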

o    General Comment: With the 1.0c changes, will the driver be backward compatible with 1.0b? If not, do we need a mechanism to do so or have you thought about what we should be doing in this case?  Did you attempt any testing of this?

No, I don't think it's backward compatible with 1.0b. The only compatibility issue I can think of is the 0's based NUMD of the Firmware Image Download and Get Log Page commands. In 1.0b, the spec did not indicate it clearly. Now, 1.0c clarifies it. I don't mind adding an ifdef to differentiate them.
<Kwok> I think you meant it is backward compatible with 1.0b. The 0's based NUMD was not clearly indicated in 1.0b. We may have misinterpreted it, but if we did, that was a bug in the driver.   It was a bug fix then and not a compatibility problem with 1.0b.
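
For illustration, "0's based" means a field value of 0 encodes one dword, so the value written to NUMD is the dword count minus one. A minimal sketch, with a hypothetical helper name that is not from the driver:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a transfer length in bytes to a 0's based dword count.
 * The length must be a non-zero multiple of 4 bytes (one dword).
 */
static uint32_t BytesToZeroBasedNumd(uint32_t byteCount)
{
    assert(byteCount >= 4 && (byteCount % 4) == 0);
    return (byteCount / 4) - 1;
}
```

With this convention a 512-byte log page is requested with NUMD = 127, not 128. Passing the plain dword count would ask for one dword too many.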


-          nvmeInit.c:

o    NVMeResetAdapter:

*  What is the use case for having a check for RDY already being 0 (we can never have nested resets so it would seem this would never be the case but not totally sure)?

The code checks the RDY bit to find out whether the controller has already been reset. If so, there is no point in writing 0 to the EN bit of the CC register again.

o    NVMeNormalShutdown:

*  Same comment as above for reset adapter (same check is performed here).

*  The comment on line 2469 states that the code is waiting for all queues to be deleted, but really you are just checking to see that the RDY bit has been set to 0 indicating the transition of the EN bit from 1 to 0.

Per the NVMe specification, when the RDY bit becomes 0 due to a reset, it indicates that the created queues have been deleted.
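
The sequence being discussed (clear CC.EN, then wait for CSTS.RDY to read 0) can be sketched as below. This is not the driver's NVMeResetAdapter; FAKE_CONTROLLER is a hypothetical software stand-in for the register file so the example runs without hardware, and the poll limit stands in for a real timeout. The sketch writes EN unconditionally; whether to skip the write when RDY is already 0 is the open question in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define CC_EN    0x1u  /* Controller Configuration: Enable, bit 0 */
#define CSTS_RDY 0x1u  /* Controller Status: Ready, bit 0 */

/* Hypothetical simulated controller; a real driver would read MMIO registers. */
typedef struct {
    uint32_t cc;
    uint32_t csts;
    uint32_t pollsUntilNotReady;  /* simulation only: delay before RDY drops */
} FAKE_CONTROLLER;

/* Simulated CSTS read: once EN is 0, RDY clears after a few polls. */
static uint32_t FakeReadCsts(FAKE_CONTROLLER *c)
{
    if ((c->cc & CC_EN) == 0 && (c->csts & CSTS_RDY) != 0) {
        if (c->pollsUntilNotReady == 0) {
            c->csts &= ~CSTS_RDY;
        } else {
            c->pollsUntilNotReady--;
        }
    }
    return c->csts;
}

/* Clear EN, then poll until RDY reads 0 or the poll budget runs out. */
static bool ResetAndWaitNotReady(FAKE_CONTROLLER *c, uint32_t maxPolls)
{
    uint32_t i;

    c->cc &= ~CC_EN;
    for (i = 0; i < maxPolls; i++) {
        if ((FakeReadCsts(c) & CSTS_RDY) == 0) {
            return true;   /* reset completed */
        }
    }
    return false;          /* timed out waiting for RDY to clear */
}
```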

o    NvmeCheckPendingCpl:

*  "unsigned int" is used for a variable declaration. We always use typedef types... should be ULONG.

Will change it.

*  General Comment: There is already a function to detect if commands are pending... NVMeDetectPendingCmds in nvmeIo.c. This was done as part of the S3/S4 work that Rick/Arpit (LSI) did. Did you take a look at this function and see if it was similar to your new function? Was there something specific that you needed differently than what was already coded in the existing function?

I think they serve different purposes. NVMeDetectPendingCmds is called to ensure there is no pending IO before entering power-saving modes. NVMeCheckPendingCpl is called to see whether any of the created completion queues holds pending completed entries, which determines whether we own the INTx interrupt. In other words, having pending IOs does not mean any of them had just completed when the INTx interrupt fired.
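
One way such a check could work is to compare the phase tag of the entry at each completion queue's head with the phase the host expects next. The sketch below assumes that approach and is not the actual NVMeCheckPendingCpl; all type and function names are hypothetical, and the phase tag is taken to be bit 16 of Dword 3 of the completion entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte completion queue entry. */
typedef struct {
    uint32_t dw0;
    uint32_t dw1;
    uint32_t dw2;
    uint32_t dw3;  /* bits 15:0 command id, bit 16 phase tag, bits 31:17 status */
} CPL_ENTRY_SKETCH;

/* Hypothetical host-side bookkeeping for one completion queue. */
typedef struct {
    const CPL_ENTRY_SKETCH *entries;
    uint16_t head;           /* next entry the host will consume */
    uint8_t  expectedPhase;  /* 1 on the first pass, toggles on each wrap */
} CPL_QUEUE_SKETCH;

/* Return true if any queue has an unconsumed completion at its head. */
static bool HasPendingCompletion(const CPL_QUEUE_SKETCH *queues, size_t queueCount)
{
    size_t i;

    for (i = 0; i < queueCount; i++) {
        const CPL_ENTRY_SKETCH *e = &queues[i].entries[queues[i].head];
        uint8_t phase = (uint8_t)((e->dw3 >> 16) & 0x1u);

        if (phase == queues[i].expectedPhase) {
            return true;   /* the controller has posted a new entry */
        }
    }
    return false;          /* nothing new: the interrupt is likely not ours */
}
```

An interrupt handler on a shared INTx line could return "not claimed" when this is false, even while commands are still outstanding.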


-          nvmeStd.c:

o    Line 1657: Paul removed all support for CHATHAM in a previous patch (but left in CHATHAM2 support). Please remove the CHATHAM check from the code.

Will do it.

Thanks,
Ray

Raymond C. Robles
Attached Platform Storage Software
Datacenter Software Division
Intel Corporation
Desk: 480.554.2600
Mobile: 480.399.0645


From paul.e.luse at intel.com  Fri Sep  7 13:48:21 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Fri, 7 Sep 2012 20:48:21 +0000
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <548C5470AAD9DA4A85D259B663190D3602B8071D@corpmail1.na.ads.idt.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
	<05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
	<548C5470AAD9DA4A85D259B663190D3602B8071D@corpmail1.na.ads.idt.com>
Message-ID: <82C9F782B054C94B9FC04A331649C77A07B510F1@FMSMSX106.amr.corp.intel.com>

Alex-

Wrt checking RDY before letting the reset through, I'd like to remove that check.  We already check in the DPC code that issues error recovery resets to make sure we aren't sending more than 1 outstanding, and for error recovery we can't rely on any register values to be correct; we don't know what condition we're resetting the card for.
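
As an illustration of a guard that does not depend on register contents, a software flag can ensure only one reset is outstanding. This is a sketch using C11 atomics with hypothetical names; it is not the driver's DPC code, which would more likely use Windows interlocked primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: set while an error recovery reset is outstanding. */
static atomic_flag g_resetInProgress = ATOMIC_FLAG_INIT;

/* Returns true only for the caller that gets to start the reset. */
static bool TryBeginReset(void)
{
    return !atomic_flag_test_and_set(&g_resetInProgress);
}

/* Called once the reset has finished so a later one can be issued. */
static void EndReset(void)
{
    atomic_flag_clear(&g_resetInProgress);
}
```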

Thx
Paul

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Chang, Alex
Sent: Friday, September 07, 2012 8:59 AM
To: Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Thanks a lot, Kwok, for addressing the issue in the specification. The other changes in 1.0c are new features, such as ECN 23/29. Some field sizes changed, but I kept the same naming to avoid problems. I think we are fine.

Thanks,
Alex


From Arpit.Patel at lsi.com  Fri Sep  7 13:50:35 2012
From: Arpit.Patel at lsi.com (Patel, Arpit)
Date: Fri, 7 Sep 2012 14:50:35 -0600
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A07B5100C@FMSMSX106.amr.corp.intel.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
	<05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
	<217BF3CF80E93540B3049F95A676F09D015F875058@cosmail01.lsi.com>
	<82C9F782B054C94B9FC04A331649C77A07B5100C@FMSMSX106.amr.corp.intel.com>
Message-ID: <217BF3CF80E93540B3049F95A676F09D015F87509A@cosmail01.lsi.com>

Thanks Paul.
Arpit.


From Alex.Chang at idt.com  Fri Sep  7 14:09:44 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Fri, 7 Sep 2012 21:09:44 +0000
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A07B510F1@FMSMSX106.amr.corp.intel.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
	<05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
	<548C5470AAD9DA4A85D259B663190D3602B8071D@corpmail1.na.ads.idt.com>
	<82C9F782B054C94B9FC04A331649C77A07B510F1@FMSMSX106.amr.corp.intel.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D3602B807A1@corpmail1.na.ads.idt.com>

Paul,

I agree that "for error recovery we can't rely on any register values to be correct;" I will remove the check and send out another zip file by the end of today if no more feedback or comments are received. I appreciate all the input.

Regards,
Alex


From paul.e.luse at intel.com  Fri Sep  7 14:10:27 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Fri, 7 Sep 2012 21:10:27 +0000
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <548C5470AAD9DA4A85D259B663190D3602B807A1@corpmail1.na.ads.idt.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
	<05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
	<548C5470AAD9DA4A85D259B663190D3602B8071D@corpmail1.na.ads.idt.com>
	<82C9F782B054C94B9FC04A331649C77A07B510F1@FMSMSX106.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B807A1@corpmail1.na.ads.idt.com>
Message-ID: <82C9F782B054C94B9FC04A331649C77A07B51168@FMSMSX106.amr.corp.intel.com>

Good stuff Alex, thanks for contributing!

From: Chang, Alex [mailto:Alex.Chang at idt.com]
Sent: Friday, September 07, 2012 2:10 PM
To: Luse, Paul E; Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Paul,

I agree that "for error recovery we can't rely on any register values to be correct;" I will remove the checking and send out another zip file by the end of today if no more feedbacks or comments received. I appreciate all the inputs.

Regards,
Alex

________________________________
From: Luse, Paul E [mailto:paul.e.luse at intel.com]<mailto:[mailto:paul.e.luse at intel.com]>
Sent: Friday, September 07, 2012 1:48 PM
To: Chang, Alex; Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback
Alex-

Wrt checking RDY before letting the reset through, I'd like to remove that check.  We already check in the DPC code that issues error recovery resets to make sure we aren't sending more than 1 outstanding and for error recovery we can't rely on any register values to be correct; we don't know the condition is that we're wanting to reset the card for.

Thx
Paul

From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org]<mailto:[mailto:nvmewin-bounces at lists.openfabrics.org]> On Behalf Of Chang, Alex
Sent: Friday, September 07, 2012 8:59 AM
To: Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Thanks a lot, Kwok, for addressing the issue in the specification. For the other changes in 1.0c are new features, such as ECN 23/29. Some size of fields got changed, I kept the same naming to avoid problems. I think we are fine.

Thanks,
Alex

________________________________
From: Kong, Kwok
Sent: Thursday, September 06, 2012 6:28 PM
To: Chang, Alex; Robles, Raymond C; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback
Alex,

Please see my embedded comment ...

From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org]<mailto:[mailto:nvmewin-bounces at lists.openfabrics.org]> On Behalf Of Chang, Alex
Sent: Thursday, September 06, 2012 6:07 PM
To: Robles, Raymond C; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Hi Raymond,

Please see my comments in red...

Thanks,
Alex

________________________________
From: Robles, Raymond C [mailto:raymond.c.robles at intel.com]
Sent: Thursday, September 06, 2012 5:25 PM
To: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>; Chang, Alex
Subject: ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback
Alex,

Here is Intel's feedback on your patch.  Let us know if you need any more info on our comments/questions.


-          nvme.h:

o    ADMIN_SET_FEATURES_LBA_COMMAND_RANGE_TYPE_ENTRY Structure: GUID field (changed from ULONGLONG to UCHAR [16]) - what was the reason for this change?

To match the size of the GUID defined in the NVMe spec, which is 16 bytes in length. If I understand it right, ULONGLONG is only 8 bytes long.
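
The size argument above can be checked at compile time. The following is a minimal sketch, not the driver's actual definition: the struct and field names are hypothetical stand-ins, standard C fixed-width types replace the Windows typedefs, and the offsets assume the 64-byte LBA Range Type entry layout from the NVMe 1.0 specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ADMIN_SET_FEATURES_LBA_COMMAND_RANGE_TYPE_ENTRY.
 * Field names are illustrative; offsets assume the 64-byte LBA Range Type
 * entry described in the NVMe 1.0 specification. */
typedef struct lba_range_type_entry {
    uint8_t  type;           /* byte 0 */
    uint8_t  attributes;     /* byte 1 */
    uint8_t  reserved1[14];  /* bytes 2..15 */
    uint64_t slba;           /* bytes 16..23: starting LBA */
    uint64_t nlb;            /* bytes 24..31: number of logical blocks */
    uint8_t  guid[16];       /* bytes 32..47: 16-byte GUID */
    uint8_t  reserved2[16];  /* bytes 48..63 */
} lba_range_type_entry;

/* An 8-byte integer cannot hold the 16-byte GUID. */
static_assert(sizeof(uint64_t) == 8, "64-bit integer is 8 bytes");
static_assert(sizeof(((lba_range_type_entry *)0)->guid) == 16,
              "GUID field must be 16 bytes");
static_assert(offsetof(lba_range_type_entry, guid) == 32,
              "GUID must start at byte 32");
static_assert(sizeof(lba_range_type_entry) == 64,
              "entry must be 64 bytes");
```

With an 8-byte GUID field, the reserved area that follows would start 8 bytes early and the fields after it would no longer line up with the spec.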

o    General Comment: With the 1.0c changes, will the driver be backward compatible with 1.0b? If not, do we need a mechanism to do so or have you thought about what we should be doing in this case?  Did you attempt any testing of this?

No, I don't think it's backward compatible with 1.0b. The only compatibility issue I can think of is the 0's based NUMD of the Firmware Image Download and Get Log Page commands. In 1.0b, the spec did not indicate it clearly. Now, 1.0c clarifies it. I don't mind adding an ifdef to differentiate them.
<Kwok> I think you meant it is backward compatible with 1.0b. The 0's based NUMD was not clearly indicated in 1.0b. We may have misinterpreted it, but if we did, that was a bug in the driver. It was a bug fix then and not a compatibility problem with 1.0b.
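
For reference, a 0's based dword count means a field value of 0 requests one dword. The helper names below are hypothetical and only illustrate the arithmetic; they are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a transfer length in bytes to a 0's based number of dwords.
 * The caller must pass a non-zero multiple of 4. */
static uint32_t numd_from_bytes(uint32_t bytes)
{
    assert(bytes != 0 && bytes % 4u == 0);
    return bytes / 4u - 1u;
}

/* Inverse conversion: a NUMD value of 0 means one dword (4 bytes).
 * A 64-bit result avoids overflow for the largest NUMD value. */
static uint64_t bytes_from_numd(uint32_t numd)
{
    return ((uint64_t)numd + 1u) * 4u;
}
```

Under this reading, a 4096-byte transfer is encoded as 1023 rather than 1024, which is the off-by-one that an earlier interpretation of 1.0b could have introduced.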


-          nvmeInit.c:

o    NVMeResetAdapter:

*  What is the use case for having a check for RDY already being 0 (we can never have nested resets so it would seem this would never be the case but not totally sure)?

The code checks the RDY bit to find out whether the controller has already been reset. If so, there is no point in writing 0 to the EN bit of the CC register again.

o    NVMeNormalShutdown:

*  Same comment as above for reset adapter (same check is performed here).

*  The comment on line 2469 states that the code is waiting for all queues to be deleted, but really you are just checking to see that the RDY bit has been set to 0 indicating the transition of the EN bit from 1 to 0.

Per the NVMe specification, when the RDY bit becomes 0 due to a reset, it indicates the created queues have been deleted.

o    NvmeCheckPendingCpl:

*  "unsigned int" is used for a variable declaration. We always use typedef types... should be ULONG.

Will change it.

*  General Comment: There is already a function to detect if commands are pending... NVMeDetectPendingCmds in nvmeIo.c. This was done as part of the S3/S4 work that Rick/Arpit (LSI) did. Did you take a look at this function and see if it was similar to your new function? Was there something specific that you needed differently than what was already coded in the existing function?

I think they are for different purposes. NVMeDetectPendingCmds is called to ensure there is no pending IO before entering power saving modes. NVMeCheckPendingCpl is called to see whether any completed entries are pending in any of the created completion queues, to determine whether we own the INTx interrupt. In other words, pending IOs have not necessarily just completed when the INTx interrupt happens.
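
A rough sketch of that kind of check follows. It is not the code from the patch: the structure and function names are hypothetical, and it assumes the usual NVMe convention that bit 16 of dword 3 of a completion queue entry is the Phase Tag, which the host compares against the phase it expects for new entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 16-byte completion queue entry; bit 16 of dword 3 is the Phase Tag. */
typedef struct cq_entry {
    uint32_t dw0;  /* command specific */
    uint32_t dw1;  /* reserved */
    uint32_t dw2;  /* SQ head pointer and SQ identifier */
    uint32_t dw3;  /* command identifier, phase tag, status */
} cq_entry;

#define CQE_PHASE(dw3) (((dw3) >> 16) & 1u)

/* Hypothetical per-queue bookkeeping; names are illustrative only. */
typedef struct cq_info {
    const cq_entry *entries;
    uint16_t head;            /* next entry the host will consume */
    uint8_t  expected_phase;  /* phase value that marks a new entry */
} cq_info;

/* Return true if any completion queue has an unconsumed entry at its head.
 * This looks only at the completion queues, not at outstanding commands. */
static bool any_completion_pending(const cq_info *queues, size_t count)
{
    assert(queues != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        const cq_entry *e = &queues[i].entries[queues[i].head];
        if (CQE_PHASE(e->dw3) == queues[i].expected_phase)
            return true;
    }
    return false;
}
```

On a shared INTx line, an interrupt handler could use a check like this to decide whether the interrupt belongs to this controller before claiming it. A count of outstanding commands would not answer that question, since a command can be outstanding without having completed yet.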


-          nvmeStd.c:

o    Line 1657: Paul removed all support for CHATHAM in a previous patch (but left in CHATHAM2 support). Please remove the CHATHAM check from the code.

Will do it.

Thanks,
Ray

Raymond C. Robles
Attached Platform Storage Software
Datacenter Software Division
Intel Corporation
Desk: 480.554.2600
Mobile: 480.399.0645

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120907/f6d78088/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 1476 bytes
Desc: image002.jpg
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120907/f6d78088/attachment.jpg>

From Alex.Chang at idt.com  Fri Sep  7 16:50:55 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Fri, 7 Sep 2012 23:50:55 +0000
Subject: [nvmewin] ***UNCHECKED*** Sep 7 - Patch (NVMe 1.0c Compliance
	and	others)
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A07B51168@FMSMSX106.amr.corp.intel.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
	<05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
	<548C5470AAD9DA4A85D259B663190D3602B8071D@corpmail1.na.ads.idt.com>
	<82C9F782B054C94B9FC04A331649C77A07B510F1@FMSMSX106.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B807A1@corpmail1.na.ads.idt.com>
	<82C9F782B054C94B9FC04A331649C77A07B51168@FMSMSX106.amr.corp.intel.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D3602B80800@corpmail1.na.ads.idt.com>

Hi all,

I haven't received any more feedback, so I assume everyone agrees the patch is good to go. Here are the new sources after removing the RDY bit check in NVMeResetAdapter. Password is idt123. Thanks again.

Regards,
Alex


________________________________
From: Luse, Paul E [mailto:paul.e.luse at intel.com]
Sent: Friday, September 07, 2012 2:10 PM
To: Chang, Alex; Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Good stuff Alex, thanks for contributing!

From: Chang, Alex [mailto:Alex.Chang at idt.com]
Sent: Friday, September 07, 2012 2:10 PM
To: Luse, Paul E; Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Paul,

I agree that "for error recovery we can't rely on any register values to be correct;" I will remove the check and send out another zip file by the end of today if no more feedback or comments are received. I appreciate all the input.

Regards,
Alex

________________________________
From: Luse, Paul E [mailto:paul.e.luse at intel.com]
Sent: Friday, September 07, 2012 1:48 PM
To: Chang, Alex; Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback
Alex-

Wrt checking RDY before letting the reset through, I'd like to remove that check.  The DPC code that issues error recovery resets already makes sure we aren't sending more than one outstanding reset, and for error recovery we can't rely on any register values to be correct; we don't know what condition we're resetting the card for.
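
A minimal sketch of a reset path without the RDY pre-check is shown below. It is an illustration only and not the driver's NVMeResetAdapter: the register access interface and the simulated controller are hypothetical so the example can run without hardware, and the bit positions assume CC.EN and CSTS.RDY are both bit 0 as defined in the NVMe specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CC_EN    (1u << 0)  /* Controller Configuration: Enable */
#define CSTS_RDY (1u << 0)  /* Controller Status: Ready */

/* Hypothetical register access interface so the sketch can run against a
 * simulated controller; a real driver would use its MMIO accessors. */
typedef struct nvme_regs_ops {
    uint32_t (*read_cc)(void *ctx);
    void     (*write_cc)(void *ctx, uint32_t value);
    uint32_t (*read_csts)(void *ctx);
} nvme_regs_ops;

/* Clear CC.EN without first checking RDY, then poll CSTS.RDY until it
 * reads 0 or the poll budget runs out. Returns true on success. */
static bool reset_controller(const nvme_regs_ops *ops, void *ctx,
                             unsigned max_polls)
{
    assert(ops != NULL);
    ops->write_cc(ctx, ops->read_cc(ctx) & ~CC_EN);

    for (unsigned i = 0; i < max_polls; i++) {
        if ((ops->read_csts(ctx) & CSTS_RDY) == 0)
            return true;
        /* a real driver would stall between polls here */
    }
    return false;
}

/* Simulated controller: RDY drops a few polls after EN is cleared. */
typedef struct fake_ctrl {
    uint32_t cc;
    uint32_t csts;
    unsigned polls_until_not_ready;
} fake_ctrl;

static uint32_t fake_read_cc(void *ctx) { return ((fake_ctrl *)ctx)->cc; }
static void fake_write_cc(void *ctx, uint32_t v) { ((fake_ctrl *)ctx)->cc = v; }

static uint32_t fake_read_csts(void *ctx)
{
    fake_ctrl *c = ctx;
    if ((c->cc & CC_EN) == 0 && (c->csts & CSTS_RDY) != 0) {
        if (c->polls_until_not_ready == 0)
            c->csts &= ~CSTS_RDY;
        else
            c->polls_until_not_ready--;
    }
    return c->csts;
}

static const nvme_regs_ops fake_ops = {
    fake_read_cc, fake_write_cc, fake_read_csts
};
```

The bounded poll is a choice made for this sketch; the real timeout policy belongs to the driver.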

Thx
Paul

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120907/b3d4bf0a/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 1476 bytes
Desc: image002.jpg
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120907/b3d4bf0a/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sources_10c_new.zip
Type: application/x-zip-compressed
Size: 166232 bytes
Desc: sources_10c_new.zip
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120907/b3d4bf0a/attachment.bin>

From Alex.Chang at idt.com  Fri Sep  7 17:06:34 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Sat, 8 Sep 2012 00:06:34 +0000
Subject: [nvmewin] ***UNCHECKED*** Sep 7 - Patch (NVMe 1.0c Compliance
	and	others)
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A07B51168@FMSMSX106.amr.corp.intel.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
	<05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
	<548C5470AAD9DA4A85D259B663190D3602B8071D@corpmail1.na.ads.idt.com>
	<82C9F782B054C94B9FC04A331649C77A07B510F1@FMSMSX106.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B807A1@corpmail1.na.ads.idt.com>
	<82C9F782B054C94B9FC04A331649C77A07B51168@FMSMSX106.amr.corp.intel.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D3602B8080E@corpmail1.na.ads.idt.com>

Hi all,

I haven't received any more feedback, so I assume everyone agrees the patch is good to go. Here are the new sources after removing the RDY bit check in NVMeResetAdapter. Password is idt123. Thanks again.

Regards,
Alex


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120908/221edf8f/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 1476 bytes
Desc: image002.jpg
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120908/221edf8f/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sources_10c_new.zip
Type: application/x-zip-compressed
Size: 166232 bytes
Desc: sources_10c_new.zip
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120908/221edf8f/attachment.bin>

From raymond.c.robles at intel.com  Mon Sep 10 13:42:46 2012
From: raymond.c.robles at intel.com (Robles, Raymond C)
Date: Mon, 10 Sep 2012 20:42:46 +0000
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <217BF3CF80E93540B3049F95A676F09D015F875058@cosmail01.lsi.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
	<05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
	<217BF3CF80E93540B3049F95A676F09D015F875058@cosmail01.lsi.com>
Message-ID: <49158E750348AA499168FD41D88983601807CA05@FMSMSX105.amr.corp.intel.com>

Arpit,

Just to add some clarification on your question/comment below... the AssociatedLogPage field is used to determine the log page that must be read in order to clear the event. The fields Asynchronous Event Information and Type indicate additional information about the event and should not be used to determine which log page the host must read to clear the event. From 1.0c - Figure 30:

[Inline image: NVMe 1.0c, Figure 30 (attached as image002.png)]


As for the defines, the value returned for the Associated Log Page will be the log page identifier. So, it is valid to use the log page identifiers.
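
To make the field split concrete, here is a small decoding sketch. The type and function names are hypothetical and not taken from NVMeAERCompletionRoutine; the bit positions assume the Asynchronous Event Request completion dword 0 layout in NVMe 1.0c (Event Type in bits 2:0, Event Information in bits 15:8, Associated Log Page in bits 23:16).

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of completion queue entry dword 0 for an
 * Asynchronous Event Request. Names are illustrative only. */
typedef struct aer_result {
    uint8_t event_type;           /* bits 2:0 */
    uint8_t event_info;           /* bits 15:8 */
    uint8_t associated_log_page;  /* bits 23:16: log page identifier */
} aer_result;

static aer_result decode_aer_dw0(uint32_t dw0)
{
    aer_result r;
    r.event_type          = (uint8_t)(dw0 & 0x7u);
    r.event_info          = (uint8_t)((dw0 >> 8) & 0xFFu);
    r.associated_log_page = (uint8_t)((dw0 >> 16) & 0xFFu);
    return r;
}
```

In this reading, the host passes associated_log_page as the Log Page Identifier of the Get Log Page command that clears the event, while event_type and event_info only describe what happened.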

Thanks,
Ray

From: Patel, Arpit [mailto:Arpit.Patel at lsi.com]
Sent: Friday, September 07, 2012 12:07 PM
To: Kong, Kwok; Chang, Alex; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Hi Guys,
Another thing I noticed, and this is not part of Alex's changes but was there from early days, is the following in NVMeAERCompletionRoutine -
The logPage value is derived using AssociatedLogPage; shouldn't it instead be the value returned by the AsynchronousEventType field as defined in Fig 30 of the 1.0c spec? Also, the #defines for these event types are confused with the Log Page Identifier field of Get Log Page.
Am I misinterpreting the spec or is it a bug?

Thanks
Arpit.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120910/de61e59b/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 1756 bytes
Desc: image001.png
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120910/de61e59b/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 50026 bytes
Desc: image002.png
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120910/de61e59b/attachment-0001.png>

From Alex.Chang at idt.com  Tue Sep 11 12:21:08 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Tue, 11 Sep 2012 19:21:08 +0000
Subject: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance
 and others) --- Intel Feedback
In-Reply-To: <49158E750348AA499168FD41D88983601807CA05@FMSMSX105.amr.corp.intel.com>
References: <49158E750348AA499168FD41D88983601807C12D@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B806CA@corpmail1.na.ads.idt.com>
	<05CD7821AE397547A01AC160FBC231472E7F74AD@corpmail1.na.ads.idt.com>
	<217BF3CF80E93540B3049F95A676F09D015F875058@cosmail01.lsi.com>
	<49158E750348AA499168FD41D88983601807CA05@FMSMSX105.amr.corp.intel.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D3602B81AA5@corpmail1.na.ads.idt.com>

Hi Ray,

I think I missed the changes you suggested below:
1. unsigned int --> ULONG
2. Line 1657 of nvmestd.c, remove CHATHAM1

Both of them are in nvmestd.c. I re-zipped the entire sources for you. Sorry for the inconvenience it may have caused.

Thanks,
Alex


From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org]<mailto:[mailto:nvmewin-bounces at lists.openfabrics.org]> On Behalf Of Kong, Kwok
Sent: Thursday, September 06, 2012 6:28 PM
To: Chang, Alex; Robles, Raymond C; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Alex,

Please see my embedded comment ...

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Chang, Alex
Sent: Thursday, September 06, 2012 6:07 PM
To: Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Hi Raymond,

Please see my comments in red...

Thanks,
Alex

________________________________
From: Robles, Raymond C [mailto:raymond.c.robles at intel.com]
Sent: Thursday, September 06, 2012 5:25 PM
To: nvmewin at lists.openfabrics.org; Chang, Alex
Subject: ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Alex,

Here is Intel's feedback on your patch.  Let us know if you need any more info on our comments/questions.


-          nvme.h:

o    ADMIN_SET_FEATURES_LBA_COMMAND_RANGE_TYPE_ENTRY Structure: GUID field (changed from ULONGLONG to UCHAR [16]) - what was the reason for this change?

To match the size of the GUID defined in the NVMe spec, which is 16 bytes long. If I understand correctly, ULONGLONG is only 8 bytes long.
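
The size argument can be checked at compile time. The following is only a sketch: LBA_RANGE_TYPE_ENTRY is a hypothetical stand-in for ADMIN_SET_FEATURES_LBA_COMMAND_RANGE_TYPE_ENTRY, written with <stdint.h> types, and the field offsets assume the 64-byte LBA Range Type entry layout from the NVMe 1.0c specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's LBA range type entry (64 bytes). */
typedef struct {
    uint8_t  Type;           /* byte 0 */
    uint8_t  Attributes;     /* byte 1 */
    uint8_t  Reserved1[14];  /* bytes 15:02 */
    uint64_t SLBA;           /* bytes 23:16, starting LBA */
    uint64_t NLB;            /* bytes 31:24, number of logical blocks */
    uint8_t  GUID[16];       /* bytes 47:32, a single 8-byte integer cannot hold this */
    uint8_t  Reserved2[16];  /* bytes 63:48 */
} LBA_RANGE_TYPE_ENTRY;

/* An 8-byte integer covers only half of the GUID field. */
static_assert(sizeof(uint64_t) == 8, "64-bit integer is 8 bytes");
static_assert(sizeof(((LBA_RANGE_TYPE_ENTRY *)0)->GUID) == 16, "GUID is 16 bytes");
static_assert(offsetof(LBA_RANGE_TYPE_ENTRY, GUID) == 32, "GUID starts at byte 32");
static_assert(sizeof(LBA_RANGE_TYPE_ENTRY) == 64, "entry is 64 bytes");
```

With the field declared as an 8-byte integer, everything after it would land 8 bytes too early and the entry would shrink to 56 bytes.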

o    General Comment: With the 1.0c changes, will the driver be backward compatible with 1.0b? If not, do we need a mechanism to do so or have you thought about what we should be doing in this case?  Did you attempt any testing of this?

No, I don't think it's backward compatible with 1.0b. The only compatibility issue I can think of is the 0's based NUMD of the Firmware Image Download and Get Log Page commands. The 1.0b spec did not indicate it clearly; 1.0c now clarifies it. I don't mind adding an ifdef to differentiate them.
<Kwok> I think you meant it is backward compatible with 1.0b. The 0's based NUMD was not clearly indicated in 1.0b. We may have misinterpreted it but it was a bug in the driver if we mis-interpreted it.   It was a bug fix then and not a compatibility problem with 1.0b.
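
As an illustration of what "0's based" means for NUMD, here is a small sketch of the conversion between a transfer size in bytes and the zero-based dword count. The helper names are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a transfer size in bytes to a 0's based dword count (NUMD).
 * A value of 0 therefore means one dword (4 bytes). */
static uint32_t BytesToZeroBasedNumd(uint32_t byteCount)
{
    /* The size must be a non-zero multiple of one dword. */
    assert(byteCount != 0 && (byteCount % 4) == 0);
    return (byteCount / 4) - 1;
}

/* Inverse conversion: a 0's based dword count back to bytes. */
static uint32_t ZeroBasedNumdToBytes(uint32_t numd)
{
    return (numd + 1) * 4;
}
```

For a 4096-byte buffer this gives a NUMD of 1023, where a one-based reading would have programmed 1024 and asked for one dword too many.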


-          nvmeInit.c:

o    NVMeResetAdapter:

*  What is the use case for having a check for RDY already being 0 (we can never have nested resets so it would seem this would never be the case but not totally sure)?

The code checks the RDY bit to find out whether the controller has already been reset. If so, there is no point in writing 0 to the EN bit of the CC register again.

o    NVMeNormalShutdown:

*  Same comment as above for reset adapter (same check is performed here).

*  The comment on line 2469 states that the code is waiting for all queues to be deleted, but really you are just checking to see that the RDY bit has been set to 0 indicating the transition of the EN bit from 1 to 0.

Per the NVMe specification, when the RDY bit becomes 0 due to a reset, it indicates that the created queues have been deleted.
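
The EN/RDY handshake discussed here can be sketched as a bounded polling loop. This is an assumption-laden illustration, not the driver's code: CTRL_REGS is a simplified in-memory stand-in for the CC and CSTS registers, POLL_DELAY and SimulatedControllerStep are hypothetical helpers that let the sketch run without hardware, and the sketch clears EN unconditionally rather than testing RDY first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CC_EN     0x1u   /* CC bit 0: Enable */
#define CSTS_RDY  0x1u   /* CSTS bit 0: Ready */

/* Simplified stand-in for the controller register block. */
typedef struct {
    volatile uint32_t CC;
    volatile uint32_t CSTS;
} CTRL_REGS;

/* Hypothetical hook called between polls (a stall on real hardware). */
typedef void (*POLL_DELAY)(CTRL_REGS *regs);

/* Simulated controller: once EN is 0, it drops RDY. */
static void SimulatedControllerStep(CTRL_REGS *regs)
{
    if ((regs->CC & CC_EN) == 0) {
        regs->CSTS &= ~CSTS_RDY;
    }
}

/* Clear CC.EN, then poll CSTS.RDY until it reads 0 or the poll budget
 * runs out. Returns true once RDY is 0. */
static bool DisableAndWaitNotReady(CTRL_REGS *regs, uint32_t maxPolls,
                                   POLL_DELAY delay)
{
    assert(regs != NULL);
    regs->CC &= ~CC_EN;
    for (uint32_t i = 0; i < maxPolls; i++) {
        if ((regs->CSTS & CSTS_RDY) == 0) {
            return true;
        }
        if (delay != NULL) {
            delay(regs);
        }
    }
    return (regs->CSTS & CSTS_RDY) == 0;
}
```

The poll budget matters because a controller that never drops RDY would otherwise hang the shutdown path.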

o    NvmeCheckPendingCpl:

*  "unsigned int" is used for a variable declaration. We always use typedef types... should be ULONG.

Will change it.

*  General Comment: There is already a function to detect if commands are pending... NVMeDetectPendingCmds in nvmeIo.c. This was done as part of the S3/S4 work that Rick/Arpit (LSI) did. Did you take a look at this function and see if it was similar to your new function? Was there something specific that you needed differently than what was already coded in the existing function?

I think they serve different purposes. NVMeDetectPendingCmds is called to ensure there is no pending IO before entering power saving modes. NVMeCheckPendingCpl is called to see whether any of the created completion queues holds pending completed entries, to determine whether we own the INTx interrupt. In other words, pending IOs have not necessarily just completed when the INTx interrupt happens.
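
One way such an ownership check could look is sketched below. It assumes the check compares the phase tag of the entry at each completion queue head against the phase the host expects; whether NVMeCheckPendingCpl does exactly this is not stated in the thread. CPL_ENTRY, CPL_QUEUE and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified 16-byte completion queue entry. */
typedef struct {
    uint32_t DW0;     /* command specific */
    uint32_t DW1;     /* reserved */
    uint16_t SQHD;    /* submission queue head pointer */
    uint16_t SQID;    /* submission queue identifier */
    uint16_t CID;     /* command identifier */
    uint16_t Status;  /* bit 0 is the phase tag, bits 15:1 the status field */
} CPL_ENTRY;

/* Hypothetical host-side view of one completion queue. */
typedef struct {
    const CPL_ENTRY *Entries;
    uint16_t Head;           /* next entry the host will consume */
    uint8_t  ExpectedPhase;  /* phase tag value that marks a new entry */
} CPL_QUEUE;

/* A new completion is waiting if the head entry carries the expected phase. */
static bool QueueHasPendingCpl(const CPL_QUEUE *q)
{
    return (q->Entries[q->Head].Status & 1u) == q->ExpectedPhase;
}

/* With a shared INTx line, claim the interrupt only if some queue has an
 * unconsumed completion entry. */
static bool AnyPendingCpl(const CPL_QUEUE *queues, uint32_t count)
{
    assert(queues != NULL || count == 0);
    for (uint32_t i = 0; i < count; i++) {
        if (QueueHasPendingCpl(&queues[i])) {
            return true;
        }
    }
    return false;
}
```

This differs from counting outstanding commands: a command can be outstanding for a long time without a completion entry being posted, so only the completion queues show whether the device raised this particular interrupt.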


-          nvmeStd.c:

o    Line 1657: Paul removed all support for CHATHAM in a previous patch (but left in CHATHAM2 support). Please remove the CHATHAM check from the code.

Will do it.

Thanks,
Ray

Raymond C. Robles
Attached Platform Storage Software
Datacenter Software Division
Intel Corporation
Desk: 480.554.2600
Mobile: 480.399.0645

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sources_10c_new.zip
Type: application/x-zip-compressed
Size: 165635 bytes
Desc: sources_10c_new.zip
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120911/df74b128/attachment.bin>

From raymond.c.robles at intel.com  Tue Sep 11 15:30:54 2012
From: raymond.c.robles at intel.com (Robles, Raymond C)
Date: Tue, 11 Sep 2012 22:30:54 +0000
Subject: [nvmewin] ***UNCHECKED*** Sep 7 - Patch (NVMe 1.0c Compliance
 and others)
Message-ID: <49158E750348AA499168FD41D88983601807D132@FMSMSX105.amr.corp.intel.com>

Thank you Alex.

If nobody has any more feedback on Alex's changes (IDT) by EOD tomorrow (Wed. 9/12), then I'll push the patch.

Rick and Arpit - are you both ok with the last revision of Alex's changes?

Thanks,
Ray

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Chang, Alex
Sent: Friday, September 07, 2012 5:07 PM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] ***UNCHECKED*** Sep 7 - Patch (NVMe 1.0c Compliance and others)

Hi all,

I haven't received any more feedback and assume everyone agrees the patch is good to go. Here are the new sources after removing the RDY bit check in NVMeResetAdapter. Password is idt123. Thanks again.

Regards,
Alex


________________________________
From: Luse, Paul E [mailto:paul.e.luse at intel.com]
Sent: Friday, September 07, 2012 2:10 PM
To: Chang, Alex; Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Good stuff Alex, thanks for contributing!

From: Chang, Alex [mailto:Alex.Chang at idt.com]
Sent: Friday, September 07, 2012 2:10 PM
To: Luse, Paul E; Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Paul,

I agree that "for error recovery we can't rely on any register values to be correct;" I will remove the check and send out another zip file by the end of today if no more feedback or comments are received. I appreciate all the input.

Regards,
Alex

________________________________
From: Luse, Paul E [mailto:paul.e.luse at intel.com]
Sent: Friday, September 07, 2012 1:48 PM
To: Chang, Alex; Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Alex-

Wrt checking RDY before letting the reset through, I'd like to remove that check.  We already check in the DPC code that issues error recovery resets to make sure we aren't sending more than 1 outstanding, and for error recovery we can't rely on any register values to be correct; we don't know what condition we're resetting the card for.

Thx
Paul

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Chang, Alex
Sent: Friday, September 07, 2012 8:59 AM
To: Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Thanks a lot, Kwok, for addressing the issue in the specification. The other changes in 1.0c are new features, such as ECN 23/29. Some field sizes changed; I kept the same naming to avoid problems. I think we are fine.

Thanks,
Alex


From Alex.Chang at idt.com  Tue Sep 11 15:51:49 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Tue, 11 Sep 2012 22:51:49 +0000
Subject: [nvmewin] ***UNCHECKED*** Sep 7 - Patch (NVMe 1.0c Compliance
 and others)
In-Reply-To: <49158E750348AA499168FD41D88983601807D15D@FMSMSX105.amr.corp.intel.com>
References: <49158E750348AA499168FD41D88983601807D132@FMSMSX105.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B81B0C@corpmail1.na.ads.idt.com>
	<49158E750348AA499168FD41D88983601807D15D@FMSMSX105.amr.corp.intel.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D3602B81B25@corpmail1.na.ads.idt.com>

Hi Ray,

Hope it goes through this time...

Thanks,
Alex


________________________________
From: Robles, Raymond C [mailto:raymond.c.robles at intel.com]
Sent: Tuesday, September 11, 2012 3:44 PM
To: Chang, Alex
Subject: RE: [nvmewin] ***UNCHECKED*** Sep 7 - Patch (NVMe 1.0c Compliance and others)

Hi Alex,

No, I did not receive anything from you on the nvmewin mailing list. The last zip file I have from you came on the 7th last week.
Can you resend the zip to the distribution list?

Thanks,
Ray

From: Chang, Alex [mailto:Alex.Chang at idt.com]
Sent: Tuesday, September 11, 2012 3:42 PM
To: Robles, Raymond C
Subject: RE: [nvmewin] ***UNCHECKED*** Sep 7 - Patch (NVMe 1.0c Compliance and others)

Hi Ray,

I just sent out another zip file earlier today. Did you receive it? It includes the changes you suggested.

Thanks,
Alex

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sources_10c_new.zip
Type: application/x-zip-compressed
Size: 165635 bytes
Desc: sources_10c_new.zip
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120911/10e3f4d6/attachment.bin>

From Rick.Knoblaugh at lsi.com  Tue Sep 11 18:30:36 2012
From: Rick.Knoblaugh at lsi.com (Knoblaugh, Rick)
Date: Tue, 11 Sep 2012 19:30:36 -0600
Subject: [nvmewin] ***UNCHECKED*** Sep 7 - Patch (NVMe 1.0c Compliance
 and others)
In-Reply-To: <49158E750348AA499168FD41D88983601807D132@FMSMSX105.amr.corp.intel.com>
References: <49158E750348AA499168FD41D88983601807D132@FMSMSX105.amr.corp.intel.com>
Message-ID: <4565AEA676113A449269C2F3A549520FCFC0865A@cosmail03.lsi.com>

Hi Ray,

Yes, we are good with that last revision.

Thanks,
-Rick

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Robles, Raymond C
Sent: Tuesday, September 11, 2012 3:31 PM
To: Chang, Alex; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] ***UNCHECKED*** Sep 7 - Patch (NVMe 1.0c Compliance and others)

Thank you Alex.

If nobody has any more feedback on Alex's changes (IDT) by EOD tomorrow (Wed. 9/12), then I'll push the patch.

Rick and Arpit - are you both ok with the last revision of Alex's changes?

Thanks,
Ray

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Chang, Alex
Sent: Friday, September 07, 2012 5:07 PM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] ***UNCHECKED*** Sep 7 - Patch (NVMe 1.0c Compliance and others)

Hi all,

I don't receive any more new feedbacks and assume everyone agrees the patch is good to go. Here comes the new sources after removing the RDY bit checking in NVMeResetAdapter. Password is idt123. Thanks again.

Regards,
Alex


________________________________
From: Luse, Paul E [mailto:paul.e.luse at intel.com]<mailto:[mailto:paul.e.luse at intel.com]>
Sent: Friday, September 07, 2012 2:10 PM
To: Chang, Alex; Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback
Good stuff Alex, thanks for contributing!

From: Chang, Alex [mailto:Alex.Chang at idt.com]<mailto:[mailto:Alex.Chang at idt.com]>
Sent: Friday, September 07, 2012 2:10 PM
To: Luse, Paul E; Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Paul,

I agree that "for error recovery we can't rely on any register values to be correct;" I will remove the checking and send out another zip file by the end of today if no more feedbacks or comments received. I appreciate all the inputs.

Regards,
Alex

________________________________
From: Luse, Paul E [mailto:paul.e.luse at intel.com]<mailto:[mailto:paul.e.luse at intel.com]>
Sent: Friday, September 07, 2012 1:48 PM
To: Chang, Alex; Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback
Alex-

Wrt checking RDY before letting the reset through, I'd like to remove that check.  We already check in the DPC code that issues error recovery resets to make sure we aren't sending more than 1 outstanding and for error recovery we can't rely on any register values to be correct; we don't know the condition is that we're wanting to reset the card for.

Thx
Paul

From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org]<mailto:[mailto:nvmewin-bounces at lists.openfabrics.org]> On Behalf Of Chang, Alex
Sent: Friday, September 07, 2012 8:59 AM
To: Kong, Kwok; Robles, Raymond C; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Thanks a lot, Kwok, for addressing the issue in the specification. For the other changes in 1.0c are new features, such as ECN 23/29. Some size of fields got changed, I kept the same naming to avoid problems. I think we are fine.

Thanks,
Alex

________________________________
From: Kong, Kwok
Sent: Thursday, September 06, 2012 6:28 PM
To: Chang, Alex; Robles, Raymond C; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback
Alex,

Please see my embedded comment ...

From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org]<mailto:[mailto:nvmewin-bounces at lists.openfabrics.org]> On Behalf Of Chang, Alex
Sent: Thursday, September 06, 2012 6:07 PM
To: Robles, Raymond C; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback

Hi Raymond,

Please see my comments in red...

Thanks,
Alex

________________________________
From: Robles, Raymond C [mailto:raymond.c.robles at intel.com]
Sent: Thursday, September 06, 2012 5:25 PM
To: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>; Chang, Alex
Subject: ***UNCHECKED*** Aug 28 - Patch (NVMe 1.0c Compliance and others) --- Intel Feedback
Alex,

Here is Intel's feedback on your patch.  Let us know if you need any more info on our comments/questions.


-          nvme.h:

o    ADMIN_SET_FEATURES_LBA_COMMAND_RANGE_TYPE_ENTRY Structure: GUID field (changed from ULONGLONG to UCHAR [16]) - what was the reason for this change?

To match the size of GUID defined in NVMe spec, which is 16 bytes in length. If I understand it right that ULONGLONG is only 8-byte long.

o    General Comment: With the 1.0c changes, will the driver be backward compatible with 1.0b? If not, do we need a mechanism to do so or have you thought about what we should be doing in this case?  Did you attempt any testing of this?

No, I don't think it's backward compatible with 1.0b. The only thing I can think of as compatibility issue is the 0's based NUMD of Firmware Image download and Get Log Page command. In 1.0b, the spec did not indicate it clearly. Now, 1.0c clarifies it. I don't mind to add an ifdef to differenciate them.
<Kwok> I think you meant it is backward compatible with 1.0b. The 0's based NUMD was not clearly indicated in 1.0b. We may have misinterpreted it but it was a bug in the driver if we mis-interpreted it.   It was a bug fix then and not a compatibility problem with 1.0b.


-          nvmeInit.c:

o    NVMeResetAdapter:

*  What is the use case for having a check for RDY already being 0 (we can never have nested resets so it would seem this would never be the case but not totally sure)?

The code is checking the RDY bit to find out if the controller had already been reset. If so, there is no point to write 0 to EN bit of CC register again.

o    NVMeNormalShutdown:

*  Same comment as above for reset adapter (same check is performed here).

*  The comment on line 2469 states that the code is waiting for all queues to be deleted, but really you are just checking to see that the RDY bit has been set to 0 indicating the transition of the EN bit from 1 to 0.

Per NVMe specification, when RDY bit becomes 0 due to a reset, it indicates the created queues have been deleted.

o    NvmeCheckPendingCpl:

*  "unsigned int" is used for a variable declaration. We always use typedef types... should be ULONG.

Will change it.

*  General Comment: There is already a function to detect if commands are pending... NVMeDetectPendingCmds in nvmeIo.c. This was done as part of the S3/S4 work that Rick/Arpit (LSI) did. Did you take a look at this function and see if it was similar to your new function? Was there something specific that you needed differently than what was already coded in the existing function?

I think they serve different purposes. NVMeDetectPendingCmds is called to ensure there is no pending IO before entering power saving modes. NVMeCheckPendingCpl is called to see whether any of the created completion queues holds a pending completion entry, which tells us whether we own the INTx interrupt. In other words, the fact that IOs are pending does not mean any of them had just completed when the INTx interrupt fired.
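
To make the distinction concrete, here is a rough sketch of what "any pending completed entries in any completion queue" could look like. It is not the NvmeCheckPendingCpl function from the patch. The structure and function names are hypothetical, and the sketch assumes the usual NVMe convention that the phase tag is bit 16 of dword 3 of a 16-byte completion entry and that the host tracks the phase value it expects for new entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 16-byte completion queue entry; the phase tag is bit 16 of dword 3. */
typedef struct {
    uint32_t Dword0;
    uint32_t Dword1;
    uint32_t Dword2;    /* SQ head pointer and SQ identifier */
    uint32_t Dword3;    /* command identifier, phase tag, status */
} SIM_CPL_ENTRY;

/* Host-side view of one completion queue (hypothetical layout). */
typedef struct {
    const SIM_CPL_ENTRY *Entries;
    uint16_t HeadIndex;     /* next entry the host will consume */
    uint8_t  ExpectedPhase; /* phase value that marks a new entry */
} SIM_CPL_QUEUE;

/* True if the entry at the head of the queue has not been consumed yet. */
static bool SimQueueHasPendingCpl(const SIM_CPL_QUEUE *queue)
{
    uint32_t phase = (queue->Entries[queue->HeadIndex].Dword3 >> 16) & 1u;
    return phase == queue->ExpectedPhase;
}

/* True if any queue holds a new completion, i.e. the shared INTx is ours. */
static bool SimCheckPendingCpl(const SIM_CPL_QUEUE *queues, size_t count)
{
    assert(queues != NULL);
    for (size_t i = 0; i < count; i++) {
        if (SimQueueHasPendingCpl(&queues[i])) {
            return true;
        }
    }
    return false;
}
```

A check of this kind looks only at completion queues, so it answers "did the device just post something?" rather than "are commands still outstanding?".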


-          nvmeStd.c:

o    Line 1657: Paul removed all support for CHATHAM in a previous patch (but left in CHATHAM2 support). Please remove the CHATHAM check from the code.

Will do it.

Thanks,
Ray

Raymond C. Robles
Attached Platform Storage Software
Datacenter Software Division
Intel Corporation
Desk: 480.554.2600
Mobile: 480.399.0645


From raymond.c.robles at intel.com  Mon Sep 17 17:31:51 2012
From: raymond.c.robles at intel.com (Robles, Raymond C)
Date: Tue, 18 Sep 2012 00:31:51 +0000
Subject: [nvmewin] NVMe Windows DB is LOCKED - Pushing latest patch from
 Alex Change (IDT) - Legacy INTx fix, NVMe 1.0c comp., misc. fixes
Message-ID: <49158E750348AA499168FD41D88983601807F838@FMSMSX105.amr.corp.intel.com>

Locking the NVMe Windows DB.

Thanks,
Ray

Raymond C. Robles
Attached Platform Storage Software
Datacenter Software Division
Intel Corporation
Desk: 480.554.2600
Mobile: 480.399.0645


From raymond.c.robles at intel.com  Mon Sep 17 17:43:26 2012
From: raymond.c.robles at intel.com (Robles, Raymond C)
Date: Tue, 18 Sep 2012 00:43:26 +0000
Subject: [nvmewin] NVMe Windows DB is UNLOCKED - Pushing latest patch from
 Alex Change (IDT) - Legacy INTx fix, NVMe 1.0c comp., misc. fixes
Message-ID: <49158E750348AA499168FD41D88983601807F85A@FMSMSX105.amr.corp.intel.com>

Latest patch by IDT (Alex Chang) has been pushed to the trunk.  And, as always, I've created a tag for the latest push (idt_patch_nvme_1_0_c_compliance).  Please make sure to update to get the latest code on the trunk and the new tag.

If anyone has any questions, please feel free to contact me.

Thanks,
Ray


From Alex.Chang at idt.com  Mon Sep 24 09:45:29 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Mon, 24 Sep 2012 16:45:29 +0000
Subject: [nvmewin] Issues Found
Message-ID: <548C5470AAD9DA4A85D259B663190D3602B825EA@corpmail1.na.ads.idt.com>

Hi Paul,

When testing the latest patch I added, I came across a couple of issues in the driver:
1. In the patch you sent out on July 13 (later tagged as misc_bug_fixes_and_enum), within the NVMeAllocIoQueues function, you reset the QueueID at the start of each NUMA node loop, as below:
        for (Node = 0; Node < pRMT->NumNumaNodes; Node++) {
            pNNT = pRMT->pNumaNodeTbl + Node;
            QueueID = 0;
            for (Core = pNNT->FirstCoreNum; Core <= pNNT->LastCoreNum; Core++) {
As a result, the driver only allocates as many queues as there are cores in a single NUMA node, for the entire system. I wonder why?

2. When the driver is in learning phase where it tries to find out the mappings between cores and MSI vectors, in IoCompletionDpcRoutine, the driver limits the pending completion entry checking based on MsgID:
        if (!learning) {
            firstCheckQueue = lastCheckQueue = pMMT->CplQueueNum;
        } else {
            firstCheckQueue = lastCheckQueue = (USHORT)MsgID;
        }
Since it's still in learning phase, shouldn't it look up every created completion queue to find out the mapping?

3. In NVMeInitialize, the driver calls StorPortInitializePerfOpts in both the normal and Crashdump/Hibernation cases. The routine returns failure, and I wonder if it makes sense to call it in the Crashdump/Hibernation case.

Thanks,
Alex


From paul.e.luse at intel.com  Mon Sep 24 11:59:00 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Mon, 24 Sep 2012 18:59:00 +0000
Subject: [nvmewin] Issues Found
In-Reply-To: <548C5470AAD9DA4A85D259B663190D3602B825EA@corpmail1.na.ads.idt.com>
References: <548C5470AAD9DA4A85D259B663190D3602B825EA@corpmail1.na.ads.idt.com>
Message-ID: <466EE916-A665-4085-A3A6-91F94EBD3FED@intel.com>

Alex

I'll cover your questions this evening plus have some non bug fix changes in some of these areas anyways.

Thx
Paul

Sent from my iPhone


_______________________________________________
nvmewin mailing list
nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin


From paul.e.luse at intel.com  Mon Sep 24 13:46:21 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Mon, 24 Sep 2012 20:46:21 +0000
Subject: [nvmewin] Issues Found
In-Reply-To: <466EE916-A665-4085-A3A6-91F94EBD3FED@intel.com>
References: <548C5470AAD9DA4A85D259B663190D3602B825EA@corpmail1.na.ads.idt.com>
	<466EE916-A665-4085-A3A6-91F94EBD3FED@intel.com>
Message-ID: <82C9F782B054C94B9FC04A331649C77A07B65D1C@FMSMSX106.amr.corp.intel.com>

So I found myself with some time here and was able to prepare my next patch. I won't send it out until I have a chance to test it, but I wanted to run through all the changes real quick before I responded.

The changes I'll be sending out are primarily focused around sharing the admin queue MSIX with one other queue (which one depends on learning).  This came about because of a bug report from someone running on a 32 core system: in this case we ask for 33 vectors, and when we don't get them we end up dropping to 1 and sharing everything.  This, of course, works, but under heavy IO it causes DPC watchdog timeouts simply due to the amount of time we spend looking through all the queues processing IOs.  The load in question was 32 workers (iometer) and 64 IO depth with 512B reads.  There are several different ways we could address this, but the one I'm suggesting as a generic improvement is to have the admin queue share with another queue so that we require an even number of vectors and can readily support 32 cores, which is a pretty common config.

In the process of putting this together I ran into the first item you mentioned, Alex, so I have already fixed that.  I had not previously tested on a system with multiple NUMA nodes, but clearly with that LOC in there we don't init enough queues: we set up 32 all right, but we do the same set of 16 twice.  So, this is fixed in my patch.
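
The "same set of 16 twice" symptom can be reproduced with a small simulation. This is only an illustration of the effect of resetting the counter per node, not the driver's NVMeAllocIoQueues code and not a proposed fix. The type SIM_NUMA_NODE and the function SimHighestQueueId are hypothetical, and the sketch assumes one I/O queue per core with queue IDs starting at 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Core range owned by one NUMA node (hypothetical layout). */
typedef struct {
    uint32_t FirstCoreNum;
    uint32_t LastCoreNum;
} SIM_NUMA_NODE;

/*
 * Walks every core of every node and returns the highest queue ID handed
 * out. With resetPerNode non-zero the counter restarts for each node, so
 * later nodes reuse the IDs already given to the first node.
 */
static uint32_t SimHighestQueueId(const SIM_NUMA_NODE *nodes,
                                  uint32_t numNodes,
                                  int resetPerNode)
{
    uint32_t queueId = 0;
    uint32_t highest = 0;

    assert(nodes != NULL);
    for (uint32_t node = 0; node < numNodes; node++) {
        if (resetPerNode) {
            queueId = 0;    /* the line in question */
        }
        for (uint32_t core = nodes[node].FirstCoreNum;
             core <= nodes[node].LastCoreNum; core++) {
            queueId++;      /* assumed: one I/O queue per core, IDs from 1 */
            if (queueId > highest) {
                highest = queueId;
            }
        }
    }
    return highest;
}
```

With two nodes covering cores 0-15 and 16-31, the per-node reset tops out at queue ID 16 even though 32 cores were visited, while a counter that runs across nodes reaches 32.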

On your second question (good question, BTW), this is one of the reasons why learning mode works.  We know which queue to look in only because we are still in learning mode and we set the queues up so that we can count on QID==MsgId.  Remember, we're learning the association between MSIX vector and completing core, then updating the tables and deleting/recreating the CQ.  So once learning is done we use the table, but until then we count on how we set things up.
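
The selection rule described above can be restated as a small standalone function. This is a sketch modeled on the snippet quoted from IoCompletionDpcRoutine, not a copy of it. The names SIM_MSG_ENTRY and SimQueueToCheck are hypothetical, and the sketch assumes the message table is indexed by MsgID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a per-vector table filled in by learning (hypothetical). */
typedef struct {
    uint16_t CplQueueNum;   /* completion queue served by this vector */
} SIM_MSG_ENTRY;

/*
 * Pick the completion queue to examine for a given message ID.
 * During learning the queues were created with vector == queue ID, so the
 * message ID identifies the queue directly. Afterwards the learned table
 * is authoritative.
 */
static uint16_t SimQueueToCheck(const SIM_MSG_ENTRY *msgTable,
                                uint32_t msgId,
                                bool learning)
{
    assert(msgTable != NULL);
    if (learning) {
        return (uint16_t)msgId;
    }
    return msgTable[msgId].CplQueueNum;
}
```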

On your 3rd question, we didn't write or test that code, I forget who added it but I would consider it untested and a prime candidate for anyone wanting to contribute :)  We at Intel will be looking more closely at that code in the coming months.

Anyway, hope that answers your questions.  I'll send out my patch tonight; it includes the fix for your first item, some additional debug prints (via compile switch) to dump our PRP info as you go, a few additional asserts, etc.  It's not very big.  After that we'll be coming with a series of AER fixes.

Thx
Paul




From Alex.Chang at idt.com  Mon Sep 24 14:11:05 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Mon, 24 Sep 2012 21:11:05 +0000
Subject: [nvmewin] Issues Found
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A07B65D1C@FMSMSX106.amr.corp.intel.com>
References: <548C5470AAD9DA4A85D259B663190D3602B825EA@corpmail1.na.ads.idt.com>
	<466EE916-A665-4085-A3A6-91F94EBD3FED@intel.com>
	<82C9F782B054C94B9FC04A331649C77A07B65D1C@FMSMSX106.amr.corp.intel.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D3602B8363C@corpmail1.na.ads.idt.com>

Hi Paul,

As for the 2nd item I brought up, I believe we have set up a mapping between cores and queues before learning:
Cores# Queues#
0      1
1      2
2      3
...
While learning, we decide which queue to use based on the above mappings. When commands complete, the value of MsgID depends on the APIC settings, and discovering that is the purpose of learning. In other words, MsgId is not necessarily equal to QID. After fixing Item# 1, I have seen the driver state machine fail in the debug build of the driver due to a timeout in the learning state.

Alex
 



From paul.e.luse at intel.com  Mon Sep 24 14:19:23 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Mon, 24 Sep 2012 21:19:23 +0000
Subject: [nvmewin] Issues Found
In-Reply-To: <548C5470AAD9DA4A85D259B663190D3602B8363C@corpmail1.na.ads.idt.com>
References: <548C5470AAD9DA4A85D259B663190D3602B825EA@corpmail1.na.ads.idt.com>
	<466EE916-A665-4085-A3A6-91F94EBD3FED@intel.com>
	<82C9F782B054C94B9FC04A331649C77A07B65D1C@FMSMSX106.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B8363C@corpmail1.na.ads.idt.com>
Message-ID: <82C9F782B054C94B9FC04A331649C77A07B65E88@FMSMSX106.amr.corp.intel.com>

I'm not sure what you are seeing, Alex, but the MSID does not depend on the mapping; it depends on what vector we put in the CQ when we created it.  I'm not sure what you think you fixed, but let's you and I grab some time and review my patch before doing much more.  I have a feeling that you are seeing the impact of your change: if you just delete that queueid=0 line then you haven't done enough, that's not a complete fix.  My change to share MSID0 with a queue addresses what you are seeing simply by how it's implemented.

I'm actually in San Jose this week, might be able to whip over later this week if that's easier.  I'll send out the new code this eve.



From Alex.Chang at idt.com  Mon Sep 24 15:10:25 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Mon, 24 Sep 2012 22:10:25 +0000
Subject: [nvmewin] Issues Found
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A07B65E88@FMSMSX106.amr.corp.intel.com>
References: <548C5470AAD9DA4A85D259B663190D3602B825EA@corpmail1.na.ads.idt.com>
	<466EE916-A665-4085-A3A6-91F94EBD3FED@intel.com>
	<82C9F782B054C94B9FC04A331649C77A07B65D1C@FMSMSX106.amr.corp.intel.com>
	<548C5470AAD9DA4A85D259B663190D3602B8363C@corpmail1.na.ads.idt.com>
	<82C9F782B054C94B9FC04A331649C77A07B65E88@FMSMSX106.amr.corp.intel.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D3602B83664@corpmail1.na.ads.idt.com>

Hi Paul,

Sure, I would like to try out your changes later.

Thanks,
Alex
