[nvmewin] review comments for patch from Google

Thomas Freeman thomas.freeman at hgst.com
Thu Jun 16 14:03:25 PDT 2016


Iuliu,
Disregard my response, I confused your usage of startio and dpclocks.
I’ll review again with my head screwed on correctly ☺

Tom Freeman
Software Engineer, Device Manager and Driver Development
Western Digital Corporation
e.  Thomas.freeman at hgst.com
o.  +1-507-322-2311

[cid:image001.jpg at 01D1C7E8.9A8A73C0]

From: nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Thomas Freeman
Sent: Thursday, June 16, 2016 3:51 PM
To: Iuliu Rus <iuliur at google.com>
Cc: nvmewin <nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] review comments for patch from Google

Hi Iuliu,
Here my view, I hope someone else can chime in to verify or correct what I'm saying

The purpose of the concurrent channels change is to prevent Storport from acquiring the StartIo lock before calling Startio. This allows multiple simultaneous calls to StartIo. If you require ProcessIo get the StartIo lock, you are replicating what Storport used to - it defeats the point of concurrent channels.

Here is my suggestion for what needs to be changed
1. There is no need for StartIo Locks - remove the requests to acquire them
2. Acquire the Dpc locks in the following situations
   -ProcessIo, before calling GetCmdEntry - if there is queue sharing
   -ProcessIo, before calling IssueCmd - always
   -IoCompletionRoutine, before calling NVMeCompleteCmd - if there is queue sharing

   -recoveryDpcRoutine - get all of the Dpc locks


Here is my reasoning
-GetCmdEntry and NVMeCompleteCmd both access the FreeQList. With queue sharing, these could run simultaneously and have a collision when they update the list.
-IssueCmd. With queue sharing, different cpu's could attempt to put entries on the queue simultaneously. But, to make RecoveryDpcRoutine work, this must be protected by the dpcLock whether or not there is queue sharing.

I think this will address the deadlock problem I reported also.

Regards,
Tom Freeman
Software Engineer, Device Manager and Driver Development
Western Digital Corporation
e.  Thomas.freeman at hgst.com<mailto:Thomas.freeman at hgst.com>
o.  +1-507-322-2311

[cid:image007.jpg at 01D1C7E8.9A7D7B70]

From: Iuliu Rus [mailto:iuliur at google.com]
Sent: Thursday, June 16, 2016 2:42 PM
To: Thomas Freeman <thomas.freeman at hgst.com<mailto:thomas.freeman at hgst.com>>
Cc: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>; nvmewin <nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>>
Subject: Re: [nvmewin] review comments for patch from Google

Here's my revised patch for concurrent channels. The synchronization rules are now as follows:
1. Init is single threaded so it doesn't matter if acquirelock is true or false
2. Anything called from the DPC has to call ProcessIo with false, since the dpc spinlock is already acquired.
3. Anything called from StartIo has to call ProcessIo with true, since now the startio is not automatically acquired.

Thomas, i think the bug that you found is because the ProcessIo was incorrectly called with TRUE instead of FALSE.
Let me know what are your thoughts on this.

On Thu, Jun 16, 2016 at 10:41 AM, Iuliu Rus <iuliur at google.com<mailto:iuliur at google.com>> wrote:
Guys, am i OK continuing with the patch on the current code base or do you want me to wait until you pull in some more changes?

On Wed, Jun 15, 2016 at 8:37 AM, Thomas Freeman <thomas.freeman at hgst.com<mailto:thomas.freeman at hgst.com>> wrote:
Ray,
It appears that the latest patch file from Iuliu is applied to revision 137. I don’t see Suman’s changes in that revision.

As an example, the use of StorPortRequestTimer is not in nvmeStat.c.

I thought there were some newer revisions than 137, but 137 is the latest I’m able to download.
137 is also the revision that appears when I go to https://svn.openfabrics.org/svnrepo/nvmewin/

SVN is giving me some troubles, so I can’t tell for sure if this is my environment or a server issue.

Tom Freeman
Software Engineer, Device Manager and Driver Development
Western Digital Corporation
e.  Thomas.freeman at hgst.com<mailto:Thomas.freeman at hgst.com>
o.  +1-507-322-2311<tel:%2B1-507-322-2311>

[cid:image008.jpg at 01D1C7E8.9A7D7B70]

From: nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org>] On Behalf Of Robles, Raymond C
Sent: Friday, June 10, 2016 12:23 PM
To: Iuliu Rus <iuliur at google.com<mailto:iuliur at google.com>>; suman.p at samsung.com<mailto:suman.p at samsung.com>
Cc: nvmewin <nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>>

Subject: Re: [nvmewin] review comments for patch from Google

Hi Iuliu,

If you send out the updated patch with the protection in place for both list entries and doorbells, I can run it on an Intel NVMe SSD (P3700) and see if there is any performance impact (up or down).

Thank you…
Ray

From: Iuliu Rus [mailto:iuliur at google.com]
Sent: Friday, June 10, 2016 10:17 AM
To: suman.p at samsung.com<mailto:suman.p at samsung.com>
Cc: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>; Judy Brock <judy.brock at samsung.com<mailto:judy.brock at samsung.com>>; nvmewin <nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>>
Subject: Re: RE: review comments for patch from Google

we are running this in a virtual machine, and we do see a perf difference with concurrent channels. We did not observe any performance drop with synchronized access to the list entry.

On Fri, Jun 10, 2016 at 7:15 AM, SUMAN PRAKASH B <suman.p at samsung.com<mailto:suman.p at samsung.com>> wrote:

Hi Iuliu,



As you have the implementation in place, can you let us know if there is any performance improvement when STOR_PERF_CONCURRENT_CHANNELS is set in case of #core and #queues not matching.

Also, if there is any drop in performance if the list entry and doorbell is protected after setting STOR_PERF_CONCURRENT_CHANNELS.



Perhaps a bit off topic:

Even in case when #cores and #queues are same and when STOR_PERF_CONCURRENT_CHANNELS is set, we did not observe any performance gain. Maybe we are hitting the max performance from device even without setting STOR_PERF_CONCURRENT_CHANNELS. Can you please let us know your observation?



Thanks,

Suman



------- Original Message -------

Sender : Robles, Raymond C<raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>

Date : Jun 10, 2016 08:30 (GMT+05:30)

Title : RE: review comments for patch from Google


Hi Judy,

Thanks for the additional info. I agree with the assessment. My statement was directed at the general direction of the solution… and the fact that I think option 1 is not as robust as option 2.

Thank you…
Ray

From: Judy Brock [mailto:judy.brock at samsung.com<mailto:judy.brock at samsung.com>]
Sent: Thursday, June 09, 2016 5:14 PM
To: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>; Iuliu Rus <iuliur at google.com<mailto:iuliur at google.com>>; suman.p at samsung.com<mailto:suman.p at samsung.com>
Cc: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: review comments for patch from Google

Hi Ray et al,
[Suman wrote] >> Protect the listentry and doorbell before access/update
                             >>Do not set STOR_PERF_CONCURRENT_CHANNELS when number of cores and queues are not same.

Ray wrote] >>In the case Suman points out, concurrent channels should still be enabled even if the number of cores and queues do not match. This is especially true for large core count systems (>128 logical cpu cores) but devices may only support 32 queues or less

I think Suman was saying, we have two options:

1.       either don’t enable it at all if # cores != #queues or

2.       protect both the list entries and doorbell writes before any accesses/updates if #cores != #queues.

So we will need to check to see if we need to use locks or not in the performance path based on whether #cores and # queues DO match or not if we want to enable concurrent channels no matter what.   It’s probably better than option 1) above if this is a configuration that can come up frequently.

Thanks,
Judy


From: nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Robles, Raymond C
Sent: Thursday, June 09, 2016 2:32 PM
To: Iuliu Rus; suman.p at samsung.com<mailto:suman.p at samsung.com>
Cc: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] review comments for patch from Google

Hi Iuliu/Suman,

I’m of the opinion that any solution provided should be as robust as possible. In the case Suman points out, concurrent channels should still be enabled even if the number of cores and queues do not match. This is especially true for large core count systems (>128 logical cpu cores) but devices may only support 32 queues or less.

Thank you…
Ray

From: Iuliu Rus [mailto:iuliur at google.com]<mailto:[mailto:iuliur at google.com]>
Sent: Thursday, June 09, 2016 2:09 PM
To: suman.p at samsung.com<mailto:suman.p at samsung.com>
Cc: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: review comments for patch from Google

Which one do you prefer? Our google driver has the synchronization in place already for a different reason. Or, I can follow your second suggestion.

On Thu, Jun 9, 2016 at 7:39 AM, SUMAN PRAKASH B <suman.p at samsung.com<mailto:suman.p at samsung.com>> wrote:

Hi Iuliu,



Please find my review comments below:



1. If the number of Cores and Queues supported by device are not same, then setting STOR_PERF_CONCURRENT_CHANNELS will corrupt the submission queue. For example, if device has 4 queues and connected to target PC which has 8 logical processors, and if STOR_PERF_CONCURRENT_CHANNELS is set, all the 8 logical processors will get startIo calls concurrently. As per OFA driver NUMA design, the IO Queue 1 will be shared by core 0 and core 4. There is possibility that core 0 and core 4 will get startio() calls concurrently and will access/update the listenty and queue 1 doorbell register at same time, which will lead to listentry and doorbell register corruption.
This problem can be address by following ways:
a. Protect the listentry and doorbell before access/update.
b. Do not set STOR_PERF_CONCURRENT_CHANNELS when number of cores and queues are not same.



2. We can use pRMT->NumActiveCores which is updated in NVMeEnumNumaCores() instead of using KeQueryActiveProcessorCountEx/KeQueryActiveProcessorCount.



Please let us know your opinion.



Thanks,
Suman



------- Original Message -------

Sender : Robles, Raymond C<raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>

Date : Jun 09, 2016 03:56 (GMT+05:30)

Title : Re: [nvmewin] Happy New Year... and status update


Thank you Iuliu!

Reviewing companies, please try to target the review completion for 2 weeks from today (6/22/16).

Thank you!
Ray

From: Iuliu Rus [mailto:iuliur at google.com<mailto:iuliur at google.com>]
Sent: Wednesday, June 08, 2016 12:26 PM
To: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>
Cc: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: Happy New Year... and status update

I attached our patch.

On Tue, Jun 7, 2016 at 2:37 PM, Iuliu Rus <iuliur at google.com<mailto:iuliur at google.com>> wrote:
Let me see if i can get it done by tomorrow eod...

On Tue, Jun 7, 2016 at 2:31 PM, Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>> wrote:
Hi Iuliu,

That’s great… thank you. Yes, the process is still the same (using SVN patches).

Please let me know if you have any questions/comments. My only question for you is an ETA for the patch to be submitted.

Thank you…
Ray

From: Iuliu Rus [mailto:iuliur at google.com<mailto:iuliur at google.com>]
Sent: Tuesday, June 07, 2016 2:28 PM
To: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>
Cc: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: Happy New Year... and status update

Yes we can send that to you. Is the process still the same?

On Tue, Jun 7, 2016 at 1:25 PM, Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>> wrote:
Hello Iuliu,

As the OFA community is closing on its mid-year release, we have only one more patch that was targeted… concurrent channels. Will Google still be submitting the concurrent channels patch?

Thank you…
Ray

From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org>] On Behalf Of Robles, Raymond C
Sent: Thursday, January 14, 2016 4:55 PM
To: 'nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>' <nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>>
Subject: [nvmewin] Happy New Year... and status update

Hello and Happy New Year,

I hope everyone had a great holiday season and is off to a great start to the new year!

As communicated last month, the patch from HGST for the SCSI multi-initiator changes has been approved and pushed. The holidays slowed down much of the progress on the OFA driver and there were several patches that did not get pushed prior to the end of the year. The list of patches remained to be pushed are as follows…


•         Namespace Management (Intel)

•         Perf Opts (Samsung)

•         Win 8.1 Timers (Samsung)

•         EOL Read Only (Samsung)

•         Concurrent channels (Google)

The namespace management patch will be sent out for review tomorrow (look for the patch email from Carolyn)… stay tuned!!!

However, once this patch is resolved, we as a community will have to make a decision on an official release strategy. The patch process and cadence was significantly slower in 2015 which leaves us with a few options.


1.       Release what is in the trunk today (or after the namespace management patch)… and call that the 2015 release (albeit later than expected)

2.       In lieu of an official 2015 release, we push the remaining patches listed above… and then release in ~Q2 of 2015. Basically skip a 2015 release and go right into the mid-2016 release.

3.       Remove the concept of “official releases” from the OFA Windows NVMe driver and just allow people, companies, and users to pull from the OFA trunk as necessary.

For #3 above, my thoughts are that because the OFA driver is not a production driver… but a reference and enabling driver, it should/could be managed as just that… a reference baseline driver that any potential user can go and grab the source, or contribute a patch. Nothing more… nothing less.

For the release decision, I’ll be happy to call a meeting… but we can also handle it via email as well… just let me know. Feedback from all is welcome… but I would request mandatory feedback form the 4 reviewing companies: Samsung, HGST, PMC-Sierra, and Intel. How would you like to proceed?

Thanks,
Ray

[cid:image001.png at 01CB3870.4BB88E70]
Raymond C. Robles
Non-Volatile Memory Solutions Group
Intel Corporation
Office: 480-554-2600<tel:480-554-2600>
Mobile: 480-399-0645<tel:480-399-0645>









[cid:image006.gif at 01D1C6EF.5D9CDE20]

[Image removed by sender. http://ext.samsung.net/mailcheck/SeenTimeChecker?do=9226f2572c3ad117df6a46883b3f3dde1d283838fa336dda194352576c69fe9bfdd53f331c9856e34bea3c1765014a1208cece8541bc14eacf878f9a26ce15a0]






[cid:image006.gif at 01D1C6EF.5D9CDE20]

[Image removed by sender. http://ext.samsung.net/mailcheck/SeenTimeChecker?do=36e611fbbc499bdb45df2c678ea6c804daf257cfe78672b8194352576c69fe9bfdd53f331c9856e34bea3c1765014a1208cece8541bc14eacf878f9a26ce15a0]

Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:
This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.

_______________________________________________
nvmewin mailing list
nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
http://lists.openfabrics.org/mailman/listinfo/nvmewin


Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:
This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.
Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:

This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20160616/47d38b77/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image007.jpg
Type: image/jpeg
Size: 2938 bytes
Desc: image007.jpg
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20160616/47d38b77/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image008.jpg
Type: image/jpeg
Size: 2937 bytes
Desc: image008.jpg
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20160616/47d38b77/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image009.png
Type: image/png
Size: 1756 bytes
Desc: image009.png
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20160616/47d38b77/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image010.gif
Type: image/gif
Size: 13168 bytes
Desc: image010.gif
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20160616/47d38b77/attachment.gif>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image011.jpg
Type: image/jpeg
Size: 344 bytes
Desc: image011.jpg
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20160616/47d38b77/attachment-0002.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 2934 bytes
Desc: image001.jpg
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20160616/47d38b77/attachment-0003.jpg>


More information about the nvmewin mailing list