[nvmewin] review comments for patch from Google

Iuliu Rus iuliur at google.com
Tue Jun 14 10:11:32 PDT 2016


I think we can fix the recovery DPC routine by taking every DPC lock, either
in addition to or instead of the StartIo lock. Do you see any problems with
that?
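
A minimal sketch of what taking every DPC lock in the recovery path could look
like with the Storport spinlock API (the DEV_EXT layout, field names, and
MAX_DPC_COUNT below are placeholders, not the actual OFA driver definitions):

    #include <storport.h>

    /* Placeholder device extension; the real OFA driver structure differs. */
    typedef struct _DEV_EXT {
        STOR_DPC *pDpcArray;  /* one completion DPC per I/O queue (assumed) */
        ULONG     NumDpc;     /* number of completion DPCs (assumed) */
    } DEV_EXT, *PDEV_EXT;

    #define MAX_DPC_COUNT 64  /* assumed upper bound for this sketch */

    VOID RecoveryQuiesceSketch(PDEV_EXT pAE)
    {
        STOR_LOCK_HANDLE startIoLock;
        STOR_LOCK_HANDLE dpcLock[MAX_DPC_COUNT];
        ULONG n = (pAE->NumDpc < MAX_DPC_COUNT) ? pAE->NumDpc : MAX_DPC_COUNT;
        ULONG i;

        /* Block new StartIo submissions... */
        StorPortAcquireSpinLock(pAE, StartIoLock, NULL, &startIoLock);

        /* ...and block every completion DPC. For DpcLock, the LockContext
         * argument must be the STOR_DPC whose lock is being acquired. */
        for (i = 0; i < n; i++) {
            StorPortAcquireSpinLock(pAE, DpcLock, &pAE->pDpcArray[i], &dpcLock[i]);
        }

        /* Recovery work (controller reset, queue re-init) would go here. */

        /* Release in reverse order of acquisition. */
        for (i = n; i > 0; i--) {
            StorPortReleaseSpinLock(pAE, &dpcLock[i - 1]);
        }
        StorPortReleaseSpinLock(pAE, &startIoLock);
    }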

On Tue, Jun 14, 2016 at 7:41 AM, Thomas Freeman <thomas.freeman at hgst.com>
wrote:

> Hi Iuliu,
>
> While looking at this change, it caused me to review other code paths that
> utilize the StartIo lock.
>
>
>
> That brought up 2 concerns:
>
> 1. The routine recoveryDpcRoutine gets the StartIo lock to ensure no new
> sub/cpl queue entries are processed on any CPU. With the changes for
> concurrent channels, that synchronization no longer occurs.
>
>
>
> 2. The second concern relates to an issue that existed before your change:
> there is a path by which the driver can deadlock while attempting to get the
> StartIo lock.
>
>
>
> An example of the deadlock occurs when a command is issued to delete an
> attached namespace AND MultipleCoresToSingleQueueFlag is set to TRUE.
>
> In this example, when the namespace delete is issued, here are the steps
> leading up to the deadlock:
>
> -Driver issues command Namespace Attachment (SEL = 1, controller detach)
>
> -Command completion is processed by IoCompletionRoutine. In that method
> the driver acquires the StartIo lock (if MultipleCoresToSingleQueueFlag is
> set)
>
> -IoCompletionRoutine calls the completion method - pNvmeCompletionRoutine.
> That eventually ends up in NVMeCompletionNSAttachment.
>
> -NVMeCompletionNSAttachment proceeds to the next step, building the
> Namespace management command (SEL = 1, delete).
>
> -NVMeCompletionNSAttachment calls ProcessIo (with AcquireLock = TRUE).
>
> -The first step in ProcessIo is to acquire the StartIo lock, which is
> already held by IoCompletionRoutine.
>
>
>
> I'm not sure what value is provided by having ProcessIo acquire the
> StartIo lock. Removing that step in ProcessIo would certainly solve this
> deadlock, and I don't see it causing other problems. Perhaps others can
> express an opinion on this.
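>
> A minimal sketch of the re-entrant acquisition described above (types,
> fields, and prototypes are simplified placeholders; the real OFA nvmewin
> signatures differ). Passing AcquireLock = FALSE from completion context is
> one way the self-deadlock could be avoided:
>
>     #include <storport.h>
>
>     /* Simplified placeholders; the real OFA nvmewin types differ. */
>     typedef struct _DEV_EXT { BOOLEAN MultipleCoresToSingleQueueFlag; } DEV_EXT, *PDEV_EXT;
>     typedef struct _SRB_EXT { ULONG Reserved; } SRB_EXT, *PSRB_EXT;
>
>     BOOLEAN ProcessIo(PDEV_EXT pAE, PSRB_EXT pSrbExt, BOOLEAN AcquireLock)
>     {
>         STOR_LOCK_HANDLE lock;
>
>         if (AcquireLock) {
>             /* Second acquisition of the StartIo lock on the same CPU: this is
>              * where the deadlock occurs when the caller (IoCompletionRoutine)
>              * already holds it. */
>             StorPortAcquireSpinLock(pAE, StartIoLock, NULL, &lock);
>         }
>
>         /* ... build the submission queue entry and ring the doorbell ... */
>
>         if (AcquireLock) {
>             StorPortReleaseSpinLock(pAE, &lock);
>         }
>         return TRUE;
>     }
>
>     VOID NVMeCompletionNSAttachment(PDEV_EXT pAE, PSRB_EXT pSrbExt)
>     {
>         /* Builds the follow-up Namespace Management (SEL = 1, delete) command.
>          * AcquireLock = TRUE re-enters the StartIo lock; passing FALSE from
>          * completion context would avoid the self-deadlock. */
>         ProcessIo(pAE, pSrbExt, TRUE);
>     }
>
>     VOID IoCompletionRoutine(PDEV_EXT pAE, PSRB_EXT pSrbExt)
>     {
>         STOR_LOCK_HANDLE startIoLock;
>
>         if (pAE->MultipleCoresToSingleQueueFlag) {
>             StorPortAcquireSpinLock(pAE, StartIoLock, NULL, &startIoLock);
>         }
>
>         NVMeCompletionNSAttachment(pAE, pSrbExt);  /* StartIo lock still held here */
>
>         if (pAE->MultipleCoresToSingleQueueFlag) {
>             StorPortReleaseSpinLock(pAE, &startIoLock);
>         }
>     }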
>
>
>
> Regards,
>
> *Tom Freeman*
>
> *Software Engineer, Device Manager and Driver Development*
>
> Western Digital Corporation
>
> e.  Thomas.freeman at hgst.com
>
> o.  +1-507-322-2311
>
>
>
>
>
> *From:* nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] *On Behalf
> Of *Iuliu Rus
> *Sent:* Friday, June 10, 2016 2:31 PM
> *To:* Robles, Raymond C <raymond.c.robles at intel.com>
> *Cc:* nvmewin <nvmewin at lists.openfabrics.org>
>
> *Subject:* Re: [nvmewin] review comments for patch from Google
>
>
>
> Our tests are still running and I'll be in meetings for the rest of the
> day, so I'm sending you my patch now so you can evaluate and review it.
>
>
>
> On Fri, Jun 10, 2016 at 10:24 AM, Iuliu Rus <iuliur at google.com> wrote:
>
> I'm just running through the tests and will send it to you as soon as they
> pass.
>
>
>
> On Fri, Jun 10, 2016 at 10:22 AM, Robles, Raymond C <
> raymond.c.robles at intel.com> wrote:
>
> Hi Iuliu,
>
>
>
> If you send out the updated patch with the protection in place for both
> list entries and doorbells, I can run it on an Intel NVMe SSD (P3700) and
> see if there is any performance impact (up or down).
>
>
>
> Thank you…
>
> Ray
>
>
>
> *From:* Iuliu Rus [mailto:iuliur at google.com]
> *Sent:* Friday, June 10, 2016 10:17 AM
> *To:* suman.p at samsung.com
> *Cc:* Robles, Raymond C <raymond.c.robles at intel.com>; Judy Brock <
> judy.brock at samsung.com>; nvmewin <nvmewin at lists.openfabrics.org>
> *Subject:* Re: RE: review comments for patch from Google
>
>
>
> We are running this in a virtual machine, and we do see a performance
> difference with concurrent channels. We did not observe any performance drop
> with synchronized access to the list entry.
>
>
>
> On Fri, Jun 10, 2016 at 7:15 AM, SUMAN PRAKASH B <suman.p at samsung.com>
> wrote:
>
> Hi Iuliu,
>
>
>
> As you have the implementation in place, can you let us know if there is
> any performance improvement when STOR_PERF_CONCURRENT_CHANNELS is set in
> the case where #cores and #queues do not match?
>
> Also, is there any drop in performance if the list entry and doorbell
> are protected after setting STOR_PERF_CONCURRENT_CHANNELS?
>
>
>
> Perhaps a bit off topic:
>
> Even in the case when #cores and #queues are the same and
> STOR_PERF_CONCURRENT_CHANNELS is set, we did not observe any performance
> gain. Maybe we are already hitting the device's maximum performance even
> without setting STOR_PERF_CONCURRENT_CHANNELS. Can you please let us know
> your observations?
>
>
>
> Thanks,
>
> Suman
>
>
>
> ------- *Original Message* -------
>
> *Sender* : Robles, Raymond C<raymond.c.robles at intel.com>
>
> *Date* : Jun 10, 2016 08:30 (GMT+05:30)
>
> *Title* : RE: review comments for patch from Google
>
>
>
> Hi Judy,
>
>
>
> Thanks for the additional info. I agree with the assessment. My statement
> was directed at the general direction of the solution… and the fact that I
> think option 1 is not as robust as option 2.
>
>
>
> Thank you…
>
> Ray
>
>
>
> *From:* Judy Brock [mailto:judy.brock at samsung.com]
> *Sent:* Thursday, June 09, 2016 5:14 PM
> *To:* Robles, Raymond C <raymond.c.robles at intel.com>; Iuliu Rus <
> iuliur at google.com>; suman.p at samsung.com
> *Cc:* nvmewin at lists.openfabrics.org
> *Subject:* RE: review comments for patch from Google
>
>
>
> Hi Ray et al,
>
> [Suman wrote] >> Protect the list entry and doorbell before access/update.
> >> Do not set STOR_PERF_CONCURRENT_CHANNELS when the number of cores and
> >> queues are not the same.
>
>
>
> [Ray wrote] >> In the case Suman points out, concurrent channels should
> still be enabled even if the number of cores and queues do not match. This
> is especially true for large core count systems (>128 logical CPU cores)
> where devices may only support 32 queues or fewer.
>
>
>
> I think Suman was saying we have two options:
>
> 1.       Don't enable it at all if #cores != #queues, or
>
> 2.       Protect both the list entries and the doorbell writes before any
> accesses/updates if #cores != #queues.
>
>
>
> So, if we want to enable concurrent channels no matter what, we will need to
> check in the performance path whether #cores and #queues DO match in order
> to decide whether locks are needed. That is probably better than option 1)
> above if this is a configuration that can come up frequently.
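>
> As a rough sketch of that approach (structure layout and names are
> illustrative only, not the actual OFA driver fields), the flag deciding
> whether the performance path must lock could be derived once at
> initialization, while still enabling concurrent channels unconditionally:
>
>     #include <storport.h>
>
>     /* Illustrative only; the real OFA driver structures and names differ. */
>     typedef struct _DEV_EXT {
>         ULONG   NumCores;                       /* active logical processors (assumed) */
>         ULONG   NumIoQueues;                    /* I/O queues the device supports (assumed) */
>         BOOLEAN MultipleCoresToSingleQueueFlag; /* performance path must lock when TRUE */
>     } DEV_EXT, *PDEV_EXT;
>
>     VOID EnableConcurrentChannelsSketch(PDEV_EXT pAE)
>     {
>         PERF_CONFIGURATION_DATA perfData = {0};
>
>         perfData.Version = STOR_PERF_VERSION;
>         perfData.Size    = sizeof(PERF_CONFIGURATION_DATA);
>
>         /* Ask Storport which performance optimizations it supports. */
>         if (StorPortInitializePerfOpts(pAE, TRUE, &perfData) == STOR_STATUS_SUCCESS &&
>             (perfData.Flags & STOR_PERF_CONCURRENT_CHANNELS) != 0) {
>
>             /* Option 2: enable concurrent channels regardless of the
>              * core/queue ratio... */
>             perfData.Flags = STOR_PERF_CONCURRENT_CHANNELS;
>             perfData.ConcurrentChannels = pAE->NumCores;
>             StorPortInitializePerfOpts(pAE, FALSE, &perfData);
>
>             /* ...but remember whether queues are shared across cores, so the
>              * I/O path knows it must protect the list entry and doorbell. */
>             pAE->MultipleCoresToSingleQueueFlag =
>                 (pAE->NumCores != pAE->NumIoQueues) ? TRUE : FALSE;
>         }
>     }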
>
>
>
> Thanks,
>
> Judy
>
>
>
>
>
> *From:* nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org
> <nvmewin-bounces at lists.openfabrics.org>] *On Behalf Of *Robles, Raymond C
> *Sent:* Thursday, June 09, 2016 2:32 PM
> *To:* Iuliu Rus; suman.p at samsung.com
> *Cc:* nvmewin at lists.openfabrics.org
> *Subject:* Re: [nvmewin] review comments for patch from Google
>
>
>
> Hi Iuliu/Suman,
>
>
>
> I’m of the opinion that any solution provided should be as robust as
> possible. In the case Suman points out, concurrent channels should still be
> enabled even if the number of cores and queues do not match. This is
> especially true for large core count systems (>128 logical CPU cores) where
> devices may only support 32 queues or fewer.
>
>
>
> Thank you…
>
> Ray
>
>
>
> *From:* Iuliu Rus [mailto:iuliur at google.com]
> *Sent:* Thursday, June 09, 2016 2:09 PM
> *To:* suman.p at samsung.com
> *Cc:* Robles, Raymond C <raymond.c.robles at intel.com>;
> nvmewin at lists.openfabrics.org
> *Subject:* Re: review comments for patch from Google
>
>
>
> Which one do you prefer? Our Google driver already has the synchronization
> in place for a different reason. Or, I can follow your second
> suggestion.
>
>
>
> On Thu, Jun 9, 2016 at 7:39 AM, SUMAN PRAKASH B <suman.p at samsung.com>
> wrote:
>
> Hi Iuliu,
>
>
>
> Please find my review comments below:
>
>
>
> 1. If the number of cores and the number of queues supported by the device
> are not the same, then setting STOR_PERF_CONCURRENT_CHANNELS will corrupt
> the submission queue. For example, if the device has 4 queues and is
> connected to a target PC which has 8 logical processors, and
> STOR_PERF_CONCURRENT_CHANNELS is set, all 8 logical processors will get
> StartIo calls concurrently. As per the OFA driver NUMA design, IO Queue 1
> will be shared by core 0 and core 4. There is a possibility that core 0 and
> core 4 will get StartIo() calls concurrently and will access/update the list
> entry and the queue 1 doorbell register at the same time, which will lead to
> list entry and doorbell register corruption.
> This problem can be addressed in the following ways:
> a. Protect the list entry and doorbell before access/update.
> b. Do not set STOR_PERF_CONCURRENT_CHANNELS when the number of cores and
> queues are not the same.
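>
> To illustrate the sharing with a simple modulo mapping (the actual OFA NUMA
> mapping is derived from the driver's core/NUMA tables, not this formula),
> the small standalone program below shows 8 cores landing on 4 I/O queues:
>
>     #include <stdio.h>
>
>     /* Standalone illustration: 8 logical processors, 4 I/O queues. */
>     int main(void)
>     {
>         unsigned cores = 8, queues = 4, core;
>
>         for (core = 0; core < cores; core++) {
>             /* I/O queue IDs are 1-based; queue 0 is the admin queue. */
>             unsigned ioQueue = (core % queues) + 1;
>             printf("core %u -> IO queue %u\n", core, ioQueue);
>         }
>         /* Output shows core 0 and core 4 both mapped to IO queue 1: with
>          * concurrent StartIo calls, both can touch queue 1's list entry and
>          * doorbell at the same time unless the access is protected. */
>         return 0;
>     }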
>
>
>
> 2. We can use pRMT->NumActiveCores, which is updated in NVMeEnumNumaCores(),
> instead of using KeQueryActiveProcessorCountEx/KeQueryActiveProcessorCount.
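>
> A tiny sketch of that substitution (the RES_MAPPING_TBL layout is a
> placeholder; only pRMT->NumActiveCores and the kernel query being replaced
> are taken from the suggestion above):
>
>     #include <ntddk.h>
>
>     /* Placeholder layout; the real driver's resource mapping table differs. */
>     typedef struct _RES_MAPPING_TBL { ULONG NumActiveCores; } RES_MAPPING_TBL, *PRES_MAPPING_TBL;
>
>     ULONG GetActiveCoreCountSketch(PRES_MAPPING_TBL pRMT)
>     {
>         if (pRMT != NULL) {
>             /* Count already derived by the driver in NVMeEnumNumaCores(). */
>             return pRMT->NumActiveCores;
>         }
>         /* Fallback: the kernel query the current code uses. */
>         return KeQueryActiveProcessorCountEx(ALL_PROCESSOR_GROUPS);
>     }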
>
>
>
> Please let us know your opinion.
>
>
>
> Thanks,
> Suman
>
>
>
> ------- *Original Message* -------
>
> *Sender* : Robles, Raymond C<raymond.c.robles at intel.com>
>
> *Date* : Jun 09, 2016 03:56 (GMT+05:30)
>
> *Title* : Re: [nvmewin] Happy New Year... and status update
>
>
>
> Thank you Iuliu!
>
>
>
> Reviewing companies, please try to target the review completion for 2
> weeks from today (6/22/16).
>
>
>
> Thank you!
>
> Ray
>
>
>
> *From:* Iuliu Rus [mailto:iuliur at google.com]
> *Sent:* Wednesday, June 08, 2016 12:26 PM
> *To:* Robles, Raymond C <raymond.c.robles at intel.com>
> *Cc:* nvmewin at lists.openfabrics.org
> *Subject:* Re: Happy New Year... and status update
>
>
>
> I attached our patch.
>
>
>
> On Tue, Jun 7, 2016 at 2:37 PM, Iuliu Rus <iuliur at google.com> wrote:
>
> Let me see if I can get it done by tomorrow EOD...
>
>
>
> On Tue, Jun 7, 2016 at 2:31 PM, Robles, Raymond C <
> raymond.c.robles at intel.com> wrote:
>
> Hi Iuliu,
>
>
>
> That’s great… thank you. Yes, the process is still the same (using SVN
> patches).
>
>
>
> Please let me know if you have any questions/comments. My only question
> for you is an ETA for the patch to be submitted.
>
>
>
> Thank you…
>
> Ray
>
>
>
> *From:* Iuliu Rus [mailto:iuliur at google.com]
> *Sent:* Tuesday, June 07, 2016 2:28 PM
> *To:* Robles, Raymond C <raymond.c.robles at intel.com>
> *Cc:* nvmewin at lists.openfabrics.org
> *Subject:* Re: Happy New Year... and status update
>
>
>
> Yes, we can send that to you. Is the process still the same?
>
>
>
> On Tue, Jun 7, 2016 at 1:25 PM, Robles, Raymond C <
> raymond.c.robles at intel.com> wrote:
>
> Hello Iuliu,
>
>
>
> As the OFA community is closing on its mid-year release, we have only one
> more patch that was targeted… concurrent channels. Will Google still be
> submitting the concurrent channels patch?
>
>
>
> Thank you…
>
> Ray
>
>
>
> *From:* nvmewin-bounces at lists.openfabrics.org [mailto:
> nvmewin-bounces at lists.openfabrics.org] *On Behalf Of *Robles, Raymond C
> *Sent:* Thursday, January 14, 2016 4:55 PM
> *To:* 'nvmewin at lists.openfabrics.org' <nvmewin at lists.openfabrics.org>
> *Subject:* [nvmewin] Happy New Year... and status update
>
>
>
> Hello and Happy New Year,
>
>
>
> I hope everyone had a great holiday season and is off to a great start to
> the new year!
>
>
>
> As communicated last month, the patch from HGST for the SCSI
> multi-initiator changes has been approved and pushed. The holidays slowed
> down much of the progress on the OFA driver, and several patches did not get
> pushed prior to the end of the year. The list of patches remaining to be
> pushed is as follows…
>
>
>
> ·         Namespace Management (Intel)
>
> ·         Perf Opts (Samsung)
>
> ·         Win 8.1 Timers (Samsung)
>
> ·         EOL Read Only (Samsung)
>
> ·         Concurrent channels (Google)
>
>
>
> The namespace management patch will be sent out for review tomorrow (look
> for the patch email from Carolyn)… stay tuned!!!
>
>
>
> However, once this patch is resolved, we as a community will have to make
> a decision on an official release strategy. The patch process and cadence
> were significantly slower in 2015, which leaves us with a few options.
>
>
>
> 1.       Release what is in the trunk today (or after the namespace
> management patch)… and call that the 2015 release (albeit later than
> expected)
>
> 2.       In lieu of an official 2015 release, we push the remaining
> patches listed above… and then release in ~Q2 of 2016. Basically, skip a
> 2015 release and go right into the mid-2016 release.
>
> 3.       Remove the concept of “official releases” from the OFA Windows
> NVMe driver and just allow people, companies, and users to pull from the
> OFA trunk as necessary.
>
>
>
> For #3 above, my thoughts are that because the OFA driver is not a
> production driver… but a reference and enabling driver, it should/could be
> managed as just that… a reference baseline driver from which any potential
> user can grab the source, or to which they can contribute a patch. Nothing
> more… nothing less.
>
>
>
> For the release decision, I’ll be happy to call a meeting… but we can also
> handle it via email… just let me know. Feedback from all is welcome… but I
> would request mandatory feedback from the 4 reviewing companies: Samsung,
> HGST, PMC-Sierra, and Intel. How would you like to
> proceed?
>
>
>
> Thanks,
>
> Ray
>
>
>
>
> *Raymond C. Robles*
>
> Non-Volatile Memory Solutions Group
>
> Intel Corporation
>
> Office: 480-554-2600
>
> Mobile: 480-399-0645
>