[nvmewin] review comments for patch from Google

Iuliu Rus iuliur at google.com
Thu Jun 16 14:14:45 PDT 2016


Let me know if I have to sync to the latest and re-do the patch.

On Thu, Jun 16, 2016 at 2:10 PM, Iuliu Rus <iuliur at google.com> wrote:

> We already have a pool of locks - the DPC locks - which is what I'm using
> in my latest patch. You cannot create spinlocks directly in a Storport
> miniport; the only options are the StartIo lock, the DPC locks, the
> interrupt lock, or one of the MSI locks.
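>
> For reference, a minimal sketch of taking one of those per-queue DPC locks
> (the pAE, QueueDpc and QueueId names are illustrative, not from the actual
> patch):
>
>     STOR_LOCK_HANDLE lockHandle;
>
>     /* For DpcLock, the lock context names the STOR_DPC whose lock is
>        taken, so an array of DPCs (one per SQ/CQ pair) behaves like a
>        pool of locks. */
>     StorPortAcquireSpinLock(pAE, DpcLock, &pAE->QueueDpc[QueueId], &lockHandle);
>     /* ...touch the cmd id pool / cmd entries, ring the doorbell... */
>     StorPortReleaseSpinLock(pAE, &lockHandle);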
>
> On Thu, Jun 16, 2016 at 2:04 PM, Robles, Raymond C <
> raymond.c.robles at intel.com> wrote:
>
>> Hello,
>>
>>
>>
>> Here are my thoughts…
>>
>>
>>
>> In short, I agree with Thomas’s assessment.
>>
>>
>>
>> Concurrent channels allows Storport to call StartIo concurrently… and
>> with no locks held. The locking of critical sections is therefore the
>> responsibility of the miniport.
>>
>>
>>
>> Any proposed solution should protect the critical resources (cmd id pool,
>> cmd entries, doorbells) with a lock that is granular enough for our driver,
>> without being visible to the OS or Storport. Acquiring the StartIo lock to
>> protect resources that only our driver uses is too broad a lock. My
>> suggestion would be to create a pool of locks (one for every SQ/CQ pair)
>> that is maintained within our driver; a sketch of one way to build such a
>> pool follows. Again, crashdump mode is a non-issue because it is single
>> threaded.
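>>
>> One way to realize such a pool within Storport's constraints is one
>> STOR_DPC per SQ/CQ pair, since each STOR_DPC carries its own DpcLock
>> (a sketch only; the QueueDpc and QueueDpcRoutine names are hypothetical):
>>
>>     ULONG i;
>>     /* During initialization, at passive level */
>>     for (i = 0; i < NumIoQueues; i++)
>>         StorPortInitializeDpc(pAE, &pAE->QueueDpc[i], QueueDpcRoutine);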
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>> *From:* Thomas Freeman [mailto:thomas.freeman at hgst.com]
>> *Sent:* Thursday, June 16, 2016 1:51 PM
>> *To:* Iuliu Rus <iuliur at google.com>
>> *Cc:* Robles, Raymond C <raymond.c.robles at intel.com>; nvmewin <
>> nvmewin at lists.openfabrics.org>
>> *Subject:* RE: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Hi Iuliu,
>>
>> Here's my view; I hope someone else can chime in to verify or correct what
>> I'm saying.
>>
>>
>>
>> The purpose of the concurrent channels change is to stop Storport from
>> acquiring the StartIo lock before calling StartIo. This allows multiple
>> simultaneous calls to StartIo. If you require ProcessIo to take the StartIo
>> lock, you are replicating what Storport used to do - it defeats the point
>> of concurrent channels.
>>
>>
>>
>> Here is my suggestion for what needs to be changed:
>>
>> 1. There is no need for the StartIo lock - remove the requests to
>> acquire it
>>
>> 2. Acquire the Dpc locks in the following situations:
>>
>>    -ProcessIo, before calling GetCmdEntry - if there is queue sharing
>>
>>    -ProcessIo, before calling IssueCmd - always
>>
>>    -IoCompletionRoutine, before calling NVMeCompleteCmd - if there is
>> queue sharing
>>
>>
>>
>>    -RecoveryDpcRoutine - acquire all of the Dpc locks
>>
>>
>>
>>
>>
>> Here is my reasoning:
>>
>> -GetCmdEntry and NVMeCompleteCmd both access the FreeQList. With queue
>> sharing, these could run simultaneously and have a collision when they
>> update the list.
>>
>> -IssueCmd. With queue sharing, different CPUs could attempt to put
>> entries on the queue simultaneously. But to make RecoveryDpcRoutine work,
>> this must be protected by the Dpc lock whether or not there is queue
>> sharing; a sketch follows below.
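>>
>> A sketch of what rules 2a/2b would look like inside ProcessIo (call
>> signatures simplified; the queueSharing flag and pDpc variable are
>> placeholders for however the driver ends up tracking them):
>>
>>     STOR_LOCK_HANDLE lockHandle;
>>
>>     if (queueSharing) {                       /* FreeQList is shared */
>>         StorPortAcquireSpinLock(pAE, DpcLock, pDpc, &lockHandle);
>>         pCmdEntry = GetCmdEntry(pAE, QueueId);
>>         StorPortReleaseSpinLock(pAE, &lockHandle);
>>     } else {
>>         pCmdEntry = GetCmdEntry(pAE, QueueId);
>>     }
>>
>>     /* IssueCmd is locked unconditionally, so RecoveryDpcRoutine can
>>        quiesce submissions by holding every Dpc lock */
>>     StorPortAcquireSpinLock(pAE, DpcLock, pDpc, &lockHandle);
>>     IssueCmd(pAE, QueueId, pCmdEntry);
>>     StorPortReleaseSpinLock(pAE, &lockHandle);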
>>
>>
>>
>> I think this will also address the deadlock problem I reported.
>>
>>
>>
>> Regards,
>>
>> *Tom Freeman*
>>
>> *Software Engineer, Device Manager and Driver Development*
>>
>> Western Digital Corporation
>>
>> e.  Thomas.freeman at hgst.com
>>
>> o.  +1-507-322-2311
>>
>>
>>
>>
>>
>> *From:* Iuliu Rus [mailto:iuliur at google.com]
>> *Sent:* Thursday, June 16, 2016 2:42 PM
>> *To:* Thomas Freeman <thomas.freeman at hgst.com>
>> *Cc:* Robles, Raymond C <raymond.c.robles at intel.com>; nvmewin <
>> nvmewin at lists.openfabrics.org>
>> *Subject:* Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Here's my revised patch for concurrent channels. The synchronization
>> rules are now as follows:
>>
>> 1. Init is single-threaded, so it doesn't matter whether AcquireLock is
>> TRUE or FALSE
>>
>> 2. Anything called from the DPC has to call ProcessIo with FALSE, since
>> the DPC spinlock is already held.
>>
>> 3. Anything called from StartIo has to call ProcessIo with TRUE, since
>> the StartIo lock is no longer automatically acquired (see the sketch
>> below).
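>>
>> In terms of call sites, the AcquireLock argument expresses rules 1-3
>> roughly like this (a sketch; the exact ProcessIo parameters may differ
>> from the patch):
>>
>>     /* From NVMeStartIo: Storport no longer takes the StartIo lock */
>>     ProcessIo(pAE, pSrbExt, NVME_QUEUE_TYPE_IO, TRUE /* AcquireLock */);
>>
>>     /* From a DPC (e.g. the completion DPC): the lock is already held */
>>     ProcessIo(pAE, pSrbExt, NVME_QUEUE_TYPE_IO, FALSE /* AcquireLock */);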
>>
>>
>>
>> Thomas, I think the bug you found is because ProcessIo was incorrectly
>> called with TRUE instead of FALSE.
>>
>> Let me know your thoughts on this.
>>
>>
>>
>> On Thu, Jun 16, 2016 at 10:41 AM, Iuliu Rus <iuliur at google.com> wrote:
>>
>> Guys, am I OK continuing with the patch on the current code base, or do
>> you want me to wait until you pull in some more changes?
>>
>>
>>
>> On Wed, Jun 15, 2016 at 8:37 AM, Thomas Freeman <thomas.freeman at hgst.com>
>> wrote:
>>
>> Ray,
>>
>> It appears that the latest patch file from Iuliu is applied to revision
>> 137. I don’t see Suman’s changes in that revision.
>>
>>
>>
>> As an example, the use of StorPortRequestTimer is not in nvmeStat.c.
>>
>>
>>
>> I thought there were some newer revisions than 137, but 137 is the latest
>> I’m able to download.
>>
>> 137 is also the revision that appears when I go to
>> https://svn.openfabrics.org/svnrepo/nvmewin/
>>
>>
>>
>> SVN is giving me some trouble, so I can’t tell for sure whether this is my
>> environment or a server issue.
>>
>>
>>
>> *Tom Freeman*
>>
>> *Software Engineer, Device Manager and Driver Development*
>>
>> Western Digital Corporation
>>
>> e.  Thomas.freeman at hgst.com
>>
>> o.  +1-507-322-2311
>>
>>
>>
>>
>>
>> *From:* nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] *On
>> Behalf Of *Robles, Raymond C
>> *Sent:* Friday, June 10, 2016 12:23 PM
>> *To:* Iuliu Rus <iuliur at google.com>; suman.p at samsung.com
>> *Cc:* nvmewin <nvmewin at lists.openfabrics.org>
>>
>>
>> *Subject:* Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Hi Iuliu,
>>
>>
>>
>> If you send out the updated patch with the protection in place for both
>> list entries and doorbells, I can run it on an Intel NVMe SSD (P3700) and
>> see if there is any performance impact (up or down).
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>> *From:* Iuliu Rus [mailto:iuliur at google.com]
>> *Sent:* Friday, June 10, 2016 10:17 AM
>> *To:* suman.p at samsung.com
>> *Cc:* Robles, Raymond C <raymond.c.robles at intel.com>; Judy Brock <
>> judy.brock at samsung.com>; nvmewin <nvmewin at lists.openfabrics.org>
>> *Subject:* Re: RE: review comments for patch from Google
>>
>>
>>
>> We are running this in a virtual machine, and we do see a perf difference
>> with concurrent channels. We did not observe any performance drop with
>> synchronized access to the list entry.
>>
>>
>>
>> On Fri, Jun 10, 2016 at 7:15 AM, SUMAN PRAKASH B <suman.p at samsung.com>
>> wrote:
>>
>> Hi Iuliu,
>>
>>
>>
>> As you have the implementation in place, can you let us know if there is
>> any performance improvement when STOR_PERF_CONCURRENT_CHANNELS is set in
>> the case where #cores and #queues do not match?
>>
>> Also, is there any drop in performance if the list entry and doorbell
>> are protected after setting STOR_PERF_CONCURRENT_CHANNELS?
>>
>>
>>
>> Perhaps a bit off topic:
>>
>> Even in the case when #cores and #queues are the same and
>> STOR_PERF_CONCURRENT_CHANNELS is set, we did not observe any performance
>> gain. Maybe we are already hitting the device's maximum performance even
>> without setting STOR_PERF_CONCURRENT_CHANNELS. Can you please let us know
>> your observations?
>>
>>
>>
>> Thanks,
>>
>> Suman
>>
>>
>>
>> ------- *Original Message* -------
>>
>> *Sender* : Robles, Raymond C<raymond.c.robles at intel.com>
>>
>> *Date* : Jun 10, 2016 08:30 (GMT+05:30)
>>
>> *Title* : RE: review comments for patch from Google
>>
>>
>>
>> Hi Judy,
>>
>>
>>
>> Thanks for the additional info. I agree with the assessment. My statement
>> was directed at the general direction of the solution… and the fact that I
>> think option 1 is not as robust as option 2.
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>> *From:* Judy Brock [mailto:judy.brock at samsung.com]
>> *Sent:* Thursday, June 09, 2016 5:14 PM
>> *To:* Robles, Raymond C <raymond.c.robles at intel.com>; Iuliu Rus <
>> iuliur at google.com>; suman.p at samsung.com
>> *Cc:* nvmewin at lists.openfabrics.org
>> *Subject:* RE: review comments for patch from Google
>>
>>
>>
>> Hi Ray et al,
>>
>> [Suman wrote] >> Protect the listentry and doorbell before access/update
>>
>> [Suman wrote] >> Do not set STOR_PERF_CONCURRENT_CHANNELS when number of
>> cores and queues are not same.
>>
>>
>>
>> [Ray wrote] >> In the case Suman points out, concurrent channels should
>> still be enabled even if the number of cores and queues do not match. This
>> is especially true for large core count systems (>128 logical CPU cores)
>> where devices may only support 32 queues or fewer.
>>
>>
>>
>> I think Suman was saying we have two options:
>>
>> 1. Either don’t enable it at all if #cores != #queues, or
>>
>> 2. Protect both the list entries and doorbell writes before any
>> accesses/updates if #cores != #queues.
>>
>>
>>
>> So if we want to enable concurrent channels no matter what, we will need a
>> check in the performance path that decides whether locks are required,
>> based on whether #cores and #queues actually match (a sketch follows
>> below). It’s probably better than option 1 above if this configuration can
>> come up frequently.
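>>
>> Something along these lines (a sketch; the flag name is mine, not from
>> the driver):
>>
>>     /* Decided once at init time, consulted in the I/O path */
>>     BOOLEAN queueSharing = (NumActiveCores > NumIoQueues);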
>>
>>
>>
>> Thanks,
>>
>> Judy
>>
>>
>>
>>
>>
>> *From:* nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] *On
>> Behalf Of *Robles, Raymond C
>> *Sent:* Thursday, June 09, 2016 2:32 PM
>> *To:* Iuliu Rus; suman.p at samsung.com
>> *Cc:* nvmewin at lists.openfabrics.org
>> *Subject:* Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Hi Iuliu/Suman,
>>
>>
>>
>> I’m of the opinion that any solution provided should be as robust as
>> possible. In the case Suman points out, concurrent channels should still be
>> enabled even if the number of cores and queues do not match. This is
>> especially true for large core count systems (>128 logical CPU cores)
>> where devices may only support 32 queues or fewer.
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>> *From:* Iuliu Rus [mailto:iuliur at google.com]
>> *Sent:* Thursday, June 09, 2016 2:09 PM
>> *To:* suman.p at samsung.com
>> *Cc:* Robles, Raymond C <raymond.c.robles at intel.com>;
>> nvmewin at lists.openfabrics.org
>> *Subject:* Re: review comments for patch from Google
>>
>>
>>
>> Which one do you prefer? Our Google driver already has the synchronization
>> in place for a different reason. Or I can follow your second suggestion.
>>
>>
>>
>> On Thu, Jun 9, 2016 at 7:39 AM, SUMAN PRAKASH B <suman.p at samsung.com>
>> wrote:
>>
>> Hi Iuliu,
>>
>>
>>
>> Please find my review comments below:
>>
>>
>>
>> 1. If the number of cores and the number of queues supported by the device
>> are not the same, then setting STOR_PERF_CONCURRENT_CHANNELS will corrupt
>> the submission queue. For example, if the device has 4 queues and is
>> connected to a target PC which has 8 logical processors, and
>> STOR_PERF_CONCURRENT_CHANNELS is set, all 8 logical processors will get
>> StartIo calls concurrently. As per the OFA driver's NUMA design, IO Queue 1
>> will be shared by core 0 and core 4. There is a possibility that core 0 and
>> core 4 will get StartIo() calls concurrently and will access/update the
>> listentry and the queue 1 doorbell register at the same time, which will
>> lead to listentry and doorbell register corruption (see the sketch below).
>> This problem can be addressed in the following ways:
>> a. Protect the listentry and doorbell before access/update.
>> b. Do not set STOR_PERF_CONCURRENT_CHANNELS when the number of cores and
>> queues are not the same.
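>>
>> A sketch of how the sharing arises (the modulo mapping is my reading of
>> the NUMA queue-table construction, so treat it as an assumption):
>>
>>     /* 8 cores, 4 IO queues: cores 0 and 4 both map to IO queue 1 */
>>     QueueId = (CoreId % NumIoQueues) + 1;   /* IO queue ids are 1-based */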
>>
>>
>>
>> 2. We can use pRMT->NumActiveCores, which is updated in
>> NVMeEnumNumaCores(), instead of using
>> KeQueryActiveProcessorCountEx/KeQueryActiveProcessorCount.
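>>
>> For example (a sketch; pRMT here is the resource mapping table pointer
>> referenced above):
>>
>>     /* core count already enumerated and cached by NVMeEnumNumaCores() */
>>     ULONG activeCores = pRMT->NumActiveCores;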
>>
>>
>>
>> Please let us know your opinion.
>>
>>
>>
>> Thanks,
>> Suman
>>
>>
>>
>> ------- *Original Message* -------
>>
>> *Sender* : Robles, Raymond C<raymond.c.robles at intel.com>
>>
>> *Date* : Jun 09, 2016 03:56 (GMT+05:30)
>>
>> *Title* : Re: [nvmewin] Happy New Year... and status update
>>
>>
>>
>> Thank you Iuliu!
>>
>>
>>
>> Reviewing companies, please try to target the review completion for 2
>> weeks from today (6/22/16).
>>
>>
>>
>> Thank you!
>>
>> Ray
>>
>>
>>
>> *From:* Iuliu Rus [mailto:iuliur at google.com]
>> *Sent:* Wednesday, June 08, 2016 12:26 PM
>> *To:* Robles, Raymond C <raymond.c.robles at intel.com>
>> *Cc:* nvmewin at lists.openfabrics.org
>> *Subject:* Re: Happy New Year... and status update
>>
>>
>>
>> I attached our patch.
>>
>>
>>
>> On Tue, Jun 7, 2016 at 2:37 PM, Iuliu Rus <iuliur at google.com> wrote:
>>
>> Let me see if I can get it done by tomorrow EOD...
>>
>>
>>
>> On Tue, Jun 7, 2016 at 2:31 PM, Robles, Raymond C <
>> raymond.c.robles at intel.com> wrote:
>>
>> Hi Iuliu,
>>
>>
>>
>> That’s great… thank you. Yes, the process is still the same (using SVN
>> patches).
>>
>>
>>
>> Please let me know if you have any questions/comments. My only question
>> for you is an ETA for the patch to be submitted.
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>> *From:* Iuliu Rus [mailto:iuliur at google.com]
>> *Sent:* Tuesday, June 07, 2016 2:28 PM
>> *To:* Robles, Raymond C <raymond.c.robles at intel.com>
>> *Cc:* nvmewin at lists.openfabrics.org
>> *Subject:* Re: Happy New Year... and status update
>>
>>
>>
>> Yes we can send that to you. Is the process still the same?
>>
>>
>>
>> On Tue, Jun 7, 2016 at 1:25 PM, Robles, Raymond C <
>> raymond.c.robles at intel.com> wrote:
>>
>> Hello Iuliu,
>>
>>
>>
>> As the OFA community is closing on its mid-year release, we have only one
>> more patch that was targeted… concurrent channels. Will Google still be
>> submitting the concurrent channels patch?
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>> *From:* nvmewin-bounces at lists.openfabrics.org [mailto:
>> nvmewin-bounces at lists.openfabrics.org] *On Behalf Of *Robles, Raymond C
>> *Sent:* Thursday, January 14, 2016 4:55 PM
>> *To:* 'nvmewin at lists.openfabrics.org' <nvmewin at lists.openfabrics.org>
>> *Subject:* [nvmewin] Happy New Year... and status update
>>
>>
>>
>> Hello and Happy New Year,
>>
>>
>>
>> I hope everyone had a great holiday season and is off to a great start to
>> the new year!
>>
>>
>>
>> As communicated last month, the patch from HGST for the SCSI
>> multi-initiator changes has been approved and pushed. The holidays slowed
>> down much of the progress on the OFA driver, and several patches did not
>> get pushed prior to the end of the year. The patches remaining to be
>> pushed are as follows…
>>
>>
>>
>> · Namespace Management (Intel)
>>
>> · Perf Opts (Samsung)
>>
>> · Win 8.1 Timers (Samsung)
>>
>> · EOL Read Only (Samsung)
>>
>> · Concurrent channels (Google)
>>
>>
>>
>> The namespace management patch will be sent out for review tomorrow (look
>> for the patch email from Carolyn)… stay tuned!!!
>>
>>
>>
>> However, once this patch is resolved, we as a community will have to make
>> a decision on an official release strategy. The patch process and cadence
>> was significantly slower in 2015, which leaves us with a few options.
>>
>>
>>
>> 1. Release what is in the trunk today (or after the namespace
>> management patch)… and call that the 2015 release (albeit later than
>> expected)
>>
>> 2. In lieu of an official 2015 release, we push the remaining
>> patches listed above… and then release in ~Q2 of 2016. Basically, skip a
>> 2015 release and go right into the mid-2016 release.
>>
>> 3. Remove the concept of “official releases” from the OFA Windows
>> NVMe driver and just allow people, companies, and users to pull from the
>> OFA trunk as necessary.
>>
>>
>>
>> For #3 above, my thought is that because the OFA driver is not a
>> production driver… but a reference and enabling driver, it should/could be
>> managed as just that… a reference baseline driver from which any potential
>> user can grab the source or contribute a patch. Nothing more… nothing
>> less.
>>
>>
>>
>> For the release decision, I’ll be happy to call a meeting… but we can
>> also handle it via email… just let me know. Feedback from all is
>> welcome… but I would request mandatory feedback from the 4 reviewing
>> companies: Samsung, HGST, PMC-Sierra, and Intel. How would you like to
>> proceed?
>>
>>
>>
>> Thanks,
>>
>> Ray
>>
>>
>>
>>
>> *Raymond C. Robles*
>>
>> Non-Volatile Memory Solutions Group
>>
>> Intel Corporation
>>
>> Office: 480-554-2600
>>
>> Mobile: 480-399-0645
>>
>> Western Digital Corporation (and its subsidiaries) E-mail Confidentiality
>> Notice & Disclaimer:
>>
>> This e-mail and any files transmitted with it may contain confidential or
>> legally privileged information of WDC and/or its affiliates, and are
>> intended solely for the use of the individual or entity to which they are
>> addressed. If you are not the intended recipient, any disclosure, copying,
>> distribution or any action taken or omitted to be taken in reliance on it,
>> is prohibited. If you have received this e-mail in error, please notify the
>> sender immediately and delete the e-mail in its entirety from your system.
>>
>>
>>
>> _______________________________________________
>> nvmewin mailing list
>> nvmewin at lists.openfabrics.org
>> http://lists.openfabrics.org/mailman/listinfo/nvmewin
>>
>>
>
>