[nvmewin] review comments for patch from Google

Iuliu Rus iuliur at google.com
Wed Jun 29 18:15:21 PDT 2016


Some of your e-mail servers don't accept zip attachments. Try renaming this.

On Wed, Jun 29, 2016 at 5:56 PM, Iuliu Rus <iuliur at google.com> wrote:

> Attached.... I see no reason to encrypt, the code is public anyway.
>
> On Wed, Jun 29, 2016 at 4:44 PM, Foster, Carolyn D <
> carolyn.d.foster at intel.com> wrote:
>
>> Hi Iuliu,
>>
>> Can you please zip your local source files and send it out to the group?
>> You might want to encrypt it and provide us with the password.
>>
>>
>>
>> Thanks!
>>
>> Carolyn
>>
>>
>>
>> *From:* nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] *On
>> Behalf Of *Iuliu Rus
>> *Sent:* Wednesday, June 29, 2016 4:20 PM
>> *To:* Robles, Raymond C <raymond.c.robles at intel.com>
>> *Cc:* nvmewin <nvmewin at lists.openfabrics.org>
>> *Subject:* Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> I don't think I understand how patching works. Did you sync up your local
>> copy before looking at it? I did not remove those lines; they are still
>> present in my local copy.
>>
>>
>>
>> On Wed, Jun 29, 2016 at 3:44 PM, Robles, Raymond C <
>> raymond.c.robles at intel.com> wrote:
>>
>> Hi Iuliu,
>>
>>
>>
>> Thanks for the updated patch. After reviewing again, I have some
>> additional questions…
>>
>>
>>
>> ·         In ProcessIo, the dpc lock is obtained later in the code than
>> originally implemented. Why move the lock acquire down?
>>
>> ·         In ProcessIo, it appears that you do not call
>> StorPortGetStartIoPerfParams() to obtain the completing MSI-x message
>> number. Getting this MSI-x message number ensures that you know which
>> interrupt vector will be used to complete this request… and therefore you
>> know on which SQ/CQ pair to post the SQE. Was this left out intentionally?
>> See sample code below:
>>
>>
>>
>>
>>
>> if (((DeviceExtension->perfData.Flags & STOR_PERF_INTERRUPT_MESSAGE_RANGES) ==
>>       STOR_PERF_INTERRUPT_MESSAGE_RANGES) &&
>>     (pSrbExtension->pSrb != NULL)) {
>>     Perf.Size = sizeof(STARTIO_PERFORMANCE_PARAMETERS);
>>     StorPortGetStartIoPerfParams((PVOID)DeviceExtension,
>>                                  (PSCSI_REQUEST_BLOCK)pSrbExtension->pSrb,
>>                                  &Perf);
>>     StorStatus = NVMeMapMsiToQueue(DeviceExtension,
>>                                    Perf.MessageNumber,
>>                                    &SubQueue,
>>                                    &CplQueue);
>>
>>
>>
>>
>>
>> ·         When setting up concurrent channels, I do not see the Storport
>> Perf Opts changes from Suman's patch (Samsung). The Perf Opts changes
>> replace the old learning mode and drive the source-core interrupt steering
>> in the driver. Without them, the concurrent channels changes will not be
>> optimized correctly. See the reference Perf Opts setup code (directly from
>> the OFA source trunk, in NVMeInitialize) below:
>>
>>
>>
>>
>>
>> if (pAE->ntldrDump == FALSE) {
>>
>>     /* Enumerate granted MSI/MSI-X messages if there is any */
>>     if (NVMeEnumMsiMessages(pAE) == FALSE) {
>>         NVMeFreeBuffers(pAE);
>>         return (FALSE);
>>     }
>>
>>     perfQueryData.Version = STOR_PERF_VERSION;
>>     perfQueryData.Size = sizeof(PERF_CONFIGURATION_DATA);
>>     Status = StorPortInitializePerfOpts(pAE, TRUE, &perfQueryData);
>>     ASSERT(STOR_STATUS_SUCCESS == Status);
>>     if (STOR_STATUS_SUCCESS == Status) {
>>
>>         perfData.Version = STOR_PERF_VERSION;
>>         perfData.Size = sizeof(PERF_CONFIGURATION_DATA);
>>
>>         /* Allow optimization of storport DPCs */
>>         if (perfQueryData.Flags & STOR_PERF_DPC_REDIRECTION) {
>>             perfData.Flags = STOR_PERF_DPC_REDIRECTION;
>>
>>             /* Allow optimization of Interrupt Redirection and Group
>>                Affinity */
>>             if ((perfQueryData.Flags & (STOR_PERF_INTERRUPT_MESSAGE_RANGES |
>>                                         STOR_PERF_ADV_CONFIG_LOCALITY)) ==
>>                 (STOR_PERF_INTERRUPT_MESSAGE_RANGES |
>>                  STOR_PERF_ADV_CONFIG_LOCALITY)) {
>>                 perfData.Flags |= STOR_PERF_INTERRUPT_MESSAGE_RANGES;
>>                 perfData.Flags |= STOR_PERF_ADV_CONFIG_LOCALITY;
>>
>>                 perfData.FirstRedirectionMessageNumber = 0;
>>                 perfData.LastRedirectionMessageNumber =
>>                     (pRMT->NumMsiMsgGranted - 1);
>>                 perfData.MessageTargets = pAE->pArrGrpAff;
>>             }
>>         }
>>
>>         Status = StorPortInitializePerfOpts(pAE, FALSE, &perfData);
>>         ASSERT(STOR_STATUS_SUCCESS == Status);
>>         if (STOR_STATUS_SUCCESS == Status) {
>>             pAE->IsMsiMappingComplete = TRUE;
>>         }
>>     }
>>
>>     /* Call StorPortPassiveInitialization to enable passive init */
>>     StorPortEnablePassiveInitialization(pAE, NVMePassiveInitialize);
>>
>>     return (TRUE);
>> } else {
>>
>>
>>
>>
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>>
>>
>>
>>
>> *From:* Iuliu Rus [mailto:iuliur at google.com]
>> *Sent:* Wednesday, June 29, 2016 10:56 AM
>> *To:* Thomas Freeman <thomas.freeman at hgst.com>
>> *Cc:* Robles, Raymond C <raymond.c.robles at intel.com>; nvmewin <
>> nvmewin at lists.openfabrics.org>
>> *Subject:* Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Sorry about that, Thomas. I resolved the conflict, but I think I sent you
>> the patch pre-resolve by mistake. Here is the correct patch. I also
>> triggered the RecoveryDpc from the debugger to make sure I have coverage
>> there; I don't think the normal tests cover it.
>>
>>
>>
>>
>>
>> On Wed, Jun 29, 2016 at 7:25 AM, Thomas Freeman <thomas.freeman at hgst.com>
>> wrote:
>>
>> Iuliu,
>>
>> It looks like the patch file has an unresolved conflict:
>>
>>
>>
>> +<<<<<<< .mine
>> +                ReleaseDpcSpinLocks(pAE, dpcLocks);
>> +        }  /* reset the controller */
>> +        if (pAE->NumDpc > MAX_DPC) {
>> +                StorPortFreePool(pHwDeviceExtension, dpcLocks);
>> +        }
>> +||||||| .r137
>> +        StorPortReleaseSpinLock(pAE, &startLockhandle);
>> +    }  /* reset the controller */
>> +=======
>>                  pAE->DeviceRemovedDuringIO = TRUE;
>>  #if (NTDDI_VERSION > NTDDI_WIN7)
>>                  if (pAE->Timerhandle != NULL) {
>> @@ -2634,6 +2680,7 @@
>>          NVMeFreeBuffers(pAE);
>>                  StorPortResume(pAE);
>>      }  /* reset the controller */
>> +>>>>>>> .r139
>>
>>
>>
>>
>>
>>
>>
>> *Tom Freeman*
>>
>> *Software Engineer, Device Manager and Driver Development*
>>
>> Western Digital Corporation
>>
>> e.  Thomas.freeman at hgst.com
>>
>> o.  +1-507-322-2311
>>
>>
>>
>>
>>
>>
>> *From:* nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] *On
>> Behalf Of *Iuliu Rus
>> *Sent:* Monday, June 27, 2016 11:12 AM
>> *To:* Robles, Raymond C <raymond.c.robles at intel.com>
>>
>> *Cc:* nvmewin <nvmewin at lists.openfabrics.org>
>> *Subject:* Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Done.
>>
>>
>>
>> On Mon, Jun 27, 2016 at 8:52 AM, Robles, Raymond C <
>> raymond.c.robles at intel.com> wrote:
>>
>> Hi Suman,
>>
>>
>>
>> Thank you for checking on your patches. I was unaware that your latest
>> patch for IOCTL Surprise Removal was not applied correctly. I've just
>> applied that patch again and the trunk should now be at revision 138. Could
>> you please double-check to ensure your patch is applied properly and let me
>> know? Thank you!!!
>>
>>
>>
>> Hi Iuliu,
>>
>>
>>
>> Sorry about the thrash… but I think we've finally got the trunk up to date
>> with the latest (once confirmed by Suman from Samsung). Once confirmed,
>> please rebase your trunk and add your modifications to that latest revision
>> (138). Thank you!!!
>>
>>
>>
>> Thanks…
>> Ray
>>
>>
>>
>> *From:* SUMAN PRAKASH B [mailto:suman.p at samsung.com]
>> *Sent:* Monday, June 27, 2016 1:12 AM
>> *To:* Robles, Raymond C <raymond.c.robles at intel.com>; Iuliu Rus <
>> iuliur at google.com>
>> *Cc:* nvmewin <nvmewin at lists.openfabrics.org>
>> *Subject:* Re: Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Hi Ray,
>>
>>
>>
>> Samsung submitted three patches: revision 134 is the first patch, and
>> revisions 136 and 137 are the second patch.
>>
>> We submitted a third patch, for handling device surprise removal in the
>> IOCTL path, which was approved by HGST and Intel on June 1st. Please find
>> the mail thread attached for reference.
>>
>> The third patch is not yet checked in to the trunk.
>>
>>
>>
>> Thanks,
>>
>> Suman
>>
>>
>>
>> ------- *Original Message* -------
>>
>> *Sender* : Robles, Raymond C<raymond.c.robles at intel.com>
>>
>> *Date* : Jun 27, 2016 12:19 (GMT+05:30)
>>
>> *Title* : Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Hi Iuliu,
>>
>>
>>
>> I apologize for the delay in responding. I was performing a manual diff
>> of the last several merges to complete the investigation. Here is what I
>> found.
>>
>>
>>
>> The latest revision on the OFA source trunk is actually 137 (refer to my
>> screenshot below). I've manually gone back and made sure that all the
>> patches submitted are in this latest revision. I found no errors or issues.
>> The latest trunk is up to date with all of the patches. If you were trying
>> to apply your patch to the trunk yourself, you cannot: only I (OFA
>> co-chair) and Carolyn Foster (OFA source maintainer) have write permissions
>> to the repo.
>>
>>
>>
>> Each patch listed below is applied to the trunk. The only thing you need to
>> ensure is that your latest concurrent channels modifications are based off
>> the 137 revision. If you have any questions, please let me know.
>>
>>
>>
>> Here is the screenshot of all the patches I’ve checked…
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>> *From:* nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] *On
>> Behalf Of *Robles, Raymond C
>> *Sent:* Friday, June 17, 2016 12:08 PM
>> *To:* Iuliu Rus <iuliur at google.com>
>> *Cc:* nvmewin <nvmewin at lists.openfabrics.org>
>> *Subject:* Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Thanks Iuliu. I’m currently investigating the issue. Will let everyone
>> know when the trunk is updated.
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>> *From:* Iuliu Rus [mailto:iuliur at google.com <iuliur at google.com>]
>> *Sent:* Friday, June 17, 2016 10:29 AM
>> *To:* Robles, Raymond C <raymond.c.robles at intel.com>
>> *Cc:* Thomas Freeman <thomas.freeman at hgst.com>; nvmewin <
>> nvmewin at lists.openfabrics.org>
>> *Subject:* Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> I don't see any changes when I try to update. My svn says: completed at
>> revision 137.
>>
>>
>>
>> On Thu, Jun 16, 2016 at 2:42 PM, Robles, Raymond C <
>> raymond.c.robles at intel.com> wrote:
>>
>> Hi Iuliu- Please rebase your patch with the latest OFA trunk source. If
>> you do not see any updates during your rebase, please let me know and I’ll
>> investigate. Thanks!
>>
>>
>>
>> *From:* Iuliu Rus [mailto:iuliur at google.com]
>> *Sent:* Thursday, June 16, 2016 2:15 PM
>> *To:* Robles, Raymond C <raymond.c.robles at intel.com>
>> *Cc:* Thomas Freeman <thomas.freeman at hgst.com>; nvmewin <
>> nvmewin at lists.openfabrics.org>
>>
>>
>> *Subject:* Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Let me know if I have to sync to the latest and re-do the patch.
>>
>>
>>
>> On Thu, Jun 16, 2016 at 2:10 PM, Iuliu Rus <iuliur at google.com> wrote:
>>
>> We already have a pool of locks - the dpc locks - which is what I'm using
>> in my latest patch. You cannot create spinlocks directly in a storport
>> driver; the only options are the StartIoLock, the dpc lock, the interrupt
>> lock, or one of the MSI locks.
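>>
>> For reference, a minimal sketch of taking one of those dpc locks (the
>> queueIndex/pDpcArray names are placeholders for whatever the device
>> extension actually keeps; with DpcLock, the third argument to
>> StorPortAcquireSpinLock must point at the STOR_DPC whose lock you want):
>>
>>     STOR_LOCK_HANDLE lockHandle;
>>
>>     /* Acquire the spinlock of the DPC tied to this SQ/CQ pair */
>>     StorPortAcquireSpinLock(pAE, DpcLock,
>>                             &pAE->pDpcArray[queueIndex], &lockHandle);
>>
>>     /* ... touch the free cmd list / SQ tail / doorbell for this queue ... */
>>
>>     StorPortReleaseSpinLock(pAE, &lockHandle);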
>>
>>
>>
>> On Thu, Jun 16, 2016 at 2:04 PM, Robles, Raymond C <
>> raymond.c.robles at intel.com> wrote:
>>
>> Hello,
>>
>>
>>
>> Here are my thoughts…
>>
>>
>>
>> In short, I agree with Thomas’s assessment.
>>
>>
>>
>> Concurrent channels allows Storport to call StartIo concurrently… and
>> with no locks. Therefore the locking of critical sections is now the
>> responsibility of the miniport.
>>
>>
>>
>> Any solution proposed should protect critical resources (cmd ID pool, cmd
>> entries, doorbells) with a lock that is granular enough for our driver but
>> not visible to the OS or Storport. Acquiring the StartIo lock to protect
>> resources used only in our driver is too broad a lock. My suggestion would
>> be to create a pool of locks (one for every SQ/CQ pair) that is maintained
>> within our driver. Again, crashdump mode is a non-issue because it is
>> single-threaded.
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>> *From:* Thomas Freeman [mailto:thomas.freeman at hgst.com]
>> *Sent:* Thursday, June 16, 2016 1:51 PM
>> *To:* Iuliu Rus <iuliur at google.com>
>> *Cc:* Robles, Raymond C <raymond.c.robles at intel.com>; nvmewin <
>> nvmewin at lists.openfabrics.org>
>> *Subject:* RE: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Hi Iuliu,
>>
>> Here's my view; I hope someone else can chime in to verify or correct what
>> I'm saying.
>>
>>
>>
>> The purpose of the concurrent channels change is to prevent Storport from
>> acquiring the StartIo lock before calling StartIo. This allows multiple
>> simultaneous calls to StartIo. If you require that ProcessIo take the
>> StartIo lock, you are replicating what Storport used to do - it defeats the
>> point of concurrent channels.
>>
>>
>>
>> Here is my suggestion for what needs to be changed
>>
>> 1. There is no need for StartIo Locks - remove the requests to acquire
>> them
>>
>> 2. Acquire the Dpc locks in the following situations
>>
>>    -ProcessIo, before calling GetCmdEntry - if there is queue sharing
>>
>>    -ProcessIo, before calling IssueCmd - always
>>
>>    -IoCompletionRoutine, before calling NVMeCompleteCmd - if there is
>> queue sharing
>>
>>
>>
>>    -recoveryDpcRoutine - get all of the Dpc locks
>>
>>
>>
>>
>>
>> Here is my reasoning
>>
>> -GetCmdEntry and NVMeCompleteCmd both access the FreeQList. With queue
>> sharing, these could run simultaneously and have a collision when they
>> update the list.
>>
>> -IssueCmd. With queue sharing, different CPUs could attempt to put
>> entries on the queue simultaneously. But to make RecoveryDpcRoutine work,
>> this must be protected by the dpcLock whether or not there is queue
>> sharing.
>>
>>
>>
>> I think this will also address the deadlock problem I reported; a sketch of
>> the locking I have in mind is below.
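>>
>> Roughly, in pseudo-C (a sketch only - queueSharing, pQueueDpc, and the
>> elided argument lists are stand-ins, not the driver's exact names):
>>
>>     STOR_LOCK_HANDLE lockHandle;
>>
>>     /* 2a: GetCmdEntry touches the FreeQList, which is shared between
>>        cores only when there is queue sharing */
>>     if (queueSharing)
>>         StorPortAcquireSpinLock(pAE, DpcLock, pQueueDpc, &lockHandle);
>>     status = NVMeGetCmdEntry(pAE, /* ... */);
>>     if (queueSharing)
>>         StorPortReleaseSpinLock(pAE, &lockHandle);
>>
>>     /* 2b: IssueCmd always under the dpc lock, so RecoveryDpcRoutine can
>>        take every dpc lock and know no SQE is being posted concurrently */
>>     StorPortAcquireSpinLock(pAE, DpcLock, pQueueDpc, &lockHandle);
>>     status = NVMeIssueCmd(pAE, /* ... */);
>>     StorPortReleaseSpinLock(pAE, &lockHandle);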
>>
>>
>>
>> Regards,
>>
>> *Tom Freeman*
>>
>> *Software Engineer, Device Manager and Driver Development*
>>
>> Western Digital Corporation
>>
>> e.  Thomas.freeman at hgst.com
>>
>> o.  +1-507-322-2311
>>
>>
>>
>>
>>
>>
>> *From:* Iuliu Rus [mailto:iuliur at google.com <iuliur at google.com>]
>> *Sent:* Thursday, June 16, 2016 2:42 PM
>> *To:* Thomas Freeman <thomas.freeman at hgst.com>
>> *Cc:* Robles, Raymond C <raymond.c.robles at intel.com>; nvmewin <
>> nvmewin at lists.openfabrics.org>
>> *Subject:* Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Here's my revised patch for concurrent channels. The synchronization
>> rules are now as follows:
>>
>> 1. Init is single-threaded, so it doesn't matter if AcquireLock is true or
>> false.
>>
>> 2. Anything called from the DPC has to call ProcessIo with false, since
>> the dpc spinlock is already acquired.
>>
>> 3. Anything called from StartIo has to call ProcessIo with true, since the
>> StartIo lock is no longer automatically acquired (see the sketch below).
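>>
>> Concretely, the call sites look roughly like this (a sketch - the exact
>> ProcessIo argument list in the patch may differ slightly):
>>
>>     /* From NVMeStartIo: Storport no longer serializes StartIo for us,
>>        so ProcessIo must take the lock itself. */
>>     ProcessIo(pAdapterExtension, pSrbExtension, NVME_QUEUE_TYPE_IO, TRUE);
>>
>>     /* From a DPC routine: the dpc spinlock is already held, so passing
>>        TRUE here would try to acquire it a second time. */
>>     ProcessIo(pAdapterExtension, pSrbExtension, NVME_QUEUE_TYPE_IO, FALSE);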
>>
>>
>>
>> Thomas, I think the bug you found is because ProcessIo was incorrectly
>> called with TRUE instead of FALSE.
>> Let me know your thoughts on this.
>> Let me know what are your thoughts on this.
>>
>>
>>
>> On Thu, Jun 16, 2016 at 10:41 AM, Iuliu Rus <iuliur at google.com> wrote:
>>
>> Guys, am I OK continuing with the patch on the current code base, or do
>> you want me to wait until you pull in some more changes?
>>
>>
>>
>> On Wed, Jun 15, 2016 at 8:37 AM, Thomas Freeman <thomas.freeman at hgst.com>
>> wrote:
>>
>> Ray,
>>
>> It appears that the latest patch file from Iuliu is applied to revision
>> 137. I don’t see Suman’s changes in that revision.
>>
>>
>>
>> As an example, the use of StorPortRequestTimer is not in nvmeStat.c.
>>
>>
>>
>> I thought there were some newer revisions than 137, but 137 is the latest
>> I’m able to download.
>>
>> 137 is also the revision that appears when I go to
>> https://svn.openfabrics.org/svnrepo/nvmewin/
>>
>>
>>
>> SVN is giving me some trouble, so I can't tell for sure if this is my
>> environment or a server issue.
>>
>>
>>
>> *Tom Freeman*
>>
>> *Software Engineer, Device Manager and Driver Development*
>>
>> Western Digital Corporation
>>
>> e.  Thomas.freeman at hgst.com
>>
>> o.  +1-507-322-2311
>>
>>
>>
>>
>>
>>
>> *From:* nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] *On
>> Behalf Of *Robles, Raymond C
>> *Sent:* Friday, June 10, 2016 12:23 PM
>> *To:* Iuliu Rus <iuliur at google.com>; suman.p at samsung.com
>> *Cc:* nvmewin <nvmewin at lists.openfabrics.org>
>>
>>
>> *Subject:* Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Hi Iuliu,
>>
>>
>>
>> If you send out the updated patch with the protection in place for both
>> list entries and doorbells, I can run it on an Intel NVMe SSD (P3700) and
>> see if there is any performance impact (up or down).
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>> *From:* Iuliu Rus [mailto:iuliur at google.com <iuliur at google.com>]
>> *Sent:* Friday, June 10, 2016 10:17 AM
>> *To:* suman.p at samsung.com
>> *Cc:* Robles, Raymond C <raymond.c.robles at intel.com>; Judy Brock <
>> judy.brock at samsung.com>; nvmewin <nvmewin at lists.openfabrics.org>
>> *Subject:* Re: RE: review comments for patch from Google
>>
>>
>>
>> We are running this in a virtual machine, and we do see a perf difference
>> with concurrent channels. We did not observe any performance drop with
>> synchronized access to the list entry.
>>
>>
>>
>> On Fri, Jun 10, 2016 at 7:15 AM, SUMAN PRAKASH B <suman.p at samsung.com>
>> wrote:
>>
>> Hi Iuliu,
>>
>>
>>
>> As you have the implementation in place, can you let us know if there is
>> any performance improvement when STOR_PERF_CONCURRENT_CHANNELS is set in
>> the case where #cores and #queues do not match?
>>
>> Also, whether there is any drop in performance when the list entry and
>> doorbell are protected after setting STOR_PERF_CONCURRENT_CHANNELS.
>>
>>
>>
>> Perhaps a bit off topic:
>>
>> Even in the case when #cores and #queues are the same and
>> STOR_PERF_CONCURRENT_CHANNELS is set, we did not observe any performance
>> gain. Maybe we are already hitting the device's maximum performance even
>> without setting STOR_PERF_CONCURRENT_CHANNELS. Can you please let us know
>> your observation?
>>
>>
>>
>> Thanks,
>>
>> Suman
>>
>>
>>
>> ------- *Original Message* -------
>>
>> *Sender* : Robles, Raymond C<raymond.c.robles at intel.com>
>>
>> *Date* : Jun 10, 2016 08:30 (GMT+05:30)
>>
>> *Title* : RE: review comments for patch from Google
>>
>>
>>
>> Hi Judy,
>>
>>
>>
>> Thanks for the additional info. I agree with the assessment. My statement
>> was directed at the general direction of the solution… and the fact that I
>> think option 1 is not as robust as option 2.
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>> *From:* Judy Brock [mailto:judy.brock at samsung.com]
>> *Sent:* Thursday, June 09, 2016 5:14 PM
>> *To:* Robles, Raymond C <raymond.c.robles at intel.com>; Iuliu Rus <
>> iuliur at google.com>; suman.p at samsung.com
>> *Cc:* nvmewin at lists.openfabrics.org
>> *Subject:* RE: review comments for patch from Google
>>
>>
>>
>> Hi Ray et al,
>>
>> [Suman wrote] >> Protect the listentry and doorbell before access/update
>>              >> Do not set STOR_PERF_CONCURRENT_CHANNELS when number of
>> cores and queues are not same.
>>
>>
>>
>> [Ray wrote] >> In the case Suman points out, concurrent channels should
>> still be enabled even if the number of cores and queues do not match. This
>> is especially true for large core count systems (>128 logical CPU cores)
>> where devices may only support 32 queues or less.
>>
>>
>>
>> I think Suman was saying we have two options:
>>
>> 1.       either don’t enable it at all if # cores != #queues or
>>
>> 2.       Protect both the list entries and doorbell writes before any
>> accesses/updates if #cores != #queues.
>>
>>
>>
>> So if we want to enable concurrent channels no matter what, we will need to
>> check in the performance path whether #cores and #queues match, and use
>> locks only when they do not. That's probably better than option 1 above if
>> this is a configuration that can come up frequently.
>>
>>
>>
>> Thanks,
>>
>> Judy
>>
>>
>>
>>
>>
>> *From:* nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org
>> <nvmewin-bounces at lists.openfabrics.org>] *On Behalf Of *Robles, Raymond C
>> *Sent:* Thursday, June 09, 2016 2:32 PM
>> *To:* Iuliu Rus; suman.p at samsung.com
>> *Cc:* nvmewin at lists.openfabrics.org
>> *Subject:* Re: [nvmewin] review comments for patch from Google
>>
>>
>>
>> Hi Iuliu/Suman,
>>
>>
>>
>> I'm of the opinion that any solution provided should be as robust as
>> possible. In the case Suman points out, concurrent channels should still be
>> enabled even if the number of cores and queues do not match. This is
>> especially true for large core count systems (>128 logical CPU cores) where
>> devices may only support 32 queues or less.
>>
>>
>>
>> Thank you…
>>
>> Ray
>>
>>
>>
>> *From:* Iuliu Rus [mailto:iuliur at google.com]
>> *Sent:* Thursday, June 09, 2016 2:09 PM
>> *To:* suman.p at samsung.com
>> *Cc:* Robles, Raymond C <raymond.c.robles at intel.com>;
>> nvmewin at lists.openfabrics.org
>> *Subject:* Re: review comments for patch from Google
>>
>>
>>
>> Which one do you prefer? Our Google driver already has the synchronization
>> in place for a different reason, or I can follow your second suggestion.
>>
>>
>>
>> On Thu, Jun 9, 2016 at 7:39 AM, SUMAN PRAKASH B <suman.p at samsung.com>
>> wrote:
>>
>> Hi Iuliu,
>>
>>
>>
>> Please find my review comments below:
>>
>>
>>
>> 1. If the number of cores and the number of queues supported by the device
>> are not the same, then setting STOR_PERF_CONCURRENT_CHANNELS will corrupt
>> the submission queue. For example, if the device has 4 queues and is
>> connected to a target PC with 8 logical processors, and
>> STOR_PERF_CONCURRENT_CHANNELS is set, all 8 logical processors will get
>> StartIo calls concurrently. As per the OFA driver's NUMA design, IO queue 1
>> will be shared by core 0 and core 4. There is a possibility that core 0 and
>> core 4 will get StartIo() calls concurrently and will access/update the
>> listentry and the queue 1 doorbell register at the same time, which will
>> lead to listentry and doorbell register corruption (see the sketch below).
>> This problem can be addressed in the following ways:
>> a. Protect the listentry and doorbell before access/update.
>> b. Do not set STOR_PERF_CONCURRENT_CHANNELS when the number of cores and
>> queues are not the same.
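>>
>> To illustrate the sharing (a sketch only - the OFA driver's actual
>> core-to-queue mapping is NUMA-based, so the modulo arithmetic below is an
>> approximation):
>>
>>     #define NUM_IO_QUEUES 4   /* queues supported by the device */
>>     #define NUM_CORES     8   /* logical processors on the target PC */
>>
>>     /* Queue 0 is the admin queue, so IO queues are 1..NUM_IO_QUEUES */
>>     USHORT QueueForCore(USHORT core) {
>>         return (core % NUM_IO_QUEUES) + 1;
>>     }
>>
>>     /* QueueForCore(0) == QueueForCore(4) == 1: if StartIo runs on cores
>>        0 and 4 at the same time, both will update IO queue 1's list entry
>>        and write its tail doorbell - hence option (a) or (b) above. */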
>>
>>
>>
>> 2. We can use pRMT->NumActiveCores, which is updated in
>> NVMeEnumNumaCores(), instead of using
>> KeQueryActiveProcessorCountEx/KeQueryActiveProcessorCount.
>>
>>
>>
>> Please let us know your opinion.
>>
>>
>>
>> Thanks,
>> Suman
>>
>>
>>
>> ------- *Original Message* -------
>>
>> *Sender* : Robles, Raymond C<raymond.c.robles at intel.com>
>>
>> *Date* : Jun 09, 2016 03:56 (GMT+05:30)
>>
>> *Title* : Re: [nvmewin] Happy New Year... and status update
>>
>>
>>
>> Thank you Iuliu!
>>
>>
>>
>> Reviewing companies, please try to target the review completion for 2
>> weeks from today (6/22/16).
>>
>>
>>
>> ...
>>
>> [Message clipped]
>>
>>
>>
>
>

