[nvmewin] Compatibility issue with 1.5 version nvme driver

Tom Freeman thomas.freeman at wdc.com
Thu Aug 10 05:58:12 PDT 2017


Hi Meenakshikumar,
Thanks for the response.
Regarding the comment about issuing StartIo from another DPC – I agree with your assessment. I’m not sure of Ray’s motivation for that comment.
Ray should be back next week. Let’s wait to get an explanation from him.


Tom Freeman
Software Engineer, Device Manager and Driver Development

Western Digital®
Email:  Thomas.Freeman at wdc.com
Office:  +1-507-322-2311


From: Meenakshikumar Somasundaram [mailto:sm.kumar at samsung.com]
Sent: Wednesday, August 9, 2017 9:29 AM
To: Tom Freeman <thomas.freeman at wdc.com>; Judy Brock <judy.brock at samsung.com>; Robles, Raymond C <raymond.c.robles at intel.com>; Justina Lai <justina_lai at phison.com>
Cc: nvmewin at lists.openfabrics.org
Subject: RE: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver


Dear Tom,



Kindly check my comments in brown color and share your opinion.



Regards,

Meenakshikumar



--------- Original Message ---------

Sender : Tom Freeman <thomas.freeman at wdc.com<mailto:thomas.freeman at wdc.com>>

Date : 2017-08-07 22:14 (GMT+5:30)

Title : Re: [nvmewin] Compatibility issue with 1.5 version nvme driver


Hi Judy and others,
In Ray's absense, I'm trying to pick up this thread again. I'd like to verify we agree on what is required to complete this patch.

1. As Judy stated before, the following is missing from Justina's original patch to nvmeIo.c on 5/25 - (cores == queues), all command completion handlers issuing a new command should require ProcessIO to get the StartIO lock.

This affects calls to ProcessIo from these methods - SntiTranslatePersReserveOutResponse, SntiTranslateTemperatureResponse, SntiTranslateModeSenseResponse, SntiTranslateStartStopUnitResponse, SntiTranslateWriteBufferResponse, NVMeHandleSmartThresholds, NVMeCompletionNsAttachment, NVMeCompletionNsAttachment, FormatNVMGetIdentify.

Yes, we agree. We need to this.

2. Based on Ray's response on 6/27 "I’m ok with acquiring the StartIo lock as long as it’s in a new DPC spawned by the IOCompletionRoutine", instead of calling ProcessIo directly, IoCompletionRoutine should issue a dpc that will take the startIo flag and invoke processIo.

As the IoCompletionRoutine is already called in DPC context, should we need to queue another DPC again to issue secondary admin command?

Do you agree? Is there anything else that should be included?

Tom Freeman
Software Engineer, Device Manager and Driver Development

Western Digital®
Email:  Thomas.Freeman at wdc.com<mailto:Thomas.Freeman at wdc.com>
Office:  +1-507-322-2311


From: Judy Brock [mailto:judy.brock at samsung.com]
Sent: Friday, June 30, 2017 12:37 PM
To: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>; Tom Freeman <thomas.freeman at wdc.com<mailto:thomas.freeman at wdc.com>>; sm.kumar at samsung.com<mailto:sm.kumar at samsung.com>; Justina Lai <justina_lai at phison.com<mailto:justina_lai at phison.com>>
Cc: Larry Li <larry_li at phison.com<mailto:larry_li at phison.com>>; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver

Yes, we agree. That is why we included the following statement below:

“The various translation completion command handlers should all be consistent and SHOULD take the StartIO lock in ProcessIO if trying to issue a new command”

Right now, they are not all consistent.

Thanks,
Judy

From: nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Robles, Raymond C
Sent: Friday, June 30, 2017 10:31 AM
To: Judy Brock; Tom Freeman; sm.kumar at samsung.com<mailto:sm.kumar at samsung.com>; Justina Lai
Cc: Larry Li; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver

Hi Judy,

Thank you for your feedback. I think we are all in agreement on Phison’s patch.

My only concern is the original patch did not address any changes in nvmeSnti.c where the completion path needed to invoke the StartIo lock appropriately. Did you observe the same?

Thanks…
Ray

From: Judy Brock [mailto:judy.brock at samsung.com]
Sent: Friday, June 30, 2017 6:16 AM
To: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>; Tom Freeman <thomas.freeman at wdc.com<mailto:thomas.freeman at wdc.com>>; sm.kumar at samsung.com<mailto:sm.kumar at samsung.com>; Justina Lai <justina_lai at phison.com<mailto:justina_lai at phison.com>>
Cc: Larry Li <larry_li at phison.com<mailto:larry_li at phison.com>>; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver

Hi Ray et al,

We just want to clarify our position/feedback after all the recent back and forth.

In the MultipleCoresToSingleQueueFlag == TRUE completion path case:

•        we understand and agree with acquiring the StartIO lock in the IOCompletion Routine. We do not support acquiring a DPC lock instead.

•        we agree the code SHOULD NOT try to take the StartIO lock in ProcessIO  if trying to issue a new secondary command (this caused the deadlock Justina reported)

•        we are ok with Justina’s original fix with respect to this issue

In the MultipleCoresToSingleQueueFlag == FALSE completion path case (i.e. cores == queues):

•        The IOCompletion routine should take a DPC lock (as it does today)

•       The various translation completion command handlers should all be consistent and SHOULD take the StartIO lock in ProcessIO if trying to issue a new command

Thanks,
Judy

From: nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Robles, Raymond C
Sent: Tuesday, June 27, 2017 4:29 PM
To: Tom Freeman; sm.kumar at samsung.com<mailto:sm.kumar at samsung.com>; Justina Lai
Cc: Larry Li; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver

Hi Tom,

Thanks for bearing with me on the details of this. After looking more closely at the code, I agree with your assessment in the case where a DPC is issued for a secondary, internal command (we do need to acquire the StartIo lock).

I’ll wait for Phison to catch up on this thread and provide the update on their unit testing.

Thanks…
Ray

From: Tom Freeman [mailto:thomas.freeman at wdc.com]
Sent: Tuesday, June 27, 2017 11:33 AM
To: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>; sm.kumar at samsung.com<mailto:sm.kumar at samsung.com>; Justina Lai <justina_lai at phison.com<mailto:justina_lai at phison.com>>
Cc: Larry Li <larry_li at phison.com<mailto:larry_li at phison.com>>; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver

Thanks Ray,
I'm still confused - and I think it relates to a basic question I have about dpc routing. If I understand it correctly, IoCompletionRoutine is queued to run on the core matching its submission/completion queue. But, that instance of IoCompletionRoutine could simultaneously run on other cores. That's why we get the DpcLock, to synchronize multiple threads running the same instance of the dpc, IoCompletionRoutine.

If that's correct, I think the following example shows why we need to get the StartIo lock in the completion path when it is issuing another NVMe command.

The code path StartIo->...->ProcessIo issues i/o to submission queue 3 (on core 2). StartIo lock is automatically taken.
Simultaneously, IoCompletionRoutine->....->ProcessIo() for comp queue 3 could be executing on a core other than core 2. If so, unless IoCompletionRoutine->....->ProcessIo() gets the StartIo lock, it will not be synchronized with StartIo->...->Process() on core 2.

Let me know your thoughts,
Tom Freeman
Software Engineer, Device Manager and Driver Development

Western Digital®
Email:  Thomas.Freeman at wdc.com<mailto:Thomas.Freeman at wdc.com>
Office:  +1-507-322-2311


From: Robles, Raymond C [mailto:raymond.c.robles at intel.com]
Sent: Monday, June 26, 2017 7:46 PM
To: Tom Freeman <thomas.freeman at wdc.com<mailto:thomas.freeman at wdc.com>>; sm.kumar at samsung.com<mailto:sm.kumar at samsung.com>; Justina Lai <justina_lai at phison.com<mailto:justina_lai at phison.com>>
Cc: Larry Li <larry_li at phison.com<mailto:larry_li at phison.com>>; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver

Hi Tom,

Your statement is correct. I’m probably not being clear… sorry for the confusion.

My assertion is that when cpu == cores, the driver should not be grabbing the StartIo lock as a practice. The StartIo lock is meant for Storport to aquire for call to the HwStartIo callback. If we are in the completion DPC, then that means we have already acquired the DPC lock, and we should not be acquiring any other lock as part of that DPC calling back into ProcessIo for a second command. That is the deadlock to which I’m referring (and the issue that Meenakshikumar from Samsung references).

In my opinion, this issue should be resolved as part of this patch.

Thoughts?

Thanks…
Ray

From: Tom Freeman [mailto:thomas.freeman at wdc.com]
Sent: Monday, June 26, 2017 1:31 PM
To: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>; sm.kumar at samsung.com<mailto:sm.kumar at samsung.com>; Justina Lai <justina_lai at phison.com<mailto:justina_lai at phison.com>>
Cc: Larry Li <larry_li at phison.com<mailto:larry_li at phison.com>>; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver

Hi Ray,
I missing something in the deadlock discussion. Since IoCompletionRoutine and StartIo run at the same IRQL, wouldn’t one run to completion before the other is scheduled thus preventing the deadlock?

Thanks,
Tom Freeman
Software Engineer, Device Manager and Driver Development

Western Digital®
Email:  Thomas.Freeman at wdc.com<mailto:Thomas.Freeman at wdc.com>
Office:  +1-507-322-2311


From: nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Robles, Raymond C
Sent: Friday, June 23, 2017 2:17 PM
To: sm.kumar at samsung.com<mailto:sm.kumar at samsung.com>; Justina Lai <justina_lai at phison.com<mailto:justina_lai at phison.com>>
Cc: Larry Li <larry_li at phison.com<mailto:larry_li at phison.com>>; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver

Hi Meenakshikumar,

Thanks for reviewing this patch and my comments. I agree with your statement below (highlighted).

I was attempting to drive the possible solution where cores == queues… when admin commands can come from both Storport and the completion path. As you state, there are still several areas where a the lock is acquired for commands in the completion path. In order to address, I believe the correct solution is to actually not grab the StartIO lock in the completion path, but instead a DPC lock. Both run at the same IRQL, and will be handled appropriately by the OS with no deadlock.

Essentially:


•         The OFA driver is acquiring StartIO Lock from processIo & IOCompletionDpcRoutine. So there is possibility for deadlock when we have an OS command requires more than one NVMe command.

•         I’m proposing we acquire the StartIO Lock from processIo and a DPC Lock from IOCompletionDpcRoutine. So there is no possibility of same lock acquired twice in single call stack.

•         I would also suggest removing any lingering areas where we issue a new DPC from IOCompletionDpcRoutine to call processIo… so that DPC waiting time for lock is reduced.

Thanks…
Ray

From: MEENAKSHIKUMAR SOMASUNDARAM [mailto:sm.kumar at samsung.com]
Sent: Friday, June 23, 2017 12:31 AM
To: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>; Justina Lai <justina_lai at phison.com<mailto:justina_lai at phison.com>>
Cc: Larry Li <larry_li at phison.com<mailto:larry_li at phison.com>>; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver


Hi Ray,



This is Meenakshikumar from Samsung. I have reviewed the fix and it looks fine to handle pAE->MultipleCoresToSingleQueueFlag = TRUE case while a new admin command is issued in completion path.



For your comment, I am sharing my thoughts below :

•        When multiple cpu cores are mapped with single queue, the patch is removing startIoLock from processIO and startIoLock used only in IOCompletionDpcRoutine:

•        This will block NvmeStartIo when driver is processing a completion entry.



==>  In OFA driver, Storport will acquire the StartIoLock before calling NVMeStartIo. We acquire StartIo lock only in the completion path in case if cores != queues, to synchronize multiple cores trying to access single queue.

So in OFA driver, only if cores == queues, then both StartIO and DPCs will run concurrently.



Apart from this, in pAE->MultipleCoresToSingleQueueFlag = FALSE case, the intention to synchronize issuing admin command from StartIO & various completion paths is not handled in similar fashion in all cases.

For example, in SntiTranslateModeSenseResponse(), FormatNVMGetIdentify(), etc., lock is acquired. But in SntiTranslateTemperatureResponse(), SntiTranslateStartStopUnitResponse(), SntiTranslateWriteBufferResponse() etc., StartIO lock is not acquired.

There could be race condition b/w an IOCTL and Completion path trying to issue admin command. This should be addressed in the OFA driver, might be in a different patch.



Thanks,

Meenakshikumar



--------- Original Message ---------

Sender : Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>

Date : 2017-06-23 04:50 (GMT+5:30)

Title : Re: [nvmewin] Compatibility issue with 1.5 version nvme driver


Hi Justina,

Intel has reviewed your and tested your patch. Please see the following comments:


•        Need to remove code blocks in #if 0

•        When multiple cpu cores are mapped with single queue, the patch is removing startIoLock from processIO and startIoLock used only in IOCompletionDpcRoutine:

•        This will block NvmeStartIo when driver is processing a completion entry.

•        A couple of options:

o   Indicate in nvmeSnti.c when handling operations that require more than one command to the drive, and have that locking needs passed to ProcessIo appropriately. It does not seem plausible to isolate the changes to nvmeIO.c only.

o   You could also instead remove startIoLock from IoCompletionDpcRoutine while keeping startIoLock at processIo. This way we can avoid double lock and also we can make NvmeStartIo and IoCompletionDpcRoutine parallel.

The key takeaway is that StartIo and IoCompletionDpcRoutine must be able to run concurrently with no deadlocks. As you’ve coded the patch, a deadlock is possible. Please revise your patch to account for this scenario.

Thanks…
Ray

From: nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Robles, Raymond C
Sent: Thursday, June 22, 2017 3:55 PM
To: Justina Lai <justina_lai at phison.com<mailto:justina_lai at phison.com>>
Cc: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver

Hi Justina,

Please go the Open Fabrics Alliance website and subscribe to our reflector on this page:  http://lists.openfabrics.org/mailman/listinfo/nvmewin

The main NVMe WG page can be found here: https://www.openfabrics.org/index.php/working-groups.html

Thanks…
Ray

From: Justina Lai [mailto:justina_lai at phison.com]
Sent: Monday, June 19, 2017 11:45 PM
To: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>
Cc: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: Compatibility issue with 1.5 version nvme driver

Dear Raymond,

Yes, I have received your mail regarding unit test.
We are now carrying on the test and hope to finish it within this week.

May I know how to officially subscribe to the reflector email list?
I saw my mail is already in the Non-digested Members of nvmewin. Should I set as digested mode?

Thank you!

Best Regards,
Justina Lai
#5707

From: Robles, Raymond C [mailto:raymond.c.robles at intel.com]
Sent: Tuesday, June 20, 2017 5:37 AM
To: Justina Lai <justina_lai at phison.com<mailto:justina_lai at phison.com>>
Cc: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: Compatibility issue with 1.5 version nvme driver

Hi Justina,

Yes, we did receive the email and have been emailing you. I notice that your emails keep getting bounced off the reflector list and I have to approve them. Could you please officially subscribe to the reflector email list so that your emails are not bounced (which require me to approve and forward).

There was an additional request for you to run the normal unit tests required for all OFA patches. I’ll forward that email again after you officially subscribe the email list. Did you receive that email?

Thanks…
Ray

From: nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Justina Lai
Sent: Wednesday, May 24, 2017 10:18 PM
To: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver

Hi,

Have you received my mail on 5/22?
Thanks!

Best Regards,
Justina Lai
#5707

From: Justina Lai
Sent: Monday, May 22, 2017 1:14 PM
To: 'nvmewin at lists.openfabrics.org' <nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>>
Cc: Larry Li <larry_li at phison.com<mailto:larry_li at phison.com>>
Subject: Compatibility issue with 1.5 version nvme driver

Dear Sir/Madam,

We are facing compatibility issue on our PS5007 NVMe device with recent 1.5 version nvme driver.
Please check below issue description.

Phison PS5007 supports 7 IO queues, and if we use PS5007 with OFA driver on the platform core number>7, ex: 8-core or 12-core PC, we will see PC hang up and cannot enter OS like below:
[cid:image001.jpg at 01D2EE8D.0DB3EAE0]

After debugging on our side, we found the fail is caused by below flow:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
……….
1.     SntiTranslateModeSense()

……….
2.IoCompletionRoutine()
    if (pDpc != NULL) {
        ASSERT(pAE->ntldrDump == FALSE);
        if (pAE->MultipleCoresToSingleQueueFlag) {
            StorPortAcquireSpinLock(pAE, StartIoLock, NULL, &StartLockHandle); ----------------------> execute lock
        } else {
            StorPortAcquireSpinLock(pAE, DpcLock, pDpc, &DpcLockhandle);
        }
    }
………
callStorportNotification = pSrbExtension->pNvmeCompletionRoutine(pAE, (PVOID)pSrbExtension)
                            && (pSrbExtension->pSrb != NULL);
…….
3.SntiTranslateModeSenseResponse()

…….
case MODE_SENSE_RETURN_ALL:
…….
if (supportsVwc == TRUE) {

                                                          pSrbExt->pNvmeCompletionRoutine = SntiCompletionCallbackRoutine;

                                                          /* Finally, make sure we issue the GET FEATURES command */
                                                          SntiBuildGetFeaturesCmd(pSrbExt, VOLATILE_WRITE_CACHE);

                                                          ioStarted = ProcessIo(pSrbExt->pNvmeDevExt,
                                                                   pSrbExt,
                                                                   NVME_QUEUE_TYPE_ADMIN,
                                                                   TRUE);


4.ProcessIo()
        ……..
        if (AcquireLock == TRUE) {
            StorPortAcquireSpinLock(pAdapterExtension,
                                 StartIoLock,
                                 NULL,
                                 &hStartIoLock);  ------------------------> double execute lock and cause (Assertion failure - code c0000420)


        }
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Since many users are now using our PS5007 device with 8-core or 12-core PC, they are facing this issue right now. Could you please help to modify driver to solve this problem asap?

Any unclear point, please kindly let us know.
Thank you very much for the help!

Best Regards,
Justina Lai
#5707



This message and any attachments are confidential and may be legally privileged. Any unauthorized review, use or distribution by anyone other than the intended recipient is strictly prohibited. If you are not the intended recipient, please immediately notify the sender, completely delete the message and any attachments, and destroy all copies. Your cooperation will be highly appreciated.


This message and any attachments are confidential and may be legally privileged. Any unauthorized review, use or distribution by anyone other than the intended recipient is strictly prohibited. If you are not the intended recipient, please immediately notify the sender, completely delete the message and any attachments, and destroy all copies. Your cooperation will be highly appreciated.

_______________________________________________

nvmewin mailing list

nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>

http://lists.openfabrics.org/mailman/listinfo/nvmewin
Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:
This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.
Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:
This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.

_______________________________________________

nvmewin mailing list

nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>

http://lists.openfabrics.org/mailman/listinfo/nvmewin

[cid:image002.png at 01D311AE.67AB7D30]

[http://ext.samsung.net/mail/ext/v1/external/status/update?userid=sm.kumar&do=bWFpbElEPTIwMTcwODA5MTQyODQyZXBjbXM1cDNkZGNjODkxZjU5YmI1YjIyMTM4NzY5OWFkMmMxZmI0NSZyZWNpcGllbnRBZGRyZXNzPXRob21hcy5mcmVlbWFuQHdkYy5jb20_]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20170810/35898221/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 11633 bytes
Desc: image001.jpg
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20170810/35898221/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 33527 bytes
Desc: image002.png
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20170810/35898221/attachment.png>


More information about the nvmewin mailing list