[nvmewin] Compatibility issue with 1.5 version nvme driver
Tom Freeman
thomas.freeman at wdc.com
Mon Jun 26 13:31:26 PDT 2017
Hi Ray,
I missing something in the deadlock discussion. Since IoCompletionRoutine and StartIo run at the same IRQL, wouldn’t one run to completion before the other is scheduled thus preventing the deadlock?
Thanks,
Tom Freeman
Software Engineer, Device Manager and Driver Development
Western Digital®
Email: Thomas.Freeman at wdc.com
Office: +1-507-322-2311
From: nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Robles, Raymond C
Sent: Friday, June 23, 2017 2:17 PM
To: sm.kumar at samsung.com; Justina Lai <justina_lai at phison.com>
Cc: Larry Li <larry_li at phison.com>; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver
Hi Meenakshikumar,
Thanks for reviewing this patch and my comments. I agree with your statement below (highlighted).
I was attempting to drive the possible solution where cores == queues… when admin commands can come from both Storport and the completion path. As you state, there are still several areas where a the lock is acquired for commands in the completion path. In order to address, I believe the correct solution is to actually not grab the StartIO lock in the completion path, but instead a DPC lock. Both run at the same IRQL, and will be handled appropriately by the OS with no deadlock.
Essentially:
· The OFA driver is acquiring StartIO Lock from processIo & IOCompletionDpcRoutine. So there is possibility for deadlock when we have an OS command requires more than one NVMe command.
· I’m proposing we acquire the StartIO Lock from processIo and a DPC Lock from IOCompletionDpcRoutine. So there is no possibility of same lock acquired twice in single call stack.
· I would also suggest removing any lingering areas where we issue a new DPC from IOCompletionDpcRoutine to call processIo… so that DPC waiting time for lock is reduced.
Thanks…
Ray
From: MEENAKSHIKUMAR SOMASUNDARAM [mailto:sm.kumar at samsung.com]
Sent: Friday, June 23, 2017 12:31 AM
To: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>; Justina Lai <justina_lai at phison.com<mailto:justina_lai at phison.com>>
Cc: Larry Li <larry_li at phison.com<mailto:larry_li at phison.com>>; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver
Hi Ray,
This is Meenakshikumar from Samsung. I have reviewed the fix and it looks fine to handle pAE->MultipleCoresToSingleQueueFlag = TRUE case while a new admin command is issued in completion path.
For your comment, I am sharing my thoughts below :
• When multiple cpu cores are mapped with single queue, the patch is removing startIoLock from processIO and startIoLock used only in IOCompletionDpcRoutine:
• This will block NvmeStartIo when driver is processing a completion entry.
==> In OFA driver, Storport will acquire the StartIoLock before calling NVMeStartIo. We acquire StartIo lock only in the completion path in case if cores != queues, to synchronize multiple cores trying to access single queue.
So in OFA driver, only if cores == queues, then both StartIO and DPCs will run concurrently.
Apart from this, in pAE->MultipleCoresToSingleQueueFlag = FALSE case, the intention to synchronize issuing admin command from StartIO & various completion paths is not handled in similar fashion in all cases.
For example, in SntiTranslateModeSenseResponse(), FormatNVMGetIdentify(), etc., lock is acquired. But in SntiTranslateTemperatureResponse(), SntiTranslateStartStopUnitResponse(), SntiTranslateWriteBufferResponse() etc., StartIO lock is not acquired.
There could be race condition b/w an IOCTL and Completion path trying to issue admin command. This should be addressed in the OFA driver, might be in a different patch.
Thanks,
Meenakshikumar
--------- Original Message ---------
Sender : Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>
Date : 2017-06-23 04:50 (GMT+5:30)
Title : Re: [nvmewin] Compatibility issue with 1.5 version nvme driver
Hi Justina,
Intel has reviewed your and tested your patch. Please see the following comments:
• Need to remove code blocks in #if 0
• When multiple cpu cores are mapped with single queue, the patch is removing startIoLock from processIO and startIoLock used only in IOCompletionDpcRoutine:
• This will block NvmeStartIo when driver is processing a completion entry.
• A couple of options:
o Indicate in nvmeSnti.c when handling operations that require more than one command to the drive, and have that locking needs passed to ProcessIo appropriately. It does not seem plausible to isolate the changes to nvmeIO.c only.
o You could also instead remove startIoLock from IoCompletionDpcRoutine while keeping startIoLock at processIo. This way we can avoid double lock and also we can make NvmeStartIo and IoCompletionDpcRoutine parallel.
The key takeaway is that StartIo and IoCompletionDpcRoutine must be able to run concurrently with no deadlocks. As you’ve coded the patch, a deadlock is possible. Please revise your patch to account for this scenario.
Thanks…
Ray
From: nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Robles, Raymond C
Sent: Thursday, June 22, 2017 3:55 PM
To: Justina Lai <justina_lai at phison.com<mailto:justina_lai at phison.com>>
Cc: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver
Hi Justina,
Please go the Open Fabrics Alliance website and subscribe to our reflector on this page: http://lists.openfabrics.org/mailman/listinfo/nvmewin
The main NVMe WG page can be found here: https://www.openfabrics.org/index.php/working-groups.html
Thanks…
Ray
From: Justina Lai [mailto:justina_lai at phison.com]
Sent: Monday, June 19, 2017 11:45 PM
To: Robles, Raymond C <raymond.c.robles at intel.com<mailto:raymond.c.robles at intel.com>>
Cc: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: Compatibility issue with 1.5 version nvme driver
Dear Raymond,
Yes, I have received your mail regarding unit test.
We are now carrying on the test and hope to finish it within this week.
May I know how to officially subscribe to the reflector email list?
I saw my mail is already in the Non-digested Members of nvmewin. Should I set as digested mode?
Thank you!
Best Regards,
Justina Lai
#5707
From: Robles, Raymond C [mailto:raymond.c.robles at intel.com]
Sent: Tuesday, June 20, 2017 5:37 AM
To: Justina Lai <justina_lai at phison.com<mailto:justina_lai at phison.com>>
Cc: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: Compatibility issue with 1.5 version nvme driver
Hi Justina,
Yes, we did receive the email and have been emailing you. I notice that your emails keep getting bounced off the reflector list and I have to approve them. Could you please officially subscribe to the reflector email list so that your emails are not bounced (which require me to approve and forward).
There was an additional request for you to run the normal unit tests required for all OFA patches. I’ll forward that email again after you officially subscribe the email list. Did you receive that email?
Thanks…
Ray
From: nvmewin [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Justina Lai
Sent: Wednesday, May 24, 2017 10:18 PM
To: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] Compatibility issue with 1.5 version nvme driver
Hi,
Have you received my mail on 5/22?
Thanks!
Best Regards,
Justina Lai
#5707
From: Justina Lai
Sent: Monday, May 22, 2017 1:14 PM
To: 'nvmewin at lists.openfabrics.org' <nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>>
Cc: Larry Li <larry_li at phison.com<mailto:larry_li at phison.com>>
Subject: Compatibility issue with 1.5 version nvme driver
Dear Sir/Madam,
We are facing compatibility issue on our PS5007 NVMe device with recent 1.5 version nvme driver.
Please check below issue description.
Phison PS5007 supports 7 IO queues, and if we use PS5007 with OFA driver on the platform core number>7, ex: 8-core or 12-core PC, we will see PC hang up and cannot enter OS like below:
[cid:image001.jpg at 01D2EE8D.0DB3EAE0]
After debugging on our side, we found the fail is caused by below flow:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
……….
1. SntiTranslateModeSense()
……….
2.IoCompletionRoutine()
if (pDpc != NULL) {
ASSERT(pAE->ntldrDump == FALSE);
if (pAE->MultipleCoresToSingleQueueFlag) {
StorPortAcquireSpinLock(pAE, StartIoLock, NULL, &StartLockHandle); ----------------------> execute lock
} else {
StorPortAcquireSpinLock(pAE, DpcLock, pDpc, &DpcLockhandle);
}
}
………
callStorportNotification = pSrbExtension->pNvmeCompletionRoutine(pAE, (PVOID)pSrbExtension)
&& (pSrbExtension->pSrb != NULL);
…….
3.SntiTranslateModeSenseResponse()
…….
case MODE_SENSE_RETURN_ALL:
…….
if (supportsVwc == TRUE) {
pSrbExt->pNvmeCompletionRoutine = SntiCompletionCallbackRoutine;
/* Finally, make sure we issue the GET FEATURES command */
SntiBuildGetFeaturesCmd(pSrbExt, VOLATILE_WRITE_CACHE);
ioStarted = ProcessIo(pSrbExt->pNvmeDevExt,
pSrbExt,
NVME_QUEUE_TYPE_ADMIN,
TRUE);
4.ProcessIo()
……..
if (AcquireLock == TRUE) {
StorPortAcquireSpinLock(pAdapterExtension,
StartIoLock,
NULL,
&hStartIoLock); ------------------------> double execute lock and cause (Assertion failure - code c0000420)
}
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Since many users are now using our PS5007 device with 8-core or 12-core PC, they are facing this issue right now. Could you please help to modify driver to solve this problem asap?
Any unclear point, please kindly let us know.
Thank you very much for the help!
Best Regards,
Justina Lai
#5707
This message and any attachments are confidential and may be legally privileged. Any unauthorized review, use or distribution by anyone other than the intended recipient is strictly prohibited. If you are not the intended recipient, please immediately notify the sender, completely delete the message and any attachments, and destroy all copies. Your cooperation will be highly appreciated.
This message and any attachments are confidential and may be legally privileged. Any unauthorized review, use or distribution by anyone other than the intended recipient is strictly prohibited. If you are not the intended recipient, please immediately notify the sender, completely delete the message and any attachments, and destroy all copies. Your cooperation will be highly appreciated.
_______________________________________________
nvmewin mailing list
nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
http://lists.openfabrics.org/mailman/listinfo/nvmewin
[cid:image002.png at 01D2EE8D.0DB3EAE0]
[http://ext.samsung.net/mail/ext/v1/external/status/update?userid=sm.kumar&do=bWFpbElEPTIwMTcwNjIzMDczMTI4ZXBjbXM1cDY4MTZmNDk4MDA2ODllYzdlY2VlMGI0YjdlNzdmN2I3MSZyZWNpcGllbnRBZGRyZXNzPXJheW1vbmQuYy5yb2JsZXNAaW50ZWwuY29t]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20170626/9d4953c9/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 11633 bytes
Desc: image001.jpg
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20170626/9d4953c9/attachment.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 33527 bytes
Desc: image002.png
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20170626/9d4953c9/attachment.png>
More information about the nvmewin
mailing list