[nvmewin] COMPLETE_IN_DPC flag & relationship to NvmeStartio path critical section handling not protected from NVMe ISR?

Yong Chen Yong.sc.Chen at huawei.com
Fri Jul 19 11:08:25 PDT 2013


Judy, you are right. There is no ISR or DPC in dump mode. The switch simply saves one function call.

From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com]
Sent: Friday, July 19, 2013 4:30 AM
To: Luse, Paul E; Yong Chen; Freyensee, James P; Murray, Kris R; Po-Yen Chang; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] COMPLETE_IN_DPC flag & relationship to NvmeStartio path critical section handling not protected from NVMe ISR?

I am wondering now, though, if there is some need for at least some variant/vestige of completion-in-ISR-instead-of-DPC.  Isn't it the case that in dump mode - i.e., either during crash-dump or hibernation - the driver is not allowed to schedule a DPC in its ISR to do completions?

We could  put some logic in the ISR to check whether we are in dump mode or not and if we are, do the completions in the ISR itself. After all, if we are in dump mode, we are single-threaded anyway and won't have to worry about pre-emption of our StartIo routine by our ISR.
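
For illustration, a minimal sketch of what that dump-mode check in the ISR might look like is below. It assumes a dump-mode flag (called ntldrDump here) and a StorPort DPC object live in the device extension; the field and helper names are assumptions, not the driver's actual code.

    BOOLEAN NVMeIsrMsix(PVOID pHwDeviceExtension, ULONG MsgId)
    {
        PNVME_DEVICE_EXTENSION pAE = (PNVME_DEVICE_EXTENSION)pHwDeviceExtension;

        if (pAE->ntldrDump == TRUE) {
            /* Dump/hibernation mode: DPCs are not available and execution is
             * single threaded, so drain the completion queue(s) right here. */
            NVMeCompleteCmdsForMsg(pAE, MsgId);            /* hypothetical helper */
        } else {
            /* Normal runtime: keep the ISR lean and push completion work to a DPC. */
            StorPortIssueDpc(pAE, &pAE->CompletionDpc,
                             (PVOID)(ULONG_PTR)MsgId, NULL);
        }
        return TRUE;
    }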

Thanks,
Judy

From: Luse, Paul E [mailto:paul.e.luse at intel.com]
Sent: Thursday, July 18, 2013 7:21 PM
To: Yong Chen; Freyensee, James P; Murray, Kris R; Po-Yen Chang; Judy Brock-SSI; nvmewin at lists.openfabrics.org
Subject: RE: [nvmewin] COMPLETE_IN_DPC flag & relationship to NvmeStartio path critical section handling not protected from NVMe ISR?

All of the issues could of course be fixed, the point I think is that there's no real reason to fix them....

From: Yong Chen [mailto:Yong.sc.Chen at huawei.com]
Sent: Thursday, July 18, 2013 6:36 PM
To: Luse, Paul E; Freyensee, James P; Murray, Kris R; Po-Yen Chang; Judy Brock-SSI; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: RE: [nvmewin] COMPLETE_IN_DPC flag & relationship to NvmeStartio path critical section handling not protected from NVMe ISR?

I think completion in the ISR served some purpose, as I used this switch during development.

Unfortunately, I found it broken as well during hibernation wake-up:
the SntiTranslateStartStopUnit() callback routine issues ProcessIo() directly in the ISR, which will bluescreen.

I was about to put a note in the sources in my code change. So there are multiple ways of hitting it, as we now know.

From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E
Sent: Thursday, July 18, 2013 4:56 PM
To: Freyensee, James P; Murray, Kris R; Po-Yen Chang; Judy Brock-SSI; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] COMPLETE_IN_DPC flag & relationship to NvmeStartio path critical section handling not protected from NVMe ISR?

So it was originally in there during early dev to measure the system performance impact of completing in the ISR versus pushing off to a DPC; the general old-school rule of thumb is to minimize ISR work and finish everything else in a DPC to be friendlier to the system.  With NVMe completion being so lightweight we figured we could get away without the DPC, and it used to work both ways under heavy stress.  As both methods are not always tested (after all, it's a compile switch), it's clearly easy to break one of them.  I actually ran the tests, DPC vs ISR, back then and there was no significant impact either way.  I was using xperf and I'm sure I shared the results with the other members of the original team as well - I doubt I have them anymore but I'll look.  Either way, at this point in time it's probably a good simplification to pick one method and remove the compile switch for the other.  I'd probably stick with the DPC route as (a) there was no major benefit from finishing in the ISR and (b) it sounds like it's busted now anyway :)

From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Freyensee, James P
Sent: Thursday, July 18, 2013 4:43 PM
To: Murray, Kris R; Po-Yen Chang; Judy Brock-SSI; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] COMPLETE_IN_DPC flag & relationship to NvmeStartio path critical section handling not protected from NVMe ISR?

Out of curiosity, what was the original reason to have the ISR path in the first place?  If it is currently in the driver code, there must have been some purpose in being able to compile it using either an ISR or a DPC.


From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Murray, Kris R
Sent: Thursday, July 18, 2013 4:29 PM
To: Po-Yen Chang; Judy Brock-SSI; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] COMPLETE_IN_DPC flag & relationship to NvmeStartio path critical section handling not protected from NVMe ISR?

Judy,
I have no problems removing it.
~Kris

From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Po-Yen Chang
Sent: Thursday, July 18, 2013 4:27 PM
To: Judy Brock-SSI; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: Re: [nvmewin] COMPLETE_IN_DPC flag & relationship to NvmeStartio path critical section handling not protected from NVMe ISR?

Judy,

I feel the same way as well. Let's wait for the response from LSI and Intel on this. If they all agree, I will go ahead and remove it.

Thanks,
Alex

________________________________
From: nvmewin-bounces at lists.openfabrics.org<mailto:nvmewin-bounces at lists.openfabrics.org> [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Judy Brock-SSI
Sent: Thursday, July 18, 2013 4:17 PM
To: Judy Brock-SSI; nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: [nvmewin] COMPLETE_IN_DPC flag & relationship to NvmeStartio path critical section handling not protected from NVMe ISR?
So it looks like the reason this problem was not seen before is that it only surfaces when the COMPLETE_IN_DPC compile flag is not set. In other words, the COMPLETE_IN_ISR path is broken because it accesses our HwDeviceExtension without being synchronized with the other paths in the driver which do the same.

We can either fix the path which does completions in the ISR or get rid of that option entirely. Since it's generally considered bad practice to do that kind of work in an ISR (it's supposed to be as lean and mean as possible), would the team be averse to getting rid of the logic which optionally allows completions to be handled by the ISR?

If we insist on retaining it,  a) we should come up with a good reason why and b) we should fix it asap because it is definitely not safe to use in its present form.

Personally, I vote for removal; we wouldn't need the COMPLETE_IN_DPC flag anymore either if we go that route.

Thanks,
Judy

From: Judy Brock-SSI
Sent: Thursday, July 18, 2013 6:22 AM
To: Judy Brock-SSI; 'nvmewin at lists.openfabrics.org'
Subject: RE: NvmeStartio path critical section handling not protected from NVMe ISR?

I just thought of another way to handle this problem.

Could we not call StorPortSynchronizeAccess() with a pointer back to our ProcessIo() routine?  ProcessIo would get called before the call to StorPortSynchronizeAccess() returns and this would have the effect of guaranteeing synchronization with our ISR.

This seems like a much cleaner solution than a lock-acquiring approach.
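
A minimal sketch of that idea is below. Only StorPortSynchronizeAccess() and its callback prototype are real StorPort interfaces here; the wrapper routine, the context struct, and the ProcessIo() signature are assumptions for illustration.

    typedef struct _PROCESS_IO_CONTEXT {
        PNVME_SRB_EXTENSION pSrbExt;       /* whatever ProcessIo() actually needs */
    } PROCESS_IO_CONTEXT, *PPROCESS_IO_CONTEXT;

    /* Matches the STOR_SYNCHRONIZED_ACCESS prototype; StorPort runs it
     * synchronized with the adapter interrupt, so our ISR cannot preempt
     * the list manipulation done inside ProcessIo(). */
    BOOLEAN ProcessIoSynchronized(PVOID pHwDeviceExtension, PVOID pContext)
    {
        PNVME_DEVICE_EXTENSION pAE = (PNVME_DEVICE_EXTENSION)pHwDeviceExtension;
        PPROCESS_IO_CONTEXT pCtx = (PPROCESS_IO_CONTEXT)pContext;

        return ProcessIo(pAE, pCtx->pSrbExt);   /* assumed signature */
    }

    /* In NVMeStartIo(), instead of calling ProcessIo() directly: */
    PROCESS_IO_CONTEXT ctx;
    ctx.pSrbExt = pSrbExt;
    StorPortSynchronizeAccess(pAE, ProcessIoSynchronized, &ctx);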

I still don't know if there are any issues with ProcessIo being called multiple times, from non-StartIo code paths, etc.; that would still need to be looked at.

Thanks,
Judy


From: Judy Brock-SSI
Sent: Wednesday, July 17, 2013 10:08 PM
To: nvmewin at lists.openfabrics.org<mailto:nvmewin at lists.openfabrics.org>
Subject: NvmeStartio path critical section handling not protected from NVMe ISR?

All,

Under Windows Server 2012, I've seen a crash where NVMeStartIo() gets interrupted by our ISR while it is in the middle of manipulating a critical linked-list data structure, which the ISR then goes on to attempt to manipulate as well, resulting in a crash. Below is the call stack - see where I've inserted the comment "<---STARTIO PATH GETS CLOBBERED BY OUR INTERRUPT HANDLER BECAUSE WE AREN'T HOLDING THE INTERRUPT SPIN LOCK"

2: kd> kc
Call Site
nt!RtlpBreakWithStatusInstruction
nt!KiBugCheckDebugBreak
nt!KeBugCheck2
nt!KeBugCheckEx
nt!KiBugCheckDispatch
nt!KiFastFailDispatch
nt!KiRaiseSecurityCheckFailure
nvme!RtlFailFast
nvme!FatalListEntryError
nvme!RtlpCheckListEntry
nvme!InsertTailList
nvme!NVMeCompleteCmd
nvme!NVMeIsrMsix
nt!KiInterruptDispatch       <---STARTIO PATH GETS CLOBBERED BY OUR INTERRUPT HANDLER BECAUSE WE AREN'T HOLDING THE INTERRUPT SPIN LOCK
nvme!RemoveHeadList
nvme!NVMeGetCmdEntry
nvme!ProcessIo
nvme!NVMeStartIo
storport!RaidpAdapterContinueScatterGather
hal!HalpAllocateAdapterCallbackV2
hal!IoFreeAdapterChannelV2
hal!HalAllocateAdapterChannelV2
hal!HalBuildScatterGatherListV2
storport!RaUnitStartIo
storport!RaidUnitCompleteRequest
storport!RaidpAdapterRedirectDpcRoutine
nt!KiExecuteAllDpcs
nt!KiRetireDpcList

I looked through the code and noticed we never call StorPortAcquireSpinLock to acquire the InterruptLock to protect us from such pre-emption. Another way to achieve this would be to indicate that we run at half-duplex rather than full-duplex, but that would degrade the general performance of the driver.
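
(For reference, the duplex model is selected in the miniport's HwFindAdapter callback via PORT_CONFIGURATION_INFORMATION.SynchronizationModel; the fragment below is illustrative only and is not the driver's actual find-adapter code.)

    ULONG NVMeFindAdapter(
        PVOID pHwDeviceExtension,
        PVOID pHwContext,
        PVOID pBusInformation,
        PCHAR pArgumentString,
        PPORT_CONFIGURATION_INFORMATION pPCI,
        PBOOLEAN pReserved)
    {
        /* Full duplex (the current choice): HwStartIo and the ISR may run
         * concurrently, hence the need for explicit synchronization. */
        pPCI->SynchronizationModel = StorSynchronizeFullDuplex;

        /* Half duplex would have StorPort serialize HwStartIo against the ISR,
         * at a cost to throughput:
         * pPCI->SynchronizationModel = StorSynchronizeHalfDuplex;
         */

        /* ... remainder of adapter discovery and configuration elided ... */
        return SP_RETURN_FOUND;
    }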

I'm not sure why we didn't run into this long before now - is there some other re-entrance protection mechanism besides the two above that others are aware of?  If not, I believe we need to fix this asap. Suggestions:


A. The simplest approach is to lock down all of NVMeStartIo as shown below (not tested yet), but we might almost as well run half-duplex if we do this:

1. At the very top of NVMeStartIo:

       /* We should never be holding the interrupt lock upon entry to NVMeStartIo.
        * Acquire the Interrupt Spin Lock to protect against getting hit by our ISR.
        */
       if (NULL == pAdapterExtension->hInterruptLock) {
           StorPortAcquireSpinLock(pAdapterExtension,
                                   InterruptLock,
                                   NULL,
                                   &pAdapterExtension->hInterruptLock);
       } else {
           ASSERT(FALSE);
       }

2. At the very top of IO_StorPortNotification:

       PNVME_DEVICE_EXTENSION pAE = (PNVME_DEVICE_EXTENSION)pHwDeviceExtension;

       /* If we got here from NVMeStartIo, we need to release the interrupt lock. */
       if (NULL != pAE->hInterruptLock) {
           STOR_LOCK_HANDLE hInterruptLockCopy = pAE->hInterruptLock;
           pAE->hInterruptLock = NULL;
           StorPortReleaseSpinLock(pAE, &hInterruptLockCopy);
       }


3. At the very bottom of NVMeStartIo:

       /* If we didn't release the Interrupt Lock in one of the calls to
        * IO_StorPortNotification above, we need to release it before we exit
        * NVMeStartIo.
        */
       if (NULL != pAE->hInterruptLock) {
           STOR_LOCK_HANDLE hInterruptLockCopy = pAE->hInterruptLock;
           pAE->hInterruptLock = NULL;
           StorPortReleaseSpinLock(pAE, &hInterruptLockCopy);
       }

       return TRUE;
} /* NVMeStartIo */


B. A better approach is to just lock ProcessIo(). But code exists in that routine which acquires the StartIo lock - we can't take locks out of order or we'll cause a deadlock. Right now that code never gets invoked - what was it for? Do we still need it? Can ProcessIo() get called from non-StartIo paths? Can it get called multiple times? Not having been involved in the initial development of this driver, I would need to study the flow to make sure the StorPort lock acquire/release hierarchy rules are respected at all times. If those conversant with the overall development history and architecture of this driver could share their thoughts, that would be great.
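
A rough sketch of option B follows, under assumptions: the lock is taken only around the shared free-list work inside ProcessIo() (ignoring for the moment the StartIo-lock ordering question raised above), and the helper call and signatures are illustrative rather than the driver's actual code.

    BOOLEAN ProcessIo(PNVME_DEVICE_EXTENSION pAE, PNVME_SRB_EXTENSION pSrbExt)
    {
        STOR_LOCK_HANDLE lockHandle;
        PCMD_ENTRY pCmdEntry = NULL;

        /* Block our ISR while we touch the command list it also walks
         * (the same RemoveHeadList/InsertTailList collision seen in the crash). */
        StorPortAcquireSpinLock(pAE, InterruptLock, NULL, &lockHandle);

        pCmdEntry = NVMeGetCmdEntry(pAE, pSrbExt);    /* assumed signature */
        /* ... build the NVMe command and ring the submission-queue doorbell ... */

        StorPortReleaseSpinLock(pAE, &lockHandle);
        return (pCmdEntry != NULL);
    }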

Thanks,

Judy
