[nvmewin] NvmeStartio path critical section handling not protected from NVMe ISR?

Judy Brock-SSI judy.brock at ssi.samsung.com
Wed Jul 17 22:08:05 PDT 2013


All,

Under Windows Server 2012,  I've seen a crash where NVMeStartIo() gets interrupted by our ISR at a time when it's  in the middle of manipulating a linked list critical data structure which the ISR then goes on to attempt to manipulate also -  which results in a crash. Below is the call stack - see where I've inserted the  comment "<---STARTIO PATH GETS CLOBBERED BY OUR INTERRUPT HANDLER BECAUSE WE AREN'T HOLDING THE INTERRUPT SPIN LOCK"

2: kd> kc
Call Site
nt!RtlpBreakWithStatusInstruction
nt!KiBugCheckDebugBreak
nt!KeBugCheck2
nt!KeBugCheckEx
nt!KiBugCheckDispatch
nt!KiFastFailDispatch
nt!KiRaiseSecurityCheckFailure
nvme!RtlFailFast
nvme!FatalListEntryError
nvme!RtlpCheckListEntry
nvme!InsertTailList
nvme!NVMeCompleteCmd
nvme!NVMeIsrMsix
nt!KiInterruptDispatch       <---STARTIO PATH GETS CLOBBERED BY OUR INTERRUPT HANDLER BECAUSE WE AREN'T HOLDING THE INTERRUPT SPIN LOCK
nvme!RemoveHeadList
nvme!NVMeGetCmdEntry
nvme!ProcessIo
nvme!NVMeStartIo
storport!RaidpAdapterContinueScatterGather
hal!HalpAllocateAdapterCallbackV2
hal!IoFreeAdapterChannelV2
hal!HalAllocateAdapterChannelV2
hal!HalBuildScatterGatherListV2
storport!RaUnitStartIo
storport!RaidUnitCompleteRequest
storport!RaidpAdapterRedirectDpcRoutine
nt!KiExecuteAllDpcs
nt!KiRetireDpcList

I looked through the code and noticed we never call StorPortAcquireSpinLock to acquire the InterruptLock to protect us from such pre-emption. Another way to achieve this would be to indicate we run at half-duplex rather than full-duplex  but that would degrade the general performance of the driver.

I'm not sure why we didn't run into this way before now - is there some other re-entrance protection algorithm besides the two above that others are aware of?  If not,  I believe we need to fix this asap. Suggestions:


A.     Simplest approach is to lock down all of NVMeStartIo as per below (not tested yet) but we almost may as well run half-duplex if we do this:

1 . At the very the top of NVMeStartIo:

/* we should never be holding the interrupt lock upon entry to NVMeStartIo.
       * Acquire the Interrupt Spin Lock to protect against getting hit by our ISR.
        */

       if (NULL == pAdapterExtension->hInterruptLock) {
        (StorPortAcquireSpinLock(pAdapterExtension,
        InterruptLock,
        NULL,
        &pAdapterExtension->hInterruptLock);
       }
       else {
          ASSERT(FALSE);
       }

2.  At the very the top of IO_StorPortNotification

       PNVME_DEVICE_EXTENSION pAE = (PNVME_DEVICE_EXTENSION) pHwDeviceExtension;

    /* if we got here from NvmeStartIo we need to release the interrupt lock */
       if (NULL != pAE->hInterruptLock) {
             STOR_LOCK_HANDLE hInterruptLockCopy = pAE->hInterruptLock;
             pAE->hInterruptLock = NULL;
        StorPortReleaseSpinLock(pAE, &hInterruptLockCopy);
       }


3. At the very bottom of NVMeStartIo:

       /* if we didn't release the Interrupt Lock in one of the calls to
        * IO_StorPortNotification above we need to release before we exit NVMEStartIo
       */

       if (NULL != pAE->hInterruptLock) {
             STOR_LOCK_HANDLE hInterruptLockCopy = pAE->hInterruptLock;
             pAE->hInterruptLock = NULL;
        StorPortReleaseSpinLock(pAE, &hInterruptLockCopy);
       }

    return TRUE;
} /* NVMeStartIo */


B.     Better approach is to just lock ProcessIo().  But code exists in that routine which acquires the StartIo lock - we can't take locks out of order or we'll cause deadlock.  Right now that code never gets invoked - what was it for? Do we still need it? Can ProcessIo() get called from non-StartIo Paths? Can it get called multiple times? Not having been involved in the initial development of this driver, I would need to study the flow to make sure to respect the StorPort lock acquiring/releasing hierarchy rules at all times. If those conversant in the overall developmental history and architecture of this driver could share their thoughts, that would be great.

Thanks,

Judy

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20130718/11490963/attachment.html>


More information about the nvmewin mailing list