[nvmewin] Handling pending commands when processing Format [changing from NVMe WG dist. list to OFA NVMe Windows Driver dist. list]

Robles, Raymond C raymond.c.robles at intel.com
Thu Apr 18 18:07:58 PDT 2013


Hi Judy,

Ahhh… I see now.  I didn’t answer that question below.  The format command is essentially built into the driver by a state machine.

When we receive the format command, we immediately issue the *hot remove* command… but that is done inline. So, once we call Storport to kick off the enumeration, we simply return to handling the format command in NVMeStartIoProcessIoctl().  The appropriate states are set along the way to indicate progress. Once the namespace is removed from the “OS view”, the format is processed like any other command (via ProcessIo). The callback is set up to call NVMeIoctlFormatNVMCallback(), and the variable “FormatNvmInfo->AddNamespaceNeeded” is set to TRUE so that on the completion side we remember to have the OS re-enumerate after we are done.  Once the NVM format completes, the callback is invoked in the completion DPC. Then, on the completion side, we issue Identify Controller and Identify Namespace so that our cached driver data for the formatted namespace(s) is up to date. In the last state, after getting the Identify Namespace struct, we’ll call *hot add*, which is described below.
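
To make the sequence above concrete, here is a rough, self-contained sketch of the flow as a small state machine. It is only a model: the state names, the FormatCtx struct and the sketch_* helpers are made up for illustration and are not the driver's actual identifiers, and the Storport bus-change notification is replaced by a simple counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states: each one names what the sequence is waiting on. */
typedef enum {
    FMT_IDLE,
    FMT_WAIT_FORMAT,         /* Format NVM issued, completion pending       */
    FMT_WAIT_IDENTIFY_CTRL,  /* Identify Controller issued                  */
    FMT_WAIT_IDENTIFY_NS,    /* Identify Namespace issued                   */
    FMT_DONE
} FormatState;

typedef struct {
    FormatState state;
    bool ns_visible;            /* is the namespace reported to the OS?     */
    bool add_namespace_needed;  /* models FormatNvmInfo->AddNamespaceNeeded */
    int  bus_change_count;      /* stand-in for the bus-change notification */
} FormatCtx;

/* Stand-in for asking the OS to re-enumerate (assumption: just a counter). */
static void sketch_bus_change(FormatCtx *c) { c->bus_change_count++; }

/* Submission side: hot remove inline, then issue the format. No waiting. */
void sketch_format_start(FormatCtx *c)
{
    assert(c != NULL);
    c->ns_visible = false;          /* hide the namespace on the next scan  */
    sketch_bus_change(c);           /* "hot remove"                         */
    c->add_namespace_needed = true; /* remember to hot add when finished    */
    c->state = FMT_WAIT_FORMAT;
}

/* Completion side: called once for each command that completes. */
void sketch_format_on_completion(FormatCtx *c)
{
    assert(c != NULL);
    switch (c->state) {
    case FMT_WAIT_FORMAT:
        c->state = FMT_WAIT_IDENTIFY_CTRL;  /* refresh controller data next */
        break;
    case FMT_WAIT_IDENTIFY_CTRL:
        c->state = FMT_WAIT_IDENTIFY_NS;    /* then refresh namespace data  */
        break;
    case FMT_WAIT_IDENTIFY_NS:
        if (c->add_namespace_needed) {
            c->ns_visible = true;           /* surface the namespace again  */
            sketch_bus_change(c);           /* "hot add"                    */
            c->add_namespace_needed = false;
        }
        c->state = FMT_DONE;
        break;
    default:
        break;                              /* idle or done: nothing to do  */
    }
}
```

In this model, one call to sketch_format_start() followed by three calls to sketch_format_on_completion() walks through the whole sequence and triggers exactly two bus-change notifications: one for the remove and one for the add.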

Note that at no point do we “wait” for any I/O to finish. Format is a dangerous command… especially via pass through IOCTL. We talked about this quite a bit in the beginning of developing this driver.  But essentially, if a format comes down for a namespace, any I/O outstanding to the controller (there won’t be anything that needs to be sent… I/O will either be at the device or on the CQ) will simply complete via normal operation or be aborted at the controller… but Storport won’t care because the SCSI target was already removed upon initially receiving the format command. Any I/O in the CQ will be completed and handled by Storport correctly.
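
The "no waiting" point can be pictured with a tiny model like the one below. Again, this is only a sketch under my own assumptions: CtrlModel and the model_* functions are hypothetical names, and the counters stand in for whatever bookkeeping the real driver and Storport do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a controller with I/O already at the device. */
typedef struct {
    bool     target_removed;           /* SCSI target hidden from the OS    */
    bool     format_issued;            /* Format NVM handed to the device   */
    unsigned outstanding;              /* I/O at the device or on the CQ    */
    unsigned completed_after_removal;  /* completions seen after the remove */
} CtrlModel;

/* Format path: remove the target and issue the format right away.
 * Note that there is no loop here waiting for outstanding to reach zero. */
void model_start_format(CtrlModel *m)
{
    assert(m != NULL);
    m->target_removed = true;
    m->format_issued  = true;
}

/* Normal completion path: runs the same way whether or not the target
 * has been removed; old I/O simply drains as its completions arrive. */
void model_complete_io(CtrlModel *m)
{
    assert(m != NULL);
    if (m->outstanding == 0) {
        return;                        /* nothing left to complete          */
    }
    m->outstanding--;
    if (m->target_removed) {
        m->completed_after_removal++;
    }
}
```

With three commands outstanding, model_start_format() returns with all three still pending, and they are then retired one by one through model_complete_io().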

Thanks,
Ray

From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com]
Sent: Thursday, April 18, 2013 4:47 PM
To: Robles, Raymond C; WONMOON CHEON; ???; nvmewin at lists.openfabrics.org
Subject: RE: RE: Handling pending commands when processing Format [changing from NVMe WG dist. list to OFA NVMe Windows Driver dist. list]

Hi Ray,
    [Ray wrote] Let me know if this answers your question.
I don’t think it does. What I wrote below I think was pretty much the same as what you wrote - or at least that was my intention ☺.
However, the piece I couldn’t explain (cause I haven’t looked into it) is how the driver holds off the beginning of the actual Format NVM operation until whatever old I/Os were already in progress for the namespace(s) before the format request was received are completed back to the caller, aborted, or whatever, so that there are no old live requests still hanging around in the driver before the format op begins.
In other words, does the driver hold off starting the format cmd till the outstanding IOs are completed? Or do we perhaps just drop them on the floor and let the OS figure out that those requests are permanently lost/gone due to the LUNs having disappeared (my guess is, the latter is what we do)? Or do we try to abort them all? And so on.
So again, we do understand how to get the OS to avoid sending new I/O requests to stale namespaces, but how exactly are the old I/O requests (those existing at the time the format request comes in) handled?
At least that is my current question ☺
Thanks,
Judy


From: Robles, Raymond C [mailto:raymond.c.robles at intel.com]
Sent: Thursday, April 18, 2013 2:46 PM
To: Judy Brock-SSI; WONMOON CHEON; ???; nvmewin at lists.openfabrics.org
Subject: RE: RE: Handling pending commands when processing Format [changing from NVMe WG dist. list to OFA NVMe Windows Driver dist. list]

Hello Judy/Wonmoon,
Sorry for the late response. The “hot remove” state is just a state that we enter when the driver receives a Format command.  Basically, this state will remove the namespace(s) from the topology by calling StorPortNotification() with BusChangeDetected.  This will remove the “SCSI target/disk” associated with each namespace from the OS (because Storport will re-enumerate the controller and the driver will not expose the namespaces about to be formatted) so that the format can occur on the relevant namespaces.
By signaling Windows that the namespaces have been “removed”, all I/O will be stopped by the OS.  Then the format can complete.  Once the format is complete, we perform the opposite action to “hot add” the namespace back into the topology by calling StorPortNotification() with BusChangeDetected… only this time, we will surface the namespace(s) again when Storport re-enumerates.
No queues are deleted in this state, no memory is de-allocated, and nothing else changes about the namespace.  This is simply the first step (in a three-step sequence) when formatting a namespace, as we cannot format a namespace while the OS is aware of its presence and could potentially be sending I/O to a stale namespace config (e.g. after a change in LBA/sector size).
Let me know if this answers your question.
Thanks,
Ray

From: Judy Brock-SSI [mailto:judy.brock at ssi.samsung.com]
Sent: Wednesday, April 17, 2013 6:21 AM
To: WONMOON CHEON; 강미경; technical at nvmexpress.org
Subject: RE: RE: Handling pending commands when processing Format

  >>Would you elaborate more about  the "hot-remove" state? In this state, do you mean that all the IO SQ/CQs are deleted? Or, waiting for completions of all the outstanding IOs?
The IO SQ/CQs are definitely NOT deleted. I would need to look more closely through the driver code to see how IOs previously sent to the namespaces which are marked as OFFLINE are handled/finished/quiesced.
There are other folks on this thread who no doubt have more history with, and more intimate knowledge of, this driver than I do and who may answer that question more quickly than I can… also, perhaps this discussion should be moved to the OFA driver forum, since it has turned into a driver-specific thread at this point.
What do folks think?
Judy

From: 천원문 [mailto:wm.cheon at samsung.com]
Sent: Wednesday, April 17, 2013 1:20 AM
To: Judy Brock-SSI; 강미경; technical at nvmexpress.org
Subject: Re: RE: Handling pending commands when processing Format

Hi Judy,

Would you elaborate more about  the "hot-remove" state? In this state, do you mean that all the IO SQ/CQs are deleted? Or, waiting for completions of all the outstanding IOs?

Thanks,
Wonmoon

------- Original Message -------
Sender : Judy Brock-SSI <judy.brock at ssi.samsung.com>
Date : 2013-04-17 16:39 (GMT+09:00)
Title : RE: Handling pending commands when processing Format

Hi,
I should clarify that it is not the Windows operating system but rather the Windows OFA NVMe driver that, from what I can see, does a “hot-remove” of all namespace(s) associated with a device before allowing a format operation to begin; “hot remove” is just the name for an internal state in the driver's Format NVM state machine.
Before beginning the actual format op, the driver internally marks all namespaces associated with the format operation “offline”. It then notifies the OS that there has been a “bus change” event (via an OS-specific API).
This in turn will cause the OS to rescan (re-enumerate) the “bus” (the pseudo SCSI bus, that is; we expose NVM namespaces as SCSI LUNs).
Since all the pertinent namespaces have been marked offline internally, the bus rescan won’t detect any valid SCSI LUNs for them (because the driver will not report any).
Hence from the OS point of view, any SCSI lun(s) previously mapped to the namespace(s) to be formatted will have disappeared/will be unaddressable while the format operation is in progress.
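
As a rough illustration of the offline-marking idea (a sketch only: SketchNamespace, sketch_mark_offline() and sketch_report_luns() are names I made up, and the real driver reports LUNs through its SCSI translation layer rather than a function like this):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-namespace bookkeeping kept by the driver. */
typedef struct {
    unsigned nsid;
    bool     offline;   /* set before the format begins */
} SketchNamespace;

/* Mark the namespace with the given NSID offline; returns how many
 * entries changed state. */
size_t sketch_mark_offline(SketchNamespace *ns, size_t count, unsigned nsid)
{
    size_t marked = 0;
    assert(ns != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (ns[i].nsid == nsid && !ns[i].offline) {
            ns[i].offline = true;
            marked++;
        }
    }
    return marked;
}

/* What a bus rescan would see: only online namespaces are reported.
 * Writes up to cap NSIDs into out and returns how many were written. */
size_t sketch_report_luns(const SketchNamespace *ns, size_t count,
                          unsigned *out, size_t cap)
{
    size_t n = 0;
    assert(out != NULL || cap == 0);
    for (size_t i = 0; i < count && n < cap; i++) {
        if (!ns[i].offline) {
            out[n++] = ns[i].nsid;
        }
    }
    return n;
}
```

Marking NSID 2 offline in a three-namespace table and then "rescanning" reports only NSIDs 1 and 3, which is the effect the OS observes after the bus change event.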
Judy

From: Judy Brock-SSI
Sent: Tuesday, April 16, 2013 8:22 PM
To: 'mkkang.kang at samsung.com'; technical at nvmexpress.org
Subject: RE: Handling pending commands when processing Format

Mikyeong,
I haven’t looked at the Linux driver but I know that Windows hot-removes all namespace(s) associated with a device before allowing a format operation to begin. And a namespace can’t be removed while there is IO outstanding to it so that answers your question regarding IOs being completed before format begins. It also answers the question about requests being sent to a namespace while format is in progress – can’t happen.
Thanks,
Judy



From: 강미경 [mailto:mkkang.kang at samsung.com]
Sent: Tuesday, April 16, 2013 7:07 PM
To: technical at nvmexpress.org
Subject: Handling pending commands when processing Format

Dear All,

The Format NVM command may change the namespace repository, and like any other command it may be executed out of order. It may therefore affect other commands that are pending execution in the device, if any.

1) How does an OFA/linux driver handle 'Format NVM command'? Does a host make sure that all commands for a particular NSID are completed before sending 'Format NVM command'?

2) If a host driver does not behave like 1) above, how can a device handle other pending commands which were previously submitted in a SQ? It seems like we need an additional status code. e.g. Abort due to Namespace Format

3) Let's suppose that 'Format NVM command' is in progress. If the host driver sends subsequent commands to the namespace being formatted, should the device reply directly with a 'Namespace not Ready'?

[1.0e spec] If the device does not reply directly and the format operation takes a long time, the I/O commands will time out and the host may issue a reset. But if the commands are answered with 'Namespace Not Ready', the host may not issue the reset. Therefore, a direct reply seems to be needed.

[1.1 spec. ECN 001] There is Format progress indicator. The host driver can check format progress any time, therefore, there is no concern about reset during format command.
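
The reasoning in 3) can be sketched as a host-side decision function. This is only a model of the argument, not the policy of any actual driver: the enum names, sketch_host_action() and the timeout handling are all hypothetical.

```c
#include <assert.h>

/* Hypothetical view of one I/O command from the host's side. */
typedef enum {
    SK_PENDING,              /* no completion posted yet                   */
    SK_SUCCESS,              /* completed normally                         */
    SK_NAMESPACE_NOT_READY   /* device replied directly while formatting   */
} SketchStatus;

typedef enum {
    SK_WAIT,                 /* keep waiting for a completion              */
    SK_COMPLETE,             /* hand the result back to the caller         */
    SK_RETRY_LATER,          /* device is alive; try again after a delay   */
    SK_RESET_CONTROLLER      /* command timed out; escalate to a reset     */
} SketchAction;

/* Decide what the host does for one command, given its status and age. */
SketchAction sketch_host_action(SketchStatus status,
                                unsigned elapsed_ms, unsigned timeout_ms)
{
    assert(timeout_ms > 0);
    switch (status) {
    case SK_SUCCESS:
        return SK_COMPLETE;
    case SK_NAMESPACE_NOT_READY:
        /* A direct reply shows the device is responsive: no reset needed. */
        return SK_RETRY_LATER;
    case SK_PENDING:
    default:
        /* Silence past the timeout is what can lead to a reset. */
        return (elapsed_ms >= timeout_ms) ? SK_RESET_CONTROLLER : SK_WAIT;
    }
}
```

In this model, a command that gets no reply during a long format eventually reaches SK_RESET_CONTROLLER, while one answered with SK_NAMESPACE_NOT_READY never does, however long the format takes.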


Best Regards,
Mikyeong Kang

Kang MiKyeong

Flash Memory Planning/Enabling Group, Memory Div.
SAMSUNG ELECTRONICS, Co., Ltd..
Phone: 82-31-208-3857
Mobile: 82-10-9369-0177
E-mail: mkkang.kang at samsung.com


