[nvmewin] UNCHECKED Learning Mode Patch for Review

Fri Mar 16 09:41:34 PDT 2012

Note that I merged Alex's patch in here. It's still pending with Ray but is only a half-dozen lines of code, so rather than have Ray create two tags and do two merges, I rolled them together so Ray can process a single patch. Normally I wouldn't recommend this, but the pending patch is so small that it just makes sense.

Password is ofanvme123.  I can schedule a call to walk through any of this if anyone would like.

Learning Mode:  Pretty easy. It's only enabled if we're in an optimal config wrt cores/queues/vectors. Assuming we have N processors, it works like this: on startup, the first N IOs are sent to queues 1..N sequentially. Each queue is created with a matching MSI message ID, so by incrementing the queue # for each of the first N IOs, regardless of the submitting core, we're assured of hitting every queue and every message ID. On the completion side we simply look at the core we completed on and update that core's entry in the core table, so that the correct queues are used the next time an IO is submitted on that core.
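To make the startup sequence concrete, here is a small simulation in C. It's a sketch under stated assumptions, not the driver code: 4 cores with one I/O queue pair each, serialized startup IOs (each completes before the next is submitted), and a hypothetical msg_to_core[] array standing in for wherever the OS happens to deliver each MSI message. The names core_queue, learned, map_core_to_queue(), complete_io() and run_startup_learning() are illustrative only.

```c
#include <assert.h>

#define NUM_CORES 4  /* hypothetical: 4 cores, 4 I/O queue pairs */

/* Hypothetical: the core the OS delivers each MSI message ID to.
   Index 0 is the admin vector and is never learned. The driver can't
   see this mapping; learning mode discovers it from where completions land. */
static const unsigned msg_to_core[NUM_CORES + 1] = { 0, 2, 0, 3, 1 };

static unsigned core_queue[NUM_CORES]; /* core table: queue per core */
static unsigned learned;               /* cores learned so far */

static void reset_tables(void)
{
    for (unsigned c = 0; c < NUM_CORES; c++)
        core_queue[c] = c + 1;  /* default convention: queue = core + 1 */
    learned = 0;
}

/* Submission: while learning, IO k goes to queue k + 1 regardless of the
   submitting core; once all cores are learned, the core table decides. */
static unsigned map_core_to_queue(unsigned core)
{
    return (learned < NUM_CORES) ? learned + 1 : core_queue[core];
}

/* Completion: queue q was created with MSI message q, so its interrupt
   lands on msg_to_core[q]; record q as that core's queue. */
static void complete_io(unsigned q)
{
    assert(q >= 1 && q <= NUM_CORES);
    unsigned cpl_core = msg_to_core[q];
    if (learned < NUM_CORES) {
        core_queue[cpl_core] = q;
        learned++;
    }
}

/* Issue NUM_CORES serialized startup IOs, all from the same core. */
static void run_startup_learning(unsigned submit_core)
{
    for (unsigned io = 0; io < NUM_CORES; io++)
        complete_io(map_core_to_queue(submit_core));
}
```

After reset_tables() and run_startup_learning(0), core_queue holds {2, 4, 1, 3}. For example, core 1 now submits to queue 4, whose message lands on core 1, so submission and completion stay on the same core. The result doesn't depend on which core submits the startup IOs.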

Testing: (in all cases I confirmed functionality, and via iometer and xperf I confirmed core load balancing)

-         Chatham (DT with 8 cores):  Win8 Server public beta, 2008R2

-         QEMU (configured for 2 cores): Win7-64

-         Tools:  The standard stuff:  format, stress, iometer, SCSI compliance, shutdown/restart

Changes:
nvmeStd.h:

-         Comment changes as needed

-         Removed the logcalmode element from the msgTable as it's no longer used

-         Removed DBG directive from procNum element; it's now required for learning mode

-         Removed the prototype for NVMeMsixMapCores(), as it's no longer used (the MSI and MSI-X routines were combined; more below)

nvmeIo.c:

-         Removed DBG directive from procNum element; it's now required for learning mode

nvmeInit.c

-         Removed NVMeSwap(),NVMeQuickSort(),NVMeMsixMapCores() as they're no longer used

-         Changes to NVMeMsiMapCores():

o   Set the initial core table MsgID to the CplQ number, this is needed so that we can determine the CplQ from the MsgId during learning mode while in the DPC/ISR

o   The other changes are just code simplifications

-         Changed NVMeCompleteResMapTbl() to use NVMeMsiMapCores() for either MSI or MSIX

-         Changed NVMeMapCore2Queue():

o   We now check if we're in learning mode, if not then we simply look up the queue num from the core table

o   If we're in learning mode (based on a simple count of how many cores we've learned vs. total available cores), then we use the number of the core we're currently learning (the learning count) + 1; the +1 is because, by our convention, all queue numbers are core + 1

-         Changed NVMeAllocIoQueues() to effectively disable learning mode if we only have 1 queue (learning makes no sense there, and not disabling it actually causes problems for learning mode).  We disable it by pretending that we've already learned all the cores
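A minimal sketch of the nvmeInit.c side, assuming a simplified core table: the core_entry/res_map types, field names, and map_cores()/map_core_to_queue() below are hypothetical, not the driver's structures. It shows the MsgID initialized to the CplQ number, the learning branch in the core-to-queue lookup, and the single-queue case disabled by marking all cores as learned. For brevity, the NVMeAllocIoQueues() check is folded into the table setup, and only the two configs discussed here (one queue per core, or one queue total) are handled.

```c
#include <assert.h>

#define MAX_CORES 64

/* Hypothetical core-table entry; field names are illustrative only. */
typedef struct {
    unsigned sub_q;   /* submission queue for IOs started on this core */
    unsigned cpl_q;   /* completion queue paired with sub_q */
    unsigned msg_id;  /* MSI/MSI-X message ID, initialized to cpl_q */
} core_entry;

typedef struct {
    core_entry ct[MAX_CORES];
    unsigned num_cores;
    unsigned learned;  /* cores learned so far; == num_cores means done */
} res_map;

/* Build the initial table. Assumes either one I/O queue pair per core
   (queue = core + 1) or a single I/O queue pair shared by all cores. */
static void map_cores(res_map *m, unsigned num_cores, unsigned num_io_queues)
{
    assert(num_cores <= MAX_CORES);
    assert(num_io_queues == 1 || num_io_queues == num_cores);
    m->num_cores = num_cores;
    for (unsigned c = 0; c < num_cores; c++) {
        unsigned q = (num_io_queues == 1) ? 1 : c + 1;
        m->ct[c].sub_q = q;
        m->ct[c].cpl_q = q;
        m->ct[c].msg_id = q;  /* lets the DPC/ISR recover CplQ from MsgId */
    }
    /* With one queue, learning would hand out queues 2..N that don't
       exist, so disable it by pretending every core is already learned. */
    m->learned = (num_io_queues == 1) ? num_cores : 0;
}

static unsigned map_core_to_queue(const res_map *m, unsigned core)
{
    if (m->learned < m->num_cores)
        return m->learned + 1;  /* learning: core being learned + 1 */
    return m->ct[core].sub_q;   /* learned: plain table lookup */
}
```

With 4 cores and 4 queues, a freshly built table sends the first IO from any core to queue 1; once learned reaches 4, lookups come from the table (core 2 maps to queue 3 with the default core + 1 entries). With one queue, learned starts at num_cores, so every core maps straight to queue 1.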

nvmeStd.c:

-         In NVMePassiveInitialize(), disable learning mode if we're sharing one MSI over multiple queues, for the same reasons as when we have one queue

-         In NVMeInitialize():

o   Enable the DPC perf opt per Msft recommendation to steer StorPort completion DPCs back to the submitting core

-         In NVMeStartIo(): bug fix unrelated to learning mode, but I found it while debugging learning mode (via a BSOD).  Per the MSDN docs you can't return FALSE from this function; always return TRUE

-         In IoCompletionDpcRoutine(), and the same changes in the ISR when it's enabled for completions instead:

o   Merged Alex's bugfixes in with my changes

o   Removed the DBG related code for checking core affiliation

o   Where we decide which queues to check, I set the Boolean that says whether we're learning to FALSE if we're in shared mode, because we disable learning mode during init in that case

o   If we're not shared, the learning Boolean is set based on how many cores we've learned and on whether the MsgId is > 0, as MsgId 0 is the admin queue and we exclude it from learning mode

o   If we're not learning, we only search the queue specified in the MMT

o   If we are learning, we know the queue # is the same as the MsgId because we init'd it that way

o   The 'learning' happens in a new conditional just after we determine we have an srbExt.  It works as follows:

§  Grab the StartIo lock, as we share the core table with StartIo and during learning mode we're not yet assured that start and complete happen on the same core.  Note the lock is only taken for IOs during learning mode (the first few IOs)

§  Look up the CT entry for the core we're completing on and set its queue numbers to the queue number associated with the IO that just completed.  This assures that the next time an IO is submitted on this core, the table lookup selects a queue that completes on this same core.

§  Increment our learning counter, which directs the next IO to the next core

-         Unrelated changes to NVMeProcessIoctl(): the routine assumed every IOCTL we'd get would be a PT IOCTL, making it difficult for vendors to add their own private IOCTLs.  I just moved things around a bit, as we had to add one to our product-specific code.  No other changes here beyond placing IOCTL-specific code in the correct case block
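The completion-side learning steps in IoCompletionDpcRoutine() can be sketched as follows. This is portable C11, not StorPort code: an atomic_flag spin lock stands in for the StartIo lock, and learn_state, learn_init(), is_learning() and learn_on_completion() are hypothetical names. Two liberties, both assumptions of this sketch: learned is an atomic counter so the lock-free fast-path check is race-free, and the count is re-checked under the lock so two concurrent completions can't both learn past N.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CORES 64

/* Hypothetical state shared between the submit path and the DPC/ISR. */
typedef struct {
    unsigned queue[MAX_CORES];  /* core table: queue pair per core */
    unsigned num_cores;
    atomic_uint learned;        /* cores learned so far */
    bool shared_msi;            /* one MSI shared over several queues */
    atomic_flag startio_lock;   /* stand-in for the StartIo lock */
} learn_state;

static void spin_lock(atomic_flag *f)
{
    while (atomic_flag_test_and_set(f))
        ;  /* busy-wait; the driver would take the StorPort lock instead */
}

static void spin_unlock(atomic_flag *f) { atomic_flag_clear(f); }

static void learn_init(learn_state *s, unsigned num_cores, bool shared_msi)
{
    assert(num_cores <= MAX_CORES);
    for (unsigned c = 0; c < num_cores; c++)
        s->queue[c] = c + 1;  /* default: queue = core + 1 */
    s->num_cores = num_cores;
    s->shared_msi = shared_msi;
    /* Shared MSI: learning disabled at init by marking all cores learned. */
    atomic_init(&s->learned, shared_msi ? num_cores : 0);
    atomic_flag_clear(&s->startio_lock);
}

/* Learning only if not shared, cores remain, and not the admin vector. */
static bool is_learning(learn_state *s, unsigned msg_id)
{
    if (s->shared_msi)
        return false;
    return msg_id > 0 && atomic_load(&s->learned) < s->num_cores;
}

/* Called once the completed IO's SRB extension is found. While learning,
   the queue number equals msg_id, since queue q was set up with message q. */
static void learn_on_completion(learn_state *s, unsigned msg_id,
                                unsigned cpl_core)
{
    if (!is_learning(s, msg_id))
        return;                           /* fast path: no lock taken */
    spin_lock(&s->startio_lock);          /* core table shared with StartIo */
    if (atomic_load(&s->learned) < s->num_cores) {
        s->queue[cpl_core] = msg_id;      /* this core's IOs now use this queue */
        atomic_fetch_add(&s->learned, 1); /* direct the next IO onward */
    }
    spin_unlock(&s->startio_lock);
}
```

Once learning finishes, is_learning() returns false without touching the lock, so steady-state completions never contend on it, in line with the lock only being taken for the first few IOs.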


____________________________________
Paul Luse
Sr. Staff Engineer
PCG Server Software Engineering
Desk: 480.554.3688, Mobile: 480.334.4630

-------------- next part --------------
A non-text attachment was scrubbed...
Name: source.zip
Type: application/x-zip-compressed
Size: 155940 bytes
Desc: source.zip
URL: <http://lists.openfabrics.org/pipermail/nvmewin/attachments/20120316/2bbfb92e/attachment.bin>