[nvmewin] MSID 0 share w/IO queue patch

Luse, Paul E paul.e.luse at intel.com
Tue Oct 2 10:34:01 PDT 2012


Cool, thanks.  I assume you are OK with the prints too (I did it that way so the table is displayed in core order).  Anyway, let us know when you're done testing.

Rick/Arpit, any chance yet to take a look?  Not in any big hurry, just checking.

Thx
Paul

From: Chang, Alex [mailto:Alex.Chang at idt.com]
Sent: Tuesday, October 02, 2012 10:31 AM
To: Luse, Paul E; nvmewin at lists.openfabrics.org
Subject: RE: MSID 0 share w/IO queue patch

Paul,

I agree. Like I mentioned, it's quite unlikely given the driver's current state machine implementation. The reason I brought it up is that the state machine relies on "LearningCores" to determine whether to move out of the learning phase, and the value of LearningCores might be incorrect if, for some reason, the AdminQ has pending completion entries or there is more than one completion entry in any given IO queue. Anyway, it's just a suggestion; let's make a note of it for now.

Thanks,
Alex

________________________________
From: Luse, Paul E [mailto:paul.e.luse at intel.com]
Sent: Tuesday, October 02, 2012 10:13 AM
To: Chang, Alex; nvmewin at lists.openfabrics.org
Subject: RE: MSID 0 share w/IO queue patch
Hi Alex-

Thanks for the review!

Item (1) is not possible today: during learning we are single-threaded in the init state machine and IOs are held off from Storport, so there's no way anything else can come through.  If someone were to introduce an admin command in the code during the learning phase, they would run into a problem that they would discover immediately (they would miss the interrupt on the admin command), but I don't see that happening, and it can be addressed at that time if it does.  I'm resisting this change because (a) it can't happen now and is unlikely to happen in the future, and (b) the extra check is in the IO path, so it shouldn't be added unless needed.  Are you OK with that?

On item (2), I already do that; however, I don't print it in the ISR/DPC as we learn, I do it at the very end so it's easier to print out and see in the debugger.  It's in the init callback, near the end:

#if DBG
                    PRES_MAPPING_TBL pRMT = &pAE->ResMapTbl;
                    PCORE_TBL pCT = NULL;
                    ULONG i;

                    /* print out the learned table */
                    StorPortDebugPrint(INFO, "Learning Complete.  Core Table:\n");
                    for (i=0; i < pRMT->NumActiveCores; i++) {
                        pCT = pRMT->pCoreTbl + i;
                        StorPortDebugPrint(INFO,
                                           "\tCore(0x%x) MSID(0x%x) QueuePair(0x%x)\n",
                                           pCT->CoreNum,
                                           pCT->MsiMsgID,
                                           pCT->SubQueue);

                    }

                    pAE->LearningComplete = TRUE;
#endif
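
For illustration only (these values are not from an actual run): on a hypothetical 4-core controller with 4 IO queue pairs, the block above would print something along these lines in the debugger:

    Learning Complete.  Core Table:
        Core(0x0) MSID(0x0) QueuePair(0x1)
        Core(0x1) MSID(0x1) QueuePair(0x2)
        Core(0x2) MSID(0x2) QueuePair(0x3)
        Core(0x3) MSID(0x3) QueuePair(0x4)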

From: Chang, Alex [mailto:Alex.Chang at idt.com]
Sent: Tuesday, October 02, 2012 10:01 AM
To: Luse, Paul E; nvmewin at lists.openfabrics.org
Subject: RE: MSID 0 share w/IO queue patch

Hi Paul,

After reviewing the changes you made, I have a couple of pieces of feedback:
1. In line #1673, which handles pending entries in the AdminQ that shares Msg#0 with another queue:
        if ((firstCheckQueue > 0) &&
            (MsgID == 0)) {
            firstCheckQueue = lastCheckQueue = indexCheckQueue = 0;
        }
Just in case the AdminQ happens to have pending entries during the learning phase (which is quite unlikely), we should add one more check:
        if ((firstCheckQueue > 0) && (pAE->DriverState.NextDriverState != NVMeWaitOnLearnMapping) &&
            (MsgID == 0)) {
            firstCheckQueue = lastCheckQueue = indexCheckQueue = 0;
        }

2. Could you add a print of the core-vector mapping result after it has been discovered?  For example:
#if DBG
                        StorPortDebugPrint(INFO,
                            "Mapping learned: Core(%02d) <=> Vector(%02d)\n",
                            procNum.Number, MsgID);
#endif

Will do more testing and get back to you by Friday.
Thanks,
Alex
________________________________
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E
Sent: Monday, October 01, 2012 4:03 PM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] ***UNCHECKED*** MSID 0 share w/IO queue patch
All-

Here is the final patch for admin queue sharing and a few other miscellaneous cleanup items.  It is fully tested, and I also reviewed it with Alex and Kwok F2F last week, but I will let Alex approve for himself.  PW is intel123

Please let me know if there are any questions

Thx
Paul


nvmeInit.c:
- changes in loops and conditions to account for the number of cores now being the same as the number of vectors requested
- removed unused element pMsiMsgTbl->CoreNum
- removed unused element pMMT->CoreNum
- in NVMeMsiMapCores(), we now initialize the MsgId in the core table to the CQ number minus 1 (the same as the core #)
- in DBG mode, track when learning is complete to support an assertion to make sure learning is always working
- for the identify commands during the init state machine, we were DMA'ing directly into elements within devExt structures with no assurance of alignment.  To address this, I changed the target address of the xfer to the driver state machine data buffer and we now copy the data over on completion
- NVMeAllocIoQueues():  fixed to use an index that counts up through both the core and NUMA loops (what Alex saw) instead of the inner loop counter.  This value is used to index into the coreTable; the QueuId value continues to behave as before and wraps at the number of queues (see the sketch below)
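
A small standalone illustration of that indexing fix (none of this is the driver's actual code; the counts and names are made up just to show why a cumulative index is needed once there is more than one NUMA node):

    /* Illustration only: the inner-loop counter repeats for each NUMA node,
     * so using it to index the core table would land later nodes on top of
     * earlier entries; a cumulative index that counts through both loops
     * keeps them distinct, while the queue ID still wraps at the number of
     * IO queues as before. */
    #include <stdio.h>

    #define NUMA_NODES      2
    #define CORES_PER_NODE  4
    #define NUM_IO_QUEUES   8

    int main(void)
    {
        int coreTblIndex = 0;   /* counts up through BOTH loops (the fix) */
        int queueId = 1;        /* behaves as before: wraps at the queue count */
        int node, core;

        for (node = 0; node < NUMA_NODES; node++) {
            for (core = 0; core < CORES_PER_NODE; core++) {
                printf("coreTbl[%d] <- queue %d (inner-loop index would be %d)\n",
                       coreTblIndex, queueId, core);
                coreTblIndex++;
                if (++queueId > NUM_IO_QUEUES)
                    queueId = 1;
            }
        }
        return 0;
    }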

nvmeIo.c:
- two changes, both wrapped in DBG: one to print PRP details, and one to init the core # element in the srbExt that is used to make sure learning continues to do its job (check the submitting core against the completing core); see the sketch below
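
As a rough illustration of the second DBG change (the srbExt field name below is an assumption made for this sketch, not the driver's actual identifier; StorPortGetCurrentProcessorNumber and procNum.Number are the same calls used in the learning code quoted earlier):

    /* sketch, submission path: remember which core issued the request */
    #if DBG
        StorPortGetCurrentProcessorNumber((PVOID)pAE, &procNum);
        pSrbExt->issuingCore = procNum.Number;      /* assumed field name */
    #endif

    /* sketch, completion path: once learning is complete, the completing
     * core should match the core that submitted the request */
    #if DBG
        StorPortGetCurrentProcessorNumber((PVOID)pAE, &procNum);
        ASSERT((pAE->LearningComplete == FALSE) ||
               (pSrbExt->issuingCore == procNum.Number));
    #endif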

nvmeStat.c:
- changes in loops and conditions to account for the number of cores now being the same as the number of vectors requested
- init of the debug var used for learning mode

nvmeStd.c:
- alloc one less entry for the MsgTable
- in the completion routines (either DPC or ISR):
-- switched the logic to check for the common case first (shared == FALSE)
-- got rid of the learningMode var as we detect now based on startState as follows:
If we're done with the init state machine:
  - use the msg table to figure out which queue to look in
Else, if we're in learning mode:
  - use MsgID + 1.  Recall that when we allocated the queues, we set up the CQs such that QP 1 would use MSID 0, QP 2 MSID 1, etc.  Learning mode will loop through all of the QPs by walking the core table from 0 to the number of cores.  In the event that there are fewer QPs than cores because of an HBA limitation, this still works; we just learn each queue more than once, which does not hurt anything.  Clearly, when this happens things will not be optimal (they can't be without enough QPs), but we'll still fully utilize all of the available queue pairs.
- the loop has changed, as the previous for loop didn't have the flexibility to check two non-back-to-back queues; the QP that shares MSIX0 could be any of the other queues.  I reworked the loops into two do-while loops: for an actual admin queue request we just check the admin queue, but for the shared IO queue we always have to check the admin queue as well.  This logic is at the bottom of the loop and is fairly straightforward (a rough sketch of the queue selection follows these notes).
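
To make that queue-selection logic easier to follow, here is a rough sketch; it is illustrative only.  The MsgID == 0 check at the bottom is quoted from the patch earlier in this thread, but the "state machine finished" test and the CplQueueNum field name are assumptions made for the sketch:

    if (initStateMachineDone) {    /* assumed check for the completed init state machine */
        /* post-learning: the MSI message table says which CQ owns this vector */
        firstCheckQueue = lastCheckQueue = pMsiMsgTbl->CplQueueNum;  /* assumed field */
    } else {
        /* learning phase: the CQs were created so that QP 1 uses MSID 0,
         * QP 2 uses MSID 1, etc., so the queue to check is simply MsgID + 1 */
        firstCheckQueue = lastCheckQueue = (USHORT)(MsgID + 1);
    }

    /* MSID 0 is shared between the admin queue and one IO queue, so after the
     * IO queue is checked for MsgID 0, the admin queue is checked as well: */
    if ((firstCheckQueue > 0) &&
        (MsgID == 0)) {
        firstCheckQueue = lastCheckQueue = indexCheckQueue = 0;
    }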

nvmeStd.h:
- a few supporting changes - obvious

Also made a few changes near the end following a review with IDT:
- replaced the Rtl copy calls with StorPort copy calls
- replaced the Rtl zero-memory calls with memset
- added a print at the end of learning mode to show the updated mappings (the initial mappings still print)



____________________________________
Paul Luse
Sr. Staff Engineer
PCG Server Software Engineering
Desk: 480.554.3688, Mobile: 480.334.4630
