From Alex.Chang at idt.com Fri Mar 2 10:21:36 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Fri, 2 Mar 2012 18:21:36 +0000
Subject: [nvmewin] Patch Review Request
In-Reply-To: <548C5470AAD9DA4A85D259B663190D367EEF@corpmail1.na.ads.idt.com>
References: <82C9F782B054C94B9FC04A331649C77A033980@FMSMSX106.amr.corp.intel.com> <45C2596E6A608A46B0CA10A43A91FE1602FC6068@CORPEXCH1.na.ads.idt.com> <548C5470AAD9DA4A85D259B663190D367EEF@corpmail1.na.ads.idt.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D36D5A0@corpmail1.na.ads.idt.com>

Hi all,

I am attaching the recent changes in nvmeStd.c, which include two fixes:

1. Fix for a system crash when INTx is used to generate interrupts.
- Root cause: the DPC routine attempted to acquire the MSI lock.
- Change: acquire the DpcLock instead when INTx is in use.

2. Fix for the NVMe controller becoming unresponsive when INTx is used to generate interrupts.
- Root cause: the Doorbell registers were programmed with improper Completion Queue Head Pointer values when looping through all completion queues.
- Change: reset InterruptClaimed to FALSE after updating each Completion Queue Head Pointer via the associated Doorbell register.

Please review the changes and feel free to let me know if you have any comments or questions.

Thanks,
Alex

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nvmeStd.zip
Type: application/x-zip-compressed
Size: 23891 bytes
Desc: nvmeStd.zip

From paul.e.luse at intel.com Fri Mar 2 11:22:35 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Fri, 2 Mar 2012 19:22:35 +0000
Subject: [nvmewin] Patch Review Request
In-Reply-To: <548C5470AAD9DA4A85D259B663190D36D5A0@corpmail1.na.ads.idt.com>
References: <82C9F782B054C94B9FC04A331649C77A033980@FMSMSX106.amr.corp.intel.com> <45C2596E6A608A46B0CA10A43A91FE1602FC6068@CORPEXCH1.na.ads.idt.com> <548C5470AAD9DA4A85D259B663190D367EEF@corpmail1.na.ads.idt.com> <548C5470AAD9DA4A85D259B663190D36D5A0@corpmail1.na.ads.idt.com>
Message-ID: <82C9F782B054C94B9FC04A331649C77A0621E3@FMSMSX106.amr.corp.intel.com>

Alex-

Looks good, but I believe our coding style requires {} even with one-liners following if/then/else (see the excerpt below). Please make that minor update and then it looks good from the Intel side.

Thx
Paul

For single-line if statements:

    if () {
        ;
    }
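To make the two fixes concrete, here is a minimal sketch of the completion path as Alex describes it, in the brace style Paul cites; all identifiers (IntxInUse, DrainCplQueue, pCplDoorbell, and friends) are illustrative stand-ins, not the actual nvmeStd.c code:

    /* Illustrative sketch only -- names approximate the patch description. */
    VOID NvmeCompletionDpc(PNVME_DEV_EXT pAE, PVOID pDpc, ULONG MsgId)
    {
        STOR_LOCK_HANDLE lockHandle;
        ULONG oldIrql = 0;
        ULONG qId, newHead;
        BOOLEAN interruptClaimed = FALSE;

        if (pAE->IntxInUse) {
            /* Fix #1: INTx has no per-message MSI spin lock, so taking one
             * from the DPC crashed; acquire the storport DpcLock instead. */
            StorPortAcquireSpinLock(pAE, DpcLock, pDpc, &lockHandle);
        } else {
            StorPortAcquireMSISpinLock(pAE, MsgId, &oldIrql);
        }

        for (qId = 1; qId <= pAE->NumCplQueues; qId++) {
            interruptClaimed = DrainCplQueue(pAE, qId, &newHead); /* hypothetical */
            if (interruptClaimed) {
                StorPortWriteRegisterUlong(pAE, pAE->pCplDoorbell[qId], newHead);
                /* Fix #2: reset before examining the next queue so a stale
                 * TRUE never programs a doorbell with an improper CQ Head
                 * Pointer value. */
                interruptClaimed = FALSE;
            }
        }

        if (pAE->IntxInUse) {
            StorPortReleaseSpinLock(pAE, &lockHandle);
        } else {
            StorPortReleaseMSISpinLock(pAE, MsgId, oldIrql);
        }
    }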
From Alex.Chang at idt.com Fri Mar 2 11:30:47 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Fri, 2 Mar 2012 19:30:47 +0000
Subject: [nvmewin] Patch Review Request
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A0621E3@FMSMSX106.amr.corp.intel.com>
References: <82C9F782B054C94B9FC04A331649C77A033980@FMSMSX106.amr.corp.intel.com> <45C2596E6A608A46B0CA10A43A91FE1602FC6068@CORPEXCH1.na.ads.idt.com> <548C5470AAD9DA4A85D259B663190D367EEF@corpmail1.na.ads.idt.com> <548C5470AAD9DA4A85D259B663190D36D5A0@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A0621E3@FMSMSX106.amr.corp.intel.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D36D5B8@corpmail1.na.ads.idt.com>

Thanks, Paul. Please review the revised one in the attachment.

Alex

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nvmeStd.zip
Type: application/x-zip-compressed
Size: 23888 bytes
Desc: nvmeStd.zip

From paul.e.luse at intel.com Mon Mar 5 07:51:46 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Mon, 5 Mar 2012 15:51:46 +0000
Subject: [nvmewin] Patch Review Request
In-Reply-To: <548C5470AAD9DA4A85D259B663190D36D5B8@corpmail1.na.ads.idt.com>
References: <82C9F782B054C94B9FC04A331649C77A033980@FMSMSX106.amr.corp.intel.com> <45C2596E6A608A46B0CA10A43A91FE1602FC6068@CORPEXCH1.na.ads.idt.com> <548C5470AAD9DA4A85D259B663190D367EEF@corpmail1.na.ads.idt.com> <548C5470AAD9DA4A85D259B663190D36D5A0@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A0621E3@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36D5B8@corpmail1.na.ads.idt.com>
Message-ID: <82C9F782B054C94B9FC04A331649C77A0633F1@FMSMSX106.amr.corp.intel.com>

Thanks Alex - LSI, you guys good?

FYI, I met with some Msft folks last week and learned a few things about the storport performance options. I'll be running some experiments this week and will let you all know what comes of them. I suspect the following for my next patch, though:

- Removal of the MSI address decode method for determining vector/queue mapping.
  It doesn't support logical mode, which can be a function of the OS (and is for Server 8). I prefer the learning method over storport hints for the MSI vector because it's simple and works independently of OS or HW config, as it's based on what is actually happening on the system it's running on.

- Addition of DPC steering back to init; this controls the storport DPC completion, not the miniport's, so it's a benefit regardless of what we do in the miniport. Our DPC will run on the same core that our ISR runs on regardless of this setting.

- I have some experimentation to do with concurrent channels; it's not clear whether this will work for us or not, but I'll let you all know.

I'll be setting up a meeting shortly for next week to review release plans.

Thx
Paul
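For reference, the storport performance options mentioned above are negotiated with StorPortInitializePerfOpts during passive initialization. A sketch of the usual query-then-enable pattern follows; the flag choices mirror the experiments described, not a committed change, and numActiveCores is a hypothetical count:

    PERF_CONFIGURATION_DATA perfData = {0};
    ULONG status;

    perfData.Version = STOR_PERF_VERSION;
    perfData.Size = sizeof(PERF_CONFIGURATION_DATA);

    /* Ask storport which optimizations this OS/adapter combination supports. */
    status = StorPortInitializePerfOpts(pDevExt, TRUE, &perfData);
    if (status == STOR_STATUS_SUCCESS) {
        ULONG supported = perfData.Flags;
        perfData.Flags = 0;

        /* DPC steering: let storport run its completion DPC on the submitting
         * core; the miniport's own DPC/ISR affinity is unaffected. */
        if (supported & STOR_PERF_DPC_REDIRECTION) {
            perfData.Flags |= STOR_PERF_DPC_REDIRECTION;
        }

        /* Concurrent channels: experimental, per the note above. */
        if (supported & STOR_PERF_CONCURRENT_CHANNELS) {
            perfData.Flags |= STOR_PERF_CONCURRENT_CHANNELS;
            perfData.ConcurrentChannels = numActiveCores;
        }

        StorPortInitializePerfOpts(pDevExt, FALSE, &perfData);
    }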
From paul.e.luse at intel.com Mon Mar 5 07:55:15 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Mon, 5 Mar 2012 15:55:15 +0000
Subject: [nvmewin] Review Release Plans for 2012
Message-ID: <82C9F782B054C94B9FC04A331649C77A06341B@FMSMSX106.amr.corp.intel.com>

Tuesday, March 13, 2012, 01:00 PM US Pacific Time
916-356-2663, 8-356-2663, Bridge: 2, Passcode: 9540402
Live Meeting: https://webjoin.intel.com/?passcode=9540402
Speed dialer: inteldialer://2,9540402

Agenda:
- Opens
- Release Schedule for 2012

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/calendar
Size: 1777 bytes
Desc: not available

From Rick.Knoblaugh at lsi.com Mon Mar 5 15:52:48 2012
From: Rick.Knoblaugh at lsi.com (Knoblaugh, Rick)
Date: Mon, 5 Mar 2012 16:52:48 -0700
Subject: [nvmewin] Patch Review Request
References: <82C9F782B054C94B9FC04A331649C77A033980@FMSMSX106.amr.corp.intel.com> <45C2596E6A608A46B0CA10A43A91FE1602FC6068@CORPEXCH1.na.ads.idt.com> <548C5470AAD9DA4A85D259B663190D367EEF@corpmail1.na.ads.idt.com> <548C5470AAD9DA4A85D259B663190D36D5A0@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A0621E3@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36D5B8@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A0633F1@FMSMSX106.amr.corp.intel.com>
Message-ID: <4565AEA676113A449269C2F3A549520FB5DAB25D@cosmail03.lsi.com>

Hi Paul,

I think we're good on this end.

Thanks,
-Rick
From amber.huffman at intel.com Mon Mar 5 16:54:36 2012
From: amber.huffman at intel.com (Huffman, Amber)
Date: Tue, 6 Mar 2012 00:54:36 +0000
Subject: [nvmewin] first release thoughts
In-Reply-To:
References: <82C9F782B054C94B9FC04A331649C77A039820@FMSMSX106.amr.corp.intel.com>, <548C5470AAD9DA4A85D259B663190D3689ED@corpmail1.na.ads.idt.com>
Message-ID:

Hi,

For the first release, can we post the binary at nvmexpress.org (in addition to the location on the OFA website)? What other suggestions does the team have on the rollout of the first release?

Thanks,
Amber

-----Original Message-----
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E
Sent: Tuesday, February 07, 2012 11:14 AM
To: Chang, Alex
Cc: nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] first release thoughts

No. We can't sign it right now. I'll check with OFA to see if there's any precedent though

Sent from my iPhone

On Feb 7, 2012, at 9:49 AM, "Chang, Alex" > wrote:

Hi Paul,
For the first binary release, does openfabrics.org have its own certificate, etc. to sign it?
Thanks,
Alex

________________________________
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E
Sent: Thursday, February 02, 2012 11:30 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] first release thoughts

I believe we agreed on 2 patches prior to first release.
1) Alex's Format PT IOCTL: I understand the review went well, so I suspect that as soon as Rick & Ray take a look at the final patch it will go in.
2) My performance & stability patch: I'll rebase and re-test once Alex's goes in and then send it out for review.

Once mine goes in, I wanted to level-set real quick that we would have at least (more are welcome) IDT, LSI and Intel run the following in their environments and/or on QEMU:
- Iometer script
- BusTRACE SCSI check and busTRACE data integrity (for those who have it)
- Msft SCSI compliance
- Msft sdstress

All of these will be run in the same manner as we ran them before, and we'll document what that means for everyone else before the release and post notes with the release. I don't want to post the tools, though; folks can grab those on their own if they'd like.

I suspect this will put our first release in mid to late Mar. I'll probably schedule a short call around then so we can all confirm that we're ready and review what it is that we're posting for our very first binary release.

Thanks!
Paul

____________________________________
Paul Luse
Sr. Staff Engineer
PCG Server Software Engineering
Desk: 480.554.3688, Mobile: 480.334.4630

From paul.e.luse at intel.com Mon Mar 5 16:58:13 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Tue, 6 Mar 2012 00:58:13 +0000
Subject: [nvmewin] first release thoughts
In-Reply-To:
References: <82C9F782B054C94B9FC04A331649C77A039820@FMSMSX106.amr.corp.intel.com>, <548C5470AAD9DA4A85D259B663190D3689ED@corpmail1.na.ads.idt.com>
Message-ID: <82C9F782B054C94B9FC04A331649C77A0643D4@FMSMSX106.amr.corp.intel.com>

Sure, I don't see why not. We're meeting next week to discuss the build date, test plans and then future builds. I'll make sure everyone is cool with it then, but I doubt there will be anyone who would have a problem.
From paul.e.luse at intel.com Tue Mar 13 14:42:48 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Tue, 13 Mar 2012 21:42:48 +0000
Subject: [nvmewin] OFA NVMe Driver Working Group Minutes 3/13
Message-ID: <82C9F782B054C94B9FC04A331649C77A06AE99@FMSMSX106.amr.corp.intel.com>

Agenda:
- Opens
  o Paul presenting release plans (attached) to NVMe promoters later this week
  o Paul presenting a brief NVMe overview and info on our group at the OFA workshop later this month: https://www.openfabrics.org/press-room/ofa-workshop.html
- 2012 Release Plans
  o See attached
  o Release 1 will be built following review of, and a decision on, inclusion of the 'learning mode' described during the meeting; a patch with details will go out later this week.
  o For 1.1, the Public IOCTLs will be contributed by Intel Corporation
  o For 1.1, the Win8 storport specifics are up for grabs. Public info is available below; we'll use email to discuss what we think we should include in the NVMe driver and who is available to work on each one and contribute: http://msdn.microsoft.com/en-us/library/windows/hardware/hh451200(v=vs.85).aspx

Attendees: [embedded image: image001.png]

____________________________________
Paul Luse
Sr. Staff Engineer
PCG Server Software Engineering
Desk: 480.554.3688, Mobile: 480.334.4630

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 12568 bytes
Desc: image001.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OFA NVMe Driver 2012.pptx
Type: application/vnd.openxmlformats-officedocument.presentationml.presentation
Size: 63783 bytes
Desc: OFA NVMe Driver 2012.pptx
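As background for the Public IOCTLs item in the minutes above: storport miniport private IOCTLs travel as an SRB_IO_CONTROL header followed by a payload. The sketch below shows the general shape of such a wrapper; the struct name and the fields after the header are hypothetical, not the driver's actual pass-through layout:

    #include <ntddscsi.h>

    /* Hypothetical wrapper -- illustrates the convention only. */
    typedef struct _NVME_PT_IOCTL {
        SRB_IO_CONTROL Hdr;        /* Signature + ControlCode route the request */
        ULONG  NvmeCmd[16];        /* 64-byte NVMe command to execute */
        ULONG  CplEntry[4];        /* completion queue entry copied back */
        ULONG  DataLen;
        UCHAR  Data[1];            /* variable-length data buffer follows */
    } NVME_PT_IOCTL, *PNVME_PT_IOCTL;

    /* In HwStartIo, SRB_FUNCTION_IO_CONTROL requests are dispatched by
     * checking Hdr.Signature and then switching on Hdr.ControlCode, which
     * is how public and vendor-private IOCTLs can coexist in one driver. */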
From paul.e.luse at intel.com Wed Mar 14 11:29:27 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Wed, 14 Mar 2012 18:29:27 +0000
Subject: [nvmewin] PT doc
Message-ID: <82C9F782B054C94B9FC04A331649C77A08C578@FMSMSX106.amr.corp.intel.com>

Alex-

I know I asked you this a while back, and sorry if I dropped the ball, but can you send the latest PT doc out again? I'll push this time for reviewer feedback so we can get it published with the code.

Thx
Paul

____________________________________
Paul Luse
Sr. Staff Engineer
PCG Server Software Engineering
Desk: 480.554.3688, Mobile: 480.334.4630

From paul.e.luse at intel.com Fri Mar 16 09:41:34 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Fri, 16 Mar 2012 16:41:34 +0000
Subject: [nvmewin] ***UNCHECKED*** Learning Mode Patch for Review
Message-ID: <82C9F782B054C94B9FC04A331649C77A08E690@FMSMSX106.amr.corp.intel.com>

Note that I merged Alex's patch in here, as it's still pending with Ray but is only a half-dozen lines of code; rather than have Ray create two tags and do two merges, I just rolled them together so Ray can process one patch. Normally I wouldn't recommend this, but the pending patch is so small it just makes sense. The password is ofanvme123. I can schedule a call to walk through any of this if anyone would like.

Learning Mode: pretty easy; it's only enabled if we're in an optimal config wrt cores/queues/vectors. Assume we have N processors; it works like this: on startup, the first N IOs will be sent to queues #1..N sequentially. Each queue is created with a matching MSI ID, so in this manner we assure that we hit every queue and every message ID by incrementing the queue # for the first N IOs regardless of the submitting core. On the completion side, we simply look at the core that we completed on and update the table for the completing core such that the correct queues are used the next time an IO submits to this core.
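A condensed sketch of the flow just described, using simplified stand-in names for the core table and counters (the real code spreads this logic across NVMeMapCore2Queue() and the completion DPC, per the change list below):

    /* Simplified core-table entry -- illustrative, not the driver's struct. */
    typedef struct _CORE_TBL_ENTRY {
        USHORT SubQueue;   /* submission queue for IOs issued on this core */
        USHORT CplQueue;   /* completion queue (created with MSI ID == queue #) */
    } CORE_TBL_ENTRY;

    /* Submission side: pick the queue for the submitting core. */
    USHORT MapCoreToQueue(CORE_TBL_ENTRY *tbl, ULONG learned,
                          ULONG totalCores, ULONG coreNum)
    {
        if (learned >= totalCores) {
            return tbl[coreNum].SubQueue;     /* learned mapping, no search */
        }
        /* Still learning: the first N IOs walk queues 1..N in order, and
         * every queue was created with MSI ID == queue #, so each message
         * ID fires exactly once regardless of the submitting core. */
        return (USHORT)(learned + 1);
    }

    /* Completion side (DPC/ISR): record where this queue's IOs complete. */
    VOID LearnCore(CORE_TBL_ENTRY *tbl, ULONG completingCore,
                   ULONG msgId, ULONG *learned)
    {
        tbl[completingCore].SubQueue = (USHORT)msgId; /* queue # == MsgId */
        tbl[completingCore].CplQueue = (USHORT)msgId;
        (*learned)++;            /* the next IO targets the next queue */
    }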
Testing (in all cases I confirmed functionality, and via iometer and xperf I confirmed core load balancing):
- Chatham (DT with 8 cores): Win8 Server public beta, 2008R2
- QEMU (configured for 2 cores): Win7-64
- Tools: the standard stuff: format, stress, iometer, SCSI compliance, shutdown/restart

Changes:

nvmeStd.h:
- Comment changes as needed
- Removed the logical-mode element from the msgTable as it's no longer used
- Removed the DBG directive from the procNum element; it's now required for learning mode
- Removed the proto for NVMeMsixMapCores, not used anymore (combined the MSI and MSIX routines, more later)

nvmeIo.c:
- Removed the DBG directive from the procNum element; it's now required for learning mode

nvmeInit.c:
- Removed NVMeSwap(), NVMeQuickSort(), NVMeMsixMapCores() as they're no longer used
- Changes to NVMeMsiMapCores():
  o Set the initial core table MsgID to the CplQ number; this is needed so that we can determine the CplQ from the MsgId during learning mode while in the DPC/ISR
  o The other changes are just code simplifications
- Changed NVMeCompleteResMapTbl() to use NVMeMsiMapCores() for either MSI or MSIX
- Changed NVMeMapCore2Queue():
  o We now check if we're in learning mode; if not, then we simply look up the queue num from the core table
  o If we're in learning mode (based on a simple count of how many cores we've learned vs total available cores), then we use the number of the core that we're learning + 1; the +1 is because all queue numbers are core+1 by our convention
- Change in NVMeAllocIoQueues() to effectively disable learning mode if we only have 1 queue (it makes no sense and actually causes problems for learning mode if we don't do this). We disable it by pretending that we've already learned all the cores

nvmeStd.c:
- In NVMePassiveInitialize(), disable learning mode if we're sharing one MSI over multiple queues, for the same reasons as when we have one queue
- In NVMeInitialize():
  o Enable the DPC perf opt per Msft recommendation to steer storport completion DPCs back to the submitting core
- In NVMeStartIo(): a bug fix unrelated to learning mode, but I found it while debugging learning mode (via BSOD). You can't return FALSE from this function per the MSDN docs; always return TRUE
- In IoCompletionDpcRoutine(), with the same changes in the ISR for when it's enabled for completions instead:
  o Merged Alex's bugfixes in with my changes
  o Removed the DBG-related code for checking core affiliation
  o Where we decide which queues to check, I set a Boolean that determines whether we're learning to FALSE if we're in shared mode, because we disable learning mode during init in that case
  o If we're not shared, the learning Boolean is set based on how many cores we've learned and whether the MsgId is >0, as MsgId 0 is admin and we exclude that from learning mode
  o If we're not learning, then we only search the queue specified in the MMT
  o If we are learning, we know the queue # is the same as the MsgId because we init'd it that way
  o The 'learning' happens in a new conditional just after we determine we have an srbExt. It works as follows:
    - Grab the startIO lock, as we're sharing the core table with startIO, and during learning mode we're not yet assured that start/complete are on the same core. Note the lock is only taken on IOs during learning mode (the first few IOs)
    - Look up the CT entry for the core that we're completing on and set its queue numbers to the queue number that was associated with the IO that just completed.
      This assures that the next lookup in the table for this core # will complete on this, the same, core.
    - Increment our learning counter, which will direct the next IO to the next core
- Unrelated changes to NVMeProcessIoctl(): a few changes were made here, as the routine assumed every IOCTL we'd get would be a PT IOCTL, making it difficult for vendors to add additional private IOCTLs. Just moved things around a bit, as we had to add one to our product-specific code. No other changes here other than placing IOCTL-specific code in the correct case block

____________________________________
Paul Luse
Sr. Staff Engineer
PCG Server Software Engineering
Desk: 480.554.3688, Mobile: 480.334.4630

-------------- next part --------------
A non-text attachment was scrubbed...
Name: source.zip
Type: application/x-zip-compressed
Size: 155940 bytes
Desc: source.zip

From paul.e.luse at intel.com Fri Mar 16 09:44:10 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Fri, 16 Mar 2012 16:44:10 +0000
Subject: [nvmewin] storport assertions in Win8 Checked public server beta
Message-ID: <82C9F782B054C94B9FC04A331649C77A08E6EB@FMSMSX106.amr.corp.intel.com>

FYI, I found an assertion that we seem to be causing in storport; however, there's not enough info for me to determine what it is, either with storport symbols or with prints. I have sent an email to a Msft contact for maybe some hint as to what it could be. If anyone else cares to give it a shot, please feel free :)

____________________________________
Paul Luse
Sr. Staff Engineer
PCG Server Software Engineering
Desk: 480.554.3688, Mobile: 480.334.4630

From Alex.Chang at idt.com Fri Mar 16 10:49:58 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Fri, 16 Mar 2012 17:49:58 +0000
Subject: [nvmewin] Learning Mode Patch for Review
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A08E690@FMSMSX106.amr.corp.intel.com>
References: <82C9F782B054C94B9FC04A331649C77A08E690@FMSMSX106.amr.corp.intel.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D36E818@corpmail1.na.ads.idt.com>

Hi Paul,

I have a question regarding the Learning Mode. When you finish the learning and find out the mappings between cores and vectors, do we need to delete the created Completion Queues and re-create them? When the driver creates the Completion Queues before learning, it specifies the associated vector for each Completion Queue based on assumptions about the mappings between Completion Queues and vectors. After the learning, the associations between Completion Queues and vectors need to be corrected by deleting the queues and re-creating them. Correct me if I am wrong.

Thanks,
Alex
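For context on the point Alex raises: the interrupt vector for a completion queue is fixed at creation time (the Create I/O Completion Queue admin command carries it in Dword 11), so re-binding a CQ to the learned vector does mean deleting and re-creating the queue. A sketch of the field packing per the NVMe 1.0 spec, with a simplified command struct (the surrounding helpers and names are hypothetical):

    /* Simplified 64-byte NVMe admin command layout. */
    struct nvme_cmd {
        ULONG cdw0, nsid;
        ULONGLONG rsvd, mptr, prp1, prp2;
        ULONG cdw10, cdw11, cdw12, cdw13, cdw14, cdw15;
    };

    /* To re-bind after learning: issue Delete I/O CQ (opcode 0x04) for qId,
     * then re-create it with the vector that learning discovered. */
    void BuildCreateIoCq(struct nvme_cmd *cmd, USHORT qId, USHORT qSize,
                         USHORT learnedVector, ULONGLONG cqPhysAddr)
    {
        cmd->cdw0  = 0x05;                             /* OPC = Create I/O CQ */
        cmd->prp1  = cqPhysAddr;                       /* queue memory */
        cmd->cdw10 = ((ULONG)(qSize - 1) << 16) | qId; /* QSIZE (0-based) | QID */
        cmd->cdw11 = ((ULONG)learnedVector << 16)      /* IV: learned vector */
                   | (1 << 1)                          /* IEN: interrupts on */
                   | (1 << 0);                         /* PC: phys. contiguous */
    }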
From james.r.harris at intel.com Fri Mar 16 10:58:30 2012
From: james.r.harris at intel.com (Harris, James R)
Date: Fri, 16 Mar 2012 17:58:30 +0000
Subject: [nvmewin] Win8 assertion message
In-Reply-To:
References:
Message-ID:

Paul,

I didn't see the assertion message come through. Can you paste it into your message?

Thanks,
-Jim

From paul.e.luse at intel.com Fri Mar 16 11:00:48 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Fri, 16 Mar 2012 18:00:48 +0000
Subject: [nvmewin] Win8 assertion message
In-Reply-To:
References:
Message-ID: <82C9F782B054C94B9FC04A331649C77A08E9B6@FMSMSX106.amr.corp.intel.com>

Sorry...
this output starts right as we return true from passiveInit:

nvme!NVMePassiveInitialize+0x3cb:
fffff880`051a8cdb ebc9            jmp     nvme!NVMePassiveInitialize+0x396 (fffff880`051a8ca6)
7: kd> p
Breakpoint 2 hit
nvme!NVMePassiveInitialize+0x396:
fffff880`051a8ca6 488b442448      mov     rax,qword ptr [rsp+48h]
7: kd> p
nvme!NVMePassiveInitialize+0x3bc:
fffff880`051a8ccc ba88130000      mov     edx,1388h
7: kd> p
nvme!NVMePassiveInitialize+0x3cb:
fffff880`051a8cdb ebc9            jmp     nvme!NVMePassiveInitialize+0x396 (fffff880`051a8ca6)
7: kd> p
Breakpoint 2 hit
nvme!NVMePassiveInitialize+0x396:
fffff880`051a8ca6 488b442448      mov     rax,qword ptr [rsp+48h]
7: kd> p
nvme!NVMePassiveInitialize+0x3bc:
fffff880`051a8ccc ba88130000      mov     edx,1388h
7: kd> p
nvme!NVMePassiveInitialize+0x3cb:
fffff880`051a8cdb ebc9            jmp     nvme!NVMePassiveInitialize+0x396 (fffff880`051a8ca6)
7: kd> p
Breakpoint 2 hit
nvme!NVMePassiveInitialize+0x396:
fffff880`051a8ca6 488b442448      mov     rax,qword ptr [rsp+48h]
7: kd> g
Breakpoint 0 hit
nvme!NVMeRunningWaitOnIoSQ:
fffff880`051ad900 48894c2408      mov     qword ptr [rsp+8],rcx
0: kd> g
Breakpoint 3 hit
nvme!NVMePassiveInitialize+0x3cd:
fffff880`051a8cdd 488b442448      mov     rax,qword ptr [rsp+48h]
7: kd> p
nvme!NVMePassiveInitialize+0x3f6:
fffff880`051a8d06 4883c468        add     rsp,68h
7: kd> p
storport!RaidAdapterStartMiniport+0x244:
fffff880`01700840 f6d8            neg     al
7: kd> p
storport!RaidAdapterStartMiniport+0x246:
fffff880`01700842 1bff            sbb     edi,edi
7: kd> p
storport!RaidAdapterStartMiniport+0x248:
fffff880`01700844 f7d7            not     edi
7: kd> p
storport!RaidAdapterStartMiniport+0x24a:
fffff880`01700846 81e7010000c0    and     edi,0C0000001h
7: kd> p
storport!RaidAdapterStartMiniport+0x250:
fffff880`0170084c 85ff            test    edi,edi
7: kd> p
storport!RaidAdapterStartMiniport+0x252:
fffff880`0170084e 78c6            js      storport!RaidAdapterStartMiniport+0x21a (fffff880`01700816)
7: kd> p
storport!RaidAdapterStartMiniport+0x254:
fffff880`01700850 488bcb          mov     rcx,rbx
7: kd> p
storport!RaidAdapterStartMiniport+0x257:
fffff880`01700853 e8c0010000      call    storport!RaidInitializePerfOptsPassive (fffff880`01700a18)
7: kd> p
storport!RaidAdapterStartMiniport+0x25c:
fffff880`01700858 440fb69b68010000 movzx  r11d,byte ptr [rbx+168h]
7: kd> p
storport!RaidAdapterStartMiniport+0x264:
fffff880`01700860 4181fbff000000  cmp     r11d,0FFh
7: kd> p
storport!RaidAdapterStartMiniport+0x26b:
fffff880`01700867 7602            jbe     storport!RaidAdapterStartMiniport+0x26f (fffff880`0170086b)
7: kd> p
storport!RaidAdapterStartMiniport+0x26f:
fffff880`0170086b 418bfb          mov     edi,r11d
7: kd> p
storport!RaidAdapterStartMiniport+0x272:
fffff880`0170086e c1ef05          shr     edi,5
7: kd> p
storport!RaidAdapterStartMiniport+0x275:
fffff880`01700871 41f6c31f        test    r11b,1Fh
7: kd> p
storport!RaidAdapterStartMiniport+0x279:
fffff880`01700875 7403            je      storport!RaidAdapterStartMiniport+0x27e (fffff880`0170087a)
7: kd> p
storport!RaidAdapterStartMiniport+0x27b:
fffff880`01700877 4103fe          add     edi,r14d
7: kd> p
storport!RaidAdapterStartMiniport+0x27e:
fffff880`0170087a 8bf7            mov     esi,edi
7: kd> p
storport!RaidAdapterStartMiniport+0x280:
fffff880`0170087c 41b85261564d    mov     r8d,4D566152h
7: kd> p
storport!RaidAdapterStartMiniport+0x286:
fffff880`01700882 b900020000      mov     ecx,200h
7: kd> p
storport!RaidAdapterStartMiniport+0x28b:
fffff880`01700887 48c1e602        shl     rsi,2
7: kd> p
storport!RaidAdapterStartMiniport+0x28f:
fffff880`0170088b 488bd6          mov     rdx,rsi
7: kd> p
storport!RaidAdapterStartMiniport+0x292:
fffff880`0170088e ff1584970200    call    qword ptr [storport!_imp_ExAllocatePoolWithTag (fffff880`0172a018)]
7: kd> p
storport!RaidAdapterStartMiniport+0x298:
fffff880`01700894 48898350120000  mov     qword ptr [rbx+1250h],rax
7: kd> p
storport!RaidAdapterStartMiniport+0x29f:
fffff880`0170089b 4885c0          test    rax,rax
7: kd> p
storport!RaidAdapterStartMiniport+0x2a2:
fffff880`0170089e 742b            je      storport!RaidAdapterStartMiniport+0x2cf (fffff880`017008cb)
7: kd> p
storport!RaidAdapterStartMiniport+0x2a4:
fffff880`017008a0 4c8bc6          mov     r8,rsi
7: kd> p
storport!RaidAdapterStartMiniport+0x2a7:
fffff880`017008a3 33d2            xor     edx,edx
7: kd> p
storport!RaidAdapterStartMiniport+0x2a9:
fffff880`017008a5 488bc8          mov     rcx,rax
7: kd> p
storport!RaidAdapterStartMiniport+0x2ac:
fffff880`017008a8 e8134b0200      call    storport!memset (fffff880`017253c0)
7: kd> p
storport!RaidAdapterStartMiniport+0x2b1:
fffff880`017008ad 488b9350120000  mov     rdx,qword ptr [rbx+1250h]
7: kd> p
storport!RaidAdapterStartMiniport+0x2b8:
fffff880`017008b4 c1e705          shl     edi,5
7: kd> p
storport!RaidAdapterStartMiniport+0x2bb:
fffff880`017008b7 488d8b40120000  lea     rcx,[rbx+1240h]
7: kd> p
storport!RaidAdapterStartMiniport+0x2c2:
fffff880`017008be 448bc7          mov     r8d,edi
7: kd> p
storport!RaidAdapterStartMiniport+0x2c5:
fffff880`017008c1 ff1529980200    call    qword ptr [storport!_imp_RtlInitializeBitMap (fffff880`0172a0f0)]
7: kd> p
storport!RaidAdapterStartMiniport+0x2cb:
fffff880`017008c7 33c0            xor     eax,eax
7: kd> p
storport!RaidAdapterStartMiniport+0x2cd:
fffff880`017008c9 eb05            jmp     storport!RaidAdapterStartMiniport+0x2d4 (fffff880`017008d0)
7: kd> p
storport!RaidAdapterStartMiniport+0x2d4:
fffff880`017008d0 488b9c2488000000 mov    rbx,qword ptr [rsp+88h]
7: kd> p
storport!RaidAdapterStartMiniport+0x2dc:
fffff880`017008d8 4883c450        add     rsp,50h
7: kd> p
storport!RaidAdapterStartMiniport+0x2e0:
fffff880`017008dc 415e            pop     r14
7: kd> p
storport!RaidAdapterStartMiniport+0x2e2:
fffff880`017008de 5f              pop     rdi
7: kd> p
storport!RaidAdapterStartMiniport+0x2e3:
fffff880`017008df 5e              pop     rsi
7: kd> p
storport!RaidAdapterStartMiniport+0x2e4:
fffff880`017008e0 c3              ret
7: kd> p
storport!RaidAdapterStartDeviceIrp+0x1db:
fffff880`0173182f 8bf8            mov     edi,eax
7: kd> p
storport!RaidAdapterStartDeviceIrp+0x1dd:
fffff880`01731831 85c0            test    eax,eax
7: kd> p
storport!RaidAdapterStartDeviceIrp+0x1df:
fffff880`01731833 7891            js      storport!RaidAdapterStartDeviceIrp+0x172 (fffff880`017317c6)
7: kd> p
storport!RaidAdapterStartDeviceIrp+0x1e1:
fffff880`01731835 488bcb          mov     rcx,rbx
7: kd> p
storport!RaidAdapterStartDeviceIrp+0x1e4:
fffff880`01731838 4584f6          test    r14b,r14b
7: kd> p
storport!RaidAdapterStartDeviceIrp+0x1e7:
fffff880`0173183b 7524            jne     storport!RaidAdapterStartDeviceIrp+0x20d (fffff880`01731861)
7: kd> p
storport!RaidAdapterStartDeviceIrp+0x1e9:
fffff880`0173183d e8eefafcff      call    storport!RaidAdapterCompleteInitialization (fffff880`01701330)
7: kd> p
Assertion: RaidIsRegionInitialized(&Adapter->UncachedExtension)
storport!StorAllocateContiguousIoResources+0x2d:
fffff880`017184a1 cd2c            int     2Ch

From james.r.harris at intel.com Fri Mar 16 11:21:39 2012
From: james.r.harris at intel.com (Harris, James R)
Date: Fri, 16 Mar 2012 18:21:39 +0000
Subject: [nvmewin] Win8 assertion message
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A08E9B6@FMSMSX106.amr.corp.intel.com>
References: <82C9F782B054C94B9FC04A331649C77A08E9B6@FMSMSX106.amr.corp.intel.com>
Message-ID:

You may need to call GetUncachedExtension, even if you're not going to use it. IIRC, on Win7 Storport would allocate the DMA adapter object during the GetUncachedExtension context. Your adapter likely doesn't have any DMA restrictions, so Storport probably doesn't really need the DMA adapter object, which is why everything works without the call.

This is all guessing though - I did some quick searches on the online SVN repo and didn't see any calls to GetUncachedExtension, but I didn't look especially hard...

-Jim
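A sketch of what Jim suggests, as a call added during HwStorFindAdapter; the size argument is arbitrary here, since the point is the side effect of initializing the region rather than using the buffer:

    /* In HwStorFindAdapter, after PORT_CONFIGURATION_INFORMATION is set up: */
    PVOID pUncached;

    pUncached = StorPortGetUncachedExtension(pDevExt, pPortCfgInfo, PAGE_SIZE);
    if (pUncached == NULL) {
        return SP_RETURN_ERROR;   /* region could not be initialized */
    }
    /* The buffer itself can go unused; the call's side effect is that
     * storport initializes the uncached-extension region (and, per Jim's
     * recollection, the DMA adapter object on Win7), which the Win8 checked
     * build asserts on in StorAllocateContiguousIoResources. */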
From paul.e.luse at intel.com Fri Mar 16 11:56:49 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Fri, 16 Mar 2012 18:56:49 +0000
Subject: [nvmewin] Win8 assertion message
In-Reply-To:
References: <82C9F782B054C94B9FC04A331649C77A08E9B6@FMSMSX106.amr.corp.intel.com>
Message-ID: <82C9F782B054C94B9FC04A331649C77A08EC6F@FMSMSX106.amr.corp.intel.com>

Worth a shot, thanks Jim.
> >Thanks, > >-Jim > >_______________________________________________ >nvmewin mailing list >nvmewin at lists.openfabrics.org >http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin From paul.e.luse at intel.com Fri Mar 16 13:37:55 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Fri, 16 Mar 2012 20:37:55 +0000 Subject: [nvmewin] Learning Mode Patch for Review In-Reply-To: <548C5470AAD9DA4A85D259B663190D36E818@corpmail1.na.ads.idt.com> References: <82C9F782B054C94B9FC04A331649C77A08E690@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36E818@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A08EE19@FMSMSX106.amr.corp.intel.com> You know what, you're right. I was thinking we had a match on the CQ side even after learning, which is the case for making sure that we're building the table such that, after learning, our DPC knows exactly which queue to look in w/o searching them; but we're not NUMA optimized on the CQ at that point. Missed the forest for the trees there for a minute, but there are a lot of trees :) I believe I can address this pretty quick and easy, and it will actually be equivalent in attributes to when we had physical mode and MSI address decomp as our method. Not today though, have to head out soon, but little, if any, of this code will change; I just need to go ahead and add code early on where we submit the N test IOs and then re-do the queues and turn off learning. Thanks for giving this some extra thought! Look for a new patch early next week... Thx Paul From: Chang, Alex [mailto:Alex.Chang at idt.com] Sent: Friday, March 16, 2012 10:50 AM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: Learning Mode Patch for Review Hi Paul, I have a question regarding the Learning Mode. When you finish the learning and find out the mappings between cores and vectors, do we need to delete the created Completion Queues and re-create them? When the driver creates the Completion Queues before learning, it specifies the associated vector for each Completion Queue with some assumptions about the mappings between Completion Queues and vectors. After the learning, the associations between Completion Queues and vectors need to be corrected by deleting the queues and re-creating them. Correct me if I am wrong. Thanks, Alex ________________________________ From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Friday, March 16, 2012 9:42 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** Learning Mode Patch for Review Importance: High Note that I merged Alex's patch in here as it's still pending with Ray, but it's only a half-dozen lines of code, so rather than have Ray create two tags and do 2 merges I just rolled them together so Ray can process one patch. Normally I wouldn't recommend this, but the pending patch is so small it just makes sense. Password is ofanvme123. I can schedule a call to walk through any of this if anyone would like. Learning Mode: Pretty easy, it's only enabled if we're in an optimal config wrt cores/queues/vectors. Assume we have N processors; it works like this.... On startup the first N IOs will be sent to queues 1..N sequentially. Each queue is created with a matching MSI ID, so in this manner we're assured of hitting every queue and every message ID by incrementing the queue # for the first N IOs regardless of the submitting core.
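In rough code the submission-side pick amounts to this (sketch only, not the actual patch - the type and field names below are stand-ins for what's really in nvmeStd.h):

    /* Sketch of the learning-mode path in NVMeMapCore2Queue(); names are
     * approximations of the real core table in the resource mapping table. */
    typedef struct _CORE_TBL {
        USHORT SubQueue;   /* SQ this core submits to */
        USHORT CplQueue;   /* CQ completions for this core land in */
        USHORT MsgID;      /* MSI message tied to that CQ */
    } CORE_TBL, *PCORE_TBL;

    typedef struct _NVME_DEV_EXT {
        ULONG    LearnedCores;    /* how many cores learned so far */
        ULONG    NumActiveCores;  /* cores that own a queue pair */
        CORE_TBL CoreTbl[64];     /* per-core mapping table */
    } NVME_DEV_EXT, *PNVME_DEV_EXT;

    VOID NVMeMapCore2Queue(PNVME_DEV_EXT pAE, ULONG CoreNum,
                           PUSHORT pSubQueue, PUSHORT pCplQueue)
    {
        if (pAE->LearnedCores < pAE->NumActiveCores) {
            /* Still learning: IO n goes to queue n+1 no matter which core
             * submitted it, so the first N IOs walk every queue (and every
             * MSI ID, since queue # == MSI ID at this point). */
            *pSubQueue = *pCplQueue = (USHORT)(pAE->LearnedCores + 1);
        } else {
            /* Learned: use whatever the completion side recorded. */
            *pSubQueue = pAE->CoreTbl[CoreNum].SubQueue;
            *pCplQueue = pAE->CoreTbl[CoreNum].CplQueue;
        }
    }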
On the completion side we simply look at the core that we completed on and update the table for the completing core such that the correct queues are used the next time an IO submits on this core. Testing: (in all cases I confirmed functionality, and via iometer and xperf I confirmed core load balancing) - Chatham (DT with 8 cores): Win8 Server public beta, 2008R2 - QEMU (configured for 2 cores): Win7-64 - Tools: The standard stuff: format, stress, iometer, SCSI compliance, shutdown/restart Changes: nvmeStd.h: - Comment changes as needed - Removed the logicalMode element from the msgTable as it's no longer used - Removed DBG directive from procNum element, it's now required for learning mode - Removed proto for NVMeMsixMapCores, not used anymore (combined MSI and MSIX routines, more later) nvmeIo.c: - Removed DBG directive from procNum element, it's now required for learning mode nvmeInit.c - Removed NVMeSwap(), NVMeQuickSort(), NVMeMsixMapCores() as they're no longer used - Changes to NVMeMsiMapCores(): o Set the initial core table MsgID to the CplQ number; this is needed so that we can determine the CplQ from the MsgId during learning mode while in the DPC/ISR o The other changes are just code simplifications - Changed NVMeCompleteResMapTbl() to use NVMeMsiMapCores() for either MSI or MSIX - Changed NVMeMapCore2Queue(): o We now check if we're in learning mode; if not, we simply look up the queue num from the core table o If we're in learning mode (based on a simple count of how many cores we've learned vs total available cores), then we use the core number that we're on (that we're learning) + 1; the +1 is because all queue numbers are core+1 by our convention - Change in NVMeAllocIoQueues() to effectively disable learning mode if we only have 1 queue (it makes no sense and actually causes problems for learning mode if we don't do this). We disable it by pretending that we've already learned all the cores nvmeStd.c: - In NVMePassiveInitialize(), disable learning mode if we're sharing one MSI over multiple queues, same reasons as when we have one queue - In NVMeInitialize(): o Enable DPC perf opt per Msft recommendation to steer Storport completion DPCs back to the submitting core - In NVMeStartIo() - bug fix unrelated to learning mode but I found it while debugging learning mode (via BSOD). You can't return FALSE from this function per MSDN docs. Always return TRUE - In IoCompletionDpcRoutine() and same changes in the ISR when it's enabled for completions instead: o Merged Alex's bugfixes in with my changes o Removed the DBG related code for checking core affiliation o Where we decide which queues to check, I set a Boolean that determines if we're learning or not to FALSE if we're in shared mode, because we disable learning mode during init if that's the case o If we're not shared, the learning Boolean is set based on how many cores we've learned and whether the MsgId is >0, as MsgId 0 is admin and we exclude that from learning mode o If we're not learning, then we only search the queue specified in the MMT o If we are learning, we know the queue # is the same as the MsgId because we init'd it that way o The 'learning' happens in a new conditional just after we determine we have an srbExt. It works as follows (rough sketch below): * Grab the startIO lock, as we're sharing the core table with startIO and during learning mode we're not yet assured that start/complete are on the same core. Note the lock is only taken on IOs during learning mode (the first few IOs) * Look up the CT entry for the core that we're completing on and set its queue numbers to the queue number that was associated with the IO that just completed. This assures that the next lookup in the table for this core # will complete on this same core. * Increment our learning counter, which will direct the next IO to the next queue
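In code terms the learning branch is roughly the following (again just a sketch; locals and field names are approximate, and the CORE_TBL/LearnedCores names match the earlier sketch):

    /* Sketch of the learning branch in IoCompletionDpcRoutine(); assumes
     * the srbExt remembers which queue (== MsgId here) the IO completed on. */
    if (learning && pSrbExt != NULL) {
        ULONG core = KeGetCurrentProcessorNumberEx(NULL);  /* completing core */
        PCORE_TBL pCT = &pAE->CoreTbl[core];
        STOR_LOCK_HANDLE lockHandle = { 0 };

        /* The core table is shared with StartIo until learning finishes. */
        StorPortAcquireSpinLock(pAE, StartIoLock, NULL, &lockHandle);
        pCT->SubQueue = pCT->CplQueue = pSrbExt->QueueNum; /* pin core->queue */
        pAE->LearnedCores++;        /* steers the next IO to the next queue */
        StorPortReleaseSpinLock(pAE, &lockHandle);
    }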
- Unrelated changes to NVMeProcessIoctl(): A few changes were made here as the routine assumed every IOCTL we'd get would be a PT IOCTL, making it difficult for vendors to add additional private IOCTLs. Just moved things around a bit as we had to add one to our product specific code. No other changes here other than placing IOCTL specific code in the correct case block - ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alex.Chang at idt.com Fri Mar 16 13:42:45 2012 From: Alex.Chang at idt.com (Chang, Alex) Date: Fri, 16 Mar 2012 20:42:45 +0000 Subject: [nvmewin] Learning Mode Patch for Review In-Reply-To: <82C9F782B054C94B9FC04A331649C77A08EE19@FMSMSX106.amr.corp.intel.com> References: <82C9F782B054C94B9FC04A331649C77A08E690@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36E818@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A08EE19@FMSMSX106.amr.corp.intel.com> Message-ID: <548C5470AAD9DA4A85D259B663190D36E888@corpmail1.na.ads.idt.com> Thank you very much for all the efforts, Paul. Regards, Alex ________________________________ From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Friday, March 16, 2012 1:38 PM To: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Learning Mode Patch for Review [...] -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Mon Mar 19 10:57:14 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Mon, 19 Mar 2012 17:57:14 +0000 Subject: [nvmewin] status of learning mode patch Message-ID: <82C9F782B054C94B9FC04A331649C77A0907E9@FMSMSX106.amr.corp.intel.com> Quick update: I finished the code this weekend and will probably not have time to test until tomorrow or Wed. Here's basically what I did: I handle all learning (including q delete/recreate) as part of the init state machine by adding 2 new states at the end; one submits a flush through each queue (FLUSH is a mandatory command so I hope no IHV has an issue with it). The last state then destroys and recreates the queues using the knowledge learned. New code over the last patch is fairly minimal - some new flags to control things and a change to the IO path to accommodate synchronous (but still INT driven) IO for the Q deletions, to avoid more mini-states inside of a larger state (plus the shutdown routine needed this as well). I'll measure the impact (extra time before we're ready) and doubt it will be an issue, but if it is I think the backup would be to keep things as they are; in that case I'll create one temp Q for early IOs, return from passive init at the point in the init state machine when that queue is ready, and then, once the full state machine is done, the IO path will auto-switch over to the new queues via a simple flag.
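Conceptually the new flush state boils down to something like this (sketch from memory, not the diff - the command struct and helper names here are made up):

    /* Sketch of the new learning state: push one FLUSH down each IO queue
     * so every CQ (and thus every MSI vector) fires exactly once. */
    USHORT qId;
    for (qId = 1; qId <= pAE->NumIoQueues; qId++) {
        NVMe_COMMAND cmd = { 0 };

        cmd.CDW0.OPC = 0x00;           /* FLUSH, mandatory NVM command */
        cmd.NSID = 1;                  /* any valid namespace works */
        NVMeIssueCmd(pAE, qId, &cmd);  /* copy into SQ qId, ring its doorbell */
    }
    /* The DPC records which core each completion arrives on; once all
     * queues have reported, the last state deletes/re-creates the CQs with
     * the learned core<->vector pairing and clears the learning flag. */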
I'll probably just schedule a meeting to walk through it to at least get Alex/Kwok's feedback on the phone but everyone, of course, is welcome and encouraged to attend. Thx Paul ____________________________________ Paul Luse Sr. Staff Engineer PCG Server Software Engineering Desk: 480.554.3688, Mobile: 480.334.4630 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kwok.Kong at idt.com Tue Mar 20 09:45:24 2012 From: Kwok.Kong at idt.com (Kong, Kwok) Date: Tue, 20 Mar 2012 16:45:24 +0000 Subject: [nvmewin] status of learning mode patch In-Reply-To: <82C9F782B054C94B9FC04A331649C77A0907E9@FMSMSX106.amr.corp.intel.com> References: <82C9F782B054C94B9FC04A331649C77A0907E9@FMSMSX106.amr.corp.intel.com> Message-ID: <05CD7821AE397547A01AC160FBC23147020686@corpmail1.na.ads.idt.com> Paul, Please set up a meeting to walk through the code change. Thanks -Kwok From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Monday, March 19, 2012 10:57 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] status of learning mode patch [...] -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Wed Mar 21 16:59:38 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Wed, 21 Mar 2012 23:59:38 +0000 Subject: [nvmewin] status of learning mode patch In-Reply-To: <05CD7821AE397547A01AC160FBC23147020686@corpmail1.na.ads.idt.com> References: <82C9F782B054C94B9FC04A331649C77A0907E9@FMSMSX106.amr.corp.intel.com> <05CD7821AE397547A01AC160FBC23147020686@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A0973EF@FMSMSX106.amr.corp.intel.com> For sure, I haven't forgotten - just got diverted this week with some unrelated activities. Should be able to get back on this tomorrow though so will look for a time early next week.
Thx Paul From: Kong, Kwok [mailto:Kwok.Kong at idt.com] Sent: Tuesday, March 20, 2012 9:45 AM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: status of learning mode patch Paul, Please set up a meeting to walk through the code change. Thanks -Kwok From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Monday, March 19, 2012 10:57 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] status of learning mode patch [...] -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.e.luse at intel.com Thu Mar 22 09:32:10 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Thu, 22 Mar 2012 16:32:10 +0000 Subject: [nvmewin] NVMe Windows Driver WG SCSI Translation Overview Message-ID: <82C9F782B054C94B9FC04A331649C77A097942@FMSMSX106.amr.corp.intel.com> Ray Robles will be providing an overview of the Windows NVMe Driver SCSI Translations Friday, April 06, 2012, 10:00 AM US Pacific Time 916-356-2663, 8-356-2663, Bridge: 4, Passcode: 8119640 Live Meeting: https://webjoin.intel.com/?passcode=8119640 Speed dialer: inteldialer://4,8119640 | Learn more -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 1921 bytes Desc: not available URL: From Alex.Chang at idt.com Fri Mar 23 11:54:45 2012 From: Alex.Chang at idt.com (Chang, Alex) Date: Fri, 23 Mar 2012 18:54:45 +0000 Subject: [nvmewin] Win8 assertion message In-Reply-To: <82C9F782B054C94B9FC04A331649C77A08EC6F@FMSMSX106.amr.corp.intel.com> References: <82C9F782B054C94B9FC04A331649C77A08E9B6@FMSMSX106.amr.corp.intel.com> <82C9F782B054C94B9FC04A331649C77A08EC6F@FMSMSX106.amr.corp.intel.com> Message-ID: <548C5470AAD9DA4A85D259B663190D36EC55@corpmail1.na.ads.idt.com> Hi Paul, I just installed Windows 8 (Consumer Preview, 64-bit) and tried it with the current driver. I did drive formatting and some IO tests and did not receive the assertion.
Is it happening randomly or via certain scenarios? Thanks, Alex -----Original Message----- From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Friday, March 16, 2012 11:57 AM To: Harris, James R; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Win8 assertion message Worth a shot, thanks Jim. -----Original Message----- From: Harris, James R Sent: Friday, March 16, 2012 11:22 AM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: Win8 assertion message You may need to call GetUncachedExtension, even if you're not going to use it. IIRC, on Win7 Storport would allocate the DMA adapter object during the GetUncachedExtension context. Your adapter likely doesn't have any DMA restrictions, so Storport probably doesn't really need the DMA adapter object, which is why everything works without the call. This is all guessing though - I did some quick searches on the online SVN repo and didn't see any calls to GetUncachedExtension, but I didn't look especially hard... -Jim >-----Original Message----- >From: Luse, Paul E >Sent: Friday, March 16, 2012 11:01 AM >To: Harris, James R; nvmewin at lists.openfabrics.org >Subject: RE: Win8 assertion message > >Sorry... this output starts right as we return true from passiveInit > >[...] > >Assertion: RaidIsRegionInitialized(&Adapter->UncachedExtension) >storport!StorAllocateContiguousIoResources+0x2d: >fffff880`017184a1 cd2c int 2Ch > > >-----Original Message----- >From: nvmewin-bounces at lists.openfabrics.org >[mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Harris, >James R >Sent: Friday, March 16, 2012 10:59 AM >To: nvmewin at lists.openfabrics.org >Subject: [nvmewin] Win8 assertion message > >Paul, > >I didn't see the assertion message come through. Can you paste into your message? > >Thanks, > >-Jim > >_______________________________________________ >nvmewin mailing list >nvmewin at lists.openfabrics.org >http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin _______________________________________________ nvmewin mailing list nvmewin at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin From paul.e.luse at intel.com Fri Mar 23 11:57:25 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Fri, 23 Mar 2012 18:57:25 +0000 Subject: [nvmewin] Win8 assertion message In-Reply-To: <548C5470AAD9DA4A85D259B663190D36EC55@corpmail1.na.ads.idt.com> References: <82C9F782B054C94B9FC04A331649C77A08E9B6@FMSMSX106.amr.corp.intel.com> <82C9F782B054C94B9FC04A331649C77A08EC6F@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36EC55@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A098F66@FMSMSX106.amr.corp.intel.com> You need to use the checked version of the OS, were you? Thx Paul -----Original Message----- From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Chang, Alex Sent: Friday, March 23, 2012 11:55 AM To: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Win8 assertion message Hi Paul, I just installed Windows 8 (Consumer Preview, 64-bit) and tried it with the current driver. I did drive formatting and some IO tests and did not receive the assertion. Is it happening randomly or via certain scenarios? Thanks, Alex -----Original Message----- From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Friday, March 16, 2012 11:57 AM To: Harris, James R; nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Win8 assertion message Worth a shot, thanks Jim. -----Original Message----- From: Harris, James R Sent: Friday, March 16, 2012 11:22 AM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: Win8 assertion message You may need to call GetUncachedExtension, even if you're not going to use it. IIRC, on Win7 Storport would allocate the DMA adapter object during the GetUncachedExtension context. Your adapter likely doesn't have any DMA restrictions, so Storport probably doesn't really need the DMA adapter object, which is why everything works without the call.
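If you want to try it, something like this from your HwFindAdapter routine is probably all it takes (untested sketch, and the size is arbitrary since you never have to touch the memory):

    /* Untested sketch: ask for a small uncached extension during
     * HwFindAdapter even though the driver never uses it; Storport
     * initializes the UncachedExtension region (and the DMA adapter
     * object) as a side effect, which is what that Win8 assert checks.
     * pAE is the HwDeviceExtension, pPCI the PORT_CONFIGURATION_INFORMATION
     * passed in to HwFindAdapter. */
    PVOID pUncached = StorPortGetUncachedExtension(pAE, pPCI, PAGE_SIZE);
    if (pUncached == NULL) {
        return SP_RETURN_ERROR;   /* no reason to continue if this fails */
    }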
This is all guessing though - I did some quick searches on the online SVN repo and didn't see any calls to GetUncachedExtension, but I didn't look especially hard... -Jim >-----Original Message----- >From: Luse, Paul E >Sent: Friday, March 16, 2012 11:01 AM >To: Harris, James R; nvmewin at lists.openfabrics.org >Subject: RE: Win8 assertion message > >Sorry... this output starts right as we return true from passiveInit > >[...] _______________________________________________ nvmewin mailing list nvmewin at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin From Alex.Chang at idt.com Fri Mar 23 13:56:02 2012 From: Alex.Chang at idt.com (Chang, Alex) Date: Fri, 23 Mar 2012 20:56:02 +0000 Subject: [nvmewin] Win8 assertion message In-Reply-To: <82C9F782B054C94B9FC04A331649C77A098F66@FMSMSX106.amr.corp.intel.com> References: <82C9F782B054C94B9FC04A331649C77A08E9B6@FMSMSX106.amr.corp.intel.com> <82C9F782B054C94B9FC04A331649C77A08EC6F@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36EC55@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A098F66@FMSMSX106.amr.corp.intel.com> Message-ID: <548C5470AAD9DA4A85D259B663190D36EC68@corpmail1.na.ads.idt.com> No. I installed a free-built version. Is it quite easy to reproduce? Thanks, Alex -----Original Message----- From: Luse, Paul E [mailto:paul.e.luse at intel.com] Sent: Friday, March 23, 2012 11:57 AM To: Chang, Alex; nvmewin at lists.openfabrics.org Subject: RE: Win8 assertion message [...]
_______________________________________________ nvmewin mailing list nvmewin at lists.openfabrics.org http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin From paul.e.luse at intel.com Fri Mar 23 14:32:18 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Fri, 23 Mar 2012 21:32:18 +0000 Subject: [nvmewin] Win8 assertion message In-Reply-To: <548C5470AAD9DA4A85D259B663190D36EC68@corpmail1.na.ads.idt.com> References: <82C9F782B054C94B9FC04A331649C77A08E9B6@FMSMSX106.amr.corp.intel.com> <82C9F782B054C94B9FC04A331649C77A08EC6F@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36EC55@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A098F66@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36EC68@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A0994CE@FMSMSX106.amr.corp.intel.com> Yes, just install the checked OS and it will assert when the driver exits from passive init. I haven't tried Jim's suggestion, should take just a few minutes to check it out so feel free and let us know. Thx Paul -----Original Message----- From: Chang, Alex [mailto:Alex.Chang at idt.com] Sent: Friday, March 23, 2012 1:56 PM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: Win8 assertion message [...]
-Jim >-----Original Message----- >From: Luse, Paul E >Sent: Friday, March 16, 2012 11:01 AM >To: Harris, James R; nvmewin at lists.openfabrics.org >Subject: RE: Win8 assertion message > >Sorry... this output starts right as we return true from passiveInit > >nvme!NVMePassiveInitialize+0x3cb: >fffff880`051a8cdb ebc9 jmp nvme!NVMePassiveInitialize+0x396 (fffff880`051a8ca6) >7: kd> p >Breakpoint 2 hit >nvme!NVMePassiveInitialize+0x396: >fffff880`051a8ca6 488b442448 mov rax,qword ptr [rsp+48h] >7: kd> p >nvme!NVMePassiveInitialize+0x3bc: >fffff880`051a8ccc ba88130000 mov edx,1388h >7: kd> p >nvme!NVMePassiveInitialize+0x3cb: >fffff880`051a8cdb ebc9 jmp nvme!NVMePassiveInitialize+0x396 (fffff880`051a8ca6) >7: kd> p >Breakpoint 2 hit >nvme!NVMePassiveInitialize+0x396: >fffff880`051a8ca6 488b442448 mov rax,qword ptr [rsp+48h] >7: kd> p >nvme!NVMePassiveInitialize+0x3bc: >fffff880`051a8ccc ba88130000 mov edx,1388h >7: kd> p >nvme!NVMePassiveInitialize+0x3cb: >fffff880`051a8cdb ebc9 jmp nvme!NVMePassiveInitialize+0x396 (fffff880`051a8ca6) >7: kd> p >Breakpoint 2 hit >nvme!NVMePassiveInitialize+0x396: >fffff880`051a8ca6 488b442448 mov rax,qword ptr [rsp+48h] >7: kd> g >Breakpoint 0 hit >nvme!NVMeRunningWaitOnIoSQ: >fffff880`051ad900 48894c2408 mov qword ptr [rsp+8],rcx >0: kd> g >Breakpoint 3 hit >nvme!NVMePassiveInitialize+0x3cd: >fffff880`051a8cdd 488b442448 mov rax,qword ptr [rsp+48h] >7: kd> p >nvme!NVMePassiveInitialize+0x3f6: >fffff880`051a8d06 4883c468 add rsp,68h >7: kd> p >storport!RaidAdapterStartMiniport+0x244: >fffff880`01700840 f6d8 neg al >7: kd> p >storport!RaidAdapterStartMiniport+0x246: >fffff880`01700842 1bff sbb edi,edi >7: kd> p >storport!RaidAdapterStartMiniport+0x248: >fffff880`01700844 f7d7 not edi >7: kd> p >storport!RaidAdapterStartMiniport+0x24a: >fffff880`01700846 81e7010000c0 and edi,0C0000001h >7: kd> p >storport!RaidAdapterStartMiniport+0x250: >fffff880`0170084c 85ff test edi,edi >7: kd> p >storport!RaidAdapterStartMiniport+0x252: >fffff880`0170084e 78c6 js storport!RaidAdapterStartMiniport+0x21a >(fffff880`01700816) >7: kd> p >storport!RaidAdapterStartMiniport+0x254: >fffff880`01700850 488bcb mov rcx,rbx >7: kd> p >storport!RaidAdapterStartMiniport+0x257: >fffff880`01700853 e8c0010000 call storport!RaidInitializePerfOptsPassive >(fffff880`01700a18) >7: kd> p >storport!RaidAdapterStartMiniport+0x25c: >fffff880`01700858 440fb69b68010000 movzx r11d,byte ptr [rbx+168h] >7: kd> p >storport!RaidAdapterStartMiniport+0x264: >fffff880`01700860 4181fbff000000 cmp r11d,0FFh >7: kd> p >storport!RaidAdapterStartMiniport+0x26b: >fffff880`01700867 7602 jbe storport!RaidAdapterStartMiniport+0x26f >(fffff880`0170086b) >7: kd> p >storport!RaidAdapterStartMiniport+0x26f: >fffff880`0170086b 418bfb mov edi,r11d >7: kd> p >storport!RaidAdapterStartMiniport+0x272: >fffff880`0170086e c1ef05 shr edi,5 >7: kd> p >storport!RaidAdapterStartMiniport+0x275: >fffff880`01700871 41f6c31f test r11b,1Fh >7: kd> p >storport!RaidAdapterStartMiniport+0x279: >fffff880`01700875 7403 je storport!RaidAdapterStartMiniport+0x27e >(fffff880`0170087a) >7: kd> p >storport!RaidAdapterStartMiniport+0x27b: >fffff880`01700877 4103fe add edi,r14d >7: kd> p >storport!RaidAdapterStartMiniport+0x27e: >fffff880`0170087a 8bf7 mov esi,edi >7: kd> p >storport!RaidAdapterStartMiniport+0x280: >fffff880`0170087c 41b85261564d mov r8d,4D566152h >7: kd> p >storport!RaidAdapterStartMiniport+0x286: >fffff880`01700882 b900020000 mov ecx,200h >7: kd> p >storport!RaidAdapterStartMiniport+0x28b: >fffff880`01700887 48c1e602 shl rsi,2 >7: 
kd> p >[... stepping output identical to the trace quoted above ...] >Assertion: RaidIsRegionInitialized(&Adapter->UncachedExtension) >storport!StorAllocateContiguousIoResources+0x2d: >fffff880`017184a1 cd2c int 2Ch > > >-----Original Message----- >From: nvmewin-bounces at lists.openfabrics.org >[mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Harris, >James R >Sent: Friday, March 16, 2012 10:59 AM >To: nvmewin at lists.openfabrics.org >Subject: [nvmewin] Win8 assertion message > >Paul, > >I
didn't see the assertion message come through. Can you paste into your message? > >Thanks, > >-Jim > >_______________________________________________ >nvmewin mailing list >nvmewin at lists.openfabrics.org >http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin

_______________________________________________
nvmewin mailing list
nvmewin at lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/nvmewin
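Jim's suggestion in the thread above amounts to a single call during HwFindAdapter. A minimal sketch of what that might look like, where the size constant and the variable names (pAE, pPortCfg, pUncached) are placeholders rather than the driver's actual identifiers:

    /* Sketch: request a small uncached extension during HwFindAdapter,
     * even if the driver never uses the buffer, so that Storport
     * initializes the region the Win8 checked build asserts on
     * (RaidIsRegionInitialized(&Adapter->UncachedExtension)).
     * UNCACHED_EXT_SIZE is a placeholder value. */
    #define UNCACHED_EXT_SIZE 4096

    PVOID pUncached = StorPortGetUncachedExtension(pAE, pPortCfg,
                                                   UNCACHED_EXT_SIZE);
    if (pUncached == NULL) {
        return SP_RETURN_ERROR; /* treat allocation failure as fatal */
    }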
From paul.e.luse at intel.com Fri Mar 23 16:24:33 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Fri, 23 Mar 2012 23:24:33 +0000 Subject: [nvmewin] ***UNCHECKED*** Learning Mode Patch for Review - UPDATED Message-ID: <82C9F782B054C94B9FC04A331649C77A099603@FMSMSX106.amr.corp.intel.com>

Here are the changes in addition to the ones below to accommodate the deletion/recreation of queues as part of learning mode. Looks like a lot only because I touched a lot of functions for cleanup in a few areas :) I'm not totally done testing; I only have the final queue deletion to step through, and am having a small challenge w/QEMU, and my proto HW doesn't support deleting queues - I'll get it figured out soon though. I'll set up a review for late next week; if anyone would like more time let me know. There's no blazing hurry on this, but I do think we want this to be our first release (so we're actually NUMA optimized). Pw is ofanvme123

nvmeStd.h
- Up'd the poll mode timeout from 1 to 3 and renamed it, as it's now used for more than just crashdump; it's a generic polling retry value
- New error states; I also repurposed this from an init-state-machine-specific set of errors to general errors that may throw the driver into a 'shutdown state' as well. The shutdown routine was using this enum and the driver state to indicate shutdown before, but the names/usages were all related to the init state machine, which I think could be confusing for new folks reading the code
- Added 2 new init states, learning and re-setting up the queues
- A few fairly obvious changes to support logic changes in the C code described later
- Fixed the device state timeout counter; it was a UCHAR and not big enough to handle our timeout counter

nvmeInit.h
- New parm for the alloc queue function; we need to tell it how many entries are on the queue, as we don't need large queues for learning and it's a waste to alloc/free larger-than-needed contiguous chunks of memory if we use one size

nvmeStd.c
- By default I set the # of queues to create for learning to the previously defined minimum for the driver
- Multiple changes where we call NVMeAllocQueues() to pass the new parm, # of queue entries
- Multiple changes where I renamed StartState to DriveState to reflect how we're using it anyway
- In NVMeInitialize() removed the init of num Q's allocated elements in queue info; it's overwritten later regardless
- Some refactoring in both the ISR and DPC around the logic where we decide whether we need to call storportNotification or not; with the new condition added (whether the caller wants synchronous IO or not) it was too confusing to try and work it into the previous set of conditions
- This is where we clear the srbExt synchronous flag which, for sync IO, tells the caller (more on this later) that the IO has now completed
- Fixed the passiveInit timeout handler; it was relying on the drive state machine thread to switch states, so if that hung this would never happen and we'd spin forever. So, made passiveInit do the time-based timing and removed it from the init machine (no need to time in both places)
- Removed the startiolock from the learning mode condition in the ISR/DPC; not needed until we implement concurrent_channels

nvmeStat.c
- Multiple changes where I renamed StartState to DriveState to reflect how we're using it anyway
- Addition of 2 new states in NVMeRunning()
- State handler added: NVMeRunningWaitOnLearnMapping() (see the sketch below). If we're in sub-optimal conditions to begin with, or if we've already done this state, we skip the learning and re-setup-queues states. Also, if no namespace exists there's no way to learn, so we just skip init learning and go non-NUMA optimized. The existing learning logic will pick the right queue and adjust the mapping tables (so we still learn as the host sends IOs); we just won't re-setup the queues. The completion side will go back to this state if we have more IOs to learn. FLUSH is a mandatory command (and doesn't require data xfer) so it was the logical choice here; however, I made it a read of 1 block in case IHVs have issues with flush (either not supporting it or not wanting a flush on every boot for whatever reason).
- State handler added: NVMeRunningWaitOnReSetupQueues(). Delete the queues (frees the mem also), set the flag to indicate that we need to allocate them again, and then jump back to the state where we allocate them. Learning won't happen again because the learning state handler will recognize that it's been completed once
- Moved timeout check to passive init thread
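To make the new learning state concrete, here is a rough sketch of the NVMeRunningWaitOnLearnMapping() handler described above; every identifier, field, and helper below is illustrative, not the tree's actual code:

    /* Sketch of the learning-state handler (illustrative names). */
    BOOLEAN NVMeRunningWaitOnLearnMapping(PNVME_DEVICE_EXTENSION pAE)
    {
        /* Sub-optimal config, learning already completed once, or no
         * namespace to read from: skip learning (and the queue
         * re-setup state) and run non-NUMA-optimized. */
        if (pAE->SuboptimalConfig || pAE->LearningComplete ||
            pAE->VisibleNamespaceCount == 0) {
            pAE->DriverState.NextDriverState = NVMeWaitOnIoSQ;
            return TRUE;
        }

        /* Issue a synchronous 1-block read (chosen over FLUSH, as
         * noted above). The completion callback records which core the
         * completion arrived on and either re-enters this state for
         * the next core or moves on to re-setting up the queues. */
        return NVMeIssueLearningIo(pAE, pAE->LearnedCoreCount);
    }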
nvmeSnti.c
- renamed StartState to DriveState to reflect how we're using it anyway

nvmeIO.c
- changes here support being able to do sync IO. The shutdown routines, which I reused for learning mode resetup, needed sync IO but hadn't added it, so this is technically a change that was needed anyway
- moved the NVMeIssueCmd() ret value check to after the call (was in the wrong place)
- after we issue the command, we handle polling for crashdump same as before (where there are no INTs)
- after we handle crashdump, added a poll here for sync IO to complete; but because INTs are enabled, we don't poll by calling the ISR, we just wait for the ISR to fire and clear the flag in the srbExt that's set if the IO is sync

nvmeInit.h
- function proto change for new parm

nvmeInit.c
- NVMeAllocQueues() changed to accept a parm for # of entries to alloc
- Multiple changes where I renamed StartState to DriveState to reflect how we're using it anyway
- New callback: NVMeDeleteQueueCallback. Simply decrements the # of deleted queues if no error
- In the init callback for the NVMeWaitOnIoSQ state, we now check how many cores we've learned and skip learning if done. Otherwise we move to the learning state
- In the init callback, new case for NVMeWaitOnLearnMapping: If we have more cores to learn, stay in this state; otherwise move on to remapping/setting up queues. There's no new completion state for re-setting up the queues; things will then end in the NVMeWaitOnIoSQ
- Added missing default handler
- In both delete queues functions, changes to support making them synchronous
- In NVMeAllocIoQueues() we set the number of queues we want based on learning mode or not

Sources
- Removed QEMU define as we only used it for some asserts that are no longer needed w/learning mode

Also throughout you'll find a new call NVMeDriverFatalError() that replaces a bunch of places where we were setting an error code, setting a state and logging an error. There's one parm in here to indicate whether it's being called from the init state machine or not, so it knows whether to fire off the arbiter. That way this single call can be used anywhere to flag a fatal error, log it in the system log and change the DriverState to avoid accepting new IO. This is clearly only for cases when we want to stop accepting IO or simply cannot.

________________________________
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Friday, March 16, 2012 9:42 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** Learning Mode Patch for Review Importance: High

Note that I merged Alex's patch in here as it's still pending with Ray, but it's only a half-dozen lines of code, so rather than have Ray create two tags and do 2 merges I just rolled them together so Ray can just process one patch. Normally wouldn't recommend this but the pending patch is so small it just makes sense. Password is ofanvme123. I can schedule a call to walk through any of this if anyone would like.

Learning Mode: Pretty easy, it's only enabled if we're in an optimal config wrt cores/queues/vectors. Assume we have N processors; it works like this.... On startup the first N IOs will be sent to queue # 1..N sequentially. Each queue is created with a matching MSI ID, so in this manner we ensure we hit every queue and every message ID by incrementing the queue # for the first N IOs regardless of the submitting core. On the completion side we simply look at the core that we completed on and update the table for the completing core such that the correct queues are used the next time an IO submits to this core.
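The queue selection that paragraph describes fits in a few lines. A sketch, where the structure and field names are assumptions rather than the driver's actual identifiers:

    /* Sketch of the submission-side lookup. During learning, the first
     * N IOs walk queues 1..N in order (queue # is core # + 1 by the
     * driver's convention, and each queue was created with a matching
     * MSI ID); afterwards the per-core table gives the learned queue. */
    USHORT NVMeMapCoreToQueue(PRES_MAPPING_TBL pTbl, ULONG coreNum)
    {
        if (pTbl->LearnedCores < pTbl->NumActiveCores) {
            return (USHORT)(pTbl->LearnedCores + 1); /* still learning */
        }
        return pTbl->CoreTable[coreNum].SubQueueNum; /* learned mapping */
    }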
Testing: (in all cases I confirmed functionality, and via iometer and xperf I confirmed core load balancing)
- Chatham (DT with 8 cores): Win8 Server public beta, 2008R2
- QEMU (configured for 2 cores): Win7-64
- Tools: The standard stuff: format, stress, iometer, SCSI compliance, shutdown/restart

Changes:

nvmeStd.h:
- Comment changes as needed
- Removed the logcalmode element from the msgTable as it's no longer used
- Removed DBG directive from procNum element, it's now required for learning mode
- Removed proto for NVMeMsixMapCores, not used anymore (combined MSI and MSIX routines, more later)

nvmeIo.c:
- Removed DBG directive from procNum element, it's now required for learning mode

nvmeInit.c
- Removed NVMeSwap(), NVMeQuickSort(), NVMeMsixMapCores() as they're no longer used
- Changes to NVMeMsiMapCores():
  o Set the initial core table MsgID to the CplQ number; this is needed so that we can determine the CplQ from the MsgId during learning mode while in the DPC/ISR
  o The other changes are just code simplifications
- Changed NVMeCompleteResMapTbl() to use NVMeMsiMapCores() for either MSI or MSIX
- Changed NVMeMapCore2Queue():
  o We now check if we're in learning mode; if not, then we simply look up the queue num from the core table
  o If we're in learning mode (based on a simple count of how many cores we've learned vs total available cores), then we use the core number that we're on (that we're learning) + 1; the +1 is because all queue numbers are core+1 by our convention
- Change in NVMeAllocIoQueues() to effectively disable learning mode if we only have 1 queue (it makes no sense and actually causes problems for learning mode if we don't do this). We disable it by pretending that we've already learned all the cores

nvmeStd.c:
- In NVMePassiveInitialize(), disable learning mode if we're sharing one MSI over multiple queues, same reasons as when we have one queue
- In NVMeInitialize():
  o Enable DPC perf opt per Msft recommendation to steer storport completion DPCs back to the submitting core
- In NVMeStartIo() - bug fix unrelated to learning mode, but I found it while debugging learning mode (via BSOD). You can't return FALSE from this function per MSDN docs. Always return TRUE
- In IoCompletionDpcRoutine(), and same changes in the ISR when it's enabled for completions instead:
  o Merged Alex's bugfixes in with my changes
  o Removed the DBG related code for checking core affiliation
  o Where we decide which queues to check, I set a Boolean to determine if we're learning or not to FALSE if we're in shared mode, because we disable learning mode during init if that's the case
  o If we're not shared, the learning Boolean is set based on how many cores we've learned and whether the MsgId is >0, as MsgId 0 is admin and we exclude that from learning mode
  o If we're not learning, then we only search the queue specified in the MMT
  o If we are learning, we know the queue # is the same as the MsgId because we init'd it that way
  o The 'learning' happens in a new conditional just after we determine we have an srbExt (see the sketch after this list). It works as follows:
    * Grab the startIO lock, as we're sharing the core table with startIO and during learning mode we're not yet assured that start/complete are on the same core. Note the lock is only taken on IOs during learning mode (the first few IOs)
    * Look up the CT entry for the core that we're completing on and set its queue numbers to the queue number that was associated with the IO that just completed. This assures that the next lookup in the table for this core # will complete on this same core.
    * Increment our learning counter, which will direct the next IO to the next core
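The 'learning' conditional itself (the sketch referenced in the list above) reduces to a small table update under the StartIo lock; the srbExt and core-table field names here are assumptions:

    /* Sketch of the DPC/ISR learning step. Taken only while learning,
     * under the StartIo lock because during learning the submitting
     * and completing cores are not yet guaranteed to match. */
    STOR_LOCK_HANDLE lockHandle;
    ULONG coreNum;

    StorPortAcquireSpinLock(pAE, StartIoLock, NULL, &lockHandle);

    /* Steer future IOs submitted on the completing core to the queue
     * pair this completion arrived on. */
    coreNum = KeGetCurrentProcessorNumber();
    pAE->ResMapTbl.CoreTable[coreNum].SubQueueNum = pSrbExt->QueueNum;
    pAE->ResMapTbl.CoreTable[coreNum].CplQueueNum = pSrbExt->QueueNum;

    /* Direct the next learning IO at the next queue/core. */
    pAE->ResMapTbl.LearnedCores++;

    StorPortReleaseSpinLock(pAE, &lockHandle);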
- Unrelated changes to NVMeProcessIoctl(): A few changes were made here, as the routine assumed every IOCTL we'd get would be a PT IOCTL, making it difficult for vendors to add additional private IOCTLs. Just moved things around a bit as we had to add one to our product-specific code. No other changes here other than placing IOCTL-specific code in the correct case block

____________________________________
Paul Luse
Sr. Staff Engineer
PCG Server Software Engineering
Desk: 480.554.3688, Mobile: 480.334.4630

-------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: source.zip Type: application/x-zip-compressed Size: 158435 bytes Desc: source.zip URL:

From paul.e.luse at intel.com Mon Mar 26 17:52:58 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Tue, 27 Mar 2012 00:52:58 +0000 Subject: [nvmewin] Learning Mode Patch for Review - UPDATED Message-ID: <82C9F782B054C94B9FC04A331649C77A09B1AF@FMSMSX106.amr.corp.intel.com>

Quick testing update: ran into a QEMU bug that I've since fixed, enabling me to continue testing. There will be a few small tweaks to the package I sent out but nothing major. I have to head to the OFA developer workshop for the next two days so will pick up on this again Thu. I won't schedule the review until I get everything 100% tested.

Thx
Paul

From: Luse, Paul E Sent: Friday, March 23, 2012 4:25 PM To: nvmewin at lists.openfabrics.org Subject: Learning Mode Patch for Review - UPDATED Here are the changes in addition to the ones below to accommodate the deletion/recreation of queues as part of learning mode. [...]
From Alex.Chang at idt.com Mon Mar 26 18:40:30 2012 From: Alex.Chang at idt.com (Chang, Alex) Date: Tue, 27 Mar 2012 01:40:30 +0000 Subject: [nvmewin] Learning Mode Patch for Review - UPDATED In-Reply-To: <82C9F782B054C94B9FC04A331649C77A09B1AF@FMSMSX106.amr.corp.intel.com> References: <82C9F782B054C94B9FC04A331649C77A09B1AF@FMSMSX106.amr.corp.intel.com> Message-ID: <548C5470AAD9DA4A85D259B663190D36ED96@corpmail1.na.ads.idt.com>

Hi Paul,

In the changes of nvmestat.c, you mentioned: "NVMeRunningWaitOnReSetupQueues(). Delete the queues (frees the mem also), set the flag to indicate that we need to allocate them again and then jump back to the state where we allocate them." I am not sure it's necessary to free the allocated IO queue memory. Isn't it enough to adjust the mapping between cores and queues after re-creating the queue?

Thanks,
Alex

________________________________
From: nvmewin-bounces at lists.openfabrics.org [nvmewin-bounces at lists.openfabrics.org] on behalf of Luse, Paul E [paul.e.luse at intel.com] Sent: Monday, March 26, 2012 5:52 PM To: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Learning Mode Patch for Review - UPDATED [...]
From paul.e.luse at intel.com Mon Mar 26 19:18:11 2012 From: paul.e.luse at intel.com (Luse, Paul E) Date: Tue, 27 Mar 2012 02:18:11 +0000 Subject: [nvmewin] Learning Mode Patch for Review - UPDATED In-Reply-To: <548C5470AAD9DA4A85D259B663190D36ED96@corpmail1.na.ads.idt.com> References: <82C9F782B054C94B9FC04A331649C77A09B1AF@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36ED96@corpmail1.na.ads.idt.com> Message-ID: <82C9F782B054C94B9FC04A331649C77A09B23C@FMSMSX106.amr.corp.intel.com>

I guess we could do that: keep the queue-core association and change just the IV for the CQ. Is that what you're suggesting? So, for example, below shows what the mapping might look like before, then after learning with how I have it coded vs what you are suggesting. If I understand you correctly I do like that way better, pls take a look below and confirm for me.

BEFORE
Core  QP  MSIX  NUMA
0     1   1     0
1     2   2     1
2     3   3     2

WHAT I'M DOING
Core  QP  MSIX  NUMA
0     8   8     0
1     1   1     1
2     2   2     2

WHAT YOU ARE SUGGESTING
Core  QP  MSIX  NUMA
0     1   8     0
1     2   1     1
2     3   2     2

From: Chang, Alex [mailto:Alex.Chang at idt.com] Sent: Monday, March 26, 2012 6:41 PM To: Luse, Paul E; nvmewin at lists.openfabrics.org Subject: RE: Learning Mode Patch for Review - UPDATED Hi Paul, In the changes of nvmestat.c, you mentioned: NVMeRunningWaitOnReSetupQueues()... I am not sure it's necessary to free the allocated IO queue memory. Isn't it enough to adjust the mapping between cores and queues after re-creating the queue? [...]
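In code, the suggested variant might look something like the following; the structures and fields are hypothetical, and the point is only that the queue memory survives while the core table is re-pointed:

    /* Sketch of the suggestion: keep each core's queue-pair assignment
     * (the QP column above) and update only the vector mapping learned
     * for it (the MSIX column), instead of deleting and re-allocating
     * the queues. All names are hypothetical. */
    ULONG core;

    for (core = 0; core < pTbl->NumActiveCores; core++) {
        PCORE_TBL pCT = &pTbl->CoreTable[core];
        pCT->MsgID = pTbl->LearnedVector[pCT->SubQueueNum];
    }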
Testing: (in all cases I confirmed functionality, and via iometer and xperf I confirmed core load balancing)
- Chatham (DT with 8 cores): Win8 Server public beta, 2008R2
- QEMU (configured for 2 cores): Win7-64
- Tools: the standard stuff: format, stress, iometer, SCSI compliance, shutdown/restart

Changes:

nvmeStd.h:
- Comment changes as needed
- Removed the logcalmode element from the msgTable as it's no longer used
- Removed DBG directive from the procNum element, it's now required for learning mode
- Removed proto for NVMeMsixMapCores, not used anymore (combined MSI and MSIX routines, more later)

nvmeIo.c:
- Removed DBG directive from the procNum element, it's now required for learning mode

nvmeInit.c
- Removed NVMeSwap(), NVMeQuickSort(), NVMeMsixMapCores() as they're no longer used
- Changes to NVMeMsiMapCores():
o Set the initial core table MsgID to the CplQ number; this is needed so that we can determine the CplQ from the MsgId during learning mode while in the DPC/ISR
o The other changes are just code simplifications
- Changed NVMeCompleteResMapTbl() to use NVMeMsiMapCores() for either MSI or MSIX
- Changed NVMeMapCore2Queue() (see the sketch after this message):
o We now check if we're in learning mode; if not, we simply look up the queue num from the core table
o If we're in learning mode (based on a simple count of how many cores we've learned vs total available cores), then we use the core number that we're on (that we're learning) + 1; the +1 is because all queue numbers are core+1 by our convention
- Change in NVMeAllocIoQueues() to effectively disable learning mode if we only have 1 queue (it makes no sense and actually causes problems for learning mode if we don't do this). We disable it by pretending that we've already learned all the cores

nvmeStd.c:
- In NVMePassiveInitialize(), disable learning mode if we're sharing one MSI over multiple queues, same reasons as when we have one queue
- In NVMeInitialize():
o Enable DPC perf opt per Msft recommendation to steer storport completion DPCs back to the submitting core
- In NVMeStartIo() - bug fix unrelated to learning mode, but I found it while debugging learning mode (via BSOD). You can't return FALSE from this function per the MSDN docs. Always return TRUE
- In IoCompletionDpcRoutine(), and the same changes in the ISR when it's enabled for completions instead:
o Merged Alex's bugfixes in with my changes
o Removed the DBG-related code for checking core affiliation
o Where we decide which queues to check, I set a Boolean that determines whether we're learning to FALSE if we're in shared mode, because we disable learning mode during init in that case
o If we're not shared, the learning Boolean is set based on how many cores we've learned and whether the MsgId is > 0, as MsgId 0 is admin and we exclude that from learning mode
o If we're not learning, we only search the queue specified in the MMT
o If we are learning, we know the queue # is the same as the MsgId because we init'd it that way
o The 'learning' happens in a new conditional just after we determine we have an srbExt. It works as follows:
* Grab the startIO lock, as we're sharing the core table with startIO, and during learning mode we're not yet assured that start/complete are on the same core. Note the lock is only taken on IOs during learning mode (the first few IOs)
* Look up the CT entry for the core that we're completing on and set its queue numbers to the queue number that was associated with the IO that just completed. This assures that the next lookup in the table for this core # yields a queue that completes back on this same core
* Increment our learning counter, which will direct the next IO to the next core
- Unrelated changes to NVMeProcessIoctl(): a few changes were made here, as the routine assumed every IOCTL we'd get would be a PT IOCTL, making it difficult for vendors to add additional private IOCTLs. Just moved things around a bit, as we had to add one to our product-specific code. No other changes here other than placing IOCTL-specific code in the correct case block

____________________________________
Paul Luse
Sr. Staff Engineer
PCG Server Software Engineering
Desk: 480.554.3688, Mobile: 480.334.4630
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
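[Editor's note: the submission-side lookup described for NVMeMapCore2Queue() above amounts to roughly the following; a sketch under assumed names, not the actual code. CORE_TABLE_ENTRY is the same illustrative struct used in the earlier sketch.]

    typedef struct _CORE_TABLE_ENTRY {
        USHORT SubQueue;
        USHORT CplQueue;
    } CORE_TABLE_ENTRY;

    /* Queue selection at submit time: while learning, force queue =
     * core + 1 (queues are 1-based, core+1 by convention); afterwards,
     * use whatever the learned core table says. */
    USHORT MapCoreToQueueSketch(CORE_TABLE_ENTRY CoreTable[], ULONG core,
                                ULONG coresLearned, ULONG totalCores)
    {
        if (coresLearned < totalCores) {
            return (USHORT)(core + 1);    /* still learning */
        }
        return CoreTable[core].SubQueue;  /* learned mapping */
    }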
From paul.e.luse at intel.com Mon Mar 26 20:12:57 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Tue, 27 Mar 2012 03:12:57 +0000
Subject: [nvmewin] Learning Mode Patch for Review - UPDATED
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A09B23C@FMSMSX106.amr.corp.intel.com>
References: <82C9F782B054C94B9FC04A331649C77A09B1AF@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36ED96@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A09B23C@FMSMSX106.amr.corp.intel.com>
Message-ID: <82C9F782B054C94B9FC04A331649C77A09B29F@FMSMSX106.amr.corp.intel.com>

Short on time, so I am assuming that's what you meant and have made the change already; great suggestion, and only a few lines of code to implement. It will be included when I send out the updates for review, most likely on Thu. I believe I addressed my last issue w/QEMU, just no time to test it before I leave. Thx Paul

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E
Sent: Monday, March 26, 2012 7:18 PM
To: Chang, Alex; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] Learning Mode Patch for Review - UPDATED

I guess we could do that, keep the queue-core association and change just the IV for the CQ, is that what you're suggesting? So, for example, below shows what the mapping might look like before, then after learning, with how I have it coded vs what you are saying. If I understand you correctly I do like that way better, pls take a look below and confirm for me.

BEFORE
Core  QP  MSIX  NUMA
0     1   1     0
1     2   2     1
2     3   3     2

WHAT I'M DOING
Core  QP  MSIX  NUMA
0     8   8     0
1     1   1     1
2     2   2     2

WHAT YOU ARE SUGGESTING
Core  QP  MSIX  NUMA
0     1   8     0
1     2   1     1
2     3   2     2
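[Editor's note: the difference between the two options in the tables above could be sketched as follows; the entry fields are illustrative, not the driver's actual core table.]

    /* Per-core mapping entry (illustrative fields only). */
    typedef struct _CT_ENTRY {
        USHORT QueuePair;    /* SQ/CQ pair this core uses; its memory is
                                allocated NUMA-local to the core */
        USHORT MsixVector;   /* vector the CQ's interrupts arrive on */
    } CT_ENTRY;

    /* As coded: after learning, point the core at the whole learned
     * queue pair; the vector follows the queue, but the queue memory
     * may now be remote to this core's NUMA node. */
    VOID RemapQueuePair(CT_ENTRY *e, USHORT learnedQueue)
    {
        e->QueuePair  = learnedQueue;
        e->MsixVector = learnedQueue;
    }

    /* As suggested: keep the NUMA-local queue pair and change only the
     * interrupt vector (by re-creating the CQ with the learned IV). */
    VOID RemapVectorOnly(CT_ENTRY *e, USHORT learnedVector)
    {
        e->MsixVector = learnedVector;
    }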
From: Chang, Alex [mailto:Alex.Chang at idt.com]
Sent: Monday, March 26, 2012 6:41 PM
To: Luse, Paul E; nvmewin at lists.openfabrics.org
Subject: RE: Learning Mode Patch for Review - UPDATED

Hi Paul, In the changes of nvmeStat.c, you mentioned: NVMeRunningWaitOnReSetupQueues(): delete the queues (frees the mem also), set the flag to indicate that we need to allocate them again, and then jump back to the state where we allocate them. I am not sure it's necessary to free the allocated IO queue memory. Isn't it enough to adjust the mapping between cores and queues after re-creating the queues? Thanks, Alex

________________________________
From: nvmewin-bounces at lists.openfabrics.org [nvmewin-bounces at lists.openfabrics.org] on behalf of Luse, Paul E [paul.e.luse at intel.com]
Sent: Monday, March 26, 2012 5:52 PM
To: nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] Learning Mode Patch for Review - UPDATED

Quick testing update: ran into a QEMU bug that I've since fixed, which enabled me to continue testing. There will be a few small tweaks to the package I sent out, but nothing major. I have to head to the OFA developer workshop for the next two days, so will pick this up again Thu. I won't schedule the review until I get everything 100% tested. Thx Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From Alex.Chang at idt.com Tue Mar 27 08:42:23 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Tue, 27 Mar 2012 15:42:23 +0000
Subject: [nvmewin] Learning Mode Patch for Review - UPDATED
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A09B23C@FMSMSX106.amr.corp.intel.com>
References: <82C9F782B054C94B9FC04A331649C77A09B1AF@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36ED96@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A09B23C@FMSMSX106.amr.corp.intel.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D36EDC3@corpmail1.na.ads.idt.com>

Hi Paul, Yes, that's what I meant. The sole purpose of the learning mode you added is to find out the mappings between vectors and cores. If we allocate queue memory based on NUMA locality for each core, and then delete/re-create the queues after learning is completed, NUMA locality is still preserved without re-allocating the queue memory. Thanks, Alex
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
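[Editor's note: spelled out, re-creating a completion queue with the learned vector while reusing the already-allocated NUMA-local buffer might look like the sketch below. The struct is a simplified stand-in, not the driver's actual command structures; the field layout follows the Create I/O Completion Queue command in the NVMe 1.0 spec (CDW10 = QSIZE[31:16] | QID[15:0], CDW11 = IV[31:16] | IEN | PC).]

    /* Simplified sketch of an NVMe Create I/O Completion Queue command. */
    typedef struct _CREATE_CQ_SKETCH {
        UCHAR     Opcode;   /* 0x05 = Create I/O Completion Queue */
        ULONGLONG Prp1;     /* physical address of the queue buffer */
        ULONG     Cdw10;
        ULONG     Cdw11;
    } CREATE_CQ_SKETCH;

    VOID BuildRecreateCq(CREATE_CQ_SKETCH *cmd, ULONGLONG sameQueuePhysAddr,
                         USHORT qid, USHORT qsize0Based, USHORT learnedIv)
    {
        cmd->Opcode = 0x05;
        cmd->Prp1   = sameQueuePhysAddr;  /* same buffer: no re-allocation,
                                             NUMA locality preserved */
        cmd->Cdw10  = ((ULONG)qsize0Based << 16) | qid;
        cmd->Cdw11  = ((ULONG)learnedIv << 16)   /* learned MSI-X vector */
                    | (1 << 1)                   /* IEN: interrupts enabled */
                    | (1 << 0);                  /* PC: physically contiguous */
    }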
From paul.e.luse at intel.com Tue Mar 27 09:05:32 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Tue, 27 Mar 2012 16:05:32 +0000
Subject: [nvmewin] Learning Mode Patch for Review - UPDATED
In-Reply-To: <548C5470AAD9DA4A85D259B663190D36EDC3@corpmail1.na.ads.idt.com>
References: <82C9F782B054C94B9FC04A331649C77A09B1AF@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36ED96@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A09B23C@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36EDC3@corpmail1.na.ads.idt.com>
Message-ID: <82C9F782B054C94B9FC04A331649C77A09B67E@FMSMSX106.amr.corp.intel.com>

Agreed, and implemented as such :) I should be able to test first thing Thu morn. Will provide an update afterwards, and the latest patch as well. Thx Paul
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From Alex.Chang at idt.com Tue Mar 27 09:09:10 2012
From: Alex.Chang at idt.com (Chang, Alex)
Date: Tue, 27 Mar 2012 16:09:10 +0000
Subject: [nvmewin] Learning Mode Patch for Review - UPDATED
In-Reply-To: <82C9F782B054C94B9FC04A331649C77A09B67E@FMSMSX106.amr.corp.intel.com>
References: <82C9F782B054C94B9FC04A331649C77A09B1AF@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36ED96@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A09B23C@FMSMSX106.amr.corp.intel.com> <548C5470AAD9DA4A85D259B663190D36EDC3@corpmail1.na.ads.idt.com> <82C9F782B054C94B9FC04A331649C77A09B67E@FMSMSX106.amr.corp.intel.com>
Message-ID: <548C5470AAD9DA4A85D259B663190D36EDDD@corpmail1.na.ads.idt.com>

Thank you very much, Paul. Regards, Alex
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
Delete the queues (frees the mem also), set the flag to indicate that we need to allocate them again and then jump back to the state where we allocate them. I am not sure it's necessary to free the allocated IO queue memory. Isn't it enough to adjust the mapping between cores and queues after re-creating the queue? Thanks, Alex ________________________________ From: nvmewin-bounces at lists.openfabrics.org [nvmewin-bounces at lists.openfabrics.org] on behalf of Luse, Paul E [paul.e.luse at intel.com] Sent: Monday, March 26, 2012 5:52 PM To: nvmewin at lists.openfabrics.org Subject: Re: [nvmewin] Learning Mode Patch for Review - UPDATED Quick testing update: ran into a QEMU bug that I've since fixed and enabled me to continue testing. There will be a few small tweaks to the package I sent out but nothing major. I have to head to the OFA developer workshop for the next two days so will pick up on this again Thu. I won't schedule the review until I get everything 100% tested. Thx Paul From: Luse, Paul E Sent: Friday, March 23, 2012 4:25 PM To: nvmewin at lists.openfabrics.org Subject: Learning Mode Patch for Review - UPDATED Here are the changes in addition to the ones below to accommodate the deletion/recreation of queues as part of learning mode. Looks like a lot only because I touched a lot of functions for cleanup in a few areas :) I'm not totally done testing, I only have the final queue deletion to step through and am having a small challenge w/QEMU and my proto HW doesn't support deleting queues, I'll get it figured out soon though. I'll setup a review for late next week, if anyone would like more time let me know - there's no blazing hurry on this but I do think we want this to be our first release (so we're actually NUMA optimized) Pw is ofanvme123 nvmeStd.h - Up'd the poll mode timeout from 1 to 3 and renamed it as its now used for more than just crashdump, it's a generic polling retry value - New error states and also I repurposed this from an init state machine specific set of errors to general errors that may throw the driver into a 'shutdown state' as well. 
The shutdown routine was using this enum and the driver state to indicate shutdown before however the names/usages were all related to the init state machine which I think could be confusing for new folks reading the code - Added 2 new init states, learning and re-setting up the queues - A few fairly obvious changes to support logic changes in the C code described later - Fixed the device state timeout counter, was a UCHAR and not big enough to handle our timeout counter nvmeInit.h - New parm for the alloc queue function, need to tell it how many entries on the queue as we don't need large queues for learning and it's a waste of alloc/free larger than needed contiguous chunks of memory if we used one size nvmeStd.c - By default I set the # of queues to create for learning to the previously defined minimum for the driver - Multiple changes where calling NVMeAllocQueues() to pass the new parm, # of queue entries - Multiple changes where I renamed StartState to DriveState to reflect how we're using it anyway - In NVMeInitialize() removed the init of num Q's allocated elements in queue info, its overwritten later regardless - Some refactoring in both the ISR and DPC around the logic where we decide whether we need to call storportNotification or not, with the new condition added (whether caller wants synchronous IO or not) it was too confusing to try and work it into the previous set of conditions - This is where we clear the srbExt synchronous flag which, for sync IO, tells the caller (more on this later) that the IO has now completed - Fixed the passiveInit timeout handler, it was relying on the time drive state machine thread to switch states, if it hung then this would never happen so we'd spin forever. So, made passiveInit do the time based timing and removed from the init machine (don't need to time in both places) - Removed the startiolock from the learning mode condition in the ISR/DPC, not needed until we implement concurrent_channels nvmeStat.c - Multiple changes where I renamed StartState to DriveState to reflect how we're using it anyway - Addition of 2 new states in NVMeRunning() - State handler added: NVMeRunningWaitOnLearnMapping(). If we're in sub-optimal conditions to begin with or if we've already done this state, we skip the learning and re-setup queues states. Also, if no namespace exists there's no way to learn so we just skip init learning and go non-NUMA optimized. The existing learning logic will pick the right queue and adjust the mapping tables (so we still learn as the host sends IOs) we just won't resetup the queues. The completion side will go back to this state if we have more IOs to learn. FLUSH is a mandatory command (and doesn't require data xfer) so it was the logical choice here however I made it a read of 1 block in case IHV's have issues with flush (either not supporting or not wanting a flush on every boot for whatever reason). - State handler added: NVMeRunningWaitOnReSetupQueues(). Delete the queues (frees the mem also), set the flag to indicate that we need to allocate them again and then jump back to the state where we allocate them. Learning won't happen again because the learning state handler will recognize that its been completed once - Moved timeout check to passive init thread nvmeSnti.c - renamed StartState to DriveState to reflect how we're using it anyway nvmeIO.c - changes here support being able to do sync IO. 
The shutdown routines, which I reused for learning mode resetup, needed sync IO but hadn't added it so this is technically a change that was needed anyway - moved the NVMeIssueCmd() ret value check to after the call (was in the wrong place) - after we issue the command, we handle polling for crashdump same as before (where there are no INTs) - after we handle crashdump, added we poll here for sync IO to complete but because INTs are enabled, we don't poll by calling the ISR, we just wait for the ISR to fire and clear the flag in the srbExt that's set if the IO is sync nvmeInit.h - function proto change for new parm nvmeInit.c - NVMeAllocQueues() changed to accept a parm for # of entries to alloc - Multiple changes where I renamed StartState to DriveState to reflect how we're using it anyway - New callback: NVMeDeleteQueueCallback. Simply decrements the # of deleted queues if no error - In the init callback for the NVMeWaitOnIoSQ state, we now check how many cores we've learned and skip learning if done. Otherwise we move to the learning state - In the init callback, new case for NVMeWaitOnLearnMapping: If we have more cores to learn, stay in this state, otherwise move onto remapping/setting up queues. There's no new completion state for resttting up the queues, things will then end in the NVMeWaitOnIoSQ - Added missing default handler - In both delete queues functions, changes to support making them synchronous - In NVMeAllocIoQueues() we set the number of queues we want based on learning mode or not Sources - Removed QEMU define as we only used it for some asserts that are no longer needed w/learning mode Also throughout you'll find a new call NVMeDriverFatalError() that replaces a bunch of places where we were setting and error code, setting a state and logging an error. There's one parm in here to indicate whether its being called from the init state machine or not so it nows whether to fire off the arbiter. That way this single call can be used anywhere to flag a fatal error, log it in the system log and change the DriverState to avoid accepting new IO. This is clearly only for cases when we want to stop accepting IO or simply cannot. ________________________________ From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E Sent: Friday, March 16, 2012 9:42 AM To: nvmewin at lists.openfabrics.org Subject: [nvmewin] ***UNCHECKED*** Learning Mode Patch for Review Importance: High Note that I merged Alex's patch in here as it's still pending with Ray but is only a half-dozen lines of code so rather than have Ray create to tags and do 2 merges I just rolled them together so Ray can just process one patch. Normally wouldn't recommend this but the pending patch is so small it just makes sense. Password is ofanvme123. I can schedule a call to walk through any of this if anyone would like. Learning Mode: Pretty easy, its only enabled if we're in an optimal config wrt cores/queues/vectors. Assume we have N processors, it works like this.... On startup the first N IOs will be sent to queue # 1..N sequentially. Each queue is created with a matching MSI ID so in this manner we assure to hit every queue and every message ID by incrementing the queue # for the first N IOs regardless of the submitting core. On the completion side we simply look at the core that we completed on and update the table for the completing core such that the correct queues are used the next time an IO submits to this core. 
Testing (in all cases I confirmed functionality, and via iometer and xperf I confirmed core load balancing):
- Chatham (DT with 8 cores): Win8 Server public beta, 2008R2
- QEMU (configured for 2 cores): Win7-64
- Tools: the standard stuff: format, stress, iometer, SCSI compliance, shutdown/restart

Changes:

nvmeStd.h:
- Comment changes as needed
- Removed the logcalmode element from the msgTable, as it's no longer used
- Removed the DBG directive from the procNum element; it's now required for learning mode
- Removed the proto for NVMeMsixMapCores; not used anymore (combined the MSI and MSIX routines, more later)

nvmeIo.c:
- Removed the DBG directive from the procNum element; it's now required for learning mode

nvmeInit.c:
- Removed NVMeSwap(), NVMeQuickSort(), NVMeMsixMapCores() as they're no longer used
- Changes to NVMeMsiMapCores():
o Set the initial core table MsgID to the CplQ number; this is needed so that we can determine the CplQ from the MsgId during learning mode while in the DPC/ISR
o The other changes are just code simplifications
- Changed NVMeCompleteResMapTbl() to use NVMeMsiMapCores() for either MSI or MSIX
- Changed NVMeMapCore2Queue():
o We now check whether we're in learning mode; if not, we simply look up the queue num from the core table
o If we're in learning mode (based on a simple count of how many cores we've learned vs. total available cores), then we use the number of the core that we're on (that we're learning) + 1; the +1 is because all queue numbers are core+1 by our convention
- Change in NVMeAllocIoQueues() to effectively disable learning mode if we only have 1 queue (it makes no sense, and actually causes problems for learning mode, if we don't do this). We disable it by pretending that we've already learned all the cores

nvmeStd.c:
- In NVMePassiveInitialize(), disable learning mode if we're sharing one MSI over multiple queues, for the same reasons as when we have one queue
- In NVMeInitialize():
o Enable the DPC perf opt per the Msft recommendation to steer storport completion DPCs back to the submitting core
- In NVMeStartIo(): bug fix unrelated to learning mode, but I found it while debugging learning mode (via BSOD). You can't return FALSE from this function per the MSDN docs; always return TRUE
- In IoCompletionDpcRoutine(), and the same changes in the ISR for when it's enabled for completions instead:
o Merged Alex's bugfixes in with my changes
o Removed the DBG-related code for checking core affiliation
o Where we decide which queues to check, I set a Boolean that says whether we're learning to FALSE if we're in shared mode, because we disable learning mode during init in that case
o If we're not shared, the learning Boolean is set based on how many cores we've learned and whether the MsgId is >0, as MsgId 0 is admin and we exclude it from learning mode
o If we're not learning, then we only search the queue specified in the MMT
o If we are learning, we know the queue # is the same as the MsgId, because we init'd it that way
o The 'learning' happens in a new conditional just after we determine we have an srbExt (see the sketch following this list). It works as follows:
* Grab the startIO lock, as we're sharing the core table with StartIo and during learning mode we're not yet assured that start/complete are on the same core. Note the lock is only taken on IOs during learning mode (the first few IOs)
* Look up the CT entry for the core that we're completing on and set its queue numbers to the queue number that was associated with the IO that just completed. This assures that the next lookup in the table for this core # will complete on this, the same, core
* Increment our learning counter, which will direct the next IO to the next core
- Unrelated changes to NVMeProcessIoctl(): a few changes were made here, as the routine assumed every IOCTL we'd get would be a PT IOCTL, which makes it difficult for vendors to add additional private IOCTLs. Just moved things around a bit, as we had to add one to our product-specific code. No other changes here other than placing IOCTL-specific code in the correct case block
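Roughly, that learning step in the DPC/ISR looks like the sketch below. This is illustrative only: the StorPort spinlock calls are the real storport APIs, but the structure and field names (CoreTable, QueueNum, NumCoresLearned) are assumed, not the driver's actual ones:

    /* Executed on the completing core once we know pSrbExt is valid and
     * the learning Boolean is set. */
    STOR_LOCK_HANDLE lockHandle;
    ULONG curCore = KeGetCurrentProcessorNumber();

    /* The core table is shared with StartIo, and during learning we are
     * not yet assured submit/complete happen on the same core. */
    StorPortAcquireSpinLock(pAE, StartIoLock, NULL, &lockHandle);

    /* Point this core's entry at the queue this IO just completed on,
     * so future IOs submitted on this core complete back on it. */
    pAE->CoreTable[curCore].SubQueue = pSrbExt->QueueNum;
    pAE->CoreTable[curCore].CplQueue = pSrbExt->QueueNum;

    /* Steers the next learning IO to the next queue/core. */
    pAE->NumCoresLearned++;

    StorPortReleaseSpinLock(pAE, &lockHandle);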
____________________________________
Paul Luse
Sr. Staff Engineer
PCG Server Software Engineering
Desk: 480.554.3688, Mobile: 480.334.4630
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From paul.e.luse at intel.com Fri Mar 30 11:39:49 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Fri, 30 Mar 2012 18:39:49 +0000
Subject: [nvmewin] ***UNCHECKED*** RE: Learning Mode Patch for Review - UPDATED
Message-ID: <82C9F782B054C94B9FC04A331649C77A0AB269@FMSMSX106.amr.corp.intel.com>

OK, all tested again and working (same tests as before). Pw is ofanvme123, and you'll need to update your QEMU for the learning stuff to work (the driver will still work, but with only 1 queue). Below is the list of changes to the patch. I'll schedule a call to review, but I don't plan on reviewing the changes one by one (we can of course if folks would prefer, but there are a lot of small changes and it would take a while, plus distract from the main functional differences); instead I'll walk through the init path for learning and the subsequent IO submission/completion. I'll have a live debugger session going to demonstrate the changes in action as needed.

nvmeStd.h
- Removed elements no longer needed

nvmeStd.c
- Removed setting the learning mode queue size; we don't re-allocate now
- Changes throughout for NVMeDriverFatalError(), where I removed the last parm (there were more callers that didn't need it than did)
- In the DPC/ISR, no longer update the QID in the CT; instead, only update the vector info in the CT, the QI, and the MMT structs
- In the DPC/ISR, removed support for sync commands. It was added as part of re-using the delete-queues functions, but I ran into too many complexities, making it not worth it. Instead, the queue deletions are part of the init state machine, and on shutdown we now simply do an EN transition to delete the queues

nvmeStat.c
- Changes for the NVMeDriverFatalError() calls mentioned earlier
- In NVMeRunningWaitOnReSetupQueues() (sketched just below), we're now called as part of the init state machine for each queue, so the call to delete Cpl queues is conditional on the sub queues having been deleted already. Also, we no longer clear the allocateQueue flag, since we're no longer freeing/reallocating memory
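A sketch of that per-queue deletion pass, under assumed field and state names (NumSubQDeleted, NumCplQDeleted, NVMeWaitOnIoCQ, etc. are stand-ins); one delete command is issued per pass, and the init callback re-enters the state until everything is gone:

    BOOLEAN NVMeRunningWaitOnReSetupQueues(PNVME_DEVICE_EXTENSION pAE)
    {
        PQUEUE_INFO pQI = &pAE->QueueInfo;

        /* Sub queues first; completion queues only once no sub queue
         * still references them. */
        if (pQI->NumSubQDeleted < pQI->NumSubQCreated) {
            return NVMeDeleteSubQueues(pAE);
        }
        if (pQI->NumCplQDeleted < pQI->NumCplQCreated) {
            return NVMeDeleteCplQueues(pAE);
        }

        /* All queues deleted: jump back to the creation states. */
        pAE->DriveState.NextDriverState = NVMeWaitOnIoCQ;
        return TRUE;
    }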
nvmeIo.c
- Removed sync support

nvmeInit.c
- In the init callback, added the state NVMeWaitOnReSetupQueues to handle completion of the queue deletion phase; once all queues are deleted, we change the next state to re-create the queues
- Changes to NVMeDeleteCplQueues() and NVMeDeleteSubQueues() to call for one deletion at a time and count on the state machine to delete all the queues (it was previously synchronous on all queues)
- NVMeNormalShutdown() now calls for a device reset to delete the queues, instead of trying to delete the queues individually, either synchronously or via a state machine that would need to be coded
- Removed learning support from NVMeAllocIoQueues(); no longer needed
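The callback half of that flow might look roughly like this; NVMeCplStatusOk() is an assumed helper, and the counter name is illustrative, not the driver's real field:

    /* Completion callback for one delete-queue admin command: count the
     * deletion on success so the state machine issues the next one, or
     * flag a fatal error so we stop accepting IO. */
    VOID NVMeDeleteQueueCallback(PVOID pDevExt, PVOID pCplEntry)
    {
        PNVME_DEVICE_EXTENSION pAE = (PNVME_DEVICE_EXTENSION)pDevExt;

        if (NVMeCplStatusOk(pCplEntry)) {
            /* One fewer queue outstanding. */
            pAE->QueueInfo.NumQueuesToDelete--;
        } else {
            NVMeDriverFatalError(pAE);
        }
    }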
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E
Sent: Tuesday, March 27, 2012 9:06 AM
To: Chang, Alex; nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] Learning Mode Patch for Review - UPDATED

Agreed and implemented as such :) I should be able to test first thing Thu morn. Will provide an update afterwards, and the latest patch as well.

Thx
Paul

From: Chang, Alex [mailto:Alex.Chang at idt.com]
Sent: Tuesday, March 27, 2012 8:42 AM
To: Luse, Paul E; nvmewin at lists.openfabrics.org
Subject: RE: Learning Mode Patch for Review - UPDATED

Hi Paul,

Yes, that's what I meant. The sole purpose of the learning mode you added is to find out the mappings between vectors and cores. If we allocate queue memory based on NUMA locality for each core, and then delete/re-create the queues after learning is completed, the NUMA locality is still covered, without re-allocating the queue memory.

Thanks,
Alex

________________________________
From: Luse, Paul E [mailto:paul.e.luse at intel.com]
Sent: Monday, March 26, 2012 7:18 PM
To: Chang, Alex; nvmewin at lists.openfabrics.org
Subject: RE: Learning Mode Patch for Review - UPDATED

I guess we could do that: keep the queue-core association and change just the IV for the CQ. Is that what you're suggesting? So, for example, below shows what the mapping might look like before, and then after learning, with how I have it coded vs. what you are saying. If I understand you correctly, I do like that way better; pls take a look below and confirm for me.

BEFORE
Core  QP  MSIX  NUMA
0     1   1     0
1     2   2     1
2     3   3     2

WHAT I'M DOING
Core  QP  MSIX  NUMA
0     8   8     0
1     1   1     1
2     2   2     2

WHAT YOU ARE SUGGESTING
Core  QP  MSIX  NUMA
0     1   8     0
1     2   1     1
2     3   2     2

From: Chang, Alex [mailto:Alex.Chang at idt.com]
Sent: Monday, March 26, 2012 6:41 PM
To: Luse, Paul E; nvmewin at lists.openfabrics.org
Subject: RE: Learning Mode Patch for Review - UPDATED

Hi Paul,

In the changes of nvmeStat.c, you mentioned: NVMeRunningWaitOnReSetupQueues(). Delete the queues (frees the mem also), set the flag to indicate that we need to allocate them again and then jump back to the state where we allocate them. I am not sure it's necessary to free the allocated IO queue memory. Isn't it enough to adjust the mapping between cores and queues after re-creating the queues?

Thanks,
Alex

________________________________
From: nvmewin-bounces at lists.openfabrics.org [nvmewin-bounces at lists.openfabrics.org] on behalf of Luse, Paul E [paul.e.luse at intel.com]
Sent: Monday, March 26, 2012 5:52 PM
To: nvmewin at lists.openfabrics.org
Subject: Re: [nvmewin] Learning Mode Patch for Review - UPDATED

Quick testing update: I ran into a QEMU bug that I've since fixed, which enabled me to continue testing. There will be a few small tweaks to the package I sent out, but nothing major. I have to head to the OFA developer workshop for the next two days, so I will pick this up again Thu. I won't schedule the review until I get everything 100% tested.

Thx
Paul

From: Luse, Paul E
Sent: Friday, March 23, 2012 4:25 PM
To: nvmewin at lists.openfabrics.org
Subject: Learning Mode Patch for Review - UPDATED

Here are the changes, in addition to the ones below, to accommodate the deletion/recreation of queues as part of learning mode. It looks like a lot, only because I touched a lot of functions for cleanup in a few areas :) I'm not totally done testing; I only have the final queue deletion to step through, and I'm having a small challenge with QEMU, and my proto HW doesn't support deleting queues, but I'll get it figured out soon. I'll set up a review for late next week; if anyone would like more time, let me know - there's no blazing hurry on this, but I do think we want this to be our first release (so we're actually NUMA-optimized). Pw is ofanvme123.

nvmeStd.h
- Up'd the poll mode timeout from 1 to 3 and renamed it, as it's now used for more than just crashdump; it's a generic polling retry value
- New error states; also, I repurposed this from an init-state-machine-specific set of errors to general errors that may throw the driver into a 'shutdown state' as well. The shutdown routine was using this enum and the driver state to indicate shutdown before; however, the names/usages were all related to the init state machine, which I think could be confusing for new folks reading the code
- Added 2 new init states: learning, and re-setting up the queues
- A few fairly obvious changes to support logic changes in the C code described later
- Fixed the device state timeout counter; it was a UCHAR and not big enough for our timeout value

nvmeInit.h
- New parm for the alloc queue function; we need to tell it how many entries to put on the queue, since we don't need large queues for learning and it's a waste to alloc/free larger-than-needed contiguous chunks of memory if we use one size for everything

nvmeStd.c
- By default I set the # of queues to create for learning to the previously defined minimum for the driver
- Multiple changes where NVMeAllocQueues() is called, to pass the new parm (# of queue entries)
- Multiple changes where I renamed StartState to DriveState to reflect how we're using it anyway
- In NVMeInitialize(), removed the init of the num-queues-allocated elements in queue info; it's overwritten later regardless
- Some refactoring in both the ISR and DPC around the logic where we decide whether we need to call StorPortNotification() or not; with the new condition added (whether the caller wants synchronous IO or not), it was too confusing to try and work it into the previous set of conditions
- This is where we clear the srbExt synchronous flag which, for sync IO, tells the caller (more on this later) that the IO has now completed
- Fixed the passiveInit timeout handler; it was relying on the timer-driven state machine thread to switch states, so if that thread hung, the switch would never happen and we'd spin forever. So, made passiveInit do the time-based check itself and removed it from the init machine (no need to time in both places)
- Removed the startIO lock from the learning mode condition in the ISR/DPC; not needed until we implement concurrent_channels

nvmeStat.c
- Multiple changes where I renamed StartState to DriveState to reflect how we're using it anyway
- Addition of 2 new states in NVMeRunning()
- State handler added: NVMeRunningWaitOnLearnMapping(). If we're in sub-optimal conditions to begin with, or if we've already done this state, we skip the learning and re-setup-queues states. Also, if no namespace exists there's no way to learn, so we just skip init-time learning and go non-NUMA-optimized. The existing learning logic will still pick the right queue and adjust the mapping tables (so we learn as the host sends IOs); we just won't re-setup the queues. The completion side will go back to this state if we have more IOs to learn. FLUSH is a mandatory command (and doesn't require a data xfer), so it was the logical choice here; however, I made it a read of 1 block in case IHVs have issues with flush (either not supporting it, or not wanting a flush on every boot for whatever reason)
- State handler added: NVMeRunningWaitOnReSetupQueues(). Delete the queues (which frees the memory as well), set the flag to indicate that we need to allocate them again, and then jump back to the state where we allocate them. Learning won't happen again because the learning state handler will recognize that it's already been completed once
- Moved the timeout check to the passive init thread

nvmeSnti.c
- Renamed StartState to DriveState to reflect how we're using it anyway

nvmeIO.c
- Changes here support being able to do sync IO. The shutdown routines, which I reused for the learning mode re-setup, needed sync IO but hadn't added it, so this is technically a change that was needed anyway
- Moved the NVMeIssueCmd() return value check to after the call (it was in the wrong place)
- After we issue the command, we handle polling for crashdump the same as before (where there are no INTs)
- After we handle crashdump, we now also poll here for sync IO to complete; but because INTs are enabled, we don't poll by calling the ISR, we just wait for the ISR to fire and clear the flag in the srbExt that's set if the IO is sync

nvmeInit.h
- Function proto change for the new parm

nvmeInit.c
- NVMeAllocQueues() changed to accept a parm for the # of entries to alloc
- Multiple changes where I renamed StartState to DriveState to reflect how we're using it anyway
- New callback: NVMeDeleteQueueCallback. Simply decrements the # of deleted queues if no error
- In the init callback for the NVMeWaitOnIoSQ state, we now check how many cores we've learned and skip learning if done; otherwise we move to the learning state
- In the init callback, new case for NVMeWaitOnLearnMapping: if we have more cores to learn, stay in this state; otherwise move on to remapping/setting up the queues. There's no new completion state for re-setting up the queues; things will then end in NVMeWaitOnIoSQ
- Added missing default handler
- In both delete-queues functions, changes to support making them synchronous
- In NVMeAllocIoQueues(), we set the number of queues we want based on whether we're in learning mode or not

Sources
- Removed the QEMU define, as we only used it for some asserts that are no longer needed with learning mode

Also, throughout you'll find a new call, NVMeDriverFatalError(), that replaces a bunch of places where we were setting an error code, setting a state, and logging an error. There's one parm to indicate whether it's being called from the init state machine or not, so it knows whether to fire off the arbiter. That way this single call can be used anywhere to flag a fatal error, log it in the system log, and change the DriverState to avoid accepting new IO. This is clearly only for cases when we want to stop accepting IO or simply cannot.
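As a rough illustration of that helper: the StorPortLogError() call and the SP_INTERNAL_ADAPTER_ERROR code are real storport facilities, but the state name and the arbiter re-entry below are assumptions, not the driver's actual implementation:

    VOID NVMeDriverFatalError(PNVME_DEVICE_EXTENSION pAE, BOOLEAN inInitMachine)
    {
        /* Record the failure in the system event log. */
        StorPortLogError(pAE, NULL, 0, 0, 0, SP_INTERNAL_ADAPTER_ERROR, 0);

        /* Park the driver in an error state so no new IO is accepted. */
        pAE->DriveState.NextDriverState = NVMeStateFailed;

        if (inInitMachine == TRUE) {
            /* Fire the init arbiter so the state machine can unwind. */
            NVMeRunning(pAE);
        }
    }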
________________________________
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Luse, Paul E
Sent: Friday, March 16, 2012 9:42 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] ***UNCHECKED*** Learning Mode Patch for Review
Importance: High

Note that I merged Alex's patch in here as it's still pending with Ray, but it's only a half-dozen lines of code, so rather than have Ray create two tags and do 2 merges, I just rolled them together so Ray can process one patch. Normally I wouldn't recommend this, but the pending patch is so small it just makes sense. Password is ofanvme123. I can schedule a call to walk through any of this if anyone would like.

Learning Mode: Pretty easy; it's only enabled if we're in an optimal config wrt cores/queues/vectors. Assume we have N processors; it works like this.... On startup, the first N IOs will be sent to queues 1..N sequentially. Each queue is created with a matching MSI ID, so in this manner we assure we hit every queue and every message ID, by incrementing the queue # for the first N IOs regardless of the submitting core. On the completion side, we simply look at the core that we completed on and update the table for the completing core such that the correct queues are used the next time an IO submits to this core.

Testing (in all cases I confirmed functionality, and via iometer and xperf I confirmed core load balancing):
- Chatham (DT with 8 cores): Win8 Server public beta, 2008R2
- QEMU (configured for 2 cores): Win7-64
- Tools: the standard stuff: format, stress, iometer, SCSI compliance, shutdown/restart

Changes:

nvmeStd.h:
- Comment changes as needed
- Removed the logcalmode element from the msgTable, as it's no longer used
- Removed the DBG directive from the procNum element; it's now required for learning mode
- Removed the proto for NVMeMsixMapCores; not used anymore (combined the MSI and MSIX routines, more later)

nvmeIo.c:
- Removed the DBG directive from the procNum element; it's now required for learning mode

nvmeInit.c:
- Removed NVMeSwap(), NVMeQuickSort(), NVMeMsixMapCores() as they're no longer used
- Changes to NVMeMsiMapCores():
o Set the initial core table MsgID to the CplQ number; this is needed so that we can determine the CplQ from the MsgId during learning mode while in the DPC/ISR
o The other changes are just code simplifications
- Changed NVMeCompleteResMapTbl() to use NVMeMsiMapCores() for either MSI or MSIX
- Changed NVMeMapCore2Queue():
o We now check whether we're in learning mode; if not, we simply look up the queue num from the core table
o If we're in learning mode (based on a simple count of how many cores we've learned vs. total available cores), then we use the number of the core that we're on (that we're learning) + 1; the +1 is because all queue numbers are core+1 by our convention
- Change in NVMeAllocIoQueues() to effectively disable learning mode if we only have 1 queue (it makes no sense, and actually causes problems for learning mode, if we don't do this). We disable it by pretending that we've already learned all the cores

nvmeStd.c:
- In NVMePassiveInitialize(), disable learning mode if we're sharing one MSI over multiple queues, for the same reasons as when we have one queue
- In NVMeInitialize():
o Enable the DPC perf opt per the Msft recommendation to steer storport completion DPCs back to the submitting core
- In NVMeStartIo(): bug fix unrelated to learning mode, but I found it while debugging learning mode (via BSOD). You can't return FALSE from this function per the MSDN docs; always return TRUE
- In IoCompletionDpcRoutine(), and the same changes in the ISR for when it's enabled for completions instead:
o Merged Alex's bugfixes in with my changes
o Removed the DBG-related code for checking core affiliation
o Where we decide which queues to check, I set a Boolean that says whether we're learning to FALSE if we're in shared mode, because we disable learning mode during init in that case
o If we're not shared, the learning Boolean is set based on how many cores we've learned and whether the MsgId is >0, as MsgId 0 is admin and we exclude it from learning mode
o If we're not learning, then we only search the queue specified in the MMT
o If we are learning, we know the queue # is the same as the MsgId, because we init'd it that way
o The 'learning' happens in a new conditional just after we determine we have an srbExt. It works as follows:
* Grab the startIO lock, as we're sharing the core table with StartIo and during learning mode we're not yet assured that start/complete are on the same core. Note the lock is only taken on IOs during learning mode (the first few IOs)
* Look up the CT entry for the core that we're completing on and set its queue numbers to the queue number that was associated with the IO that just completed. This assures that the next lookup in the table for this core # will complete on this, the same, core
* Increment our learning counter, which will direct the next IO to the next core
- Unrelated changes to NVMeProcessIoctl(): a few changes were made here, as the routine assumed every IOCTL we'd get would be a PT IOCTL, which makes it difficult for vendors to add additional private IOCTLs. Just moved things around a bit, as we had to add one to our product-specific code. No other changes here other than placing IOCTL-specific code in the correct case block

____________________________________
Paul Luse
Sr. Staff Engineer
PCG Server Software Engineering
Desk: 480.554.3688, Mobile: 480.334.4630
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: source.zip
Type: application/x-zip-compressed
Size: 158301 bytes
Desc: source.zip
URL:

From paul.e.luse at intel.com Fri Mar 30 11:43:02 2012
From: paul.e.luse at intel.com (Luse, Paul E)
Date: Fri, 30 Mar 2012 18:43:02 +0000
Subject: [nvmewin] Code walkthrough - 'learning mode'
Message-ID: <82C9F782B054C94B9FC04A331649C77A0AB292@FMSMSX106.amr.corp.intel.com>

Tuesday, April 03, 2012, 10:00 AM US Pacific Time
916-356-2663, 8-356-2663, Bridge: 3, Passcode: 2874307
Live Meeting: https://webjoin.intel.com/?passcode=2874307
Speed dialer: inteldialer://3,2874307
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/calendar
Size: 1939 bytes
Desc: not available
URL: