[nvmewin] next patch - quick update

Luse, Paul E paul.e.luse at intel.com
Wed Nov 14 10:32:32 PST 2012


I had almost forgotten that the AER code in the OFA repo was broken to begin with (it has been commented out of the state machine since day 1), so I'll need to merge the fixed code in from an internal branch rather than just re-including what was there before with #define wraps. That will take a few more days, as I'll need to fully test the driver-issued AER function in the OFA base.

Thx
Paul

From: Luse, Paul E
Sent: Monday, November 12, 2012 6:12 PM
To: nvmewin at lists.openfabrics.org
Subject: RE: next patch

FYI, I had a request to leave the AER stuff in, behind a compile switch, for anyone who wants the driver to manage the AER responses.  I'm totally fine with that, and given it was in there before, I'm assuming nobody will have issues leaving it in, especially since I'll wrap it in a #define.  I'll try to get that done tomorrow and send an updated patch out.
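A minimal sketch of what such a compile switch could look like, assuming a hypothetical macro name `DRIVER_ISSUES_AERS` and a simplified, hypothetical set of init states; the real macro and state names in the driver may differ. With the macro left undefined (the default here), the init state machine skips the AER step entirely.

```c
/* Hypothetical compile switch: define it on the compiler command line to
 * have the driver issue and manage AERs itself. Left undefined here. */
/* #define DRIVER_ISSUES_AERS */

/* Simplified, hypothetical subset of the init state machine states. */
typedef enum {
    INIT_STATE_IDENTIFY_DONE,
    INIT_STATE_ISSUE_AERS,
    INIT_STATE_COMPLETE
} INIT_STATE;

/* Returns the state that follows `current`. */
static INIT_STATE NextInitState(INIT_STATE current)
{
    switch (current) {
    case INIT_STATE_IDENTIFY_DONE:
#ifdef DRIVER_ISSUES_AERS
        return INIT_STATE_ISSUE_AERS;   /* driver manages AER responses */
#else
        return INIT_STATE_COMPLETE;     /* AERs left to a mgmt. app */
#endif
    case INIT_STATE_ISSUE_AERS:
    default:
        return INIT_STATE_COMPLETE;
    }
}
```

Keeping the switch at a single state transition means the rest of the AER code can sit inside the same `#ifdef` without touching the non-AER path.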

Thx
Paul

From: Luse, Paul E
Sent: Thursday, November 08, 2012 2:12 PM
To: nvmewin at lists.openfabrics.org
Subject: next patch

All-

Please review; I'll be looking for feedback over the next few weeks, let's say by Mon Nov 19 if possible (just before the holidays).  Let me know if you have any questions or would like to schedule a call to walk through any of this.

Thx
Paul


Main Changes:
- fix reset code, which was using the wrong criteria to identify outstanding IOs on a timeout/reset
- misc cleanup of commands and some prints
- removal of all code related to the driver issuing AERs
- fix off-by-one error in read_cap translations
- add support for checking the max data transfer size reported by HW to make sure init can continue (we don't find out what it is until after we've already reported our max transfer size to storport)
- remove the portion of the last patch that addressed a storport assert, since Alex discovered a perf side effect (confirmed here as well)

Testing:
- the majority of testing was around error handling; I modified QEMU to drop roughly one IO in every 10K to simulate an IO timeout in hardware.  Verified this with and without load; under load I used 4 threads of data integrity testing that ran for 48 hrs with continual resets/recoveries and no adverse effects
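The fault injection above boils down to a counter in the emulated controller's command path. A minimal sketch, assuming a hypothetical hook `ShouldDropCommand()` that the emulator would call before processing each submitted IO; this is not the actual QEMU change, and it uses a fixed interval where the test above dropped roughly one IO per 10K.

```c
#include <stdbool.h>
#include <stdint.h>

/* Drop one IO out of every DROP_INTERVAL to simulate a hardware timeout. */
#define DROP_INTERVAL 10000u

typedef struct {
    uint64_t ios_seen;     /* total IOs observed by the hook */
    uint64_t ios_dropped;  /* IOs silently discarded (never completed) */
} FAULT_INJECTOR;

/* Returns true if this IO should be discarded without posting a completion,
 * which forces the host driver's timeout/reset path to run. */
static bool ShouldDropCommand(FAULT_INJECTOR *fi)
{
    fi->ios_seen++;
    if (fi->ios_seen % DROP_INTERVAL == 0) {
        fi->ios_dropped++;
        return true;
    }
    return false;
}
```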

Detail:
nvmeInit.c
- removed excess prints; added a few for important, infrequent activities
- replaced all StorPortMoveMemory and memcpy calls with StorPortCopyMemory
- added support for checking MDTS.  If we find that the card doesn't support the xfer size we already reported (i.e. the HW max is smaller than what we reported), then we have no choice but to fail the init state machine at this point, or we'll get transfers that the HW can't handle.
- removed all code having to do with AER.  Issuing AERs as part of the init state machine was an initial design choice, but it makes little sense for the driver to do this.  A mgmt. app should be doing this via PT IOCTL so that it can properly log the response.  The driver can do very little with the response unless someone adds code to pass it up to a mgmt. app, in which case there's no value in having the driver in the middle of it.
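As a rough illustration of the MDTS check, here is a sketch based on the NVMe spec's definition: MDTS is a power of two in units of the controller's minimum memory page size (2^(12 + CAP.MPSMIN) bytes), and a value of 0 means no limit. The function name and parameters are hypothetical, not the driver's actual code; a caller would fail the init state machine when it returns false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a controller reporting `mdts`
 * (Identify Controller MDTS) and `mpsmin` (CAP.MPSMIN) can handle transfers
 * of `reportedMaxXfer` bytes, i.e. the limit already reported to storport. */
static bool MdtsAllowsTransfer(uint8_t mdts, uint8_t mpsmin, uint64_t reportedMaxXfer)
{
    unsigned int shift;

    if (mdts == 0)
        return true;                 /* 0h: no transfer size restriction */

    /* HW max = 2^MDTS units of the minimum page size, 2^(12 + MPSMIN) bytes */
    shift = 12u + (mpsmin & 0xFu) + mdts;
    if (shift >= 64)
        return true;                 /* limit larger than any uint64_t size */

    return reportedMaxXfer <= (1ull << shift);
}
```

For example, MDTS = 5 with a 4 KB minimum page size gives a 128 KB hardware limit, so a previously reported 256 KB max transfer would fail the check.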

Nvmeinit.h
- removed AER function proto

Nvmeio.c
- replaced all memcpy with StorPortCopyMemory
- updated NVMeDetectPendingCmds() so it can be used by the reset DPC to clean up pending commands.  What we were doing before was cleaning up commands that were on the SQ but hadn't been picked up by FW yet, which was simply wrong, and that count will always be zero since we submit one command at a time.  The correct set of commands to send back following a reset is the set detected by NVMeDetectPendingCmds(), so I added a parm so it can serve that purpose as well
- changed the prints in NVMeDetectPendingCmds() so they print by default in a free build of the driver.  Implementations can change this if they want, but even on a free build you'd generally like to be able to see whether anything timed out and, if so, what was sent back
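The reset fix hinges on which commands count as outstanding: everything submitted to HW that has no completion yet, not just what is still sitting on the SQ. A minimal sketch of that idea, assuming a hypothetical per-command tracking table and a hypothetical `completeOnReset` flag standing in for the new parm; the driver's real NVMeDetectPendingCmds() and its data structures differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status handed back to the OS for commands lost to a reset. */
enum { CMD_STATUS_NONE = 0, CMD_STATUS_BUS_RESET = 1 };

/* Hypothetical per-command tracking entry. */
typedef struct {
    bool     pending;  /* submitted to HW, no completion seen yet */
    uint16_t cid;      /* NVMe command identifier */
    int      status;   /* status to return to storport */
} CMD_ENTRY;

/* Counts outstanding commands; when completeOnReset is true (reset DPC
 * case), also marks each one as reset and releases it so it can be
 * returned to storport. */
static size_t DetectPendingCmdsSketch(CMD_ENTRY *table, size_t count, bool completeOnReset)
{
    size_t found = 0;

    for (size_t i = 0; i < count; i++) {
        if (!table[i].pending)
            continue;
        found++;
        if (completeOnReset) {
            table[i].status = CMD_STATUS_BUS_RESET;
            table[i].pending = false;
        }
    }
    return found;
}
```

Passing false gives the timeout-detection behavior (count and report only); passing true gives the reset-cleanup behavior from the same scan.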

Nvmeio.h
- supporting func header change

Nvmepwrmgmt.c
- new parm for call to NVMeDetectPendingCmds()

Nvmesnti.c
- replaced all StorPortMoveMemory and memcpy calls with StorPortCopyMemory
- fix for read_cap translations: need to subtract one from the translated value of NSZE, since NSZE is the namespace size in logical blocks (not zero-based) while READ CAPACITY returns the address of the last logical block
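A small sketch of the corrected translation, with hypothetical function names (not the driver's code). It assumes NSZE >= 1 and leaves out the big-endian packing of the SCSI response. Per SBC, READ CAPACITY(10) reports 0xFFFFFFFF when the last LBA doesn't fit in 32 bits, telling the initiator to issue READ CAPACITY(16).

```c
#include <stdint.h>

/* NSZE counts logical blocks (LBA 0 .. NSZE-1), so the last LBA is NSZE - 1.
 * Assumes nsze >= 1. */
static uint64_t ReadCap16LastLba(uint64_t nsze)
{
    return nsze - 1;
}

/* READ CAPACITY(10) has a 32-bit field; 0xFFFFFFFF means "use (16)". */
static uint32_t ReadCap10LastLba(uint64_t nsze)
{
    uint64_t lastLba = ReadCap16LastLba(nsze);
    return lastLba > 0xFFFFFFFEull ? 0xFFFFFFFFu : (uint32_t)lastLba;
}
```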

Nvmestat.c
- removed AER code

Nvmestd.c
- removed call to StorPortGetUncachedExtension(), which was causing performance issues.  We'll add it back once we fully understand the correct implementation that avoids the storport assert without side effects
- removed AER code
- added debug print
- reworked RecoveryDpcRoutine() to use NVMeDetectPendingCmds() for returning commands to storport
- replaced all StorPortMoveMemory with StorPortCopyMemory

Nvmestd.h
- fix typo in enum
- add new init state machine failure code for max xfer mismatch
- remove AER code

Nvme.h
- pragma for SMART data to be properly formatted
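Why a pragma matters here: the NVMe 1.0 SMART / Health Information log page is a fixed 512-byte layout in which the 2-byte composite temperature sits at byte offset 1, so default alignment would insert a padding byte and shift every later field. A sketch of a packed definition, assuming `#pragma pack(push, 1)` is the kind of pragma meant (the patch note doesn't say) and using a hypothetical struct name; the field names are illustrative, not the driver's nvme.h.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint8_t  CriticalWarning;           /* byte 0 */
    uint16_t Temperature;               /* bytes 1-2, Kelvin; unaligned */
    uint8_t  AvailableSpare;            /* byte 3 */
    uint8_t  AvailableSpareThreshold;   /* byte 4 */
    uint8_t  PercentageUsed;            /* byte 5 */
    uint8_t  Reserved0[26];             /* bytes 6-31 */
    uint8_t  DataUnitsRead[16];         /* 128-bit little-endian counters */
    uint8_t  DataUnitsWritten[16];
    uint8_t  HostReadCommands[16];
    uint8_t  HostWriteCommands[16];
    uint8_t  ControllerBusyTime[16];
    uint8_t  PowerCycles[16];
    uint8_t  PowerOnHours[16];
    uint8_t  UnsafeShutdowns[16];
    uint8_t  MediaErrors[16];
    uint8_t  NumErrorInfoLogEntries[16];
    uint8_t  Reserved1[320];            /* bytes 192-511 */
} SMART_HEALTH_LOG_SKETCH;
#pragma pack(pop)

/* Fails to compile if the layout drifts from the 512-byte log page. */
typedef char SmartLogSizeCheck[sizeof(SMART_HEALTH_LOG_SKETCH) == 512 ? 1 : -1];
```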

Nvmeioctl.h
- new IOCTL status code for max AERs (even though the driver no longer issues them itself, it can still track how many are issued through it)
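One way such tracking could work, sketched under assumptions: the limit comes from the Identify Controller AERL field, which the NVMe spec defines as a 0's based value (AERL = 3 allows 4 outstanding AERs). The status names, struct, and functions below are hypothetical, not the ones in nvmeIoctl.h.

```c
#include <stdint.h>

/* Hypothetical status values for a pass-through AER request. */
typedef enum {
    AER_STATUS_SUCCESS = 0,
    AER_STATUS_MAX_AER_REACHED
} AER_STATUS;

typedef struct {
    uint8_t  aerl;         /* Identify Controller AERL (0's based) */
    uint32_t outstanding;  /* AERs passed through, not yet completed */
} AER_TRACKER;

/* Called when a mgmt. app sends an AER through the driver. */
static AER_STATUS TrackAerSubmit(AER_TRACKER *t)
{
    uint32_t limit = (uint32_t)t->aerl + 1;  /* convert 0's based value */

    if (t->outstanding >= limit)
        return AER_STATUS_MAX_AER_REACHED;
    t->outstanding++;
    return AER_STATUS_SUCCESS;
}

/* Called when an AER completion is passed back up to the mgmt. app. */
static void TrackAerComplete(AER_TRACKER *t)
{
    if (t->outstanding > 0)
        t->outstanding--;
}
```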


____________________________________
Paul Luse
Sr. Staff Engineer
PCG Server Software Engineering
Desk: 480.554.3688, Mobile: 480.334.4630
