[nvmewin] Question on Processor to MSI vector mapping

Luse, Paul E paul.e.luse at intel.com
Wed Feb 15 10:57:30 PST 2012


Hi Greg-

Thanks for the question, it's a good one!  The reasoning was mainly that it was fairly straightforward and had one benefit over the method you mention: we could ensure NUMA optimization and vector matching for both the SQ and the CQ.  There are many, many ways one could approach this problem; we discussed a few as part of the development of this driver and then individually compared experiences with various methods with other dev teams at our respective companies.

If we used the method you mention below, we'd create our SQ/CQ pairs per core, NUMA optimized, and then submit on a different core than we complete on; we would, however, still be optimizing the completion side.  The more I thought about this, the more I realized it's actually not buying us much of anything over using the Msft API, due to the effects of CPU cache and the fact that the SQ access on submission is a write and not a read.  I've also heard from various other folks that they didn't find the API below to be accurate all of the time, though I can't say from experience that I've seen this myself.
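For readers following along, the alternative being discussed looks roughly like the sketch below (kernel-mode fragment, not compilable standalone; `pDevExt` and `pSrb` are hypothetical names, and the structure fields are from the public storport.h as I understand them).  Note that the per-IO message number is only reported after the miniport opts in to DPC redirection via StorPortInitializePerfOpts:

```
/* Sketch only -- illustrates the StorPortGetStartIoPerfParams approach,
 * not what the driver currently does. Error handling omitted. */
STARTIO_PERFORMANCE_PARAMETERS perfParams = { 0 };

perfParams.Size = sizeof(perfParams);
if (StorPortGetStartIoPerfParams(pDevExt, pSrb, &perfParams) ==
    STOR_STATUS_SUCCESS) {
    /* MessageNumber is the MSI message storport expects this IO to
     * complete on; submit to the SQ paired with that vector. */
    SubmitToQueueForVector(pDevExt, pSrb, perfParams.MessageNumber);
}
```

`SubmitToQueueForVector` is a placeholder for whatever per-queue submission routine the miniport uses.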

That said, I will likely be proposing an alternate method in the near future, so I'll go ahead and propose it now since you brought up the subject:

Proposal:  no longer decompose the MSI address to populate the mapping table.  Instead, start off with a 1:1 mapping and 'learn and update' the mapping table on the completion side.  We would still avoid the storport API because I don't think it adds value over the learned method, and it requires us to use the DPC steering option, which I've witnessed to have unpredictable side effects.  I do plan on following up with Msft (and have already had several internal discussions at Intel with other storport devs) on exactly how some of these optimizations within storport work, so we can better gauge whether they're a good fit for us or not.

Pro:  This will *always* work, whereas the method we have now does not work when the APIC is in logical mode.  I prefer a simple "one size fits all" solution every time over a two-path solution, even if one path is slightly optimized.  It makes the driver more maintainable and gives us fewer variables in debug (right now we don't even store whether we found the APIC in physical or logical mode, so during debug you don't really know).

Con:  SQ memory will be on a different core than the submitting thread, but I don't believe this is a measurable issue.  We can certainly perform some experiments to check, though.

Other thoughts?

Thx
Paul

From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Greg de Valois
Sent: Tuesday, February 14, 2012 5:02 PM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] Question on Processor to MSI vector mapping

All:

I'm wondering if anyone can explain to me the reasoning behind the processor to MSI vector translation that is being done by the driver, instead of using the vector returned from StorPortGetStartIoPerfParams for each IO? Are there cases where this doesn't work properly?

Thanks,

Greg de Valois
SanDisk


