[nvmewin] Question on Processor to MSI vector mapping
Luse, Paul E
paul.e.luse at intel.com
Wed Feb 15 11:14:31 PST 2012
Because we create the SQ and CQ before the first IO, and we have to provide the CQ vector at creation time. So, for example, let's say we have 3 cores and an SQ/CQ pair for each, all numbered the same as the core #. When we create them we arbitrarily give each CQ a vector to complete on. When the first IO comes, let's say it's on core 1 and the Storport API tells us the vector we should expect it on is 3; we'd need to submit on SQ3 but we're on core 1. There are ways around this of course: we could do some creative things in passiveInit wrt the creation of the CQs (submitting test IOs, re-creating, etc.), but I don't think those things are worth the complexity.
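To put the 3-core example in code form, here is a minimal sketch in plain C. It is only a model of the scenario, not driver code: the struct, the function names, and the choice of giving pair i vector i are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_QUEUES 3

/* Hypothetical model: one SQ/CQ pair per core, numbered 1..NUM_QUEUES
 * the same as the cores. */
struct queue_pair {
    unsigned core;      /* core we intended this pair for */
    unsigned cq_vector; /* MSI vector fixed when the CQ was created */
};

/* The pairs are created before any IO has been seen, so the vector
 * given to each CQ is an arbitrary choice. Here pair i gets vector i. */
static void create_queues(struct queue_pair q[NUM_QUEUES])
{
    assert(q != NULL);
    for (unsigned i = 0; i < NUM_QUEUES; i++) {
        q[i].core = i + 1;
        q[i].cq_vector = i + 1;
    }
}

/* Return the index of the pair whose CQ completes on `vector`,
 * or -1 if no pair was created with that vector. */
static int find_queue_by_vector(const struct queue_pair q[NUM_QUEUES],
                                unsigned vector)
{
    assert(q != NULL);
    for (int i = 0; i < NUM_QUEUES; i++) {
        if (q[i].cq_vector == vector)
            return i;
    }
    return -1;
}
```

If the IO arrives on core 1 and the reported vector is 3, find_queue_by_vector(q, 3) returns the pair that was created for core 3, so the submission lands on SQ3 even though we are running on core 1.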
Make sense?
Thx
Paul
From: Greg de Valois [mailto:Greg.deValois at sandisk.com]
Sent: Wednesday, February 15, 2012 12:06 PM
To: Luse, Paul E; nvmewin at lists.openfabrics.org
Subject: RE: Question on Processor to MSI vector mapping
Paul:
Thanks for the reply. I'm afraid I'm not quite following you: why does using the API provided by Storport to get the MSI vector associated with the current request imply that you're submitting on a different core than completing? The core you're executing on is the one assigned the vector that the API gives you, you put it on the submission queue that has been assigned to that vector, and it's completed on the completion queue for that vector as well.
What am I missing here?
Greg
________________________________
From: Luse, Paul E [paul.e.luse at intel.com]
Sent: Wednesday, February 15, 2012 10:57 AM
To: Greg de Valois; nvmewin at lists.openfabrics.org
Subject: RE: Question on Processor to MSI vector mapping
Hi Greg-
Thanks for the question, it's a good one! The reasoning was mainly because it was fairly straightforward and had one benefit over the method you mention - we could assure NUMA optimization and vector matching for both SQ and CQ. There are many, many ways that one could approach this problem and we discussed a few as part of the dev of this driver and then individually discussed experiences with various methods with other dev teams at our respective companies.
If one uses the method you mention below, we'd create our SQ/CQ per core, NUMA optimized, and then be submitting on a different core than we complete on; however, we'd still be optimizing the completion side. The more I thought about this, the more I realized it's actually not buying us much of anything over using the Msft API, due to the effects of CPU cache and the fact that the SQ access on submission is a write and not a read. I also heard from various other folks that they didn't find the API below to be accurate all of the time, though I can't say from experience that I've seen this.
That said, I will likely be proposing an alternate method in the near future so I'll go ahead and propose it now since you brought up the subject:
Proposal: no longer decompose the MSI address to populate the mapping table. Instead, start off with a 1:1 mapping and 'learn and update' the mapping table on the completion side. We would still avoid the Storport API because I don't think it adds value over the learned method, and it requires us to use the DPC steering option, which I've witnessed to have unpredictable side effects. I do plan on following up with Msft (and have already had several internal discussions at Intel with other Storport devs) on exactly how some of these optimizations within Storport work, so we can better gauge whether they're a good fit for us or not.
Pro: This will *always* work, whereas the method we have now does not work for APIC logical mode. I prefer a simple "one size fits all" solution every time over a 2 path solution, even if one path is slightly optimized. It makes the driver more maintainable and gives us fewer variables in debug (right now we don't even store whether we found the APIC in physical or logical mode, so during debug you don't really know).
Con: SQ mem will be on a different core than the submitting thread, but I don't believe this is a measurable issue. We can certainly perform some experiments to check, though.
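To make the proposal a bit more concrete, here is a rough sketch of the 'learn and update' table in plain C. It is a sketch under assumptions, not the driver's implementation: the names (core_map, map_init, map_lookup, map_learn), the 0-based numbering, and the MAX_CORES bound are all hypothetical, and locking and the actual Storport/interrupt plumbing are left out.

```c
#include <assert.h>

#define MAX_CORES 64 /* hypothetical upper bound for the sketch */

/* Hypothetical mapping table: core number -> queue pair to submit on. */
struct core_map {
    unsigned num_cores;
    unsigned queue_for_core[MAX_CORES];
};

/* Start off with a 1:1 mapping: core n submits on queue pair n. */
static void map_init(struct core_map *m, unsigned num_cores)
{
    assert(num_cores <= MAX_CORES);
    m->num_cores = num_cores;
    for (unsigned c = 0; c < num_cores; c++)
        m->queue_for_core[c] = c;
}

/* Submission side: pick the queue pair for the core we are running on. */
static unsigned map_lookup(const struct core_map *m, unsigned core)
{
    assert(core < m->num_cores);
    return m->queue_for_core[core];
}

/* Completion side: the interrupt for queue pair `queue` was serviced on
 * `core`, so record that future IOs issued on that core should use that
 * pair. After this, submission and completion share a core. */
static void map_learn(struct core_map *m, unsigned core, unsigned queue)
{
    assert(core < m->num_cores);
    assert(queue < m->num_cores);
    m->queue_for_core[core] = queue;
}
```

For example, after map_init(&m, 3), core 1 submits on pair 1. If the completion for pair 2 is then observed on core 1, map_learn(&m, 1, 2) makes later IOs from core 1 go to pair 2. The SQ memory for pair 2 was allocated without knowing this, which is the Con noted above.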
Other thoughts?
Thx
Paul
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of Greg de Valois
Sent: Tuesday, February 14, 2012 5:02 PM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] Question on Processor to MSI vector mapping
All:
I'm wondering if anyone can explain to me the reasoning behind the processor to MSI vector translation that is being done by the driver, instead of using the vector returned from StorPortGetStartIoPerfParams for each IO? Are there cases where this doesn't work properly?
Thanks,
Greg de Valois
SanDisk