[nvmewin] regarding incorrect core-queue-msix mapping in 1.4 OFA code
Foster, Carolyn D
carolyn.d.foster at intel.com
Mon Jun 15 17:20:58 PDT 2015
Hi Suman,
There are two important boundary conditions to consider when analyzing the queue-to-core and MSI-X mapping: systems with more than 32 cores, and systems with more than 64 cores. On a system with more than 32 cores, in your example below, it is possible that some cores will not be mapped to the appropriate MSI-X vectors. On a system with more than 64 cores, the core numbering may not be consecutive. In both cases, problems are likely unless the queues are re-created.
All that being said, I do see room for improvement in the per-NUMA-node memory allocation and the core-to-queue-pair assignment. Could you investigate further and come back with a proposal for a fix, and potentially a patch?
Thanks,
Carolyn
From: nvmewin-bounces at lists.openfabrics.org [mailto:nvmewin-bounces at lists.openfabrics.org] On Behalf Of SUMAN PRAKASH B
Sent: Monday, June 15, 2015 10:36 AM
To: nvmewin at lists.openfabrics.org
Subject: [nvmewin] regarding incorrect core-queue-msix mapping in 1.4 OFA code
Dear All,
The core-queue-MSI-X mapping (NUMA implementation) in the 1.4 OFA code does not look correct.
During learning cores, after the driver sends a read command on each queue to obtain the core-to-MSI-X mapping, the driver updates the following:
a) The MSI-X vector corresponding to each core.
b) The queue assignment corresponding to each core.
Because of (b), the queue assignment changes but the memory allocation does not. When the queues are then deleted and re-created, they end up on memory belonging to other NUMA nodes. This should not happen: it introduces remote-node memory access, which can hurt performance.
We tested on a 32-logical-processor server with 2 NUMA nodes; the following is the mapping after learning cores:
[Table image (image001.gif): observed core-queue-MSI-X mapping after learning cores - http://lists.openfabrics.org/pipermail/nvmewin/attachments/20150616/2561a50f/attachment.gif]
Ideally, during learning cores, after the driver sends a read command on each queue to obtain the core-to-MSI-X mapping, the driver should update only the following:
a) The MSI-X vector corresponding to each core.
With only (a), the core-queue-MSI-X mapping is as follows:
[Table image (image002.gif): proposed core-queue-MSI-X mapping - http://lists.openfabrics.org/pipermail/nvmewin/attachments/20150616/2561a50f/attachment-0001.gif]
Any comments?
Thanks,
Suman