[ewg] FW: disabling N-1 CPUs at run time on a RH6.1 kernel using OFED 1.5.3 causes hang

Hefty, Sean sean.hefty at intel.com
Thu Aug 11 15:13:24 PDT 2011


I saw the following report of a problem with disabling CPUs when OFED 1.5.3 is installed. I'm simply forwarding it on. If OFED 1.5.3 is removed from the system, CPU disabling works. I have not tried to reproduce this myself or looked into the matter.

- Sean

---


The following problem occurs with RH6.1 (kernel 2.6.32-131.0.15.el6.x86_64) and OFED 1.5.3. If I disable all but one CPU via

echo 0 > /sys/devices/system/node/node0/cpu1/online
....
echo 0 > /sys/devices/system/node/node1/cpu31/online

on two-socket systems (tested on both Westmere and Sandy Bridge), the system becomes completely unusable. If I stop InfiniBand and remove the driver from the kernel first, nothing like this happens and the system remains stable.
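
The elided sequence of commands above can be scripted roughly as follows. This is a minimal sketch, not the reporter's actual procedure. It assumes that each hotpluggable CPU exposes an `online` file under /sys/devices/system/cpu/cpuN/ (the per-node paths used in the report link to the same CPU directories), and it leaves cpu0 alone, matching the report, which starts at cpu1. The function names and the optional directory argument are hypothetical conveniences; writing the real sysfs files requires root.

```shell
#!/bin/sh
# Sketch: offline every CPU except cpu0, then optionally bring them back.
# Assumption: hotpluggable CPUs expose <dir>/cpuN/online. The optional
# directory argument is hypothetical, added only so the loops can be
# pointed at a scratch tree; it defaults to /sys/devices/system/cpu.

offline_all_but_cpu0() {
    dir="${1:-/sys/devices/system/cpu}"
    for f in "$dir"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue                # glob matched nothing
        cpu=$(basename "$(dirname "$f")")      # e.g. "cpu17"
        [ "$cpu" = "cpu0" ] && continue        # keep one CPU online
        echo 0 > "$f" || echo "could not offline $cpu" >&2
    done
}

online_all() {
    dir="${1:-/sys/devices/system/cpu}"
    for f in "$dir"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue                # glob matched nothing
        echo 1 > "$f" || echo "could not online $f" >&2
    done
}
```

Run as root, `offline_all_but_cpu0` reproduces the step described above, after which `cat /sys/devices/system/cpu/online` should report only `0`; `online_all` restores the CPUs on a system that is still responsive.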

Checking the console, I got the following directly after taking the CPUs offline:

[Screenshot: image001.png]

and, a few moments later:

[Screenshot: image003.png]

This indicates that the Lustre file system can't reach the InfiniBand device. From time to time, the output is sprinkled with eth0 resets. The system does not react to Alt-SysRq-T (it did before), but the various status messages indicate that the system is still running!

If I try it without Lustre running, I simply get:

[Screenshot: image002.png]

but again, it does not react to SysRq-T.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openfabrics.org/pipermail/ewg/attachments/20110811/7719e49a/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 120832 bytes
Desc: image002.png
URL: <http://lists.openfabrics.org/pipermail/ewg/attachments/20110811/7719e49a/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 157764 bytes
Desc: image001.png
URL: <http://lists.openfabrics.org/pipermail/ewg/attachments/20110811/7719e49a/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image003.png
Type: image/png
Size: 187758 bytes
Desc: image003.png
URL: <http://lists.openfabrics.org/pipermail/ewg/attachments/20110811/7719e49a/attachment-0002.png>

