[ewg] FW: disabling N-1 CPUs at run time on a RH6.1 kernel using OFED 1.5.3 causes hang

Tziporet Koren tziporet at mellanox.co.il
Sun Aug 14 08:43:59 PDT 2011


Thanks, Sean.
We will look into it.

Tziporet

From: ewg-bounces at lists.openfabrics.org [mailto:ewg-bounces at lists.openfabrics.org] On Behalf Of Hefty, Sean
Sent: Friday, August 12, 2011 1:15 AM
To: ewg at lists.openfabrics.org
Subject: [ewg] FW: disabling N-1 CPUs at run time on a RH6.1 kernel using OFED 1.5.3 causes hang

I saw the following problem with disabling CPUs reported against OFED 1.5.3, and I'm simply forwarding it on. If OFED 1.5.3 is removed from the system, CPU disabling works. I have not tried to reproduce this myself or looked into the matter.

- Sean

---


The following problem happens with RH6.1 (kernel 2.6.32-131.0.15.el6.x86_64) and OFED 1.5.3. If I disable all but one CPU via

echo 0 > /sys/devices/system/node/node0/cpu1/online
....
echo 0 > /sys/devices/system/node/node1/cpu31/online

on 2-socket systems (tested on both Westmere and Sandy Bridge), the system becomes completely unusable. If I stop InfiniBand and remove the driver from the kernel first, nothing like this happens and the system remains stable.
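
For anyone trying to reproduce this, the sequence of echo commands above can be written as a loop. This is only a minimal sketch, not the script the reporter used: the function name offline_all_but_cpu0 and its optional directory argument are hypothetical, and it assumes the standard /sys/devices/system/cpu/cpuN/online files rather than the per-node paths quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): take every CPU except
# cpu0 offline, one at a time.
# $1 is an optional CPU directory; the default assumes the standard sysfs
# location. Passing another directory allows a dry run without root.
offline_all_but_cpu0() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_root"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue            # glob matched nothing
        cpu="${f%/online}"                 # strip the trailing /online
        cpu="${cpu##*/}"                   # keep only the cpuN component
        [ "$cpu" = "cpu0" ] && continue    # leave the boot CPU online
        echo "offlining $cpu"              # log before the write
        echo 0 > "$f" || { echo "failed on $cpu" >&2; return 1; }
    done
}
```

Run as root with no argument, offline_all_but_cpu0 writes 0 to each online file in turn. Because the CPU name is printed before each write, the last line on the console shows which offline step preceded the hang.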

Checking the console, I got this directly after taking the CPUs offline:



[cid:image001.png at 01CC5AB2.46625630]



and a few moments later



[cid:image003.png at 01CC5AB2.46625630]



This indicates that the Lustre FS can't reach the InfiniBand device. From time to time the output is sprinkled with resets of eth0. The system does not react to Alt-SysRq-T (it did before), but the various status messages indicate the system is still running!



If I try it without Lustre running I simply get



[cid:image004.png at 01CC5AB2.46625630]



but again it does not react to Alt-SysRq-T.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 157764 bytes
Desc: image001.png
URL: <http://lists.openfabrics.org/pipermail/ewg/attachments/20110814/8a7ed9fb/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image003.png
Type: image/png
Size: 187758 bytes
Desc: image003.png
URL: <http://lists.openfabrics.org/pipermail/ewg/attachments/20110814/8a7ed9fb/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image004.png
Type: image/png
Size: 120832 bytes
Desc: image004.png
URL: <http://lists.openfabrics.org/pipermail/ewg/attachments/20110814/8a7ed9fb/attachment-0002.png>

