[ewg] FW: disabling N-1 CPUs at run time on a RH6.1 kernel using OFED 1.5.3 causes hang

Dotan Barak dotanb at dev.mellanox.co.il
Wed Aug 24 07:17:44 PDT 2011


Hi Sean.

We tried disabling the CPUs/cores and did not see the behavior that
you described (we were not using Lustre).
Can you please provide some more information on this?
(Can it be reproduced without Lustre too? Are you using a special CPU?)

Thanks
Dotan

On Fri, Aug 12, 2011 at 1:13 AM, Hefty, Sean <sean.hefty at intel.com> wrote:

> I saw the following problem with disabling CPUs reported against OFED 1.5.3.
> I’m simply forwarding this on. If OFED 1.5.3 is removed from the system,
> CPU disabling works. I have not tried to reproduce this myself or looked
> into the matter.
>
> - Sean
>
> ---
>
> The following problem happens with RH6.1, the 2.6.32-131.0.15.el6.x86_64
> kernel, and OFED 1.5.3. If I disable all but one CPU via
>
> echo 0 > /sys/devices/system/node/node0/cpu1/online
> ….
> echo 0 > /sys/devices/system/node/node1/cpu31/online
>
> on two-socket systems (tested on both Westmere and Sandy Bridge), the system
> becomes completely unusable. If I stop InfiniBand and remove the driver
> from the kernel, nothing like this happens and the system remains stable.
>
> On the console I got the following directly after taking the CPUs offline
>
> [console screenshot; image not preserved in this text archive]
>
> and a few moments later
>
> [console screenshot; image not preserved in this text archive]
>
> This indicates that the Lustre FS can’t reach the InfiniBand device. From
> time to time the output is sprinkled with resets of eth0. The system does
> not react to Alt-SysRq-T (it did before), but the various status messages
> indicate that the system is still running!
>
> If I try it without Lustre running, I simply get
>
> [console screenshot; image not preserved in this text archive]
>
> but again it does not react to SysRq-T.
>
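
For anyone trying to reproduce this, the sequence of echo commands quoted
above can be sketched as a loop. This is only a minimal sketch, not the
reporter's actual script: it assumes the standard
/sys/devices/system/cpu/cpuN/online files (the per-node paths in the report
normally point at the same CPUs), and the function names and the optional
directory argument are hypothetical, added so the loop can be tried against a
scratch directory first.

```shell
#!/bin/sh
# Sketch: take every CPU except cpu0 offline through sysfs.
# The optional first argument is a hypothetical override of the sysfs CPU
# directory, intended only for dry runs against a scratch directory.
offline_all_but_cpu0() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue          # glob matched nothing
        case "$f" in
            */cpu0/online) continue ;;   # keep the boot CPU running
        esac
        echo 0 > "$f" || echo "failed to offline: $f" >&2
    done
}

# Print the range of CPUs the kernel currently reports as online.
show_online() {
    cat "${1:-/sys/devices/system/cpu}/online"
}
```

On a real system this would be run as root by calling offline_all_but_cpu0
with no argument and then show_online to confirm that only CPU 0 is left.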
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image003.png
Type: image/png
Size: 187758 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/ewg/attachments/20110824/eb76b3b4/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.png
Type: image/png
Size: 120832 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/ewg/attachments/20110824/eb76b3b4/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 157764 bytes
Desc: not available
URL: <http://lists.openfabrics.org/pipermail/ewg/attachments/20110824/eb76b3b4/attachment-0002.png>

