[ofw] [PATCH 1/4] docs/install: document an installation method
Fab Tillier
ftillier at microsoft.com
Wed Nov 4 23:25:09 PST 2009
Hi Sean
Sean Hefty wrote on Wed, 4 Nov 2009 at 23:21:10
>> Oh, and lastly, it may be useful to have a means of rebooting the
>> cluster into safe mode to recover things if the driver update caused a
>> distributed BSOD...
>
> Any ideas for this?
You need some sort of KVM (keyboard/video/mouse) access to the nodes. If you don't have that, redeploying the cluster should take ~30 minutes. The advantage here is that you get a clean cluster. If you do crash, it is often because some kernel interface was changed without its version number being incremented.
Once you have good, working IB drivers, you can add them to the deployment image so you can start with a known-good configuration. Then you can reimage your cluster whenever you leave the office at night, and come back in the morning to a nice, clean, fresh cluster, ready for more destruction over the course of the day.
-Fab