[ofw] [PATCH 1/4] docs/install: document an installation method

Fab Tillier ftillier at microsoft.com
Wed Nov 4 23:25:09 PST 2009


Hi Sean,

Sean Hefty wrote on Wed, 4 Nov 2009 at 23:21:10

>> Oh, and lastly, it may be useful to have a means of rebooting the
>> cluster into safe mode to recover things if the driver update caused a
>> distributed BSOD...
> 
> Any ideas for this?

You need some sort of KVM.  If you don't have one, redeploying the cluster should take ~30 minutes.  The advantage there is that you get a clean cluster.  If you do crash, the cause is often a kernel interface that was changed without its version number being incremented.

Once you have good working IB drivers, you can add them to the deployment image so you start from a known-good configuration.  Then you can reimage your cluster whenever you leave the office at night, and come back in the morning to a nice, clean, fresh cluster, ready for more destruction over the course of the day.

-Fab


