says John Farmer on his blog, and I couldn’t agree more. Start with a meteor strike scenario for a change: just imagine a giant rock crushing your measly SPOF-ridden infrastructure in one unlucky data center. While waiting for the black swan to appear, learn to keep calm and react normally using the tips from a triple post about incidents, outages, and systems maintenance:
There’s nothing quite like a good Single Point of Failure (SPOF) during a holiday dinner.
Simple problems can easily become large complicated problems after a few bad decisions made in haste. Take a breath before continuing. This is especially important with a page at 3AM or if a panicky client is in your office. Tell the client you’ll handle the problem and run through your normal procedure.[…] Remember the prime directive – your job is to restore service as quickly as possible. You are not there to debug interesting problems with your service.