Backup systems need testing
For the London Free Press – August 27, 2007
How reliable are your organization’s backup and emergency recovery plans?
A recent incident in California that shut down several popular websites showed how important it is to test business continuity plans to ensure they will work as planned.
A power outage in San Francisco knocked out electricity to tens of thousands of Pacific Gas and Electric Co. customers, including buildings and businesses with hordes of employees.
The outage affected a data centre operated by 365 Main Inc., which hosts a number of popular websites such as GameSpot, Craigslist, Yelp and Typepad. Those websites were taken offline for several hours, leaving millions of web surfers frustrated.
That’s despite the fact that 365 Main Inc. had backup procedures and generators in place that were intended to prevent just such an outage. Unfortunately, a problem occurred that its testing had not uncovered.
Backup generators are supposed to start immediately when a power failure occurs. It took about 45 minutes in this instance before power to the data centre was restored by backup generators.
365 Main Inc.’s San Francisco facility has complete backup systems for electrical power to protect against a power loss. 365 Main in a March 2007 press release stated that “in the unlikely event of a cut to a primary power feed, the state-of-the-art electrical system instantly switches to live backup generators to keep the data centre continuously running and shield tenants from costly downtime.”
In a news release issued a few days after the power failure, 365 Main said that three of the 10 backup power generators failed to complete their start sequence. Its investigation discovered a weakness in an essential component of the backup generator system. The problem was with the generators’ electronic controllers, which have since been fixed and tested.
In the wake of the outage, 365 Main’s president and chief executive, Chris Dolan, apologized to customers affected by the incident. He said the concerns had been addressed as a top priority, the core problem had been identified and steps had been taken to prevent this type of problem from happening again. 365 Main is also honouring its service level agreements with all customers affected by the drop in service.
Despite the malfunction, 365 Main says the facility has delivered 99.9942 per cent power uptime to customers in the five years since its inception, including the July outage.
The event in San Francisco illustrates the importance of testing backup and emergency recovery plans on a regular basis. That applies not only to emergency generators but also to data backups and to any other element of a business continuity plan.
Forty-five minutes may not seem like much time, but even that amount of time can be significant for an organization relying on an online presence or computing systems to provide its services. The time an organization can tolerate being offline will vary. No matter what that time is, a recovery plan is of little use if it doesn’t work.





Hi David:
Nice posting!
One way of mitigating the risk of disaster is to use an online backup service.
I have been reading about the online backup and storage industry for a while now. It is becoming a commonly accepted technology these days.
For online backup news, information and articles, there is an excellent website:
http://www.BackupReview.info
This site lists more than 400 online backup companies and ranks the top 25 on a monthly basis.
It also features a CEO Spotlight page, where senior management people from the industry are interviewed.
Cheers,
Comment by Jenny — August 27, 2007 @ 12:38 pm