A Disaster Recovery Plan is a comprehensive, well-thought-out, and tested document that maps out how to respond to disasters such as fires, floods, and earthquakes. The plan also addresses more mundane situations such as equipment failure, human error, theft, and other localized events.
Generally, a Disaster Recovery Plan (DRP) focuses on Information Technology (IT) systems, while a Business Continuity Plan (BCP) is a broader plan to keep all parts of the business running during disruptive events. A DRP for even a small business can run to 50 pages or more, and it reads like a step-by-step instruction manual for recovering each critical application, service, and piece of equipment in case of failure or disaster.
Key Disaster Recovery Metrics
There are two key metrics in DR plans that you should become familiar with:
- Recovery Point Objective (RPO) – the maximum acceptable amount of data loss, expressed as a period of time before the event; usually measured in hours.
- Recovery Time Objective (RTO) – the maximum acceptable time a service or application may be unavailable after the event; usually measured in hours.
Suppose that at noon on a Monday a business suffers a devastating fire. An RPO of one hour specifies that all data created more than one hour before the event (that is, before 11AM) must be safely available at an alternate location. Data created between 11AM and noon may be lost, which is acceptable under an RPO of one hour. An RTO of 24 hours means that the recovery team has until noon on Tuesday to bring the application back online.
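The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `recovery_window` is hypothetical, and the calendar date is arbitrary (1 January 2024 happens to be a Monday).

```python
from datetime import datetime, timedelta

def recovery_window(event_time: datetime, rpo: timedelta, rto: timedelta):
    """Hypothetical helper: return (loss cutoff, recovery deadline).

    Data created before event_time - rpo must be recoverable;
    data created after the cutoff may be lost.
    The service must be back online by event_time + rto.
    """
    loss_cutoff = event_time - rpo
    deadline = event_time + rto
    return loss_cutoff, deadline

# Arbitrary example date: Monday, 1 January 2024, fire at noon.
fire = datetime(2024, 1, 1, 12, 0)
cutoff, deadline = recovery_window(fire, rpo=timedelta(hours=1), rto=timedelta(hours=24))
print(cutoff.strftime("%A %I:%M %p"))    # Monday 11:00 AM
print(deadline.strftime("%A %I:%M %p"))  # Tuesday 12:00 PM
```

The same two subtractions apply to any application in the plan; only the event time and the chosen RPO and RTO change.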
Your business must decide on an RPO and RTO for each application, service, or system in your plan, and should do so early in the process, because these numbers will drive almost every later planning and infrastructure decision.
For most small to midsize companies, a reasonable RTO and RPO is 24 hours each, but the right values vary from company to company and from application to application. Keep in mind that as RPO and RTO values are set lower, costs rise significantly.
Hypothetical Situation
It is Monday afternoon, and Mary the engineer is wrapping up a design project due on Wednesday. The server that stores her project files is running a little slowly but still seems to work, so she carries on. Another hour passes, and when Mary tries to save her most recent work she receives a warning message: “Write Error.”
At 1PM Mary frantically contacts Mark, the IT manager, who investigates and finds that the server hardware has failed and must be replaced. Let’s see how this situation is likely to play out without a DRP, and then with a DRP in place.
Without A Disaster Recovery Plan
Luckily, even though Mark does not have a DRP, he did diligently back up all data once per day. Even with this preparation, however, Mark must:
- Order a new server from the vendor, and beg for expedited shipping.
- Locate all software required for the replacement server and prepare the recovery data.
- Physically install the server when it arrives on Wednesday, 36 hours after the event.
- Install and configure the operating system and applications, and load the recovered data.
- Test the application and fix any configuration settings that were missed.
After nearly two days waiting for replacement equipment and one long, hard day of work, Mary is finally able to return to her project on Thursday morning. Mary is frustrated: her deadline was missed, and she has to work overtime just to submit the project late and explain the situation to the customer. The customer is disappointed that the work is late and wonders whether to look for a new vendor.
And it could have been even worse. This scenario assumes that Mark took some reasonable precautions, was available when the problem happened, and actually knew how the server and application were originally configured; these are not safe assumptions in many environments.
This was also a fairly simple problem: only one system was affected. What if a flood or theft took out multiple key systems? Without a disaster recovery plan, chances are that all data systems would be unavailable for a week or more, if they could be recovered at all.
I have been in that situation a couple of times myself, so it’s no surprise to me that half of all companies with a plan in place executed at least part of it in 2008 [2008 Continuity Insights and KPMG Advisory Services Business Continuity Management Benchmarking Report]. If half of companies use their DRP over a one-year period, your business is very likely to execute some of its plan over several years, even if a major disaster never strikes.
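To see why “several years” makes such a difference, here is a rough calculation. It assumes, as a simplification the report does not claim, that each year is independent and carries the same 50% chance of invoking the plan; the probability of using it at least once in n years is then 1 - 0.5^n. The function name is hypothetical.

```python
def chance_of_using_plan(annual_probability: float, years: int) -> float:
    """Probability of invoking the DRP at least once in `years` years,
    assuming independent years with the same annual probability."""
    return 1 - (1 - annual_probability) ** years

for years in (1, 3, 5):
    print(years, round(chance_of_using_plan(0.5, years), 3))
# 1 0.5
# 3 0.875
# 5 0.969
```

Under these assumptions, the odds of needing the plan pass 95% within five years.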
With A Disaster Recovery Plan In Place
Now assume a disaster recovery plan was created earlier in the year with an RPO of one hour and an RTO of 24 hours.
To meet the one-hour RPO, the server data was mirrored to another location every hour, so Mary can be reassured that only her most recent work will be lost. To meet the 24-hour RTO, a spare server was purchased for situations like this.
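The link between backup frequency and RPO can be expressed as a simple check. This sketch assumes periodic copies: in the worst case, the failure happens just before the next copy finishes, so up to one full interval (plus however long a copy takes) of data is lost. The function `meets_rpo` and its `copy_duration` parameter are hypothetical.

```python
from datetime import timedelta

def meets_rpo(backup_interval: timedelta, rpo: timedelta,
              copy_duration: timedelta = timedelta(0)) -> bool:
    """Hypothetical check: does a periodic backup schedule satisfy an RPO?

    Worst-case loss is one full interval plus the time a copy takes,
    because the failure may strike just before the next copy completes.
    """
    worst_case_loss = backup_interval + copy_duration
    return worst_case_loss <= rpo

rpo = timedelta(hours=1)
print(meets_rpo(timedelta(days=1), rpo))   # daily backups (the no-DRP scenario)
print(meets_rpo(timedelta(hours=1), rpo))  # hourly mirroring
```

Under this model, Mark’s once-a-day backups fail a one-hour RPO, while hourly mirroring just meets it. If each copy took ten minutes, hourly mirroring would no longer strictly meet the target, which is one reason tighter RPOs push toward more frequent or continuous replication and higher cost.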
Mark jumps on the “disaster” and quickly begins configuring the spare server to replace the damaged one. Since the recovery steps are well documented and tested in the disaster recovery plan, it takes Mark only until 2PM (one hour) to get the server up and running and ready for the recovery data. By 3PM all data is restored, Mary is ready to continue her work, and the Wednesday deadline is met.
Note that this recovery took only about two hours, well within the 24-hour RTO, because the situation was not catastrophic. The RTO of 24 hours assumes the worst-case scenario, such as a total building loss due to a fire, earthquake, or flood. In those cases recovery would have taken closer to 24 hours.
Stuff Happens
Keep in mind that “stuff happens” often, even when there is no life-threatening disaster. Water is just one common example of something that causes major disruptions for computing equipment. Even more common are human errors, such as incorrectly configured equipment or software. “Stuff” happens more often than many would like to admit.
Get Started Now
I hope you now feel comfortable with what a disaster recovery plan is and are convinced of its value. It’s important to commit to creating your plan now; it is all too easy to put it off for another day, and pretty soon it’s too late. Every business needs at least a basic DRP.
A final word of advice: start small. Disaster recovery planning can be overwhelming, so pick a key location, key application, or key system and make sure that piece of your business is well protected. If the starter project is chosen wisely, it will have clear and immediate value to your organization and be an easy win when complete (okay, maybe not easy, but manageable).
Still need help deciding on what is right for your disaster recovery plan, or need help implementing an archive system or backup strategy? Contact Red Wire Services at (206) 829-8621.