Now that you've charted a Windows disaster recovery plan, your next task is to figure out your methodology for actually creating the plan. The easiest way to create your disaster recovery document is to start with a framework and then address the details.
Use the following four-step methodology. It will help you overcome the initial shock of a blank piece of paper:
- Scope statement: Simple overview of the purpose of the document
- Process layering: Business process / Applications / IT infrastructure
- Interrogate: Define the additional details needed to continue / restore applications or systems
- Identify key contacts
The scope statement should be a very simple statement, no more than two sentences, of what you hope to accomplish. I would caution against being so general in your statement that it is meaningless. For example, if you create a disaster recovery plan scope statement that says "restore all applications/systems for the company," nobody will know what that means and it will be almost impossible to keep it updated. A better approach is to create smaller, more focused plans. So, begin by developing a disaster recovery plan scope statement that says something like one of the following:
- The scope of this plan is to define the recovery steps in the event of a domain controller failure.
- The scope of this plan is to recover from an Active Directory forest or domain failure.
- The scope of this plan is to recover from key hardware/software failures that affect the Exchange Server (e.g., storage failure, OS failure).
- The scope of this plan is to recover from accidental or malicious domain admin events (e.g., the deletion of an object -- OU, domain or forest).
- The scope of this plan is to recover from a loss of SQL Server 2005 Integration and Analysis Services.
If you find that this document is getting bigger than you want it to, come back and re-evaluate the scope. You may need to redefine it into two or three smaller documents.
The process layering step is the heart of your disaster recovery plan and should include as many pictures, data flows and diagrams as possible. As you do this, you will quickly realize that there are areas and complexities that you didn't originally consider. Look back at the scope statement -- if these additional processes and systems don't directly relate to your scope, leave them out.
To help identify the business critical systems, evaluate the business processes. The business process will help determine which IT applications and infrastructure must be restored to allow that business process to operate. I have created an example of what a process layering diagram might look like:
The key to the process layering diagram is in building a complete high-level picture of the scope of the Windows disaster recovery plan. The completion of the layering diagram leads directly to the interrogation phase.
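The layering idea can also be captured as data alongside the diagram. The following is only a minimal sketch, assuming a hypothetical file-processing business process; every name in `DEPENDS_ON` is made up for illustration and is not taken from a real environment. It walks from a business process down through applications to infrastructure and lists everything that would have to be restored for that process to operate.

```python
from collections import deque

# Hypothetical layering data: each entry lists what it directly depends on.
# All names are illustrative assumptions, not part of the original plan.
DEPENDS_ON = {
    "Daily file processing": ["File import application"],      # business process
    "File import application": ["App server", "SQL Server"],   # application
    "App server": ["Domain controller"],                       # IT infrastructure
    "SQL Server": ["Domain controller", "Storage array"],
    "Domain controller": [],
    "Storage array": [],
}


def restore_set(process, depends_on):
    """Return everything the process depends on, directly or indirectly."""
    seen = []
    queue = deque(depends_on.get(process, []))
    while queue:
        item = queue.popleft()
        if item not in seen:
            seen.append(item)
            # Follow this item's own dependencies into the next layer down.
            queue.extend(depends_on.get(item, []))
    return seen


needed = restore_set("Daily file processing", DEPENDS_ON)
for item in needed:
    print(item)
```

If the resulting list contains systems that have nothing to do with your scope statement, that is a sign either the scope or the diagram needs another look.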
In the interrogation stage of the disaster recovery plan, you begin to ask questions, fill in the gaps and ultimately gather the knowledge points that currently reside with the key players in your organization.
Building the layering diagram will inevitably spark questions and doubts. Can the whole process really be described in four simple steps? The answer is no. It doesn't matter how good an IT manager you are -- you don't know everything about the process. You will always find that someone is doing something every day to keep things running. I had an employee who, every few days, would manually process files that had failed with an error during the daily file processing. It wasn't until he went on vacation, and someone asked why the expected processing had not occurred, that I realized he was doing this work manually and the application was not.
Ask the key employees responsible for each layer questions about that layer and document their responses. If you aren't sure where to start, you can always revert to this fifth-grade education tip: Ask who, what, where, when, why and how.
Continuing with the example:
- Who sends files?
- What are the possible file types?
- How frequently?
- Are there SLAs with file processing?
- Who monitors exceptions? How do they process them?
- How does the application initiate? If on a schedule, what controls the schedule?
- Who supports the application?
- Where are the source files? Can you recompile the application?
- What applications do users run and what are the procedures for changing the password?
- Where are the servers located? (You'd be surprised how many times the "server" is a developer's workstation that has gone unnoticed.)
- Are the servers backed up? If so, when? How? And when was the last time backups were tested?
- Who has administrator rights, source code access, etc.?
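One way to keep track of the interrogation is to record each question and its answer per layer, then list what is still unanswered. This is a rough sketch under stated assumptions: the `answers` structure, the sample answers and the `open_questions` helper are all hypothetical and exist only to illustrate the idea.

```python
# Hypothetical interrogation log: layer -> question -> answer.
# None means nobody has documented an answer yet. Sample answers are invented.
answers = {
    "Business process": {
        "Who sends files?": "Three external partners",
        "Are there SLAs with file processing?": None,
    },
    "Application": {
        "Who supports the application?": None,
        "Where are the source files?": "Source control on the build server",
    },
    "IT infrastructure": {
        "Where are the servers located?": "Primary data center",
        "When was the last time backups were tested?": None,
    },
}


def open_questions(log):
    """List (layer, question) pairs that still have no documented answer."""
    return [
        (layer, question)
        for layer, questions in log.items()
        for question, answer in questions.items()
        if not answer
    ]


for layer, question in open_questions(answers):
    print(f"[{layer}] {question}")
```

The open items tell you which key employees you still need to sit down with.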
Don't forget the importance of an updated and complete phone tree. Key personnel supporting each box within each layer should be reachable in the event of an emergency. Don't forget to include contacts for the business layer -- they will ultimately have to sign off that any recovery efforts were successful.
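A quick completeness check on the phone tree could look something like the sketch below. It assumes you keep the layering boxes and the contact list as simple structures; the `boxes` and `contacts` data, the phone numbers and the `missing_contacts` helper are hypothetical placeholders.

```python
# Hypothetical boxes from the layering diagram, grouped by layer.
boxes = {
    "Business process": ["Daily file processing"],
    "Application": ["File import application"],
    "IT infrastructure": ["App server", "SQL Server"],
}

# Hypothetical contact list keyed by box name. Numbers are placeholders.
contacts = {
    "Daily file processing": {"name": "Business owner", "phone": "555-0100"},
    "File import application": {"name": "Application support", "phone": "555-0101"},
    "App server": {"name": "Server team", "phone": ""},  # number never filled in
}


def missing_contacts(boxes, contacts):
    """Return (layer, box) pairs that have no contact or no phone number."""
    gaps = []
    for layer, names in boxes.items():
        for name in names:
            entry = contacts.get(name)
            if not entry or not entry.get("phone"):
                gaps.append((layer, name))
    return gaps


for layer, name in missing_contacts(boxes, contacts):
    print(f"No reachable contact for {name} ({layer})")
```

Running a check like this whenever the diagram changes helps keep the phone tree from drifting out of date.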
A scope statement, process layering, interrogation and a contact list become the skeleton of your DRP document, and you can use the document for more than just a DRP to have on file. These steps will help overcome any writer's block that comes with the seemingly overwhelming task of creating a document such as a disaster recovery plan. Keep in mind this is a living document and needs to be nourished.
Russell Olsen is currently the CIO of a Medical Data Mining company and previously worked for a Big Four accounting firm performing technology risk assessments. He co-authored the research paper "A comparison of Windows 2000 and Red Hat as network service providers." Russell is a CISA, GSNA, and MCP.
Part 3: The disaster recovery execution methodology