Disaster recovery planning for Active Directory

In part one of this tutorial, learn how creating an Active Directory replication lag site minimizes the chances of an Active Directory disaster.

Preventing Active Directory failures should be a key component of any disaster recovery plan. There are steps every Windows shop can take to reduce the chances of an AD disaster. The best way to minimize downtime is to have a proactive plan in place.

Need to restore a single domain controller? Want to prevent the accidental bulk deletion of objects? Microsoft MVP Gary Olsen offers his advice on how to plan for the worst and what to do to get your Active Directory up and running again.

Part 1: How creating an Active Directory replication lag site minimizes disasters

It is a good idea to have a disaster recovery plan for major catastrophes, but there are a number of actions you can take to prevent disaster -- or at least minimize the chances of an Active Directory disaster such as the accidental bulk deletion of objects.

One of those actions is to create a replication lag site. Very simply, the lag site is an Active Directory site that is intentionally a few days to a week behind the rest of the domain. Of course, there are some gotchas when doing this, which we'll discuss shortly, but the lag site basically preserves a live backup of the Active Directory.

You create a lag site by putting a domain controller from the hub site into its own site (we'll call it the disaster recovery site) with a site link to the hub site. Configure the hub-disaster recovery site link for a replication frequency of 96 hours. That means that the disaster recovery site domain controller's copy of the Active Directory will be up to 96 hours behind the rest of the forest, depending on how long ago it last replicated.
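Site link intervals are normally entered in minutes rather than hours. The sketch below converts a desired lag into that unit. It assumes the commonly documented limits of 15 to 10,080 minutes for a site link's replication interval; the function and constant names are hypothetical and used only for illustration.

```python
# Assumption: a site link's replication interval is expressed in minutes
# and must fall between 15 and 10,080 (seven days). Verify on your system.
MIN_INTERVAL_MINUTES = 15
MAX_INTERVAL_MINUTES = 10_080


def hours_to_repl_interval(hours: float) -> int:
    """Convert a desired lag in hours to a site link interval in minutes."""
    minutes = round(hours * 60)
    if not MIN_INTERVAL_MINUTES <= minutes <= MAX_INTERVAL_MINUTES:
        raise ValueError(
            f"{minutes} minutes is outside the assumed range "
            f"{MIN_INTERVAL_MINUTES}-{MAX_INTERVAL_MINUTES}"
        )
    return minutes
```

Under these assumptions, the 96-hour lag used here works out to 5,760 minutes, and a seven-day lag sits exactly at the upper limit.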

Now, remember that administrator who -- mistakenly, of course -- recently deleted an organizational unit (OU) with 10,000 users? Without a lag site, your only alternative is to do an authoritative restore from backup (and hope your backup media is valid). That means you have to perform the following authoritative restore process:

  1. Unplug the domain controller that has the authoritative copy of the Active Directory from the network.

  2. Get the appropriate system state backup tape that you made before the deletion.

  3. Make sure the tape is valid and that it is no older than the TombstoneLifetime (60 days by default; 180 days in forests first created on Windows Server 2003 SP1 or later).

  4. Boot the restore domain controller into Directory Service Restore Mode (DSRM).

  5. Do a system state restore to this domain controller and mark the deleted objects as authoritative (the authoritative restore command in Ntdsutil). Note that you have to do this twice to get the groups and users restored properly. This is not trivial.

  6. Plug the domain controller into the network.

  7. Replication will force the Active Directory objects from the restored domain controller to the other domain controllers in the network.

Note: For more details on authoritative restore, refer to Microsoft's KB 241594, "How to perform an authoritative restore to a domain controller in Windows 2000," and KB 280079, "Authoritative restore of groups can result in inconsistent membership information across domain controllers."
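The tape checks in steps 2 and 3 come down to two date comparisons. Here is a minimal sketch, assuming you know when the backup was taken and roughly when the deletion happened. The function name and parameters are hypothetical, and the 60-day default is simply the figure quoted above.

```python
from datetime import datetime, timedelta

# Default quoted in the article; adjust to your forest's actual setting.
DEFAULT_TOMBSTONE_LIFETIME_DAYS = 60


def backup_is_usable(
    backup_time: datetime,
    deletion_time: datetime,
    now: datetime,
    tombstone_lifetime_days: int = DEFAULT_TOMBSTONE_LIFETIME_DAYS,
) -> bool:
    """Return True if the backup predates the deletion and is not
    older than the tombstone lifetime."""
    taken_before_deletion = backup_time < deletion_time
    within_lifetime = (now - backup_time) <= timedelta(days=tombstone_lifetime_days)
    return taken_before_deletion and within_lifetime
```

This only checks dates. It says nothing about whether the media itself can be read, which still has to be tested separately.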

With the lag site, however, you have a domain controller that still holds a copy of the Active Directory from before the deletion took place (assuming you notice the deletion before the lag site's next replication, which is at most four days away). Let's say you discovered that an administrator mistakenly deleted 10,000 accounts yesterday. You can go to the domain controller in the lag site, perform an authoritative restore using that domain controller's copy of the Active Directory and push it out. Again, this depends on when the lag site replicates and when the deletion took place. If replication takes place on Monday and Friday, and the deletion happens Thursday night, then you have only a small window of opportunity before the lag site picks up the deletion too.
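To see how small that window can be, the sketch below computes the time between a deletion and the lag site's next scheduled replication. It is only an illustration: the function names are hypothetical, and it assumes replication fires at a fixed hour (midnight by default) on the scheduled weekdays, which the article does not specify.

```python
from datetime import datetime, timedelta


def next_replication(after: datetime, weekdays: set, hour: int = 0) -> datetime:
    """Return the first scheduled replication strictly after `after`.

    `weekdays` uses Python's numbering: Monday=0 ... Sunday=6.
    """
    day_start = after.replace(hour=hour, minute=0, second=0, microsecond=0)
    for offset in range(8):  # today plus the following seven days
        candidate = day_start + timedelta(days=offset)
        if candidate.weekday() in weekdays and candidate > after:
            return candidate
    raise ValueError("the schedule contains no replication days")


def recovery_window(deletion_time: datetime, weekdays: set, hour: int = 0) -> timedelta:
    """Time left before the lag site replicates the deletion."""
    return next_replication(deletion_time, weekdays, hour) - deletion_time
```

With a Monday and Friday schedule (`{0, 4}`) and a deletion at 10 p.m. on a Thursday, the window under these assumptions is two hours. The same deletion made early on Friday, just after replication, would leave almost three days.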

Get control of the gotchas

It is important that you take steps to prevent authentication from the lag site domain controllers, since they hold security data (accounts, passwords, locked accounts, group membership, etc.) that can be up to a week old. You can accomplish this by defining a site policy (a Group Policy linked to the lag site) and enabling the "DCLocator DNS Records Not Registered by the DCs" setting. The Mnemonics field is described in the Explain tab, and the Mnemonics themselves are listed in the left column on that tab. You need to include all of the Mnemonics except the one for the CNAME record, which is needed for replication. The Explain tab is a bit confusing, but the field takes a space-delimited list as shown in Figure 1.
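Typing the list by hand is error-prone, so it can help to generate it. The sketch below builds the space-delimited value while leaving out the CNAME mnemonic. The mnemonic names are an assumption: they are recalled from the policy's Explain text and may differ between Windows versions, so check them against the Explain tab on your own system before using the result.

```python
# Assumption: mnemonic names as listed in the policy's Explain text.
# Verify against your own system; the set can vary by Windows version.
DC_LOCATOR_MNEMONICS = [
    "LdapIpAddress", "Ldap", "LdapAtSite", "Pdc", "Gc", "GcAtSite",
    "DcByGuid", "GcIpAddress", "DsaCname", "Kdc", "KdcAtSite", "Dc",
    "DcAtSite", "Rfc1510Kdc", "Rfc1510KdcAtSite", "GenericGc",
    "GenericGcAtSite", "Rfc1510UdpKdc", "Rfc1510Kpwd", "Rfc1510UdpKpwd",
]

# The CNAME record must stay registered because replication needs it.
KEEP_REGISTERED = {"DsaCname"}


def build_exclusion_list(mnemonics=DC_LOCATOR_MNEMONICS, keep=KEEP_REGISTERED) -> str:
    """Return the space-delimited value for the policy's Mnemonics field."""
    return " ".join(m for m in mnemonics if m not in keep)
```

Calling `build_exclusion_list()` returns a single line of text that can be pasted into the Mnemonics field.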

Figure 1: A space-delimited list in an Active Directory replication lag site

The minimum configuration to implement an Active Directory lag site is to have a single site with at least one domain controller from each domain in the site. The preferred configuration is to have two domain controllers from each domain in the site. Set their replication frequency for 168 hours (seven days) and stagger the schedules so that one of the two replicates every 3.5 days. Thus, you always have two old copies of different ages to choose from, which mitigates the narrow-window problem described earlier: if one copy has just replicated the deletion, the other has not.
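A quick way to check the staggered schedule is to compute the age of each copy over time. This sketch assumes replication happens instantly at each scheduled point and measures time from an arbitrary starting moment; the names are hypothetical.

```python
from datetime import timedelta

INTERVAL = timedelta(hours=168)  # each lag site DC replicates every 7 days
STAGGER = INTERVAL / 2           # second DC is offset by 3.5 days


def copy_age(elapsed: timedelta, phase: timedelta, interval: timedelta = INTERVAL) -> timedelta:
    """Time since the last replication for a DC that replicates at
    phase, phase + interval, phase + 2 * interval, and so on."""
    return (elapsed - phase) % interval


def oldest_copy_age(elapsed: timedelta) -> timedelta:
    """Age of the older of the two staggered copies."""
    first = copy_age(elapsed, timedelta(0))
    second = copy_age(elapsed, STAGGER)
    return max(first, second)
```

Under these assumptions, the older of the two copies is always at least 3.5 days and less than seven days behind, no matter when you look.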

You can also run the lag site domain controllers on a Virtual Server to save hardware costs.

If you have a multiple-domain (parent/child) structure, then you face a number of less obvious problems. When you attempt a restore on one domain, it will fail to restore cross-domain group memberships. Hewlett-Packard Co. was the first to discover this problem, and the company developed a tool called Active Directory Link Replication Manager (ADLRM) that stores these links in a SQL database and restores them quite nicely. The tool also can store and restore individual attributes. For instance, if you have an HR application that modifies certain user attributes, and you need to restore an attribute to its pre-modified value, ADLRM can do that without requiring a full-scale authoritative restore.

Proceed to Part 2 on how to build redundancy in Active Directory replication.


Disaster Recovery Planning for Active Directory
 Part 1: How creating an AD replication lag site minimizes disasters
 Part 2: How to build redundancy in Active Directory replication
 Part 3: How to restore a domain controller from backup in AD
 Part 4: How to use Install from Media to restore a domain controller

Gary Olsen is a systems software engineer for Hewlett-Packard in Global Solutions Engineering. He wrote Windows 2000: Active Directory Design and Deployment and co-authored Windows Server 2003 on HP ProLiant Servers. Olsen is a Microsoft MVP for Windows Server-File Systems.
