Although we hope you never need to restore Operations Manager from a catastrophic
failure, you must be prepared for the possibility. You should have a
well-documented recovery plan that covers every conceivable type of disaster,
from hardware failures to the loss of an entire datacenter. Essentially, you want to
be able to get OpsMgr up and running with minimal data loss.
Your plan should assume the worst, yet still restore Operations Manager quickly and
efficiently to, at a minimum, the point of your last database backup. You need not
only to develop a detailed plan for each contingency but also to practice each
scenario in a development environment until you, and the staff who will stand in
when you are not available, are comfortable with the process.
There are at least two potential disaster recovery scenarios, discussed in the next sections.
Recovering from a Total Loss
What would it take to recover OpsMgr after a "total loss"? Assume the following
scenario:
- The Operational database is installed on the RMS.
- The management server is monitoring 200 agent-managed systems.
- There is only one management server in our management group.
- The Web console is installed.
- OpsMgr Reporting and ACS are not installed.
This is a very simple implementation of Operations Manager, but it shows the steps
necessary to recover OpsMgr from a complete hardware failure of the management
server. We will assume that our server team has already built a new server with the
same NetBIOS name in the same domain, installed SQL Server 2005, and enabled IIS
because we will use the OpsMgr 2007 Web console. The appropriate service packs and
security patches have also been applied; be sure the new server is at the same level of
software maintenance as your original system. We are ready to recover Operations
Manager.
At a general level, here are the steps involved:
- Install Operations Manager 2007 from the installation media, selecting the option
for a typical installation and using the same management group name as the original
installation. Remember that the management group name is case sensitive. Specify the
same accounts (SDK and Config service account, Management Server Action account)
that your original installation used.
This type of information should be documented as part of your disaster recovery
planning. Detailed steps on installing OpsMgr can be found in Chapter 6, "Installing
Operations Manager 2007."
- After Operations Manager is installed, immediately stop the SDK service so that the
RMS stops writing data to the Operational database, which you will overwrite as part
of your recovery process. Any data written to this new database will be lost, so
immediately really means immediately!
- Install any hotfixes that were applied to your original installation.
- Delete the OperationsManager database created by your OpsMgr installation in the
first step.
- Restore the OperationsManager database from your most recent SQL Server backup.
- Restore the RMS encryption keys.
- Import any management packs that were imported into your old management server,
or changed and backed up, after your last Operational database backup.
- Install the Web console.
- Start the SDK service. Operations Manager will now be functional.
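The order of these steps matters mainly because the database steps must happen while the SDK service is stopped. As a minimal sketch, assuming you keep your runbook as data, the following Python models the steps above and checks that ordering rule. The `Step` class, its flag names, and `check_order` are hypothetical illustrations, not part of OpsMgr, and the sketch assumes the SDK service is running as soon as installation completes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One runbook step (hypothetical model, not an OpsMgr object)."""
    name: str
    needs_sdk_stopped: bool = False  # step overwrites the Operational database
    stops_sdk: bool = False
    starts_sdk: bool = False


# The total-loss recovery sequence, in the order given in the text.
TOTAL_LOSS_RUNBOOK = [
    Step("Install OpsMgr 2007 (typical, same group name and accounts)"),
    Step("Stop the SDK service", stops_sdk=True),
    Step("Install hotfixes from the original installation"),
    Step("Delete the new OperationsManager database", needs_sdk_stopped=True),
    Step("Restore OperationsManager from the latest backup", needs_sdk_stopped=True),
    Step("Restore the RMS encryption keys"),
    Step("Import management packs changed since the last backup"),
    Step("Install the Web console"),
    Step("Start the SDK service", starts_sdk=True),
]


def check_order(steps):
    """Return a list of ordering problems; an empty list means none were found."""
    problems = []
    sdk_running = True  # assumption: the SDK service runs right after installation
    for number, step in enumerate(steps, start=1):
        if step.stops_sdk:
            sdk_running = False
        if step.starts_sdk:
            sdk_running = True
        if step.needs_sdk_stopped and sdk_running:
            problems.append(f"step {number} ({step.name}) runs while the SDK service is running")
    if not sdk_running:
        problems.append("the SDK service is never restarted")
    return problems


if __name__ == "__main__":
    print(check_order(TOTAL_LOSS_RUNBOOK))  # prints []
```

Moving the restore ahead of the "Stop the SDK service" step, for example, makes `check_order` report that step, which is the kind of mistake a rehearsal in a development environment is meant to catch.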
These steps constitute a high-level process for recovering Operations Manager. Your
actual plan should contain greater detail, including specific hard drive configurations, the
exact installation options, the SQL steps necessary to delete and restore the databases, and
so forth.
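For the SQL steps, one option is to keep the statements in your plan as a generated script so the database name and backup path are recorded in one place. The following Python is a minimal sketch that emits standard Transact-SQL to drop the newly installed OperationsManager database and restore your backup in its place. The backup path is a hypothetical placeholder, and the sketch assumes the new server uses the same drive layout as the original, so the restored files can return to the paths recorded in the backup (otherwise you would need RESTORE ... WITH MOVE).

```python
def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text: str) -> str:
    """Build an N'...' string literal, doubling any embedded single quote."""
    return "N'" + text.replace("'", "''") + "'"


def replace_database_script(database: str, backup_file: str) -> str:
    """Return T-SQL that drops `database` and restores it from `backup_file`."""
    db = quote_ident(database)
    return "\n".join([
        "USE master;",
        # Disconnect any remaining sessions so the database can be dropped.
        f"ALTER DATABASE {db} SET SINGLE_USER WITH ROLLBACK IMMEDIATE;",
        f"DROP DATABASE {db};",
        # Restore the full backup and bring the database online.
        f"RESTORE DATABASE {db} FROM DISK = {quote_literal(backup_file)} WITH RECOVERY;",
    ])


if __name__ == "__main__":
    # Hypothetical backup location; substitute the path from your own plan.
    print(replace_database_script("OperationsManager", r"D:\Backups\OperationsManager.bak"))
```

You would save the output as a .sql file and run it, for example with sqlcmd or SQL Server Management Studio, only while the SDK service is stopped.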
Using Log Shipping
Another approach for disaster recovery is to implement log shipping. As we discuss in
Chapter 10, log shipping automates backing up database transaction logs, copying them
to a standby server, and restoring them there. This process keeps your production and
standby SQL Server systems in sync. Figure 12.23 illustrates a sample disaster recovery
solution that includes log shipping for the Operational and Data Warehouse databases.
FIGURE 12.23
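Each log shipping cycle amounts to a transaction log backup on the production server followed by a restore of that backup on the standby server. The Python below is a minimal sketch that generates those Transact-SQL statements. The share path and file-naming pattern are hypothetical, OperationsManagerDW is assumed to be the name of your Data Warehouse database, the separate copy step is skipped by assuming both servers can read the same share, and log shipping requires the databases to use the full (or bulk-logged) recovery model. In production, SQL Server's own log shipping jobs schedule these steps for you.

```python
from datetime import datetime


def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text: str) -> str:
    """Build an N'...' string literal."""
    return "N'" + text.replace("'", "''") + "'"


def log_backup(database: str, share: str, taken_at: datetime):
    """Return (file path, BACKUP LOG statement) for the production server."""
    # Hypothetical naming pattern: <share>\<database>_<timestamp>.trn
    path = f"{share}\\{database}_{taken_at:%Y%m%d%H%M%S}.trn"
    return path, f"BACKUP LOG {quote_ident(database)} TO DISK = {quote_literal(path)};"


def log_restore(database: str, path: str) -> str:
    """Return the RESTORE LOG statement for the standby server."""
    # NORECOVERY leaves the standby database restoring, so later logs can still be applied.
    return f"RESTORE LOG {quote_ident(database)} FROM DISK = {quote_literal(path)} WITH NORECOVERY;"


def bring_online(database: str) -> str:
    """Run on the standby only at failover, after the last log has been restored."""
    return f"RESTORE DATABASE {quote_ident(database)} WITH RECOVERY;"


if __name__ == "__main__":
    now = datetime.now()
    for db in ("OperationsManager", "OperationsManagerDW"):
        path, backup_sql = log_backup(db, r"\\standby\logship", now)  # hypothetical share
        print(backup_sql)
        print(log_restore(db, path))
```

Keeping the standby databases in the NORECOVERY state is what lets successive log backups be applied in sequence; the single WITH RECOVERY statement is the point of no return at failover.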
In addition to deploying log shipping, you will need the RMS and SRS encryption keys for
a successful recovery. If you have the OperationsManager database but not the RMS key,
you will not be able to restore the management group (unless you have SP1 installed and
use the NEWKEY option introduced earlier in the "Recovering from a RMS Loss"
section of this chapter). The steps to recover from a downed RMS are discussed in the next
section.
Recovering from a Downed RMS
Another potential scenario is losing only one of your OpsMgr servers. In this
example, we will consider the steps to take if you lose the most important component, the
RMS. If your RMS is not available, OpsMgr is not functional. If your RMS is down and you
will not be able to meet your SLAs, you will want to promote an existing management
server to become the RMS, as depicted in Figure 12.24.
FIGURE 12.24

Recovering a downed RMS requires that you have previously backed up the RMS encryption
keys, as we discuss in the "Backing Up the RMS Encryption Keys" section of this
chapter. You would then promote a functional management server to become the RMS,
using the steps we discuss in the "Recovering from a RMS Loss" section. Note that you
cannot move from a non-clustered RMS to a clustered RMS, or vice versa.
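The conditions in this section can be summarized as a short pre-flight check before you promote a management server. The following Python is a hypothetical sketch of that decision only: it does not call any OpsMgr tool, the `RmsOutage` fields are names invented for illustration, and the promotion itself still follows the procedure in the "Recovering from a RMS Loss" section.

```python
from dataclasses import dataclass


@dataclass
class RmsOutage:
    """Facts gathered during an RMS outage (hypothetical model)."""
    rms_available: bool
    sla_at_risk: bool
    key_backup_exists: bool        # backup of the RMS encryption keys
    candidate_functional: bool     # the management server you plan to promote
    old_rms_clustered: bool
    new_rms_clustered: bool


def promotion_blockers(outage: RmsOutage) -> list:
    """Return reasons not to promote; an empty list means the checks pass."""
    blockers = []
    if outage.rms_available:
        blockers.append("the RMS is still available")
    if not outage.sla_at_risk:
        blockers.append("the outage does not yet threaten your SLAs")
    if not outage.key_backup_exists:
        blockers.append("no backup of the RMS encryption keys")
    if not outage.candidate_functional:
        blockers.append("the candidate management server is not functional")
    # The text notes you cannot move between clustered and non-clustered RMS.
    if outage.old_rms_clustered != outage.new_rms_clustered:
        blockers.append("cannot switch between clustered and non-clustered RMS")
    return blockers


if __name__ == "__main__":
    outage = RmsOutage(rms_available=False, sla_at_risk=True, key_backup_exists=True,
                       candidate_functional=True, old_rms_clustered=False,
                       new_rms_clustered=False)
    print(promotion_blockers(outage))  # prints []
```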
A Virtualization Plan for Disaster Recovery
Another approach to disaster recovery planning is to virtualize your disaster recovery
(D/R) environment. In this approach, you take backups of the physical drives you
used when installing and configuring Operations Manager and convert them to virtual
drives.
The advantage of virtual drives is that they are independent of the physical hardware
they run on, which makes them easy to bring up at a D/R site. You could create a
D/R management server in place, maintain a copy of the RMS encryption key, and
set up one or more empty SQL Server systems. If you need to recover your systems, you
would restore the database backups, promote the management server to become the RMS,
and connect the systems. This scenario would work in any software environment that
supports virtualization.
Another way to use virtualization for an off-site disaster recovery solution is to
send regularly scheduled backups of the virtual hard drives to the disaster recovery
location. In the event of a disaster, you activate the backup copies of the virtual
hard drives and change their IP addresses to reflect their new physical location.
Inventorying Your OpsMgr Configuration
Any successful disaster recovery plan requires an understanding of your current configuration.
The Operations Manager 2007 Resource Kit includes a tool to help you inventory
the components changed on each computer where you install an OpsMgr
component. This tool, Operations Manager Inventory, collects information about
your installation and saves it to an XML-formatted .cab file. Data collected includes
the following:
- Windows Installer logs for Operations Manager 2007
- Registry information for Operations Manager 2007
- Operations Manager 2007 configuration information
- Management packs
- All running processes
- All Windows NT event logs on that system
- The report produced by the Prerequisite Checker when you installed Operations
Manager 2007
The inventory tool (MOMInventory.exe) must be run locally on each computer, and the
computer must have Microsoft .NET Framework 3.0 installed. To run the inventory tool,
perform the following steps:
- Open a command prompt (Start -> Run, then type CMD) and type
MOMInventory.exe.
- A dialog box appears with two options, Run Collection and Close. Click Run
Collection.
- In the Save As dialog box, enter a name and location for the .cab file the tool will
create.
- While the tool runs, a status window shows information about the data being
collected. When the tool finishes, the status window displays the name and location
of the .cab file (see Figure 12.25).
You can also run the tool in "silent" mode. At the command prompt, type the following:
mominventory.exe /silent /cabfile:<drive>:\<folder>\<filename>.cab
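If you want to script the silent collection, for example as a scheduled task on each server, a thin wrapper is enough. This Python sketch only assembles the /silent and /cabfile: switches shown above and launches the tool locally, because the tool must run on each computer. It assumes MOMInventory.exe is on the PATH (or in the current directory) and that the output path contains no spaces; the function names and the example path are hypothetical.

```python
import subprocess
from pathlib import PureWindowsPath


def inventory_command(cab_path: str) -> list:
    """Build the silent-mode command line for MOMInventory.exe."""
    cab = PureWindowsPath(cab_path)  # normalizes separators to backslashes
    if cab.suffix.lower() != ".cab":
        raise ValueError(f"expected a .cab output file, got {cab_path!r}")
    return ["MOMInventory.exe", "/silent", f"/cabfile:{cab}"]


def run_inventory(cab_path: str) -> None:
    """Run the collection locally (Windows only).

    Raises subprocess.CalledProcessError if the tool returns a non-zero exit code.
    """
    subprocess.run(inventory_command(cab_path), check=True)
```

On each OpsMgr server, calling `run_inventory(r"C:\Inventory\rms01.cab")` would start the same collection as the interactive Run Collection option, without the dialog boxes.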
Figure 12.26 shows sample content of the .cab file created by executing
MOMInventory.exe.
You can download the OpsMgr 2007 Resource Kit utilities from the System Center
Operations Manager TechCenter at http://go.microsoft.com/fwlink/?LinkId=94593.
FIGURE 12.25

FIGURE 12.26