This chapter excerpt from System Center Operations Manager 2007 Unleashed, by Kerrie Meyler, Cameron Fuller, John Joyner and Andy Dominey, is printed with permission from Sams Publishing, Copyright 2008.
Although we hope you never need to restore Operations Manager from a catastrophic failure, you must be prepared for the possibility that this could happen. You should have a well-documented recovery plan that would work for every conceivable type of disaster that could occur, from hardware failures to a total datacenter loss. Essentially, you want to be able to get OpsMgr up and running with minimal data loss.
Your plan should assume the worst but be able to concisely and efficiently restore Operations Manager at a minimum to the last backup of your databases. You should not only develop a detailed plan for the various contingencies but also practice the various scenarios in a development environment until you (and others on your staff, for when you are not available) are comfortable with the process.
There are at least two potential scenarios for disaster recovery, discussed in the next sections.
Recovering from a Total Loss
What would it take to recover OpsMgr assuming a "total loss"? Assume the following:
- The Operational database is installed on the RMS.
- The management server is monitoring 200 agent-managed systems.
- There is only one management server in our management group.
- The Web console is installed.
- OpsMgr Reporting and ACS are not installed.
Although this is a very simple implementation of Operations Manager, it is intended to show you the steps necessary to recover OpsMgr from a complete hardware failure of the management server. We will assume that our server team has already built a new server using the same NetBIOS name in the same domain, installed SQL Server 2005, and enabled IIS because we will use the OpsMgr 2007 Web console. The appropriate service packs and security patches are applied; be sure to match the level of software maintenance that you had with your original system. We are ready to recover Operations Manager.
At a general level, here are the steps involved:
- Install Operations Manager 2007 from the installation media, selecting the option for a typical installation and using the same management group name as the original install. Remember that the group name is case sensitive. Specify the same accounts (SDK and Config service, Management Server Action account) as used by your original installation.
This type of information should be documented as part of your disaster recovery planning. Detailed steps on installing OpsMgr can be found in Chapter 6, "Installing Operations Manager 2007."
- After Operations Manager is installed, immediately stop the SDK service to prevent the RMS from sending data to the Operational database. This prevents OpsMgr from writing data to this database, which you will be overlaying as part of your recovery process. Because any data written to this new database will be lost, immediately really means immediately!
- Install any additional hotfixes previously installed with your original installation.
- Delete the OperationsManager database created by the OpsMgr installation in the first step.
- Restore the latest OperationsManager database created from your SQL backup.
- Restore the RMS encryption keys.
- Import any additional management packs that were loaded to your old management server or changed and backed up after your last Operational database backup.
- Install the Web console.
- Start the SDK service. Operations Manager will now be functional.
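The first step in the list depends on reproducing the original settings exactly, so it can help to check the documented values against what you are about to enter before running setup. The following is a minimal sketch in Python, not part of Operations Manager: the `InstallRecord` fields, the account names, and the hotfix labels are all hypothetical placeholders. It compares the management group name exactly (because the name is case sensitive) and, on the assumption that Windows account names are not case sensitive, compares the accounts without regard to case.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class InstallRecord:
    """Settings a recovery plan might record (field names are illustrative)."""
    management_group: str
    sdk_config_account: str
    action_account: str
    hotfixes: FrozenSet[str] = frozenset()


def find_mismatches(documented: InstallRecord, planned: InstallRecord) -> List[str]:
    """Return a description of each difference between the two records."""
    problems = []

    # The management group name is case sensitive, so compare it exactly.
    if documented.management_group != planned.management_group:
        problems.append(
            "management group name differs: "
            f"{documented.management_group!r} vs {planned.management_group!r}"
        )

    # Assumption: account names are compared without regard to case.
    account_pairs = (
        ("SDK and Config service account",
         documented.sdk_config_account, planned.sdk_config_account),
        ("Management Server Action account",
         documented.action_account, planned.action_account),
    )
    for label, old, new in account_pairs:
        if old.lower() != new.lower():
            problems.append(f"{label} differs: {old} vs {new}")

    # Hotfixes present on the original system but not yet planned for the new one.
    missing = documented.hotfixes - planned.hotfixes
    if missing:
        problems.append("hotfixes not yet applied: " + ", ".join(sorted(missing)))

    return problems


if __name__ == "__main__":
    documented = InstallRecord("OpsMgrProd", "EXAMPLE\\svc_sdk",
                               "EXAMPLE\\svc_action", frozenset({"hotfix-a"}))
    planned = InstallRecord("opsmgrprod", "EXAMPLE\\svc_sdk",
                            "EXAMPLE\\svc_action", frozenset())
    for problem in find_mismatches(documented, planned):
        print(problem)
```

An empty result means the planned installation matches the documented one on every field the sketch checks; it says nothing about settings that were never recorded.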
These steps constitute a high-level process for recovering Operations Manager. Your actual plan should contain greater detail, including specific hard drive configurations, the exact installation options, the SQL steps necessary to delete and restore the databases, and so forth.
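As one illustration of the SQL detail such a plan might record, the sketch below uses Python to assemble the Transact-SQL statements for deleting the freshly installed database and restoring the backup over it. This is an assumption about how you might script the step, not a procedure taken from the product documentation: the backup path is a hypothetical placeholder, and the `SET SINGLE_USER` statement is an added precaution to close open connections before the drop.

```python
from typing import List


def quote_identifier(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(text: str) -> str:
    """Quote a Unicode string literal, doubling any single quote."""
    return "N'" + text.replace("'", "''") + "'"


def build_overlay_statements(database: str, backup_file: str) -> List[str]:
    """Statements to drop the new, empty database and restore the backup."""
    db = quote_identifier(database)
    return [
        # Run from master so the session is not connected to the target database.
        "USE [master];",
        # Assumption: force other connections closed so the drop can succeed.
        f"ALTER DATABASE {db} SET SINGLE_USER WITH ROLLBACK IMMEDIATE;",
        f"DROP DATABASE {db};",
        # Restore the latest full backup and bring the database online.
        f"RESTORE DATABASE {db} FROM DISK = {quote_literal(backup_file)} WITH RECOVERY;",
    ]


if __name__ == "__main__":
    # Hypothetical backup location; substitute the path from your own plan.
    for statement in build_overlay_statements(
        "OperationsManager", r"D:\Backups\OperationsManager.bak"
    ):
        print(statement)
```

The script only prints the statements. Reviewing them and running them by hand in a SQL Server query tool keeps a person in the loop for a destructive operation, and the SDK service should already be stopped before they are executed.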
Using Log Shipping
Another approach for disaster recovery is to implement log shipping. As we discuss in Chapter 10, log shipping automates the process of backing up database transaction logs and storing them on a standby server. This process keeps your production and standby SQL Server systems in synch. Figure 12.23 illustrates a sample disaster recovery solution that includes log shipping for the Operational and Data Warehouse databases.
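To make the mechanism concrete, the following Python sketch assembles the Transact-SQL behind one log shipping cycle: a transaction log backup on the production server and a restore of that log on the standby. It is a simplified illustration under stated assumptions, not the jobs that SQL Server's log shipping feature configures for you. The share path is hypothetical and is assumed to be readable by both servers, so the copy step is omitted, and `OperationsManagerDW` is assumed to be the Data Warehouse database name.

```python
from typing import List, Tuple


def _identifier(name: str) -> str:
    """Bracket-quote a SQL Server identifier."""
    return "[" + name.replace("]", "]]") + "]"


def _literal(text: str) -> str:
    """Quote a Unicode string literal."""
    return "N'" + text.replace("'", "''") + "'"


def log_shipping_cycle(database: str, shared_log_file: str) -> List[Tuple[str, str]]:
    """Return (server role, statement) pairs for one backup-and-restore cycle."""
    db = _identifier(database)
    path = _literal(shared_log_file)
    return [
        # Production server: back up the transaction log to the shared location.
        ("primary", f"BACKUP LOG {db} TO DISK = {path};"),
        # Standby server: apply the log but leave the database restoring,
        # so that later log backups can still be applied.
        ("standby", f"RESTORE LOG {db} FROM DISK = {path} WITH NORECOVERY;"),
    ]


def bring_standby_online(database: str) -> str:
    """Statement that ends the restore sequence when you fail over."""
    return f"RESTORE DATABASE {_identifier(database)} WITH RECOVERY;"


if __name__ == "__main__":
    # Hypothetical share and file names.
    for name in ("OperationsManager", "OperationsManagerDW"):
        for role, statement in log_shipping_cycle(name, rf"\\standby\logs\{name}_0001.trn"):
            print(f"{role}: {statement}")
        print("failover:", bring_standby_online(name))
```

The point of `WITH NORECOVERY` is that the standby copy stays in a restoring state between cycles; only the final `WITH RECOVERY` statement, issued during a failover, makes it usable by Operations Manager.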
In addition to deploying log shipping, you will need the RMS and SRS encryption keys for a successful recovery. If you have the OperationsManager database without the RMS key, you will not be able to restore the management group (unless you have SP 1 installed and use the NEWKEY option previously introduced in the "Recovering from a RMS Loss" section of this chapter). The steps to recover from a downed RMS are discussed in the next section.
Recovering from a Downed RMS
Another potential scenario is losing only one of your OpsMgr servers. In this example, we will consider the steps to take if you lose the most important component, the RMS. If your RMS is not available, OpsMgr is not functional. If your RMS is down and you will not be able to meet your SLAs, you will want to promote an existing management server to become the RMS, as depicted in Figure 12.24.
Recovering a downed RMS requires that you have previously backed up the RMS encryption keys, as we discuss in the "Backing Up the RMS Encryption Keys" section of this chapter. You would then promote a functional management server to become the RMS, using the steps we discuss in the "Recovering from a RMS Loss" section. Note that you cannot move from a non-clustered RMS to a clustered RMS, or vice-versa.
A Virtualization Plan for Disaster Recovery
An additional approach for disaster recovery planning is virtualizing your disaster recovery (D/R) environment. This concept would take backups of the physical drives you used when installing and configuring Operations Manager, and convert them to virtual drives.
The advantage of virtual drives is that they are independent of the physical hardware they run on, making them easy to bring up in a D/R site. You could create a D/R management server in place, maintain a copy of the RMS encryption key, and establish one or more empty SQL Servers. If you need to recover your systems, you would restore database backups, promote the management server to become the RMS, and connect the systems. This scenario would work in any software environment supporting virtualization.
Another way to use virtualization for an off-site disaster recovery solution is to send regularly scheduled backups of the virtual hard drives to the disaster recovery location. In the event of a disaster, the backup copies of the virtual hard drives are activated and IP address changes are made to reflect their new physical location.
Inventorying Your OpsMgr Configuration
Part of any successful disaster recovery plan includes understanding your current configuration. The Operations Manager 2007 Resource Kit includes a tool to assist in taking an inventory of the components changed on each computer where you install an OpsMgr component. This tool, Operations Manager Inventory, collects information about your installation and saves it to an XML-formatted .cab file. Data collected includes:
- Windows Installer logs for Operations Manager 2007
- Registry information for Operations Manager 2007
- Operations Manager 2007 configuration information
- Management packs
- All running processes
- All Windows NT event logs on that system
- The report produced by the Prerequisite Checker when you installed Operations Manager 2007
The inventory tool (MOMInventory.exe) must be run locally on each computer. The computer must have Microsoft .NET 3.0 installed. To run the inventory tool, perform the following steps:
- Open a command prompt (Start -> Run -> and type CMD) and type MOMInventory.exe.
- A dialog box will appear. You can click either Run Collection or Close. Click Run Collection.
- In the Save As dialog box, enter a name and location for the .cab file the tool will create.
- While the tool is running, a status window is open, showing information about the data that is collected. When the tool is finished, the status window provides the name and location of the .cab file (see Figure 12.25).
You can also run the tool in "silent" mode. At the command prompt, type the following:
Figure 12.26 shows sample content of the .cab file created by executing MOMInventory.exe.
You can download the OpsMgr 2007 Resource Kit utilities from the System Center Operations Manager TechCenter at http://go.microsoft.com/fwlink/?LinkId=94593.
This was first published in July 2008