Exploring the Windows Server 2003 Resource Kit: Clusterrecovery.exe

The free Clusterrecovery.exe tool from Microsoft helps restore server cluster resources, replace failed disks, recover from disk signature changes and migrate data to a different disk on the shared bus.

In this segment of my in-depth review of the Microsoft Windows Server 2003 Resource Kit tools, we look at the next tool in alphabetical order: Clusterrecovery.exe.

Clusterrecovery.exe is all about cluster support. If you manage a cluster, take note of this free tool. It might just make your clustering world a little easier.

First, let's review some basics. Per Microsoft: "A server cluster is a group of independent computer systems, known as nodes, that work together to run a common set of applications and to provide a single system image to the clients and to the application. The computers are physically connected by cables and programmatically connected by clustering software. The nodes in a cluster remain in constant communication through the exchange of periodic messages, called heartbeats. If one of the nodes becomes unavailable as a result of failure or maintenance, another node immediately begins providing service (a process known as failover)."
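To make the heartbeat and failover idea concrete, here is a minimal sketch in Python. It is a toy model, not how the Cluster service is implemented: the `ToyCluster` class, the node names and the three-second timeout are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Illustrative value only: seconds of silence before a node is presumed down.
HEARTBEAT_TIMEOUT = 3.0


@dataclass
class ToyCluster:
    """Hypothetical model of nodes exchanging heartbeats."""
    nodes: list[str]
    owner: str  # node currently hosting the application
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the time of the most recent heartbeat from this node.
        self.last_seen[node] = now

    def is_alive(self, node: str, now: float) -> bool:
        seen = self.last_seen.get(node)
        return seen is not None and now - seen <= HEARTBEAT_TIMEOUT

    def check_failover(self, now: float) -> str:
        # If the owner has gone silent, hand the application to a live node.
        if not self.is_alive(self.owner, now):
            for candidate in self.nodes:
                if candidate != self.owner and self.is_alive(candidate, now):
                    self.owner = candidate
                    break
        return self.owner


cluster = ToyCluster(nodes=["NODE1", "NODE2"], owner="NODE1")
cluster.heartbeat("NODE1", now=0.0)
cluster.heartbeat("NODE2", now=0.0)
cluster.heartbeat("NODE2", now=4.0)  # NODE1 has stopped sending heartbeats
owner_after = cluster.check_failover(now=5.0)
print(owner_after)
```

In this sketch the application moves to NODE2 because NODE1's last heartbeat is older than the timeout. A real cluster also has to arbitrate ownership of shared disks, which this model ignores.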

Clusterrecovery.exe: Server Cluster Recovery utility -- This tool is primarily used to restore checkpoint files for cluster resources (cluster IP addresses, cluster names, hard disks, Microsoft Distributed Transaction Coordinator and so on), replace a failed disk, recover from disk signature changes and migrate data to a different disk on the shared bus. Clusters use a shared storage infrastructure that is visible from multiple servers (nodes), although only one node in a cluster can access any given disk at any point in time. In the event of a disk failure, special care must be taken to restore the data and recover the applications. The Server Cluster Recovery utility was designed to do just that.

Requirements: You can install the Server Cluster Recovery utility on computers running Windows Server 2003 and Windows XP Professional. The utility can target server clusters running Windows Server 2003, Enterprise Edition; Windows Server 2003, Datacenter Edition; Windows 2000 Advanced Server; and Windows 2000 Datacenter Server.

Installation: This program can be run on any system that has the Windows Server 2003 Resource Kit tools (the reskit) installed, or you can copy the .exe file to any directory and run it from that location.

You can use the recovery tool locally or remotely. To start the tool and open its wizard:

  1. Navigate to C:\Program Files\Windows Resource Kits\Tools.
  2. Double-click Clusterrecovery.exe.
  3. Enter the name of the cluster and choose your recovery option: Replace a physical disk resource or Restore cluster resource checkpoints.

Replace a physical disk resource:
The server cluster physical disk resource uses the disk signature to identify a disk and to map the real device to a physical disk resource instance. When a physical disk fails and is replaced, or when a physical disk is reformatted with a low-level format (which may be required if the I/O subsystem information on the disk becomes corrupt for any reason), the signature of the newly formatted disk no longer matches the signature stored by the physical disk resource.

There are other reasons that the disk signature may change. For example, a boot sector virus or a malfunctioning multi-path device driver can cause the signature to be rewritten (see KB article 293778). In all of these cases, the physical disk resource cannot be brought online, and action is required to get the applications using that disk up and running again.
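The signature mismatch can be pictured with a small Python sketch. This is only a conceptual model under my own assumptions: the signature values, the `visible_disks` mapping and the `find_device` helper are hypothetical and do not reflect the Cluster service's actual data structures.

```python
from typing import Optional

# Disks visible on the shared bus, keyed by disk signature (made-up values).
visible_disks = {0x1A2B3C4D: "Disk 1"}

# The physical disk resource remembers the signature of the disk it manages.
resource = {"name": "Disk Q:", "signature": 0x1A2B3C4D}


def find_device(resource: dict, disks: dict) -> Optional[str]:
    """Return the device whose signature matches the resource, or None."""
    return disks.get(resource["signature"])


# With the original disk in place, the resource maps to a real device.
before = find_device(resource, visible_disks)

# The disk is replaced (or its signature is rewritten): same slot, new signature.
visible_disks = {0x9F8E7D6C: "Disk 1"}
after = find_device(resource, visible_disks)

print(before, after)
```

Once the stored signature no longer matches anything on the bus, the lookup comes back empty. That is the toy equivalent of a physical disk resource that cannot be brought online.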

The cluster recovery utility lets you substitute a new disk, managed by a new physical disk resource, into the resource dependency tree and then remove the old disk resource (which no longer has a disk associated with it).
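As a rough illustration of what substituting a resource in a dependency tree means, the Python sketch below rewires a toy dependency map. The resource names and the `substitute_disk` helper are hypothetical; the real utility works against the cluster configuration, not a dictionary.

```python
def substitute_disk(deps: dict[str, list[str]], old: str, new: str) -> dict[str, list[str]]:
    """Return a copy of the map in which every dependency on `old` points at `new`."""
    return {res: [new if d == old else d for d in ds] for res, ds in deps.items()}


# Toy dependency map: each resource lists the resources it depends on.
deps = {
    "File Share": ["Network Name", "Disk S: (old)"],
    "Network Name": ["IP Address"],
    "IP Address": [],
    "Disk S: (old)": [],
    "Disk T: (new)": [],
}

rewired = substitute_disk(deps, "Disk S: (old)", "Disk T: (new)")

# Nothing depends on the old disk resource any more, so it can be deleted.
still_needed = any("Disk S: (old)" in ds for ds in rewired.values())
if not still_needed:
    del rewired["Disk S: (old)"]

print(rewired["File Share"])
```

The point of doing the swap first and the delete second is that applications depending on the disk never lose their dependency. They are simply pointed at the new resource.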

To replace a failed disk, use the following procedure:

  1. Add a new disk drive to the cluster, making sure that it is visible to only one node in the cluster.
  2. Create a new physical disk resource for the new disk drive using Cluster Administrator.
  3. Verify that the new disk drive is visible to the same set of nodes as the failed disk drive was.
  4. In the Server Cluster Recovery utility wizard, select the option to replace a failed disk resource. Select the old physical disk resource (identified by the drive letter of the old drive), then choose the new physical disk you just installed. Once you have validated that the new resource is correctly installed, delete the old physical disk resource, as it no longer represents a real resource on the cluster.
  5. Once the cluster is configured, restore the application data to the new disk drive.

A server cluster has a special disk known as the quorum disk. This disk is used to store cluster configuration and such things as resource checkpoint files. The quorum disk needs additional work to recover in the event of a failure and we will cover that separately.

Restore cluster resource checkpoints:
Per Microsoft: "Many applications and other resources store data in registry keys outside of the cluster database. Resource checkpointing is the process of associating a resource with one or more registry keys so that when the resource is moved to a new node (during failover, for example), the required keys are propagated to the local registry on the new node. This allows an application to store configuration data in the registry and have an up-to-date version of that data available, irrespective of where the application is hosted in the cluster."
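Here is a minimal Python sketch of the checkpointing idea, assuming a toy in-memory store in place of the real checkpoint files. The `ToyNode` and `ToyCheckpointStore` classes, the resource name and the registry key are all hypothetical and exist only to show the flow.

```python
import copy


class ToyNode:
    """Hypothetical cluster node with a local 'registry' (a plain dict)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.registry: dict[str, dict] = {}


class ToyCheckpointStore:
    """Stands in for the checkpoint files the cluster keeps for each resource."""

    def __init__(self) -> None:
        self._saved: dict[tuple[str, str], dict] = {}

    def save(self, resource: str, key: str, node: ToyNode) -> None:
        # Snapshot the registry key associated with the resource.
        self._saved[(resource, key)] = copy.deepcopy(node.registry.get(key, {}))

    def restore(self, resource: str, node: ToyNode) -> None:
        # Propagate every checkpointed key for the resource to the node.
        for (res, key), values in self._saved.items():
            if res == resource:
                node.registry[key] = copy.deepcopy(values)


key = r"SOFTWARE\ExampleApp"  # made-up key outside the cluster database
node_a, node_b = ToyNode("NODE1"), ToyNode("NODE2")
node_a.registry[key] = {"Port": 8080}

store = ToyCheckpointStore()
store.save("ExampleApp Resource", key, node_a)

# Failover: the resource moves to NODE2, and its keys follow it.
store.restore("ExampleApp Resource", node_b)
print(node_b.registry[key])
```

The new node ends up with the same configuration data the application had on the old node, which is the behavior the quoted description promises.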

The cluster stores up-to-date checkpoint files on the quorum disk (the disk that every node in the cluster can access). If a cluster node fails, you can move an application over to a new node and then run this utility to bring over the registry data needed to recreate the running environment.

  1. After selecting the option to restore the checkpoints, choose Next.
  2. Typically, you would be choosing to restore All Resource checkpoints (default), but if you know the exact cluster resource checkpoint file you wish to restore, you can be specific.

If the quorum disk fails, you will lose all checkpoint files, making it more difficult to move a failed node's application to a new node. You will need to use the Cluster Recovery utility to re-create the checkpoint files after the disk is replaced.

Microsoft Windows Server 2003 Resource Kit tools help administrators streamline management tasks such as troubleshooting operating system issues, managing Active Directory, configuring networking and security features and automating application deployment.

About the author: Tim Fenner (MCSE, MCSA: Messaging, Network+ and A+) is a senior systems administrator who oversees a Microsoft Windows, Exchange and Office environment. He is also an independent consultant who specializes in the design, implementation and management of Windows networks.