When (and when not) to use Windows server failover clustering

Confused about the benefits of adding Windows Server 2008 failover clustering to your environment? You're not alone. There's a lot of confusion in IT today about when and where clustering fits as a way to improve service reliability. IT organizations implement server clusters all the time, but not always for the right reasons.

First and foremost, adding Microsoft clustering to an existing service can significantly increase the cost of supporting that solution. The reasons are obvious once you consider clustering's shared storage requirements, its added cabling and networks, and the more expensive editions of Microsoft Windows it requires. Above all, the extra switches, dials and knobs that clustering adds to the management of a hosted service make the environment more complex.

Yet there are still some specific situations in which clustering can improve your uptime. In making any clustering decision, consider the right reasons for taking advantage of its improved availability features, and remember that the benefits won't always outweigh the added complexity.

Reason #1: Clustering reduces the impact of hardware outages

When a server motherboard fails, that server usually goes down hard. Such hardware failures often cause a long outage of the hosted service because of the time needed to acquire and replace the failed part. Even with a maintenance agreement from a traditional server-class vendor, that can mean half a day to a full day of downtime; without one, the outage can last significantly longer. For highly critical services, delays like these are unacceptable.

Failover clustering gives a service another node on which it can automatically re-host itself when a failure occurs, which removes the urgency of obtaining and installing the failed part. A clustered service incurs an outage of only a few seconds or minutes rather than hours or days.

Still, be cautious about implementing server clustering for this purpose alone. These days, server-class hardware comes equipped with multiple levels of redundancy: hard drives are RAIDed, network cards are teamed, and some servers even build redundancy into their internal components. All of these features reduce the likelihood that a single component failure will take down an entire system, which means you may already have the redundancy you need built into your server hardware.

Reason #2: Clustering takes the pain out of software problems

A Windows server that hosts a service does much more than run that service. Windows alone has all kinds of moving parts, and most environments add more software to servers for backups, systems management, monitoring and remote control. Any of these packages can eventually conflict with another in a way that stops the server from processing your critical service.

When this happens, server clustering can relocate the service to another node where the conflict does not exist. Relocation gives the administrator precious time to fix the software conflict without the added strain of a critical service outage. The result is better fixes and fewer "band-aids."

This reason, however, holds true only when "other" software is causing the problem. When your critical service itself is the problem, failing over may simply move the fault to the next node, and clustering's added machinery can in some cases make troubleshooting and resolution more difficult.

Reason #3: Clustering makes OS patching less painful

Every month Microsoft releases another round of patches for its products. Ranging from low priority to exceptionally critical, these patches need to be installed on host machines as soon as operationally possible. The problem is that many patches require a system reboot to finish installing, and that reboot cuts into the uptime of the hosted service.

Adding clustering to the mix lets an IT environment move the service to another cluster node before patching, so the patch install and subsequent reboot happen without affecting the service. Once the first node is done, you can relocate the service again and continue patching the remaining nodes without impact.
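
To make the rolling sequence concrete, here is a toy Python model of a two-node cluster. It is a sketch under stated assumptions, not Windows clustering code: the `Node` and `Cluster` classes and the `rolling_patch` function are hypothetical stand-ins, and a real cluster would move the service with its own management tools. It shows only that moving the service off each node before that node reboots keeps the service online at every step.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    online: bool = True
    patched: bool = False


@dataclass
class Cluster:
    # Hypothetical model: the cluster's nodes plus the name of the node that
    # currently owns (hosts) the clustered service.
    nodes: list
    owner: str

    def node(self, name):
        return next(n for n in self.nodes if n.name == name)

    def move_service(self, target):
        # A planned move only succeeds if the target node is up.
        if not self.node(target).online:
            raise RuntimeError(f"cannot move service to offline node {target}")
        self.owner = target

    def service_available(self):
        # The service is up as long as the node that owns it is up.
        return self.node(self.owner).online


def rolling_patch(cluster):
    """Patch every node in turn, moving the service away before each reboot.

    Returns (node name, service available during that node's reboot) pairs.
    """
    log = []
    for node in cluster.nodes:
        if cluster.owner == node.name:
            # Relocate the service to another online node before patching this one.
            target = next(n for n in cluster.nodes if n is not node and n.online)
            cluster.move_service(target.name)
        node.online = False  # patch installed; reboot begins
        log.append((node.name, cluster.service_available()))
        node.patched = True
        node.online = True  # reboot complete
    return log


if __name__ == "__main__":
    cluster = Cluster(nodes=[Node("node1"), Node("node2")], owner="node1")
    for name, available in rolling_patch(cluster):
        print(f"{name} rebooted; service available: {available}")
```

With two nodes, the service moves to node2 while node1 reboots, then back to node1 while node2 reboots. The model deliberately ignores patches to the hosted service itself, which can still force downtime.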

However, once again this reason may not be enough on its own. One of Microsoft's improvements in Windows Server 2008 is a reworking of the patches themselves, so fewer of them require a reboot to complete. Also, the hosted service itself sometimes needs patching, and patching it often requires a reboot, which means downtime anyway. Your mileage may vary.

Reason #4: Clustering can be a form of disaster recovery

In a traditional failover cluster, every node must be directly attached to some form of shared storage. This storage holds quorum information as well as the data that the cluster's nodes process. As a result, the physical placement of each node is limited by the length of the cabling that connects it to the storage.

Because every host in a traditional cluster needs this direct connection to the same centralized shared storage, the nodes end up physically close together, so a disaster that affects one node is likely to affect the others. As an alternative, Windows Server 2008 includes enhanced support for geographically dispersed clusters, also called stretch clusters or geo-clusters.

These special clusters allow cluster nodes to be "stretched" across great distances. However, they also involve extra cost for network connectivity, additional storage and, usually, third-party data replication between sites. They can also add significant complexity to existing services, so they're best reserved for only the most critical ones.

So the moral of today's story is to weigh both the pros and the cons when considering whether to add failover clustering to an existing Windows service. Failover clustering is often (and incorrectly) treated as a "magic bullet" that prevents large swaths of possible outages, but it is designed to protect against only a specific few.

Greg Shields, MVP, is an independent author and consultant based in Denver with many years of IT architecture and enterprise administration experience. He is an IT trainer and speaker on such IT topics as Microsoft administration, systems management and monitoring, and virtualization. His recent book Windows Server 2008: What's New/What's Changed is available from Sapien Press.

This was first published in August 2008
