The strategic aspect of terminal server-based computing can get very complex very fast, especially when you're looking at it from a change management perspective. In this article, we'll look at ways to create a predictable server farm and the processes you may need to implement to create a successful change management strategy.
Server-based computing takes complexity away from the desktop and puts it into the server farm. However, that's both good and bad news. The desktop is simple, but a farm of terminal servers, with or without Citrix, is about as complex as a server farm gets. Some of these complications include:
The servers are running applications that generally were not designed to run on a shared computer (and sometimes not designed to be run on a computer running more than one application at a time). Most applications require some sort of post-installation configuration.
The servers are used by dozens of people at the same time.
Some poorly designed applications require that users have administrative privileges.
Citrix and Terminal Services (and the bits of the OS that affect terminal server performance) have hundreds of individual settings.
The terminal servers may need to support device drivers for devices that the system administrator does not control.
The basic problem with server-based computing is that for a load-balanced server farm to work properly (behaving exactly the same way regardless of which server a user connects to), the servers must be consistent. Unfortunately, keeping servers consistent is no easy task. What happens most often is that either no one touches the servers once they're set up, or they slowly creep out of sync with each other as ad-hoc adjustments are made. This complicates server updates since you can't know the exact configuration of each server. It also creates a less consistent user environment and vastly complicates troubleshooting. Most of the time it takes to fix a problem is spent investigating the root of the issue.
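One way to make creep visible is to compare a snapshot of each server's settings against the others. The following is a minimal sketch, not a finished tool: it assumes you have already exported each server's settings into a plain dictionary, and the server names and setting names (`ts01`, `max_idle_minutes`, `printer_driver`) are invented for illustration.

```python
def find_drift(configs):
    """Return {setting: {server: value}} for every setting that differs.

    `configs` maps a server name to a dict of its settings. A setting
    that is missing on a server is reported with the value None.
    """
    all_keys = set()
    for settings in configs.values():
        all_keys.update(settings)

    drift = {}
    for key in sorted(all_keys):
        values = {server: settings.get(key) for server, settings in configs.items()}
        # More than one distinct value means the farm is out of sync here.
        if len(set(values.values())) > 1:
            drift[key] = values
    return drift


# Hypothetical snapshot of a three-server farm.
farm = {
    "ts01": {"max_idle_minutes": 30, "printer_driver": "1.2"},
    "ts02": {"max_idle_minutes": 30, "printer_driver": "1.3"},
    "ts03": {"max_idle_minutes": 30},
}

for setting, values in find_drift(farm).items():
    print(setting, values)
```

Here only `printer_driver` is reported, because `max_idle_minutes` is identical on all three servers. How you collect the snapshots in the first place depends on your environment and is outside the scope of this sketch.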
Structured change management practices and the server farm
If having all the servers in the server farm acting alike sounds like a good idea -- one that you haven't yet put into practice -- then some structured change management practices might help.
Your first step is to analyze current practices and perform a gap analysis to find the distance between where you are and where you'd like to be. A very structured and detailed approach to gap analysis is the most useful, as you're better off with specific recommendations than vague guidelines. One free description of change management best practices is available from AuditNet.org, but I'd also look into scoring sheets that can help you see where your efforts meet best practices and where they fall short.
With gap analysis complete, you can probably build a decent case for making practice and policy changes. You can now take a structured approach to introducing any new practices. If you follow the guidelines in the Visible Ops Handbook, this involves four discrete steps:
Step 1: Create structure.
Freeze change outside set maintenance windows and provide change information to first responders. The goal of this step is to reduce unplanned work from the common 35% to 65% of total work down to 25%. It's accomplished by tightening control around changes so that unauthorized changes don't happen, and authorized changes only happen at approved times. This is one of the toughest steps culturally. Many people feel that their ability to troubleshoot on the fly defines their worth as administrators, but over time and over multiple servers, "on the fly" is not a reliable way to update servers.
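The rule in this step (only authorized changes, and only at approved times) is simple enough to express as a check. The sketch below is illustrative only: the window times, the `classify_change` function and its return labels are assumptions made for the example, and it deliberately ignores windows that cross midnight.

```python
from datetime import datetime, time

# Hypothetical windows: weekday number (Monday=0) -> (start, end), local time.
MAINTENANCE_WINDOWS = {
    2: (time(22, 0), time(23, 59)),  # Wednesday evening
    6: (time(2, 0), time(6, 0)),     # early Sunday morning
}


def in_maintenance_window(when, windows=MAINTENANCE_WINDOWS):
    """Return True if `when` falls inside the window for its weekday."""
    window = windows.get(when.weekday())
    if window is None:
        return False
    start, end = window
    return start <= when.time() <= end


def classify_change(when, approved):
    """Label a change as 'ok', 'outside window' or 'unauthorized'."""
    if not approved:
        return "unauthorized"
    if not in_maintenance_window(when):
        return "outside window"
    return "ok"


# 3 January 2024 was a Wednesday; 4 January 2024 was a Thursday.
print(classify_change(datetime(2024, 1, 3, 22, 30), approved=True))
print(classify_change(datetime(2024, 1, 4, 10, 0), approved=True))
```

The first change is approved and lands inside the Wednesday window; the second is approved but arrives on a day with no window, so it would be flagged for the first responders.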
Step 2: Identify fragile parts of the structure -- with one of the most fragile being your terminal servers.
The goal of this step is to create a detailed inventory of hardware and software assets and map them to business services. It's crucial that senior staff members perform this step -- those who really know the terminal servers, their applications and their peculiarities. This inventory needs to show complete details about why things are done in a certain way, not just a list of hardware and installed software. During this stage, you shouldn't add new assets or change existing ones if avoidable -- if you must, be sure that the changes are reflected in the inventory.
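As a rough illustration of what one inventory record might hold, here is a small sketch. The `Asset` fields, the helper functions and the sample entries are all hypothetical; the point is only that each record carries the business services it supports and the reason it is configured the way it is, not just a name.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    kind: str                 # "hardware" or "software"
    business_services: list   # services that depend on this asset
    rationale: str = ""       # why it is set up this way


def assets_for_service(inventory, service):
    """Names of all assets that a given business service depends on."""
    return [a.name for a in inventory if service in a.business_services]


def undocumented(inventory):
    """Names of assets whose 'why' has not been written down yet."""
    return [a.name for a in inventory if not a.rationale.strip()]


# Invented sample entries.
inventory = [
    Asset("ts01", "hardware", ["order entry", "payroll"],
          "Extra RAM because the payroll client leaks memory per session"),
    Asset("payroll client", "software", ["payroll"]),
]

print(assets_for_service(inventory, "payroll"))
print(undocumented(inventory))
```

A list of undocumented assets like the one produced by the last line gives the senior staff a concrete to-do list for this step.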
By this time, you've restricted change to set maintenance windows, allowed only authorized changes, and found all the pieces of your infrastructure.
Step 3: Make it cheaper to rebuild than to repair.
Again, detective work takes up the most troubleshooting time, and the amount of troubleshooting time is unpredictable. It might take five minutes to fix a server or it might take five days. Your goal at this step is to create a build library that makes it possible to rebuild assets -- even the fragile ones -- in a predictable interval that's shorter than most troubleshooting times. If a server is broken and the time required to rebuild it is two hours, then that time is at least known and that allows you to plan in a way that an unknown period of downtime does not.
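To check whether a rebuild really is shorter than most troubleshooting times, you can compare the known rebuild time against your own repair history. This is a minimal sketch under stated assumptions: the repair durations below are invented, and using the median as the meaning of "most" is one reasonable reading, not the only one.

```python
from statistics import median


def rebuild_beats_most_repairs(repair_hours, rebuild_hours):
    """True if the rebuild time is shorter than the median repair time."""
    return rebuild_hours < median(repair_hours)


def share_of_repairs_slower_than_rebuild(repair_hours, rebuild_hours):
    """Fraction of past repairs that took longer than a rebuild would."""
    slower = sum(1 for hours in repair_hours if hours > rebuild_hours)
    return slower / len(repair_hours)


# Invented history of past repair durations, in hours.
history = [0.1, 1.5, 3.0, 8.0, 40.0]
rebuild = 2.0

print(rebuild_beats_most_repairs(history, rebuild))
print(share_of_repairs_slower_than_rebuild(history, rebuild))
```

With this sample history, three of the five repairs took longer than the two-hour rebuild, so rebuilding would have been both faster in most cases and, more importantly, predictable in all of them.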
Step 4: Create metrics to ensure that steps 1-3 are followed and improved.
Creating metrics is actually an ongoing procedure if you're doing it properly. As the saying goes, "If you can't document it, you don't know it." Therefore, at each stage of implementing change management, you'll want to be documenting procedures: how you plan to lock down change, what the assigned windows are, how much downtime you started with, how long fixes take, how hardware and software assets are mapped to the business practices so they're properly prioritized and so forth. Step 4 is where you ensure that you're getting recognizable metrics out of the process, such as:
Unplanned downtime each month/week/quarter
Approved changes implemented
User help desk calls
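The first of these metrics can be computed from even a very simple incident log. The sketch below assumes each incident is recorded as a tuple of date, hours of downtime and a planned/unplanned flag; that record layout and the sample data are assumptions for the example, not a prescribed format.

```python
from collections import defaultdict
from datetime import date


def unplanned_downtime_by_month(incidents):
    """Sum unplanned downtime hours per (year, month).

    `incidents` is an iterable of (day, hours, planned) tuples.
    Planned outages, such as maintenance windows, are skipped.
    """
    totals = defaultdict(float)
    for day, hours, planned in incidents:
        if not planned:
            totals[(day.year, day.month)] += hours
    return dict(totals)


# Invented incident log.
incidents = [
    (date(2024, 1, 5), 2.0, False),
    (date(2024, 1, 20), 1.5, False),
    (date(2024, 1, 21), 4.0, True),   # planned maintenance, not counted
    (date(2024, 2, 2), 0.5, False),
]

for month, hours in sorted(unplanned_downtime_by_month(incidents).items()):
    print(month, hours)
```

Tracking the same figure month after month is what shows whether the earlier steps are actually reducing unplanned work.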
At this point, you should have pretty good control over the network and will be in a position to increase that control and improve your metrics.
Do you need ITIL?
If you read anything about change management these days, chances are you've seen something about the IT Infrastructure Library (ITIL). Do you need it? It can sure feel like it -- the impression you're left with is that, if you implement ITIL, not only will your server farm be error-free, but that nagging back pain will go away as well.
It's not that simple. The basic idea is that, if your company requires ISO 20000 certification, then it will have to pass operational audits to ensure that rigorous change management policies are followed. To pass these audits, you'll need to follow the ITIL guidelines and have people on hand who've passed the ITIL certifications. You'll also need documentation of your four stages -- the auditors will need it to complete the audit.
Even for those who don't have to worry about ISO 20000, thinking in terms of ITIL can be helpful. You may already be following good change management practices, but looking at the ITIL best practices can help you find the places where there's room for improvement or give you ideas for improving your current practices. So no, you don't necessarily have to follow ITIL guidelines (or COBIT, for another example), but familiarity with those guidelines will help you improve your change management practices without reinventing the wheel.
Make the most of terminal servers and change management
Change management is important for any server farm or data center, but the complex nature of terminal servers makes them especially vulnerable to change-introduced downtime. By doing a critical gap analysis to find out exactly how your current change management policies work (and where they don't) and then incrementally locking down your environment to make it both more stable and more reproducible, you can decrease the time spent troubleshooting. In addition, you will make your server farms more predictable while still introducing change when it's needed.
Christa Anderson, visionapp's regional director for North America, is an internationally known speaker and writer on server-based computing. Her books include Windows Terminal Services (Sybex, 2002) and The Definitive Guide To MetaFrame XP (available from www.realtimepublishers.com), and she co-authored the best-selling Mastering Windows Server 2003 (Sybex, 2006).