From time to time, my customers have asked me to help them define a stable computing environment. They want to make sure their systems are, for lack of a better term, stable. Of course, this makes perfect sense: defining a stable computing environment is important for a number of reasons.
For one, it establishes a benchmark that you can use to measure performance, access, change management, and supportability. Suppose that a contractor is hired to make configuration changes or install a new product that may affect server performance. Having a benchmark that outlines the minimum acceptable levels for a stable environment can prove that the contractor met the criteria and kept the system stable. Without it, the contractor cannot be held accountable should a problem occur that causes system instability. This could cost the company both money and time. Having this benchmark also helps when support calls are logged for repair problems. When admins have objective measurements, it helps them determine the system's stability before they close a ticket.
The problem is that stability means different things to different people under different circumstances. For instance, a Windows environment whose network consists of a handful of sites in a single city, connected by high-speed fiber links, could be considered stable if there were no outages and no breaks in connectivity. Meanwhile, a global network that has many remote offices connected by slow and perhaps unreliable links might be considered stable if it were fully functional 75% of the time.
Unfortunately, there is no one template that can be used to generate a punch list to make sure failure will not occur. There is, however, a set of guidelines and best practices in the Information Technology Infrastructure Library (ITIL) that can be used by any Windows IT manager to define policies, practices and procedures that proactively ensure stability, which ultimately improves change management processes.
And while there are many ITIL standards that should be employed in any IT environment, there are three key areas that you should focus on when defining stability in your organization. They are:
1. Service-level agreements (SLAs)
2. Change management procedures
3. Security policies
While it should be obvious that a stable environment is defined by the service-level agreements and change management regulations for various components in the Windows enterprise, I am continually amazed at the careless attitude many companies have when it comes to dealing with these elements. But the elements are critical for achieving and maintaining stability in the computing environment, so let's take a closer look at them.
1. Service-level agreements
I find that service-level agreements are often seen as documents that executives read to reassure themselves that they are complying with corporate edicts, but they are rarely used to run day-to-day operations. This needs to change, because SLAs actually play a crucial role in defining a stable computing environment.
Generally, an SLA defines the permissible outage for hardware or software, that is, the service levels that must be maintained in order to meet business goals. It affects not only how fast a recovery must be made after an outage, but also the costs associated with that recovery.
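To make "permissible outage" concrete, an availability target can be converted into a downtime budget for a reporting period. The following is a minimal Python sketch, not a prescribed method: the function names `permissible_downtime` and `meets_sla` and the 30-day reporting period are assumptions chosen for illustration.

```python
from datetime import timedelta


def permissible_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Return the total outage time an availability target allows in a period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    # The unavailable fraction of the period is the downtime budget.
    return period * (1.0 - availability_pct / 100.0)


def meets_sla(availability_pct: float, period: timedelta,
              observed_outage: timedelta) -> bool:
    """Return True if the observed outage stays within the downtime budget."""
    return observed_outage <= permissible_downtime(availability_pct, period)


if __name__ == "__main__":
    month = timedelta(days=30)  # assumed reporting period
    # A network that only needs to be functional 75% of the time:
    print(permissible_downtime(75.0, month))  # 7 days, 12:00:00
    # A hypothetical 99.9% target leaves a far smaller budget:
    print(permissible_downtime(99.9, month))
    print(meets_sla(99.9, month, timedelta(hours=2)))
```

Expressing the target as a time budget gives admins an objective number to compare against logged outages before they close a ticket.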
When establishing an SLA, it's crucial for IT managers to properly define the business needs and to temper the SLA with what is technologically possible and within the budget. In a Windows shop, create a separate SLA for each of the following components:
- Business-critical application servers and software
- Domain controllers
- Exchange/email servers and software
- Related storage arrays
2. Change management procedures
Changes such as hardware configuration, patch installation and software version upgrades always have the potential to cause failures, which can result in costly outages. Creating a strict, well-defined change management process will ensure that the changes are planned and tested in an orderly manner and will reduce the risk of failure.
3. Security policies
Until there are security measures in place for administrative rights, security patches, antivirus updates and access policies for passwords and remote access, your Windows computing environment will not be stable. In general, the following points should be included in your policies:
- Establish and maintain user password policies. Ensure that passwords are hard to crack, but that the requirements for length, history and age won't generate a lot of help desk tickets.
- Incorporate remote access protection. Opening your corporate network to the outside world for the sake of convenience is always a risky proposition. But, if you must, use secure technologies such as dual-factor authentication, IPsec encryption and VPNs to prevent attacks from disrupting your business.
- Monitor and maintain security and antivirus updates.
- Identify and use existing technologies and products to ensure that security patches and antivirus updates are applied regularly. Consider using Network Access Protection (NAP) methods to prevent client workstations and servers from harming the environment because security updates are outdated.
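The trade-off in the password point above, between passwords that are hard to crack and rules that flood the help desk, can be sketched as a simple review of a proposed policy. This is an illustrative Python sketch only: the `PasswordPolicy` type, the `review_policy` function and every threshold in `TOO_WEAK` and `TOO_STRICT` are hypothetical values, not recommendations from ITIL or Microsoft, and should be replaced with figures that fit your business needs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PasswordPolicy:
    min_length: int     # minimum number of characters required
    history_count: int  # previous passwords that cannot be reused
    max_age_days: int   # days before a password must be changed


# Hypothetical bounds, for illustration only.
TOO_WEAK = PasswordPolicy(min_length=8, history_count=5, max_age_days=180)
TOO_STRICT = PasswordPolicy(min_length=20, history_count=24, max_age_days=30)


def review_policy(policy: PasswordPolicy) -> List[str]:
    """Return findings where the policy looks too weak or too burdensome."""
    findings = []
    if policy.min_length < TOO_WEAK.min_length:
        findings.append("min_length: short passwords are easier to crack")
    elif policy.min_length > TOO_STRICT.min_length:
        findings.append("min_length: very long minimum may raise help desk tickets")
    if policy.history_count < TOO_WEAK.history_count:
        findings.append("history_count: users can quickly reuse old passwords")
    elif policy.history_count > TOO_STRICT.history_count:
        findings.append("history_count: long history may raise help desk tickets")
    # For age, a larger value is weaker and a smaller value is more burdensome.
    if policy.max_age_days > TOO_WEAK.max_age_days:
        findings.append("max_age_days: passwords stay valid for too long")
    elif policy.max_age_days < TOO_STRICT.max_age_days:
        findings.append("max_age_days: frequent changes may raise help desk tickets")
    return findings


if __name__ == "__main__":
    for finding in review_policy(PasswordPolicy(6, 10, 14)):
        print(finding)
```

An empty result means the proposed policy falls between the two assumed bounds. The bounds themselves are the part each organization has to decide.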
There are, of course, other factors to consider in order to bring stability to your Windows environment, such as involving key stakeholders and defining enforcement policies. Just make sure these factors are technically feasible, affordable and truly meet your business needs. If you need help, review these ITIL recommendations; doing so will help improve change management procedures.
Gary Olsen is a systems software engineer for Hewlett-Packard in Global Solutions Engineering. He authored Windows 2000: Active Directory Design and Deployment and co-authored Windows Server 2003 on HP ProLiant Servers. Gary is a Microsoft MVP for Directory Services and formerly for Windows File Systems.