Over the past several years, I have had to manage more and more servers. Most recently, having decided to invest some time in the advanced hosting arena, I have become responsible for over 2,000 servers. I don't claim to manage all of these systems on my own. Quite the contrary: a hundred or so employees handle day-to-day operations, run build processes, respond to critical issues and answer customer requests. However, that hundred would be thousands if we hadn't mastered a few things along the way. So let me share a few of the key lessons I've learned about managing the masses.
Lesson #1 – Prepare for knowledge transfer
You spend big bucks on clustered systems, tape backups and all sorts of technical redundancies to protect your data. Yet you let the knowledge of how all of those systems work together, their maintenance schedules and their process flows float around in your star employee's head. Knowledge transfer is not something you do when an employee moves on. It is a clearly defined process and a carefully developed culture that keep valuable information from walking out the door. Don't wait to build a clear system for creating and storing server, system and process information. In addition, develop policies for maintaining the integrity of this data, just as you would for your account database or SAP system. Early in the development of the hosting company I currently work for, we began devoting substantial resources to a system that tracks everything about the servers moving in and out of the server farm. Despite how early we started and how well it works today, we still wish we had done it sooner.
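The article does not describe how that tracking system is built. As a rough illustration only, here is a minimal Python sketch of a server registry that records each machine's role, build and maintenance window and logs when it enters or leaves the farm. The `ServerRecord` and `Inventory` names and all of their fields are hypothetical, and a production system would sit on a database with access controls rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ServerRecord:
    # Hypothetical fields chosen for illustration only.
    hostname: str
    role: str
    os_build: str
    rack: str
    maintenance_window: str
    history: list = field(default_factory=list)  # (date, event) pairs


class Inventory:
    """In-memory registry of servers moving in and out of the farm."""

    def __init__(self):
        self._servers = {}

    def check_in(self, record: ServerRecord, when: date) -> None:
        # Refuse duplicates so two records never describe the same host.
        if record.hostname in self._servers:
            raise ValueError(f"{record.hostname} is already in the inventory")
        record.history.append((when, "checked in"))
        self._servers[record.hostname] = record

    def check_out(self, hostname: str, when: date) -> ServerRecord:
        # Raises KeyError for an unknown host, so mistakes are not silent.
        record = self._servers.pop(hostname)
        record.history.append((when, "checked out"))
        return record

    def by_role(self, role: str) -> list:
        """Sorted hostnames currently in service with the given role."""
        return sorted(r.hostname for r in self._servers.values() if r.role == role)
```

Because each record keeps its own history, the knowledge of when a box arrived and left stays with the data rather than with whoever happened to rack it.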
Lesson #2 – Develop standards
Developing standards for how the servers in your farm are built makes life much easier. You still need flexibility to adapt to changing technology and customer needs. But when you need to deploy the next security update, service pack or patch, having all of your systems at least close to identical will ease the deployment.
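One way to make a build standard useful at patch time is to compare each server against it before a rollout. The Python sketch below assumes each server's settings can be collected into a dictionary; the `STANDARD_BUILD` keys and values and the `find_drift` and `ready_for_rollout` helpers are hypothetical, not part of any particular tool.

```python
# Hypothetical standard build; real baselines would be versioned and far larger.
STANDARD_BUILD = {
    "os_build": "std-7",
    "service_pack": "SP2",
    "antivirus": "enabled",
    "remote_admin": "enabled",
}


def find_drift(server_config: dict, baseline: dict = STANDARD_BUILD) -> dict:
    """Return {setting: (expected, actual)} for every baseline setting that differs."""
    drift = {}
    for setting, expected in baseline.items():
        actual = server_config.get(setting)  # None if the setting is missing
        if actual != expected:
            drift[setting] = (expected, actual)
    return drift


def ready_for_rollout(fleet: dict) -> tuple:
    """Split hosts into those matching the standard and those needing attention first."""
    compliant = sorted(h for h, cfg in fleet.items() if not find_drift(cfg))
    drifted = sorted(h for h, cfg in fleet.items() if find_drift(cfg))
    return compliant, drifted
```

Servers in the compliant list can take the update in bulk; the drifted ones get looked at individually, which is exactly the work a good standard keeps small.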
Lesson #3 – Automate, automate, automate
If you have to do something more than five times, you should have automated it. Microsoft provides script engines, API sets, WMI, ADSI and the ever-prolific Visual Basic so that you can develop all sorts of automated processes. Every administrator should have a scripting language under his or her belt; whether it is Perl, VBScript, JScript or Windows command-line batch files is irrelevant. Personally, I would recommend VB simply because it is so common throughout Microsoft's software.
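The advice holds in any scripting language. To keep all of the examples here in one language, this is a Python sketch of a classic repetitive chore: clearing out old log files. The `purge_old_files` name, the `*.log` pattern and the age cutoff are assumptions chosen for illustration, and the `dry_run` flag lets you preview what would be deleted before trusting the script on a production box.

```python
import time
from pathlib import Path


def purge_old_files(directory, max_age_days, pattern="*.log", dry_run=False):
    """Delete files matching pattern that are older than max_age_days.

    Returns the names of the files removed (or that would be removed
    when dry_run is True), in sorted order.
    """
    cutoff = time.time() - max_age_days * 86400  # seconds per day
    removed = []
    for path in sorted(Path(directory).glob(pattern)):
        # Only plain files whose last modification predates the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path.name)
    return removed
```

Run the same function from a scheduler against every log directory in the farm and a task that once took an afternoon each month disappears.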
Lesson #4 – Monitoring needs constant improvement
Contrary to popular belief, monitoring system health is not a passive activity. Starting with the right monitoring software is a good first step, but you will need to develop and improve the system's configuration continually to maximize your payback. You will also want to plan for collecting and archiving performance statistics. Systems that crash are fairly easy to fix. Performance problems are more common, far more difficult to resolve, and usually require much more data before you can determine the cause. You must constantly refine how and what you monitor. It is simply not good enough to put a monitoring system in place; you have to work to make sure it monitors the right objects, alerts on the right thresholds and even offers guidance to engineers when alerts fire.
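Two of those ideas, alerting on a sensible threshold and keeping statistics for later analysis, can be sketched in a few lines. The Python class below is a hypothetical illustration, not the behavior of any particular monitoring product: it alerts only when the average over a rolling window exceeds the threshold (so a single spike does not page anyone), archives every sample for trend analysis, and attaches a remediation hint to the alert text.

```python
from collections import deque
from statistics import mean


class MetricMonitor:
    """Track one counter and alert on a sustained breach of its threshold."""

    def __init__(self, name, threshold, window, hint):
        self.name = name
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # rolling window used for alerting
        self.archive = []                   # full history kept for trend analysis
        self.hint = hint                    # guidance handed to the engineer

    def record(self, value):
        """Store a sample; return an alert string or None."""
        self.recent.append(value)
        self.archive.append(value)
        if len(self.recent) < self.recent.maxlen:
            return None  # not enough data yet to judge a trend
        avg = mean(self.recent)
        if avg > self.threshold:
            return f"ALERT {self.name}: avg {avg:.1f} > {self.threshold}. {self.hint}"
        return None
```

Tuning `threshold`, `window` and `hint` per counter over time is the kind of continual refinement the lesson calls for; the archive is what you will need when a slow server finally gets investigated.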
Lesson #5 – Get a knowledge base
Develop a knowledge base of problems and their resolutions. Any service-call or problem-tracking system must treat information retrieval and complex searching as its primary functions; data entry is secondary to retrieval. Too many companies focus on building up the data store without realizing that the reports and queries are the real goal.
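To show what designing around retrieval can mean, here is a minimal Python sketch of a problem/resolution store that builds an inverted index (term to entry IDs) as entries are added, so searches stay cheap as the store grows. The `KnowledgeBase` class and its simple all-terms-must-match search are assumptions for illustration; a real system would add ranking, synonyms and richer queries.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class KnowledgeBase:
    """Problem/resolution store built around retrieval rather than entry."""

    def __init__(self):
        self.entries = []               # (problem, resolution) pairs
        self.index = defaultdict(set)   # term -> ids of entries containing it

    def add(self, problem, resolution):
        entry_id = len(self.entries)
        self.entries.append((problem, resolution))
        for term in tokenize(problem + " " + resolution):
            self.index[term].add(entry_id)
        return entry_id

    def search(self, query):
        """Return (problem, resolution) pairs that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return [self.entries[i] for i in sorted(hits)]
```

Note that the indexing work happens at entry time precisely so that lookups, the part engineers actually depend on during an outage, are fast.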
You could say that I learned these lessons from my experience managing large groups of servers. However, the single most valuable lesson I have taken away from my work in advanced hosting is that these five things really matter. Sure, everyone in the industry pays lip service to their importance. What they don't recognize is how early in a company's development they become important. If your company plans to be around for the long haul, spend the time and resources now, or you will pay tenfold later.
ABOUT THE AUTHOR:
Paul E. Hinsberg
Paul is a consultant, technical trainer and author with over ten years of experience in a wide variety of operating systems and programming languages.