Previously in this series, we looked at some of the reasons why server hangs occur in a network. Now that you have a little background, let's look at the preparation process for resolving the problem using a tool called the Windows Kernel Debugger, or Windbg.
When troubleshooting a hung Windows server, there are several things that need to be done up front to prepare for collecting data. A forced crash dump may only be necessary if other means of troubleshooting prove unsuccessful. The first thing administrators should always do is run MPS Reports to collect event logs and other pertinent information. Close examination of system and application event logs may reveal a pattern of particular entries occurring prior to each hang. If the problem starts with a slow down or performance issue, you should collect Perfmon data as described in Microsoft Knowledge Base article 248345.
Once you determine that a forced crash dump is necessary, update the appropriate registry entries per KB article 244139 or 927069 and reboot the server. Also, ensure you have properly configured the dump file type as previously mentioned in KB article 254649. Finally, be sure that your pagefile.sys is sufficiently sized to accommodate a memory dump and that you have enough free space on the disk where the memory.dmp will be located, per KB article 886429.
Installing Windbg and setting the symbol path
In addition to configuring the server to generate a memory dump, you have to install the Windows Kernel Debugger and establish the symbol path. Do that on a separate server from the one that you are troubleshooting. You can download the Windbg kit free from Microsoft, and the kind of kit you choose depends on the architecture you are installing it on: (x86 or x64/IA64). Each is capable of reading a dump from a different architecture (i.e., 32-bit Windbg can read a 64-bit dump and vice versa).
Once Windbg is installed, be sure to establish the symbol path as documented in KB article 311503. Setting up the symbol path allows the debugger to translate memory references to meaningful functions and variable names. This will allow you to look at a stack trace and determine what routines were executing at the time of the hang.
More server management tips from Bruce
Once you have all this set up, you are ready to analyze a crash dump. Use the appropriate keystrokes, Web GUI, Management Processor TC command or NMI button to initiate the forced crash as previously described. Be sure to allow sufficient time for the contents of memory to write to the pagefile.sys. If you have trouble getting the crash dump created, be aware that there are several reasons why a crash dump may not be captured as expected (see KB article 130536).
Now you are ready to determine the cause of the server hang. In the final part of this series, I'll explain how to use Windbg to analyze a forced crash as a means of resolving the problem.
ABOUT THE AUTHOR
Bruce Mackenzie-Low, MCSE/MCSA, is a systems software engineer with HP providing third-level worldwide support on Microsoft Windows-based products including Clusters and Crash Dump Analysis. With more than 20 years of computing experience at Digital, Compaq and HP, Bruce is a well known resource for resolving highly complex problems involving clusters, SANs, networking and internals.