Part 1 | Part 2 | Part 3
Troubleshooting a hung or nonresponsive Windows server can be a challenging endeavor. Simply hitting the reset button is no longer a tolerated option as more companies use these servers for business-critical operations. This three-part series will explore the reasons why a Windows server may hang and provide a cookbook approach to diagnosing the underlying issues with the Windows Kernel Debugger (Windbg).
Background
When Microsoft released the early versions of its server operating system (Windows NT 3.5x and NT4), there was no easy way to troubleshoot a hung server. Other mainstream operating systems, such as Digital Equipment Corp.'s VAX/VMS, offered ways to manually intervene by forcing a crash dump whereby the server's state could be captured at the time of the hang. This dump could then be analyzed to determine why the server hung. The only option for early Windows platforms, however, was to reset the box.
As Windows servers became more predominant in the business world, hitting the reset button became unacceptable. As a result, in Windows 2000 Server and later versions, it became possible to force a crash dump to assist with determining why the server hung. Microsoft introduced this feature in Knowledge Base article 244139. It allows a keystroke combination (right CTRL+SCROLL LOCK twice) to generate a crash dump on PS/2-type keyboards. Microsoft extended this feature in Windows Server 2003 with a hotfix to the Kbdhid.sys driver to accommodate USB-type keyboards.
Several other options now exist to force a crash dump. Microsoft provides the Windows Special Administrative Console (SAC) Crashdump command as part of Windows Emergency Management Services (EMS), which allows for "headless" servers with no local graphical console. Vendor-specific options also exist to force a crash dump including the HP Integrity server's Management Processor TC (trans
To continue reading for free, register below or login
To read more you must become a member of SearchWindowsServer.com
');
// -->

fer of control) command, an NMI (non-maskable interrupt) button on some Integrity models, or the Integrated Lights Out (iLO) virtual NMI button. We'll take a closer look at each of these options later in the series.
Why a server hangs
There are a variety of reasons why a server may hang, including both hardware and software issues. The most common hardware reason for a server hang is spurious interrupts by a failing device. For example, a network interface controller (NIC) may have a bad component or be attached to a bad cable causing false interrupts to occur. These interrupts occur at an elevated interrupt request level (IRQL) dominating the attention of the processor(s), leaving lower priority requests (user level) unanswered. As a result, the server appears to be hung.
Another example of a hardware-induced hang involves storage requests going unanswered. For example, consider a case where a disk drive fails, causing outstanding I/O requests to be queued up. Eventually, these pending requests trigger a cascading effect of user and system threads to hang, leading to a system-wide outage.
More often, however, server hangs are a result of software issues. These issues come in several flavors, including:
The big picture
So, as you can see, there are numerous reasons why a server may hang. To give you a better idea of what happens when you force a crash to generate a memory dump, and subsequently analyze the crash to determine what caused the hang, see Figure 1 below.
Starting on the left-hand side, you can see the server crashes or hangs. In the event of a crash, the server would generate a memory dump if the dumpfile and pagefile are properly configured (see Microsoft Knowledge Base articles 254649, 197379 and 889654).
In the event of a hang, manual intervention would be required to force a crash dump as previously described. In either case, the content of memory is written to the pagefile.sys before the server is rebooted. During the reboot, the pagefile.sys is written to the memory.dmp file. Finally, once the server has rebooted, you can use the Windows Kernel Debugger (Windbg) to analyze the memory dump using a symbol server (as documented in KB article 311503) to translate memory references to meaningful functions and variables.
Figure 1: Overview of memory dump process and analysis
[IMAGE]
Now that you have a better idea of why server hangs occur, the next article in this series will look at the preparation process for troubleshooting a hung Windows server.
[IMAGE]
[IMAGE]TROUBLESHOOTING A HUNG WINDOWS SERVER
[IMAGE]
[IMAGE] Part 1: Why do servers hang?
[IMAGE] Part 2: Preparing to troubleshoot
[IMAGE] Part 3: Resolving the issue
[TABLE]