What to do when your server crashes

Here are some troubleshooting steps you can take as you determine whether the crash occurred at the hardware, operating system or application level.

If you are reading this, you are probably working through the nightmare of a server crash.

The bad news, aside from the crash itself, is that every crash is different, so I can't address every type of server crash here. The good news is that there are some fundamental things you can do that cover a broad range of circumstances.

When a server fails, the first two questions you need to ask yourself are: "What are the symptoms of the crash?" and "Has anything on the server changed recently?"

Is hardware or software to blame?
Some of the possible symptoms are:

  • The server will not power up.
  • The server powers up but displays the Blue Screen of Death (BSOD).
  • The server powers up and Windows loads, but some critical services fail to start.

If the system refuses to power up, then it is definitely a hardware-related problem. Most likely, the server's power supply has gone bad.

But it's important to first check the simple stuff, so make sure the system is plugged in and it is receiving power from the electrical outlet. I once drove 600 miles to replace a dead server only to find out that someone had accidentally flipped the switch on the surge protector. Make sure you check things like the surge protector and the UPS before you just assume that the power supply has gone bad.

If the system boots but displays the BSOD, the problem is generally due to a hardware failure or a bad device driver. If you recently installed a new device driver, then that's a pretty good clue that the problem might be driver related.

One way to get to the root of the problem is to try booting the machine into Safe Mode. If the machine can't even boot into Safe Mode, then the problem is almost always hardware related. Keep in mind, though, that a corrupt Windows installation can sometimes cause the same symptom. Your best bet at this point is to try to decipher the blue screen. Although the blue screen looks like gibberish, it actually contains very useful information. (I recommend you check out an article I wrote for Microsoft at: http://www.microsoft.com/technet/archive/winntas/tips/techrep/bsod.mspx. The article is dated because it covers Windows NT, but Windows 2000 and Windows Server 2003 are based on the Windows NT kernel, and most of the blue screen-related errors are still the same as they were in Windows NT.)
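If you can copy down the STOP line from the blue screen (the line that starts with "STOP:" followed by a hexadecimal stop code and four parameters in parentheses), a few lines of script can pull out the code and give it a name you can search for. This is a minimal Python sketch, assuming you have transcribed that line as text; `parse_stop_line` is a hypothetical helper rather than part of any Microsoft tool, and the lookup table holds only a handful of well-known stop codes, not the full list.

```python
import re

# A small illustrative subset of stop codes and their symbolic names.
KNOWN_STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x00000024: "NTFS_FILE_SYSTEM",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007B: "INACCESSIBLE_BOOT_DEVICE",
    0x0000007E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}

# Matches e.g. "STOP: 0x0000007B (0xF78D2524, 0xC0000034, 0x00000000, 0x00000000)"
STOP_LINE = re.compile(r"STOP:\s*0x([0-9A-Fa-f]{8})\s*\(([^)]*)\)")


def parse_stop_line(text):
    """Return (code, name, parameters) from a transcribed STOP line, or None."""
    match = STOP_LINE.search(text)
    if match is None:
        return None
    code = int(match.group(1), 16)
    # int(..., 16) accepts the "0x" prefix on each parameter.
    params = [int(p.strip(), 16) for p in match.group(2).split(",") if p.strip()]
    name = KNOWN_STOP_CODES.get(code, "unknown (look it up)")
    return code, name, params
```

The symbolic name is usually the most useful thing to search for; the four parameters mean different things for each stop code.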

If you are able to boot the server into Safe Mode, then the BSOD is most likely being triggered by a driver. Safe Mode loads Windows with a bare minimum set of drivers and services, so whatever is causing the problem isn't being loaded when you boot into Safe Mode.
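The rules of thumb so far amount to a small decision table. As a sketch only, assuming you already know which of the three symptoms you are seeing, it could be written in Python like this; the `Symptom` and `triage` names are made up for illustration.

```python
from enum import Enum
from typing import Optional


class Symptom(Enum):
    NO_POWER = "server will not power up"
    BSOD = "server powers up but displays the Blue Screen of Death"
    SERVICE_FAILURE = "Windows loads but critical services fail to start"


def triage(symptom: Symptom, boots_to_safe_mode: Optional[bool] = None) -> str:
    """Return the most likely level of the failure, per the rules of thumb above."""
    if symptom is Symptom.NO_POWER:
        # Rule out the outlet, surge protector and UPS before blaming the power supply.
        return "hardware"
    if symptom is Symptom.BSOD:
        if boots_to_safe_mode is None:
            raise ValueError("try booting into Safe Mode first")
        if boots_to_safe_mode:
            # Safe Mode loads a minimal driver set, so a clean boot points at a driver.
            return "driver"
        # Almost always hardware, although a corrupt Windows installation can look the same.
        return "hardware (or corrupt Windows installation)"
    # Windows itself loads, so look at the failing service and the application behind it.
    return "service (operating system or application)"
```

For example, `triage(Symptom.BSOD, boots_to_safe_mode=True)` returns `"driver"`.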

Search the event logs for clues
I recommend beginning the diagnostic process by searching the event logs in Event Viewer for clues to the problem. If nothing turns up in the event logs, then consider going into Device Manager and disabling anything that isn't critical to Windows. The idea is to mimic the driver set used by Safe Mode.
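When a log is long, tallying the errors by source can make a repeated failure stand out. This Python sketch assumes the log has been exported from Event Viewer to a CSV file with columns named `Type`, `Source` and `Event`; those column names are assumptions, so check them against your own export and adjust the constants before relying on it.

```python
import csv
import io
from collections import Counter

# Assumed column names in the exported CSV; adjust to match your export.
LEVEL_COLUMN = "Type"
SOURCE_COLUMN = "Source"
ID_COLUMN = "Event"


def summarize_errors(csv_text):
    """Count Error/Critical entries per (source, event ID), most frequent first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(
        (row[SOURCE_COLUMN], row[ID_COLUMN])
        for row in reader
        if row[LEVEL_COLUMN].strip().lower() in {"error", "critical"}
    )
    return counts.most_common()
```

A source that logs the same error every time the server boots is a good place to start digging.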

Now reboot the machine into normal mode. If Windows boots, then one of the devices that you disabled is causing the problem. Re-enable the devices one at a time, rebooting after each one, until the problem comes back. The device you enabled just before the BSOD reappeared is the one causing the problem.
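The one-at-a-time procedure is a simple linear search. In this Python sketch, `boots_cleanly` is a hypothetical stand-in for the manual step of enabling the listed devices, rebooting and watching for the blue screen; nothing here talks to Device Manager.

```python
from typing import Callable, Iterable, List, Optional


def find_culprit(disabled_devices: Iterable[str],
                 boots_cleanly: Callable[[List[str]], bool]) -> Optional[str]:
    """Re-enable devices one at a time; return the first whose addition brings back the BSOD."""
    enabled: List[str] = []
    for device in disabled_devices:
        enabled.append(device)
        # Stand-in for: enable these devices, reboot, and check for the blue screen.
        if not boots_cleanly(list(enabled)):
            return device  # the device enabled just before the BSOD reappeared
    return None  # every device was re-enabled without a crash
```

Enabling half of the remaining devices at a time would take fewer reboots on a machine with many devices, but one at a time is easier to keep track of when you are doing it by hand.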

Another type of server crash is one in which Windows boots, but some critical system service fails to start. This type of problem can be difficult to troubleshoot because the troubleshooting steps vary so widely, depending on the service that isn't starting.

Take Microsoft Exchange as an example. If a lower-level service such as the System Attendant fails to start, Exchange could be corrupt, or it could be having trouble communicating with Active Directory. In that situation, you would verify that nothing is blocking LDAP communications, and you might try reinstalling Exchange Server or reapplying the latest service pack. If, on the other hand, a database fails to mount, then the database is probably either corrupt or in an inconsistent state. In that case, you would use an entirely different set of troubleshooting steps.
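One quick way to check whether something is blocking LDAP traffic is to see whether the server can open a TCP connection to a domain controller on the LDAP port. This Python sketch only tests TCP reachability (it does not perform an LDAP bind or query), and the host name in the usage line is a placeholder.

```python
import socket

LDAP_PORT = 389  # standard LDAP port; global catalog queries use 3268


def ldap_port_reachable(host, port=LDAP_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Usage: `ldap_port_reachable("dc01.example.com")`. A `False` result points toward a firewall, DNS or network problem; a `True` result only shows that the port is open, not that Exchange can query the directory.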

Server crashes can occur at the hardware, operating system or application level. The troubleshooting steps that you use to diagnose and recover from the crash will vary depending on the level at which the crash occurred.

Try to fix the problem manually prior to restoring a backup, since restoring a backup almost always leads to at least some data loss.

Brien M. Posey, MCSE, is a Microsoft MVP for his work with Windows 2000 Server and IIS. He has served as the chief information officer for a nationwide chain of hospitals and was once in charge of IT security for Fort Knox. As a freelance technical writer, he has written for Microsoft, CNET, ZDNet, TechTarget, MSD2D, Relevant Technologies and other technology companies. You can visit Brien's personal Web site at http://www.brienposey.com.