One of the most common problems any administrator faces involves services that misbehave. Services -- easily exposed in the services.msc snap-in -- control a variety of system and application functions. Fortunately, there are some fundamental troubleshooting methods that can be used when services don't do what they are designed for or cause various forms of system failure.
To understand Windows services troubleshooting, we need to review some basic characteristics of services. First, services run under the security context of an account. This account provides the permissions needed to perform the operations the service requires. Most Windows services run under the powerful LocalSystem account, which has extensive privileges on the local computer. Applications such as Exchange Server, SQL Server, Cluster Server and other third-party applications will usually run under a "service account." This is a user account that has the privileges and group memberships needed for the service. This account, like any account, has a password that is often controlled by the service and can be used by other applications, such as agents on clients, to communicate with the server application.
Services can also have dependencies, meaning a service cannot start until other services are started. Likewise, a service can have others dependent on it. Figure 1 shows the dependency list for the Exchange System Attendant service.
Finally, services may be started in different ways in addition to being disabled:
- Automatic: Service is started on boot of the computer.
- Manual: Service is started manually. In the Services snap-in, right-click on the service and select Start.
Let's look at some common problems and how we can use this information to troubleshoot them.
Problem: Application fails, reporting an authentication error, access denied or the like.
This is typically an issue when the password stored with the service does not match the password on the service account found in the Active Directory Users and Computers snap-in. Often you can manually change the password on the account and then manually change the service password as shown in Figure 2. This particular service runs under an account named ASMUser, whereas many services run under LocalSystem. However, some applications may sync the password, in which case you won't be able to control it that way. This may require re-creation of the user account or reinstallation of the application or agent.
Problem: Service that is set to Automatic doesn't start on boot.
After logon you can start the service manually. The important point to note here is that the service does start, but the automatic start isn't working. In this case it is likely a dependency issue. Very simply, if ServiceA is dependent on ServiceB, then ServiceB must start before ServiceA. If ServiceB doesn't start, then ServiceA won't start.
To troubleshoot this situation you need to determine why the services that the problem service depends on either didn't start or didn't start in time. One way to prove it is a dependency problem is to give those services plenty of time to start during the boot process. To do this, find a service that is one of the last to start before boot completes and add it to the dependency list of the problem service. A future article will provide more details on how to find a service to use.
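The dependency rule can be modeled in a few lines. The following Python sketch is only an illustration of the logic, not how the Windows Service Control Manager is implemented; the service names (ServiceA, ServiceB, LateService) and both helper functions are hypothetical. It computes a valid start order, shows which services are blocked when a dependency fails, and shows the effect of adding a late-starting service to the problem service's dependency list.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_order(depends_on):
    """Return one valid start order: each service follows everything it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def blocked_services(depends_on, failed):
    """Return services that cannot start because a direct or indirect dependency failed."""
    blocked = set()
    changed = True
    while changed:  # repeat until no new blocked service is discovered
        changed = False
        for service, deps in depends_on.items():
            if service in blocked or service in failed:
                continue
            if any(d in failed or d in blocked for d in deps):
                blocked.add(service)
                changed = True
    return blocked


# Hypothetical services: ServiceA depends on ServiceB.
depends_on = {"ServiceA": {"ServiceB"}, "ServiceB": set(), "LateService": set()}

order = start_order(depends_on)
print("Start order:", order)
print("Blocked if ServiceB fails:", blocked_services(depends_on, {"ServiceB"}))

# The troubleshooting trick: make ServiceA also wait for a late-starting service.
delayed = {name: set(deps) for name, deps in depends_on.items()}
delayed["ServiceA"].add("LateService")
late_order = start_order(delayed)
print("Start order with added dependency:", late_order)
```

In the delayed ordering, ServiceA can only start after both ServiceB and LateService, which is the extra time the technique is meant to buy.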
Problem: System experiences a sudden reboot, hang or crash.
New application installations, hotfixes, driver updates and antivirus updates are common causes of these issues. The first step in troubleshooting this problem is to boot to Safe Mode with Networking. If that doesn't work, then boot to Safe Mode.
In one instance, a server suddenly started rebooting during initial logon. We found booting to Safe Mode with Networking worked. We further found that we could log on with a local or domain account, so we knew domain authentication services such as Kerberos were working and that the network-related services and drivers were not at fault. We also knew that the problem was likely a service starting in full boot that was not starting in Safe Mode with Networking.
You can easily get a list of services started in Safe Mode from the registry. In the registry, go to HKLM\SYSTEM\CurrentControlSet\Control\SafeBoot. Under that key you'll find two subkeys: Minimal and Network. Each of these keys holds all the services that start in the corresponding Safe Mode boot option (Minimal for Safe Mode, Network for Safe Mode with Networking). In my case, I exported the Network key and got a nice text listing of all services started in Safe Mode with Networking.
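If the exported listing is long, a short script can pull the service names out of it. The Python sketch below assumes a text export in the usual Registry Editor .reg layout, taken from HKLM\SYSTEM\CurrentControlSet\Control\SafeBoot, in which each entry is a subkey whose default value is a type such as "Service" or "Driver". The function name and the sample entries (ExampleSvc, ExampleDriver.sys) are hypothetical.

```python
import re

KEY_RE = re.compile(r"^\[(.+)\]$")  # matches a "[HKEY_...\Key]" header line

SAFEBOOT = "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Control\\SafeBoot\\"


def safeboot_services(reg_text, option="Network"):
    """Return names under SafeBoot\\<option> whose default value is "Service"."""
    prefix = (SAFEBOOT + option + "\\").lower()
    names = set()
    current = None
    for raw in reg_text.splitlines():
        line = raw.strip()
        header = KEY_RE.match(line)
        if header:
            path = header.group(1)
            if path.lower().startswith(prefix):
                name = path[len(prefix):]
                # Keep only direct children of the option key.
                current = name if name and "\\" not in name else None
            else:
                current = None
        elif current and line.lower() == '@="service"':
            names.add(current)
    return names


# Hypothetical export fragment, for illustration only.
SAMPLE = r'''Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SafeBoot\Network]

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SafeBoot\Network\ExampleSvc]
@="Service"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SafeBoot\Network\ExampleDriver.sys]
@="Driver"
'''

print(sorted(safeboot_services(SAMPLE)))
```

When reading a real export from disk, note that Registry Editor typically writes .reg files as UTF-16, so the file may need to be opened with `encoding="utf-16"`.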
If the system will boot, you can simply get a list of started services with the net start command from the command line and compare it against the registry listing. In my case, I couldn't do that, so I exported the registry key HKLM\SYSTEM\CurrentControlSet\Services. In the registry, each service has a Start value that indicates whether it starts automatically, starts manually or is disabled. A Start value of 2 indicates an automatic start, 3 indicates a manual start and 4 means the service is disabled. So, simply find all services with a Start value of 2. It will still take a while to find them all -- there's a lot of data to sort through -- so start by looking at traditionally problematic services, like your antivirus product's service. In our case, once we stopped the antivirus service, we could boot normally. A quick search of the antivirus product's website pointed to a fix for the problem.
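Sorting through the exported Services key by hand is tedious, so the search can be scripted. The Python sketch below assumes a text export in the usual Registry Editor .reg layout, where each service is a subkey of HKLM\SYSTEM\CurrentControlSet\Services and its Start value appears as a line such as `"Start"=dword:00000002`. It lists automatic-start services that are not in a supplied Safe Mode list. The function names and all sample service names are hypothetical, and the export also contains drivers, so treat the result as a list of candidates rather than a diagnosis.

```python
import re

KEY_RE = re.compile(r"^\[(.+)\]$")
START_RE = re.compile(r'^"Start"=dword:([0-9a-fA-F]{8})$')
SERVICES = "HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\".lower()

# Meaning of the Start value.
START_NAMES = {0: "Boot", 1: "System", 2: "Automatic", 3: "Manual", 4: "Disabled"}


def start_values(reg_text):
    """Map each service name to its numeric Start value."""
    values = {}
    current = None
    for raw in reg_text.splitlines():
        line = raw.strip()
        header = KEY_RE.match(line)
        if header:
            path = header.group(1)
            if path.lower().startswith(SERVICES):
                name = path[len(SERVICES):]
                # Ignore nested keys such as ...\Services\Name\Parameters.
                current = name if name and "\\" not in name else None
            else:
                current = None
            continue
        start = START_RE.match(line)
        if current and start:
            values[current] = int(start.group(1), 16)
    return values


def automatic_not_in_safe_mode(reg_text, safe_mode_services):
    """List automatic-start services (Start == 2) absent from the Safe Mode list."""
    safe = {name.lower() for name in safe_mode_services}
    return sorted(
        name
        for name, value in start_values(reg_text).items()
        if value == 2 and name.lower() not in safe
    )


# Hypothetical export fragment, for illustration only.
SAMPLE_EXPORT = r'''Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ExampleAntivirus]
"Start"=dword:00000002

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ExampleAntivirus\Parameters]
"Start"=dword:00000004

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ExampleManual]
"Start"=dword:00000003

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ExampleNet]
"Start"=dword:00000002
'''

for name in automatic_not_in_safe_mode(SAMPLE_EXPORT, {"ExampleNet"}):
    print(name, "->", START_NAMES[start_values(SAMPLE_EXPORT)[name]])
```

The names are compared case-insensitively because registry key names are not case-sensitive. As with any Registry Editor export, a real file is typically UTF-16 and may need `encoding="utf-16"` when opened.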
Debugging service problems sometimes requires a crash dump analysis using the Windows debugger (WinDbg). The debugger is also useful for hangs when you have a chance to run it while the problem is occurring, as it may point to a problematic service.
ABOUT THE AUTHOR
Gary Olsen is a systems software engineer for Hewlett-Packard in Global Solutions Engineering. He authored Windows 2000: Active Directory Design and Deployment and co-authored Windows Server 2003 on HP ProLiant Servers. Gary is a Microsoft MVP for Directory Services and formerly for Windows File Systems.