The Active Directory-related service that matters most to AD -- and the one administrators often understand least -- is the Local Security Authority Subsystem Service (LSASS).
Of all Active Directory services that run on a domain controller (DC), I would wager that an LSASS.exe system error or failure is the most common and has the biggest impact on the environment. It is imperative, therefore, that Active Directory administrators and support personnel understand how LSASS works and how to troubleshoot it.
What is LSASS?
LSASS.exe has the very important responsibility of processing authentication requests and granting or denying access for them. These requests could come either from a domain logon attempt or from another service or application that is responding to a user's request. In any event, if LSASS does not process the request in a timely manner, the request could fail or be significantly delayed.
LSASS.exe mainly consumes memory and CPU resources on every domain controller. Consumption of those resources could occur to such an extent that the DC would:
be unable to satisfy requests for other services, such as Active Directory replication.
be slow to respond to authentication requests and LDAP searches because LSASS needs more resources than the DC can provide.
hang and disrupt all services.
This is not a new problem and Microsoft has documented it fairly well in KB and TechNet articles. As far as describing the problem and defining LSASS's functions, KB 308356 is one of the best articles around, though it is heavily slanted toward Windows 2000. It indicates that the solution is to either get more memory/CPU or reduce the load on the DC.
Why LSASS issues arise
It is important to understand what causes LSASS to consume resources. First of all, when the domain controller starts up, the NTDS.dit (Active Directory's Jet database) loads at least partially into the same memory space as LSASS. Of course, this is limited by available physical memory. For example, if the NTDS.dit is 2 GB, you might expect the LSASS.exe working set size to be 3 to 4 GB based on the activity handled by LSASS.
This authentication activity, which includes LDAP queries, varies throughout the day. What I have observed is a spike in memory usage by LSASS.exe after a reboot, which then eventually flattens out. As usage demands fluctuate, the working set will be "trimmed," returning memory for use by other services and processes. If memory use instead continues to climb over time, a memory leak should be suspected.
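One way to tell "climbs over time" from "spikes and flattens" is to fit a straight line to working set samples taken after the post-reboot spike has settled. The following is a minimal sketch, not a definitive leak detector: the function name, the sample list, and the idea of reading the slope in MB per hour are all assumptions made for illustration.

```python
from statistics import mean

def working_set_slope(samples_mb, interval_s):
    """Least-squares slope of working set samples, in MB per hour.

    samples_mb: working set readings in MB, evenly spaced (hypothetical input).
    interval_s: seconds between readings.
    """
    n = len(samples_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    # Elapsed time of each sample, in hours.
    xs = [i * interval_s / 3600.0 for i in range(n)]
    x_bar, y_bar = mean(xs), mean(samples_mb)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples_mb))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Hourly samples that grow steadily, versus samples that stay flat.
growing = working_set_slope([100, 110, 120, 130], 3600)
flat = working_set_slope([100, 100, 100, 100], 3600)
```

A slope near zero over a day or more is consistent with normal trimming; a persistently positive slope is the pattern worth investigating further.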
Determine if performance issues are related to LSASS
The question everyone has is "How do I know when LSASS is using too much memory or CPU?" The answer is … it depends. We know from KB 308356 and other articles that LSASS.exe will use whatever memory and CPU it can. A Microsoft TechNet blog post provides a great way to determine how much memory is too much. Essentially, it says that you need to establish a baseline and then observe deviations from that baseline.
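The baseline-and-deviation idea can be sketched in a few lines. This is one simple interpretation, assuming that a "deviation" means a sample more than k standard deviations above the baseline mean; the blog post may define it differently, and the function names and the default k = 3 are hypothetical.

```python
from statistics import mean, pstdev

def baseline_threshold(baseline_samples, k=3.0):
    """Upper limit: baseline mean plus k population standard deviations."""
    return mean(baseline_samples) + k * pstdev(baseline_samples)

def deviations(baseline_samples, new_samples, k=3.0):
    """Return the new samples that exceed the baseline threshold."""
    limit = baseline_threshold(baseline_samples, k)
    return [s for s in new_samples if s > limit]

# Baseline working set readings (MB) and two later readings to check.
flagged = deviations([90, 100, 110], [105, 130])
```

The same comparison works for CPU samples; only the units change.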
Using Performance Monitor (PerfMon), I typically do this by adding counters for the lsass instance of the Process object. I add the Working Set and Working Set Peak counters and, of course, it is also important to add the % Processor Time counter to measure CPU utilization. I typically establish at least a 48-hour baseline and set the sample interval so that the log collects at least 1,000 samples. You can collect more if you desire.
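The sample interval follows from simple arithmetic: 48 hours is 172,800 seconds, so 1,000 samples means one sample roughly every 172 seconds. The sketch below works that out and lists the counter paths in the usual PerfMon path form; the helper name is hypothetical, and the use of % Processor Time for the CPU counter is an assumption about which counter the paragraph above intends.

```python
# Counter paths for the lsass instance of the Process object.
LSASS_COUNTERS = [
    r"\Process(lsass)\Working Set",
    r"\Process(lsass)\Working Set Peak",
    r"\Process(lsass)\% Processor Time",
]

def sample_interval_seconds(duration_hours, min_samples):
    """Largest whole-second interval that still yields at least min_samples."""
    return (duration_hours * 3600) // min_samples

interval = sample_interval_seconds(48, 1000)      # whole seconds between samples
samples_collected = (48 * 3600) // interval       # samples gathered over 48 hours
```

Rounding the interval down rather than up keeps the sample count at or above the target.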
In the absence of a baseline, Microsoft's blog recommends that "sustained and repeated" CPU utilization of 80% or more should be investigated. Periodic spikes are not a problem and are expected. Again, using a PerfMon analysis with the counters previously noted will determine if there is a problem.
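To separate "sustained and repeated" utilization from harmless spikes, the collected CPU samples can be scanned for runs that stay at or above the threshold. This is a rough sketch: the 80% threshold comes from the text, but the minimum run length is an assumption, since the blog's wording quoted above does not define how long "sustained" is.

```python
def sustained_high_cpu(samples, threshold=80.0, min_consecutive=5):
    """Return (start_index, length) for each run of samples at or above
    threshold that lasts at least min_consecutive samples."""
    runs = []
    start = None
    for i, value in enumerate(samples):
        if value >= threshold:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_consecutive:
                runs.append((start, i - start))
            start = None
    # Handle a run that is still open at the end of the data.
    if start is not None and len(samples) - start >= min_consecutive:
        runs.append((start, len(samples) - start))
    return runs

# A single spike at index 1 is ignored; the three-sample run is reported.
runs = sustained_high_cpu([10, 95, 10, 85, 90, 88, 10], min_consecutive=3)
```

More than one run in the result is what "repeated" would look like in the data.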
NOTE: If you are not an expert with PerfMon, consider using the third-party tool "Performance Analyzer of Logs" (PAL). It's a free download and easy to use. Just be sure to use the "Threshold File" called Active Directory. It will do a basic analysis for you, showing warning and error thresholds on counters collected.
As noted previously, besides authentication requests, LDAP queries are processed by LSASS.exe and can consume additional resources. In analyzing LDAP queries, it is important to consider not only the number and frequency of the queries, but also their source and efficiency. I've read that an efficient LDAP query should have a "hit rate" of at least 10%. That is, if an LDAP query searches 1,000 Active Directory objects, it should return at least 100 successful hits. Inefficient LDAP queries can quickly use up a lot of resources and create an instant bottleneck on the DC.
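The hit-rate rule of thumb is easy to express as a calculation. In this minimal sketch the function and parameter names are hypothetical; where the searched and returned counts come from depends on how LDAP activity is being logged in your environment.

```python
def ldap_hit_rate(objects_searched, objects_returned):
    """Fraction of searched objects that the query actually returned."""
    if objects_searched <= 0:
        raise ValueError("objects_searched must be positive")
    return objects_returned / objects_searched

def is_efficient(objects_searched, objects_returned, min_rate=0.10):
    """Apply the 10% rule of thumb described above."""
    return ldap_hit_rate(objects_searched, objects_returned) >= min_rate

# The example from the text: 1,000 objects searched, 100 returned.
ok = is_efficient(1000, 100)
# A query that searches 1,000 objects to return 5 falls well short.
poor = is_efficient(1000, 5)
```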
Fixing LSASS-related problems
In general, the solutions for resolving LSASS performance issues include:
Identifying what is driving LSASS.exe resource usage by collecting a user-mode memory dump of the process and analyzing it to find and resolve the cause.
Identifying the source and cause of excessive and inefficient LDAP searches.
Adding more horsepower to the Active Directory environment by adding more DCs or moving to 64-bit large memory platforms for domain controllers.
A future article will have more details on each of these solutions and the steps for troubleshooting LSASS.
ABOUT THE AUTHOR
Gary Olsen is a systems software engineer for Hewlett-Packard in Global Solutions Engineering. He authored Windows 2000: Active Directory Design and Deployment and co-authored Windows Server 2003 on HP ProLiant Servers. Gary is a Microsoft MVP for Directory Services and formerly for Windows File Systems.