All but the smallest organizations have some level of security auditing turned on for their servers. Whether it is collecting information about failed logon attempts, successful file access, file deletion, Active Directory modifications, or something else, the fact is most of us need to capture a certain amount of information.
At TechEd 2011, I led a panel discussion on auditing that made one thing very clear: Windows’ native security auditing definitely involves some compromises. One gentleman’s story was representative of everyone’s experience:
Management asked me to provide certain details on things like failed logons, successful logons and so forth. I told them we didn’t currently collect that information, and they instructed me to do so. I enabled the necessary auditing, and turned it back off almost as quickly. Our domain controllers simply couldn’t handle the additional load.
The fact that auditing causes a performance hit surprises many administrators at first, because it isn’t entirely intuitive. After all, the domain controller is already performing the work; why is simply making a note of it so much more difficult? And it is difficult: one fellow made the point that auditing was something his company worked into its capacity planning. He estimated that his team had twice as many domain controllers as it would need just to handle logon traffic, because it had turned on almost every auditing option possible.
It’s one thing for a file server to deny a request to access a file; it’s an entirely different operation for the file server to open up the event log and make a note of that fact. While Windows’ native event logging architecture is robust, it isn’t free; it requires computing effort, and it can diminish the overall performance of a server. This is why auditing is almost always a tradeoff between performance and knowledge. The more auditing that takes place, the less user workload the server can ultimately handle, because it’s spending more time on auditing workloads. Some organizations simply deploy more computing resources to handle that workload; others have to scale back on the amount of auditing they’re doing in order to keep their servers running at a desired level of performance.
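The cost of that "make a note of it" step is easy to demonstrate. The sketch below is plain Python, not Windows event-log code, and the workload is invented for illustration: it times a loop of trivial access-control decisions with and without a durably flushed log write per decision, which is the essence of why synchronous auditing eats into server capacity.

```python
import os
import tempfile
import time

def handle_requests(n, audit_log=None):
    """Simulate n file-access checks, optionally writing a flushed
    audit record for each one (mimicking synchronous event logging)."""
    denied = 0
    for i in range(n):
        # The "real" work: a trivial access-control decision.
        if i % 2 == 0:
            denied += 1
        # The auditing work: record the decision durably before moving on.
        if audit_log is not None:
            verdict = "denied" if i % 2 == 0 else "allowed"
            audit_log.write(f"request {i}: {verdict}\n")
            audit_log.flush()
            os.fsync(audit_log.fileno())  # a durable log must reach disk
    return denied

N = 2000
start = time.perf_counter()
handle_requests(N)
plain = time.perf_counter() - start

with tempfile.NamedTemporaryFile("w", delete=False) as log:
    start = time.perf_counter()
    handle_requests(N, audit_log=log)
audited = time.perf_counter() - start
os.unlink(log.name)

print(f"no auditing:   {plain:.4f}s")
print(f"with auditing: {audited:.4f}s")
```

On any ordinary disk, the audited run is orders of magnitude slower than the plain one, even though the "real" work is identical. That gap is the tradeoff the rest of this article is about.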
Third-party auditing solutions can sometimes handle a much greater degree of auditing than the native event log architecture. They do so through a combination of three basic techniques:
- Agents installed on the server can tap directly into Windows APIs rather than waiting for events to be written to the event log. Because native auditing can then be disabled entirely, the event log’s overhead goes away, and capturing data directly from API traffic is often cheaper to begin with.
- Auditing data can be “lazily written,” which means it can be queued up for logging a short time later. This isn’t usually a huge lag time, but it does allow auditing to take a slight backseat to user workload.
- Events are often transmitted to a central database for actual writing, removing a bit more workload from the server, since the server doesn’t need to maintain the actual log.
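The lazy-write technique, in particular, is something the Python standard library can sketch directly: a cheap in-memory enqueue on the request path, with a background listener doing the expensive writing later. This is only an illustration of the pattern, not how any particular auditing product is implemented; the logger name and messages are invented.

```python
import logging
import logging.handlers
import queue

# Unbounded in-memory queue between the request path and the log writer.
audit_queue = queue.Queue()

# The request thread only enqueues records -- a cheap, non-blocking step.
audit_logger = logging.getLogger("audit")
audit_logger.setLevel(logging.INFO)
audit_logger.propagate = False
audit_logger.addHandler(logging.handlers.QueueHandler(audit_queue))

# A background listener drains the queue and does the actual writing.
# Here it writes to an in-memory list; a real product would ship the
# records to disk or to a central database.
records = []

class ListHandler(logging.Handler):
    def emit(self, record):
        records.append(self.format(record))

listener = logging.handlers.QueueListener(audit_queue, ListHandler())
listener.start()

# The "user workload" path: denying access costs one enqueue, not one write.
for i in range(3):
    audit_logger.info("access denied: user%d -> secrets.txt", i)

listener.stop()  # drains anything still queued before shutting down
print(records)
```

The request path returns as soon as the record is queued; the slight lag before the record lands in permanent storage is exactly the "backseat" described above.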
Admins will likely have to run some lab-based experiments to see exactly what level of performance impact the chosen auditing configuration will create in their environment. However one chooses to approach auditing, the core message is this: You can’t have it all. Management needs to understand that capturing every possible bit of information will create a performance impact; the company needs to be willing to pay for that impact in terms of additional servers, bigger servers, or lowered performance expectations.
ABOUT THE AUTHOR
Don Jones is a Senior Partner and Principal Technologist for Concentrated Technology, LLC, a strategic consulting and analysis firm. Contact him through the company's Web site, http://ConcentratedTech.com.