Keeping Active Directory clean and tidy is usually a low priority, as long as no one is complaining. Because you have "more important" things to do, you let it go – until something breaks.
It's better to do periodic maintenance than to scramble to fix problems such as replication failures, which can cause downtime. Here are common areas that need to be monitored, along with some pointers for cleanup operations:
CNF (conflict) objects – These are duplicate objects created in Active Directory. When two objects with the same name are created in the same container on different DCs before replication catches up, AD can't keep both names, so it renames one of them by appending CNF: and the object's GUID to its name. You may see these in repadmin /showrepl output, or when viewing AD objects with ADSIEdit or LDP. Left in place, conflict objects can cause replication failures. Typically you need to delete them, but view their attribute details first to make sure you delete the correct ones.
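If you export distinguished names (from LDP or an ldifde dump, for example), a short script can list the conflict objects together with the name each one collided with, so you can compare attributes before deleting anything. This is a minimal Python sketch, assuming the DNs are already in a list. It relies on the mangling format AD uses, in which the RDN gets a line feed (written \0A in a DN string) followed by CNF: and the GUID. The function and type names are hypothetical.

```python
import re
from typing import List, NamedTuple

GUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
# A mangled RDN carries a line feed (escaped as \0A in a DN string, or a raw
# newline if your tool has already unescaped it), then "CNF:" and the GUID.
CNF_MARKER = re.compile(r"(?:\\0A|\n)CNF:(" + GUID + ")", re.IGNORECASE)


class ConflictObject(NamedTuple):
    dn: str           # the mangled DN as found in the directory
    guid: str         # GUID embedded in the mangled name
    original_dn: str  # DN with the marker removed: where its "twin" should live


def find_conflict_objects(dns: List[str]) -> List[ConflictObject]:
    """Return the conflict (CNF) objects found in a list of DNs."""
    found = []
    for dn in dns:
        match = CNF_MARKER.search(dn)
        if match:
            original = dn[: match.start()] + dn[match.end():]
            found.append(ConflictObject(dn, match.group(1).lower(), original))
    return found
```

Each `original_dn` is worth opening in ADSIEdit next to its CNF twin, so you keep the object that is actually in use.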
DEL (deleted) objects – These look like a problem but aren't, although they often tempt admins into cleanup that isn't necessary. If you demote and repromote a DC, the old DC's object will show up in repadmin /showrepl output with DEL: in front of the GUID. It makes the report look messy, but it isn't a problem; it simply means the object was deleted and is now a tombstone. Once the tombstone lifetime expires and garbage collection removes the object, it drops out of the report. Some folks can't wait and do something drastic, like shortening TombstoneLifetime to one day and then forcing garbage collection, which can cause replication failures.
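To see when a DEL: entry will disappear on its own, add the tombstone lifetime to the deletion time. This Python sketch assumes the 60-day default that applies to forests created on Windows 2000 or Windows Server 2003 before SP1 (forests created on SP1 or later default to 180 days) and the default 12-hour garbage-collection interval. Check the tombstoneLifetime attribute on CN=Directory Service,CN=Windows NT,CN=Services in the configuration partition for your forest's real value. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumed defaults: read tombstoneLifetime from your forest rather than trusting these.
TOMBSTONE_LIFETIME_DAYS = 60
GARBAGE_COLLECTION_INTERVAL = timedelta(hours=12)


def tombstone_purge_window(
    deleted_at: datetime, lifetime_days: int = TOMBSTONE_LIFETIME_DAYS
) -> Tuple[datetime, datetime]:
    """Earliest time a tombstone can be purged, and the time by which the
    next garbage-collection pass should have removed it."""
    expires = deleted_at + timedelta(days=lifetime_days)
    return expires, expires + GARBAGE_COLLECTION_INTERVAL


def days_left(deleted_at: datetime, now: datetime,
              lifetime_days: int = TOMBSTONE_LIFETIME_DAYS) -> int:
    """Whole days until the tombstone is eligible for garbage collection (0 if past)."""
    earliest, _ = tombstone_purge_window(deleted_at, lifetime_days)
    return max(0, (earliest - now).days)
```

An object deleted on January 1, 2009 becomes eligible for removal on March 2, 2009 under these assumptions; until then, the DEL: entry is simply waiting its turn.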
Lost and Found Folder – This one flies well under the radar. Unless you enable Advanced Features in the AD Users and Computers snap-in, you won't see this container. It holds duplicate objects and objects AD didn't know what to do with. For example, create an OU on DC1 and let it replicate to DC2. Unplug DC1's network cable, create a user in that OU on DC1, then go to DC2 and delete the OU. Plug the cable back in. When replication resumes, the user's parent OU no longer exists, so AD doesn't know where to put the user and places it in the LostAndFound container. To clean it up, look for valid objects, like the user in this example, and move them to the proper place.
Note that you will most likely also find CNF-flagged objects or other objects that just don't belong. I once saw a couple of NTDS Settings (replication) objects from a demoted DC in the Lost and Found container. They caused replication to fail because AD saw those objects but there was no DC associated with them. Once you've moved the valid objects out, you generally want to delete what's left in this folder.
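When the LostAndFound container holds more than a handful of objects, it helps to sort them before acting. The sketch below applies the rules of thumb from this section: valid objects such as users get moved back, while CNF-mangled names and orphaned replication objects (NTDS Settings objects are of class nTDSDSA; connection objects are nTDSConnection) become deletion candidates. It's a Python sketch over a hypothetical export of each object's DN and most specific objectClass; the bucket names and class lists are assumptions you should adjust.

```python
from typing import Dict, List

# Assumed rules of thumb; adjust the class lists to your environment.
MOVE_BACK_CLASSES = {"user", "computer", "group", "contact", "organizationalUnit"}
ORPHAN_REPLICATION_CLASSES = {"nTDSDSA", "nTDSConnection"}


def triage_lost_and_found(objects: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Sort LostAndFound objects into move / delete_candidate / review buckets.

    Each object is a dict with "dn" and "objectClass" (its most specific class).
    """
    plan: Dict[str, List[str]] = {"move": [], "delete_candidate": [], "review": []}
    for obj in objects:
        dn, object_class = obj["dn"], obj["objectClass"]
        if "CNF:" in dn or object_class in ORPHAN_REPLICATION_CLASSES:
            # Conflict copies and replication leftovers: check attributes, then delete.
            plan["delete_candidate"].append(dn)
        elif object_class in MOVE_BACK_CLASSES:
            # Probably valid: move it back to the OU where it belongs.
            plan["move"].append(dn)
        else:
            # Anything unfamiliar gets a human look first.
            plan["review"].append(dn)
    return plan
```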
Leftovers from manual DC demotion – Forcefully removing a DC from Active Directory with the DCPromo /forceremoval option requires cleaning up AD afterward. This is much easier in Windows Server 2003 than in Windows 2000, but it's a good idea to check these areas as well:
- NTDSUtil – You can use Metadata Cleanup to look through all sites and domains for server objects that should not be there. This takes a while, but it's a good way to make sure all the objects shown are valid DCs. Remove the ones that should have been deleted. If you aren't familiar with this procedure, this Knowledge Base article is a good place to start:
- How to remove data in Active Directory after an unsuccessful domain controller demotion
- AD Sites and Services – The server object here is a different object from the one found in NTDSUtil. Find the site of the removed DC and make sure the related server object is deleted. After a forced demotion, you usually have to do this manually.
- DNS records – These are usually cleaned up, but it's a good idea to make sure all of the DNS records for the removed DC are gone: SRV, A and alias (CNAME) records, plus SOA and NS records if it was also a DNS server.
- FRS objects – File Replication Service objects are usually deleted, but check in ADSIEdit under \System\File Replication Service\Domain System Volume (SYSVOL) for objects belonging to DCs that have been demoted.
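Checking DNS by eye is tedious in a zone with many records. As a Python sketch, assuming you've exported the zone (for example, with dnscmd /zoneprint) and parsed it into owner name, type and data, the function below flags every record that still points at the removed DC by host name or IP address. That covers A records, SRV records such as _ldap._tcp.<domain>, the <DSA GUID>._msdcs alias (CNAME) each DC registers, and NS/SOA entries. The record layout and function names are hypothetical, and parsing your actual export isn't shown.

```python
from typing import Iterable, List, NamedTuple


class DnsRecord(NamedTuple):
    name: str   # owner name, e.g. "_ldap._tcp.contoso.com"
    rtype: str  # "A", "SRV", "CNAME", "NS", "SOA", ...
    data: str   # e.g. "10.0.0.12" or, for SRV, "0 100 389 dc2.contoso.com."


def _norm(host: str) -> str:
    return host.lower().rstrip(".")


def _target_host(rec: DnsRecord) -> str:
    """Host a non-A record points at: last field for SRV, first field otherwise
    (for SOA, the first field is the primary server)."""
    fields = rec.data.split()
    if not fields:
        return ""
    return _norm(fields[-1] if rec.rtype.upper() == "SRV" else fields[0])


def leftover_records(records: Iterable[DnsRecord], dc_fqdn: str,
                     dc_ips: Iterable[str]) -> List[DnsRecord]:
    """Records that still reference a removed DC by name or IP address."""
    dc, ips = _norm(dc_fqdn), set(dc_ips)
    hits = []
    for rec in records:
        rtype = rec.rtype.upper()
        if rtype == "A":
            # Catches the DC's own A record and "same as parent" records.
            if rec.data.strip() in ips or _norm(rec.name) == dc:
                hits.append(rec)
        elif rtype in {"SRV", "CNAME", "NS", "SOA"} and _target_host(rec) == dc:
            hits.append(rec)
    return hits
```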
Monitor Groups and Group Membership – This one bites everyone. Security group and group membership bloat are extremely common. Authentication failures can occur when a user's token holds more than 70 to 80 groups and Kerberos uses its default transport, UDP, because the larger ticket has to be fragmented and some network devices drop fragmented UDP packets. Newer router technology has helped, but I still see users who are members of several hundred groups running into authentication errors or failing to access file shares, among other problems.
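Group count alone doesn't tell you how close a user is to the token limit, because different group types take different amounts of space. Microsoft's Knowledge Base article 327825 gives an estimate, TokenSize = 1200 + 40d + 8s bytes, where d counts domain local groups, universal groups outside the user's account domain and SID history entries, and s counts global groups and universal groups inside the account domain. This Python sketch applies that formula and compares the result with 12,000 bytes, the default MaxTokenSize on Windows 2000 SP2 and Windows Server 2003. The function names are hypothetical, the result is an estimate rather than a measurement, and MaxTokenSize is a separate limit from the UDP problem described above.

```python
# Default MaxTokenSize on Windows 2000 SP2 and Windows Server 2003 (bytes).
DEFAULT_MAX_TOKEN_SIZE = 12000


def estimate_token_size(domain_local: int, universal_outside_domain: int,
                        sid_history: int, global_groups: int,
                        universal_in_domain: int) -> int:
    """Estimate token size in bytes with the KB 327825 formula 1200 + 40d + 8s."""
    d = domain_local + universal_outside_domain + sid_history  # 40 bytes each
    s = global_groups + universal_in_domain                    # 8 bytes each
    return 1200 + 40 * d + 8 * s


def exceeds_max_token_size(size: int, limit: int = DEFAULT_MAX_TOKEN_SIZE) -> bool:
    """True if the estimate is larger than MaxTokenSize."""
    return size > limit
```

For example, a user in 10 domain local groups and 70 global groups comes to 1200 + 400 + 560 = 2,160 bytes, while 260 domain local groups plus the same 70 global groups reaches 12,160 bytes and crosses the default limit.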
It's rare that a user really needs hundreds of groups. Likewise, group membership can easily get out of control. I have talked to several Windows admins at different companies who all say they have no idea who has admin rights. An administrator working a hot issue will add users to Domain Admins to get around a security problem because it's an easy fix, but then forgets to remove them and give them only the rights they actually need. This builds up over time, and having many users with domain admin rights can be a disaster, because that level of access is destructive if it isn't restricted to the proper people. Remember that reducing the size of the user's security token will also improve logon performance.
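Answering "who really has admin rights?" means expanding nested groups, because a user can reach Domain Admins through a group inside a group. This is a minimal Python sketch, assuming you've exported each group's member values into a dictionary keyed by group name (a hypothetical format): any member that is itself a key is treated as a group and expanded, and a visited set keeps circular nesting from looping forever. One caveat from AD itself: a user whose primary group is set to Domain Admins doesn't appear in the group's member attribute, so check primaryGroupID as well.

```python
from typing import Dict, Iterable, Set


def effective_members(group: str, membership: Dict[str, Iterable[str]]) -> Set[str]:
    """Return every non-group principal that belongs to `group`, directly or
    through nested groups. `membership` maps group name -> direct members."""
    principals: Set[str] = set()
    visited: Set[str] = set()
    pending = [group]
    while pending:
        current = pending.pop()
        if current in visited:        # guards against circular nesting
            continue
        visited.add(current)
        for member in membership.get(current, ()):
            if member in membership:  # a nested group: expand it later
                pending.append(member)
            else:
                principals.add(member)
    return principals
```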
Purge old or unused users and computer accounts – Make sure to disable and delete accounts for users who leave the company as well as test accounts, old service accounts and the like. If someone just jerks a workstation or server out of the network and never puts it back in, the computer object will not be removed. You might be surprised to find how many computer objects have no computer associated with them.
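A common way to spot stale accounts is the lastLogonTimestamp attribute, which AD stores as a Windows FILETIME (100-nanosecond intervals since January 1, 1601 UTC). It requires the Windows Server 2003 domain functional level and, by default, can lag behind the real last logon by up to roughly 14 days, so treat it as approximate; lastLogon is more precise but isn't replicated, so it has to be read from every DC. This Python sketch assumes you've already pulled each account's lastLogonTimestamp into a dictionary (a hypothetical format) and flags anything older than a cutoff, plus accounts that have never logged on (value 0). The 90-day threshold is an arbitrary example, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# FILETIME counts 100-nanosecond ticks since 1601-01-01 00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert an AD FILETIME value to an aware UTC datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def datetime_to_filetime(moment: datetime) -> int:
    """Inverse of filetime_to_datetime (aware datetimes), using integer math."""
    delta = moment - FILETIME_EPOCH
    return (delta.days * 86400 + delta.seconds) * 10_000_000 + delta.microseconds * 10


def stale_accounts(last_logon: Dict[str, int], now: datetime,
                   max_age_days: int = 90) -> List[str]:
    """Accounts whose lastLogonTimestamp is older than the cutoff, or never set (0)."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        name for name, filetime in last_logon.items()
        if filetime == 0 or filetime_to_datetime(filetime) < cutoff
    )
```

Disabling flagged accounts first and deleting them after a waiting period gives you a way back if something turns out to still be in use.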
Purge old, unused or duplicate Group Policies – The two things that most directly affect both logon time and the bandwidth used for user logons are the Group Policies applied and group membership. The same is true of computer startup time. Review Group Policies and their settings periodically, and purge any that are old, unused or duplicated.
Each GPO processed causes a performance hit, so look for ones you can combine. Features such as ACL filtering or blocking inheritance also increase processing time. I'm not saying not to use them, but make sure they're necessary and see if there is a better way to accomplish the goal. Because the policy files in SYSVOL are replicated by FRS, large numbers of Group Policies (some organizations have hundreds) put a heavier burden on FRS, which isn't the most efficient replication engine in the world.
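Unlinked GPOs are the easiest to purge, because nothing applies them. Links are stored in the gPLink attribute of domains, OUs and sites as a run of [LDAP://cn={GUID},cn=policies,cn=system,...;options] entries, where an options value with the low bit set (1) marks the link as disabled. This Python sketch assumes you've collected every gPLink value plus a GUID-to-display-name map of your GPOs (the input format and function names are hypothetical) and lists GPOs with no enabled link anywhere. It doesn't catch GPOs whose settings are disabled or that are filtered out by ACLs, and an unlinked GPO may be one someone intends to link later, so review before deleting.

```python
import re
from typing import Dict, Iterable, List, Tuple

# One gPLink entry: [LDAP://cn={GUID},cn=policies,cn=system,DC=...;options]
GPLINK_ENTRY = re.compile(
    r"\[LDAP://cn=\{([0-9a-f-]{36})\},[^;\]]*;(\d+)\]", re.IGNORECASE
)
LINK_DISABLED = 1  # low bit of the options field


def parse_gplink(value: str) -> List[Tuple[str, int]]:
    """Return (GPO GUID, link options) pairs from one gPLink value."""
    return [(guid.upper(), int(opts)) for guid, opts in GPLINK_ENTRY.findall(value)]


def unlinked_gpos(gpos: Dict[str, str], gplinks: Iterable[str]) -> List[str]:
    """Display names of GPOs that have no enabled link in any gPLink value.

    `gpos` maps GPO GUID (without braces) to display name.
    """
    linked = set()
    for value in gplinks:
        for guid, options in parse_gplink(value):
            if not options & LINK_DISABLED:
                linked.add(guid)
    return sorted(name for guid, name in gpos.items() if guid.upper() not in linked)
```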
DNS cleanup – Scavenging should handle this, but it isn't perfect, and many organizations don't enable it. Open the DNS snap-in and look through the various folders. Make sure there are no duplicates, that each name maps to the correct IP address, and that records for servers, clients and DCs that no longer exist are removed. If you have delegations, make sure all delegation records point to the correct DNS server for that zone; an IP address change on a delegated server can cause widespread DNS failures.
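Two of those checks are easy to automate once you have the records in hand: finding names or addresses that appear more than once, and working out which dynamic records scavenging would remove. Under aging, a dynamic record becomes eligible once its timestamp plus the no-refresh and refresh intervals (7 days each by default) has passed, and only if scavenging is enabled on the zone and on a server; static records have no timestamp and are never scavenged. This Python sketch assumes A records exported as (name, IP address, timestamp or None) tuples, a hypothetical format, and the function names are hypothetical too. A name with two addresses isn't always an error (a multihomed server, for example), so treat the duplicate report as a list to review.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Tuple

ARecord = Tuple[str, str, Optional[datetime]]  # (name, IP, timestamp; None = static)

DEFAULT_NO_REFRESH = timedelta(days=7)
DEFAULT_REFRESH = timedelta(days=7)


def scavenging_candidates(records: Iterable[ARecord], now: datetime,
                          no_refresh: timedelta = DEFAULT_NO_REFRESH,
                          refresh: timedelta = DEFAULT_REFRESH) -> List[Tuple[str, str]]:
    """Dynamic records whose timestamp + no-refresh + refresh is in the past."""
    return sorted((name, ip) for name, ip, stamp in records
                  if stamp is not None and stamp + no_refresh + refresh < now)


def duplicate_report(records: Iterable[ARecord]) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Names that map to several IPs, and IPs that map to several names."""
    ips_by_name: Dict[str, set] = defaultdict(set)
    names_by_ip: Dict[str, set] = defaultdict(set)
    for name, ip, _ in records:
        ips_by_name[name.lower()].add(ip)
        names_by_ip[ip].add(name.lower())
    multi_ip = {n: sorted(v) for n, v in ips_by_name.items() if len(v) > 1}
    multi_name = {ip: sorted(v) for ip, v in names_by_ip.items() if len(v) > 1}
    return multi_ip, multi_name
```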
In general, you need to pay attention to event logs, and tools like Microsoft Operations Manager (MOM) help in large organizations. I'm surprised by how many admins I talk to whose servers logged critical errors for months before a failure without anyone noticing.
An ounce of prevention is indeed worth a pound of cure – and can prevent a weekend in the computer room.
You can follow SearchWindowsServer.com on Twitter @WindowsTT.
ABOUT THE AUTHOR: Gary Olsen is a systems software engineer for Hewlett-Packard in Global Solutions Engineering. He authored Windows 2000: Active Directory Design and Deployment and co-authored Windows Server 2003 on HP ProLiant Servers. Gary is a Microsoft MVP for Directory Services and formerly for Windows File Systems.