When an Active Directory design goes bad -- and how to fix it


Gary Olsen, Contributor
08.29.2006


In a previous article, Best practices for Active Directory replication topology design, Gary Olsen discussed some basic principles of designing an Active Directory replication topology. This week, he recalls a case study in which a poor design eventually caused serious replication problems and forced the topology to be redesigned and reimplemented.

The company in this week's case complained that after a domain controller failed and became unavailable, replication didn't flow as the designers had planned. We were able to work with them to figure out what went wrong and how to fix it -- all on the fly.

In the sidebar, you'll find the best practices identified in last week's article, which we will use here to analyze this company's Active Directory replication topology and resolve the problem.

Identifying where things went wrong

Figure 1 shows a graphical illustration of the replication topology at this corporation.

[Figure 1: Replication topology at this corporation]

We examined the topology, and it was apparent that the designers intended to create a multi-tier replication topology, forcing the sites at the lower-bandwidth locations (tier 3) to replicate to regional hubs (tier 2), which, in turn, replicate to two "core" hubs (tier 1). Unfortunately, when they implemented this design, they failed to observe best practice number four, and had they diagrammed the flow (best practice number five), they most likely would have seen that the design just didn't make sense.

Figure 1 shows only part of the whole topology. The company in this case grouped the Northwest sites -- Calgary, Sacramento, Portland, Boise and Seattle -- into one site link. The company repeated this for other regional groupings of sites, defining each group as a site link, so there were also Southwest, Southeast and Northeast links.

The core link contained the company's two hub sites in Boston (East) and Chicago (West). The problem with this topology is that it violates best practice number three: No topology was defined to connect the three tiers together. That is, sites could replicate with one another within the Northwest link, within the Southwest link and within the core link, but nothing could replicate between links, say, from Seattle to Los Angeles. This is a perfect example of a configuration that causes the infamous Event 1311 in the Directory Service event log.
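To see why Seattle could not reach Los Angeles, the Figure 1 design can be modeled as a small graph in which two sites are neighbors whenever they share a site link. This is a simplified sketch, not the KCC's actual algorithm. It assumes that, with site link bridging on, replication can only cross from one site link to another through a site that belongs to both, and it includes only the sites the article names (the Southwest membership is partial, and the Southeast and Northeast links are left out).

```python
from collections import deque

# Figure 1 as described: each site link is a set of sites, and no site
# belongs to more than one link. Southwest lists only the sites named
# in the article; Southeast and Northeast are omitted.
SITE_LINKS = {
    "Northwest": {"Calgary", "Sacramento", "Portland", "Boise", "Seattle"},
    "Southwest": {"Los Angeles", "Las Vegas"},
    "Core": {"Boston", "Chicago"},
}

def reachable(site_links, start):
    """Return every site reachable from `start` by hopping across shared links."""
    # Index which links each site belongs to.
    links_of = {}
    for name, sites in site_links.items():
        for site in sites:
            links_of.setdefault(site, set()).add(name)
    # Breadth-first search over sites that share a link.
    seen = {start}
    queue = deque([start])
    while queue:
        site = queue.popleft()
        for link in links_of.get(site, ()):
            for neighbor in site_links[link]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return seen

if __name__ == "__main__":
    print("Los Angeles" in reachable(SITE_LINKS, "Seattle"))  # False
```

Seattle's reachable set never leaves the Northwest link, which is the disconnected-topology condition the article associates with Event 1311.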

Fixing the design can be trial and error

To make this work, the designers decided to connect the links by also placing the regional hub site of each third-tier link (such as the Northwest link) in the second-tier (West) link. They also added the appropriate core hub site to the second-tier site link (see Figure 2). Note that Sacramento, the regional hub, is now part of two links -- the West link and the Northwest link. The first and second tiers are connected by putting the Chicago site in the West link as well as in the core link. That satisfies best practice number three and connects the topology together.
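The Figure 2 fix works because it makes site links overlap. A minimal sketch of that check, written under stated assumptions, groups links that share at least one site, directly or through a chain of other links, using a union-find structure. Sacramento in both Northwest and West, and Chicago in both West and Core, come from the text. Los Angeles in the West link is an assumption made by analogy with Sacramento; the article does not say so explicitly.

```python
from itertools import combinations

# Figure 2 as described. Los Angeles in "West" is an assumption.
SITE_LINKS = {
    "Northwest": {"Calgary", "Sacramento", "Portland", "Boise", "Seattle"},
    "Southwest": {"Los Angeles", "Las Vegas"},
    "West": {"Sacramento", "Los Angeles", "Chicago"},
    "Core": {"Boston", "Chicago"},
}

def link_components(site_links):
    """Group site links that are joined, directly or indirectly, by shared sites."""
    parent = {name: name for name in site_links}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge any two links that have at least one site in common.
    for a, b in combinations(site_links, 2):
        if site_links[a] & site_links[b]:
            parent[find(a)] = find(b)

    groups = {}
    for name in site_links:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())

if __name__ == "__main__":
    print(len(link_components(SITE_LINKS)))  # 1: every link is joined
    figure1 = {k: v for k, v in SITE_LINKS.items() if k != "West"}
    print(len(link_components(figure1)))     # 3: isolated islands, as in Figure 1
```

A single group means every site link is reachable from every other one, which is what best practice number three asks for.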

[Figure 2: Regional and core hub sites added to the second-tier links]

Although it still violated best practice number four, the design did work. It worked until a disk drive failed on the DC in the Sacramento site, forcing them to rebuild the Sacramento DC. For reasons we still don't fully understand, because the KCC had used Sacramento to connect the Northwest link to the West link, and because we had full bridging turned on, the KCC elected another site to replace Sacramento in the topology. For whatever reason, it picked Portland, and replication continued to work.

Meanwhile, the Sacramento DC came back online, but the KCC still insisted on using Portland. Portland isn't a tier 2 site and thus doesn't have the bandwidth that Sacramento has, so performance suffered. The administrators wanted to restore the original routing. After trying unsuccessfully to repair it, they rebuilt the Sacramento DC, and the KCC recreated the routing to funnel through Sacramento. All was well until the DC in Los Angeles went down. Los Angeles was the regional hub for the Southwest link. The same thing happened: With LA gone, the KCC elected the Las Vegas site as the Southwest link "hub," and when LA came back online, the routing didn't change and performance issues arose. Just as with the Northwest link, they rebuilt the Los Angeles DC, and replication was again routed through Los Angeles. In addition, they found some third-tier sites that had connections created to them from every other site in the AD. This was indeed strange behavior.

Do I have to rebuild the DC every time?

Now they wondered whether they would have to rebuild a hub-site DC every time it went down. That was not acceptable.

After diagramming the topology (Figure 2), we thought it should work. However, I had never seen so many sites placed in a single site link. I reasoned that we had violated best practice number eight. That is, because several sites were in one site link, when a failure occurred and the KCC had to rebuild the topology, it picked another site to hook to the next tier. I asked a replication expert at Microsoft whether it might be picking the DC based on its GUID. He was as baffled as I was but thought that was a good guess anyway.
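Cost alone gives the KCC no reason to prefer one member of a multi-site link over another, which fits the behavior described above. Inside one site link, every pair of members is separated by the same link cost, so once the hub drops out the survivors tie. The sketch below is only an illustration. The cost of 100 is hypothetical, and breaking the tie by a GUID-like key follows the article's unconfirmed guess, not documented KCC behavior. `fake_guid` is a made-up helper that derives a stable UUID from the site name.

```python
import uuid

# Northwest link membership from the article; the cost is hypothetical.
NORTHWEST = {"Calgary", "Sacramento", "Portland", "Boise", "Seattle"}
LINK_COST = 100

def candidate_costs(link_sites, cost, failed_hub):
    """Cost from each surviving member of the link to any other member.

    Within a multi-site link every pair of members is separated by the
    same link cost, so every survivor gets an identical score.
    """
    survivors = link_sites - {failed_hub}
    return {site: cost for site in survivors}

def pick_replacement(link_sites, cost, failed_hub, tie_break):
    """Return (chosen_site, tied_sites) after the hub drops out."""
    costs = candidate_costs(link_sites, cost, failed_hub)
    best = min(costs.values())
    tied = sorted(site for site, c in costs.items() if c == best)
    # Cost cannot separate the tied sites, so an unrelated key decides.
    return min(tied, key=tie_break), tied

def fake_guid(site):
    """Hypothetical stand-in for an object GUID (the article's guess)."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, site))

if __name__ == "__main__":
    chosen, tied = pick_replacement(NORTHWEST, LINK_COST, "Sacramento", fake_guid)
    print(tied)    # all four surviving Northwest sites tie on cost
    print(chosen)  # whichever site the tie-break key happens to favor
```

With two-site links, the tie never arises, because each spoke has exactly one link and therefore only one place to connect.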

The solution, of course, was to obey best practice number four and not allow any site link to contain more than two sites. That makes the topology look like Figure 3, with the red and blue lines representing site links and the numbers representing the site link costs. I advised the administrator to delete all site links except the core link, to create new site links containing only two sites each, and then to connect the regional hubs. That is, connect tier 3 sites to their corresponding tier 2 hubs, and so on.
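With only two-site links and bridging on, replication paths follow the lowest total site link cost. The sketch below illustrates that using Dijkstra's algorithm over a topology in the spirit of Figure 3. The real costs are in the figure and are not reproduced in the text. Every cost below is hypothetical, and so are the Boise-Sacramento and Los Angeles-Chicago links. It is a model of cost-based routing, not the KCC's implementation.

```python
import heapq

# Hypothetical two-site links (site_a, site_b, cost) shaped like Figure 3.
LINKS = [
    ("Seattle", "Sacramento", 100),
    ("Portland", "Sacramento", 100),
    ("Calgary", "Sacramento", 100),
    ("Boise", "Sacramento", 100),
    ("Las Vegas", "Los Angeles", 100),
    ("Sacramento", "Chicago", 50),
    ("Los Angeles", "Chicago", 50),
    ("Chicago", "Boston", 10),
]

def least_cost_path(links, src, dst):
    """Dijkstra over site links, summing link costs along the path."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    heap = [(0, src, [src])]
    done = set()
    while heap:
        total, site, path = heapq.heappop(heap)
        if site == dst:
            return total, path
        if site in done:
            continue
        done.add(site)
        for nxt, cost in graph.get(site, []):
            if nxt not in done:
                heapq.heappush(heap, (total + cost, nxt, path + [nxt]))
    return None  # no path: the disconnected case behind Event 1311

if __name__ == "__main__":
    print(least_cost_path(LINKS, "Seattle", "Los Angeles"))
    # (200, ['Seattle', 'Sacramento', 'Chicago', 'Los Angeles'])
```

Because every tier 3 site reaches the rest of the forest only through its own tier 2 hub, the cheapest path always runs through that hub.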

[Figure 3: Redesigned topology using two-site links, with site link costs]

Lessons learned

The administrator asked me if that would affect his production AD environment. I told him this action would definitely affect the AD … it would fix the problem! He started by reconstructing just the Northwest site link, replacing it with three new site links, each containing Sacramento: Sacramento-Portland, Sacramento-Calgary and Sacramento-Seattle. That indeed fixed the problem. They tested it by taking the Sacramento DC offline and bringing it back, and replication was routed properly. He then rebuilt the rest of the topology.
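The reconstruction above turns one multi-site link into hub-to-spoke pairs. A minimal sketch of that step, plus a check for best practice number four, is shown below. Both helpers are hypothetical and do not touch AD. The "Hub-Spoke" naming simply mirrors the article's Sacramento-Portland style, and the input covers only the four sites that the administrator's three new links included.

```python
def split_into_spokes(hub, link_sites):
    """Replace one multi-site link with two-site hub-to-spoke links.

    Returns (link_name, sites) pairs. The "Hub-Spoke" naming mirrors the
    article's Sacramento-Portland style; AD does not require it.
    """
    if hub not in link_sites:
        raise ValueError(f"{hub} is not a member of this link")
    return [(f"{hub}-{spoke}", {hub, spoke}) for spoke in sorted(link_sites - {hub})]

def oversized_links(site_links):
    """Names of links with more than two sites (best practice number four)."""
    return sorted(name for name, sites in site_links.items() if len(sites) > 2)

if __name__ == "__main__":
    new_links = dict(split_into_spokes(
        "Sacramento", {"Sacramento", "Portland", "Calgary", "Seattle"}))
    print(sorted(new_links))
    # ['Sacramento-Calgary', 'Sacramento-Portland', 'Sacramento-Seattle']
    old = {"Northwest": {"Calgary", "Sacramento", "Portland", "Boise", "Seattle"}}
    print(oversized_links(old))        # ['Northwest']
    print(oversized_links(new_links))  # []
```

Running the check against the whole set of site links before and after the change gives a quick way to confirm that no multi-site link was left behind.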

The lessons to be learned here are:

We were able to fix a bad design literally on the fly without disrupting the production AD. Of course, we used common sense and did the rebuild over the weekend, but it was not a significant outage, and the administrative work was minimal.

AD replication topologies are not set in stone. It is better to fix them and eliminate the problems than to live with them.

Gary Olsen is a systems software engineer for Hewlett-Packard in Global Solutions Engineering. He authored Windows 2000: Active Directory Design and Deployment and co-authored Windows Server 2003 on HP ProLiant Servers.

All Rights Reserved, Copyright 2004 - 2009, TechTarget