Replication Summary Start Time: 2005-10-21 00:02:56
Beginning data collection for replication summary, this may take awhile:
...................
Source DC largest delta fails/total %% error
QTEST-DC5 10d.10h:16m:46s 31 / 31 100 (8524) The DSA operation is unable to proceed because of a DNS lookup failure.
QEMEA-DC4 09d.00h:16m:45s 3 / 3 100 (8524) The DSA operation is unable to proceed because of a DNS lookup failure.
BEDROCKDC5 07d.10h:06m:22s 5 / 5 100 (1722) The RPC server is unavailable.
BEDROCKDC4 07d.10h:06m:20s 5 / 5 100 (1722) The RPC server is unavailable.
QAMERICAS-MDC1 07d.06h:17m:40s 22 / 22 100 (1722) The RPC server is unavailable.
KPARKHURST4 03d.11h:13m:49s 12 / 12 100 (1722) The RPC server is unavailable.
QAMERICAS-DC39 17m:55s 0 / 21 0
QTEST-DC9 17m:55s 0 / 25 0
QTEST-DC22 17m:55s 0 / 20 0
QEMEA-MDC1 17m:01s 0 / 47 0
QAMERICAS-DC2 15m:59s 0 / 16 0
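With hundreds of DCs, the summary is easier to act on once the failing rows are separated from the healthy ones. The following is a minimal Python sketch that assumes the output was saved as text in the whitespace-separated layout shown above. The names `ReplRow`, `parse_replsum` and `failing` are illustrative and not part of Repadmin, and delta formats other than the ones shown in this listing are not handled.

```python
import re
from typing import NamedTuple, Optional


class ReplRow(NamedTuple):
    dc: str
    largest_delta: str
    fails: int
    total: int
    pct: int
    error: Optional[str]


# Assumed row layout, taken from the listing above:
#   <DC> <delta like 07d.10h:06m:22s or 17m:55s> <fails> / <total> <pct> [error text]
ROW = re.compile(
    r"^\s*(?P<dc>\S+)\s+(?P<delta>(?:\d+d\.)?(?:\d+h:)?\d+m:\d+s)\s+"
    r"(?P<fails>\d+)\s*/\s*(?P<total>\d+)\s+(?P<pct>\d+)\s*(?P<error>.*)$"
)


def parse_replsum(text):
    """Return one ReplRow per data line; headers and progress dots are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append(ReplRow(
                m["dc"], m["delta"], int(m["fails"]), int(m["total"]),
                int(m["pct"]), m["error"].strip() or None,
            ))
    return rows


def failing(rows):
    """Keep only the DCs that reported at least one failed replication attempt."""
    return [r for r in rows if r.fails > 0]


# A few rows copied from the listing above.
SAMPLE = """Source DC largest delta fails/total %% error
QEMEA-DC4 09d.00h:16m:45s 3 / 3 100 (8524) The DSA operation is unable to proceed because of a DNS lookup failure.
BEDROCKDC5 07d.10h:06m:22s 5 / 5 100 (1722) The RPC server is unavailable.
QTEST-DC9 17m:55s 0 / 25 0
"""

if __name__ == "__main__":
    for row in failing(parse_replsum(SAMPLE)):
        print(row.dc, row.largest_delta, row.error)
```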
Physical connectivity
If there is a network failure, replication isn't going to happen. The first step is to check the general health of the domain with the Repadmin /replsum command just described. You can also ping the broken DCs by IP address and by FQDN, and run the NetDiag and DCDiag commands from the command line (with the /v switch on each). The verbose output gives more detail about the errors and may surface related ones.
The network connecting all the sites should be fully routed. Don't create a site link unless there is an underlying network link between the sites it contains.
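The "by address and by FQDN" check can be scripted when there are many DCs to test. The sketch below is an assumption-laden stand-in rather than a replacement for ping, NetDiag or DCDiag: it resolves the name and then attempts a TCP connection to port 135 (the RPC endpoint mapper) instead of sending an ICMP ping. The function name `check_dc` and the host name in the usage line are hypothetical.

```python
import socket

RPC_ENDPOINT_MAPPER_PORT = 135  # well-known TCP port of the RPC endpoint mapper


def check_dc(name_or_ip, port=RPC_ENDPOINT_MAPPER_PORT, timeout=3.0):
    """Return (ip, reachable).

    ip is None when the name does not resolve, which points at name resolution
    rather than at the network path. reachable is True only if a TCP connection
    to the given port succeeds within the timeout.
    """
    try:
        ip = socket.gethostbyname(name_or_ip)
    except OSError:  # covers socket.gaierror raised on resolution failure
        return None, False
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:  # refused, timed out, or unreachable
        return ip, False


if __name__ == "__main__":
    # Hypothetical FQDN; substitute a DC that Repadmin reported as failing.
    print(check_dc("dc1.example.com"))
```

Running it once with the FQDN and once with the IP address separates a name-resolution problem from a routing or firewall problem, which is the same distinction the two ping tests make.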
Logical connectivity
This is a bit more difficult to diagnose. The bottom line is that something in the AD site topology configuration is wrong, creating a hole in the topology. This could be solved by one of the following actions:
Figure 1
[IMAGE]
Seattle, Denver and Dallas are connected, but there is nothing connecting them with Chicago-Atlanta. We could fix this with something like an Atlanta-Dallas site link, or simply put all the sites in a single site link. Typically, this is not a problem because most topologies are some form of hub and spoke. But you could have a situation, as seen in Figure 2, with a couple of hubs that each have remote sites hanging off them, where someone forgot to build a site link between the two hub sites. While it looks fairly simple in these examples, if you have hundreds of sites, it's easy to miss one.
Figure 2
[IMAGE]
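With hundreds of sites, a hole like the ones in Figures 1 and 2 is easier to find mechanically than by eye. The sketch below treats sites as nodes and site links as connections, then groups the sites into "islands" that can reach each other. It assumes site links are transitive and that the site and link lists have already been exported somehow. The function name `site_islands` is illustrative, and the exact link membership in the example is assumed, since the text only says which sites are connected.

```python
from collections import defaultdict


def site_islands(sites, site_links):
    """Group sites into sets that can reach each other through site links.

    site_links is an iterable of site-name collections; every site listed in
    the same link is treated as directly connected to the others in that link.
    More than one island means there is a hole in the topology.
    """
    neighbors = defaultdict(set)
    for link in site_links:
        members = list(link)
        for site in members:
            neighbors[site].update(m for m in members if m != site)

    seen, islands = set(), []
    for start in sites:
        if start in seen:
            continue
        island, stack = set(), [start]
        while stack:  # depth-first walk over the site links
            site = stack.pop()
            if site in island:
                continue
            island.add(site)
            stack.extend(neighbors[site] - island)
        seen |= island
        islands.append(island)
    return islands


if __name__ == "__main__":
    sites = ["Seattle", "Denver", "Dallas", "Chicago", "Atlanta"]
    # Assumed link membership for illustration only.
    links = [("Seattle", "Denver"), ("Denver", "Dallas"), ("Chicago", "Atlanta")]
    print(site_islands(sites, links))
    print(site_islands(sites, links + [("Atlanta", "Dallas")]))
```

In the first call the sites fall into two islands, matching the gap described for Figure 1. Adding an Atlanta-Dallas link, as suggested above, merges them into one.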
Orphaned objects
In one case, a global catalog (GC) server was demoted, but an impatient administrator wanted to "clean up" the Active Directory, so he shortened the tombstoneLifetime value and then forced garbage collection. Unfortunately, he did that before the deletion of the GC server had replicated to all DCs and GCs in the forest. We saw 1311 events along with a host of others stating that Active Directory was trying to replicate an object that had no parent, but the events didn't identify the parent. The deletion process had deleted the parent object but not the child. We turned on verbose logging and finally identified the GUID of the problem object. Using the LDP.exe tool, we were able to delete that object and stop the 1311 events.
DNS errors
Because AD replication relies on DNS name resolution to find the DCs to replicate with, broken DNS can cause 1311 events. The helpful thing here is that if DNS is the culprit, the 1311 event will include the phrase "DNS Lookup Failure" in its description. If you see this phrase, then you absolutely, positively have a DNS problem that must be fixed. I've never seen this error turn out to be bogus. Note that the failure will not necessarily log an event in the DNS log, and you will see the phrase in other events as well. Remember that just because there are no significant errors in the DNS event log, it doesn't mean DNS is healthy.
When debugging 1311 events, first get a view of the entire forest to see which DCs are not replicating. You can do this easily with the Repadmin /Replsum command as described in this article. Loss of physical connectivity, an incomplete AD site topology or a DNS failure usually causes these events, with an outside chance that the cause is an orphaned object. Usually, other events will accompany them, such as the 1722 (RPC Server Unavailable), or the event will contain a descriptive statement such as "DNS Lookup Failure." The 1311 is a critical event that must be resolved for Active Directory replication to function properly to all DCs.
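The triage described above can be sketched as a small text filter over error descriptions, whether they come from event text or from the Repadmin summary. This is only an illustration: the categories and the names `likely_cause`, `summarize` and `NEXT_STEP` are made up for this example, and the matching relies solely on the phrases and codes quoted in this article.

```python
from collections import Counter

# Suggested follow-up per category, paraphrasing the sections of this article.
NEXT_STEP = {
    "dns": "Fix DNS name resolution for the affected DCs.",
    "rpc": "Check physical connectivity to the DC (ping, NetDiag, DCDiag /v).",
    "other": "Review the site link topology, then look for orphaned objects.",
}


def likely_cause(description):
    """Map an error description to a rough category: 'dns', 'rpc' or 'other'."""
    text = description.lower()
    if "dns lookup failure" in text:
        return "dns"
    if "1722" in text or "rpc server is unavailable" in text:
        return "rpc"
    return "other"


def summarize(descriptions):
    """Count how many descriptions fall into each category."""
    return Counter(likely_cause(d) for d in descriptions)


if __name__ == "__main__":
    errors = [
        "(8524) The DSA operation is unable to proceed because of a DNS lookup failure.",
        "(1722) The RPC server is unavailable.",
        "(1722) The RPC server is unavailable.",
    ]
    for cause, count in summarize(errors).items():
        print(cause, count, NEXT_STEP[cause])
```

A category is a starting point for investigation, not a diagnosis. An unavailable RPC server, for instance, can itself be a symptom of a name-resolution problem.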
Gary Olsen is a systems software engineer for Hewlett-Packard in Global Solutions Engineering. He authored Windows 2000: Active Directory Design and Deployment and co-authored Windows Server 2003 on HP ProLiant Servers.