While most admins would agree that Windows Server 2003 R2's new replication engine, Distributed File System Replication (DFSR), is light years ahead of FRS (File Replication Service), it seems very few really understand how to deploy it properly. Perhaps years of fighting FRS and keeping a fresh supply of Burflags in their hip pockets have given them tunnel vision. Fortunately, learning to configure a healthy DFS environment using DFSR is something any Active Directory administrator can do.
DFSR, in addition to the domain namespace management functionality that has been around since Windows 2000 Server, allows configuration of replication groups to simply replicate data without defining a namespace. We can configure replication groups to replicate data from a file server in a remote site to a server in the hub site. That server is easily attached to a storage area network or other storage configuration for large data storage for the enterprise. Administrators can then back up the data at the hub site without worrying about the remote sites.
A typical replication group would have two servers -- the hub server and a server in a remote site (or perhaps multiple servers in the remote site). The hub would have a share for each remote site, and the share would be on a storage device. Admins typically refer to the server at the remote site as the source (where data is created and changed) and the server at the hub site as the target (where data is received and backed up).
Configuring replication groups (RGs)
While Distributed File System Replication has similar components to FRS, they work a little differently. Figure 1 shows a typical configuration: three remote sites, with each site having one file server. Each file server has a single shared folder that is replicated to a share on the server at the hub site. Thus, on the hub site there will be three shares – one for each remote shared folder.
When creating a replication group (RG) there are two options: multipurpose replication and data collection. For backing up remote sites to the hub, we will use data collection. In the configuration, you will select the source server (at the remote site or branch office) and the hub server, and identify the folders on each to replicate. You will then configure bandwidth throttling and a replication schedule. Be careful here, as improper configuration at this point is a leading cause of failure. Make sure you carefully analyze the available bandwidth on your network and the amount of data you have to replicate. If you set the bandwidth limit too low, or schedule replication too infrequently, changes will arrive faster than they can be replicated and you will have a major backlog of data.
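The bandwidth-versus-data analysis above can be roughed out with some simple arithmetic before the replication group is created. The Python sketch below is only an illustration: the function names and the sample figures (2 GB of daily change, a 256 Kbps throttle, an 8-hour window) are assumptions, not values from any real deployment, and it ignores the savings DFSR gets from compression and Remote Differential Compression, so it errs on the pessimistic side.

```python
def replication_hours_needed(data_gb, throttle_kbps):
    """Hours of open replication window needed to move data_gb at throttle_kbps.

    Assumes decimal units (1 GB = 10**9 bytes, 1 Kbps = 1000 bits/s) and no
    compression, so the result is a worst-case estimate.
    """
    bits = data_gb * 8 * 1000**3
    seconds = bits / (throttle_kbps * 1000)
    return seconds / 3600


def daily_backlog_gb(daily_change_gb, throttle_kbps, window_hours_per_day):
    """Data (in GB) left unreplicated each day under the given throttle and schedule."""
    capacity_gb = throttle_kbps * 1000 * window_hours_per_day * 3600 / 8 / 1000**3
    return max(0.0, daily_change_gb - capacity_gb)


if __name__ == "__main__":
    # Hypothetical branch office: 2 GB of changes per day, 256 Kbps, 8-hour window.
    print(round(replication_hours_needed(2, 256), 2), "hours needed per day")
    print(round(daily_backlog_gb(2, 256, 8), 4), "GB of backlog added per day")
```

If the second figure is greater than zero, the backlog grows every day under that schedule, which is the failure described above.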
After configuring the replication group, the information will be replicated to all domain controllers. The time it takes depends on your Active Directory infrastructure and network. Initially, the source server you selected in the RG configuration will be designated with a "primary" flag. This is somewhat like the old Burflags D4 or D2 setting in FRS; but, once again, it doesn't work the same way. In DFSR, the primary designation lasts only until initial replication has completed; it is then removed and never used again.
I have seen several cases where administrators think they can force replication from the source server to the target, as they did with the FRS Burflags method, by forcing the source server to have the isPrimary designation. Using the DFSRAdmin command, you can tell which servers still have the isPrimary designation and have not yet completed initial replication. The replication group name in this example is "DFS2":
C:\>dfsradmin membership list /rgname:dfs2 /attr:memname,rfname,isprimary
MemName   RfName  IsPrimary
CORP-DC2  DFS2    No
SRV2      DFS2    No
Note: It is sometimes necessary to use the DFSRAdmin utility to reset the isPrimary value to true if initial replication does not work. Once initial replication has occurred, setting isPrimary will have no effect.
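When there are many replication groups to check, the output shown above can be scanned by script rather than by eye. The following Python sketch assumes the three-column layout from the example (MemName, RfName, IsPrimary) and member names without spaces. The function names are hypothetical, and the parser only reads text that has already been captured; it does not run DFSRAdmin itself.

```python
SAMPLE_OUTPUT = r"""
C:\>dfsradmin membership list /rgname:dfs2 /attr:memname,rfname,isprimary
MemName RfName IsPrimary
CORP-DC2 DFS2 No
SRV2 DFS2 No
"""


def parse_membership_list(output):
    """Return one dict per member row found after the MemName/RfName/IsPrimary header."""
    rows = []
    header_seen = False
    for line in output.splitlines():
        fields = line.split()
        if not header_seen:
            # Skip everything (prompt, blank lines) until the header row appears.
            header_seen = [f.lower() for f in fields] == ["memname", "rfname", "isprimary"]
            continue
        # Only accept rows with three fields whose last field is Yes or No.
        if len(fields) == 3 and fields[2].lower() in ("yes", "no"):
            rows.append({
                "member": fields[0],
                "folder": fields[1],
                "is_primary": fields[2].lower() == "yes",
            })
    return rows


def members_still_primary(rows):
    """Members that still carry the isPrimary flag, i.e. initial replication is not done."""
    return [row["member"] for row in rows if row["is_primary"]]


if __name__ == "__main__":
    print(members_still_primary(parse_membership_list(SAMPLE_OUTPUT)))
```

For the sample above the list is empty, meaning initial replication has already completed and setting isPrimary again would have no effect.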
This is where things get interesting. On the hub (target) server, under the replicated folder, will be a hidden folder called "DfsrPrivate," with five subdirectories as shown in Figure 2. The contents of the three subdirectories that matter most here are:
- Conflict and Deleted -- Holds files that have changed on the target server for comparison to files on the source server to see whether to replicate changes.
- PreExisting -- During initial replication, if the target server has files that the source server does not have, they are placed in this directory. If the file on the target server is different from that of the source, the file is placed into the Conflict and Deleted folder and DFSR replicates only the changed blocks of the file.
- Staging -- This is used for outgoing data (similar to FRS). The entries in this directory are not simply files; they hold changes (RDC signatures, RDC hashes, USN Journal data, etc.) as well as file data. A single file can have many entries, so there is not necessarily a 1:1 relationship between entries in the staging directory and the physical files.
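The initial-replication behavior described for the PreExisting and Conflict and Deleted folders can be summarized as a small decision rule. The Python sketch below is a simplified model of that description only, not of DFSR's actual implementation: the function name is hypothetical, and files are represented as a mapping from relative path to some content fingerprint.

```python
def classify_target_files(source, target):
    """Model where each file on the target ends up during initial replication.

    source and target map a relative path to a content fingerprint (any
    comparable value). This follows the article's description only; real
    DFSR decisions involve more state than a simple comparison.
    """
    result = {}
    for path, fingerprint in target.items():
        if path not in source:
            # Present only on the target: moved aside into PreExisting.
            result[path] = "PreExisting"
        elif source[path] != fingerprint:
            # Differs from the source copy: target version goes to Conflict and Deleted.
            result[path] = "Conflict and Deleted"
        else:
            # Identical on both sides: nothing to move.
            result[path] = "unchanged"
    return result


if __name__ == "__main__":
    source = {"report.doc": "v2", "budget.xls": "v1"}
    target = {"report.doc": "v1", "budget.xls": "v1", "old-notes.txt": "v1"}
    for path, outcome in sorted(classify_target_files(source, target).items()):
        print(path, "->", outcome)
```

The point of the model is that the source copy wins during initial replication, while the target's differing or extra files are set aside rather than silently destroyed.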
Remember that this is still a multi-master replication engine with bidirectional replication. Even though it is set up with a "source" and "target," this is only in the mind of the administrator. Data can replicate from the target server (hub site) to the source server (remote site) just as easily. When a file is changed (on either node of the replication group), it will trigger replication to the other replication node.
Note: Yes, you can change the properties on a replication group to make it unidirectional, but this is not recommended and is quite dangerous. That action will prevent the normal file comparison between the two servers in the RG and will break replication. Don't do it!
It is not uncommon to have large amounts of data in the staging directory, but it should move through fairly quickly. For example, I've heard of cases where upwards of 250,000 files were in the staging directory. The presence of files there is not necessarily a problem. I've seen cases where admins noticed all these files in the staging directory, assumed they were a backlog, started trying to fix it and ended up making matters a whole lot worse.
For additional information on Distributed File System Replication, check out Microsoft's DFS Step-by-Step Guide.
In my next article, I'll offer a case study that shows how an incorrect understanding of these principles caused a problem when there was no problem to begin with. In addition, look for detailed troubleshooting steps for fixing these problems.
ABOUT THE AUTHOR
Gary Olsen is a systems software engineer for Hewlett-Packard in Global Solutions Engineering. He authored Windows 2000: Active Directory Design and Deployment and co-authored Windows Server 2003 on HP ProLiant Servers. Gary is a Microsoft MVP for Directory Services and formerly for Windows File Systems.