Checklist: Establishing a data replication strategy

Once you decide to replicate your data, the real planning begins. Answer the questions on this checklist, and you'll be well on your way!

Which method is best? What do you do when a server fails? Work through this checklist to lay the foundation for your replication strategy.
How much of your data needs replication?
If you are replicating existing data, the amount of data you already have matters mainly when deciding what size hard disks to buy. Beyond that it is of minor importance, because the existing data is replicated only once. It is more important to consider the volume of new data created each day, since new data will always have to be replicated. You must also take into account the changes that users make to existing files.
Which replication method should you use?
One of the most commonly overlooked issues with data replication is the method used for replicating the data. For example, if a user changes 1 byte of a 2 GB file, will your server have to replicate the entire 2 GB file or just the byte that has changed? This is a very important thing to know, especially if users will be making frequent changes to large files.
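To make the difference concrete, here is a minimal sketch of block-level change detection. It is not how any particular product works; the 4 KB block size, the function names, and the 1 MB stand-in file are all assumptions chosen for illustration. The sketch hashes fixed-size blocks of the old and new versions and reports only the blocks that differ. It does not handle files that shrink.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; real products choose their own


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return the indexes of blocks in `new` that differ from `old`."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [
        i for i, digest in enumerate(new_h)
        if i >= len(old_h) or old_h[i] != digest
    ]


original = bytes(1024 * 1024)      # 1 MB of zeros standing in for a large file
edited = bytearray(original)
edited[500_000] = 0xFF             # the user changes a single byte

blocks = changed_blocks(original, bytes(edited))
bytes_to_send = len(blocks) * BLOCK_SIZE
print(f"changed blocks: {blocks}, bytes to replicate: {bytes_to_send}")
```

In this sketch a one-byte change costs one 4 KB block rather than the whole file. A product that replicates whole files would resend everything.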
How long do you want the process to take?
Products exist that take anywhere from a few milliseconds to almost half an hour to replicate data. It is important to determine what an acceptable length of time is for a replication cycle in your organization. Does the data need to be replicated in near real time, or is waiting a few minutes for the replication cycle to complete acceptable?
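A rough way to sanity-check a target cycle time is to estimate how long the changed data takes to cross the link. The sketch below is only back-of-the-envelope arithmetic: the function name, the 70 percent efficiency factor, and the sample figures are assumptions, and it ignores latency, compression, and the product's own overhead.

```python
def transfer_seconds(changed_bytes: int, link_mbps: float,
                     efficiency: float = 0.7) -> float:
    """Estimate seconds needed to send changed_bytes over a link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable for replication traffic.
    """
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return changed_bytes * 8 / usable_bits_per_second


# Hypothetical cycle: 500 MB of changes over a 100 Mbps link.
seconds = transfer_seconds(500_000_000, link_mbps=100)
print(f"estimated cycle time: {seconds:.1f} seconds")
```

If the estimate is already longer than the cycle time you consider acceptable, no product setting will fix it; you need less data per cycle or a faster link.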
Will the replication software also be used for load balancing?
Data replication products are often used for load balancing, which lets users access data from multiple servers for better performance. If load balancing is used, though, the replication frequency becomes more important.

For example, suppose that a user opens and changes a Microsoft Word document, saves the changes and then closes the application. A couple of minutes later, he realizes that he needs to add one more thing to the file, so he opens it again. If load balancing is being used, the document might be opened from a different server this time. If the replication cycle is slow, the user might see the document as it existed before he made the changes. The changes still exist; they just have yet to be replicated to the second server. From the user's perspective, however, it looks as though the changes were never saved.
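The scenario can be reproduced with a toy model. The class below is purely illustrative: the names are made up, the servers are plain dictionaries, and replication happens only when a cycle is run explicitly, which stands in for a slow replication schedule.

```python
class ReplicatedStore:
    """Toy model of servers that replicate only when a cycle runs."""

    def __init__(self, server_names):
        self.servers = {name: {} for name in server_names}
        self.pending = []  # queued (target_server, key, value) updates

    def write(self, server, key, value):
        """Save on one server and queue the change for the others."""
        self.servers[server][key] = value
        for other in self.servers:
            if other != server:
                self.pending.append((other, key, value))

    def read(self, server, key):
        return self.servers[server].get(key)

    def run_replication_cycle(self):
        """Apply every queued change, in order, to its target server."""
        for target, key, value in self.pending:
            self.servers[target][key] = value
        self.pending.clear()


store = ReplicatedStore(["server_a", "server_b"])
store.write("server_a", "report.doc", "version 1")
store.run_replication_cycle()

store.write("server_a", "report.doc", "version 2")   # user saves an edit
stale = store.read("server_b", "report.doc")         # load balancer picks B
store.run_replication_cycle()
fresh = store.read("server_b", "report.doc")
print(stale, "->", fresh)
```

Until the second cycle runs, the read from the other server returns the old version, which is exactly what the user in the example experiences.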

What happens if a server fails?
Most data replication products are designed so that if one of the servers fails, then user requests are redirected to another server automatically. Still, you should verify that the data replication solution you want to buy has that feature. Some lower-end replication products are intended only for backing up data and do not support server failover. Other low-end products may require you to perform a manual failover procedure.
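Automatic failover boils down to sending each request to the first server that is still responding. The following sketch shows the idea only; the function name, the server names, and the hard-coded health table are hypothetical, and a real product would probe servers over the network rather than look them up in a dictionary.

```python
def pick_server(servers, is_healthy):
    """Return the first server, in preference order, that passes the check."""
    for server in servers:
        if is_healthy(server):
            return server
    raise RuntimeError("no healthy replica available")


# Hypothetical health table standing in for a real network probe.
health = {"primary": False, "replica1": True, "replica2": True}

target = pick_server(["primary", "replica1", "replica2"],
                     lambda name: health[name])
print("requests are redirected to", target)
```

With a product that requires manual failover, an administrator effectively performs this selection by hand.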
What steps does the server take to guard against data corruption?
It is important to know what mechanisms, if any, a product has for preventing data corruption. This is important because there have been cases in which an error on a server has led to data corruption, and the corrupted data is propagated to the other servers. In choosing your product, make sure that if data does become corrupted, the corrupted data doesn't overwrite good data being stored on other servers.
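One common safeguard is to send a checksum with each update and refuse to apply any update that fails verification. The sketch below assumes a made-up update format (a dictionary holding a payload and its SHA-256 digest) and is not taken from any specific product. Note its limit: it catches data damaged after the checksum was computed, not data that was already bad on the source server.

```python
import hashlib


def make_update(payload: bytes) -> dict:
    """Package a payload together with its SHA-256 checksum."""
    return {"payload": payload,
            "sha256": hashlib.sha256(payload).hexdigest()}


def apply_update(replica: dict, name: str, update: dict) -> bool:
    """Overwrite replica[name] only if the payload matches its checksum."""
    actual = hashlib.sha256(update["payload"]).hexdigest()
    if actual != update["sha256"]:
        return False           # reject: the good copy stays in place
    replica[name] = update["payload"]
    return True


replica = {"ledger.dat": b"good data"}

damaged = make_update(b"new data")
damaged["payload"] = b"nfw data"   # simulate corruption after checksumming
accepted = apply_update(replica, "ledger.dat", damaged)
print("accepted:", accepted, "replica holds:", replica["ledger.dat"])
```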
What type of server connection is available?
Data replication can place a huge strain on your network's bandwidth, especially if you are replicating lots of data to multiple servers. Ideally, you will want a dedicated gigabit connection between servers. If a dedicated connection isn't available, make sure that the product you choose supports bandwidth throttling, which lets you limit the amount of bandwidth a server uses for replication. That way, replication cannot completely drain your network of bandwidth. This is especially important if you plan to replicate data across a WAN link.
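Throttling is frequently implemented with some form of token bucket, although a given product may do it differently. The sketch below is one simple variant under stated assumptions: the class and method names are invented, and time is passed in explicitly instead of read from a clock so the behavior is easy to follow. Each send spends tokens, tokens refill at the permitted rate, and a send that overdraws the bucket is told how long to wait.

```python
class TokenBucket:
    """Limit replication traffic to rate_bytes_per_sec, with bursts
    of up to capacity_bytes."""

    def __init__(self, rate_bytes_per_sec: float, capacity_bytes: float):
        self.rate = rate_bytes_per_sec
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes   # start with a full bucket
        self.last = 0.0                # time of the previous call

    def delay_before_sending(self, nbytes: int, now: float) -> float:
        """Return the seconds to wait before sending nbytes at time now."""
        # Refill for the time elapsed, never beyond the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Spend the tokens; a negative balance is repaid by waiting.
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        return -self.tokens / self.rate


# Assumed limit: 1 MB per second with a 1 MB burst allowance.
bucket = TokenBucket(rate_bytes_per_sec=1_000_000, capacity_bytes=1_000_000)
first = bucket.delay_before_sending(1_000_000, now=0.0)   # uses the burst
second = bucket.delay_before_sending(500_000, now=0.0)    # must wait
third = bucket.delay_before_sending(250_000, now=2.0)     # bucket refilled
print(first, second, third)
```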
How many replica servers does the product allow?
Sure, one replica might be plenty for today, but you always need to have an eye toward the future. Operational requirements could change, and you may need to create additional replicas, so it's important to know what the limits are.

Remember that the practical limit is often much lower than the limit built into the software. For example, your replication product might allow 32 replicas, but the amount of data you're replicating and your current bandwidth might make it impractical to have more than four replicas.
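The practical limit can be estimated with simple arithmetic. The sketch below assumes one source server that sends a full copy of each day's changes to every replica over a shared link; the function name, the 70 percent efficiency factor, and the sample figures are made up for illustration.

```python
def max_practical_replicas(daily_change_gb: float, link_mbps: float,
                           window_hours: float,
                           efficiency: float = 0.7) -> int:
    """Estimate how many replicas can receive a day's changes in the window.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable for replication traffic.
    """
    usable_bits = link_mbps * 1_000_000 * efficiency * window_hours * 3600
    bits_per_replica = daily_change_gb * 8 * 1_000_000_000
    return int(usable_bits // bits_per_replica)


# Hypothetical site: 60 GB of changes per day, a 100 Mbps link,
# and an 8-hour overnight replication window.
limit = max_practical_replicas(60, link_mbps=100, window_hours=8)
print("practical replica limit:", limit)
```

With these assumed figures the link supports four replicas, no matter how many the software itself would allow.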

ABOUT THE AUTHOR:
Brien Posey, MCSE, is a Microsoft Most Valuable Professional for his work with Windows 2000 Server and IIS. Brien has served as the CIO for a nationwide chain of hospitals and was once in charge of IT security for Fort Knox. As a freelance technical writer he has written for Microsoft, CNET, ZDNet, MSD2D, Relevant Technologies and other technology companies. You can visit Brien's personal Web site at

This was first published in March 2005
