Although every virtual machine (VM) exists within a server's memory space, the VM is stored on -- and periodically saved to -- shared storage such as a SAN or NAS. But, as any IT administrator knows, local storage just isn't enough to protect business data from the ravages of accident and disaster. As a result, VMs are almost universally replicated to off-site facilities. This may simply be a matter of creating a static copy of the VM at a cold site, or establishing some type of operational copy at a warm or hot site. The fact is, however, administrators often struggle with moving the copy fast enough to meet business needs.
While it's not an easy tightrope to walk, there are some considerations that might help bring the challenges of replication into focus.
Consider what needs to be protected. Not all VMs are created equal. Some are mission-critical, where a loss of data or availability could be catastrophic for an organization, while others are merely incidental, used occasionally or only by a small number of users for non-essential purposes. Most VMs fall somewhere along this continuum, and critical workloads should receive more protection than non-critical ones.
Replicating the workloads that are important to the business and leaving the other workloads for local snapshots or some type of asynchronous batch copy is one of the best ways to improve real-time replication performance. Although this strategy takes some thought and planning, it makes the best use of expensive WAN bandwidth.
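As a rough illustration of this kind of triage, the following Python sketch assigns each VM a protection method based on a criticality score. The 1-to-5 scale, the thresholds and the VM names are all hypothetical; a real policy would come from the organization's own business-impact assessment.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    criticality: int  # hypothetical scale: 1 = mission-critical, 5 = incidental


def replication_method(vm: VM) -> str:
    """Pick a protection method for a VM (thresholds are illustrative assumptions)."""
    if vm.criticality <= 2:
        # Important workloads get the expensive, WAN-intensive treatment.
        return "real-time replication"
    if vm.criticality <= 4:
        # Mid-tier workloads are copied in batches when bandwidth allows.
        return "asynchronous batch copy"
    # Incidental workloads stay on local protection only.
    return "local snapshot"


# Hypothetical inventory used only to demonstrate the function.
inventory = [VM("erp-db", 1), VM("intranet-wiki", 4), VM("test-sandbox", 5)]
for vm in inventory:
    print(f"{vm.name}: {replication_method(vm)}")
```

Even a simple table like this makes it easier to see how many workloads actually compete for real-time replication bandwidth.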
Consider the data volume. Replication performance is affected by the amount of data that must be moved over time: Copying a larger number of VMs in a shorter period of time will demand more WAN bandwidth and lower latency. One way to lower the data volume is to create smaller VM footprints by allocating less memory during the VM creation process. Another tactic is to replicate only essential workloads, as mentioned above.
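The relationship between data volume, time and bandwidth can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not a sizing tool: the function name, the 100 GB figure and the 70% link-efficiency factor are assumptions chosen for illustration, and real throughput also depends on compression, protocol overhead and competing traffic.

```python
def required_mbps(changed_gb: float, window_hours: float, efficiency: float = 0.7) -> float:
    """Estimate the WAN bandwidth (megabits per second) needed to move
    `changed_gb` gigabytes within `window_hours`.

    `efficiency` is an assumed fraction of the link that replication can
    actually use; 0.7 is an illustrative placeholder, not a measured value.
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    bits = changed_gb * 8 * 1000**3   # decimal gigabytes to bits
    seconds = window_hours * 3600
    return bits / seconds / efficiency / 1e6


# Hypothetical example: 100 GB of changed data.
for hours in (4, 2, 1):
    print(f"{hours} h window: {required_mbps(100, hours):.1f} Mbps")
```

Halving the time window doubles the bandwidth requirement, which is why trimming the data volume is often cheaper than buying a faster link.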
Replication planning must also account for latency caused by distance. For example, synchronous replication is best for critical workloads that do not tolerate downtime -- the copy will be kept "synchronized" with the original. Synchronous replication, however, can only be implemented within a fairly small geographic distance. On the other hand, asynchronous replication allows the copy to be "out of sync" with the original for a certain amount of time (perhaps as much as an hour). While this poses a greater potential for data loss, it supports global distances and can be accomplished with much lower overall WAN bandwidth.
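The distance limit on synchronous replication follows from propagation delay: a synchronous write generally is not complete until the remote site acknowledges it, so every write pays at least one round trip. The sketch below estimates that floor, assuming light travels roughly 200 km per millisecond in optical fiber. The 5 ms latency budget is a hypothetical threshold, and real links add delay from equipment and indirect cable routes.

```python
FIBER_KM_PER_MS = 200.0  # approximate propagation speed of light in optical fiber


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay over a fiber path of this length."""
    return 2 * distance_km / FIBER_KM_PER_MS


def fits_synchronous(distance_km: float, budget_ms: float = 5.0) -> bool:
    """Check the delay against an assumed per-write latency budget.

    The 5 ms default is a placeholder; the tolerable value depends on
    the application and the storage vendor's guidance.
    """
    return round_trip_ms(distance_km) <= budget_ms


# Hypothetical site distances in kilometers.
for km in (50, 100, 5000):
    verdict = "synchronous is plausible" if fits_synchronous(km) else "consider asynchronous"
    print(f"{km} km: at least {round_trip_ms(km):.1f} ms per write, {verdict}")
```

A site 100 km away adds at least 1 ms to every write, while a site 5,000 km away adds at least 50 ms, which is usually far too slow for a busy transactional workload.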
It is often possible to mix the replication priorities, copying critical VMs synchronously and moving non-critical workloads on a bandwidth-available basis.
Consider the tools. Replication efficiency is influenced by the software that handles the data transfers. For example, replication may be handled directly between storage systems, but can also be implemented with stand-alone software. Storage-based replication can be extremely efficient and leverage compression or other special features of the storage system itself, but to use those features, identical storage hardware is needed at both ends. Third-party software supports a more heterogeneous mix of storage hardware, but is not necessarily as efficient. Administrators should test replication performance with a variety of tools and compare behaviors before settling on one.
Consider the network infrastructure. After looking at which workloads should be protected, the data volume and the tools that are moving the data, it may be prudent to evaluate the bandwidth capabilities both within the LAN and the WAN. For example, business needs may justify additional bandwidth to support the replication data volume. Similarly, additional bandwidth may be needed between the LAN and the storage sub-system to overcome any bottlenecks. Organizations that experience significant fluctuations in bandwidth demand may be able to save money by only buying additional bandwidth during periods of peak need.
In addition, you should think about the impact of WAN disruptions, which can stop both workload replication and recovery. Organizations that demand high levels of availability may justify redundant WAN carriers between sites. Also, be sure redundancy is present all the way to your building. For example, carriers that share the last few miles of cable aren't really resilient if the cable is broken in a disaster.
Consider the cost. All of your replication decisions will ultimately depend on money, so it's important for administrators to match replication performance to business needs.
The question of budget often forces organizations to rethink their replication plans. In some cases, it may make sense to sacrifice a level of replication performance in favor of financial savings -- as long as the solution addresses the actual replication goals. For example, the cost of additional bandwidth for full real-time VM replication may be prohibitive. It may be much more cost-effective to purchase a new storage system to capture local snapshots of all workloads, and then move all of the workloads to the DR site asynchronously across a lower-bandwidth connection.
Stephen J. Bigelow, senior features writer, has more than 15 years of technical writing experience in the PC/technology industry. He holds a BSEE, CompTIA A+, Network+, Security+ and Server+ certifications and has written hundreds of articles and more than 15 feature books on computer troubleshooting, including Bigelow's PC Hardware Desk Reference and Bigelow's PC Hardware Annoyances. Contact him at firstname.lastname@example.org.