The basic idea behind backups is simple: take data from a server and store it, usually in compressed format, on another server or on removable media that can be stored in a safe location. Of course, the mechanics of how these steps are performed have a huge impact on how long the backup takes, how safe the data are, and how likely it is that a restore will succeed.
Backup to tape: Pros and cons
Tape backups have been the gold standard in disaster recovery for more than 40 years -- but are they still? Tape-based systems have some important pros and cons that you should consider as part of your overall disaster recovery planning. The biggest factor driving the use of tape is total cost of ownership. Tape systems occupy the bottom tier of the storage pyramid, and the reason is simple economics: tape offers relatively large storage capacity coupled with relatively low media costs.
For example, a DLT-III tape currently sells for between $25 and $40 in single quantities. For that, you get 35 GB of native storage, or up to 70 GB with compression. These figures may not seem impressive now that disk drives cost well under $1/GB, but tapes are cheaper in quantity, and the comparison looks better still once you factor in the arrays, controllers, and other paraphernalia that disk-based systems require. For true offline systems -- where backup media are taken to a separate physical site and stored for long periods -- tape is difficult to beat.
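To put the quoted cartridge prices on the same per-gigabyte footing as disk, the short Python calculation below divides price by capacity. It uses only the figures given above, treats 70 GB as a best case (real compression ratios depend on the data), and counts media cost alone, not drives or libraries.

```python
# DLT figures from the example above: $25-$40 per cartridge,
# 35 GB native capacity, up to 70 GB with compression.
PRICE_LOW, PRICE_HIGH = 25.0, 40.0
NATIVE_GB, COMPRESSED_GB = 35.0, 70.0


def cost_per_gb(price: float, capacity_gb: float) -> float:
    """Media cost per gigabyte for a single cartridge."""
    if capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    return price / capacity_gb


# Best case: cheapest cartridge, full 2:1 compression.
best = cost_per_gb(PRICE_LOW, COMPRESSED_GB)
# Worst case: most expensive cartridge, incompressible data.
worst = cost_per_gb(PRICE_HIGH, NATIVE_GB)

print(f"best case:  ${best:.2f}/GB")   # prints "best case:  $0.36/GB"
print(f"worst case: ${worst:.2f}/GB")  # prints "worst case: $1.14/GB"
```

The spread, roughly $0.36 to $1.14 per GB, is why the comparison with disk hinges on bulk pricing and on the extra hardware that disk requires.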
That's not the only reason tape technology is ubiquitous, though. It's a familiar and well-understood technology, and it scales relatively well on individual servers. If your backups are too slow, you can add more tape drives to back up more data concurrently, or you can move to a more expensive tape drive type to increase throughput.
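As a rough illustration of scaling by adding drives, the sketch below assumes ideal parallelism: the data splits evenly across drives, and the drives, not the server or network, are the bottleneck. The 400 GB data set and 20 GB/hour per-drive rate are hypothetical figures, not vendor specifications.

```python
import math


def backup_window_hours(data_gb: float, drives: int, gb_per_hour_per_drive: float) -> float:
    """Estimated backup duration, assuming the data splits evenly across
    drives and the drives, not the server or network, are the bottleneck."""
    if drives < 1:
        raise ValueError("need at least one drive")
    return data_gb / (drives * gb_per_hour_per_drive)


def drives_needed(data_gb: float, window_hours: float, gb_per_hour_per_drive: float) -> int:
    """Smallest number of drives that fits the backup into window_hours
    under the same ideal-parallelism assumption."""
    return math.ceil(data_gb / (window_hours * gb_per_hour_per_drive))


# Hypothetical workload: 400 GB of data, drives that each sustain 20 GB/hour.
for n in (1, 2, 4):
    print(f"{n} drive(s): {backup_window_hours(400, n, 20):.1f} hours")
print("drives for an 8-hour window:", drives_needed(400, 8, 20))  # 3
```

In practice, gains flatten out once the source disks or the backup server become the bottleneck, which is when a faster drive type becomes the better lever.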
Library vendors such as STK and Exabyte offer large-scale tape libraries that can hold hundreds or thousands of individual tapes and switch between them very quickly, which provides near-line access to extremely large volumes of data. For larger numbers of servers, vendors such as CommVault and Veritas sell backup solutions that allow automated backup of dozens, hundreds, or even thousands of servers to a central set of backup servers.
What are the downsides of tape backup technology? First, and most important for our purposes, tape restores are generally slower than disk-based restores. Microsoft's standard rule of thumb is to double the time required to capture an Exchange backup to estimate how long a restore will take, and tape's relative slowness only exacerbates the problem.
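The "double it" guideline is easy to apply as a quick planning check. The Python sketch below multiplies backup duration by a restore factor (2.0 by default, per the rule of thumb) and compares the result with the time you can afford to be down; the 6-hour backup and 8-hour window are hypothetical numbers.

```python
def estimated_restore_hours(backup_hours: float, restore_factor: float = 2.0) -> float:
    """Rule-of-thumb restore estimate: backup duration times restore_factor.
    The default of 2.0 is the 'double it' guideline; slow tape media may
    push the real figure higher."""
    return backup_hours * restore_factor


def fits_recovery_window(backup_hours: float, allowed_hours: float,
                         restore_factor: float = 2.0) -> bool:
    """True if the estimated restore finishes within allowed_hours."""
    return estimated_restore_hours(backup_hours, restore_factor) <= allowed_hours


# Hypothetical: a 6-hour tape backup and 8 hours to get mail flowing again.
print(estimated_restore_hours(6))   # 12.0
print(fits_recovery_window(6, 8))   # False
```

Raising restore_factor above 2.0 is one simple way to model the extra penalty of tape if your own test restores show it.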
Next, tape-based restore processes are error-prone -- one analyst firm estimates that more than 40 percent of tape-based restores initially fail. When you get ready to restore from tape, you're betting that the tape isn't damaged or suffering from media errors brought on by improper handling or inappropriate storage or environmental conditions. In fact, you're making a more fundamental bet: that you can find the tapes in the first place, and that once found, you can get them back to your recovery site in a timely manner. Of course, you can work around these potential problems by building redundancy into your backup processes, but that comes at an extra cost.
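Why redundancy helps, and why it is not free, can be seen with a little arithmetic. If each copy of a backup fails to restore with probability p, and failures are independent, the chance that all k copies fail is p^k. The Python sketch below uses the 40 percent figure above purely as an illustrative per-copy rate; independence is an assumption, and real failures often share a cause.

```python
def probability_all_copies_fail(p_fail: float, copies: int) -> float:
    """Chance that every copy of a backup fails to restore, assuming each copy
    fails independently with probability p_fail. Real failures are often
    correlated (same storage site, same drive, same handling), so treat the
    result as a best case."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_fail ** copies


# Using the 40 percent figure above as an illustrative per-copy failure rate.
for k in (1, 2, 3):
    print(k, "copies:", round(probability_all_copies_fail(0.4, k), 3))
```

Each extra copy cuts the odds of total failure but multiplies media, handling, and storage costs, which is the trade-off described above.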
Disk-based multi-stage backup and restore
Tape's advantages as a long-term storage mechanism are clear, but so are the disadvantages of using tape as the linchpin of a backup system. This conundrum has led to the common deployment of multi-stage backups: protected data is initially backed up to disk and kept there for a limited period; the disk-based backups are then archived to tape. This approach has several advantages:
- It's fast -- backing up data to disk means that the backup runs at the speed of your storage subsystem, which can exceed the speed of tape systems by a factor of ten or more. This speed shortens the required backup window, which means you can take backups more often.
- It allows tighter recovery point objectives (RPOs) -- because the backup window is smaller, you can easily shorten the interval between backups, which gives you more recent points in time to recover to.
- It puts less load on the Exchange server -- in fact, depending on how you implement the disk-based portion of the backup process, there may be essentially no load on the server because the work is all done by the SAN controller when it makes a copy of the volume being backed up.
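The first two advantages are linked: a shorter backup window is what makes a shorter interval between backups, and therefore a tighter RPO, practical. The Python sketch below illustrates this with hypothetical figures (200 GB of data, 50 GB/hour for tape, ten times that for disk) and a hypothetical rule of thumb that a backup should occupy no more than half of each interval; substitute your own measurements.

```python
from datetime import timedelta


def backup_window(data_gb: int, gb_per_hour: int) -> timedelta:
    """Time to copy data_gb at a sustained gb_per_hour."""
    return timedelta(minutes=data_gb * 60 / gb_per_hour)


def schedule_is_feasible(rpo: timedelta, window: timedelta,
                         max_busy_fraction: float = 0.5) -> bool:
    """Back up once per RPO interval, but reject schedules in which the
    backup job would run for more than max_busy_fraction of each interval."""
    return window <= rpo * max_busy_fraction


# Hypothetical: 200 GB to protect; tape sustains 50 GB/hour, disk ten times that.
tape = backup_window(200, 50)    # 4 hours
disk = backup_window(200, 500)   # 24 minutes
for rpo_hours in (1, 24):
    rpo = timedelta(hours=rpo_hours)
    print(f"RPO {rpo_hours}h: tape {schedule_is_feasible(rpo, tape)}, "
          f"disk {schedule_is_feasible(rpo, disk)}")
```

With these numbers, an hourly RPO is feasible only for the disk stage, while a daily RPO works for both.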
These advantages come at a cost, though. Per-gigabyte storage costs for tape are still significantly lower than for fixed disks, so if you have a large volume of data to back up, you'll need to maintain enough spare storage capacity to hold the backups and keep them around for the backup retention period.
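Sizing the disk stage is mostly multiplication. The Python sketch below assumes every backup kept on disk is a full copy of the same size and adds a safety margin; the 150 GB backup, one backup per day, 7-day retention, and 20 percent headroom are all hypothetical inputs.

```python
def staging_capacity_gb(backup_gb: float, backups_per_day: int,
                        retention_days: int, headroom: float = 0.2) -> float:
    """Disk space needed to keep every backup taken during the retention
    period, assuming each backup is a full copy of backup_gb, plus a
    safety margin (headroom) for growth."""
    if min(backup_gb, backups_per_day, retention_days, headroom) < 0:
        raise ValueError("inputs must be non-negative")
    return backup_gb * backups_per_day * retention_days * (1 + headroom)


# Hypothetical: a 150 GB full backup once a day, kept on disk for 7 days.
print(f"{staging_capacity_gb(150, 1, 7):.0f} GB")  # prints "1260 GB"
```

Incremental or differential backups would need less space than this full-copy estimate, and more frequent backups would need proportionally more.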
In addition, adding disk-based stages to your backup procedures makes them more complicated, so you'll need to spend some extra time and attention to ensure that backed-up data moves from stage to stage appropriately and that you have adequate storage monitoring and control technology so that you don't run out of storage space.
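Keeping data moving between stages comes down to a few routine checks. The Python sketch below models them with a hypothetical DiskBackup record: which disk backups have aged out and still need to go to tape, which ones are safe to delete because they already have, and whether the staging volume is running short of space. The record type and the 15 percent free-space threshold are illustrative assumptions; shutil.disk_usage is the only real library call involved.

```python
import shutil
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class DiskBackup:
    """Hypothetical record for one backup held on the disk stage."""
    name: str
    created: datetime
    copied_to_tape: bool = False


def due_for_tape(backups: List[DiskBackup], now: datetime,
                 disk_retention: timedelta) -> List[DiskBackup]:
    """Backups that have reached the end of the disk retention period
    but have not yet been archived to tape."""
    return [b for b in backups
            if now - b.created >= disk_retention and not b.copied_to_tape]


def safe_to_delete(backups: List[DiskBackup], now: datetime,
                   disk_retention: timedelta) -> List[DiskBackup]:
    """Only backups that are past retention AND already on tape may be
    removed from disk."""
    return [b for b in backups
            if now - b.created >= disk_retention and b.copied_to_tape]


def low_on_space(path: str, min_free_fraction: float = 0.15) -> bool:
    """True if the volume holding `path` has less free space than the
    (hypothetical) threshold."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total < min_free_fraction
```

A scheduled job might run these checks daily and raise an alert when backups linger in due_for_tape or when low_on_space returns True.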
Common backup pitfalls
You've probably heard television sports commentators say that a team did well or poorly based on how much emphasis it gave to the fundamentals. The same is true for backup and restore operations -- it's the simple things that you do, or don't do, that can spell the difference between successful and failed restores when the chips are down.
First, be sure that your backups are actually working. It might shock you to know how many otherwise competent administrators have been undone over the years by the sudden discovery that their backup tapes were blank. This disaster is 100 percent preventable. Windows backup utilities include logging and reporting features that can tell you whether a backup completed, and the Exchange information store service logs informational events that tell you when backups started and completed. More important, the information store also logs events that tell you whether errors were encountered.
For full backups, the information store calculates a checksum for each 4KB database page as it's read; the calculated checksum is compared against the checksum stored with the page. If they don't match, the information store logs a -1018 error to indicate the mismatch, and the backup terminates. (In addition, the backup process also checks that each page's "next page" pointer is pointing to a valid page.)
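To make the verification pass concrete, the Python sketch below walks a buffer page by page, recomputes each page's checksum, compares it with the stored value, stops at the first mismatch, and also sanity-checks a next-page pointer. It is not how ESE actually lays out or checksums its pages: the 8-byte header, the use of CRC-32 (zlib.crc32), and the END_OF_CHAIN marker are hypothetical stand-ins chosen only to show the technique.

```python
import struct
import zlib

PAGE_SIZE = 4096  # 4KB database pages, as described above
END_OF_CHAIN = 0  # hypothetical "no next page" marker


class ChecksumMismatch(Exception):
    """Stands in for the -1018 condition: computed and stored checksums differ."""


class BadPagePointer(Exception):
    """A page's next-page pointer does not refer to a valid page."""


def make_page(payload: bytes, next_page: int) -> bytes:
    """Build a test page with a hypothetical layout:
    4-byte stored checksum | 4-byte next-page number | zero-padded payload."""
    if len(payload) > PAGE_SIZE - 8:
        raise ValueError("payload too large for one page")
    body = struct.pack("<I", next_page) + payload.ljust(PAGE_SIZE - 8, b"\0")
    return struct.pack("<I", zlib.crc32(body)) + body


def verify_pages(db: bytes) -> int:
    """Read every page in order, as a full backup does. Raise at the first
    bad page (ending the 'backup'); otherwise return the page count."""
    if len(db) % PAGE_SIZE:
        raise ValueError("data is not a whole number of pages")
    count = len(db) // PAGE_SIZE
    for i in range(count):
        page = db[i * PAGE_SIZE:(i + 1) * PAGE_SIZE]
        stored, next_page = struct.unpack_from("<II", page)
        # CRC-32 over everything after the stored checksum field
        # (a stand-in for the real ESE page checksum).
        if zlib.crc32(page[4:]) != stored:
            raise ChecksumMismatch(f"page {i}: checksum mismatch")
        # Page 0 can't be a link target because 0 marks the end of a chain.
        if next_page != END_OF_CHAIN and not 1 <= next_page < count:
            raise BadPagePointer(f"page {i}: next-page pointer {next_page} is invalid")
    return count
```

Flipping even a single bit in any page makes verify_pages raise ChecksumMismatch, which mirrors why a full backup is also an integrity scan of the whole database.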
One exception to this terminate-on-mismatch behavior is that, as of Exchange 2003 SP1, the information store can fix some types of single-bit errors that would otherwise cause a -1018 error. Still, if you see a -1018, -1019, or -1022 error generated by the information store, it indicates a serious problem that warrants your immediate attention. The Microsoft article "Understanding and analyzing -1018, -1019, and -1022 Exchange database errors" describes these errors, what causes them, and how to troubleshoot them.
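Checking logs only helps if someone actually does it, so it is worth automating. The Python sketch below assumes you have already parsed your backup utility's log and the relevant information store events into simple records; the BackupEvent type and its kind values are hypothetical, not any product's real log format or event IDs. It treats a missing completion event as a failure, which catches the silent "blank tape" case, and flags the three error codes discussed above for immediate attention.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Error codes singled out above as signs of serious database problems.
CRITICAL_CODES = {-1018, -1019, -1022}


@dataclass
class BackupEvent:
    """Hypothetical, already-parsed log record; real backup utilities and
    the information store use their own formats and event IDs."""
    kind: str                         # "started", "completed", or "error"
    error_code: Optional[int] = None  # e.g. -1018, when kind == "error"


def review_backup(events: List[BackupEvent]) -> Tuple[bool, bool]:
    """Return (succeeded, critical). succeeded is True only if the job both
    started and completed with no errors; critical is True if any
    -1018/-1019/-1022 error appeared."""
    kinds = {e.kind for e in events}
    errors = [e for e in events if e.kind == "error"]
    succeeded = "started" in kinds and "completed" in kinds and not errors
    critical = any(e.error_code in CRITICAL_CODES for e in errors)
    return succeeded, critical
```

An empty event list also counts as a failure, because a backup that never logged anything is exactly the case you want to hear about.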
Second, test your backups by restoring them. The gold-standard method for verifying backup integrity is to restore a backup and see whether it contains the expected data. You can't rely solely on logging, because all it tells you is that the data was successfully read (and verified, if you haven't turned verification off to save time). You don't need to check every backup, but you should do so often enough to maintain confidence that your backup procedures are working. As a happy side effect, regularly restoring backups to test them will vastly improve your disaster recovery skills, which will pay off when an actual failure requires a restore.
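One generic way to check that a test restore "contains the expected data" is to record file hashes when the backup is taken and compare them against the restored copy. The Python sketch below does that with hashlib; the manifest approach is an illustrative assumption, not a feature of Exchange or of any particular backup product. Because a live Exchange database changes constantly, a meaningful Exchange test also involves mounting the restored database and spot-checking mailboxes; the hash comparison is most useful for confirming that backup files themselves come back intact.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MB chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a hash for every file under root (run at backup time)."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def compare_restore(manifest: Dict[str, str], restore_root: Path) -> List[str]:
    """List problems found in a test restore: missing files or changed content."""
    problems = []
    for rel, expected in sorted(manifest.items()):
        restored = restore_root / rel
        if not restored.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(restored) != expected:
            problems.append(f"mismatch: {rel}")
    return problems
```

An empty list from compare_restore means every file in the manifest came back byte-for-byte identical.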
Third, remember the old adage "out of sight, out of mind." If you use any type of offsite storage, include it in your test plans. Can you get the media you need within the allotted recovery period? It's wise to find out before you actually have to. In the same vein, test retrieved media to confirm that it hasn't been damaged in storage.
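An offsite drill only answers that question if you add up every step, not just the restore itself. The small Python check below sums measured retrieval, media-check, and restore times and compares the total with your recovery target; the durations in the example are hypothetical drill results.

```python
from datetime import timedelta


def offsite_recovery_fits(retrieval: timedelta, media_check: timedelta,
                          restore: timedelta, allowed: timedelta) -> bool:
    """True if fetching the media, checking it, and restoring from it all
    fit inside the allotted recovery period."""
    return retrieval + media_check + restore <= allowed


# Hypothetical drill results: courier takes 6 hours, media checks 1 hour,
# restore 10 hours, against a 12-hour recovery target.
print(offsite_recovery_fits(timedelta(hours=6), timedelta(hours=1),
                            timedelta(hours=10), timedelta(hours=12)))  # False
```

In this example the restore alone would fit, but retrieval and media checks push the total past the target, which is exactly what an untested offsite plan hides.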
This chapter excerpt from the free e-book The Definitive Guide to Exchange Disaster Recovery and Availability, by Paul Robichaux, is printed with permission from Realtimepublishers, Copyright 2005.