|This chapter excerpt from Inside Windows Storage, by Dilip C. Naik is printed with permission from Addison-Wesley/Prentice Hall, Copyright 2003.|
Before diving into the various ways that backup and restore operations are accomplished, it is advisable to understand the problems that need to be solved to accomplish the desired objective. The prominent issues are these:
* An ever-decreasing amount of time, called the backup window, in which the backup operation must be accomplished
* An ever-increasing number of APIs that backup applications need to support
* An inability to back up files that are open and actively being used by an application
Sections 5.2.1 through 5.2.3 consider each of these issues in more detail.
The backup window
Historically, server applications were run during regular business hours only. Backup operations were typically done in the wee hours of the night when the applications could be stopped without causing user distress. Once the applications were stopped, the server would typically be taken offline and the data backed up. There are two problems with this old approach:
1. The huge explosion of data has meant that the backup is hard to accomplish within the given amount of time. Difficult as it may be to believe, writing to tape is an expensive operation, especially in terms of time and man-hours consumed. The correct tape has to be located, mounted, and then positioned. Once positioned, tapes are much slower to write to than disk. Whereas most hard drive interfaces are able to transfer data at a sustained rate well over 80MB per second, the fastest tape drives available today see maximum transfer rates of under 30MB per second. Robotic silos can be used to manage the multiple-tape media units, but they are expensive, and of course they can only alleviate the time spent in locating and loading a tape, not make the tape read or write any faster.
2. The second problem is that more and more applications, as well as the data they access, control, create, or modify, are considered important if not mission-critical. This means that the amount of time when the server could be taken offline to accomplish the backup is shrinking.
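The arithmetic behind the shrinking backup window is easy to sketch. The short program below is a hypothetical illustration (the function and variable names are invented, not any vendor's tool); it uses the transfer rates quoted above, roughly 80MB per second sustained for disk and under 30MB per second for fast tape, to show the minimum time needed to stream a given amount of data:

```python
# Illustrative backup-window arithmetic using the rates quoted in the text.
# These helper names are hypothetical, not part of any Windows API.

def backup_seconds(data_bytes, rate_bytes_per_sec):
    """Lower-bound time to stream data_bytes at a sustained transfer rate."""
    return data_bytes / rate_bytes_per_sec

GB = 10**9
DISK_RATE = 80 * 10**6   # ~80 MB/s sustained disk transfer
TAPE_RATE = 30 * 10**6   # ~30 MB/s for the fastest tape drives

# Time to back up 500 GB at each rate:
disk_time = backup_seconds(500 * GB, DISK_RATE)   # 6250 s (~1.7 hours)
tape_time = backup_seconds(500 * GB, TAPE_RATE)   # ~16667 s (~4.6 hours)
print(round(disk_time), round(tape_time))
```

Even at the disk rate, half a terabyte takes well over an hour and a half to stream; at the tape rate it takes more than four and a half hours, and that is before any time spent locating, mounting, and positioning media.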
Explosion of APIs
Customers are deploying more and more enterprise applications that can be stopped rarely, if ever, for backup. Recognizing this fact, each application vendor has resorted to providing APIs for backing up and restoring the application data files in a consistent manner that ensures that no data is lost. Although the creation of these APIs sounds great, a closer inspection shows that the problem is rapidly worsening as a result.
Figure 5.1 illustrates the problem of an ever-increasing need to support more APIs in the backup/restore application. As this example shows, customers typically have multiple applications, and very often multiple versions of the same application. Each backup vendor now must write code to use the APIs provided by each enterprise application. Because many backup application vendors choose to separately license the agents that deal with a specific enterprise application, just keeping track of the software licenses and their costs can make any IT manager dizzy. Furthermore, none of this accounts for the infrastructure, personnel, and discipline that must be deployed to accomplish these backups.
Open files problem
Another problem with doing a backup is that it can take a considerable amount of time. If a tape device has a rated throughput of 10GB per minute, it will take 10 minutes to back up a 100GB disk. During these 10 minutes, applications will need access to the disk and will also be changing data on the disk. To ensure that the backup is consistent, three approaches are possible:
1. To prohibit applications from accessing the disk while the backup is in progress. Blocking simultaneous user access to the application during backup was commonplace in the early days of PC computing, when 24x7 operations didn't take place. The backup was done in times of light load—for example, during night hours.
Now this approach is not feasible, for a couple of reasons:
* Operational requirements now often call for 24x7 application uptime, so there is no good time to do the backup.
* The amount of data needing to be backed up has grown, so the operational hours have increased, and the backup window is often not long enough to accomplish the backup.
2. To back up the data while applications are accessing the disk, but skip all open files. The problem here is that the applications left running while a backup executes are typically the important ones, which implies that precisely the important data will not be backed up!
3. To differentiate between I/O initiated by the backup application and I/O initiated by other applications. Backup vendors came up with solutions that partially reverse-engineered operating system behavior. In particular, these solutions depend on the implementation being able to make this distinction reliably, and that ability can break fairly easily. The implementations have generally relied on varying degrees of undocumented features or operating system behavior that is liable to change in new versions of the operating system. The solutions also depend on the availability of a sufficient amount of disk space. Another variation, applicable to both techniques, is whether the implementation works on one file at a time or on all files simultaneously.
Three approaches have been tried to allow open files to be backed up yet have a consistent set of data on the backup media corresponding to the open files.
The first approach is to defer application writes to a secondary storage area, allowing the backup operation to back up all files. The approach must also be selective in operation; for example, it must allow the paging file writes to pass through but defer the writes to application data files or put them in a predefined secondary cache (often called a side store), ensuring that the data backed up is in a consistent state. I/O to or from the secondary storage area also needs to be treated specially depending on whether the backup/restore application or a different application is doing this I/O. Once the backup application is done, the data must be copied from the side store to the regular file area.
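The deferred-write mechanism can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (a "volume" is a dictionary of blocks, and the paging-file special-casing is omitted); the class and method names are invented for illustration:

```python
class DeferredWriteVolume:
    """Sketch of the deferred-write approach: while a backup is running,
    application writes are parked in a side store so the backup sees a
    frozen, consistent image; afterward the side store is merged back."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # the "disk": block number -> data
        self.side_store = {}         # deferred writes land here
        self.backup_active = False

    def begin_backup(self):
        self.backup_active = True

    def write(self, block, data):
        if self.backup_active:
            self.side_store[block] = data   # defer: leave frozen image alone
        else:
            self.blocks[block] = data

    def app_read(self, block):
        # Regular applications must see their own deferred writes.
        if block in self.side_store:
            return self.side_store[block]
        return self.blocks[block]

    def backup_read(self, block):
        # The backup application sees only the frozen image.
        return self.blocks[block]

    def end_backup(self):
        self.blocks.update(self.side_store)  # copy side store back
        self.side_store.clear()
        self.backup_active = False
```

While the backup is active, the backup reader sees only the frozen image and regular readers see their own deferred writes; `end_backup` performs the copy from the side store back to the regular file area.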
The second approach is to implement copy-on-write for the benefit of backup applications. When a backup application opens a file, other applications are still allowed to write to the file. To avoid a mix of old and new data in the backup application, the data being overwritten is copied to a side store. If regular applications request this data, the read is handled by the regular Windows file system drivers. When a backup application requests this data, the data is retrieved from the side store. St. Bernard Software is one example of a vendor that has implemented this approach to backing up open files.
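A copy-on-write side store of the kind just described can be sketched the same way. Again this is an illustrative toy with invented names, not St. Bernard Software's implementation; note that here the *old* data moves to the side store while live writes proceed normally, the mirror image of the deferred-write approach:

```python
class CopyOnWriteFile:
    """Sketch of copy-on-write for open-file backup: the first overwrite
    of a block copies the original data to a side store; backup reads are
    satisfied from the side store, regular reads from the live blocks."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.side_store = {}        # block -> data as of backup open
        self.backup_open = False

    def open_for_backup(self):
        self.backup_open = True
        self.side_store = {}

    def write(self, block, data):
        # Save the old data once, then let the write proceed normally.
        if self.backup_open and block not in self.side_store:
            self.side_store[block] = self.blocks.get(block)
        self.blocks[block] = data

    def app_read(self, block):
        return self.blocks[block]           # always the newest data

    def backup_read(self, block):
        if block in self.side_store:        # overwritten since backup began
            return self.side_store[block]
        return self.blocks[block]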
Consider Figure 5.2 and notice the layering of drivers (a detailed explanation of Windows drivers, device objects, and so on is available in Chapter 1). The file system filter driver is layered over the NT file system (NTFS) driver, which itself is layered over the disk filter driver. The disk filter driver, in turn, is layered over the disk class driver. There are other drivers below the disk class driver (as discussed in Chapter 1), but they are not relevant to the discussion here. When an application opens a file, NTFS (in response to application requests) issues a series of commands to read metadata (the location of the file on the disk) and then issues reads or writes to the logical blocks on which this particular file is stored.
The upper filter driver (above the file system driver) shown in Figure 5.2 is ideally placed to intercept file operations and divert the call, if that is what is desired to solve the problem of open files. Microsoft sells a product called the Windows Installable File System (IFS) Kit, which provides the information needed to write such a filter driver. A backup vendor may choose to work at a lower level; an image-level solution, for example, would typically involve writing a lower filter driver (above the disk class driver), as illustrated in Figure 5.2.
The I/O operations shown in Figure 5.2 operate at a file system level to begin with, as denoted by the path marked with the number 1 in the figure. The NTFS file system driver manages the mapping of file data to disk blocks; subsequently, the I/O operation is done at a disk block level, below the NTFS file system, as denoted by the path marked with the number 2. Microsoft conveniently ships exactly such a driver, diskperf.sys, as part of the Windows Driver Development Kit (DDK). Several backup vendors have started with this sample as the building block for a snapshot solution.
The third approach is to take a snapshot of the data and back up the snapshot while the applications unknowingly continue using the original volume. The snapshot may be created by means of a variety of hardware or software solutions. This is the approach Microsoft favors with Windows Server 2003.
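A copy-on-write snapshot at the volume level, the flavor most software snapshot providers implement, can be sketched as follows. The names are hypothetical; a real provider, such as the shadow copy service in Windows Server 2003, works at the block-device level inside the kernel rather than on Python dictionaries:

```python
class Snapshot:
    """Point-in-time view of a Volume; unchanged blocks read through."""

    def __init__(self, volume):
        self.volume = volume
        self.saved = {}        # blocks overwritten since the snapshot

    def read(self, block):
        if block in self.saved:
            return self.saved[block]
        return self.volume.blocks[block]   # unchanged: read the original


class Volume:
    """Toy block device whose snapshots are maintained copy-on-write."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, block, data):
        # Preserve the pre-write data for every live snapshot that has
        # not already saved this block, then let the write proceed.
        for snap in self.snapshots:
            if block not in snap.saved:
                snap.saved[block] = self.blocks.get(block)
        self.blocks[block] = data
```

The snapshot costs almost nothing to create; the price is paid gradually, one block copy per first overwrite, while applications continue using the original volume unaware that a backup is reading the snapshot.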
About the author: Dilip C. Naik has more than twelve years of experience in various roles at Microsoft, including software engineer, program manager, and technical evangelist. His contributions include writing CIFS/SMB code, CIFS-related RFCs, code and documentation for the Windows NT Installable File System Kit, as well as Windows Management Instrumentation (WMI) and performance/management (including storage management) features for the Windows platform. Dilip has also represented Microsoft on a number of industry standards organizations.