Windows applications and caching control
The Windows NT platform allows an application to control caching behavior, both within the file system and within the storage subsystem. In particular, when applications open a file using the CreateFile API, they may specify the following options to control caching behavior:
- FILE_FLAG_SEQUENTIAL_SCAN informs the file system that the file will be accessed sequentially. This allows the file system to cache more aggressively, to perform read-ahead and write-behind operations, and to flush or discard buffers once the I/O has progressed beyond the portion of the file held in the buffer. Benchmarks have shown that FILE_FLAG_SEQUENTIAL_SCAN provides a negligible improvement in read throughput but a noticeable improvement in write throughput.
- FILE_FLAG_WRITE_THROUGH indicates that a device may not complete a write request until the data is committed to media. This is enforced through the use of the Force Unit Access (FUA) flag.
- FILE_FLAG_NO_BUFFERING indicates that no caching should be performed in the file system layer.
- FlushFileBuffers is an API call that forces all data for an open file handle to be flushed from the system cache and also sends a command to the disk to flush its cache. Contrary to what the name suggests, this call affects all data stored in the device cache, not just the data belonging to that file.
- File systems have control on a per-I/O basis, and they typically use write-through for metadata operations only.
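As a rough illustration of how the options above combine, the following Python sketch builds the flags value passed to CreateFile. The numeric constants are the values defined in the Windows SDK headers; the helper names `caching_flags` and `open_for_write` are hypothetical and exist only for this example. The actual call through ctypes is attempted only on Windows.

```python
import ctypes
import sys

# Flag values as defined in the Windows SDK headers.
FILE_FLAG_WRITE_THROUGH = 0x80000000
FILE_FLAG_NO_BUFFERING = 0x20000000
FILE_FLAG_SEQUENTIAL_SCAN = 0x08000000

GENERIC_WRITE = 0x40000000
OPEN_ALWAYS = 4
# CreateFile returns (HANDLE)-1 on failure; this is its unsigned pointer value.
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value


def caching_flags(write_through=False, no_buffering=False, sequential=False):
    """Hypothetical helper: OR together the caching-related CreateFile flags."""
    flags = 0
    if write_through:
        flags |= FILE_FLAG_WRITE_THROUGH
    if no_buffering:
        flags |= FILE_FLAG_NO_BUFFERING
    if sequential:
        flags |= FILE_FLAG_SEQUENTIAL_SCAN
    return flags


def open_for_write(path, flags):
    """Hypothetical helper: open `path` for writing with the given flags (Windows only)."""
    if sys.platform != "win32":
        raise OSError("CreateFileW is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateFileW.restype = ctypes.c_void_p
    kernel32.CreateFileW.argtypes = [
        ctypes.c_wchar_p,  # lpFileName
        ctypes.c_uint32,   # dwDesiredAccess
        ctypes.c_uint32,   # dwShareMode
        ctypes.c_void_p,   # lpSecurityAttributes
        ctypes.c_uint32,   # dwCreationDisposition
        ctypes.c_uint32,   # dwFlagsAndAttributes
        ctypes.c_void_p,   # hTemplateFile
    ]
    handle = kernel32.CreateFileW(path, GENERIC_WRITE, 0, None, OPEN_ALWAYS, flags, None)
    if handle == INVALID_HANDLE_VALUE:
        raise ctypes.WinError(ctypes.get_last_error())
    return handle
```

For example, `caching_flags(write_through=True, no_buffering=True)` yields the combination that asks for both file system cache bypass and commit-to-media semantics. Note that FILE_FLAG_NO_BUFFERING also requires reads and writes to be aligned to the volume sector size, which this sketch does not handle.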
In reality, Windows NT 4.0 and Windows 2000 did not correctly implement the code corresponding to the FILE_FLAG_WRITE_THROUGH option. As an optimization, the disk driver checks to see if write caching is enabled and bypasses these operations if it is not. If the cache was in fact enabled, write operations were completed before the data was committed to the spinning media. The implications are obvious in terms of inflated performance numbers and possible data corruption. The problem is not confined to the Windows server platform; it also affects other operating systems, including Linux. Microsoft is obviously committed to ensuring that application-requested functionality is correctly implemented and has fixed a number of bugs in the Windows XP/Windows .NET Server 2003 platform(s). However, there is a tangled web to unravel, and the main threads can be summarized as:
- Windows 2000 and prior operating systems assumed that the disk's write cache enable (WCE) setting was turned off and never bothered to verify it, even when an application specified the FILE_FLAG_WRITE_THROUGH option for a file. The problem only applies to SCSI and FC devices and not IDE disks. To ensure proper data integrity, an administrator should check the cache setting using a utility supplied by the disk vendor. Unless a user modified the cache setting by clicking on the Windows 2000 write cache property page, it was assumed that write caching was not enabled, and consequently no attempt was made to force write-through operations. Also, cache synchronization (FlushFileBuffers) was not done except at system shutdown (or device removal on Windows 2000). Service Pack 3 for Windows 2000 fixes known problems except when dynamic disks are used.
- Windows XP also fixed most of these problems and faithfully uses the SCSI Force Unit Access (FUA) parameter within the SCSI write command when an application specifies FILE_FLAG_WRITE_THROUGH or the file system is writing metadata. However, Windows XP only did this for basic disks; dynamic disks on Windows XP continued to have the same bug that existed in Windows 2000. Basic disks are legacy disks that existed in pre-Windows 2000 days and have a disk partition table at the beginning of the disk. Dynamic disks were introduced in Windows 2000 and provide enhanced data integrity and management features. For a comprehensive description of dynamic disks, please see reference 1. Most home and workstation users do not use dynamic disks.
- Meanwhile, the standing recommendation was that applications desiring good performance and throughput should specify FILE_FLAG_WRITE_THROUGH and FILE_FLAG_NO_BUFFERING in the CreateFile API. A number of application developers, including some at Microsoft, faithfully followed this recommendation. However, applications desiring data integrity also correctly specified the same setting! Thus one now needs to decide the intent of the application developer. Applications that have critical write timing constraints and can tolerate a slightly increased risk of data corruption on a power failure should use just the FILE_FLAG_NO_BUFFERING flag.
- The tangled web was rendered even more intricate by a couple of Knowledge Base articles. One of them, KB 276253, explains that under some rare circumstances the operating system may reset the SCSI bus. Another, KB 308219, refers to SCSI disk performance problems without providing sufficient detail. Microsoft has indicated that it is in the process of providing further details for this KB and that the issues referred to in KB 308219 have nothing to do with WCE on SCSI disks. The Microsoft Knowledge Base article Q332023 provides more details.
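The version-by-version behavior described in the bullets above can be condensed into a small lookup. This is only a simplified model of this article's summary, not an authoritative compatibility table; the release labels, the table `HONORED_DISK_TYPES`, and the function `write_through_at_risk` are names invented for the illustration.

```python
# Buses for which the article says the problem applies (IDE is not affected).
AFFECTED_BUSES = {"scsi", "fc"}

# Release label -> disk types for which FILE_FLAG_WRITE_THROUGH is honored,
# following the summary in the bullets above. Labels are hypothetical.
HONORED_DISK_TYPES = {
    "nt4": set(),
    "win2000": set(),
    "win2000-sp3": {"basic"},
    "winxp": {"basic"},
}


def write_through_at_risk(os_release, bus, disk_type, write_cache_enabled):
    """Return True if a write-through request may complete before reaching media."""
    if not write_cache_enabled:
        # With the device write cache off, writes go to media regardless.
        return False
    if bus.lower() not in AFFECTED_BUSES:
        return False
    return disk_type.lower() not in HONORED_DISK_TYPES[os_release]


if __name__ == "__main__":
    for release in ("win2000", "win2000-sp3", "winxp"):
        for disk_type in ("basic", "dynamic"):
            risk = write_through_at_risk(release, "scsi", disk_type, True)
            print(release, disk_type, "at risk" if risk else "honored")
```

Under this model, a SCSI dynamic disk with its write cache enabled remains at risk on every release listed, which is why checking the actual cache setting with a vendor utility matters.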
Solutions and implications
- Microsoft has indicated that it will make available fixes for Windows 2000 and Windows XP that will provide the desired data integrity by ensuring the data is committed to the spinning media before the write operation is completed when the file is opened specifying FILE_FLAG_WRITE_THROUGH. The implications for degraded performance are obvious. In fact, Windows 2000 Service Pack 3 already has most of the new behavior implemented.
- A new option (advanced performance) will be available in the upcoming Windows Server 2003 release that lets customers with power protected storage devices restore the performance since they do not have the same risk of power outages. This option should only be used when a battery backup unit is available for the storage device. The next service packs for Windows 2000 and Windows XP will also allow changing this setting, but there will not be an updated UI so a resource kit utility will be available to implement this function. It should also be noted that large enterprise customers using expensive storage arrays with potentially many GBs of cache are not affected since these devices do not implement FUA or cache sync commands (nor do they need to). Some low-end RAID controllers do not have onboard battery backup and should never have write caching enabled, even if the system itself is on a UPS.
- Microsoft has indicated that it will be examining applications it owns to ensure the correct file flags are used. Which applications will be updated, and in what time frame, is still unknown. Perhaps a promised Knowledge Base article will shed some more light. The task of updating applications is rendered a little more onerous by the fact that one must update not only applications that use the CreateFile API, but also applications that use the StgCreateStorageEx() and StgOpenStorageEx() functions exported by the platform SDK. See reference 2 for more details.
- Microsoft has also indicated that it is considering creating an application compatibility layer for legacy applications affected by this. Nevertheless, this is one more thing for the system administrator to worry about.
- Additional bugs in third party device drivers have been uncovered in this investigation and fixes are being implemented by those vendors to correctly support caching controls.
- While considering benchmarks that measure performance on Windows 2000 and Windows NT 4.0 (such as in references 3 and 4), one needs to factor in the presence of the bug described.
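To make the trade-off behind these points concrete, here is a toy model of a disk with a volatile write cache. It is a sketch for illustration only: the class `CachedDisk` and its methods are hypothetical and do not correspond to any real driver or device interface. A write with `fua=True` stands in for a write-through request that is actually enforced, and `flush()` stands in for a cache synchronization command.

```python
class CachedDisk:
    """Toy model of a disk with a volatile write cache (not a real interface)."""

    def __init__(self, write_cache_enabled=True):
        self.write_cache_enabled = write_cache_enabled
        self.media = {}   # blocks that have reached the spinning media
        self.cache = {}   # blocks acknowledged but held only in volatile cache

    def write(self, lba, data, fua=False):
        """Acknowledge a write; only FUA or a disabled cache forces it to media."""
        if fua or not self.write_cache_enabled:
            self.cache.pop(lba, None)  # discard any stale cached copy
            self.media[lba] = data
        else:
            self.cache[lba] = data

    def flush(self):
        """Cache synchronization: commit everything in the cache to media."""
        self.media.update(self.cache)
        self.cache.clear()

    def power_fail(self):
        """Power loss: whatever was only in the cache is gone."""
        self.cache.clear()


if __name__ == "__main__":
    disk = CachedDisk(write_cache_enabled=True)
    disk.write(0, b"cached")            # acknowledged, but only in cache
    disk.write(1, b"forced", fua=True)  # committed before acknowledgment
    disk.power_fail()
    print(sorted(disk.media))           # only block 1 survives
```

In this model the cached write is the fast path that inflates benchmark numbers, and it is also the write that disappears on power failure, which is the behavior the fixes described above are meant to close off.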
The Windows NT family of operating systems has been a little lax in providing the behavior specified by application semantics. Microsoft has positively identified these shortcomings and indicated that fixes have either already been implemented or will shortly be made available. Once the fixes are available, the application-specified behavior will be provided, but at the cost of reduced performance. Meanwhile, system administrators need to evaluate the trade-off between system throughput and the minuscule chance of data loss and ensure that their SCSI hardware is configured accordingly.
For more information:
Discover Dynamic Disks
MSDN section on StgCreateStorageEx and StgOpenStorageEx APIs
Sequential I/O on Windows NT 4.0 -- achieving top performance
More on "Managing Windows-based storage"
About the author:
Dilip C. Naik has more than fourteen years of experience in various roles at Microsoft, including software engineer, program manager, and technical evangelist. His contributions include writing CIFS/SMB code, CIFS-related RFCs, code and documentation for the Windows NT Installable File System Kit, as well as Windows Management Instrumentation (WMI) and performance/management (including storage management) features for the Windows platform. Dilip has also represented Microsoft on a number of industry standards organizations.