Q. What happens if an application has problems with deduplicated data? Can I turn Windows Server deduplication...
Storage needs have increased dramatically as new applications work with data-intensive file types ranging from customer databases, email archives and PDFs to image libraries and streaming media files. Data deduplication has emerged as a powerful technology for mitigating enterprise storage demands by identifying and removing redundant data elements, or chunks, without altering or compromising the data. Windows Server 2012 R2 provides expanded deduplication capabilities, but IT professionals must consider the proper use and suitability of deduplication to ensure that application performance and system performance benefit in a predictable manner.
Once enabled and configured on a Windows Server 2012 R2 platform, data deduplication typically runs as a background job that the server initiates periodically. This post-process approach differs from the inline deduplication native to some storage subsystems, which deduplicates data as it is written. Consequently, Windows Server deduplication is a poor fit for files that require continuous access, change often or sustain heavy I/O. Examples include Hyper-V host systems, SQL database files, Exchange servers, Windows Server Update Services (WSUS) and files larger than 1 TB.
The underlying problem is that files that change frequently cannot be efficiently deduplicated. The server must devote significant computing resources to each deduplication job, and immediate changes to the application's files simply undo that work. This isn't an issue for dedicated storage subsystems such as EMC's that are designed to provide continuous deduplication, but it's a computing burden for conventional servers. In practice, deduplication is best applied to files that have gone some number of days without changing, so set the minimum file age to more than zero days. If you must select a short deduplication cycle, benchmark the application before and after enabling deduplication to confirm that its performance is not adversely affected.
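As a rough illustration of the file-age advice, the sketch below reviews and raises the minimum file age on a volume. The drive letter E: and the five-day value are assumptions chosen for the example, not recommendations from the answer above.

```powershell
# Assumed volume for illustration; substitute the deduplicated volume in question.
$volume = "E:"

# Show the current policy, including how old a file must be before it is optimized.
Get-DedupVolume -Volume $volume | Format-List Volume, Enabled, MinimumFileAgeDays

# Only optimize files that have gone at least 5 days without changing (example value).
Set-DedupVolume -Volume $volume -MinimumFileAgeDays 5

# Review the resulting savings after the next optimization job has run.
Get-DedupStatus -Volume $volume | Format-List Volume, SavedSpace, FreeSpace, OptimizedFilesCount
```

A longer minimum age keeps frequently changing files out of the optimization pass, at the cost of delaying savings on files that have only recently gone quiet.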
There are several options when Windows Server deduplication adversely affects an application or file access. First, an IT administrator may change the deduplication frequency by altering the schedule or opting for manual deduplication jobs. In addition, the StopWhenSystemBusy job option halts deduplication if the job interferes with the server's workload, while the Priority option sets how aggressively a deduplication job competes for system resources. The Expand-DedupFile cmdlet can expand (or un-deduplicate) specific files if needed for compatibility or performance.
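The options described here might look something like the following in practice. This is a minimal sketch: the volume letter, the schedule name "WeekendOptimization", the time window and the file path are all hypothetical and would need to match the actual environment.

```powershell
# Assumed volume for illustration only.
$volume = "E:"

# List the existing deduplication schedules to see what runs and when.
Get-DedupSchedule

# Hypothetical schedule: optimize only on weekend nights, at low priority,
# and stop whenever the server becomes busy with other work.
New-DedupSchedule -Name "WeekendOptimization" -Type Optimization `
    -Days Saturday,Sunday -Start "23:00" -DurationHours 6 `
    -Priority Low -StopWhenSystemBusy

# Alternatively, start a manual optimization job on demand with the same safeguards.
Start-DedupJob -Volume $volume -Type Optimization -Priority Low -StopWhenSystemBusy

# Expand (un-deduplicate) a single file that misbehaves when optimized.
# The path below is a placeholder.
Expand-DedupFile -Path "E:\Apps\data\catalog.db"
```

Creating a new schedule does not remove the schedules that already exist on the server, so review the Get-DedupSchedule output to confirm the combined timing is what you intend.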
In most cases, scheduling and other job options can improve performance without disabling deduplication, but there are times when deduplication simply needs to be turned off. To remove deduplication from a volume, run the Start-DedupJob cmdlet with Unoptimization as the job type. Because this expands every deduplicated file, the volume needs enough free space to hold the data at its full size.
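A cautious way to remove deduplication from a volume could be sketched as follows. The volume letter is an assumption, and the free-space comparison is a simple sanity check added for illustration rather than a step prescribed above.

```powershell
# Assumed volume for illustration only.
$volume = "E:"

# Compare the space currently saved by deduplication with the free space left,
# since unoptimized files grow back to their original size.
$status = Get-DedupStatus -Volume $volume
if ($status.SavedSpace -ge $status.FreeSpace) {
    Write-Warning "Free space on $volume may be too low to expand all deduplicated files."
}
else {
    # Reverse deduplication for every optimized file on the volume.
    Start-DedupJob -Volume $volume -Type Unoptimization

    # Check on the job's progress.
    Get-DedupJob -Volume $volume
}
```

Unoptimization can run for a long time on large volumes, so it is usually worth scheduling it outside the application's busy hours.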