IT Showcase On: Windows Server 2012 Deduplication

Quick Reference Guide
Saving Storage Costs Using Windows Server 2012 Deduplication
The following content may no longer reflect Microsoft’s current position or infrastructure. This content should be viewed as reference documentation only, to inform IT business decisions within your own
company or organization.
EXECUTIVE OVERVIEW
Microsoft IT has found great value in using deduplication, the
Server Data Storage Optimization service included in Windows
Server 2012. Also known as Dedup, deduplication finds and
removes duplication within data without compromising fidelity or
integrity. Deduplication has also improved server storage
capacity and the user experience, and it has increased
productivity by streamlining the storage management process.

What Is Deduplication?
The basic purpose of deduplication is to eliminate duplicate data
and reduce redundant data storage. Deduplication facilitates the
storage of more data in less space and removes redundant
copies of data by replacing them with a reference to a single
copy.
In conjunction with Microsoft BranchCache and the optimization
management features found in Windows Server 2012,
Microsoft IT used deduplication to improve data storage scale,
performance, reliability, data integrity, and bandwidth
efficiency.

Why Did Microsoft IT Need Deduplication?
Enterprise data storage grows at tremendous rates, and with that
growth come increased costs, a challenge for any IT department.
Deduplication reduces storage costs and is a significant storage
solution available in Windows Server 2012.
To cope with data storage growth in the enterprise, Microsoft IT
used data deduplication to consolidate servers and scale their
capacity to meet their data storage optimization goals.

Features of Deduplication
• Capacity optimization. Data deduplication in Windows
Server 2012 stores more data in less physical space. It achieves
greater storage efficiency than was possible with earlier features
such as Single Instance Storage or NTFS compression. Data
deduplication uses subfile variable-size chunking and
compression, which deliver optimization ratios of 2:1 for general
file servers and up to 20:1 for virtualization data.
• Scale and performance. In Windows Server 2012, data
deduplication is highly scalable, resource efficient, and
nonintrusive. It can process about 20 MB of data per second, and
it can run on multiple volumes simultaneously without affecting
other workloads on the server. Low impact on server workloads
is maintained by throttling the CPU and memory resources
consumed. If the server is busy, deduplication can stop
completely. In addition, administrators have the flexibility to run
data deduplication jobs at any time, set schedules for when data
deduplication should run, and establish file-selection policies.
• Reliability and data integrity. When data deduplication is
applied, the integrity of the data is maintained. Windows
Server 2012 uses checksum, consistency, and identity validation
to ensure data integrity. For all metadata and the most
frequently referenced data, data deduplication maintains
redundancy to ensure that the data is recoverable in the event of
data corruption.
• Bandwidth efficiency with BranchCache. Through integration
with BranchCache, the same optimization techniques are applied
to data transferred over the WAN to a branch office. The result is
faster file download times and reduced bandwidth consumption.
• Optimization management with familiar tools. Windows
Server 2012 has optimization functionality built into Server
Manager and Windows PowerShell cmdlets. Default settings can
provide savings immediately, or administrators can fine-tune the
settings to see more gains. Windows PowerShell cmdlets can be
used to start an optimization job or schedule one to run in the
future. Installing the Data Deduplication feature and enabling
deduplication on selected volumes can also be done by using an
Unattend.xml file that calls a Windows PowerShell script, or with
the System Preparation Tool (Sysprep), to deploy deduplication
when a system first starts.
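The optimization ratios quoted under Capacity optimization translate directly into fractional space savings: a ratio of r:1 leaves 1/r of the original footprint on disk. A quick check of the arithmetic:

```python
def savings_from_ratio(ratio):
    """Space saved as a fraction, for an optimization ratio of ratio:1."""
    return 1 - 1 / ratio

print(f"{savings_from_ratio(2):.0%}")   # 50% - general file servers (2:1)
print(f"{savings_from_ratio(20):.0%}")  # 95% - virtualization data (20:1)
```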
Deduplication Architecture
Available as part of Windows Server 2012, deduplication made it
easy for Microsoft IT to leverage both existing data management
technologies and physical infrastructure. Inherent in the
deduplication architecture is resiliency during hardware
failures—with full checksum validation on data and metadata,
including redundancy for metadata and the most accessed data
chunks.
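The chunk-and-reference scheme described in this guide can be sketched in a few lines of Python. This is a toy model, not the actual Dedup implementation: real chunks are variable-sized (32–128 KB), compressed, and kept in container files, whereas this sketch uses fixed-size chunks keyed by SHA-256, with the hash doubling as the integrity checksum that is verified on read:

```python
import hashlib

class ChunkStore:
    """Toy chunk store: each unique chunk is stored once, keyed by its SHA-256 hash."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes, stored only once

    def add_file(self, data):
        """Split a file into chunks and return its list of chunk references."""
        refs = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # duplicates are not stored again
            refs.append(digest)
        return refs

    def read_file(self, refs):
        """Reassemble a file, validating each chunk against its checksum."""
        parts = []
        for digest in refs:
            chunk = self.chunks[digest]
            # The key doubles as a checksum: detect corruption before serving data.
            assert hashlib.sha256(chunk).hexdigest() == digest, "chunk corrupted"
            parts.append(chunk)
        return b"".join(parts)

store = ChunkStore(chunk_size=4)
f1 = store.add_file(b"AAAABBBBCCCC")
f2 = store.add_file(b"AAAABBBBDDDD")   # shares two chunks with the first file
assert store.read_file(f1) == b"AAAABBBBCCCC"
print(len(store.chunks))  # 4 unique chunks stored instead of 6
```

The real feature goes further than this sketch: as noted above, it also keeps redundant copies of metadata and the most frequently referenced chunks, so a failed checksum can be repaired rather than merely detected.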
© 2013 Microsoft Corporation. All rights reserved. This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, SharePoint, Windows, and Windows Server are either registered
trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Dedup has three layers:
• Management Interface. Deduplication can be managed through
Windows Server 2012 Server Manager, Windows PowerShell, or
Windows Management Instrumentation.
• Deduplication job. The actual deduplication job can optimize
storage and includes two additional advanced features: Garbage
Collection and Scrubbing.
  – Garbage Collection processes deleted or modified data on the
volume so that any data chunks that are no longer referenced
are cleaned up.
  – Scrubbing analyzes the chunk store corruption logs and, when
possible, makes repairs.
• Data access layer. When deduplication is enabled,
administrators and applications can store, retrieve, and index
data through the deduplication file system with no noticeable
loss of performance.

After a volume is enabled for deduplication and the data is
optimized, the volume contains the following elements:
• Unoptimized files. For example, unoptimized files could include
files that do not meet the selected file-age policy setting, system
state files, alternate data streams, encrypted files, files with
extended attributes, files less than 32 KB in size, other reparse
point files, or files in use by other applications.
• Optimized files. These files are stored as reparse points that
contain pointers to a map of the respective chunks in the chunk
store that are needed to restore the file when it is requested.
• Chunk store. This is the location for the optimized file data.
• Additional free space. The optimized files and chunk store
occupy much less space than they did prior to optimization.

How Deduplication Works
Data deduplication works by finding and removing duplication
within data without compromising data fidelity or integrity. The
goal is to store more data in less space by:
• Segmenting files into small, variable-sized chunks (32–128 KB)
• Identifying duplicate chunks and maintaining a single copy of
each chunk
• Replacing redundant copies of the chunk with a reference to
the single copy
• Compressing the chunks, and then organizing them into
special container files in the System Volume Information folder

No Impact on Application Performance
After deduplication, files are no longer stored as independent
streams of data; instead, they are replaced with stubs that point
to data blocks stored within a common chunk store. Because the
files share blocks, each block is stored only once, which reduces
the disk space needed to store all the files. During file access,
the correct blocks are transparently assembled to serve the
data, without the application or the user having any knowledge
of the on-disk transformation of the file. This enabled
Microsoft IT to apply deduplication to files without having to
worry about changes in application behavior or any impact on
the users who access those files.

Deploying Deduplication
Microsoft IT used two methods to estimate the rate of savings
that would help evaluate whether a drive was a good candidate
for deduplication:
• At a command prompt, by typing DDPEval.exe <VolumePath>
• In Windows PowerShell, by using the
Measure-DedupFileMetadata cmdlet
An example of the results from the Data Deduplication Savings
Evaluation Tool (DDPEval.exe) is illustrated below.
Enable and Configure
Microsoft IT used Server Manager to install and enable the
deduplication feature on the chosen volumes:
• To install, click Server Manager > Add Roles and Features >
File and Storage Services > File Services > Data
Deduplication > Install.
• To enable deduplication on a volume, click Server Manager >
File and Storage Services > Volumes; right-click the intended
volume, and then click Configure Data Deduplication > Enable
data deduplication, Set schedule, or Exclude folders.
Before Microsoft IT deployed the deduplication feature, they
first evaluated candidate servers and data for savings. After
candidates were identified, rollout plans, scale, and policies
were developed.
Optimize Data
Although deduplication does not affect performance while
running, it can take anywhere from several hours to several days
to finish, depending on the amount of data. Microsoft IT
customized the optimization schedule to start right before the
end of peak business hours and run through the tasks during
off-peak business hours.
• To schedule a task, type Task Scheduler in the Search bar. In
Task Scheduler, use the drop-down menu to access the Task
Scheduler Library and the list of tasks that can be scheduled.
For example: Task Scheduler Library > Microsoft > Windows >
Deduplication > Garbage Collection/Scrubbing.
Checking Deduplication Results
Microsoft IT used Windows PowerShell to check the results of
deduplication by running Get-DedupVolume <VolumePath> | fl.
The returned information includes drive capacity, saving rates,
excluded folders and file types, and chunk redundancy
thresholds, as well as other volume data that helps Microsoft IT
validate deduplication results.
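The saving rate in that output can be cross-checked against the volume's logical and physical sizes. The figures below are made up for illustration; the arithmetic is simply the saved fraction of the logical data size:

```python
def savings_rate(logical_bytes, physical_bytes):
    """Fraction of space saved: (logical - physical) / logical."""
    return (logical_bytes - physical_bytes) / logical_bytes

TB = 1000**4  # decimal terabyte, matching the capacity figures in this guide
# Hypothetical volume: 10 TB of file data held in 6 TB of physical space.
rate = savings_rate(logical_bytes=10 * TB, physical_bytes=6 * TB)
print(f"{rate:.0%}")  # 40%
```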
Monitoring, Maintenance, and Troubleshooting
Through ongoing monitoring, Microsoft IT is able to obtain the
rate of optimization savings and follow the status and progress of
ongoing optimizations. Inspecting data scrubbing reports is also
part of the monitoring effort.
Microsoft IT also performs maintenance activities that include
running additional Scrubbing and Garbage Collection as required
for further optimization.
For troubleshooting, Microsoft IT can query Dedup metadata and
unoptimize a volume.
Deduplication Benefits
Microsoft IT has successfully deployed deduplication on 12
Windows Server 2012 file servers, focusing on the deduplication
of primary data, such as general file shares, software deployment
shares, and virtual hard disk (VHD) libraries. With an average of
10 TB of storage capacity for each server, the initial overall
savings rate averaged 40 percent. The two VHD drives that
store archival data had a saving rate that was close to
90 percent.
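Assuming the stated averages hold uniformly across the deployment (an approximation, since per-server savings varied), the headline numbers work out to:

```python
# Rough scale of the deployment described above (averages, so approximate).
servers = 12
avg_capacity_tb = 10   # average storage capacity per server, in TB
savings_pct = 40       # initial overall savings rate, percent

total_tb = servers * avg_capacity_tb
saved_tb = total_tb * savings_pct // 100
print(total_tb, saved_tb)  # 120 48 -> roughly 48 TB reclaimed across 120 TB
```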
In addition to the reduced data volume, Microsoft IT saw
performance and capacity improvements as well as cost
savings.
Reduced Costs
Based on the current rate of deduplication savings, the table
below shows the deduplication savings forecasted by
Microsoft IT for the current and next two fiscal years.

Type                                  FY13            FY14            FY15
File Share Data Storage Capacity      680 TB          820 TB          1,100 TB
Physical Storage and Management Cost  $2,040,000 USD  $2,460,000 USD  $3,300,000 USD
Storage and Management Cost
(with deduplication)                  $1,224,000 USD  $1,476,000 USD  $1,980,000 USD
Storage Savings                       $816,000 USD    $984,000 USD    $1,320,000 USD

Reduced Data Volume
Microsoft IT reduced their data volume through deduplication
without compromising the data contents, fidelity, or integrity.
Deduplication optimized data volumes, depending on data type,
by more than 80 percent. For example, for 10 TB of VHD data,
deduplication can save 8 TB in drive space, requiring only 2 TB
of drive space for 10 TB of data!
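A quick check that the forecast rows are internally consistent with the 40 percent savings rate reported earlier (FY13 figures from the cost table above; integer arithmetic to keep the dollar amounts exact):

```python
# Cross-check of the FY13 forecast row at a 40 percent savings rate.
physical_cost = 2_040_000              # USD: physical storage and management cost
savings = physical_cost * 40 // 100    # 40 percent of the physical cost
cost_after_dedup = physical_cost - savings

print(savings)           # 816000  -> $816,000 USD, the Storage Savings row
print(cost_after_dedup)  # 1224000 -> $1,224,000 USD, the deduplicated cost row
```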
RESOURCES
Typical Deduplication Storage Savings

Scenario                  Content type                                 Space savings
User documents            Documents, photos, music, videos             30–50%
Deployment shares         Software binaries, cab files, symbols files  70–80%
Virtualization libraries  VHD files                                    80–95%
General file share        All of the above                             50–60%
Data Deduplication Overview:
http://technet.microsoft.com/en-us/library/hh831602.aspx
About Data Deduplication:
http://msdn.microsoft.com/en-us/library/windows/desktop/Hh769303(v=vs.85).aspx
Windows Server 2012:
http://www.microsoft.com/en-us/server-cloud/windows-server/default.aspx
Improved Performance and Capacity
These savings provided Microsoft IT with smaller and faster
data backup-and-restore processes as well as efficient
archiving and quick data migration. As a result of the
storage space savings, Microsoft IT could add more data
using less hardware.
With the storage savings Microsoft IT enjoyed after
implementing deduplication, there is now capacity for 200 TB of
storage in an environment that used to support 120 TB, without
the addition of physical capacity.
Deduplication has enabled Microsoft IT to expand their file share
and client data backup services, scale those services to
accommodate additional storage needs, simplify storage
management, and reduce costs.