EMC Data Domain:
Data Protection and Deduplication
© Copyright 2010 EMC Corporation. All rights reserved.
Why backup?
• Goals
– Backups are done for restores
   • Operational
   • Disaster Recovery
– Disaster recovery requires offsite backup
– Operational recovery requires onsite backup
– Need both onsite and offsite copies on disk
– Need quick restores
   • Don’t have time for moving physical assets
– Protection of personal data & intellectual property
Why So Much Interest in Data Deduplication?
• Backup & Archive processes have been overwhelmed by information growth
• Primary storage efficiency has become a necessity to cope with massive growth
• ROI drives the compelling appeal of Dedupe
– Reduced Storage Capacities
– Lower Infrastructure Costs
– Improved SLAs
– Efficient Replication for Business Continuance/DR
One of the top 10 Technology Considerations
• Deduplication: 59% rate it "Very important"
Deploying Deduplication
• 24% In use
• 55% Evaluating / In Near-Long Term plan
• 21% Not in Plan
Source: TheInfoPro Wave 11 Storage Study, 2008
Why Do Enterprises Still Use Tape?
• Low upfront cost
• Tape can store the massive amount of redundant data created by backups
• Transportable for offsite DR
[Diagram: primary storage on DISK; backup storage on TAPE, 5x-10x the size of primary storage]
EMC Data Domain: Leadership and Innovation
• Deduplication storage systems
– More than 12,000 systems installed
– More than 4,300 customers
– More than 2,600 PB under Data Domain protection worldwide
• A history of industry firsts (timeline, 2003-2010)
– First Deduplication NAS
– First Deduplication Virtual Tape Library
– First Deduplication Volume Replication
– First Deduplication Directory Replication
– First Deduplication Nearline Storage
– Largest Deduplication Array
– Fastest Backup Controller
– Cascaded Replication
– First Deduplication Encryption
– First Distributed Processing
Data Domain – works with what you have
Backup
Archive
Database
VMware
De-duplication principles
Data is stored as unique, variable-length segments (4 KB-12 KB); segment size varies “on-the-fly”
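As a rough illustration of this principle, the sketch below splits a byte stream into variable-length segments of 4 KB to 12 KB using a simple content-dependent boundary test, then stores each unique segment once, keyed by its SHA-256 fingerprint. The boundary test, the `MASK` value, and all function names are assumptions made for illustration; this is not Data Domain's actual segmentation algorithm.

```python
import hashlib
import random

MIN_SEG = 4 * 1024    # smallest segment, matching the 4 KB lower bound
MAX_SEG = 12 * 1024   # largest segment, matching the 12 KB upper bound
MASK = 0x0FFF         # hypothetical: cut when the low 12 hash bits are zero


def split_segments(data: bytes):
    """Yield variable-length segments whose boundaries depend on content."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        # Toy running hash (an assumption; real systems use stronger hashes).
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= MIN_SEG and (rolling & MASK) == 0
        if at_boundary or length == MAX_SEG:
            yield data[start:i + 1]
            start = i + 1
            rolling = 0
    if start < len(data):
        yield data[start:]  # final partial segment


def dedupe(stream: bytes, store: dict) -> list:
    """Store unseen segments; return the fingerprint recipe for the stream."""
    recipe = []
    for seg in split_segments(stream):
        fp = hashlib.sha256(seg).hexdigest()
        store.setdefault(fp, seg)  # only unique segments consume space
        recipe.append(fp)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original stream from its fingerprint recipe."""
    return b"".join(store[fp] for fp in recipe)


random.seed(0)
full_backup = bytes(random.getrandbits(8) for _ in range(200_000))
store = {}
first = dedupe(full_backup, store)
stored_after_first = sum(len(s) for s in store.values())
second = dedupe(full_backup, store)  # an identical second full backup
stored_after_second = sum(len(s) for s in store.values())
```

In this sketch the second, identical backup adds no new segments to the store: only its recipe of fingerprints is new.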
Data Deduplication: Technology Overview
Store more backups in a smaller footprint
[Diagram: backup data as segments. Friday Full Backup: A B C D A E F G. Mon-Thurs incrementals: repeats of segments A, B, C, E, G plus new segments H, I, J, K. Second Friday Full Backup: B C D E F L G H. Unique segments stored: A B C D E F G H I J K L.]

Backup                  Logical   Estimated Reduction   Physical
FRIDAY FULL             1 TB      2–4x                  250 GB
Monday Incremental      100 GB    7–10x                 10 GB
Tuesday Incremental     100 GB    7–10x                 10 GB
Wednesday Incremental   100 GB    7–10x                 10 GB
Thursday Incremental    100 GB    7–10x                 10 GB
Second FRIDAY FULL      1 TB      50–60x                18 GB
TOTAL                   2.4 TB    7.8x                  308 GB
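The totals in the backup schedule above can be checked with a few lines of arithmetic. The sketch below uses the logical and physical sizes exactly as listed, and assumes 1 TB = 1000 GB, which is what makes the 2.4 TB total work out.

```python
# (backup, logical GB, physical GB) taken from the schedule above.
# Assumption: 1 TB = 1000 GB.
backups = [
    ("Friday full", 1000, 250),
    ("Monday incremental", 100, 10),
    ("Tuesday incremental", 100, 10),
    ("Wednesday incremental", 100, 10),
    ("Thursday incremental", 100, 10),
    ("Second Friday full", 1000, 18),
]

total_logical = sum(logical for _, logical, _ in backups)
total_physical = sum(physical for _, _, physical in backups)
overall_reduction = total_logical / total_physical

for name, logical, physical in backups:
    ratio = logical / physical
    print(f"{name:<22} {logical:>5} GB -> {physical:>4} GB  ({ratio:.1f}x)")
print(f"{'TOTAL':<22} {total_logical:>5} GB -> {total_physical:>4} GB  "
      f"({overall_reduction:.1f}x)")
```

The sums come to 2400 GB logical and 308 GB physical, which gives the 7.8x overall reduction shown in the TOTAL row.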
Deduplication Dramatically Reduces Storage Capacity Requirements
10–30 times less data stored versus fulls + incrementals with typical retention policies
[Chart: data stored (0 to 30) versus weeks in use (1 to 20), comparing deduplication storage with traditional storage]
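To see how a cumulative figure in this range can arise, the sketch below projects the ratio of logical to physical data over 20 weeks of retained backups. It reuses the per-backup sizes from the technology overview (a 1 TB full stored as 250 GB the first time and 18 GB afterwards, and 100 GB incrementals stored as 10 GB each). The assumption that every later week behaves like the second week is ours, not a figure from the slides.

```python
FIRST_WEEK_PHYSICAL_GB = 250 + 4 * 10  # first full plus four incrementals
LATER_WEEK_PHYSICAL_GB = 18 + 4 * 10   # assumed: later weeks match the second full
WEEKLY_LOGICAL_GB = 1000 + 4 * 100     # what traditional storage would hold per week


def cumulative_ratio(weeks: int) -> float:
    """Logical-to-physical ratio after `weeks` weeks of retained backups."""
    logical = WEEKLY_LOGICAL_GB * weeks
    physical = FIRST_WEEK_PHYSICAL_GB + LATER_WEEK_PHYSICAL_GB * (weeks - 1)
    return logical / physical


for weeks in (1, 5, 10, 15, 20):
    print(f"week {weeks:>2}: {cumulative_ratio(weeks):.1f}x")
```

Under these assumptions the ratio is above 10x at week 5 and about 20x at week 20, which is consistent with the 10–30x range quoted above.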
Data Domain Scale
Data Domain SISL™ Scalable Architecture: CPU-Centric
6-Year Improvement
• Throughput: ~90x
• Capacity: ~225x
[Chart: throughput in GB/sec. versus addressable capacity in TB, post-RAID (physical), from the DD200 (2004) through the DD880 (7/09, "Industry’s Fastest Backup Storage Controller") to a 2011 estimate. Throughput values shown: 0.04, 1.5, 3, 5. Capacity values shown: 1.25, 70, >PB.]
Inline vs Post-Process Deduplication: Provisioning & Admin
Post Process: Deduplication After Storing
• Flow: Store, Dedupe, Undedupe?, Replicate?, Restore
• At least 3x disk accesses to shared store
• Process contention increases with #processes
− Copy to tape: Too slow to stream tape
− Recovery: SLA predictability
− Replication: Poor time-to-DR
− Deduplication itself if interleaved with backup or restore
• More admin needed to fight these issues
Inline: Deduplication Before Storing
• Flow: Dedupe, Replicate, Restore
• Other activities unimpeded
− Predictable
− Simpler
Data Integrity: Data Invulnerability Architecture
Trust but verify: “hope” is not a strategy
Data verification
• Generate checksum
• Deduplication, write to disk
• Verify
– Verify the file system metadata integrity
– Verify user data integrity
– Verify stripe integrity
[Diagram: storage stack of Data, File System, Global Compression, Local Compression, RAID, with checksum and verify steps]
Self-healing file system
• Cleaning
• Expired data
• Defrag
• Verify
Other
• RAID 6
• NVRAM
• Snapshots
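The "generate checksum, write to disk, verify" sequence can be sketched as follows. This is a minimal illustration using SHA-256 and an ordinary file; the function names are hypothetical and it does not represent how the Data Invulnerability Architecture is implemented internally.

```python
import hashlib
import os
import tempfile


def write_and_verify(path: str, payload: bytes) -> str:
    """Write payload, read it back, and confirm the checksum still matches."""
    expected = hashlib.sha256(payload).hexdigest()  # generate checksum on ingest
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to stable storage
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()  # re-read and recompute
    if actual != expected:
        raise IOError(f"verification failed for {path}")
    return expected


def verify_at_rest(path: str, expected: str) -> bool:
    """Later re-check that stored data still matches its checksum."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected


with tempfile.TemporaryDirectory() as tmp:
    segment_path = os.path.join(tmp, "segment.bin")
    checksum = write_and_verify(segment_path, b"backup segment")
    intact = verify_at_rest(segment_path, checksum)
    with open(segment_path, "r+b") as f:  # simulate corruption of one byte
        f.write(b"X")
    corrupted_detected = not verify_at_rest(segment_path, checksum)
```

The second check shows why verification is repeated after the initial write: a change made to the stored bytes later is caught only by re-reading and re-hashing them.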
Network-Efficient Replication for True Disaster Recovery
Lowers WAN costs; improves service level agreements
Flexible replication
• One-to-many
• Many-to-one
• Bi-directional
• System-to-system
• Cascaded
95–99% cross-site bandwidth reduction
[Diagram: Source: remote sites, each with a Data Domain system holding backup data, archive data, DB, Home, and DIR A, send 1–5% of the data over the WAN. Destination: Data Center Hub, a Data Domain DDX Array with DD880s. Supports hundreds of remote sites.]
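The effect of sending only 1–5% of the data across the WAN can be made concrete with a small calculation. The backup size and link speed below are hypothetical values chosen for illustration; only the 1% and 5% fractions come from the slide.

```python
def wan_transfer(logical_gb: float, sent_fraction: float, link_mbps: float):
    """Return (GB sent over the WAN, hours needed) for one replication cycle."""
    sent_gb = logical_gb * sent_fraction
    seconds = sent_gb * 8 * 1000 / link_mbps  # GB -> megabits, then / (Mbit/s)
    return sent_gb, seconds / 3600


LOGICAL_GB = 100  # hypothetical nightly backup size at a remote site
LINK_MBPS = 10    # hypothetical WAN link speed

for fraction in (1.0, 0.05, 0.01):  # no deduplication, then the 5% and 1% cases
    sent, hours = wan_transfer(LOGICAL_GB, fraction, LINK_MBPS)
    print(f"{fraction:>5.0%} sent: {sent:6.1f} GB, about {hours:5.2f} h")
```

With these example numbers, replicating the full 100 GB would occupy the link for most of a day, while the 5% case finishes in a little over an hour.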
Industry’s Most Scalable Inline Deduplication Systems
• DD140 Remote Office Appliance
• DD600 Appliance Series
• DD880
• New: Global Deduplication Array
• DDX Array Series: Up to 16 Controllers
Software options: DD Boost, DD Virtual Tape Library, DD Replicator, Retention Lock, and DD Encryption

Model                        Speed (Other)   Speed (DD Boost)   Logical capacity   Raw capacity    Usable capacity
DD140                        450 GB/hr       490 GB/hr          17–43 TB           1.5 TB          0.86 TB
DD610                        675 GB/hr       1.3 TB/hr          75–195 TB          Up to 6 TB      Up to 3.98 TB
DD630                        1.1 TB/hr       2.1 TB/hr          165–420 TB         Up to 12 TB     Up to 8.4 TB
DD660                        2.0 TB/hr       2.7 TB/hr          0.520–1.31 PB      Up to 36 TB     Up to 26.1 TB
DD690                        2.7 TB/hr       3.9 TB/hr          0.710–1.7 PB       Up to 48 TB     Up to 35.3 TB
DD880                        5.4 TB/hr       8.8 TB/hr          2.8–7.1 PB         Up to 192 TB    Up to 142.5 TB
Global Deduplication Array   -               12.8 TB/hr         5.7–14.2 PB        Up to 384 TB    Up to 285 TB
DDX Array                    86.4 TB/hr      140 TB/hr          45.6–114 PB        Up to 3.07 PB   Up to 2.28 PB
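One way to read the logical capacity figures is as usable capacity multiplied by an assumed deduplication ratio. The sketch below computes the ratio implied by each model's figures; pairing each usable capacity with a logical capacity range follows the order in which the values are listed for the models, which is an assumption about the table layout.

```python
# model: (usable TB, (logical low TB, logical high TB)), from the figures above.
# Assumption: 1 PB = 1000 TB.
models = {
    "DD140": (0.86, (17, 43)),
    "DD610": (3.98, (75, 195)),
    "DD630": (8.4, (165, 420)),
    "DD660": (26.1, (520, 1310)),
    "DD690": (35.3, (710, 1700)),
    "DD880": (142.5, (2800, 7100)),
    "Global Deduplication Array": (285, (5700, 14200)),
    "DDX Array": (2280, (45600, 114000)),
}

# Implied reduction ratio = logical capacity / usable capacity.
implied = {
    name: (low / usable, high / usable)
    for name, (usable, (low, high)) in models.items()
}

for name, (low_x, high_x) in implied.items():
    print(f"{name:<28} {low_x:4.1f}x to {high_x:4.1f}x")
```

Every model works out to roughly 20x at the low end and roughly 50x at the high end, which suggests the logical capacity ranges were derived from a common assumed reduction range.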
Why Data Domain?
• Less disk to resource, less to manage
– CPU-centric deduplication
– Inline
– Green
• Simple, mature, and flexible
– Simple, mature appliance
– Nearline tier: any fabric, any software, backup or nearline applications
• Resilience and disaster recovery
– Storage of last resort
– Cross-site global compression: data center or remote office