EMC Data Domain: Data Protection and Deduplication

© Copyright 2010 EMC Corporation. All rights reserved.

Why backup?
• Goals
  – Backups are done for restores: operational recovery and disaster recovery
  – Disaster recovery requires offsite backup
  – Operational recovery requires onsite backup
  – Need both onsite and offsite copies on disk
  – Need quick restores; there is no time for moving physical assets
  – Protection of personal data & intellectual property

Why So Much Interest in Data Deduplication?
• Backup & archive processes have been overwhelmed by information growth
• Primary storage efficiency has become a necessity to cope with massive growth
• ROI drives the compelling appeal of dedupe
  – Reduced storage capacities
  – Lower infrastructure costs
  – Improved SLAs
  – Efficient replication for business continuance/DR
• One of the top 10 technology considerations
  – Deduplication: 59% rate it very important
  – Deploying deduplication: 24% in use; 55% evaluating or in near- to long-term plan; 21% not in plan
  – Source: TheInfoPro Wave 11 Storage Study, 2008

Why Do Enterprises Still Use Tape?
• Low upfront cost
• Tape can store the massive amount of redundant data created by backups
• Transportable for offsite DR
[Figure: Primary Storage (DISK) and Backup Storage (TAPE); backup storage is 5x–10x primary]

EMC Data Domain: Leadership and Innovation
• Deduplication storage systems
  – More than 12,000 systems installed
  – More than 4,300 customers
  – More than 2,600 PB under Data Domain protection worldwide
• A history of industry firsts
[Timeline figure, 2003–2010. Milestones shown: First Deduplication NAS; First Deduplication Virtual Tape Library; First Deduplication Volume Replication; Largest Deduplication Array; Fastest Backup Controller; First Deduplication Encryption; First Deduplication Directory Replication; First Deduplication Nearline Storage]
[Timeline, continued: 2009; Cascaded Replication; First Distributed Processing]

Data Domain – works with what you have
[Figure labels: Backup, Archive, Database, VMware]

De-duplication principles
• Unique segments (4 KB–12 KB); segment size varies "on the fly"

Data Deduplication: Technology Overview
Store more backups in a smaller footprint
[Figure: backup data split into lettered segments (A–L) across the Friday full backup, the Monday to Thursday incrementals, and the second Friday full backup, alongside the unique segment set A B C D E F G H I J K L]

  Backup                   Logical    Estimated reduction    Physical
  Friday full              1 TB       2–4x                   250 GB
  Monday incremental       100 GB     7–10x                  10 GB
  Tuesday incremental      100 GB     7–10x                  10 GB
  Wednesday incremental    100 GB     7–10x                  10 GB
  Thursday incremental     100 GB     7–10x                  10 GB
  Second Friday full       1 TB       50–60x                 18 GB
  TOTAL                    2.4 TB     7.8x                   308 GB

Deduplication Dramatically Reduces Storage Capacity Requirements
• Deduplication: 10–30 times less data stored versus fulls + incrementals with typical retention policies
[Chart: data stored (axis 0–30) versus weeks in use (axis 1–20); series: deduplication storage, traditional storage]

Data Domain Scale
Data Domain SISL™ Scalable Architecture: CPU-Centric
• DD880 (7/09): industry's fastest backup storage controller
• 6-year improvement
  – Throughput: ~90x
  – Capacity: ~225x
[Chart: throughput in GB/sec (axis ticks 0.04, 1.5, 3, 5) versus addressable capacity in TB, post-RAID (physical) (axis ticks 1.25, 70, >PB); points labeled DD200 (2004), DD880 (7/09), and 2011 (est.)]

Inline vs Post-Process Deduplication: Provisioning & Admin
• Post-process: deduplication after storing
  – At least 3x disk accesses to shared store
• Inline: deduplication before storing
[Diagram labels: Store, Dedupe, Dedupe, Undedupe?]
[Diagram labels, continued: Replicate, Replicate?, Restore, Restore]
Post-process:
• Process contention increases with the number of processes
  – Copy to tape: too slow to stream tape
  – Recovery: SLA predictability
  – Replication: poor time-to-DR
  – Deduplication itself, if interleaved with backup or restore
• More admin needed to fight these issues
Inline:
• Other activities unimpeded
  – Predictable
  – Simpler

Data Integrity: Data Invulnerability Architecture
Trust but verify: “hope” is not a strategy
• Data verification
  – Generate checksum
  – Deduplication, write to disk
  – Verify the file system metadata integrity
  – Verify user data integrity
  – Verify stripe integrity
• Self-healing file system
  – Cleaning expired data
  – Defrag
  – Verify
• Other
  – RAID 6
  – NVRAM
  – Snapshots
[Diagram labels: Data, Checksum, Verify, File System, Global Compression, Local Compression, RAID]

Network-Efficient Replication for True Disaster Recovery
Lowers WAN costs; improves service level agreements
• Flexible replication
  – One-to-many
  – Many-to-one
  – Bi-directional
  – System-to-system
  – Cascaded
• 95–99% cross-site bandwidth reduction
• Supports hundreds of remote sites
[Diagram: Source: remote sites (Data Domain systems holding DB, DIR A, Home, archive data, and backup data), each WAN link labeled 1–5%; Destination: data center hub (Data Domain DDX Array with DD880s)]
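The replication slide says only a small share of the backup data crosses the WAN. One way to picture how deduplication makes that possible is the minimal Python sketch below. It is an illustration under stated assumptions, not Data Domain's implementation: the class DedupStore, the function replicate, the fixed 8 KB segment size (the deck describes variable 4 KB–12 KB segments), and the use of SHA-256 fingerprints are all hypothetical choices made for the example.

```python
import hashlib

# Assumption: fixed 8 KB segments. The deck describes variable-size
# segments (4 KB-12 KB); fixed-size splitting is a simplification.
SEGMENT_SIZE = 8 * 1024


def split_segments(data: bytes, size: int = SEGMENT_SIZE) -> list:
    """Split data into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(segment: bytes) -> str:
    """Identify a segment by its SHA-256 digest (an assumed fingerprint choice)."""
    return hashlib.sha256(segment).hexdigest()


class DedupStore:
    """Hypothetical toy store that keeps each unique segment once."""

    def __init__(self) -> None:
        self.segments = {}  # fingerprint -> segment bytes
        self.files = {}     # file name -> ordered list of fingerprints

    def write(self, name: str, data: bytes) -> int:
        """Store a file inline-deduplicated; return the new bytes physically stored."""
        new_bytes = 0
        recipe = []
        for seg in split_segments(data):
            fp = fingerprint(seg)
            if fp not in self.segments:  # only unseen segments consume space
                self.segments[fp] = seg
                new_bytes += len(seg)
            recipe.append(fp)
        self.files[name] = recipe
        return new_bytes

    def read(self, name: str) -> bytes:
        """Rebuild a file from its list of fingerprints."""
        return b"".join(self.segments[fp] for fp in self.files[name])


def replicate(source: DedupStore, dest: DedupStore, name: str) -> int:
    """Copy one file to dest, sending only segments dest lacks; return bytes sent."""
    recipe = source.files[name]
    sent = 0
    for fp in dict.fromkeys(recipe):  # each distinct fingerprint once, in order
        if fp not in dest.segments:
            seg = source.segments[fp]
            if fingerprint(seg) != fp:  # verify before trusting the segment
                raise ValueError("segment failed checksum verification")
            dest.segments[fp] = seg
            sent += len(seg)
    dest.files[name] = list(recipe)
    return sent


if __name__ == "__main__":
    remote, hub = DedupStore(), DedupStore()
    # Synthetic data: 100 distinct segments, then a second backup in which
    # only the first two segments changed.
    monday = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(100))
    tuesday = (b"".join(bytes([200 + i]) * SEGMENT_SIZE for i in range(2))
               + monday[2 * SEGMENT_SIZE:])
    remote.write("monday", monday)
    remote.write("tuesday", tuesday)
    replicate(remote, hub, "monday")
    sent = replicate(remote, hub, "tuesday")
    print(f"second backup: sent {sent / len(tuesday):.1%} of logical size")
```

In this constructed data set the second backup differs from the first in 2 of 100 segments, so only those two segments are sent. Real change rates depend on the data and are not implied by the sketch.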

Industry’s Most Scalable Inline Deduplication Systems
• DD140 Remote Office Appliance
• DD600 Appliance Series
• DD880
• Global Deduplication Array (new)
• DDX Array Series: up to 16 controllers
• Software options: DD Boost, DD Virtual Tape Library, DD Replicator, Retention Lock, and DD Encryption

  Model                        Speed (Other)   Speed (DD Boost)   Logical capacity   Raw capacity     Usable capacity
  DD140                        450 GB/hr       490 GB/hr          17–43 TB           1.5 TB           0.86 TB
  DD610                        675 GB/hr       1.3 TB/hr          75–195 TB          Up to 6 TB       Up to 3.98 TB
  DD630                        1.1 TB/hr       2.1 TB/hr          165–420 TB         Up to 12 TB      Up to 8.4 TB
  DD660                        2.0 TB/hr       2.7 TB/hr          0.520–1.31 PB      Up to 36 TB      Up to 26.1 TB
  DD690                        2.7 TB/hr       3.9 TB/hr          0.710–1.7 PB       Up to 48 TB      Up to 35.3 TB
  DD880                        5.4 TB/hr       8.8 TB/hr          2.8–7.1 PB         Up to 192 TB     Up to 142.5 TB
  Global Deduplication Array   not listed      12.8 TB/hr         5.7–14.2 PB        Up to 384 TB     Up to 285 TB
  DDX Array                    86.4 TB/hr      140 TB/hr          45.6–114 PB        Up to 3.07 PB    Up to 2.28 PB

Why Data Domain?
• Less disk to resource, less to manage
  – CPU-centric deduplication
  – Inline
  – Green
• Simple, mature, and flexible
  – Simple, mature appliance
  – Nearline tier: any fabric, any software, backup or nearline applications
• Resilience and disaster recovery
  – Storage of last resort
  – Cross-site global compression: data center or remote office
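
The "less disk to resource" point can be made concrete with the weekly backup figures from the Technology Overview slide. The short Python sketch below recomputes the totals and the overall reduction ratio from the per-backup numbers. The names backups and reduction_ratio are hypothetical, and 1 TB is taken as 1000 GB, which is an assumption about the slide's units.

```python
# Figures copied from the weekly backup table: (backup, logical GB, physical GB).
# Assumption: 1 TB is treated as 1000 GB.
backups = [
    ("Friday full", 1000, 250),
    ("Monday incremental", 100, 10),
    ("Tuesday incremental", 100, 10),
    ("Wednesday incremental", 100, 10),
    ("Thursday incremental", 100, 10),
    ("Second Friday full", 1000, 18),
]


def reduction_ratio(logical_gb: float, physical_gb: float) -> float:
    """Logical data written divided by physical data actually stored."""
    return logical_gb / physical_gb


total_logical = sum(logical for _, logical, _ in backups)
total_physical = sum(physical for _, _, physical in backups)

if __name__ == "__main__":
    for name, logical, physical in backups:
        print(f"{name:<22} {reduction_ratio(logical, physical):5.1f}x")
    print(f"{'TOTAL':<22} {reduction_ratio(total_logical, total_physical):5.1f}x "
          f"({total_logical} GB logical, {total_physical} GB physical)")
```

The totals come to 2400 GB logical and 308 GB physical, which rounds to the 7.8x shown in the table. The second full backup dominates the saving because almost all of its segments were already stored by the first week's backups.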