Advanced Data Deduplication Solutions for Medium, Large and

IBM ProtecTIER Deduplication Solutions
© 2011 IBM Corporation
Got too much data?
And not enough (blank) to store it all?
Time, money, people, floor space, electricity, air conditioning
Protect More. Store Less.®
The tidal wave of data continues …
• The amount of digital information continues to grow exponentially
• And we need to keep more of it, longer
• And the costs of losing data are increasingly unacceptable
– Lost revenues
– Lost customer confidence
– Embarrassment in the market
– Fines from contracts and government agencies
– CEO and CFO could go to jail
• But budgets are not increasing
Chart (2005 to 2010): data created and copied is expected to grow at a 48% CAGR through 2010.
We need to do more with less, and we need to do it smarter.
Source: Various external consultant reports
Survey: what are your two biggest storage pain points?
Bar chart (0% to 50% of respondents) across these categories:
• Managing Storage Growth
• Proper Capacity Forecasting & Storage Reporting
• Backup Administration & Management
• Managing Costs
• Managing Complexity
• Storage Provisioning
• Archiving and Archive Management
• Dealing with Performance Problems
• Data Mobility
• Lack of Integrated Tools
• Regulatory Compliance
• Power Management
• Vendor Management
Source: TheInfoPro Storage Study, F1000 sample (n=149; Other n=14). Multiple responses recorded.
The pressures on backup administrators are growing
Diagram: the Growth, Backup, Manage, Recover cycle
• More new data coming
• Backup takes longer
• Can’t buy more storage
• Recovery takes longer
Using the right balance of high density tape and high performance disk will help . . .
Long Term Retention
• Cost effective capacity
• Removable & transportable
Compliance
• Meet financial & regulatory requirements
• Data encryption, WORM
Short Term Retention
• Use disk for daily backup & restore operations
Performance
• Fast backups
• Even faster restores
• Meet “backup windows”
And data deduplication is the key to using more disk more cost effectively!
ProtecTIER Overview
ProtecTIER reduces the required backup disk capacity by up to 25 times!
IBM ProtecTIER Deduplication Innovation and Leadership
Timeline (2003 to 2011):
• 6 PhDs begin researching massively scalable deduplication algorithms
• First deduplication Virtual Tape Library deployed into production
• First non-hash deduplication algorithm developed, designed for 100% data integrity
• First single node system to store over 1PB of deduplicated data
• First to deliver VTL solutions for both Open and Mainframe environments
• First to deliver Many-to-Many replication
• Fastest single node inline deduplication solution
• The only “true” enterprise-class deduplication solution on the market today
• IBM acquires Diligent
• First deduplication solution for System z
• Fastest restore speed – up to 2800 MB/sec!
• First true clustered system with Global Deduplication
• IBM’s first midrange solution released

• Installed in all major industries
– Over 1,400 ProtecTIER systems sold to date
– Production systems range in size from 5TB to over 700TB
– Over 90 PB of physical disk capacity behind ProtecTIER servers in production, protecting thousands of PBs of backup data
How ProtecTIER works
Diagram: Backup Servers send a new data stream to the ProtecTIER™ Server, where HyperFactor™ checks it against a memory-resident index. Only “filtered” data is written to the repository.
• Only 4GB needed to map 1PB of physical disk!
• Backup with inline deduplication: up to 1400MB/sec per server or 2000MB/sec with a 2 node cluster!
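To put “4GB to map 1PB” in perspective, the sketch below computes index bytes per MB of physical disk, assuming decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes). For contrast, it also sizes a hypothetical index with one entry per stored chunk. The 8 KB average chunk size and the 20-byte entry size are illustrative assumptions, not figures for any specific product.

```python
GB = 10**9   # decimal units assumed throughout
TB = 10**12
PB = 10**15


def index_bytes_per_mb(index_bytes: int, physical_bytes: int) -> float:
    """Bytes of index needed per MB (10**6 bytes) of physical disk."""
    return index_bytes / (physical_bytes / 10**6)


def hash_index_bytes(physical_bytes: int, chunk_bytes: int, entry_bytes: int) -> int:
    """Size of a hypothetical index holding one fixed-size entry per stored chunk."""
    return (physical_bytes // chunk_bytes) * entry_bytes


# Figures from the slide: 4 GB of index for 1 PB of physical disk.
print(index_bytes_per_mb(4 * GB, 1 * PB))            # 4.0 bytes per MB
# Hypothetical per-chunk index: 8 KB chunks, 20-byte entries (assumptions).
print(hash_index_bytes(1 * PB, 8 * 1000, 20) / TB)   # 2.5 TB
```

Under these assumptions, a per-chunk index is several hundred times larger than 4 GB. That is why the size of the in-memory index matters for scaling.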
ProtecTIER Deduplication Operation and Results Example
Backup application writes data to ProtecTIER as it would to tape.
Only unique data is stored; existing duplicate data is referenced.
When data objects expire, references are removed, and the freed space is reclaimed and reused.
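The three steps above can be illustrated with a toy fixed-size-chunk store in Python. This is a generic sketch, not HyperFactor. The 4-byte chunk size and the class and method names are hypothetical, and production systems find duplicates in much larger regions of data. Duplicates are matched on full chunk content, and reference counts decide when an expired backup's space can be reclaimed.

```python
CHUNK_SIZE = 4  # hypothetical fixed chunk size in bytes (tiny, for readability)


class DedupStore:
    """Toy store: each unique chunk is kept once; backups hold references."""

    def __init__(self) -> None:
        self._ids: dict[bytes, int] = {}          # chunk content -> chunk id
        self._chunks: dict[int, bytes] = {}       # chunk id -> content, stored once
        self._refs: dict[int, int] = {}           # chunk id -> reference count
        self._objects: dict[str, list[int]] = {}  # backup object -> its chunk ids
        self._next_id = 0

    def write(self, name: str, data: bytes) -> None:
        if name in self._objects:  # overwriting an object releases its old refs
            self.expire(name)
        ids = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            cid = self._ids.get(chunk)  # match on the full chunk content
            if cid is None:             # unique data: store it once
                cid = self._next_id
                self._next_id += 1
                self._ids[chunk] = cid
                self._chunks[cid] = chunk
                self._refs[cid] = 0
            self._refs[cid] += 1        # every occurrence adds one reference
            ids.append(cid)
        self._objects[name] = ids

    def expire(self, name: str) -> None:
        for cid in self._objects.pop(name):
            self._refs[cid] -= 1
            if self._refs[cid] == 0:    # last reference gone: reclaim the space
                del self._ids[self._chunks.pop(cid)]
                del self._refs[cid]

    def read(self, name: str) -> bytes:
        return b"".join(self._chunks[cid] for cid in self._objects[name])

    def stored_bytes(self) -> int:
        return sum(len(chunk) for chunk in self._chunks.values())


store = DedupStore()
store.write("full-1", b"AAAABBBBCCCC")
store.write("full-2", b"AAAABBBBDDDD")  # only DDDD is new
print(store.stored_bytes())  # 16
store.expire("full-1")       # CCCC loses its last reference
print(store.stored_bytes())  # 12
```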
Backup Event             Amount Received   Amount Stored   Dedupe Ratio
First Full Backup        1 TB              250 GB          4:1
Incremental Backup       100 GB            10 GB           4.2:1
Incremental Backup       100 GB            10 GB           4.4:1
Second Full Backup       1 TB              10 GB           7.8:1
Incremental Backup       100 GB            10 GB           8:1
Third Full Backup        1 TB              10 GB           11:1
After two months . . .   7.8 TB            350 GB          22:1
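The Dedupe Ratio column appears to be cumulative: total data received so far divided by total data stored so far. The short calculation below reproduces it from the table's own figures, assuming 1 TB = 1000 GB. The slide seems to round or truncate to one decimal place; for example, 2200/280 ≈ 7.86 is shown as 7.8:1.

```python
# (event, GB received, GB stored) taken from the table; 1 TB assumed = 1000 GB
events = [
    ("First Full Backup", 1000, 250),
    ("Incremental Backup", 100, 10),
    ("Incremental Backup", 100, 10),
    ("Second Full Backup", 1000, 10),
    ("Incremental Backup", 100, 10),
    ("Third Full Backup", 1000, 10),
]


def cumulative_ratios(rows):
    """Running ratio of total data received to total data stored."""
    received = stored = 0
    ratios = []
    for _event, rx, st in rows:
        received += rx
        stored += st
        ratios.append(received / stored)
    return ratios


for (event, _, _), ratio in zip(events, cumulative_ratios(events)):
    print(f"{event:<20} {ratio:.2f}:1")

# The "after two months" row uses the same formula: 7.8 TB received / 350 GB stored.
print(f"After two months     {7800 / 350:.1f}:1")
```

Because the first full backup pays for almost all of the stored capacity, each later full backup that adds little new data raises the ratio sharply.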
Storage Impact from ProtecTIER Deduplication
Diagram: the Master Server and Backup Server write to the ProtecTIER Server. The represented capacity is much larger than the physical capacity behind it.
Store up to 25 times more backup data on a given physical storage capacity.
Significantly Reduces Replication Bandwidth
Diagram: at the Primary Site, a Backup Server writes to a ProtecTIER Gateway whose represented capacity exceeds its physical capacity. The gateway replicates over an IP-based WAN link to a ProtecTIER Gateway at the Secondary Site, which also has a Backup Server and a tape library.
• Deduplication enables large amounts of data to be replicated with significantly less bandwidth
• Virtual cartridges can be cloned to tape at the DR site
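The bandwidth saving comes from sending only data the secondary site does not already hold. The sketch below shows that idea at the chunk level. The chunk names and the `chunks_to_send` helper are hypothetical, and ProtecTIER's actual native replication protocol is not described here. A real protocol would also send small references for the duplicate chunks.

```python
from typing import Iterable


def chunks_to_send(backup: Iterable[bytes], target_has: set[bytes]) -> list[bytes]:
    """Chunks the secondary site is missing, each sent at most once."""
    known = set(target_has)
    missing = []
    for chunk in backup:
        if chunk not in known:
            missing.append(chunk)
            known.add(chunk)
    return missing


# Hypothetical example: the DR site already holds last night's chunks.
target = {b"chunk-A", b"chunk-B"}
tonight = [b"chunk-A", b"chunk-B", b"chunk-C", b"chunk-A", b"chunk-C"]
to_send = chunks_to_send(tonight, target)
logical = sum(len(c) for c in tonight)  # 35 bytes backed up
sent = sum(len(c) for c in to_send)     # 7 bytes cross the WAN
print(to_send, logical, sent)
```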
ProtecTIER Many-to-One Replication Overview
• Up to 12 branch offices (spokes): Gateways and/or Appliances
• 1 target (hub): Appliance, Gateway, single or two-node cluster
• Spokes replicate to the hub over IP-based native replication (NR) links
Diagram: Backup Servers and ProtecTIER Gateways with physical capacity at the branch offices and at the Central / DR Site.
• Virtual cartridges can be cloned to a tape library by the main-site backup (B/U) server
ProtecTIER Many-to-Many Native Replication Grid
Diagram: Sites A, B, C and D, each with a Backup Server and a ProtecTIER Gateway with physical capacity, replicating with one another.
• Up to 4 hubs in a grid
• Supports any combination of Gateways, Appliances, single or two-node clusters
ProtecTIER Support for Symantec OpenStorage (OST)
• The OST API separates the backup logic from the storage appliance logic and implementation
Diagram: the NetBackup Server (NetBackup policy and control) uses the OpenStorage API, through the ProtecTIER OST Plugin, to reach the ProtecTIER Server.
IBM ProtecTIER: backup storage appliance with deduplication and native replication
IBM ProtecTIER® Deduplication Family
TS7650G & TS7680 ProtecTIER Gateways
• Highest performance, largest capacity, high availability
• Backup: up to 2000 MB/sec; restore: up to 2800 MB/sec
• Up to 1 PB usable capacity
TS7650 ProtecTIER Appliances
• Better performance, larger capacity, scalable
• Up to 500 MB/sec
• 7 TB to 36 TB usable capacity
TS7610 ProtecTIER Appliance Express
• Good performance, entry level, easy to install
• Up to 100 MB/sec
• 4 TB and 5.4 TB usable capacity
ProtecTIER Differentiation
ProtecTIER Advantage: Data Integrity
• Unique and patented HyperFactor® deduplication technology
• The only production-proven deduplication solution not based on a hash algorithm
• Designed for 100% data integrity
• Bit-for-bit comparison of data to confirm that it is a duplicate
• Can NEVER lose data due to a hash collision
With hash-based solutions, the chance of losing data to a hash collision is low, but it is NOT ZERO; with a ProtecTIER solution, it is zero.
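The “low but not zero” risk can be quantified with the standard birthday approximation, assuming a uniformly distributed hash. The parameters in the example are hypothetical and describe no particular vendor: 1 PB of unique data in 8 KB chunks, and a 160-bit digest (the size of SHA-1).

```python
import math


def collision_probability(num_chunks: int, hash_bits: int) -> float:
    """Birthday-bound estimate of the chance that at least two distinct chunks
    share a digest, for a uniformly distributed hash_bits-bit hash:
    p ~= 1 - exp(-n(n-1) / 2**(b+1))."""
    x = num_chunks * (num_chunks - 1) / 2.0 ** (hash_bits + 1)
    return -math.expm1(-x)  # equals 1 - exp(-x), but stays accurate when p is tiny


# Hypothetical: 1 PB of unique data in 8 KB chunks, 160-bit digests.
n = 10**15 // 8000
print(f"{collision_probability(n, 160):.1e}")  # 5.3e-27
```

The result is vanishingly small but strictly positive. A system that confirms every match by comparing the actual data, as described above, does not depend on this probability at all.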
ProtecTIER Advantage: Restore Performance
• Restoring data from a ProtecTIER solution is even FASTER than backing it up
• ProtecTIER can restore at up to 2800MB/sec!
• High restore performance is not limited to certain backup applications or specific data sets, as it is with other vendors
• High restore performance is achieved on real data with a realistic 20% change rate in production environments
• Never requires agents on backup servers
Other vendors’ “CPU-centric” architectures are optimized for processing hashes, not for moving data.
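To translate the headline rates into time, the sketch below computes how long 10 TB takes at the peak figures quoted in this deck: 2000 MB/sec inline backup and 2800 MB/sec restore. It assumes decimal units and that the peak rate is sustained for the whole job. The deck's own disclaimer notes that actual throughput varies.

```python
def hours_to_move(data_tb: float, rate_mb_per_s: float) -> float:
    """Hours to move data_tb terabytes at a sustained rate (decimal units)."""
    return data_tb * 10**6 / rate_mb_per_s / 3600  # TB -> MB, then seconds -> hours


for label, rate in [("backup", 2000), ("restore", 2800)]:
    print(f"10 TB {label} at {rate} MB/sec: {hours_to_move(10, rate):.2f} h")
# 10 TB backup at 2000 MB/sec: 1.39 h
# 10 TB restore at 2800 MB/sec: 0.99 h
```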
ProtecTIER Advantage: Scalability
• A single ProtecTIER system can support up to 1 petabyte of usable capacity
• ProtecTIER supports any IBM storage system (DS8000, DS5000, XIV, etc.) and most third-party storage systems for the repository
• IBM has hundreds of ProtecTIER systems with over 100 TB of usable capacity in production environments throughout the world
• IBM always states “Usable Capacity” and never uses deceptive “RAW capacity” figures like other vendors
The hidden costs of managing, maintaining, powering and cooling multiple appliances are significant and should not be ignored!
ProtecTIER Advantage: Global Deduplication
• ProtecTIER Cluster with true Global Deduplication has been Generally Available and in production since 2008
• Supported with all major backup applications and available for all Open Systems, System z and System i platforms
• No agents or backup server upgrades required
• Other vendors’ Global Deduplication capabilities are immature and incomplete, with very few, if any, systems in production
• Other vendors’ Global Dedupe is restricted to certain models, works only with NetBackup OST, and requires agents to be installed
Many vendors claim to have Global Deduplication but create multiple separate repositories that may contain redundant data!
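The point about separate repositories holding redundant data can be shown with a small count. Each node that deduplicates only against itself keeps its own copy of shared chunks, while a single global index keeps one copy. The chunk IDs are hypothetical. The sketch does not model how a ProtecTIER cluster actually shares its repository.

```python
def stored_chunks(streams_per_node: list[list[str]], global_index: bool) -> int:
    """Count chunks stored across nodes.
    global_index=True: one shared index for all nodes (global deduplication).
    global_index=False: each node deduplicates only against itself."""
    if global_index:
        return len({c for stream in streams_per_node for c in stream})
    return sum(len(set(stream)) for stream in streams_per_node)


# Hypothetical chunk IDs: two nodes receive largely overlapping backups.
node_streams = [["c1", "c2", "c3"], ["c1", "c2", "c4"]]
print(stored_chunks(node_streams, global_index=False))  # 6
print(stored_chunks(node_streams, global_index=True))   # 4
```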
ProtecTIER Advantage: Inline Deduplication
Example: Disk activity needed to ingest and deduplicate 10 TB of backup data
Post Process Approach: Deduplicate after Storing
• 10 TB of data, hash-based post process: write 10 TB, then read 10 TB (2x)
• Requires: more storage, more I/Os, more time, more effort, more admin
ProtecTIER Inline Approach: Deduplicate before Storing
• 10 TB of data, HyperFactor: read or write 10 TB (1x)
• Results: simple, faster, easier, cheaper, efficient
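The 2x versus 1x comparison can be turned into time with simple arithmetic. The sketch follows the slide's model: post-process writes the full stream and then reads it back, while inline handles it once. It also assumes a hypothetical aggregate disk bandwidth of 1000 MB/sec, chosen only for illustration.

```python
def disk_activity_tb(ingest_tb: float, inline: bool) -> float:
    """Disk traffic (TB) to ingest and deduplicate a backup, per the slide's
    model: post-process writes then reads the stream (2x); inline is 1x."""
    return ingest_tb * (1 if inline else 2)


def hours(tb: float, disk_mb_per_s: float) -> float:
    """Hours of disk time for tb terabytes at a sustained rate (decimal units)."""
    return tb * 10**6 / disk_mb_per_s / 3600


DISK_MB_PER_S = 1000  # hypothetical aggregate disk bandwidth
for inline in (False, True):
    tb = disk_activity_tb(10, inline)
    label = "inline" if inline else "post-process"
    print(f"{label}: {tb:g} TB of disk I/O, ~{hours(tb, DISK_MB_PER_S):.1f} h")
# post-process: 20 TB of disk I/O, ~5.6 h
# inline: 10 TB of disk I/O, ~2.8 h
```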
ProtecTIER Advantage: Inline Deduplication
Diagram: two timelines running from 8:00 PM through 2:00 AM and 8:00 AM to 8:00 PM.
• Inline Processing: Backup Server, ProtecTIER VT with inline dedupe, Tape Library, Truck; the SLA is met
• Post Processing: Backup Server, VTL, then a separate Dedupe step that overlaps with the tape library and truck steps
With an IBM ProtecTIER Solution you can . . .
• Store up to 25 times more data on disk
– Up to 25:1 reduction with 100% data integrity
• Reduce backup and restore times
– Fast inline deduplication up to 2000 MB/sec
– Even faster restores up to 2800 MB/sec
• Improve the reliability of backup operations
– Eliminates mechanical & handling failures
• Drive the cost of disk-based backup down
– Reduces energy, cooling, and space required
• Increase data retention
– Store more backup data on disk for a longer time with very little additional cost
For More Information on IBM’s ProtecTIER
IBM Customers
The main ProtecTIER Web Page
www.ibm.com/systems/storage/tape/protectier
IBM and Business Partners
• Visit the IBM ProtecTIER Sales Kit on PartnerWorld
https://www304.ibm.com/jct09002c/partnerworld/wps/servlet/mem/ContentHandler/ProtecTIER%20SalesKit/lc=en_US
• Visit the IBM ProtecTIER Sales Kit on W3
http://w303.ibm.com/sales/support/ShowDoc.wss?docid=C469520B08856D52&infotype=SK&infosubtype=S0&node=doctype,S0|doctype,SKT|brands,B5000|clientset,IA|geography,AMR|industries,&appname=CC_CFSS
Thank You (English), Gracias (Spanish), Obrigado (Brazilian Portuguese), Danke (German), Grazie (Italian), Merci (French)
Thank you in Hindi, Hebrew, Simplified Chinese, Russian, Arabic, Korean, Japanese, Tamil, Traditional Chinese and Thai
Trademarks and Disclaimers
© IBM Corporation 1994-2011. All rights reserved.
References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at
http://www.ibm.com/legal/copytrade.shtml.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered
trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Information is provided "AS IS" without warranty of any kind.
The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual
environmental costs and performance characteristics may vary by customer.
Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does
not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information,
including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or
any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance,
function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here
to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any
user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage
configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements
equivalent to the ratios stated here.
Photographs shown may be engineering prototypes. Changes may be incorporated in production models.