The IBM view on storage archive solutions

IBM Systems and Technology Group
The IBM view on storage archive solutions:
requirements to solve
and trends for the future
31st ADLUG ANNUAL MEETING - Firenze, September 19-21st
Marco Ceresoli
Data Protection and Retention Sales Leader
IBM Europe
© 2012 IBM Corporation
Agenda
• The growth and the variety of digital information
• The shift of market dynamics and trends for Archiving
• Technologies for data archiving: comparison
• New trends: Linear Tape File System value proposition
• Role and history of IBM in Tape technology
• Case studies and conclusions
Storage is growing… and not only in terms of capacity
• Velocity
• Variety
• Volumes
Growth of the Digital Universe:
• 2005: 150 exabytes (150 million TB)
• 2011: 1,800 exabytes (1.8 billion TB)
Source: 2011 IDC Digital Universe Study
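The two IDC figures quoted on this slide imply a compound annual growth rate of roughly 51% per year. The short sketch below only reproduces that arithmetic; the function and variable names are illustrative and the only inputs are the two values from the slide.

```python
def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


# Figures quoted on the slide (2011 IDC Digital Universe Study), in exabytes.
universe_2005_eb = 150
universe_2011_eb = 1800

cagr = compound_annual_growth_rate(universe_2005_eb, universe_2011_eb, 2011 - 2005)
print(f"Implied growth: {cagr:.1%} per year")
```

A twelvefold increase in six years is well above the 20-40% yearly growth in storage requirements quoted later in the deck, which is one way to read the claim that data is growing faster than the capacity bought to hold it.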
• Every day, 15 petabytes of new information are created in digital format.
• 80% of this new data is unstructured, generated mainly by email, documents, images, video and audio.
EFFECTS…
• A company with 1,000 employees spends on average $5.3M every year searching for information that is difficult to find.
• 42% of managers say that they use INCORRECT information at least once a week.
• During 2007 there were 37,000 security breaches (cyber attacks) in the USA, an increase of 158% versus 2006.
• More than 20,000 laws worldwide require not only pure storage capacity but also classification and information lifecycle management.
Information Week, “State Of Enterprise Storage Changing Priorities, Changing Practices”, 2009.
Smarter Systems Are Creating an Information Explosion
• Storage requirements growing 20-40% per year
[Chart: exabytes of data per year, 2005 to 2011, on a scale from 0 to 1,800. Contributing sources: RFID, digital TV, MP3 players, digital cameras, camera phones, VoIP, medical imaging, laptops, smart meters, multi-player games, satellite images, GPS, ATMs, scanners, sensors, digital radio, DLP theaters, telematics, peer-to-peer, email, instant messaging, videoconferencing, CAD/CAM, toys, industrial machines, security systems, appliances]
Source: Semantics, “Linked Data” guidelines, 2006.
Changing Market Dynamics & Trends
• Value has shifted toward archiving software
– Shift from hardware to archiving software for addressing compliance, data retention management and lifecycle governance requirements
– Email archiving and eDiscovery adding additional content types
• Information Lifecycle Governance is needed
– Clients understand they can no longer address data growth issues by adding more storage
• Backup as archive
– A significant proportion (over 50%) of customers continue to use backups as archive copies for long-term retention
• Industry-specific archives
– Healthcare & Life Sciences requirements for archival of medical images and electronic medical records
– Government, Oil & Gas, and other industries demanding solutions specific to their needs
– Cross-industry requirements also rising (e.g., compliance, retaining surveillance data for long periods of time)
• Cloud-based archiving
– Hosted offerings replaced by clouds (e.g., for eDiscovery)
– Shift in deployment models from ‘siloed’ on-premise installations to consolidated solutions, archive as a service, and cloud archiving
Significant growth expected in Digital Archiving
• Archival (Tier 3) data is:
– Fastest growing at 65% CAGR
– Stored on Disk, Tape, and Optical Media
– (Not captured in Tape IDC or GMV forecasts)
[Graph: illustrates Active and Deep Archiving combined]
Why store data for the long term, and how?
• Why do I need to store data for a long time?
– Cultural and scientific value
– Value for the company
– More than 22,000 regulations and laws worldwide govern data preservation
• How to store this data?
– Multi-level storage infrastructure with different costs
– Data reduction (compression and data deduplication)
– Automatic data management based on archiving rules
– Virtualization and independence from the storage infrastructure
– Cloud-oriented “anywhere” and “self-service” accessibility
– Focus on storing documents and data interconnections (metadata) together
What to archive, and for how long?
• Which data needs to be stored?
Source: ESG - Requested Record Types During Electronic Discovery Processes
• How long should it be stored?
Source: SNIA – 100 Year Archive Requirements Survey
You Might Think “Archiving” Means any of These…
• Archive – a long-term collection of data that typically is fixed-content data; i.e., no I/O writes are allowed to change the data.
• Deep archiving – The original definition of archiving, whereby production data is written to another set of storage media (typically tape) and moved offsite while the original version is deleted (typically from disk).
• Active archiving – Data for which frequency of access is active rather than inactive, while frequency of updating is nonexistent so the data is fixed (i.e., is unchanging) and not subject to I/O writes that could change the data.
• Long-term archiving – Active archived data for which the frequency of access has fallen so low that a tier of more cost-effective storage may be an appropriate place to house the data.
• Backup – a dated (i.e., specified-time) duplication of a designated set of data from a data source on one set of media (typically disk) to a backup set of media (either disk or tape).
• Vaulting – Typically, the movement of data on tapes from a target site to a protected remote site.
Source of these definitions: Data Protection, David Hill, 2009, CRC Press
Major Archive Segments
Structured Data (database archiving)
• What? Relational tables, rows, periodic reports, retiring applications
• Why? Reduce storage growth, improve performance, lower cost, compliance (reports)
• Available products? IBM Optim with IBM disk storage
eMail archiving, eDiscovery
• What? Email, but potentially any other data type too
• Why? Litigation support, compliance
• Available products? IBM Content Collector with IBM disk storage
Unstructured Data (files)
• What? MS Office, SharePoint, contracts, images, etc.
• Why? Reduce storage growth, offer a service or product, improve performance, lower cost, compliance
• Available products? IBM Content Collector, FileNet, Content Manager, etc.
Unstructured Data (kept from birth)
• What? Medical images, “Content” (M&E), DVS, seismic shots, scientific data
• Why? Reduce storage growth, offer a service or product, improve performance, lower cost
• Available products? VAD Medical Archive solution or LTFS/tape with an ISV app
Technologies for data archiving and preservation
• Fault tolerance: redundancy, ECC(*), RAID(*), ...
• Data protection: “space-efficient” internal replication
• Disaster recovery: automated remote data replication
• Data immutability: NENR(*) and WORM(*)
• Archiving and preservation rules: API(*) and standard interfaces
• Cost reduction: storage tiering, WORN(*)
• Data growth reduction: data deduplication and data compression
• Data security: data encryption and data shredding
• Access control: tamper protection, audit logs, ...
More than 50 years of continuous innovation
(*) ECC = Error Correction Code, RAID = Redundant Array of Independent Disks, NENR = Non Erasable Non Rewritable, WORM = Write Once Read Many, API = Application Program Interface, WORN = Write Once Read Never
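To make the NENR/WORM idea concrete, here is a minimal in-memory sketch of write-once-read-many behavior with a retention period. It is a toy model, not any IBM product interface: the class name, method names and the rule that deletion is allowed once retention has expired are all assumptions made for illustration.

```python
import time


class WormStore:
    """Toy write-once-read-many store with a per-object retention period."""

    def __init__(self):
        self._objects = {}  # name -> (data, retain_until as epoch seconds)

    def write(self, name: str, data: bytes, retention_seconds: float) -> None:
        # Non-rewritable: an existing object can never be overwritten.
        if name in self._objects:
            raise PermissionError(f"{name!r} already exists and cannot be rewritten")
        self._objects[name] = (data, time.time() + retention_seconds)

    def read(self, name: str) -> bytes:
        # Reads are unrestricted ("read many").
        return self._objects[name][0]

    def delete(self, name: str) -> None:
        # Non-erasable while the retention period is still running (assumed rule).
        _, retain_until = self._objects[name]
        if time.time() < retain_until:
            raise PermissionError(f"{name!r} is still under retention")
        del self._objects[name]


store = WormStore()
store.write("invoice-001", b"total: 100", retention_seconds=3600)
print(store.read("invoice-001"))  # reading is always allowed

for attempt in (lambda: store.write("invoice-001", b"total: 0", 3600),
                lambda: store.delete("invoice-001")):
    try:
        attempt()
    except PermissionError as err:
        print("refused:", err)
```

Real NENR/WORM storage enforces these rules in firmware or in the storage system itself, so that an application or administrator cannot bypass them the way a caller could bypass this Python object.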
Storage management at 360°: archiving, backup, migration, DR
[Diagram: backup copies, archiving and ILM management, migration to new technologies and disaster protection, applied across storage tiers (enterprise class, mid-range, low-cost, automated off-line, manual off-line) with NENR/WORM storage options. Open questions shown on the diagram: Compression? De-duplication? Encryption?]
The processes can be automated and repeated
The IBM Smart Archive strategy
[Diagram, top to bottom]
• Information sources: ERP / CRM … (SAP, PeopleSoft …), Reports, Collaborative Content (Quickr, SharePoint), Paper (Documents, Images …), Email (Notes, Exchange), Data
• Optimized and Unified Assessment, Collection and Classification
• Flexible and Secure Infrastructure with Unified Retention and Protection
– On Premise (Custom Config)
– Appliance (Pre-Config)
– As A Service (SaaS, Cloud Storage)
– Cloud Ready Archive Storage with Optional ECM
• Integrated Compliance, Records Management, Analytics and eDiscovery
• Value Added Services
– Optimization Services
– System Services
– Managed Services
– Reference Architecture
– Information Governance
Long term data archiving: Total Cost
From: “In Search of the Long-Term Archiving Solution - Tape Delivers Significant TCO Advantage over Disk”, The Clipper Group, Dec.23, 2010.
Long term data archiving: TCO and technology evolution
From: “In Search of the Long-Term Archiving Solution - Tape Delivers Significant TCO Advantage over Disk”, The Clipper Group, Dec. 23, 2010.
Tape Advantages for Archiving/Long-Term Preservation
[Figure: Tape versus Disk]
Source: Tape The Digital Curator of the Information Age. By Fred Moore, President, Horison, Inc.
Technology Roadmap Comparisons for TAPE, HDD, and NAND Flash
Outline: Implications for Data Storage Applications
• The annual rate of areal density increases for TAPE will likely exceed the annual rate of areal density increases for NAND and HDD
– TAPE bit cell is large and paths for scaling to higher bit densities exist
– NAND bit cells and HDD Patterned Media bit cells are approaching nanoscale issues in minimum feature lithography requirements
– NAND bit endurance or bit retention and HDD bit stability are approaching
• Possible annual areal density growth scenarios
– 20% for HDD
– 20% to 30% for NAND Flash
– 40% to 80% for TAPE
• Implications for Storage: TAPE, NAND, and HDD will continue to offer complementary storage solutions
• Implications for TAPE: TAPE volumetric density will increase, enhancing its cost advantages
Annual Areal Density Growth Rate Scenarios
• HDD – 20% to 25% – Transition to New Technology, Sensor Output, Lithography
• NAND Flash – 25% to 30% – Lithography and Endurance
• TAPE – 40% to 80% – No Lithography Issues, Mechanical Realities
[Chart: areal density (Gbit/in²) on a logarithmic scale from 0.1 to 10,000 versus year (2002 to 2018) for HDD, NAND and TAPE products, with trend lines labeled 20%/yr, 40%/yr and 80%/yr]
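The practical difference between these scenario rates is easier to see once they are compounded. The sketch below is only arithmetic on the percentages quoted in the deck; the six-year horizon, the tenfold target and the function names are arbitrary choices for illustration, and no absolute density values are assumed.

```python
import math


def growth_factor(annual_rate: float, years: int) -> float:
    """Cumulative multiplier after compounding `annual_rate` for `years` years."""
    return (1 + annual_rate) ** years


def years_to_multiply(annual_rate: float, multiple: float) -> float:
    """Years needed to grow by `multiple` at a constant annual rate."""
    return math.log(multiple) / math.log(1 + annual_rate)


# Scenario rates taken from the slides; labels are illustrative.
scenarios = {"HDD": 0.20, "NAND Flash": 0.30, "TAPE (low)": 0.40, "TAPE (high)": 0.80}

for name, rate in scenarios.items():
    print(f"{name:12s} x{growth_factor(rate, 6):5.1f} in 6 years, "
          f"10x in {years_to_multiply(rate, 10):4.1f} years")
```

At 20% per year density roughly triples in six years, while at 40% per year it grows about 7.5 times over the same period, which is the gap behind the claim that tape's cost advantage should widen.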
Cost evolution of magnetic storage
[Chart: cost evolution of magnetic storage; annotations on the chart: “~6-10X”, “SSD”]
Source: IBM elaboration and Information Storage Industry Consortium (INSIC) – 2008
Magnetic Tape
• The cheapest storage medium in the hierarchy
• Most used for long-term archiving purposes
• LTO (Linear Tape Open) standard: fifth generation available today with 1.5 TB cartridges (3 TB compressed)
• January 2010: the IBM Zurich Research Laboratory performed a technology demonstration of a 35 TB cartridge(1). Today they are working on a technology demo of a 100 TB cartridge.
http://lto.org/technology/roadmap.html
(1) http://www.ibm.com/press/us/en/pressrelease/29245.wss
Rich Media Driving New Storage Requirements
Smarter Systems Are Creating an Information Explosion, Especially in Media and Entertainment (M&E)
• Storage requirements growing 20-40% per year
• Video, images, etc. are a major factor driving growth
• Characteristics of the data stored are changing
– The mix of traditional business data (i.e., transactional, docs, email, databases, and backup of those assets) vs. “rich media” (i.e., video, images, digitized content, etc.) is rapidly changing
[Chart: exabytes of data per year, 2005 to 2011, on a scale from 0 to 1,800. Contributing sources: RFID, digital TV, MP3 players, digital cameras, camera phones, VoIP, medical imaging, laptops, smart meters, multi-player games, satellite images, GPS, ATMs, scanners, sensors, digital radio, DLP theaters, telematics, peer-to-peer, email, instant messaging, videoconferencing, CAD/CAM, toys, industrial machines, security systems, appliances]
Source: Semantics, “Linked Data” guidelines, 2006.
• Access & asset management profiles of rich media are significantly different from traditional business data
– Much of the traditional business data stored is a cost center: regulatory, compliance, disaster recovery for business-critical data and processes
– Rich media is primarily stored for monetization purposes: production archives and asset protection; repurposing content and distribution; long-term archives to monetize assets
– Bandwidth (BW) changes everything: e.g., it is key to the M&E industry's move to digital workflows; access to/from content, business motivation to make content available
Elements to address new role of TAPE
• Self-describing cartridge
– Remove requirement to commit long term to tape software application
– Content protection in event of database corruption or loss
• Improve content interchange/distribution
– Eliminate need for common tape software across enterprise and/or interchange locations
– Reduce cost of data interchange
• Partial recall
– Eliminate the time penalty of moving large video content held on tape when only a small part of the video is needed (e.g., a goal in a game)
• Improved tier management of content
– Ease complexity in movement from Tier 1 (disk) to Tier 2 (online tape) and Tier 3 (archive)
– Improve data import/export to system management
• $/GB, Power
– Reduce cost of digital storage – power and $/min
• Open standards
– Large diverse infrastructure requires open standard
– Standard/support of MXF video
• Long-term content archive life
– Desired archive life of 50-100 years
LTFS Value Proposition
• Digital archives need and want the value proposition of tape:
– $/GB – lowest cost storage
– Watt/GB – green storage
– Portability – ability to manage archive outside system
– Scalability – easy to add additional storage (i.e., buy a cartridge)
– Investment protection – LTO has an 8-generation roadmap (up to a 32 TB cartridge, compressed)
• But – inhibitors to use tape:
– Proprietary tape applications require long-term commitment and support of tape application to maintain archive
– Non-self-describing data formats requiring centralized archive database to recover content on individual tapes
– Import/export & distribution of tapes in archive is difficult due to proprietary tape applications
• Solution: LTFS addresses the inhibitors and unlocks the value proposition of tape for digital archives
– Open, non-proprietary tape format
– Self-describing data structure on cartridge
– File system support on Linux, Mac, Windows provides:
– Distribution and cross-platform interchange
– Enables transition to integrated file-based tape/disk storage systems
Introduction to LTFS (Linear Tape File System)
IBM Linear Tape File System is:
1. Open format for data which is written to tape
– Describes the format of data and metadata stored on tape
– Metadata is based on XML schema
– Developed and disclosed by IBM
– Applicable to LTO-5 and Jag-4
– Requires tape partitioning
2. File system support (code) to R/W tapes in LTFS format
– Presents the LTO-5 tape as a file system
– Enables standard applications to write/read LTFS tapes
– Supports update, edit, delete of files on LTFS tape
– Supports partial recall
– Available on Linux, Mac OS X and Windows
Engineering EMMY Award – Oct 2011
Logical View of LTFS Volume
• LTFS utilizes media partitioning (new to LTO Gen 5 and Jag 4)
• The tape is logically divided “lengthwise” (think C: & D: drives on a single hard disk unit)
• LTFS places the index on one partition and data on the other
[Diagram: tape from BOT to EOT, with an Index Partition holding the LTFS Index (XML), Guard Wraps, and a Data Partition holding the files]
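The split between an index partition and a data partition can be sketched in a few lines. This is a conceptual toy only: the XML element and attribute names, the file names and the byte layout are invented for the example and do not follow the real LTFS index schema, and a Python bytes object stands in for the data partition.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Hypothetical, heavily simplified index. Element and attribute names are
# invented for this sketch and are NOT the real LTFS index format.
INDEX_PARTITION = """
<index>
  <file name="match.mxf" offset="0" length="16"/>
  <file name="notes.txt" offset="16" length="11"/>
</index>
"""

# Stand-in for the data partition: file contents stored back to back.
DATA_PARTITION = b"0123456789ABCDEFhello world"


def load_index(index_xml: str) -> Dict[str, Tuple[int, int]]:
    """Map each file name to its (offset, length) in the data partition."""
    root = ET.fromstring(index_xml.strip())
    return {
        f.get("name"): (int(f.get("offset")), int(f.get("length")))
        for f in root.findall("file")
    }


def read_file(index: Dict[str, Tuple[int, int]], name: str,
              start: int = 0, count: Optional[int] = None) -> bytes:
    """Read a whole file, or only `count` bytes from `start` (partial recall)."""
    offset, length = index[name]
    if count is None:
        count = length - start
    if start < 0 or count < 0 or start + count > length:
        raise ValueError("requested range lies outside the file")
    begin = offset + start
    return DATA_PARTITION[begin:begin + count]


index = load_index(INDEX_PARTITION)
print(read_file(index, "notes.txt"))         # whole file
print(read_file(index, "match.mxf", 10, 4))  # partial recall of 4 bytes
```

Because the index travels on the cartridge next to the data, a reader needs no external catalog or database to find a file, which is what the deck means by a self-describing cartridge.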
IBM: 60 Years of Tape Innovation
In tape drive technology
• 1952: IBM 726, 1st magnetic tape drive
• 1959: IBM 729, 1st read/write drive
• 1964: IBM 2104, 1st read/back drive
• 1984: IBM 3480, 1st cartridge drive
• 1995: IBM 3590
• 1999: IBM 3590E
• 2000: LTO Gen1
• 2002: LTO Gen2
• 2003: 3592 Gen1
• 2004: LTO Gen3
• 2005: TS1120 (3592 G2)
• 2007: LTO Gen4
• 2008: TS1130 (3592 G3)
• 2010: LTO Gen5
• 2011: TS1140 (3592 G4)
In tape automation and virtualization
• 1962: IBM Tractor System
• 1974: 3850 MSS
• 1992: IBM 3495
• 1994: IBM 3494
• 1997: VTS G1
• 1999: VTS G2
• 2000: TS3500
• 2001: VTS G3
• 2005: TS3200, TS3300
• 2005: TS7510 VTL
• 2006: TS7740 (VTS Gen 4)
• 2007: TS3400
• 2007: TS7520
• 2007: TS7530
• 2008: TS2900, TS3500 High Density
• 2008: TS7720
• 2008: TS7650, TS7650G Appliance
• 2009: TS7680
• 2010: TS7610
• 2011: TS3500 Connector & Shuttle
• 2011: TS7740, TS7720
LTO Roadmap
http://ultrium.com/technology/roadmap.html
And data deduplication is the key to using more disk more cost effectively!
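The general idea behind deduplication can be shown with a small sketch: split the data into chunks, store each distinct chunk once, and keep an ordered list of chunk fingerprints to rebuild the original. This illustrates the basic technique only, under the assumption of fixed-size chunks and SHA-256 fingerprints; it is not a description of how ProtecTIER or any other product works internally, and all names are illustrative.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Split `data` into fixed-size chunks and store each distinct chunk once.

    Returns (store, recipe): `store` maps a SHA-256 digest to chunk bytes and
    `recipe` is the ordered list of digests needed to rebuild the input.
    """
    store = {}
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy of a chunk
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original byte stream from the chunk store and recipe."""
    return b"".join(store[d] for d in recipe)


# Toy input: a repetitive "backup" in which the same blocks recur.
backup = b"AAAABBBBAAAABBBBAAAACCCC"
store, recipe = deduplicate(backup)
stored_bytes = sum(len(c) for c in store.values())
print(f"logical {len(backup)} bytes, stored {stored_bytes} bytes, "
      f"ratio {len(backup) / stored_bytes:.1f}:1")
```

Repeated backups of mostly unchanged data contain many recurring chunks, which is why deduplication makes disk more affordable for backup and archive workloads.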
IBM ProtecTIER® Deduplication Family
• TS7650G & TS7680 ProtecTIER Gateways: highest performance, largest capacity, high availability; up to 2800 MB/sec; up to 1 PB usable capacity
• TS7650 ProtecTIER Appliances: better performance, larger capacity, scalable; up to 500 MB/sec; 7 TB to 36 TB usable capacity
• TS7620 ProtecTIER Appliance Express: good performance, entry level, easy to install; up to 145 MB/sec; 5.5 TB and 11 TB usable capacity
Koninklijke Bibliotheek
National Library of the Netherlands
• In 2000, IBM and KB designed and implemented a digital data preservation system called DIAS (Digital Information Archiving System).
• DIAS is the solution for archiving and preserving multimedia and electronic documents in digital format.
• DIAS complies with the OAIS(1) standard for “logical” and “physical” preservation.
• IBM built the DIAS solution from standard, general-purpose software components: WebSphere, DB2, Tivoli Storage Manager and Content Manager.
(1) OAIS: http://public.ccsds.org/publications/archive/650x0b1.pdf
[Diagram: IBM DIAS - Digital Information Archiving System. Components shown: Delivery & Capture, Ingest, Data Management, Preservation, Packaging & Delivery, Access, Archival Storage, Administration, Monitoring & Logging. Information packages shown: SIP, AIP, DIP, plus Query]
Koninklijke Bibliotheek: http://www.kb.nl/dnp/e-depot/e-depot-en.html
Ecosystem: Thought Equity Motion
Sports Video Archiving in the Cloud
Challenges
• Low-cost delivery platform for enterprise-scale Video Supply Chain as a Service
• Information growth of ~100 TB per month
• Easy self-serve access required by clients
Solution
• IBM LTFS at several global locations, including some client facilities
• IBM System Storage® TS3200 Tape Library, LTO®-5 tape drives
Benefits
• Opened up new business opportunities
• Enabled more predictable and transparent pricing for clients
• Portable, interoperable, scalable, cost-effective data protection and long-term storage
‘LTO 5 and LTFS significantly reduce the ancillary costs around storage. This is a real game-changer from IBM’
Mark Lemmons, CTO, Thought Equity Motion
TEM with LTFS on Youtube: http://www.youtube.com/watch?v=M7w0jrkQnj4
TSP03327-USEN-00
Thank you
for your attention!