IBM Systems and Technology Group

The IBM view on storage archive solutions: requirements to solve and trends for the future

31st ADLUG Annual Meeting - Firenze, September 19-21
Marco Ceresoli, Data Protection and Retention Sales Leader, IBM Europe
© 2012 IBM Corporation

Agenda
• The growth and the variety of digital information
• The shift of market dynamics and trends for archiving
• Technologies for data archiving: comparison
• New trends: Linear Tape File System value proposition
• Role and history of IBM in tape technology
• Case studies and conclusions

Storage is growing… and not only in terms of capacity
• Velocity
• Variety
• Volumes
• Digital universe 2005: 150 exabytes (150 million TB)
• Digital universe 2011: 1,800 exabytes (1.8 billion TB)
Source: 2011 IDC Digital Universe Study

Every day, 15 petabytes of new information are created in digital format
• 80% of this new data is unstructured, generated mainly by email, documents, images, video and audio.
Effects:
• A company with 1,000 employees spends on average $5.3M every year searching for information that is difficult to find.
• 42% of managers say that they use incorrect information at least once a week.
• During 2007 there were 37,000 security breaches (cyber attacks) in the USA, an increase of 158% over 2006.
• More than 20,000 laws worldwide require not only raw storage capacity but also classification and information lifecycle management.
Source: Information Week, “State Of Enterprise Storage: Changing Priorities, Changing Practices”, 2009.
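The two IDC figures above imply a compound annual growth rate that the slide does not state. The following is a minimal sketch of that arithmetic, assuming the 2005 and 2011 values are comparable totals six years apart; the helper name `cagr` is illustrative and not from the source.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# IDC figures quoted on the slide, in exabytes.
EB_2005 = 150
EB_2011 = 1800

# Assumption: six full years of growth between the two data points.
growth = cagr(EB_2005, EB_2011, years=2011 - 2005)
print(f"Implied growth: {growth:.1%} per year")
```

Under that assumption the digital universe grew twelvefold in six years, which works out to roughly 51% per year.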
Smarter Systems Are Creating an Information Explosion
• Storage requirements growing 20-40% per year
[Chart: exabytes stored per year, 2005 to 2011, vertical axis from 0 to 1,800]
• Drivers shown on the chart: RFID, digital TV, MP3 players, digital cameras, camera phones, VoIP, medical imaging, laptops, smart meters, multi-player games, satellite images, GPS, ATMs, scanners, sensors, digital radio, DLP theaters, telematics, peer-to-peer, email, instant messaging, videoconferencing, CAD/CAM, toys, industrial machines, security systems, appliances
Source: Semantics, “Linked Data” guidelines, 2006.

Changing Market Dynamics & Trends
• Value has shifted toward archiving software
  – Shift from hardware to archiving software for addressing compliance, data retention management and lifecycle governance requirements
  – Email archiving and eDiscovery are adding further content types
• Information lifecycle governance is needed
  – Clients understand they can no longer address data growth issues by adding more storage
• Backup as archive
  – A significant proportion (over 50%) of customers continue to use backups as archive copies for long-term retention
• Industry-specific archives
  – Healthcare & Life Sciences requirements for archival of medical images and electronic medical records
  – Government, Oil & Gas, and other industries demanding solutions specific to their needs
  – Cross-industry requirements also rising (e.g., compliance, retaining surveillance data for long periods of time)
• Cloud-based archiving
  – Hosted offerings replaced by clouds (e.g., for eDiscovery)
  – Shift in deployment models from “siloed” on-premise installations to consolidated solutions, archive as a service, and cloud archiving

Significant growth expected in Digital Archiving
Archival (Tier 3) data is:
  – Fastest growing, at 65% CAGR
  – Stored on disk, tape, and optical media
  – Not captured in Tape IDC or GMV forecasts
[Graph illustrates Active and Deep Archiving combined]

Why store data for the long term, and how?
• Why do I need to store data for a long time?
  – Cultural and scientific value
  – Value for the company
  – More than 22,000 regulations and laws worldwide govern data preservation
• How to store this data?
  – Multi-level storage infrastructure with different costs
  – Data reduction (compression and data deduplication)
  – Automatic data management based on archiving rules
  – Virtualization and independence from the storage infrastructure
  – Cloud-oriented “anywhere” and “self-service” accessibility
  – Focus on storing documents and their interconnections (metadata) together

What to archive, and for how long?
• Which data needs to be stored? Source: ESG - Requested Record Types During Electronic Discovery Processes
• How long to store? Source: SNIA – 100 Year Archive Requirements Survey

You Might Think “Archiving” Means Any of These…
• Archive – a long-term collection of data that typically is fixed-content data; i.e., no I/O writes are allowed to change the data.
• Deep archiving – the original definition of archiving, whereby production data is written to another set of storage media (typically tape) and moved offsite while the original version is deleted (typically from disk).
• Active archiving – data for which frequency of access is active rather than inactive, while frequency of updating is nonexistent, so the data is fixed (i.e., unchanging) and not subject to I/O writes that could change it.
• Long-term archiving – active archived data for which the frequency of access has fallen so low that a tier of more cost-effective storage may be an appropriate place to house the data.
• Backup – a dated (i.e., specified-time) duplication of a designated set of data from a data source on one set of media (typically disk) to a backup set of media (either disk or tape).
• Vaulting – typically, the movement of data on tapes from a target site to a protected remote site.
Source of these definitions: Data Protection, David Hill, 2009, CRC Press

Major Archive Segments
• Structured data (database archiving)
  – What? Relational tables, rows, periodic reports, retired applications
  – Why? Reduce storage growth, improve performance, lower cost, compliance (reports)
  – Available products? IBM Optim with IBM disk storage
• Email archiving, eDiscovery
  – What? Email, but potentially any other data type too
  – Why? Litigation support, compliance
  – Available products? IBM Content Collector with IBM disk storage
• Unstructured data (files)
  – What? MS Office, SharePoint, contracts, images, etc.
  – Why? Reduce storage growth, offer a service or product, improve performance, lower cost, compliance
  – Available products? IBM Content Collector, FileNet, Content Manager, etc.
• Unstructured data (kept from birth)
  – What? Medical images, “content” (M&E), DVS, seismic shots, scientific data
  – Why? Reduce storage growth, offer a service or product, improve performance, lower cost
  – Available products? VAD Medical Archive solution, or LTFS/tape with an ISV app

Technologies for data archiving and preservation
• Fault tolerance: redundancy, ECC, RAID(*), ...
• Data protection: “space-efficient” internal replication
• Disaster recovery: automated remote data replication
• Data immutability: NENR(*) and WORM(*)
• Archiving and preservation rules: API(*) and standard interfaces
• Cost reduction: storage tiering, WORN(*)
• Data growth reduction: data deduplication and data compression
• Data security: data encryption and data shredding
• Access control: tamper protection, audit logs, ...
More than 50 years of continuous innovation
(*) ECC = Error Correction Code, RAID = Redundant Array of Independent Disks, NENR = Non Erasable Non Rewritable, WORM = Write Once Read Many, API = Application Program Interface, WORN = Write Once Read Never

Storage management at 360°: archiving, backup, migration, DR
[Diagram labels: backup copies; archiving and ILM management; migration to new technologies; compression? de-duplication? encryption?; storage tiers: enterprise class, mid-range, low-cost, automated off-line, manual off-line; NENR/WORM storage; disaster protection]
The processes can be automated and repeated.

The IBM Smart Archive strategy
• Sources: ERP / CRM (SAP, PeopleSoft …); Reports; Collaborative Content (Quickr, SharePoint); Paper (Documents, Images …); Email (Notes, Exchange); Data
• Optimized and unified assessment, collection and classification
• Flexible and secure infrastructure with unified retention and protection
  – On Premise (Custom Config)
  – Appliance (Pre-Config)
  – As A Service (SaaS, Cloud Storage)
  – Cloud-ready archive storage with optional ECM
• Integrated compliance, records management, analytics and eDiscovery
• Value-added services
  – Optimization Services
  – System Services
  – Managed Services
  – Reference Architecture
  – Information Governance

Long term data archiving: Total Cost
[Chart] From: “In Search of the Long-Term Archiving Solution - Tape Delivers Significant TCO Advantage over Disk”, The Clipper Group, Dec. 23, 2010.

Long term data archiving: TCO and technology evolution
[Chart] From: “In Search of the Long-Term Archiving Solution - Tape Delivers Significant TCO Advantage over Disk”, The Clipper Group, Dec. 23, 2010.
Tape Advantages for Archiving/Long-Term Preservation
[Comparison table: Tape vs. Disk]
Source: “Tape: The Digital Curator of the Information Age”, Fred Moore, President, Horison, Inc.

Technology Roadmap Comparisons for TAPE, HDD, and NAND Flash
Outline: Implications for Data Storage Applications
• The annual rate of areal density increase for TAPE will likely exceed the annual rate of areal density increase for NAND and HDD
  – TAPE bit cell is large and paths for scaling to higher bit densities exist
  – NAND bit cells and HDD Patterned Media bit cells are approaching nanoscale issues in minimum feature lithography requirements
  – NAND bit endurance or bit retention and HDD bit stability are approaching
• Possible annual areal density growth scenarios
  – 20% for HDD
  – 20% to 30% for NAND Flash
  – 40% to 80% for TAPE
• Implications for storage: TAPE, NAND, and HDD will continue to offer complementary storage solutions
• Implications for TAPE: TAPE volumetric density will increase, enhancing its cost advantages

Annual Areal Density Growth Rate Scenarios
• HDD: 20% to 25% (transition to new technology, sensor output, lithography)
• NAND Flash: 25% to 30% (lithography and endurance)
• TAPE: 40% to 80% (no lithography issues, mechanical realities)
[Chart: areal density (Gbit/in²) on a logarithmic axis from 0.1 to 10,000 versus year, 2002 to 2018, for HDD, NAND and TAPE products, with trend lines of 20%/yr, 40%/yr and 80%/yr]

Cost evolution of magnetic storage
[Chart; annotation: ~6-10X, SSD]
Source: IBM elaboration and Information Storage Industry Consortium (INSIC), 2008

Magnetic Tape
• The cheapest storage medium in the hierarchy
• Most used for long-term archiving
• LTO (Linear Tape-Open) standard: fifth generation available today with 1.5 TB cartridges (3 TB compressed)
• January 2010: the IBM Zurich Research Laboratory performed a technology demonstration of a 35 TB cartridge (1). Today they are working on a technology demo of a 100 TB cartridge.
http://lto.org/technology/roadmap.html
(1) http://www.ibm.com/press/us/en/pressrelease/29245.wss

Rich Media Driving New Storage Requirements
Smarter Systems Are Creating an Information Explosion, Especially in Media and Entertainment (M&E)
• Video, images, etc. are a major factor driving growth
• Characteristics of stored data are changing
  – The mix of traditional business data (i.e., transactional, docs, email, databases, and backup of those assets) vs. “rich media” (i.e., video, images, digitized content, etc.) is rapidly changing
• Storage requirements growing 20-40% per year
[Chart: exabytes stored per year, 2005 to 2011, vertical axis from 0 to 1,800, with the same list of data-generating devices and applications as the earlier “Information Explosion” slide]
Source: Semantics, “Linked Data” guidelines, 2006.
• Access & asset management profiles of rich media are significantly different from traditional business data
  – Much of the traditional business data stored is a cost center: regulatory, compliance, disaster recovery for business-critical data and processes
  – Rich media is primarily stored for monetization purposes: production archives and asset protection; repurposing content and distribution; long-term archives to monetize assets
  – BW changes everything: e.g., key to the M&E industry move to digital workflows; access to/from content; business motivation to make content available

Elements to address new role of TAPE
• Self-describing cartridge
  – Remove the requirement to commit long term to a tape software application
  – Content protection in the event of database corruption or loss
• Improve content interchange/distribution
  – Eliminate the need for common tape software across the enterprise and/or interchange locations
  – Reduce the cost of data interchange
• Partial recall
  – Eliminate the time penalty of moving an entire large video file when only a small part of the content is needed (e.g., a goal in a game)
• Improved tier management of content
  – Ease complexity in movement from Tier 1 (disk) to Tier 2 (online tape) and Tier 3 (archive)
  – Improve data import/export to system management
• $/GB, power
  – Reduce cost of digital storage: power and $/min
• Open standards
  – A large, diverse infrastructure requires an open standard
  – Standard/support of MXF video
• Long-term content archive life
  – Desired archive life of 50-100 years

LTFS Value Proposition
• Digital archives need and want the value proposition of tape:
  – $/GB: lowest-cost storage
  – Watt/GB: green storage
  – Portability: ability to manage the archive outside the system
  – Scalability: easy to add additional storage (i.e., buy a cartridge)
  – Investment protection: LTO has an 8-generation roadmap (up to a 32 TB cartridge, compressed)
• But there are inhibitors to using tape:
  – Proprietary tape applications require long-term commitment to and support of the tape application to maintain the archive
  – Non-self-describing data formats require a centralized archive database to recover content on individual tapes
  – Import/export and distribution of tapes in the archive is difficult due to proprietary tape applications
• Solution: LTFS addresses the inhibitors and unlocks the value proposition of tape for digital archives
  – Open, non-proprietary tape format
  – Self-describing data structure on cartridge
  – File system support on Linux, Mac and Windows, which provides distribution and cross-platform interchange
  – Enables transition to integrated file-based tape/disk storage systems

Introduction to LTFS (Linear Tape File System)
IBM Linear Tape File System is:
1. An open format for data written to tape
  – Describes the format of data and metadata stored on tape
  – Metadata is based on an XML schema
  – Developed and disclosed by IBM
  – Applicable to LTO-5 and Jag-4
  – Requires tape partitioning
2. File system support (code) to read and write tapes in LTFS format
  – Externalizes the LTO-5 tape as a file system
  – Enables standard applications to write/read LTFS tapes
  – Supports update, edit and delete of files on LTFS tape
  – Supports partial recall
  – Available on Linux, Mac OS X and Windows
Engineering EMMY Award, Oct 2011

Logical View of LTFS Volume
• LTFS utilizes media partitioning (new to LTO Gen 5 and Jag 4)
• The tape is logically divided “lengthwise” (think C: and D: drives on a single hard disk unit)
• LTFS places the index on one partition and the data on the other
[Diagram: tape from BOT to EOT; index partition holding the LTFS XML index; guard wraps; data partition holding the files]

IBM: 60 Years of Tape Innovation
In tape drive technology:
• 1952 IBM 726 (1st magnetic tape drive)
• 1959 IBM 729 (1st read/write drive)
• 1964 IBM 2104 (1st read/back drive)
• 1984 IBM 3480 (1st cartridge drive)
• 1995 IBM 3590
• 1999 IBM 3590E
• 2000 LTO Gen1
• 2002 LTO Gen2
• 2003 3592 Gen1
• 2004 LTO Gen3
• 2005 TS1120 (3592 G2)
• 2007 LTO Gen4
• 2008 TS1130 (3592 G3)
• 2010 LTO Gen5
• 2011 TS1140 (3592 G4)
In tape automation and virtualization:
• 1962 IBM Tractor System
• 1974 3850 MSS
• 1992 IBM 3495
• 1994 IBM 3494
• 1997 VTS G1
• 1999 VTS G2
• 2000 TS3500
• 2001 VTS G3
• 2005 TS3200, TS3300
• 2005 TS7510 VTL
• 2006 TS7740 (VTS Gen 4)
• 2007 TS3400
• 2007 TS7520
• 2007 TS7530
• 2008 TS2900, TS3500 High Density
• 2008 TS7720
• 2008 TS7650 Appliance, TS7650G
• 2009 TS7680
• 2010 TS7610
• 2011 TS3500 Connector & Shuttle
• 2011 TS7740, TS7720

LTO Roadmap
http://ultrium.com/technology/roadmap.html

And data deduplication is the key to using more disk more cost effectively!
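To make the deduplication claim concrete, here is a minimal sketch of content-hash deduplication. It is an illustration only and is not claimed to be how any IBM product works: fixed-size chunking, SHA-256 fingerprints, and the names `dedup_store` and `restore` are all assumptions chosen for brevity.

```python
import hashlib


def dedup_store(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy per distinct chunk.

    Returns (store, recipe): `store` maps a chunk fingerprint to its bytes,
    `recipe` is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store the chunk only the first time
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original byte stream from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


# Hypothetical backup stream: one 4 KiB block repeated nine times plus one unique block.
stream = b"A" * 4096 * 9 + b"B" * 4096
store, recipe = dedup_store(stream)
stored_bytes = sum(len(chunk) for chunk in store.values())
print(f"logical={len(stream)} stored={stored_bytes} ratio={len(stream) / stored_bytes:.1f}:1")
```

In this toy stream only two distinct chunks are kept for ten logical chunks, so the disk holds a fifth of the logical data. Backup workloads with many near-identical copies are where this kind of reduction pays off.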
IBM ProtecTIER® Deduplication Family
• TS7650G & TS7680 ProtecTIER Gateways
  – Highest performance, largest capacity, high availability
  – Up to 2800 MB/sec
  – Up to 1 PB usable capacity
• TS7650 ProtecTIER Appliances
  – Better performance, larger capacity, scalable
  – Up to 500 MB/sec
  – 7 TB to 36 TB usable capacity
• TS7620 ProtecTIER Appliance Express
  – Good performance, entry level, easy to install
  – Up to 145 MB/sec
  – 5.5 TB and 11 TB usable capacity

Koninklijke Bibliotheek, National Library of the Netherlands
• In 2000, IBM and KB designed and implemented a digital data preservation system called DIAS (Digital Information Archiving System).
• DIAS is the solution for archiving and preserving multimedia and electronic documents in digital format.
• DIAS complies with the OAIS (1) standard for “logical” and “physical” preservation.
• IBM built the DIAS solution from standard, general-purpose software components: WebSphere, DB2, Tivoli Storage Manager and Content Manager.
[Diagram labels: Query; SIP; Data Delivery & Ingest; Capture SIP]
(1) OAIS: http://public.ccsds.org/publications/archive/650x0b1.pdf
[Diagram: IBM DIAS - Digital Information Archiving System; labels: DIP; Preservation; Data Management; AIP; Packaging; Data Access & DIP Delivery; AIP; Archival Storage; Administration; Monitoring & Logging]
Koninklijke Bibliotheek: http://www.kb.nl/dnp/e-depot/e-depot-en.html

Ecosystem: Thought Equity Motion
Sports Video Archiving in the Cloud
Challenges
• Low-cost delivery platform for enterprise-scale Video Supply Chain as a Service
• Information growth of ~100 TB per month
• Easy self-serve access required by clients
Solution
• IBM LTFS at several global locations, including some client facilities
• IBM System Storage® TS3200 Tape Library, LTO®-5 tape drives
Benefits
• Opened up new business opportunities
• Enabled more predictable and transparent pricing for clients
• Portable, interoperable, scalable, cost-effective data protection and long-term storage
“LTO 5 and LTFS significantly reduce the ancillary costs around storage. This is a real game-changer from IBM.” Mark Lemmons, CTO, Thought Equity Motion
TEM with LTFS on YouTube: http://www.youtube.com/watch?v=M7w0jrkQnj4
TSP03327-USEN-00

Thank you for your attention!