Managing Data for Long Retention Periods
Database Archiving Trends & Best Practices
Including a “Titan Archive” Example

Craig S. Mullins
Corporate Technologist
http://www.neonesoft.com

Authors

This presentation was prepared by:
Craig S. Mullins
Corporate Technologist
NEON Enterprise Software, Inc.
14100 Southwest Freeway, Suite 400
Sugar Land, TX 77478
Tel: 281.491.6366
Fax: 281.207.4973
E-mail: craig.mullins@neonesoft.com

This document is protected under the copyright laws of the United States and other countries as an unpublished work. This document contains information that is proprietary and confidential to NEON Enterprise Software, which shall not be disclosed outside or duplicated, used, or disclosed in whole or in part for any purpose other than to evaluate NEON Enterprise Software products. Any use or disclosure in whole or in part of this information without the express written permission of NEON Enterprise Software is prohibited.

© 2007 NEON Enterprise Software (Unpublished). All rights reserved.
Confidential Material of NEON Enterprise Software, Inc.

Agenda
- Data Retention: The Long-Term Data Storage Problem
- Trends Driving the Need to Archive
- Long Term Data Storage Solutions (or, what database archiving is and is not!)
- Required Database Archiving Capabilities
- TITAN Archive for Long-Term Data Retention

Trends Impacting Data Retention

Data Retention Issues:
- Data growth (125% CAGR)
- Length of retention requirement
- Varied types of data
- Security issues

Chart axis: Time Required, from 0 to
30 Yrs (and more)

Data Retention Drivers

Data Retention Requirements refer to the length of time you need to keep data
- Determined by laws (regulatory compliance)
  - More than 150 state and federal laws
  - Dramatically increasing retention periods for corporate data
- Determined by business needs
  - Reduce operational costs
    - Large volumes of data interfere with operations
  - Isolate content from changes
    - Protect archived/retained data from modification

What is Meant by “Long Term Data Retention”
Source: 100 Year Archive Requirements Survey (January 2007), SNIA-DMF 100 Year Archive Task Force

Retention Requirements Vary
Source: 100 Year Archive Requirements Survey (January 2007), SNIA-DMF 100 Year Archive Task Force

Regulatory Compliance & Data Retention Requirements

Can You Read This?

Regulations Tracked by Gartner
Regulatory Compliance is International

Country: Examples of Regulations
- Australia: Commonwealth Government’s Information Exchange Steering Committee, Evidence Act 1995, more than 80 acts governing retention requirements
- Brazil: Electronic Government Programme, EU GMP Directive 1/356/EEC-9
- France: Model Requirements for the Management of Electronic Records, EU Directive 95/46/EC
- Germany: Federal Data Protection Act, Model Requirements for the Management of Electronic Records, EU Directive 95/46/EC
- Japan: Personal Data Protection Bill, J-SOX
- Switzerland: Swiss Code of Obligations articles 957 and 962
- United Kingdom: Data Protection Act, Civil Evidence Act 1995, Police and Criminal Evidence Act 1984, Employment Practices Data Protection Code, Combined Code on Corporate Governance 2003

http://www.itcinstitute.com/ucp/index.aspx

Retention: The Need for Archiving…
- Paper, Blueprints, Forms, Claims
- Word, Excel, PDF, XML
- IMS, DB2, ORACLE, SYBASE, SQL Server, IDMS, VSAM
- Programs, UNIX Files
- Outlook, Lotus Notes, Attachments
- Sound, Pictures, Video

Discovery and e-Discovery

Data retention and e-Discovery intersect…
- Discovery is the compulsory disclosure, at a party’s request, of information that relates to the litigation. (Source: Black’s Law Dictionary)
- Therefore, e-Discovery is the discovery of electronic information

E-Discovery
- Electronic evidence is the predominant form of discovery today. (Gartner, Research Note G00136366)
- Electronic evidence could encompass anything that is stored anywhere. (Gartner, Research Note G00133224)
- When data is being collected (for e-discovery) it is imperative that it is not changed in any way.
  Metadata must be preserved… (Gartner, Research Note G00133224)
- …it is not the job of the IT organization to determine what should and should not be saved or how individual business users manage their data. (Gartner, Research Note G00147735)
- Through 2007, more than half of IT organizations and in-house legal departments will lack the people and the appropriate skills to handle electronic discovery requirements (0.8 probability). (Gartner, Research Note G00131014)
- By year-end 2010, 75% of IT departments in large enterprises will employ one or more legal IT or e-discovery specialists (0.8 probability). (Gartner, Research Note G00146630)

E-Discovery and FRCP

Changes to the Federal Rules of Civil Procedure
- Examples: Rules 26(b)(1) and 34(b)
- Changes took effect December 2006
- A party who produces documents for inspection shall produce them . . . “as they are kept in the usual course of business...”
- The rules no longer use the term “data compilations”; they now use the term “electronically stored information”
- The amended rules state that requested information must be turned over within 90 to 120 days after a complaint has been served.

So data stored in database systems must be produced in electronic form.

What Does It Mean?
- Enterprises must recognize that there is a business value in organizing their information and data.
- Organizations that fail to respond run the risk of seeing more of their cases decided on questions of process rather than merit.
(Gartner, 20-April-2007, Research Note G00148170: “Cost of E-Discovery Threatens to Skew Justice System”)

Some Sample Cases
- Morgan Stanley: $1.6B (lost backup tapes)
- UBS Warburg: $29.3M (deleted email, could not produce backups)
  - Court to jurors: “Assume discarded emails would have negatively impacted the case”
- Bank of America: $10M (“repeatedly failed to promptly furnish email”)
- Philip Morris: $2.75M (did not save email)
- Arthur Andersen: $500,000 (overturned)
  - Innocently destroyed documents involved in a court “hold” order

Retiring Legacy Applications
- Older applications, perhaps running on outdated hardware and/or outdated DBMS
  - May be looking to save HW and/or SW licensing costs
- Archive the data to a secure database archive
- Retire the application, and perhaps the HW/SW
- Application data is secured and available for access from the database archive

An Example of Retiring a Legacy Application
- Only one IMS application remaining
- Archive IMS data
- Retire IMS application and database
- Eliminate IMS license
- Continue to access the data in the archive
(Diagram labels: CTH0 … … …; Archive Store; Data & Metadata)

Operational Efficiency Drives Archiving, Too

In addition to supporting regulatory compliance requirements, database archiving improves operational efficiency
- Large volumes of data in operational databases interfere with production operations
  - Efficiency of transactions
  - Efficiency of utilities: COPY, REORG, etc.
  - Improved storage
- Archived data can be stored on cheaper media
- Gartner: databases copied an average of 6 times!

Key Reasons to Archive
Source: Forrester Research, Database Archiving Remains An Important Part Of Enterprise DBMS Strategy (August 13, 2007)

Database Archiving

(Diagram label: Purge)

Database Archiving: The process of removing selected data records from operational databases that are not expected to be referenced again and storing them in an archive data store where they can be retrieved if needed.
- Data comes from a structured DBMS (DB2, IMS, etc.)
- Selection criteria are logical rather than physical
- Data can be retrieved after a long period of time, regardless of whether the original DBMS is still in place (data independence)

The Lifecycle of Data

Create -> Operational -> Reference -> Archive -> Discard
- Operational: Needed for completing business transactions
- Reference: Needed for reporting or expected queries
- Archive: Needed for compliance and business protection
- Mandatory Retention Period

Database or Archive?
- Keep in DB
- Keep in Archive
- Deciding factors: Performance, Space, Compliance

Based on Data Availability
- Keep in DB: Must be Available to App
- Keep in Archive: Must be Available, Must Be Secure
- Purge: Not Needed

What Solutions Are Out There?
- Keep Data in Operational Database
  - Problems with authenticity of large amounts of data over long retention times
  - Operational performance degradation
- Store Data in UNLOAD files (or backups)
  - Problems with schema change and reading archived data
  - Using backups poses even more serious problems
- Move Data to a Parallel Reference Database
  - Combines problems of the previous two
- Move Data to a Database Archive
  - Secure, durable, and accessible

Components of a Database Archiving Solution

Diagram labels: Production Database; Metadata Capture, Archive & Retention Policies; Archive Data Query & Access; Data Extract; Data Recall; Database Archive; Store and Retrieve; Archive Store; Metadata; Policies; History; Archive Administration
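
A Minimal Sketch of Archive-and-Purge

The definition of database archiving given earlier (select records by logical criteria, store them with their metadata in an archive data store, then remove them from the operational database) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how TITAN Archive or any other product works: the `archive_rows` helper, the `account` table, its columns, the cutoff date, and the JSON document layout are all hypothetical, and SQLite stands in for the operational DBMS.

```python
import json
import sqlite3


def archive_rows(conn, table, where_sql, params=()):
    """Extract rows matching a logical predicate, then purge them.

    Returns a self-describing document (metadata plus data) as a dict.
    `table` and `where_sql` are assumed to be trusted, analyst-defined
    policy text; they are interpolated into the SQL as-is.
    """
    cur = conn.cursor()
    # Metadata capture: column names and declared types from the catalog.
    columns = [
        {"name": row[1], "type": row[2]}
        for row in cur.execute(f"PRAGMA table_info({table})")
    ]
    names = ", ".join(c["name"] for c in columns)
    # Data extract: selection is by business rule, not physical location.
    rows = cur.execute(
        f"SELECT {names} FROM {table} WHERE {where_sql}", params
    ).fetchall()
    document = {
        "table": table,
        "policy": where_sql,
        "columns": columns,
        "rows": [list(r) for r in rows],
    }
    # Purge: remove the archived rows from the operational table.
    cur.execute(f"DELETE FROM {table} WHERE {where_sql}", params)
    conn.commit()
    return document


def demo():
    # Hypothetical operational table with three accounts.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, closed_on TEXT)"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [(1, "Ann", "1999-03-31"), (2, "Bob", None), (3, "Cy", "2005-11-02")],
    )
    # Hypothetical archive policy: accounts closed before 2001.
    doc = archive_rows(
        conn, "account", "closed_on IS NOT NULL AND closed_on < ?", ("2001-01-01",)
    )
    remaining = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
    return json.dumps(doc), remaining
```

Because the archived document carries its own column names and types, it can be read later without the original DBMS, which is the data independence point made above. A real solution would also have to make the extract and the purge atomic, and protect the archived copy from modification.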
Data & Metadata

Database Archiving Requirements

Policy-Driven
- Policy based archiving: logical selection
- Discard data after retention period

Data Requirements
- Store very large amounts of data in archive
- Keep data for very long periods of time
- Access data when needed; as needed
  - Even across schema breaks
- Protect authenticity of data

Independence
- Maintain archives for ever-changing operational systems
- Become independent from operational metadata
- Become independent from Applications/DBMS/Systems

Database Data is at Risk!
Source: 100 Year Archive Requirements Survey (January 2007), SNIA-DMF 100 Year Archive Task Force

Database Archiving Storage Capacity

Total Worldwide Database Archive Capacity, 2007-2012 (Petabytes)

Year    Capacity (PB)
2007    1,198
2008    1,838
2009    2,991
2010    4,824
2011    8,110
2012    13,639

Source: Enterprise Strategy Group

TITAN Archive
An Introduction to Database Archiving with TITAN Archive™

TITAN Archive

NEON Enterprise Software’s database archive solution:
- Architected as a long-term data retention solution.
- Built to address the regulatory compliance issues impacting data and database systems.
- Delivers operational benefits to the existing database environment and applications.
- Supports e-Discovery needs.
- Built from the ground up as an enterprise solution.

TITAN Architecture

Details of the Physical Architecture

Diagram labels: zOS; Browser; HTTP; DB2, IMS; TITAN Extractor; LINUX; MQ; SOCKETS; Linux/Unix/Windows; ORACLE, Sybase, UDB, SQL Server; TITAN Archive Appliance; TITAN Extractor;
TITAN Archive Catalog, TITAN EADO; SAN; TITAN EADO

Product Task Overview

USER ROLES
- ADMINISTRATOR: CREATE APPLICATION
- ADMINISTRATOR: ASSIGN USERS
- ANALYST: DEFINE SCHEMA
  • Add schema
  • Edit schema objects
  • Edit relationships
  • Validate schema

Search DB2 for desired objects

Drag and Drop objects from the DB2 catalog into Titan

Use the Schema Editor to arrange objects and modify relationships

Use the Column Editor to expand the basic metadata by adding annotations

IMS Schema Editor
Use the Schema Editor to arrange objects and modify relationships

Product Task Overview

USER ROLES
- ADMINISTRATOR: CREATE APPLICATION
- ADMINISTRATOR: ASSIGN USERS
- ANALYST: DEFINE SCHEMA
- ANALYST: DEFINE ARCHIVE PLAN
  • Add plan
  • Copy tables from schema
  • Select root table
  • Assign archive actions
  • Review relationships
  • Specify EADO properties
  • Validate and register the plan

Remember the schema that we created in the previous section. We want to build a plan to archive accounts…

Use the Plan Editor to design each archive Plan
- Customer data is treated as reference data and is to be copied to the archive
- Accounts data is to be moved to the archive
- Transaction data is to be moved to the archive

Product Task Overview

USER ROLES
- ADMINISTRATOR: CREATE APPLICATION
- ANALYST: DEFINE ARCHIVE JOB
  • Add job to plan
  • Specify job options
  • Generate JCL
  • Edit JCL
  • Register the job
- ADMINISTRATOR: ASSIGN USERS
- ANALYST: DEFINE SCHEMA
- ANALYST: DEFINE ARCHIVE PLAN

Use the job editor to define the parameters for the archive extract

Define the archive policy using standard SQL.

Define archive storage attributes such as number of copies, storage locations, and encryption keys.

Define the discard policy using standard SQL.

Getting Started: Product Task Overview

USER ROLES
- ADMINISTRATOR: CREATE APPLICATION
- ADMINISTRATOR: ASSIGN USERS
- ANALYST: DEFINE SCHEMA
- ANALYST: DEFINE ARCHIVE PLAN
- ANALYST: DEFINE ARCHIVE JOB
- ANALYST: RUN SIMULATION AND/OR STATS (TROUBLESHOOT AND VERIFY)

Define JCL parameters with the job editor
For initial testing and validation, run the extract in simulate mode

Getting Started: Product Task Overview

USER ROLES
- ADMINISTRATOR: CREATE APPLICATION
- ADMINISTRATOR: ASSIGN USERS
- ANALYST: DEFINE SCHEMA
- ANALYST: DEFINE ARCHIVE PLAN
- ANALYST: DEFINE ARCHIVE JOB
- ANALYST: RUN SIMULATION AND/OR STATS
- ANALYST OR READER: TROUBLESHOOT/VERIFY ARCHIVE

Titan will generate the archive extract JCL. For testing purposes the job can be submitted through the GUI. Eventually you will want to add the job to your scheduler.

Archive extract results can be viewed on the operational system

Or from within Titan

Data can be retrieved from the archive by using the Titan select tool or . . .
Data can be retrieved using any JDBC compliant SQL tool. Here we use a free tool called SQuirreL to access data that we have archived with Titan

Getting Started: Product Task Overview

USER ROLES
- ADMINISTRATOR: CREATE APPLICATION
- ADMINISTRATOR: ASSIGN USERS
- ANALYST: DEFINE SCHEMA
- ANALYST: DEFINE ARCHIVE PLAN
- ANALYST: DEFINE ARCHIVE JOB
- ANALYST: RUN SIMULATION AND/OR STATS
- ANALYST OR READER: TROUBLESHOOT/VERIFY ARCHIVE
- ANALYST: SCHEDULE & RUN ARCHIVE JOB

Highlighted TITAN Archive Features

Protects Data Authenticity
- Archived data never changes
- Data can be encrypted
- Role-based security
- Signature / Checksum

Discard Policies
- Forensic discard (zeroed out)
- Flexible policy-based definition
- Discards archive & backups

Handles Media Rot
- Checks viability of media
- Repurpose to new media

Contingency Planning
- Up to four backup copies
- Recoverability

Access Assistance
- EADO Indexing
- EADO Scoping Variables

The Three Most Important Things to Remember

TITAN Archive Qualities
- Accessible: the right data can be retrieved in the required timeframe
- Authentic: the data is unchanging, accurately represents the business, and can be proven as such for legal purposes
- Enterprise: engineered to be non-disruptive

Summary Points
- Keeping data in operational systems is a bad idea
- Putting data in UNLOAD files is a bad idea
- Putting data in a parallel reference database is a bad idea
- Using a DBMS to store the archive does not work
- Database archiving requires a great deal of data design
  - Establishing and maintaining metadata
  - Designing how data looks in the archive
  - Achieving application independence
- Database archives must be continuously managed
  - Copying data for storage problems (e.g. media rot)
  - Copying data for system changes
  - Copying data for data encoding standard changes
  - Logging, auditing, and monitoring
    - Archive events
    - Partition management
    - Accesses
- New IT/Business Position: Database Archivist

Database Archivist… or Archive Analyst

If you are serious about long-term data retention you will need to staff a database archivist:
- Understand the retention requirements & regulations
- Understand business data
  - Interact with business and legal experts
- Classification of data to match to regulations
- Build archive plans and jobs
- On-going archive administration
- Assist in e-discovery and other projects that require access to archived data

Intelligent Solutions for Enterprise Data.

Craig S. Mullins – Contact Info
NEON Enterprise Software, Inc.
craig.mullins@neonesoft.com
www.neonesoft.com
www.craigsmullins.com
www.DB2portal.com