Cross-Industry Preservation Architectures
Michael Peterson
May 2011
“Cross-Industry Preservation Architectures” – PASIG May, 2011

Introductions
● Michael Peterson
    Founder, past President, and Chief Strategy Advocate for the SNIA
    Currently working on Cloud Archive and Long-term Retention standards, best practices, market education
    Information Services Architect – consulting in long-term retention and digital preservation system design and implementation
    Currently driving the LTDPRM.org & ILM2.0.org Communities
    Author: “100 Year Archive Requirements Study,” 2008; “Building a Terminology Bridge: Guidelines for Digital Information Retention and Preservation Practices in the Datacenter,” Sept. 2009

www.ltdprm.org

Agenda
● LTDP Paradoxes (Laws designed to be broken)
● New Digital Preservation Models
    That the cloud empowers
● Using the Cloud for Digital Archives (Digital Preservation)

4 Paradoxes of Digital Preservation
● Data will be lost
● Migration does not scale
● Access & use models keep changing
● Cost overwhelms everything complexity does not

“Old” Laws of Digital Preservation
How do we break them?

100 Year Complexity Barrier
● Overwhelming growth, cost, change
    Constant Physical and Logical migration
    Power, cooling, space, people, resources, maintenance, …
    Always adding & migrating systems, networking, storage
    Managing thousands of formats
    Constant Auditing and recovery of damaged or lost data
    Thousands of moving parts
    Complex systems and architectures
    Changing software platforms

Aha!
Move from “physical” preservation architectures and design, as in physical media or a physical repository (a 2002 ‘OAIS’), to “virtualized” Preservation Services based on Service Management principles (Sounds like the Cloud…)

Using the Cloud
New Digital Preservation Models

“Physical” Doesn’t Scale
● Old Architecture
    “Storing digital images effectively requires standards related to the storage media, such as CDROMs, and the file formats, such as TIFF.”
    Source: “A Resource List for Standards Related to Digital Imaging,” Dec. 2010
    Physical Application & Storage infrastructure
● Physical Standards
    Architecture: OAIS 2002
    Metadata: Dublin Core: ISO 15836:2009
    Storage Media: ISO 18921:2008, ISO 18925:2008
    Digitization: ISO/IEC 10918-1/Cor1:2005, ISO/IEC 10918-3/Amd1:1999
    File Formats: ISO 19005-1:2005, Adobe TIFF Specification, V6, 1992
    Transfer Protocols: ISO 15740:2008

“Infrastructure Virtualization”
● New Architecture
    Media independent
    System Architecture virtualized, self-protecting, cloud based, and self healing
    Integrated migration & transformation services
    Virtualized historical applications hosted in the cloud in specialized containers running in virtual machines
● New Standards
    Architecture: OAIS 2010
    Metadata: FCIS, PREMIS, IETF
    Cloud: SNIA-CDMI
    Interoperability: NIST Smart Grid Framework, Cloud and Interoperability workgroups
    Object Containerization: SNIA-SIRF

Add “Information Virtualization”
● Portable Information
    Objects
    Extensible Preservation Objects
    Location, media independent
    Secure, auditable, authentic, portable
    Self-healing
● On-demand, virtual emulation
    “Jumpbox” hosted emulators
    Populations of legacy ‘readers’
    Web-based delivery and access
● New Standards
    Architecture: OAIS 2010
    Metadata: FCIS,
    PREMIS, IETF
    Cloud: SNIA-CDMI
    Interoperability: NIST Smart Grid Framework, Cloud and Interoperability workgroups
    Object Containerization: SNIA-SIRF and CDMI

Move to “Managed”
● Content Management
● Service Management
    ITIL, ITSM, ILM2.0
● Information Governance
● Litigation ‘Ready’
● Regulatory Compliance
● Preservation begins at “Creation”
● Operating Practices:
    ITSM - IT Service Mgmt.
    ITIL - IT Infrastructure Library
    ILM2.0 - Service mgmt. based approach to information mgmt. and automation
● Preservation is a new Datacenter Practice

And to “Virtual Services” in the Cloud
● Platform as a Service
● Infrastructure as a Service
● Storage as a Service
● Evolving Web access and use models
● Private, Hybrid, Public Clouds
    Multiple clouds, multiple providers

Using the “Cloud” in Preservation
● Most likely use-cases: Private and Hybrid clouds
    Virtualize infrastructure
    Virtualize delivery and access
    Virtualize emulation
    Virtualize information
● Examples
    Providing portability
    Web-access models
    Web-drop boxes
    Agile, Scalable, cost effective compute and storage resources (on demand)
    Virtual emulation
    Demand spikes
    Disaster recovery
    Distributed data sets
    Infrastructure extensions

Emerging Cloud Standards
● Cloud Data Management Interface, CDMI
    SNIA to ISO: storage-to-cloud, cloud-to-cloud interchange format
● Self-contained Information Retention Format, SIRF
    SNIA to ISO: extensible preservation object format
● Interoperability
    ISO project: Data Preservation Interchange Framework, DPIF

Cloud Data Management Interface
● Data Portability Standard with an Object Storage Interface
    Move data and metadata in standard portable containers in and out of the cloud and between clouds
    Simple XML container of objects plus metadata
● A data and information services management interface and control path
    Operate services through CDMI
    Rules and Policies in metadata
    Cloud Peering – cloud to cloud communications

Design for the Cloud
● Considerations
    Establish Service Objectives
    Include verification of recovery, authenticity, availability, digital audit, etc.
    Consider using multiple cloud destinations or local and remote copies for increased reliability and availability
    Beware of excessive moving of data across the WAN due to high I/O and bandwidth costs
● Evaluate Cloud providers
    Establish strong contracts
● Test and Audit
    All required services
● Use CDMI!

Cloud Contract Considerations
• Costs
• Retention Management
• Preservation/Integrity/Authentication
• Return and Secure Disposal
• Control – Subpoenas, Legal Hold
• Digital Audits & Verification
    Physical and logical migration practices and authenticity verifications
• Access
• Availability, Protection, Security & Confidentiality
• Search/Discovery
• Multi‐Cloud Provider Relationships
• Right to Conduct Forensic Exams
• Cross‐Border Data Transfers

Preservation Architectures: Virtualization and Cloud
Summary Thoughts

Move to Virtual Preservation
● Shift thinking from “Physical” Preservation to “Virtual”
● Virtualization Applies in many ways
    System, storage, application, infrastructure
    Information
    Migration – both physical and logical
    Cost reduction
● Conclusion: ‘Cloud’ has a positive role

Using the Cloud
● Start out Private, Move to Hybrid
● Apply Service Management Principles
    Classify, Requirements, SLAs, Design, Audit, Improve
● Design for the Cloud
    Create strong and measurable SLA style
    contracts
    Test, Audit, Verify
● Use and Promote CDMI
    Need cloud interface, management, and information portability standards

Contact Information
● Michael Peterson
    IMERGE consulting and LTDPRM.org
    mpeterson@ltdprm.org
    (805)201-3178
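
Illustrative sketch (not from the original slides): the deck recommends keeping multiple local, remote, or cloud copies and repeatedly calls for digital audit and verification ("Test, Audit, Verify"). One common way to approach this is a fixity check that compares each copy's checksum with a digest recorded at ingest. The function names `sha256_hex`, `audit_copies`, and `repair_source`, and the location labels in the usage note, are hypothetical and chosen only for this example; a real system would read the copies from storage or a cloud interface rather than from in-memory bytes.

```python
import hashlib
from typing import Dict, Optional


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_copies(expected_digest: str, copies: Dict[str, bytes]) -> Dict[str, bool]:
    """Map each copy's location to True if its digest matches the recorded one."""
    return {
        location: sha256_hex(data) == expected_digest
        for location, data in copies.items()
    }


def repair_source(results: Dict[str, bool]) -> Optional[str]:
    """Pick the first location whose copy passed the audit, or None if all failed."""
    for location, ok in results.items():
        if ok:
            return location
    return None
```

A possible use, assuming the digest was recorded when the object was first stored: call `audit_copies(recorded_digest, {"local": a, "cloud-a": b, "cloud-b": c})`, then restore any location marked `False` from the location returned by `repair_source`. Recording the digest separately from the copies is what lets the audit detect silent damage in any one of them.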