Automatic Evaluation of Migration Quality in Distributed Networks of Converters

Miguel Ferreira
mferreira@dsi.uminho.pt
2005-09-21

Supervisors
Ana Alice Baptista
José Carlos Ramalho

Contents
• Introductory concepts
• Research problems
• Proposed system
• Methodology
• Topics for discussion

Introductory concepts
• Digital preservation
  – The set of processes and activities that ensure continued access to information and all kinds of cultural heritage existing in digital formats
• Digital object
  – An information object, of any type or format, that is expressed in digital form
  – Text documents, digital photos, vector graphics, databases, Web pages, software

Strategies for digital preservation
• Emulation
  – Reproduction of the behaviour of a hardware/software platform in a different technological environment
• Encapsulation
  – Storing information about how the objects should be interpreted
• Migration
  – Periodic transfer of digital materials from one hardware/software configuration to another
• Others
  – Computer museums, viewers, Universal Virtual Computer

Migration
• Advantages
  – Updated formats that users can read and edit
• Disadvantages
  – Requires continuous diligence
  – Data loss
• Variants
  – Migration on request
  – Normalisation
  – Distributed migration

Distributed migration
• A network of remote conversion services supported by a semantic layer [Hunter et al.]
• Advantages
  – Platform independent
  – Redundancy
  – Multiple migration paths
  – Cost reduction
  – Compatible with other migration strategies
• Disadvantages
  – Bandwidth
  – Slow
• Examples
  – PANIC
  – MyMorph (NLMed)
  – TOM

[Figure: network of Formats A, B, C, D and E linked by conversion services A-B, A-C, A-E, B-C, C-D, C-E and D-E]

How to choose a preservation strategy?
• Many preservation alternatives
• Lack of universal acceptance
• Distinct preservation requirements
  – Satisfaction of the designated community
  – Characteristics of the collection
  – Budget
• Framework for evaluating preservation strategies [Rauber]
  – Utility Analysis

Evaluation of preservation strategies
1. Definition of objective tree
2. Assignment of measurement units (e.g. millimetre, Mb, Euro)
3. Identification of preservation alternatives
4. Execution of preservation alternatives and evaluation of the outcome
5. Weighting of criteria in the objective tree
6. Calculation of partial and total values
7. Ranking of alternatives

Objective tree (example)
[Figure: example objective tree]

Research problems
• Automation of preservation processes
• Authenticity issues
• Cost management
• Evaluation of preservation alternatives

Research questions
• Is it feasible to design and implement a system that is able to automatically:
  – determine the amount of data loss that occurred in a migration and generate detailed migration reports for inclusion in the objects’ preservation metadata?
  – provide recommendations of migration paths or target formats that will best suit users’ requirements?
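The last three steps of the evaluation procedure (weighting of criteria, calculation of partial and total values, ranking of alternatives) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the criteria names, the weights, the 0 to 5 scale for partial values, the alternatives, and the use of a weighted sum to aggregate partial values into a total.

```python
"""Utility analysis sketch: weight criteria, compute totals, rank alternatives."""

# Hypothetical leaf criteria of an objective tree; the weights sum to 1.
WEIGHTS = {
    "content_preserved": 0.5,
    "layout_preserved": 0.3,
    "cost": 0.2,
}

# Hypothetical partial values on an assumed 0-5 scale (higher is better),
# obtained by mapping each measurement (e.g. Mb, Euro) onto a uniform scale.
ALTERNATIVES = {
    "migrate to format X": {"content_preserved": 5, "layout_preserved": 3, "cost": 4},
    "migrate to format Y": {"content_preserved": 4, "layout_preserved": 5, "cost": 2},
    "emulation": {"content_preserved": 5, "layout_preserved": 5, "cost": 0},
}


def total_value(partial_values, weights):
    """Weighted sum of the partial values of one alternative (assumed aggregation)."""
    return sum(weights[criterion] * partial_values[criterion] for criterion in weights)


def rank(alternatives, weights):
    """Return (name, total value) pairs, best alternative first."""
    scored = [(name, total_value(values, weights)) for name, values in alternatives.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank(ALTERNATIVES, WEIGHTS):
        print(f"{name}: {score:.2f}")
```

Changing the weights changes the ranking, which is why the weighting step is kept separate from the measurements themselves.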
Proposed System

[Figure: architecture of the proposed system. Components: User, Meta Converter, Migration Evaluator, Migration Advisor, Migration Network, Migration Knowledge Base (MKB). Labelled interactions: Request migration [Source object] [Criteria]; Request advice [Criteria]; Invoke migration [Source object] [Migration advice]; Evaluate migration [Original object] [Migrated object] [Process metrics]; Store [Migration report]; Query MKB [Parameters]. Labelled results: [Migration advice], [Migrated object], [Migration report], [Migration data].]
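One way to picture the Migration Network is as a directed graph in which formats are nodes and conversion services are edges, so that "multiple migration paths" become alternative routes between two formats. The sketch below is only an illustration under that assumption: the edge list is read off the distributed-migration figure, and the function names are hypothetical rather than part of the proposed system.

```python
"""Enumerate migration paths in a network of converters (illustrative sketch)."""

# Conversion services as directed edges between formats. This edge list is an
# assumption read off the distributed-migration figure, not a real registry.
CONVERSIONS = [
    ("A", "B"), ("A", "C"), ("A", "E"),
    ("B", "C"), ("C", "D"), ("C", "E"), ("D", "E"),
]


def build_graph(conversions):
    """Map each source format to the list of formats it can be converted to."""
    graph = {}
    for source, target in conversions:
        graph.setdefault(source, []).append(target)
    return graph


def migration_paths(graph, source, target, path=None):
    """Yield every cycle-free path from source to target (depth-first search)."""
    path = (path or []) + [source]
    if source == target:
        yield path
        return
    for next_format in graph.get(source, []):
        if next_format not in path:  # never revisit a format on the same path
            yield from migration_paths(graph, next_format, target, path)


if __name__ == "__main__":
    graph = build_graph(CONVERSIONS)
    for path in sorted(migration_paths(graph, "A", "E"), key=len):
        print(" -> ".join(path))
```

An advisor could then score each candidate path, for example with the data loss observed in past migrations along it, and recommend the best one.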
Methodology - proof of concept
The concepts
1. Automatic quantification of the data loss that occurred in a migration and generation of preservation metadata
2. Automatic recommendation of migration strategies as well as target formats
The proof (empirical validation)
1. Evaluator versus Human experts
2. Advisor versus Evaluation framework

Key contributions
• For individual preservers, digital archives and libraries:
  – Outsourcing and automation of digital preservation
  – Generation of preservation metadata (authenticity)
  – Ranking of migration alternatives
• For designers and programmers of converters:
  – Possibility of publishing their converters as services
• For metadata creators and users:
  – Increase adoption
  – Help to improve future versions
  – Accelerate the development of XML bindings

Round-up
• Service-oriented architecture (SOA)
  – Automatic quantification of data loss
  – Provides recommendations on which migration paths or target formats are best suited for each user
  – Simplifies the creation of preservation metadata
  – Based on migration
• Methodology
  – Proof of concept with empirical validation
    • Evaluator versus Human experts
    • Advisor versus Evaluation framework

Topics for discussion
• Relevance of research
• Research methodology
• System architecture
• Format registry vocabulary
  – e.g. MIME types, TOM type descriptors, Global Digital Format Registry, PRONOM, etc.
• Preservation metadata schema
  – e.g. PREMIS data dictionary (event entity)
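As a discussion aid, the Evaluator concept (automatic quantification of data loss, reported per migration) might be sketched as a comparison of properties extracted from the original and the migrated object. This is not the method of the proposed system: the property names, the scoring rule and the report layout are all assumptions chosen to keep the example small, and the report fields are not taken from PREMIS.

```python
"""Quantify data loss by comparing properties of original and migrated objects."""


def evaluate_migration(original, migrated):
    """Return a per-property score in [0, 1] plus their unweighted mean.

    Assumed scoring rule: numeric properties score by relative closeness,
    other properties by equality, and a property missing from the migrated
    object scores 0.
    """
    scores = {}
    for name, before in original.items():
        after = migrated.get(name)
        if after is None:
            scores[name] = 0.0
        elif isinstance(before, (int, float)) and isinstance(after, (int, float)):
            largest = max(abs(before), abs(after))
            closeness = 1.0 if largest == 0 else 1.0 - abs(before - after) / largest
            scores[name] = max(0.0, closeness)  # clamp for values of opposite sign
        else:
            scores[name] = 1.0 if before == after else 0.0
    overall = sum(scores.values()) / len(scores) if scores else 1.0
    return {"property_scores": scores, "overall": overall}


if __name__ == "__main__":
    # Hypothetical properties extracted from a text document before and after.
    original = {"word_count": 1000, "image_count": 4, "title": "Report"}
    migrated = {"word_count": 1000, "image_count": 3, "title": "Report"}
    report = evaluate_migration(original, migrated)
    for name, score in report["property_scores"].items():
        print(f"{name}: {score:.2f}")
    print(f"overall: {report['overall']:.2f}")
```

A report of this kind is the sort of output that the first proof-of-concept test would compare against the judgement of human experts.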