Automatic Evaluation of Migration Quality in Distributed Networks of Converters
Miguel Ferreira
mferreira@dsi.uminho.pt
2005-09-21
Supervisors
Ana Alice Baptista
José Carlos Ramalho
Contents
• Introductory concepts
• Research problems
• Proposed system
• Methodology
• Topics for discussion
Introductory concepts
• Digital preservation
– The set of processes and activities that
ensure the continued access to information
and all kinds of cultural heritage existing in
digital formats
• Digital object
– An information object, of any type of
information or any format, that is expressed
in digital form
– Text documents, digital photos, vector
graphics, databases, Web pages, software
Strategies for digital preservation
• Emulation
– Reproduction of the behaviour of a
hardware/software platform in a different
technological environment
• Encapsulation
– Storing information about how the objects should be
interpreted
• Migration
– Periodic transfer of digital materials from one
hardware/software configuration to another
• Others
– Computer museums, viewers, Universal Virtual
Computer
Migration
• Advantages
– Updated formats that users can read and
edit
• Disadvantages
– Requires continuous diligence
– Data loss
• Variants
– Migration on request
– Normalisation
– Distributed migration
Distributed migration
• A network of remote conversion services supported by a
semantic layer [Hunter et al.]
• Advantages
– Platform independent
– Redundancy
– Multiple migration paths
– Cost reduction
– Compatible with other migration strategies
• Disadvantages
– Bandwidth consumption
– Slow
• Examples
– PANIC
– MyMorph (NLMed)
– TOM
[Figure: network of conversion services linking Formats A, B, C, D and E, with conversions including A-B, A-C, A-E, B-C, C-D, C-E and D-E]
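Multiple migration paths can be illustrated with a small graph search. The sketch below is only an illustration: the converter table is a hypothetical network over formats A to E, and the function name migration_paths is an assumption, not part of PANIC, MyMorph or TOM.

```python
from typing import Dict, List

# Hypothetical converter network: each key is a source format and each
# value lists the formats it can be converted to directly. The edges are
# illustrative assumptions, not a description of any real service.
CONVERTERS: Dict[str, List[str]] = {
    "A": ["B", "C", "E"],
    "B": ["C"],
    "C": ["D", "E"],
    "D": ["E"],
    "E": [],
}


def migration_paths(network, source, target, visited=None):
    """Return every cycle-free chain of formats from source to target."""
    visited = (visited or []) + [source]
    if source == target:
        return [visited]
    paths = []
    for nxt in network.get(source, []):
        if nxt not in visited:  # never revisit a format on the same path
            paths.extend(migration_paths(network, nxt, target, visited))
    return paths


# Shortest chains first: fewer conversions usually means fewer chances of loss.
paths = sorted(migration_paths(CONVERTERS, "A", "E"), key=len)
for path in paths:
    print(" -> ".join(path))
```

If one conversion service is unavailable, the remaining paths in the list still reach the target format, which is where the redundancy comes from.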
How to choose a preservation strategy?
• Many preservation alternatives
• Lack of universal acceptance
• Distinct preservation requirements
– Satisfaction of the designated community
– Characteristics of the collection
– Budget
• Framework for evaluating
preservation strategies [Rauber]
– Utility Analysis
Evaluation of preservation strategies
1. Definition of objective tree
2. Assignment of measurement units (e.g. millimetre, Mb, Euro)
3. Identification of preservation alternatives
4. Execution of preservation alternatives and evaluation of the outcome
5. Weighting of criteria in the objective tree
6. Calculation of partial and total values
7. Ranking of alternatives
Objective tree (example)
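The weighting, calculation and ranking steps of the evaluation can be sketched in a few lines. This is a minimal sketch under stated assumptions: the criteria, weights, scores and alternative names are all hypothetical, the measurements are assumed to be already mapped to a common 0 to 5 scale, and a weighted sum is assumed as the aggregation rule.

```python
# Hypothetical leaf criteria of an objective tree with their weights.
# The weights are assumed to sum to 1.
WEIGHTS = {"content preserved": 0.5, "appearance preserved": 0.3, "cost": 0.2}

# Hypothetical outcomes of three preservation alternatives, already
# transformed from their measurement units to a common 0-5 scale.
SCORES = {
    "migration to format X": {"content preserved": 4, "appearance preserved": 3, "cost": 5},
    "migration to format Y": {"content preserved": 5, "appearance preserved": 4, "cost": 2},
    "emulation": {"content preserved": 5, "appearance preserved": 5, "cost": 1},
}


def total_value(scores, weights):
    """Sum the partial values (weight x score) of one alternative."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


totals = {name: total_value(scores, WEIGHTS) for name, scores in SCORES.items()}

# Rank the alternatives from highest to lowest total value.
ranking = sorted(totals, key=totals.get, reverse=True)
for name in ranking:
    print(f"{name}: {totals[name]:.2f}")
```

Changing the weights changes the ranking, which is why the same alternatives can rank differently for institutions with distinct preservation requirements.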
Research problems
• Automation of preservation
processes
• Authenticity issues
• Cost management
• Evaluation of preservation
alternatives
Research questions
• Is it feasible to design and
implement a system that is able to
automatically:
– determine the amount of data lost in a
migration and generate detailed migration
reports for inclusion in the objects’
preservation metadata?
– provide recommendations of migration
paths or target formats that will best suit
users’ requirements?
Proposed System
[Figure: architecture of the proposed system. Components: User, Meta Converter, Migration Evaluator, Migration Advisor, Migration Network, Migration Knowledge Base (MKB)]
Interactions labelled in the figure
• Request Migration [Source object] [Criteria], returning [Migrated object] [Migration report]
• Request Advice [Criteria], returning [Migration advice]
• Invoke Migration [Source object] [Migration advice], returning [Migrated object]
• Evaluate migration [Original object] [Migrated object] [Process metrics], returning [Migration report]
• Store [Migration report]
• Query MKB [Parameters], returning [Migration data]
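The interplay between the components can be sketched as follows. Everything here is a hypothetical illustration rather than the actual design: class and method names, the use of a property dictionary as a stand-in for a digital object, the toy data-loss metric and the stub that replaces the remote migration network are all assumptions. User criteria and process metrics are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Properties = Dict[str, str]   # toy stand-in for a digital object
Path = Tuple[str, ...]        # chain of formats, e.g. ("A", "C", "E")


@dataclass
class MigrationReport:
    path: Path
    data_loss: float  # 0.0 = nothing lost, 1.0 = everything lost


@dataclass
class MigrationKnowledgeBase:
    reports: List[MigrationReport] = field(default_factory=list)

    def store(self, report: MigrationReport) -> None:
        self.reports.append(report)

    def query(self, source: str, target: str) -> List[MigrationReport]:
        return [r for r in self.reports
                if r.path[0] == source and r.path[-1] == target]


class MigrationEvaluator:
    def __init__(self, mkb: MigrationKnowledgeBase) -> None:
        self.mkb = mkb

    def evaluate(self, original: Properties, migrated: Properties,
                 path: Path) -> MigrationReport:
        # Toy metric (an assumption): share of original properties that
        # changed or disappeared during the migration.
        lost = sum(1 for k, v in original.items() if migrated.get(k) != v)
        report = MigrationReport(path, lost / max(len(original), 1))
        self.mkb.store(report)  # "Store [Migration report]"
        return report


class MigrationAdvisor:
    def __init__(self, mkb: MigrationKnowledgeBase) -> None:
        self.mkb = mkb

    def advise(self, source: str, target: str) -> Path:
        # Recommend the recorded path with the lowest mean data loss.
        losses: Dict[Path, List[float]] = {}
        for r in self.mkb.query(source, target):  # "Query MKB"
            losses.setdefault(r.path, []).append(r.data_loss)
        if not losses:
            raise LookupError("no recorded migrations for this format pair")
        return min(losses, key=lambda p: sum(losses[p]) / len(losses[p]))


class MetaConverter:
    def __init__(self, network: Callable[[Properties, Path], Properties],
                 advisor: MigrationAdvisor, evaluator: MigrationEvaluator) -> None:
        self.network, self.advisor, self.evaluator = network, advisor, evaluator

    def request_migration(self, obj: Properties, source: str, target: str):
        advice = self.advisor.advise(source, target)             # Request Advice
        migrated = self.network(obj, advice)                     # Invoke Migration
        report = self.evaluator.evaluate(obj, migrated, advice)  # Evaluate migration
        return migrated, report


def stub_network(obj: Properties, path: Path) -> Properties:
    # Stand-in for the remote migration network: pretends that every
    # conversion drops the "font" property.
    return {k: v for k, v in obj.items() if k != "font"}


mkb = MigrationKnowledgeBase([
    MigrationReport(("A", "E"), 0.5),
    MigrationReport(("A", "C", "E"), 0.0),
    MigrationReport(("A", "C", "E"), 0.2),
])
meta = MetaConverter(stub_network, MigrationAdvisor(mkb), MigrationEvaluator(mkb))
migrated, report = meta.request_migration({"text": "hello", "font": "serif"}, "A", "E")
print(report)
```

Each new report is stored back in the knowledge base, so later advice reflects the outcome of earlier migrations.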
Methodology - proof of concept
The concepts
1. Automatic quantification of the data loss
incurred in a migration and generation
of preservation metadata
2. Automatic recommendation of
migration strategies as well as target
formats
The proof (empirical validation)
1. Evaluator versus Human experts
2. Advisor versus Evaluation framework
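One possible way to compare the Evaluator with human experts is a rank correlation between their data-loss judgements for the same migrations. The slides do not prescribe a statistic, so Spearman's coefficient is an assumption here, and all scores below are invented for illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for two equally long, tie-free samples."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical data-loss scores for five migrations (0 = no loss, 1 = total loss).
evaluator_loss = [0.10, 0.40, 0.25, 0.80, 0.05]
expert_loss = [0.15, 0.35, 0.45, 0.90, 0.10]

rho = spearman(evaluator_loss, expert_loss)
print(f"rank agreement: {rho:.2f}")
```

A value close to 1 would indicate that the Evaluator orders migrations by quality much as the experts do.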
Key contributions
• For individual preservers, digital
archives and libraries:
– Outsourcing and automation of digital preservation
– Generation of preservation metadata (authenticity)
– Ranking of migration alternatives
• For designers and programmers of
converters:
– Possibility of publishing their converters as services
• For metadata creators and users:
– Increase adoption
– Help to improve future versions
– Accelerate the development of XML bindings
Round-up
• Service oriented architecture (SOA)
– Automatic quantification of data loss
– Provides recommendations on which migration paths or target formats are best suited for each user
– Simplifies the creation of preservation metadata
– Based on migration
• Methodology
– Proof of concept with empirical validation
• Evaluator versus Human experts
• Advisor versus Evaluation framework
Topics for discussion
• Relevance of research
• Research methodology
• System architecture
• Format registry vocabulary
– e.g. MIME types, TOM type descriptors, Global
Digital Format Registry, PRONOM, etc.
• Preservation metadata schema
– e.g. PREMIS data dictionary (event entity)