http://courses.ischool.utexas.edu/galloway/2011/spring/INF392K/sodmeta.ppt

Metadata for Digital Objects
With an emphasis on preservation…
Pat Galloway, SoD, 9/10/09
Remarks on digitization





Cost-benefit
Sliver of a sliver? Or corpus?
Digitization as preservation
Obligation to preserve
Resulting requirements for metadata
What is metadata?

Data about data





Database usage
Web usage (metatags)
Functions?
Kinds?
Several perspectives from which to consider metadata: orders, functions, lifecycle
First-order metadata: representation schemes




Encoding (ASCII, proprietary formatting schemes)
Compression schemes
Encryption or other intentional distortion schemes
These lie at the base of digital objects and exist before the creation of the object
Second-order metadata


Written natural language (for example)
Layout conventions




Separation of words
Arrangement of groups of words
Punctuation, capitalization, etc.
Note that this is usually considered to belong to an external standard (“English”)
Third-order metadata


“Connections to the world”
Meaning


Semantics
Pragmatics
Fourth-order metadata

Functions





What can you do with the digital object?
What is its purpose?
How does it work?
Functionality significant for preservation
Explicit digital object types
Fifth-order metadata

Groups of digital objects




Archival series
Project files
“Complex documents”
Context of the group
More orders?



Additional intermediate orders could be thought of
Depends on granularity
May depend on object type
Classic objects of preservation in archives
• Content
• Context
• Structure
Functional types of metadata
• Administrative
• Descriptive (especially resource discovery)
• Preservation
• Technical
• Use
Life cycle view of metadata
• Appraisal/Inventory/Scheduling
• Creation and versioning
• Transfer/Authenticity
• Descriptive
• Use
• Rights management
• Preservation and disposition
Attributes of metadata items
• Source of metadata (internal or external)
• Method of metadata creation (auto or manual)
• Nature of metadata (lay or expert)
• Status (static or dynamic)
• Structure (structured or unstructured)
• Semantics (controlled or uncontrolled)
• Level (item or collection)
• Note: these attributes are relevant for all metadata
Major Archival Metadata Schemes
University of Pittsburgh metadata reference model in six layers






Handle
Terms & Conditions
Structural
Contextual
Content
Use History
Example: Structural Layer
specifies technical details
• File identification metadata
• File encoding metadata
• File rendering metadata
• Record rendering metadata
• Content structure metadata
• Source metadata
InterPARES Project
Authenticity template
• Documentary form
• Extrinsic elements
• Intrinsic elements
• Annotations
• Medium
• Context
Dublin Core Metadata Initiative
• Supported by OCLC
• Primarily a surrogate/discovery metadata scheme
• Does not aim to document everything
• Useful for management of active digital objects
Basic Dublin Core elements








Title
Creator
Subject
Description
Publisher
Contributor
Date
Type







Format
Identifier
Source
Language
Relation
Coverage
Rights
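The fifteen elements above can be made concrete with a small record. The following is a minimal sketch, not an official serialization: it assumes the standard `http://purl.org/dc/elements/1.1/` namespace, and the `record` wrapper element, the function name `make_dc_record`, and the sample values for type and language are illustrative assumptions (title and creator are taken from the title slide).

```python
import xml.etree.ElementTree as ET

# Namespace of the fifteen-element Dublin Core set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen basic elements, in the order listed on the slide.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def make_dc_record(values):
    """Return a <record> element with one dc:* child per supplied value.

    `record` is a hypothetical wrapper name; Dublin Core itself does not
    prescribe a container element.
    """
    unknown = set(values) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not basic Dublin Core elements: {sorted(unknown)}")
    record = ET.Element("record")
    for name in DC_ELEMENTS:  # keep a stable element order
        if name in values:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = values[name]
    return record


# Illustrative values only; title and creator come from the title slide.
dc_record = make_dc_record({
    "title": "Metadata for Digital Objects",
    "creator": "Pat Galloway",
    "type": "Text",
    "language": "en",
})
dc_xml = ET.tostring(dc_record, encoding="unicode")
print(dc_xml)
```

Every element is optional here, which matches the scheme's role as a lightweight discovery surrogate rather than full documentation.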
Dublin Core development
• Initial development of simple elements
• Subelements and user communities
• Warwick Framework
• Qualified Dublin Core
• RDF and XML
Metadata Encoding and Transmission Standard (METS)
• Developed out of LoC’s MOA project
• Designed to support maintenance of libraries of digital objects
• METS document is a “wrapper” containing a pointer to the object plus its metadata
• Three overall types of metadata (three segments of METS document)



Descriptive
Administrative
Structural
METS Descriptive metadata
• External (e.g., finding aid that can be pointed to via a URL)
• Internal (included in the document)
• Can include several different metadata sets as relevant
METS Administrative metadata
• Technical metadata
• Intellectual property rights metadata
• Source metadata (for analog source)
• Digital provenance metadata


Relations between files
Migration/transformation data
METS Structural metadata
• File groups list
• Structural map (defines relations between files and METS element structure)
• Behavior segment (associates executable methods with specific files, e.g. for display)
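The "wrapper" idea can be sketched as a bare skeleton: descriptive metadata pointed to externally, the four kinds of administrative metadata, a file group, and a structural map tying the file to the description. This is an outline only, assuming the METS namespace `http://www.loc.gov/METS/` and element names as found in the METS schema (`dmdSec`, `amdSec`, `fileSec`, `structMap`); the empty administrative sections would need real content to validate, and the `example.org` URLs, ID values, and the helper names `m` and `build_mets` are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(file_url, finding_aid_url):
    """Build a skeleton METS wrapper around one file (not schema-valid)."""
    root = ET.Element(m("mets"))

    # Descriptive metadata, external: a pointer to a finding aid.
    dmd = ET.SubElement(root, m("dmdSec"), ID="DMD1")
    ET.SubElement(dmd, m("mdRef"), {
        "LOCTYPE": "URL", "MDTYPE": "EAD",
        f"{{{XLINK_NS}}}href": finding_aid_url,
    })

    # Administrative metadata: technical, rights, source, digital provenance.
    amd = ET.SubElement(root, m("amdSec"), ID="AMD1")
    for tag, ident in (("techMD", "TECH1"), ("rightsMD", "RIGHTS1"),
                       ("sourceMD", "SOURCE1"), ("digiprovMD", "PROV1")):
        ET.SubElement(amd, m(tag), ID=ident)  # left empty in this sketch

    # File group list: the pointer to the object itself.
    file_sec = ET.SubElement(root, m("fileSec"))
    group = ET.SubElement(file_sec, m("fileGrp"), USE="master")
    file_el = ET.SubElement(group, m("file"), ID="FILE1", ADMID="TECH1")
    ET.SubElement(file_el, m("FLocat"), {
        "LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_url,
    })

    # Structural map: relates the file to the description.
    struct = ET.SubElement(root, m("structMap"), TYPE="physical")
    div = ET.SubElement(struct, m("div"), TYPE="item", DMDID="DMD1")
    ET.SubElement(div, m("fptr"), FILEID="FILE1")
    return root


mets_doc = build_mets("http://example.org/obj/scan001.tif",
                      "http://example.org/findingaid.xml")
print(ET.tostring(mets_doc, encoding="unicode"))
```

The ID and IDREF attributes (`ID`, `ADMID`, `DMDID`, `FILEID`) are what let one document hold the object pointer and all of its metadata while keeping the segments separate.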
METS and XML
• The METS XML schema
• http://www.loc.gov/standards/mets/mets_xsd/mets.html
• Why is it all so complicated?
• How can anyone ever keep track of all this metadata?
XML in 10 Points
1. XML is for structuring
2. XML looks like HTML
3. XML is text for computers
4. XML is purposely verbose
5. XML is a family
6. XML is only partly new
7. XHTML->XML
8. XML is modular
9. XML is base for RDF, Semantic Web
10. XML is free, universal, supported
Creation Metadata
Metadata added at creation



By the creator
By the creating application program (note: some of this is meant for system use)
Example of hybrid process: creation of Word file
Example: Word processing
Digitization as creation






Preprocessing
Conversion
Quality control
Object manipulation
Surrogate outputs
(see handouts)
Appraisal / Inventory / Retention Schedule Metadata
Digital Appraisal Decisions




Keep (costs of carrying into the future)
Allow to Die (keep but do nothing)
Repurpose (separating content and form)
Destroy (microwave the disk?)
Digital Appraisal: What to Appraise


Content (as with paper?)
Technical support




System
Creating application
Display requirements
Functionality
What is a Retention Schedule?


Classic record statuses: active, semiactive, inactive
Keep



Allow to Die



Alter function of custodian
Alter custodianship
Leave with creator?
Why not always do this?
Destroy


Determine when to destroy
Almost always a method for reprieve exists…
Record-level vs Group-level Metadata

Record-level: Metadata orders 1-4





1 encoded (content)
2 written (content)
3 meaning (ontology)
4 function/purpose=type (form)
Group-level: Metadata order 5

5 Object grouping schemes (categories)


Record groups, record series (intellectual management)
Format, security concerns (physical management)
Transfer / Authenticity Metadata
The central problem: Security guaranteeing Authenticity




Guarding the object (authenticity, integrity)
Tracking the object through its lifetime
Proving the identities of the people responsible for transferring the object (authentication, non-repudiation)
Transferring the object in a secure way
What is transfer about?

What is a digital copy? What qualifies?




Data compression issues
Data segmentation issues
Creating application vs file-management application
How can a digital copy be guaranteed?




Digital object as string of bits
Message digest of object as math on the bits
Ship the message digest with the object
Recalculate and compare at the other end
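The four steps above can be sketched in a few lines. This is a minimal illustration, assuming SHA-256 as the digest algorithm (the slides do not name one); the function names are hypothetical.

```python
import hashlib


def message_digest(data: bytes) -> str:
    """Treat the object as a string of bits and do the 'math on the bits'."""
    return hashlib.sha256(data).hexdigest()


def package_for_transfer(obj: bytes):
    """Sender side: ship the message digest along with the object."""
    return obj, message_digest(obj)


def verify_on_receipt(received: bytes, shipped_digest: str) -> bool:
    """Receiver side: recalculate the digest and compare."""
    return message_digest(received) == shipped_digest


obj, digest = package_for_transfer(b"Minutes of the board meeting")
print(verify_on_receipt(obj, digest))                              # True
print(verify_on_receipt(b"Minutes of the bored meeting", digest))  # False
```

A bare digest only shows that the object and digest still agree. If both travel over the same untrusted channel, someone who alters the object can also replace the digest, which is one reason the transfer itself and the identities of the people involved also need protecting.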
Guaranteeing the authenticity of the object (Integrity)

Object as open or secret
Must we disguise the object?
Can we move it around in clear?
Message digest
Creates single number: “one-way hash”
Number will change with the slightest change in the object on which it was calculated
Encryption (Confidentiality)


Asymmetric
Symmetric
Accession Metadata
What is the nature of the accession task?





The object received has been uprooted from its former context
Object is equipped with enough metadata to reconstruct that context
Contextual metadata now is no longer functional but descriptive of the old context
Object must be integrated into a new context (which may mirror the old)
New functions must be provided for (metaactivities)
Validation of the object




Validation test suite
Validation tools
Formal validation process
Validation outcomes



Rejection
Re-transfer
Acceptance
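One way to picture a validation test suite and its three outcomes is the sketch below. The policy it encodes is an assumption made for illustration, not something the slides specify: a fixity mismatch leads to re-transfer, an intact object in an unsupported format is rejected, and everything else is accepted. The names `Outcome`, `ACCEPTED_EXTENSIONS`, and `validate` are hypothetical.

```python
import hashlib
from enum import Enum


class Outcome(Enum):
    REJECTION = "rejection"
    RE_TRANSFER = "re-transfer"
    ACCEPTANCE = "acceptance"


# Hypothetical repository policy: extensions the repository will keep.
ACCEPTED_EXTENSIONS = {".txt", ".xml", ".tif", ".pdf"}


def validate(name, data, shipped_sha256):
    """Run a tiny test suite and map the results to a validation outcome."""
    # Test 1: fixity. A mismatch suggests damage in transit, so ask again.
    if hashlib.sha256(data).hexdigest() != shipped_sha256:
        return Outcome.RE_TRANSFER
    # Test 2: format policy. An intact but unsupported object is rejected.
    dot = name.rfind(".")
    extension = name[dot:].lower() if dot != -1 else ""
    if extension not in ACCEPTED_EXTENSIONS:
        return Outcome.REJECTION
    return Outcome.ACCEPTANCE


payload = b"transferred record"
good_digest = hashlib.sha256(payload).hexdigest()
print(validate("record.txt", payload, good_digest))   # Outcome.ACCEPTANCE
print(validate("record.txt", payload, "0" * 64))      # Outcome.RE_TRANSFER
print(validate("record.exe", payload, good_digest))   # Outcome.REJECTION
```

A real suite would test much more (format identification, completeness of the accompanying metadata, virus checks), but the shape stays the same: a fixed set of tests and a documented mapping from results to outcomes.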
Preparation of the object for storage
Metadata as data and as processing instructions
Digital object and use copy
Storage issues
Descriptive Metadata
Descriptive metadata for what?




Individual objects (Dublin Core, RDF)
Books and other chunks (MARC, MODS)
Multimedia objects (METS, MPEG 21)
Finding aids (EAD): collection-level
What about the single object?



Is Dublin Core enough?
What for?
Who will describe at the object level?




Zillions of archivists?
Automatic analysis?
Ad hoc analysis?
Taggers on the Internet?
Preservation Metadata
What is Preservation Metadata?
• Object stability (OAIS “content data object”)
What elements of the object’s content should be preserved? What is it? What is it for?
What functions of the object should be preserved? (i.e., how can it remain itself into the future, and what do we mean by “itself”?)
• Environmental support (OAIS “environment”)
What kind of environmental characteristics does the object need to stay alive (software, hardware)? (i.e., how do we specify its life support system?)
Object Stability I: Content

Authenticity revisited: stability for what?





Access to genuine article
Historical truth
Guarantee of prior art
Intellectual property guarantee
Range of attributes needed for each

What does “content” mean?
Object Stability II: Functionality

Static objects (e.g. text)


Look and feel
Dynamic objects (e.g. computer game)



Look and feel
Connectivity
Interactivity
Environmental Support I: Emulation



Making it possible to see the object as it was originally seen
Making it possible for the object to function as it originally did
Providing software support for that to happen
Running the original program (in an environment that emulates the original environment)
Running something that looks like (emulates) the original program
Environmental Support II: Migration
Deciding what to migrate (deciding what to lose)
Transformations to the object


If reversible, no need to keep original object
If not, retention of original object necessary
Documentation requirements for preservation



What the object was
What the object is
What happened in between
OAIS metadata model I
OAIS metadata model II


SIP (send), AIP (archive), DIP (disseminate)
Parts of an object


Content
Preservation description






Reference (unique identifier)
Provenance (history in and out of repository)
Context (archival bond)
Fixity (message digest)
Packaging
Descriptive
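The parts listed above can be sketched as a data structure, with ingest as the step that turns a SIP into an AIP. This is a loose illustration under stated assumptions, not the OAIS reference model itself: the class and field names are hypothetical, SHA-256 stands in for the fixity digest, and "ingest" is reduced to a fixity check plus one provenance entry.

```python
import hashlib
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass
class PreservationDescription:
    reference: str          # unique identifier
    provenance: List[str]   # history in and out of the repository
    context: str            # archival bond to related material
    fixity: str             # message digest of the content


@dataclass
class InformationPackage:
    kind: str               # "SIP", "AIP", or "DIP"
    content: bytes
    preservation: PreservationDescription
    packaging: str          # how the parts are bound together
    descriptive: Dict[str, str] = field(default_factory=dict)


def ingest(sip: InformationPackage) -> InformationPackage:
    """Turn a SIP into an AIP: check fixity, then record the event."""
    if sip.kind != "SIP":
        raise ValueError("ingest expects a SIP")
    if hashlib.sha256(sip.content).hexdigest() != sip.preservation.fixity:
        raise ValueError("fixity check failed")
    history = sip.preservation.provenance + ["ingested into repository"]
    return replace(sip, kind="AIP",
                   preservation=replace(sip.preservation, provenance=history))


body = b"annual report, final version"
sip = InformationPackage(
    kind="SIP",
    content=body,
    preservation=PreservationDescription(
        reference="example-0001",                 # hypothetical identifier
        provenance=["created by records office"],
        context="series: annual reports",
        fixity=hashlib.sha256(body).hexdigest(),
    ),
    packaging="single file plus metadata record",
    descriptive={"title": "Annual report"},
)
aip = ingest(sip)
print(aip.kind, aip.preservation.provenance)
```

Note that ingest returns a new package and leaves the SIP untouched, so the submitted state remains available as evidence of what was received.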
OAIS metadata model III

What is “representation information”?



How much must be kept?
Monitoring changes
What is the “knowledge base”?


Designated user community
DUC as “the public”
PREservation Metadata Implementation Strategies





Preservation metadata set, 2003-present
Assumes OAIS model
Maintaining viability, renderability, understandability, authenticity, identity
Emphasis on provenance and relationships
Entity concept






[Intellectual entity: descriptive metadata]
Object
Event
Agent (MARC, MADS)
Rights
Technical/hardware metadata out of scope
PREMIS Example: Object













objectIdentifier
objectCategory
preservationLevel
significantProperties
objectCharacteristics
originalName
storage
environment
signatureInformation
relationship
linkingEventIdentifier
linkingIntellectualEntityIdentifier
linkingRightsStatementIdentifier
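A few of these semantic units can be filled in automatically when a file arrives. The sketch below builds a partial Object description as a plain dictionary. The top-level keys come from the list above; the nested names (`objectIdentifierType`, `objectIdentifierValue`, `fixity`, `messageDigestAlgorithm`, `messageDigest`, `size`) follow the PREMIS data dictionary as best recalled and should be checked against the current version. The function name, the `"local"` identifier type, and the sample identifier are hypothetical.

```python
import hashlib

# Semantic units listed on the slide for the PREMIS Object entity.
OBJECT_UNITS = {
    "objectIdentifier", "objectCategory", "preservationLevel",
    "significantProperties", "objectCharacteristics", "originalName",
    "storage", "environment", "signatureInformation", "relationship",
    "linkingEventIdentifier", "linkingIntellectualEntityIdentifier",
    "linkingRightsStatementIdentifier",
}


def describe_file(identifier, original_name, data):
    """Return a partial PREMIS-style Object record for one file."""
    record = {
        "objectIdentifier": {
            "objectIdentifierType": "local",   # hypothetical identifier scheme
            "objectIdentifierValue": identifier,
        },
        "objectCategory": "file",
        "objectCharacteristics": {
            "fixity": {
                "messageDigestAlgorithm": "SHA-256",
                "messageDigest": hashlib.sha256(data).hexdigest(),
            },
            "size": len(data),                 # size in bytes
        },
        "originalName": original_name,
    }
    # Guard against keys that are not Object semantic units.
    unknown = set(record) - OBJECT_UNITS
    if unknown:
        raise ValueError(f"unknown semantic units: {sorted(unknown)}")
    return record


premis_example = describe_file("obj-0001", "sodmeta.ppt", b"placeholder bytes")
print(sorted(premis_example))
```

Units such as preservationLevel and significantProperties are left out because they record decisions by the repository and cannot be derived from the bits alone.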
Usage Metadata
What is Usage Metadata?




Internal users (with respect to the creator)
External users (with respect to the creator)
Internal users (with respect to the repository)
External users (with respect to the repository)
Creator Usage

The creator’s actual use of the object


The creator’s colleagues’ use of the object



Version control
Object function
Object used for reference, model
The creator’s customers’ use of the object

Object function: mediates relationship
Repository Usage

Management usage



Object maintenance and preservation
Object analysis
Designated user community


Object viewing
Object acquisition
Rights Management Metadata
What is Rights Management?



Protection of copyright
Protection of patent
Protection of the integrity of the digital object (and thereby reputation of the author/creator herself)
What is being protected?


Object itself (integrity)
Uses of the object (access controls)


Limiting use (protecting rights of the owner)
Enabling use (protecting rights of the user)
Protection against theft




Threats of the law
Fully document with metadata and protect
the metadata
Authentication of users and user requests
Watermarking/steganography
What about integrity of the digital object?



Relevant even in public domain
E.g. “copyleft” agreement:
http://www.gnu.org/copyleft/gpl.txt
See but not change, or change only with notification
Metadata Conclusions?