Lifecycle Metadata for Digital Objects
September 25, 2006
Major archival and digital library metadata schemes: How (or how not) to go about scheming
NHPRC Initiatives, 1991-2003
Research Issues in Electronic Records publication (no
longer online)
http://www.archives.gov/nhprc_and_other_grants/electronic_records/research_issues_report.html#recommendations
1996 conference on electronic records research
http://web.archive.org/web/*/http://www.si.umich.edu/e-recs/Report
2002-03 review of research agenda, MN Historical
Society
http://www.mnhs.org/preserve/records/eragenda.html
NHPRC Initiatives, cont.
For links to extant online NHPRC
project results, see:
http://www.gseis.ucla.edu/usinterpares/bibliography/NHPRC.htm
For a list of all funded NHPRC e-records projects, see:
http://www.archives.gov/nhprc/projects/electronic-records/projects.html
University of Pittsburgh
Project
NHPRC funding, 1992-1996
Overlapped with “Camp Pitt”, 1991-93 (which
spread the word about need)
“Business Acceptable Communications”
assured by warrant from non-archival
contexts
Emphasis on evidence and on
postcustodial strategies for managing
records
Emphasis on transactions as records
Warrant by Functional
Requirements (for system)
Conscientious organization
  1 Compliant
Accountable recordkeeping system
  2 Responsible
  3 Implemented
  4 Consistent
Captured records
  5 Comprehensive
  6 Identifiable
  7 Complete
  8 Authorized
Maintained records
  9 Preserved
  10 Removable
Usable records
  11 Exportable
  12 Accessible
  13 Redactable
Major elements here are
organization, recordkeeping
system, records

Warrant by Practice
Lawyers
Auditors
Records Managers
Information Technologists
Managers (mostly ISO 9000, 9001)
Medical Professions
Pittsburgh metadata reference
model in six layers
Note: now available (rescued from loss at Pittsburgh) at
http://www.archimuse.com/papers/nhprc/meta96.html
Handle [URI]
Terms & Conditions [IP, privacy, etc]
Structural
Contextual
Content
Use History [entire life history]
Ordering depends on the assumption that metadata
will be encapsulated as part of the record
I. Handle layer
(ID + description)
Unique identifier
  Record declaration (i.e., as record)
  Transaction domain (creation context)
  Transaction instance (date-stamp etc.)
Discovery metadata
  Description standard (e.g. namespace)
  Descriptors
  Language
II. Terms & Conditions Layer
Restrictions status (any “holds” on data)
Access conditions (for restricted
records)
Use conditions (licenses, redactions,
etc.)
Disposition requirements (retention,
destruction)
III. Structural Layer (technical
+ preservation)
File identification metadata (of constituent
files)
File encoding metadata (standards used)
File rendering metadata (standards required)
Record rendering metadata (standards
required)
Content structure metadata (for e.g.
databases)
Source metadata (creator + capture event)
IV. Contextual Layer
(provenance + evidence)
Transaction context metadata (people +
transaction)
Responsibility metadata (Organizational
information; org chart, etc.)
System accountability metadata (system
audit)
V. Content Layer
Actual data
Any constituent files
Any internal markup
VI. Use History Layer
Type
Instance
User
Consequences
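The six layers above can be sketched as a single data structure. This is only an illustration, assuming each layer can be held as a simple dictionary travelling with the record (the "encapsulated" assumption noted on the reference-model slide). The class name, field names, and sample values are hypothetical and are not taken from the Pittsburgh specification.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Layer names in the order given on the reference-model slide.
LAYERS = (
    "handle",
    "terms_and_conditions",
    "structural",
    "contextual",
    "content",
    "use_history",
)


@dataclass
class EncapsulatedRecord:
    """Hypothetical record that carries its own metadata, one slot per layer."""

    handle: Dict[str, Any] = field(default_factory=dict)
    terms_and_conditions: Dict[str, Any] = field(default_factory=dict)
    structural: Dict[str, Any] = field(default_factory=dict)
    contextual: Dict[str, Any] = field(default_factory=dict)
    content: Dict[str, Any] = field(default_factory=dict)
    use_history: List[Dict[str, Any]] = field(default_factory=list)

    def log_use(self, use_type: str, user: str, consequences: str) -> None:
        """Append one use event (type, instance, user, consequences)."""
        self.use_history.append(
            {
                "type": use_type,
                "instance": len(self.use_history) + 1,
                "user": user,
                "consequences": consequences,
            }
        )

    def missing_layers(self) -> List[str]:
        """Return the names of layers that are still empty."""
        return [name for name in LAYERS if not getattr(self, name)]


# Sample values are invented for illustration only.
record = EncapsulatedRecord(
    handle={"identifier": "urn:example:record:0001", "record_declaration": True},
    content={"data": "Text of a memo"},
)
record.log_use("view", "example-user", "none")
print(record.missing_layers())
```

Because the use history only ever grows, it is modelled here as an append-only list, while the other layers are set once at capture.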
Indiana University test of BAC
Funded by NHPRC
Evaluating administrative recordkeeping
systems at IU
Testing functional requirements
Mapping metadata requirements




Elimination of “metadata encapsulated objects”
(record separated from metadata)
Reduction in structural metadata
Pulled back from record-level metadata to record,
file, class levels in many cases
Influenced by MoReq (cf. DoD 5015.2)
InterPARES Project
Funded by NHPRC, SSHRC
Initially a University of British Columbia
project that led to DoD STD 5015.2
Aim to establish characteristics of a reliable
and authentic electronic record
InterPARES is an international project funded by
NHPRC, SSHRC, etc.
Aim to establish rest of record life cycle (i.e.,
after creation and classification)
InterPARES case studies
Examine digital recordkeeping systems
in wide variety of contexts worldwide
Qualitative methods used to discover
how records are used, carry out
functional analysis
Data used to provide basis for modeling
preservation processes
InterPARES basis in
diplomatics
16th-19th-century method for
establishing genuineness of documents
Defines four types of records:




Dispositive (form is essence of evidence)
Probative (written form part of evidence)
Supporting (written form discretionary,
procedurally linked to action)
Narrative (provides context)
InterPARES Authenticity
template
Documentary form


Extrinsic elements
Intrinsic elements
Annotations
Medium
Context
A quick example
http://www.thesmokinggun.com/archive/1020051delay2.html
Do we believe this document? Why?
What metadata does it incorporate?
InterPARES findings, 2002
Hopes for a clear typology of record
forms dashed after four rounds
Contemporary systems too fluid for
model




No fixed form or content
No annotations
Embedded in social contexts
Managed procedurally
InterPARES 2
Follow-on from InterPARES 1
Addresses “new” file formats:



Experiential
Dynamic
Interactive
Description cross-domain



Metadata Schema Registry
Metadata Specification Model
Literary Warrant Database
Meanwhile, in Australia…
Bearman and Australia
(postcustodialism)
Specifics of Australian problems



Relatively young government
Relatively small government
Swift computerization
Context: Australian national
government recordkeeping + regulated
industries
Sue McKemmish, “Yesterday,
Today, and Tomorrow”
Application of Frank Upward’s records
continuum model
Integration of continuum process
Importance of continuum (compare to life
cycle)






Governance and accountability
Collective memory and identity
Records as assets
Can be expanded to the idea of any kind of
information, not just records
Does not require centralized custodianship
Heals records manager/archivist split
The Records Continuum
© Frank Upward, all rights reserved
[Figure: records continuum diagram]
Four dimensions: 1 Create, 2 Capture, 3 Organise, 4 Pluralise
Four axes, each read outward from the centre:
  Evidentiality: trace, evidence, corporate/individual memory, collective memory
  Identity: actor(s), unit(s), organisation, institution
  Transactionality: transaction, activity, function, purpose
  Recordkeeping containers: [archival] document, record(s), archive, archives
“Yesterday” II
Four dimensions of the continuum (all active
through life of record):
Create: actors, acts, documents, trace
Capture: reliable recordkeeping systems
Organize: entire recordkeeping regime
Pluralize: social/archival context for access
Liberatory assumptions provide for multiple
views
“Create Once, Use Many
Times” project
Reuse and inheritance of metadata from
many contexts
Avoid retrospective description
Use standards in schema registries
Use web services
[Use ontology-matching]
“Metadata Broker” concept
Note: most archives and library activity so far
is concentrated at the attribute space/value
space level (see next slide)
“Create Once” metadata layers concept
Layer 3: (a) Attribute Space (e.g. LOM, Dublin Core, MES, indecs); (b) Value Space (e.g. ontologies, classifications, controlled vocabularies, taxonomies)
Layer ?: Left out of model: covers cultural and temporal conceptual change
Layer 2: Representation (e.g. XML, RDF, DAML+OIL)
Layer 1: Transport and Exchange (e.g. HTTP GET, OAI Protocol for Metadata Harvesting)
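To make the layering concrete, the sketch below builds an OAI-PMH harvesting request. The HTTP GET request is the transport layer, the reply would be XML (representation), and the `metadataPrefix` argument names the attribute space being asked for (`oai_dc`, unqualified Dublin Core). The repository address is a made-up placeholder, and the function name is hypothetical; only the `verb`, `metadataPrefix`, and `set` parameters come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (nothing is sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Placeholder base URL, assumed for illustration only.
url = oai_list_records_url("http://repository.example.org/oai")
print(url)
```

Nothing in the request touches the value space or the missing "Layer ?", which is the point of the slide's note: the exchange machinery is the easy part.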
RKMS Australian Metadata
standard
Agent
Rights Management
Title
Subject
Description
Language
Relation
Coverage
Function
Date
Type
Aggregation Level
Format
Record Identifier
Management History
Use History
Preservation History
Location
Disposal
Mandate
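A quick way to see how much of this element set shares vocabulary with Dublin Core is to compare the two lists by name. The sketch below uses the twenty element names above and the fifteen Dublin Core element names listed later in this deck. It matches on labels only, which is an assumption: two elements with the same name are not guaranteed to carry the same semantics, so the result is a starting point for a crosswalk rather than a crosswalk.

```python
RKMS_ELEMENTS = [
    "Agent", "Rights Management", "Title", "Subject", "Description",
    "Language", "Relation", "Coverage", "Function", "Date", "Type",
    "Aggregation Level", "Format", "Record Identifier",
    "Management History", "Use History", "Preservation History",
    "Location", "Disposal", "Mandate",
]

DUBLIN_CORE_ELEMENTS = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier", "Source",
    "Language", "Relation", "Coverage", "Rights",
]


def compare_by_name(left, right):
    """Split two element lists into shared, left-only, and right-only names."""
    left_set, right_set = set(left), set(right)
    shared = [name for name in left if name in right_set]
    left_only = [name for name in left if name not in right_set]
    right_only = [name for name in right if name not in left_set]
    return shared, left_only, right_only


shared, rkms_only, dc_only = compare_by_name(RKMS_ELEMENTS, DUBLIN_CORE_ELEMENTS)
print("Shared names:", shared)
print("Recordkeeping only:", rkms_only)
print("Dublin Core only:", dc_only)
```

The names found only on the recordkeeping side (the history, disposal, and mandate elements, for example) are the ones a discovery-oriented scheme has no place for.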
What about EAD?
Supported by SAA
Archival description focuses on aggregates
of objects
EAD was created to mark up finding aids, not
actual objects
So far only experimental use has attempted
to add document-level records to EAD
descriptions
Addition of EAC for detailed name authority
control: http://xml.coverpages.org/eac.html
Dublin Core Metadata
Initiative
Supported by OCLC
Primarily a surrogate/discovery metadata
scheme
Does not aim to document everything
Goal to be easy to use: a “boundary object”
for many communities of practice
Especially useful for management of active
digital objects
Dublin Core elements
Title
Creator
Subject
Description
Publisher
Contributor
Date
Type
Format
Identifier
Source
Language
Relation
Coverage
Rights
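As an illustration of how these fifteen elements are typically serialized, the sketch below writes a small record as XML in the Dublin Core element namespace. The wrapper element name `metadata` and the sample values are assumptions made for the example; Dublin Core itself does not prescribe a container.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier", "source",
    "language", "relation", "coverage", "rights",
}


def dc_record(fields):
    """Build an XML element holding one dc:* child per supplied field."""
    root = ET.Element("metadata")  # arbitrary wrapper, assumed for the example
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Sample values are invented for illustration only.
record = dc_record(
    {"title": "Example photograph", "date": "2006-09-25", "format": "image/tiff"}
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Every element is optional and repeatable in simple Dublin Core, which is why the function accepts any subset of the fifteen names.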
Dublin Core development
Initial development of simple elements
Subelements and user communities:
Qualified Dublin Core
Warwick Framework container
architecture (ca. 1996)
RDF and XML: toward namespaces
within DC or including DC
Dublin Core in HTML
environment
Example: MDAH
http://www.mdah.state.ms.us
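For pages like this, Dublin Core is usually embedded in the HTML head as `meta` tags with a `DC.` prefix, plus a `link` tag declaring the schema. The sketch below generates such tags. The function name and field values are hypothetical placeholders and are not copied from the MDAH site.

```python
from html import escape


def dc_meta_tags(fields):
    """Return HTML head lines embedding Dublin Core as meta tags."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, value in fields.items():
        # Escape the value so quotes and ampersands cannot break the attribute.
        content = escape(value, quote=True)
        lines.append(f'<meta name="DC.{name}" content="{content}">')
    return "\n".join(lines)


# Placeholder values, assumed for illustration only.
example_fields = {"title": "Archives & History", "language": "en"}
html_head = dc_meta_tags(example_fields)
print(html_head)
```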
Library of Congress Metadata
Efforts
See webpage: http://www.loc.gov/marc/
MARC (21, AMC, etc.; MARCXML)


FRBR multiple-manifestation issue
ISBD(G) insistence on nonrepeatable source fields
MODS and MADS (Metadata Object/Authority
Description Schema)
METS (Metadata Encoding and Transmission
Standard)
Metadata Encoding and
Transmission Standard (METS)
Developed out of LoC’s MOA project, available
2003
Initially designed to support maintenance of
a library of digital objects
Note importance of transmission emphasis:
not meant to be an internal format
Compare to the “manifest” idea for
structuring multimedia objects (MPEG-21)
Three overall types of metadata, introduced
by header for METS document creation
context
METS Descriptive metadata
External (e.g., finding aid)
Internal (part of the document)



Provides namespace reference for any
XML-encoded non-METS metadata
LoC is using MODS, DC, and MARCXML
Provides an XML “wrapper” for non-XML
metadata (e.g. MARC)
METS Administrative metadata
Technical metadata (e.g., NISO still-image
metadata/MIX)
Intellectual property rights metadata
Source metadata (re analog source)
Digital provenance metadata


Relations between files
Migration/transformation history data
METS Structural metadata
File group list (all the files included)
Structural map (defines hierarchical
relations between files and METS
element structure)
Behavior segment (associates
executable methods like viewers, render
engines, etc. with specific content
elements)
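Putting the header and the three metadata types together, a minimal METS document might be assembled as in the sketch below. It is a skeleton under stated assumptions: the identifiers, the file URL, and the MIME type are invented, the administrative section is left empty, the behavior segment is omitted, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS_NS), ("xlink", XLINK_NS), ("dc", DC_NS)):
    ET.register_namespace(prefix, uri)


def q(namespace, tag):
    """Return a namespace-qualified tag in ElementTree's {uri}tag form."""
    return f"{{{namespace}}}{tag}"


def build_mets(title, file_url):
    """Assemble a one-file METS skeleton: header, descriptive,
    administrative, file, and structural sections."""
    mets = ET.Element(q(METS_NS, "mets"))
    ET.SubElement(mets, q(METS_NS, "metsHdr"), {"CREATEDATE": "2006-09-25T00:00:00"})

    # Descriptive metadata: Dublin Core wrapped inside the METS document.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    ET.SubElement(xml_data, q(DC_NS, "title")).text = title

    # Administrative metadata: left as an empty placeholder here.
    ET.SubElement(mets, q(METS_NS, "amdSec"), {"ID": "AMD1"})

    # File section: the list of files that make up the object.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})
    file_el = ET.SubElement(
        group, q(METS_NS, "file"), {"ID": "FILE1", "MIMETYPE": "image/tiff"}
    )
    ET.SubElement(
        file_el,
        q(METS_NS, "FLocat"),
        {"LOCTYPE": "URL", q(XLINK_NS, "href"): file_url},
    )

    # Structural map: ties the file to a place in the object's hierarchy.
    struct_map = ET.SubElement(mets, q(METS_NS, "structMap"), {"TYPE": "physical"})
    div = ET.SubElement(struct_map, q(METS_NS, "div"), {"TYPE": "page", "DMDID": "DMD1"})
    ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": "FILE1"})
    return mets


# Title and URL are placeholders, assumed for illustration only.
doc = build_mets("Example photograph", "http://repository.example.org/files/0001.tif")
print(ET.tostring(doc, encoding="unicode"))
```

The `ID`, `DMDID`, and `FILEID` attributes are what hold the package together: the structural map points at files and at descriptive sections by identifier rather than by nesting.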
METS and XML
The METS XML schema
http://www.loc.gov/standards/mets/mets_xsd/mets.html
Why is it all so complicated?
Existing library/archives professions have
lacked IT skills for substantive innovation


Little impetus to re-vision practice
Little awareness of successful external practice (cf.
data archives)
Existing professions have had different goals



Providing access to text blocks
Providing high-level access to undescribed
collections
Preserving physical objects
Why should we care about
library/archives schemes?
Long track record in descriptive metadata
Understanding of ontology construction
Indeed most of ALL activity on metadata at
the attribute space/value space level is based
on library/archives understandings, even if
restructured
Australian “clever recordkeeping” project is a
glimmer of what is needed to blend with
Semantic Web/web services/distributed
intelligence views of a worldwide information
appliance…