Integrating EAD and TEI: the resolution of metadata overlaps Anna Sexton ALLC/ACH 2003

Copyright, UCL
LEADERS: Linking EAD to Electronically
Retrievable Sources
Talk outline
• Introduction to LEADERS
• Representing archives using EAD and TEI
• Problem: Overlaps between EAD and TEI
• Solution: Enriching EAD
• Implications for search and retrieval
Introduction to LEADERS
• Aim of LEADERS:
– To enhance online access to archival material
• Linking of EAD, EAC, TEI: bringing together
finding aids, authority records, transcripts and
digital images of archive documents
• Re-usable toolkit for archivists
• Demonstrator application
– Selected documents from the Orwell Archive and
UCL’s own administrative archive used as test-bed
material
Representing archives using
TEI and EAD
• Archives are collections of documents
selected and preserved because they
have long-term value
• Finding aids provide access to archive
collections through information that
identifies, locates and interprets the
material and explains the context and
record systems from which the material
has been selected
Representing archives using
EAD and TEI
• EAD is an encoding standard used to
structure and exchange electronic finding
aids
• As a stand-alone tool, EAD cannot give users
access to the actual content of archive
documents
• Integration with TEI opens up many
possibilities for the user
Representing archives using
TEI and EAD
• TEI is an encoding framework that enables
the creation of digital texts
• When EAD and TEI are brought together:
– Within a single environment the user can find
items in archive collections; learn about their
contexts; view representations of the items
themselves; and read, study, analyze and
manipulate their content
Problem: overlaps between
EAD and TEI
• Overlap in relation to metadata that:
– Identifies, locates and gives details about the
creation of the original document
– Describes the physical characteristics of the
original document
– Provides contextual information about the
creator and participants within the original
document
– Interprets/describes the data in the original
document
Possible solutions to overlaps
• The overlaps can be dealt with in a number of ways:
– Use the EAD framework and the <teiHeader> as they
are intended and accept that metadata about the
original document will appear in both
– Use the EAD framework to hold metadata about the
original document and use the <teiHeader> to hold
information about the TEI transcript
– Use TEI for all metadata
– Use EAD for all metadata
Solution: enriching EAD
• LEADERS is using EAD as the overall
metadata framework, whilst enriching it at
the lowest level of description so that EAD
can act as an adequate holding place for
metadata relating to the original
documents and their derived digital
representations
Metadata relationships between the
original, the transcript and the image
[Diagram: the Original Archive Document, the TEI Transcript and the Image each have their own Contextual, Identification and Administrative Metadata; the diagram also includes a Content Metadata element.]
Enriching EAD’s <altformavail>
• According to the EAD tag library
<altformavail> should be used to hold:
– Information about copies of the [original]
materials being described, including the type
of alternative form, significant control
numbers, locations and source for ordering if
applicable. The additional formats are
typically microforms, photocopies or digital
reproductions
LEADERS finding aid Schema
• Inputs:
– EAD DTD
– TEI Encoding Framework
– NISO MIX Schema
• Output:
– LEADERS EAD-based Schema for encoding finding aids
– Within EAD’s <altformavail> at item level: xmlns: TEI encoding framework; xmlns: NISO MIX Schema
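As a rough illustration of the enrichment idea, the sketch below builds an item-level EAD component whose <altformavail> holds namespaced metadata about a transcript and an image. It is not the LEADERS schema: the namespace URIs, the identifier, the titles and the choice of child elements are all assumptions made for this example, and Python's standard xml.etree.ElementTree is used only as a convenient way to show the nesting.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, invented for this sketch (not from the slides).
TEI_NS = "http://example.org/ns/tei"
MIX_NS = "http://example.org/ns/mix"
ET.register_namespace("tei", TEI_NS)
ET.register_namespace("mix", MIX_NS)


def build_item(unitid, unittitle, transcript_title, image_width):
    """Build an item-level EAD <c> whose <altformavail> carries
    metadata about the derived transcript and image."""
    item = ET.Element("c", {"level": "item"})

    # Metadata about the original document stays in ordinary EAD elements.
    did = ET.SubElement(item, "did")
    ET.SubElement(did, "unitid").text = unitid
    ET.SubElement(did, "unittitle").text = unittitle

    alt = ET.SubElement(item, "altformavail")

    # Metadata about the TEI transcript, in a TEI namespace (illustrative).
    file_desc = ET.SubElement(alt, f"{{{TEI_NS}}}fileDesc")
    title_stmt = ET.SubElement(file_desc, f"{{{TEI_NS}}}titleStmt")
    ET.SubElement(title_stmt, f"{{{TEI_NS}}}title").text = transcript_title

    # Technical metadata about the image, in a MIX namespace (illustrative).
    mix = ET.SubElement(alt, f"{{{MIX_NS}}}mix")
    ET.SubElement(mix, f"{{{MIX_NS}}}imageWidth").text = str(image_width)
    return item


# Hypothetical values, for demonstration only.
item = build_item("ITEM-1", "Letter (hypothetical)", "Transcript of letter", 2400)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

Keeping the derived-object metadata inside <altformavail> means a single item-level description can answer questions about the original, the transcript and the image without repeating the description of the original.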
• Example item level description from the
finding aid for the Orwell Papers
• XML
Recreating standalone digital
objects
• <teiHeader> made redundant
• Raises a question relating to the
reusability of the TEI transcript
– Solution: derive a new XML file using XSLT,
combining the relevant part of the finding aid
and the transcript to create a new
TEI-conformant file
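The slides name XSLT as the tool for this derivation. The sketch below shows the same combining step in Python instead, purely to make the data flow concrete: it reads an item description, builds a <teiHeader> from it, and attaches the transcript text. The sample finding-aid fragment, the sample transcript, the function name make_standalone and the mapping from EAD fields to header fields are all assumptions for illustration, and the result is not validated against any TEI DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical item-level fragment of a finding aid.
FINDING_AID_ITEM = """<c level="item">
  <did>
    <unitid>ITEM-1</unitid>
    <unittitle>Letter (hypothetical)</unittitle>
  </did>
</c>"""

# Hypothetical transcript stored without its own <teiHeader>.
TRANSCRIPT = "<text><body><p>Transcribed text goes here.</p></body></text>"


def make_standalone(item_xml, transcript_xml):
    """Combine finding-aid metadata and a headerless transcript
    into one TEI-style document with a generated <teiHeader>."""
    item = ET.fromstring(item_xml)
    text = ET.fromstring(transcript_xml)

    tei = ET.Element("TEI.2")
    header = ET.SubElement(tei, "teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")

    # Title of the electronic text, taken from the EAD unit title.
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = item.findtext("did/unittitle")

    pub_stmt = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub_stmt, "p").text = "Derived from the finding aid."

    # Pointer back to the original document via its EAD identifier.
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = (
        "Original item: " + item.findtext("did/unitid")
    )

    tei.append(text)
    return tei


standalone = make_standalone(FINDING_AID_ITEM, TRANSCRIPT)
print(ET.tostring(standalone, encoding="unicode"))
```

Because the header is generated on demand, the metadata is maintained in one place (the finding aid) while the transcript can still be exported as a self-describing file.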
Search and retrieval issues
• Overlap between content encoding in the
TEI text and <controlaccess> terms in
EAD
• Both could be used as index terms for
search and retrieval
• Combine TEI <name> tags and EAD
<controlaccess> tags to produce lists of
index terms
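One way to picture the combined index is sketched below: terms are collected from EAD <controlaccess> entries and from TEI <name> elements, normalised, de-duplicated and sorted. The sample XML, the function name index_terms and the whitespace-only normalisation are assumptions for this example; a real index would probably also need authority control to reconcile variant forms of the same name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, for demonstration only.
EAD_ITEM = """<c level="item">
  <controlaccess>
    <persname>Orwell, George</persname>
    <subject>Journalism</subject>
  </controlaccess>
</c>"""

TEI_TEXT = """<text><body>
  <p>A letter from <name type="person">Orwell, George</name>
  to <name type="person">Example, Anne</name>.</p>
</body></text>"""


def normalise(value):
    """Collapse runs of whitespace so equal terms compare equal."""
    return " ".join(value.split())


def index_terms(ead_xml, tei_xml):
    """Return a sorted, de-duplicated list of index terms drawn from
    EAD <controlaccess> entries and TEI <name> elements."""
    ead = ET.fromstring(ead_xml)
    tei = ET.fromstring(tei_xml)
    terms = set()

    # Each child of <controlaccess> (persname, subject, ...) is a term;
    # nested <controlaccess> groups are visited by iter() in their own right.
    for access in ead.iter("controlaccess"):
        for child in access:
            if child.tag == "controlaccess":
                continue
            term = normalise("".join(child.itertext()))
            if term:
                terms.add(term)

    # Names tagged in the running text of the transcript.
    for name in tei.iter("name"):
        term = normalise("".join(name.itertext()))
        if term:
            terms.add(term)

    return sorted(terms)


terms = index_terms(EAD_ITEM, TEI_TEXT)
print(terms)
```

In this sample, "Orwell, George" occurs in both sources but appears once in the resulting list, which is the effect of merging the two kinds of term into a single index.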
Conclusion
• Our solution to overlaps:
– Avoids repetition of the same information
– Clearly differentiates between metadata relating to
original archive documents and metadata relating to
their derived digital forms
– Makes use of EAD’s comprehensive framework in
describing archival material as whole collections
made up of component parts
– Recognises the need to allow for re-use of resources