TEI, EAD and Integrated User Access to Archives: Towards a Generic Toolset

Chris Turner
Anna Sexton
Susan Hockey
Geoffrey Yeo
LEADERS Project,
School of Library, Archive and Information Studies,
University College London
Copyright, UCL
LEADERS: Linking EAD to Electronically
Retrievable Sources
Agenda
• Introduction to LEADERS
• User Testing
• TEI for Archives
• TEI/EAD Integration
Linking EAD to
Electronically Retrievable Sources
• The LEADERS project aims to enhance remote
user access to archives by providing the means
to present archival source materials within their
context.
• Funded by the Arts and Humanities Research
Board (AHRB)
• Located in UCL SLAIS
• Completion March 2004
LEADERS Project Deliverables
• Encoding
– Encoded finding aids and source materials with images
• Tools
– A suite of tools for archivists to support the encoding and online presentation of archival source materials and finding
aids
• Application
– A demonstrator application for search, retrieval and
presentation of encoded materials in the Web environment
• User Testing
– User testing will provide on-going feedback for the
development process.
Encoding
• Text Encoding Initiative (TEI)
– For archival source materials
• Encoded Archival Description (EAD)
– For finding aids
• NISO Metadata for Images in XML (MIX)
Schema
– For digital images metadata
Tools: Design considerations
• Generic – should be usable on other
projects for encoding other resources
• Re-usability of encoded resources – to
facilitate maximum return on the encoding
effort
• Platform independent – to minimise
restrictions on deployment
• Simplify or reduce the encoding effort
where possible
Tools: Technology
• XML Schema
– Use of namespaces
– Schema will provide a generic and re-usable means
to encode resources
• XSLT/CSS
– Style sheets will provide a means to manipulate,
transform and present the encoded resources, thus
supporting re-purposing of encoded materials
• WSDL/SOAP
– Incorporating ‘self-describing services’ will allow
multiple applications to be constructed to make
different use of the encoded materials
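The role of namespaces listed above can be sketched in a few lines of Python. This is only an illustration, not the project's toolset: the wrapper element `record`, the sample titles and the namespace URIs (`urn:example:ead`, `urn:example:tei`) are all assumptions made for the example. It shows how prefixes keep EAD and TEI elements distinguishable when both appear in one document.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, used only for this illustration.
NS = {
    "ead": "urn:example:ead",
    "tei": "urn:example:tei",
}

# Invented sample: one wrapper holding an EAD title and a TEI title.
SAMPLE = """\
<record xmlns:ead="urn:example:ead" xmlns:tei="urn:example:tei">
  <ead:unittitle>Student application form</ead:unittitle>
  <tei:title>Transcript of student application form</tei:title>
</record>
"""


def titles_by_vocabulary(xml_text):
    """Return the finding-aid title and the transcript title separately."""
    root = ET.fromstring(xml_text)
    return {
        "ead": root.findtext("ead:unittitle", namespaces=NS),
        "tei": root.findtext("tei:title", namespaces=NS),
    }


if __name__ == "__main__":
    print(titles_by_vocabulary(SAMPLE))
```

Because each lookup is qualified by a namespace, an application can process the EAD and TEI parts independently even though they share one file.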
Application: Objectives
• Demonstrator – a sample application to
show what can be produced/generated
from the encoded materials and the toolset
• Basic search and retrieval and alternative
presentations to show the possibilities of
TEI/EAD encoded resources
Application: Technology
• Tools will allow use of either a Microsoft or an
Apache/Java development environment
• Time and budget constraints require choosing one
• Microsoft .Net Framework
– ASP.NET/C#
User Testing
• User centred perspective
• Gaining a representative sample of archive users
• Categorisation of user types
– Purpose of research
– Primary area of interest
– Familiarity with:
  • Area of interest
  • Archival finding aids and documents
  • The Internet
• Qualitative Techniques
• Administration and feedback to development process
TEI for Archives: Research
Methodology
• Analysis of commonly occurring
structures, features and contents found
within a range of different types of archive
source material
– Material held in UCL’s Special Collections and
Record Office
– Expect to analyse material held in other archival
repositories to validate initial findings and uncover
as wide a range of encoding challenges as possible
TEI for Archives: Preliminary
findings
• ‘Transcription of Primary Sources’ tag set
in TEI can deal with a wide range of
encoding challenges inherent in archival
material:
– Complex additions, deletions and corrections
– Gaps within and damage to the text
– Changes in document hands, style and
character of writing
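As a rough illustration of what this encoding makes possible, the sketch below derives a plain reading text from a transcription that uses the TEI `<add>`, `<del>` and `<gap>` elements. The sample sentence is invented, and the policy chosen here (keep additions, drop deletions, show gaps as `[...]`) is just one possible presentation, not one prescribed by the project.

```python
import xml.etree.ElementTree as ET

# Invented transcription fragment using TEI transcription elements.
SAMPLE = (
    '<p>I wish to <del>enter</del><add place="supralinear">join</add> '
    'the College in <gap reason="damage"/> next.</p>'
)


def reading_text(elem):
    """Flatten a transcription: keep additions, drop deletions, mark gaps."""
    if elem.tag == "del":
        inner = ""        # deleted text is left out of the reading text
    elif elem.tag == "gap":
        inner = "[...]"   # a gap has no text of its own, so show a marker
    else:
        inner = (elem.text or "") + "".join(reading_text(c) for c in elem)
    # The tail is the text that follows the element inside its parent.
    return inner + (elem.tail or "")


if __name__ == "__main__":
    print(reading_text(ET.fromstring(SAMPLE)))
```

A different presentation, for example a diplomatic view that shows deletions struck through, could be produced from the same encoded source by changing only the branch for `del`.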
TEI for Archives: Preliminary
findings
• Need to explore encoding options for:
– ‘layered data’
– Textual and numerical data presented in
complex tables
– Formulae and mathematical expressions
within the text
• Examining TEI and other DTDs specifically
built to handle these structures and
features
Exploring solutions for
encoding ‘layered data’
• ‘Layered data’: when an underlying layer
of data is used as the basic structure onto
which further data [other layer(s)] is
applied
– Accounts and registers
– Address books
– Calendars
– Questionnaires and forms
Example of ‘layered data’
Encoding objectives for
‘layered data’
• Objective 1: layers within the document
should be explicitly differentiated and their
differences should be documented
• Objective 2: the relationships between
data in different layers must be explicit
Example encoding
<TEI.2>
  <teiHeader>
    <!-- … -->
    <encodingDesc>
      <layerDesc>
        <layer id="lay1">Form printed by University College London</layer>
        <layer id="lay2">Handwritten responses to form filled in by <name>Babut, Marie</name></layer>
      </layerDesc>
    </encodingDesc>
    <!-- … -->
  </teiHeader>
  <text>
    <body>
      <header layer="lay1">University College London</header>
      <instruction layer="lay1">Form to be filled up by person wishing to become a Student of
        the College (so far as it may apply in his or her case)</instruction>
      <dataSegment layer="lay1">
        <prompt layer="lay1">Name in full</prompt>
        <response layer="lay2"><name reg="Babut, Marie">Marie Babut</name></response>
      </dataSegment>
      <!-- … -->
    </body>
  </text>
</TEI.2>
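The two encoding objectives for layered data can be checked mechanically once the layers are encoded this way. The following Python sketch works on a trimmed, well-formed variant of the fragment above; the function names are hypothetical, and the layer elements (`layerDesc`, `layer`, `dataSegment`, `prompt`, `response`) are the ones shown on the slide rather than standard TEI.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<TEI.2>
  <teiHeader>
    <encodingDesc>
      <layerDesc>
        <layer id="lay1">Form printed by University College London</layer>
        <layer id="lay2">Handwritten responses to form</layer>
      </layerDesc>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <dataSegment layer="lay1">
        <prompt layer="lay1">Name in full</prompt>
        <response layer="lay2"><name reg="Babut, Marie">Marie Babut</name></response>
      </dataSegment>
    </body>
  </text>
</TEI.2>
"""


def undeclared_layers(root):
    """Objective 1: return layer references with no <layer> declaration."""
    declared = {layer.get("id") for layer in root.iter("layer")}
    used = {el.get("layer") for el in root.iter() if el.get("layer") is not None}
    return used - declared


def prompt_response_pairs(root):
    """Objective 2: pair each prompt with the response in its dataSegment."""
    pairs = []
    for segment in root.iter("dataSegment"):
        prompt = segment.find("prompt")
        response = segment.find("response")
        if prompt is not None and response is not None:
            pairs.append(("".join(prompt.itertext()).strip(),
                          "".join(response.itertext()).strip()))
    return pairs


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(undeclared_layers(root))
    print(prompt_response_pairs(root))
```

Grouping a prompt and its response inside one `dataSegment` is what makes the relationship between the printed layer and the handwritten layer explicit enough to query.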
TEI/EAD Integration: What
is the purpose of EAD?
• EAD is a metadata standard for the
creation of tools (finding aids) that contain
information that identifies, manages,
locates and describes archive documents
within archive collections and explains the
contexts and records systems from which
the documents have been selected.
TEI/EAD Integration: What is
the purpose of TEI?
• TEI is a content encoding standard for the
creation of ‘objects of study’. TEI ‘objects of
study’ are usually derived from one or more
original ‘objects of study’ (e.g. an archive
document within an archive collection)
• TEI is also a metadata standard which seeks to
put the new object into the context of why and
how it has been created, what it has been
derived from (e.g. the original archive
document), and what the data within the object
represents
TEI/EAD Integration: Overlaps
• Overlaps between EAD and TEI occur in
relation to metadata that:
– Identifies, locates and describes the creation of the
original archive document
– Describes the physical characteristics of the original
object
– Provides contextual information about the creator of
the original object and the participants within the
object
– Interprets/describes the data in the object
TEI/EAD Integration: Detailed
Analysis of Overlaps
• Elements that interpret/describe the actual
data within the object
EAD elements
• Immediate parent element: <controlaccess>
– Child elements: <genreform>, <geogname>, <persname>, <famname>, <corpname>, <occupation>, <subject>, <date>, <function>
• Immediate parent element: N/A
– Child elements: <scopecontent>

Overlapping TEI elements
• Immediate parent element(s): <profileDesc> within <teiHeader>
– Child elements: <keywords>, <classCode>, <classref>
• Immediate parent element(s): <textDesc> within <profileDesc> within <teiHeader>
– Child elements: <channel>, <constitution>, <domain>, <factuality>, <preparedness>, <purpose>
• Immediate parent element(s): <text>
– Child elements: <name>, <date>
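One way to exploit this overlap is to generate the TEI index terms from the EAD access points instead of keying them twice. The Python sketch below is an assumption about how that might look, not the project's method: the sample data is invented, the choice of which `<controlaccess>` children to copy is arbitrary, and recording the source element in a `type` attribute on `<term>` is a convention adopted only for this example.

```python
import xml.etree.ElementTree as ET

# Invented EAD fragment for illustration.
EAD_SAMPLE = """\
<controlaccess>
  <persname>Babut, Marie</persname>
  <corpname>University College London</corpname>
  <genreform>Application forms</genreform>
</controlaccess>
"""

# EAD access-point elements to copy; a subset chosen for this sketch.
ACCESS_POINTS = ("persname", "famname", "corpname", "geogname",
                 "subject", "genreform", "occupation", "function")


def keywords_from_controlaccess(ead_xml):
    """Build a TEI-style <keywords> element from EAD access points."""
    controlaccess = ET.fromstring(ead_xml)
    keywords = ET.Element("keywords")
    for child in controlaccess:
        if child.tag in ACCESS_POINTS and child.text:
            term = ET.SubElement(keywords, "term")
            term.set("type", child.tag)  # remember the originating EAD element
            term.text = child.text.strip()
    return keywords


if __name__ == "__main__":
    element = keywords_from_controlaccess(EAD_SAMPLE)
    print(ET.tostring(element, encoding="unicode"))
```

Deriving one encoding from the other keeps the shared metadata in a single place, which is the point of identifying the overlaps in the first place.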
TEI/EAD Integration:
Objectives
• Avoidance of repetition of information
within EAD and TEI
• The ability to make the EAD finding aid
and the TEI transcript stand-alone tools
• Effective search and retrieval across EAD
and TEI
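The third objective, searching across both encodings, can be illustrated with a very small Python sketch. It assumes the finding aid and the transcript are available as separate XML strings; the sample fragments, the `search` function and its result shape are all hypothetical, and a real application would need indexing and ranking rather than a linear scan.

```python
import xml.etree.ElementTree as ET

# Invented fragments standing in for a finding aid and a transcript.
EAD_SAMPLE = "<c><did><unittitle>Student application forms</unittitle></did></c>"
TEI_SAMPLE = ("<text><body><p>Form to be filled up by person wishing to "
              "become a Student</p></body></text>")


def search(term, documents):
    """Return (source, element name, text) for each element whose own text contains term."""
    needle = term.lower()
    hits = []
    for source, xml_text in documents.items():
        for elem in ET.fromstring(xml_text).iter():
            text = (elem.text or "").strip()
            if needle in text.lower():
                hits.append((source, elem.tag, text))
    return hits


if __name__ == "__main__":
    for hit in search("student", {"EAD": EAD_SAMPLE, "TEI": TEI_SAMPLE}):
        print(hit)
```

Labelling each hit with its source lets a user see whether a match comes from the description of the document or from the document's own content.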
Summary
• Ground-breaking work:
– Archive user categorisation
– Identification of generic structures, features and
physical characteristics in archival documents to
facilitate use of TEI
– Overlap/integration of EAD and TEI
– Use of Web-based technologies to create generic, reusable solutions
• Interim reports, designs and coding examples