LEADERS: Linking EAD to Electronically Retrievable Sources: Progress to Date

Anna Sexton
15 May 2002
Copyright UCL 2002
Talk Outline
• Introduction to LEADERS
• Project Background: unpacking the
concepts and technologies
underpinning the project
• Discussion of why LEADERS is needed
with overview of project aims, objectives
and deliverables
• Outline of progress with overview of key
project research questions
Introduction to LEADERS
• Project funded by the Arts and Humanities Research
Board (AHRB)
• Based in School of Library, Archive and Information
Studies (SLAIS)
• Work began in October 2001 and will continue until
March 2004
• Project team:
– Susan Hockey (Project Director)
– Geoffrey Yeo (Project Manager)
– Anna Sexton (Project Assistant)
• LEADERS is developing a set of generic computer-based tools
that will use the Internet to deliver integrated user access to
archives
• The tools will be used to develop an on-line environment where
encoded archive finding aids are linked to encoded transcripts
and digitized images of paper-based archival materials
• The archive finding aids will be encoded using Encoded Archival
Description (EAD) and the transcripts of the archive material
itself will be encoded using the Text Encoding Initiative (TEI).
Therefore the tools must be capable of integrating these two
well-known XML-based encoding systems.
Project background
What are Archives?
• Archives are:
– ‘Groups’ or ‘collections’ of records
generated by a family, an individual, a
business or an institution in the course of
its daily activities, which are preserved,
either because of their importance to the
creator, or for their long-term value to
society. (Source: National Council on
Archives, 2002)
What are Archive Finding Aids?
• Archive finding aids are:
– the end product of the process of archival
description
– information tools containing metadata that
serves to identify, manage, locate and
interpret the records within an archive and
explain the contexts and record systems
from which the archive was selected
What is XML?
• XML stands for Extensible Markup
Language. It is:
– a set of rules for designing text formats to structure data
– extensible and platform independent
– written using elements (names enclosed in < and >) and
attributes (of the form name="value"), which qualify the elements
– tree-structured: documents are composed of nested
elements (some of which will have attributes attached)
– rule-governed: the document structure can be defined and
controlled as a series of rules in a document type definition
(DTD) or schema
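The points above (elements, attributes, nesting) can be seen in a few lines of Python. This is a minimal sketch using the standard library's xml.etree.ElementTree; the sample letter and the helper name outline are invented for illustration and do not come from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
SAMPLE = """<letter date="1852-03-14">
  <sender>J. Smith</sender>
  <body>
    <para>Dear Sir,</para>
  </body>
</letter>"""


def outline(element, depth=0):
    """Return one (depth, tag, attributes) tuple per element, parents first."""
    rows = [(depth, element.tag, dict(element.attrib))]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(SAMPLE)
for depth, tag, attrs in outline(root):
    # Indentation mirrors how deeply each element is nested in the tree.
    print("  " * depth + tag, attrs)
```

The parser only checks that the document is well-formed; checking it against a DTD or schema would need a validating parser, which this sketch does not attempt.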
What is Encoded Archival Description (EAD)?
• Encoded Archival Description:
– is a Document Type Definition (DTD) written following the
syntactic rules of Extensible Markup Language (XML) for
producing archive finding aids
– is a non-proprietary open data structure maintained by the
Library of Congress in partnership with the Society of
American Archivists
– enables Internet delivery of archive finding aids
– defines and controls the structure of archive finding aids
– designates the content of archive finding aids
EAD and the structure of finding aids
• The EAD DTD works on the principle that the document structure
of archive finding aids is hierarchical: nested levels of
description describe the content of the archive in a
whole-to-part relationship
HIGHEST LEVEL: Description of the whole archive
Descriptions of component parts within the archive
Descriptions of smaller component parts within the archive
LOWEST LEVEL: Descriptions of each individual record within the archive
EAD and levels of description
[Diagram: the whole archive is described by <archdesc>. Within its <dsc>, each part of the archive is a <c01>, each part of a <c01> is a <c02>, and each part of a <c02> is a <c03>.]
[Diagram: the same tree with level attributes. <archdesc level="fonds"> contains a <dsc> holding two <c01 level="series"> components; each series holds two <c02 level="file"> components; the files hold <c03 level="item"> components.]
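The whole-to-part nesting in the diagrams can be walked programmatically. The sketch below is illustrative only: the finding-aid fragment, its titles, and the helper names is_component and describe are invented, and the fragment is not validated against the EAD DTD. <did> and <unittitle> are standard EAD elements, used here to carry each level's title.

```python
import xml.etree.ElementTree as ET

# Hypothetical finding-aid fragment; every title is invented for illustration.
EAD = """<archdesc level="fonds">
  <did><unittitle>Example family papers</unittitle></did>
  <dsc>
    <c01 level="series">
      <did><unittitle>Correspondence</unittitle></did>
      <c02 level="file">
        <did><unittitle>Letters, 1850-1855</unittitle></did>
        <c03 level="item">
          <did><unittitle>Letter of 14 March 1852</unittitle></did>
        </c03>
      </c02>
      <c02 level="file">
        <did><unittitle>Letters, 1856-1860</unittitle></did>
      </c02>
    </c01>
  </dsc>
</archdesc>"""


def is_component(tag):
    """True for numbered component tags such as c01, c02, c03."""
    return len(tag) == 3 and tag[0] == "c" and tag[1:].isdigit()


def describe(node, depth=0):
    """Yield (depth, tag, level, title) for the archive and each component."""
    title = node.findtext("did/unittitle", default="")
    yield depth, node.tag, node.get("level"), title
    # Components sit inside <dsc> at the top and directly inside <cNN> below.
    container = node.find("dsc") if node.tag == "archdesc" else node
    if container is None:
        return
    for child in container:
        if is_component(child.tag):
            yield from describe(child, depth + 1)


for depth, tag, level, title in describe(ET.fromstring(EAD)):
    print("  " * depth + f"{tag} [{level}] {title}")
```

Each printed line is indented by its depth, so the output reproduces the fonds, series, file, item hierarchy of the finding aid.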
EAD and designation of content
• EAD designates the content of the
finding aid through specified elements
which hold data about the archive and
can be combined at any level in order to
provide the description of the archive
<did>
<bioghist>
<scopecontent>
<arrangement>
<admininfo>
<controlaccess>
<note>
<odd>
<add>
<dao> and <daogrp>
Overview of EAD structure
<ead>
  <eadheader>
  <frontmatter>
  <archdesc>
    <did>
    <bioghist>
    <scopecontent>
    <organization>
    <arrangement>
    <admininfo>
    <controlaccess>
    <note>
    <odd>
    <add>
    <dao> and <daogrp>
    <dsc>
      <c01>
        <did> and so forth...
        <c02>
          <did> and so forth...
What is the Text Encoding Initiative (TEI)?
• The TEI is a standard for the encoding and interchange of
texts
• It is a non-proprietary open data structure sponsored
by the Association for Computers and the Humanities,
the Association for Computational Linguistics and the
Association for Literary and Linguistic Computing
• It enables Internet delivery of texts
• It defines and controls the structure of texts
• It designates the content of texts
• The TEI consists of a number of Document Type Definition
(DTD) fragments or tag sets written following the syntactic rules
of XML. The DTD fragments can be classified as:
– core DTD fragment
– base DTD fragments
– additional DTD fragments
• By allowing the core DTD fragment to be combined with a
base and an additional DTD fragment of the encoder's choice,
the TEI DTD remains flexible and customizable according to
need
• There is also a TEI Lite DTD which contains a ‘light’ generic tag
set suitable for use with most types of texts where complex or
specialized encoding is not wanted or needed
TEI and the structure of texts
• Like EAD, TEI uses nested elements to
replicate/create the structure of a text. The body of a
text can be hierarchically divided into its component
parts using <div> elements, which may be numbered by
depth (<div1>, <div2>, and so on).
• For example:
...
<text>
  <body>
    <div1 type="act">
      <div2 type="scene"></div2>
    </div1>
  </body>
</text>
...
TEI and content markup
• Core content markup elements
available in TEI are:
– Names
– Dates
– Abbreviations
– Quotations
– Foreign languages
– Notes
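Once names, dates and similar features are marked up, they can be pulled out of a transcript mechanically. The following is a minimal sketch under stated assumptions: the transcript fragment is invented, the helper collect is hypothetical, and only the core TEI elements <name>, <date> and <foreign> are used, without attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript fragment; the wording is invented for illustration.
TEI_BODY = """<body>
  <p>On <date>14 March 1852</date> <name>John Smith</name> wrote to
  <name>Mary Jones</name> that the matter was <foreign>sub judice</foreign>.</p>
</body>"""


def collect(body, tag):
    """Return the text of every element called `tag`, in document order."""
    return ["".join(el.itertext()).strip() for el in body.iter(tag)]


body = ET.fromstring(TEI_BODY)
print("names:  ", collect(body, "name"))
print("dates:  ", collect(body, "date"))
print("foreign:", collect(body, "foreign"))
```

The same function works for any other content element, which is one reason consistent markup makes transcripts easier to analyze and manipulate.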
TEI and dealing with omissions, deletions and additions
Overview of TEI’s structure
• A minimal TEI document includes:
<TEI.2>
<teiHeader></teiHeader>
<text>
<body></body>
</text>
</TEI.2>
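A quick structural check of this skeleton might look like the sketch below. It assumes the XML spelling <teiHeader> (XML names are case-sensitive), and the function name has_minimal_skeleton is hypothetical. It tests only for the presence of the outer elements and does not validate against the TEI DTD.

```python
import xml.etree.ElementTree as ET

MINIMAL = "<TEI.2><teiHeader></teiHeader><text><body></body></text></TEI.2>"


def has_minimal_skeleton(xml_text):
    """True if the root is <TEI.2> with a <teiHeader> and a <text>/<body>."""
    root = ET.fromstring(xml_text)
    return (
        root.tag == "TEI.2"
        and root.find("teiHeader") is not None
        and root.find("text/body") is not None
    )


print(has_minimal_skeleton(MINIMAL))
# A document with no header fails the check.
print(has_minimal_skeleton("<TEI.2><text><body/></text></TEI.2>"))
```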
Why is there a need for LEADERS and what will the project achieve?
Why is there a need for LEADERS?
• The LEADERS project is taking two significant leaps
forward in the provision of remote access to archives
for users:
– the provision of tools that can build an integrated system which
brings together the benefits of EAD and the benefits of TEI to allow
the user to find items in archival collections, learn about their
contexts, view representations of the material and analyze and
manipulate their content within one single interface
– the provision of tools that are generic and can cope with a wide
variety of types of archival material rather than being developed to
suit one particular archive or one particular type of document
Project aims and objectives
• LEADERS aims to:
• Analyze user needs in relation to remote access to archives
• Identify key functions which users apply to archival material
• Implement a toolset that can create an environment which links
EAD encoded finding aids, TEI encoded transcripts and digital
representations of archive items that is sensitive to user needs
• Implement tools to assist users to work with archival material
• Provide recommendations for the use of TEI with archival
material
• Provide recommendations for integration between EAD and TEI
• Provide documentation for archivists so that they can use the
tools with their own archives
• Provide documentation for users of the environment
Project deliverables
• LEADERS will deliver:
• A methodology for linking finding aids to
transcriptions and images of archival material
• A set of tools for manipulation of archival material
• An on-line working model of the unified interface for
the linked finding aids, transcriptions and images
• Documentation for the entire set of specifications and
tools
• Training materials for archivists and users
Outline of Project Progress
Responding to user needs
• Developing a tool-set that is capable of
building an on-line environment for
accessing archival material that is
designed in line with user needs
• Research questions raised:
– How can we categorize users of
archives?
– How can we gather feedback from users?
– How can we analyze that feedback?
Using TEI with archival material
• Provide recommendations for the use of
TEI with archival material
• Research questions raised:
– What range of document types can be
found in archives?
– What features in archival material cannot
be represented using the current TEI tag
sets?
Integrating EAD and TEI
• Provide recommendations for
integrating TEI and EAD
• Research questions raised:
– What information should be contained
within the <teiHeader>?
– Where should links occur between EAD
and TEI?
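The talk leaves the placement of links as an open question. Purely as an illustration of one possibility, and not as the project's chosen method, an item-level component could point to its transcript through the href attribute of EAD's <dao> element. The component, its title, the file path and the helper linked_objects below are all invented for the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical item-level component; the title and href value are invented.
EAD_ITEM = """<c03 level="item">
  <did><unittitle>Letter of 14 March 1852</unittitle></did>
  <dao href="transcripts/letter-0001.xml"/>
</c03>"""


def linked_objects(component):
    """Return (title, href) pairs for each <dao> directly inside a component."""
    title = component.findtext("did/unittitle", default="")
    return [(title, dao.get("href")) for dao in component.findall("dao")]


for title, href in linked_objects(ET.fromstring(EAD_ITEM)):
    print(f"{title} -> {href}")
```

Other designs are equally conceivable, for example recording the finding-aid reference inside the transcript's header, which is the kind of choice the research questions above are meant to settle.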
Maximizing EAD’s potential in an on-line environment
• Research questions raised:
– What happens when a researcher uses the
finding aid in a non-linear sequence via
search rather than browse options?
– How can we make the most of linking
technology to produce a user-friendly
finding aid?
Summary of progress
• Methodology for the categorization of
users
• Encoding for the pilot
• Project web-site
Conclusion