LEADERS: Linking EAD to Electronically Retrievable Sources
Progress to Date
Anna Sexton
15 May 2002
Copyright UCL 2002

Talk Outline
• Introduction to LEADERS
• Project Background: unpacking the concepts and technologies underpinning the project
• Discussion of why LEADERS is needed with overview of project aims, objectives and deliverables
• Outline of progress with overview of key project research questions

Introduction to LEADERS
• Project funded by the Arts and Humanities Research Board (AHRB)
• Based in School of Library, Archive and Information Studies (SLAIS)
• Work began in October 2001 and will continue until March 2004
• Project team:
  – Susan Hockey (Project Director)
  – Geoffrey Yeo (Project Manager)
  – Anna Sexton (Project Assistant)
• LEADERS is developing a set of generic computer-based tools that will use the Internet to deliver integrated user access to archives
• The tools will be used to develop an on-line environment where encoded archive finding aids are linked to encoded transcripts and digitized images of paper-based archival materials
• The archive finding aids will be encoded using Encoded Archival Description (EAD) and the transcripts of the archive material itself will be encoded using the Text Encoding Initiative (TEI). Therefore the tools must be capable of integrating these two well-known XML-based encoding systems.

Project background

What are Archives?
• Archives are:
  – ‘Groups’ or ‘collections’ of records generated by a family, an individual, a business or an institution in the course of its daily activities, which are preserved, either because of their importance to the creator, or for their long-term value to society. (Source: National Council on Archives, 2002)

What are Archive Finding Aids?
• Archive finding aids are:
  – the end product of the process of archival description
  – information tools containing metadata that serves to identify, manage, locate and interpret the records within an archive and to explain the contexts and record systems from which the archive was selected

What is XML?
• XML stands for Extensible Markup Language:
  – it is a set of rules for designing text formats to structure data
  – it is extensible and platform independent
  – its syntax is made up of elements (names bracketed by < and >) and attributes (of the form name="value"), which qualify the elements they are attached to
  – XML documents are tree structures composed of nested elements (some of which will have attributes attached)
  – the document structure can be defined and controlled as a series of rules in a document type definition (DTD) or schema

What is Encoded Archival Description (EAD)?
• Encoded Archival Description:
  – is a Document Type Definition (DTD), written following the syntactic rules of Extensible Markup Language (XML), for producing archive finding aids
  – is a non-proprietary open data structure maintained by the Library of Congress in partnership with the Society of American Archivists
  – enables internet delivery of archive finding aids
  – defines and controls the structure of archive finding aids
  – designates the content of archive finding aids

EAD and the structure of finding aids
• The EAD DTD works on the principle that the document structure of archive finding aids is hierarchical: nested levels of description describe the content of the archive in a whole-to-part relationship
  HIGHEST LEVEL
    Description of the whole archive
      Descriptions of component parts within the archive
        Descriptions of smaller component parts within the archive
          Descriptions of each individual record within the archive
  LOWEST LEVEL

EAD and levels of description
[Diagram: tree showing the whole-to-part hierarchy. The whole archive is <archdesc>, which contains <dsc>; its parts are <c01> components, which contain <c02> components, which in turn contain <c03> components.]

EAD and levels of description
[Diagram: the same tree with level attributes. <archdesc level="fonds"> contains <dsc>, which contains <c01 level="series"> components; these contain <c02 level="file"> components, which contain <c03 level="item"> components.]

EAD and designation of content
• EAD designates the content of the finding aid through specified elements which hold data about the archive and can be combined at any level in order to provide the description of the archive
• These elements include:
  <did>
  <bioghist>
  <scopecontent>
  <arrangement>
  <admininfo>
  <controlaccess>
  <note>
  <odd>
  <add>
  <dao> and <daogrp>

Overview of EAD structure
  <ead>
    <eadheader>
    <frontmatter>
    <archdesc>
      <did>
      <bioghist>
      <scopecontent>
      <organization>
      <arrangement>
      <admininfo>
      <controlaccess>
      <note>
      <odd>
      <add>
      <dao> and <daogrp>
      <dsc>
        <c01>
          <did> and so forth...
          <c02>
            <did> and so forth...

What is the Text Encoding Initiative (TEI)?
• The TEI is a standard for the encoding and interchange of texts
• It is a non-proprietary open data structure sponsored by the Association for Computers and the Humanities, the Association for Computational Linguistics and the Association for Literary and Linguistic Computing
• It enables internet delivery of texts
• It defines and controls the structure of texts
• It designates the content of texts
• The TEI consists of a number of Document Type Definition (DTD) fragments or tag sets written following the syntactic rules of XML.
  The DTD fragments can be classified as:
  – core DTD fragment
  – base DTD fragments
  – additional DTD fragments
• By allowing the core DTD fragment to be combined with a base and an additional DTD fragment of the encoder's choice, the TEI DTD remains flexible and customizable according to need
• There is also a TEI Lite DTD, which contains a ‘light’ generic tag set suitable for most types of text where complex or specialized encoding is not needed

TEI and the structure of texts
• Like EAD, TEI uses nested elements to represent the structure of a text. The body of a text can be divided hierarchically into its component parts using the <div> elements.
• For example:
  ...
  <text>
    <body>
      <div1 type="act">
        <div2 type="scene"></div2>
      </div1>
    </body>
  </text>
  ...

TEI and content markup
• Core content markup available in TEI covers:
  – Names
  – Dates
  – Abbreviations
  – Quotations
  – Foreign languages
  – Notes

TEI and dealing with omissions, deletions and additions

Overview of TEI’s structure
• A minimal TEI document includes:
  <TEI.2>
    <teiHeader></teiHeader>
    <text>
      <body></body>
    </text>
  </TEI.2>

Why is there a need for LEADERS and what will the project achieve?

Why is there a need for LEADERS?
• The LEADERS project is taking two significant leaps forward in the provision of remote access to archives for users:
  – the provision of tools that can build an integrated system combining the benefits of EAD and TEI, so that within a single interface the user can find items in archival collections, learn about their contexts, view representations of the material, and analyze and manipulate their content
  – the provision of tools that are generic and can cope with a wide variety of types of archival material, rather than being developed to suit one particular archive or one particular type of document

Project aims and objectives
• LEADERS aims to:
  – Analyze user needs in relation to remote access to archives
  – Identify key functions which users apply to archival material
  – Implement a toolset that can create an environment, sensitive to user needs, which links EAD-encoded finding aids, TEI-encoded transcripts and digital representations of archive items
  – Implement tools to assist users to work with archival material
  – Provide recommendations for the use of TEI with archival material
  – Provide recommendations for integration between EAD and TEI
  – Provide documentation for archivists so that they can use the tools with their own archives
  – Provide documentation for users of the environment

Project deliverables
• LEADERS will deliver:
  – A methodology for linking finding aids to transcriptions and images of archival material
  – A set of tools for manipulation of archival material
  – An on-line working model of the unified interface for the linked finding aids, transcriptions and images
  – Documentation for the entire set of specifications and tools
  – Training materials for archivists and users

Outline of Project Progress

Responding to user needs
• Developing a toolset that is capable of building an on-line environment for accessing archival material that is
designed in line with user needs
• Research questions raised:
  – How can we categorize users of archives?
  – How can we gather feedback from users?
  – How can we analyze that feedback?

Using TEI with archival material
• Provide recommendations for the use of TEI with archival material
• Research questions raised:
  – What range of document types can be found in archives?
  – What features in archival material cannot be represented using the current TEI tag sets?

Integrating EAD and TEI
• Provide recommendations for integrating TEI and EAD
• Research questions raised:
  – What information should be contained within the <teiHeader>?
  – Where should links occur between EAD and TEI?

Maximizing EAD’s potential in an on-line environment
• Research questions raised:
  – What happens when a researcher uses the finding aid in a non-linear sequence via search rather than browse options?
  – How can we make the most of linking technology to produce a user-friendly finding aid?

Summary of progress
• Methodology for the categorization of users
• Encoding for the pilot
• Project web-site

Conclusion