TEI, EAD and Integrated User Access to Archives: Towards a Generic Toolset

Chris Turner, Anna Sexton, Susan Hockey, Geoffrey Yeo
LEADERS Project, School of Library, Archive and Information Studies, University College London

Copyright, UCL
LEADERS: Linking EAD to Electronically Retrievable Sources

Agenda
• Introduction to LEADERS
• User Testing
• TEI for Archives
• TEI/EAD Integration

Linking EAD to Electronically Retrievable Sources
• The LEADERS project aims to enhance remote user access to archives by providing the means to present archival source materials within their context.
• Funded by the Arts and Humanities Research Board (AHRB)
• Located in UCL SLAIS
• Completion March 2004

LEADERS Project Deliverables
• Encoding
  – Encoded finding aids and source materials with images
• Tools
  – A suite of tools for archivists to support the encoding and online presentation of archival source materials and finding aids
• Application
  – A demonstrator application for search, retrieval and presentation of encoded materials in the Web environment
• User Testing
  – User testing will provide on-going feedback for the development process.

Encoding
• Text Encoding Initiative (TEI)
  – For archival source materials
• Encoded Archival Description (EAD)
  – For finding aids
• NISO Metadata for Images in XML (MIX) Schema
  – For digital image metadata

Tools: Design considerations
• Generic – should be usable on other projects for encoding other resources
• Re-usability of encoded resources – to facilitate maximum return on the encoding effort
• Platform independent – to minimise restrictions on deployment
• Where possible, simplify or reduce the encoding effort

Tools: Technology
• XML Schema
  – Use of namespaces
  – Schema will provide a generic and re-usable means to encode resources
• XSLT/CSS
  – Style sheets will provide a means to manipulate, transform and present the encoded resources, thus supporting re-purposing of encoded materials
• WSDL/SOAP
  – Incorporating ‘self-describing services’ will allow multiple applications to be constructed to make different use of the encoded materials

Application: Objectives
• Demonstrator – a sample application to show what can be produced/generated from the encoded materials and the toolset
• Basic search and retrieval and alternative presentations to show the possibilities of TEI/EAD encoded resources

Application: Technology
• Tools will allow use of either a Microsoft or an Apache/Java development environment
• Time and budget (££) constraints require the choice of one
• Microsoft .Net Framework – ASP.NET/C#

User Testing
• User centred perspective
• Gaining a representative sample of archive users
• Categorisation of user types
  – Purpose of research
  – Primary area of interest
  – Familiarity with:
    • Area of interest
    • Archival finding aids and documents
    • The Internet
• Qualitative techniques
• Administration and feedback to development process

TEI for Archives: Research Methodology
• Analysis of commonly occurring structures, features and contents found within a range of different types of archive source material
• Material held in UCL’s Special Collections and Record Office
• Expect to analyse material held in other archival repositories to validate initial findings and uncover as wide a range of encoding challenges as possible

TEI for Archives: Preliminary findings
• ‘Transcription of Primary Sources’ tag set in TEI can deal with a wide range of encoding challenges inherent in archival material:
  – Complex additions, deletions and corrections
  – Gaps within and damage to the text
  – Changes in document hands, style and character of writing

TEI for Archives: Preliminary findings
• Need to explore encoding options for:
  – ‘Layered data’
  – Textual and numerical data presented in complex tables
  – Formulae and mathematical expressions within the text
• Examining TEI and other DTDs specifically built to handle these structures and features

Exploring solutions for encoding ‘layered data’
• ‘Layered data’: when an underlying layer of data is used as the basic structure onto which further data [other layer(s)] is applied
  – Accounts and registers
  – Address books
  – Calendars
  – Questionnaires and forms

Example of ‘layered data’

Encoding objectives for ‘layered data’
• Objective 1: layers within the document should be
explicitly differentiated and their differences should be documented
• Objective 2: the relationships between data in different layers must be explicit

Example encoding
<tei.2>
  <teiHeader>
    <!--…-->
    <encodingDesc>
      <layerDesc>
        <layer id="lay1">Form printed by University College London</layer>
        <layer id="lay2">Handwritten responses to form filled in by <name>Babut, Marie</name></layer>
      </layerDesc>
    </encodingDesc>
    <!--…-->
  </teiHeader>
  <text>
    <body>
      <header layer="lay1">University College London</header>
      <instruction layer="lay1">Form to be filled up by person wishing to become a Student of the College (so far as it may apply in his or her case)</instruction>
      <dataSegment layer="lay1">
        <prompt layer="lay1">Name in full</prompt>
        <response layer="lay2"><name reg="Babut, Marie">Marie Babut</name></response>
      </dataSegment>
      <!--…-->
    </body>
  </text>
</tei.2>

TEI/EAD Integration: What is the purpose of EAD?
• EAD is a metadata standard for the creation of tools (finding aids) that contain information that identifies, manages, locates and describes archive documents within archive collections and explains the contexts and records systems from which the documents have been selected.

TEI/EAD Integration: What is the purpose of TEI?
• TEI is a content encoding standard for the creation of ‘objects of study’. TEI ‘objects of study’ are usually derived from one or more original ‘objects of study’ (e.g. an archive document within an archive collection)
• TEI is also a metadata standard which seeks to put the new object into the context of why and how it has been created, what it has been derived from (e.g.
the original archive document), and what the data within the object represents

TEI/EAD Integration: Overlaps
• Overlaps between EAD and TEI occur in relation to metadata that:
  – Identifies, locates and describes the creation of the original archive document
  – Describes the physical characteristics of the original object
  – Provides contextual information about the creator of the original object and the participants within the object
  – Interprets/describes the data in the object

TEI/EAD Integration: Detailed Analysis of Overlaps
• Elements that interpret/describe the actual data within the object
• EAD elements (immediate parent element: child elements)
  – <controlaccess>: <genreform>, <geogname>, <persname>, <famname>, <corpname>, <occupation>, <subject>, <date>, <function>
  – N/A: <scopecontent>
• Overlapping TEI elements (immediate parent element(s): child elements)
  – <profileDesc> within <teiHeader>: <keywords>, <classcode>, <classref>
  – <textDesc> within <profileDesc> within <teiHeader>: <channel>, <constitution>, <domain>, <factuality>, <preparedness>, <purpose>
  – <text>: <name>, <date>

TEI/EAD Integration: Objectives
• Avoidance of repetition of information within EAD and TEI
• The ability to make the EAD finding aid and the TEI transcript stand-alone tools
• Effective search and retrieval across EAD and TEI

Summary
• Ground-breaking work:
  – Archive user categorisation
  – Identification of generic structures, features and physical characteristics in archival documents to facilitate use of TEI
  – Overlap/integration of EAD and TEI
  – Use of Web-based technologies to create generic, reusable solutions
• Interim reports, designs and coding examples
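
Illustrative sketch: reading ‘layered data’ back out of an encoding

The two encoding objectives for ‘layered data’ (layers are explicitly differentiated and documented; relationships between data in different layers are explicit) can be checked mechanically once a document is encoded. The Python below is a minimal sketch under stated assumptions, not part of the LEADERS toolset. The element and attribute names (layerDesc, layer, header, dataSegment, prompt, response, layer="…") are taken from the example encoding slide; the closing tags, the shortened sample document and the function names flat_text, layer_descriptions, text_by_layer and prompt_response_pairs are assumptions introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened and well-formed variant of the slide's example.
# Element names follow the slide; closing tags and layout are assumptions.
SAMPLE = """<tei.2>
  <teiHeader>
    <encodingDesc>
      <layerDesc>
        <layer id="lay1">Form printed by University College London</layer>
        <layer id="lay2">Handwritten responses to form filled in by
          <name>Babut, Marie</name></layer>
      </layerDesc>
    </encodingDesc>
  </teiHeader>
  <text>
    <body>
      <header layer="lay1">University College London</header>
      <dataSegment layer="lay1">
        <prompt layer="lay1">Name in full</prompt>
        <response layer="lay2"><name reg="Babut, Marie">Marie Babut</name></response>
      </dataSegment>
    </body>
  </text>
</tei.2>"""


def flat_text(element):
    """Return all text inside an element with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def layer_descriptions(root):
    """Objective 1: map each documented layer id to its description."""
    return {layer.get("id"): flat_text(layer) for layer in root.iter("layer")}


def text_by_layer(root):
    """Group body text by layer, using only the innermost layered elements."""
    grouped = {}
    for element in root.find("./text/body").iter():
        layer_id = element.get("layer")
        if layer_id is None:
            continue
        # Skip containers (e.g. dataSegment) whose descendants carry layers.
        nested = any(d.get("layer") for d in element.iter() if d is not element)
        if not nested:
            grouped.setdefault(layer_id, []).append(flat_text(element))
    return grouped


def prompt_response_pairs(root):
    """Objective 2: pair each prompt with its response and their layers."""
    pairs = []
    for segment in root.iter("dataSegment"):
        prompt, response = segment.find("prompt"), segment.find("response")
        if prompt is not None and response is not None:
            pairs.append((flat_text(prompt), prompt.get("layer"),
                          flat_text(response), response.get("layer")))
    return pairs


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(layer_descriptions(root))
    print(text_by_layer(root))
    print(prompt_response_pairs(root))
```

In this sketch the printed form (lay1) and the handwritten answers (lay2) can be listed separately, while each dataSegment keeps a prompt and its response together, which is one way the relationship between layers could stay explicit for search and presentation.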