Integrating EAD and TEI: the resolution of metadata overlaps
Anna Sexton
ALLC/ACH 2003
Copyright, UCL
LEADERS: Linking EAD to Electronically Retrievable Sources

Talk outline
• Introduction to LEADERS
• Representing archives using EAD and TEI
• Problem: Overlaps between EAD and TEI
• Solution: Enriching EAD
• Implications for search and retrieval

Introduction to LEADERS
• Aim of LEADERS:
  – To enhance online access to archival material
• Linking of EAD, EAC, TEI: bringing together finding aids, authority records, transcripts and digital images of archive documents
• Re-usable toolkit for archivists
• Demonstrator application
  – Selected documents from the Orwell Archive and UCL’s own administrative archive used as test-bed material

Representing archives using TEI and EAD
• Archives are collections of documents selected and preserved because they have long-term value
• Finding aids provide access to archive collections through information that identifies, locates and interprets the material and explains the context and record systems from which the material has been selected

Representing archives using EAD and TEI
• EAD is an encoding standard used to structure and exchange electronic finding aids
• As a stand-alone tool cannot give users access to the actual content of archive documents
• Integration with TEI opens up many possibilities for the user

Representing archives using TEI and EAD
• TEI is an encoding framework that enables the creation of digital texts
• When EAD and TEI are brought together:
  – Within a single environment the user can find items in archive collections; learn about their contexts; view representations of the items themselves; and read, study, analyze
and manipulate their content

Problem: overlaps between EAD and TEI
• Overlap in relation to metadata that:
  – Identifies, locates and gives details about the creation of the original document
  – Describes the physical characteristics of the original document
  – Provides contextual information about the creator and participants within the original document
  – Interprets/describes the data in the original document

Possible solutions to overlaps
• Can deal with the overlaps in a number of ways:
  – Use the EAD framework and the <teiHeader> as they are intended and accept that metadata about the original document will appear in both
  – Use the EAD framework to hold metadata about the original document and use the <teiHeader> to hold information about the TEI transcript
  – Use TEI for all metadata
  – Use EAD for all metadata

Solution: enriching EAD
• LEADERS is using EAD as the overall metadata framework, whilst enriching at the lowest level of description so that EAD can act as an adequate holding place for metadata relating to the original documents and their derived digital representations

Metadata relationships between the original, the transcript and the image
[Diagram. Labels: Content Metadata; Original Archive Document, TEI Transcript and Image, each with Contextual, Identification and Administrative Metadata]

Enriching EAD’s <altformavail>
• According to the EAD tag library <altformavail> should be used to hold:
  – Information about copies of the [original] materials being described, including the type of alternative form, significant
control numbers, locations and source for ordering if applicable. The additional formats are typically microforms, photocopies or digital reproductions

LEADERS finding aid Schema
[Diagram. Inputs: EAD DTD, TEI Encoding Framework, NISO MIX Schema. Output: LEADERS EAD-based Schema for encoding finding aids. Within EAD’s <altformavail> at item level: xmlns: TEI encoding framework; xmlns: NISO MIX Schema]

• Example item level description from the finding aid for the Orwell Papers
• XML

Recreating standalone digital objects
• <teiHeader> made redundant
• Raises a question relating to the reusability of the TEI transcript
  – Solution: derive a new XML file using XSLT, combining the relevant part of the finding aid and the transcript to create a new TEI-conformant file

Search and retrieval issues
• Overlap between content encoding in the TEI text and <controlaccess> terms in EAD
• Both could be used as index terms for search and retrieval
• Combine TEI <name> tags and EAD <controlaccess> tags to produce lists of index terms

Conclusion
• Our solution to overlaps:
  – Avoids repetition of the same information
  – Clearly differentiates between metadata relating to original archive documents and metadata relating to their derived digital forms
  – Makes use of EAD’s comprehensive framework in describing archival material as whole collections made up of component parts
  – Recognises the need to allow for re-use of resources
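
Illustration: combining index terms
The idea from the search and retrieval slide, merging TEI <name> tags with EAD <controlaccess> terms into one list of index terms, can be sketched in Python. This is a minimal illustration under stated assumptions, not the LEADERS implementation. The sample item, its element layout, the placeholder names and the function collect_index_terms are all hypothetical. Namespaces are omitted for brevity, and the TEI text is assumed to sit inside <altformavail> at item level, as the slides describe.

```python
import xml.etree.ElementTree as ET

# Hypothetical item-level description: a simplified stand-in for an enriched
# EAD component, NOT the actual LEADERS schema. All values are placeholders.
SAMPLE_ITEM = """
<c level="item">
  <did><unittitle>Example letter</unittitle></did>
  <controlaccess>
    <persname>Smith, Jane</persname>
    <subject>Publishing</subject>
  </controlaccess>
  <altformavail>
    <text>
      <body>
        <p>Letter from <name>Jane Smith</name> to
           <name>Publishing House Ltd</name>.</p>
      </body>
    </text>
  </altformavail>
</c>
"""


def collect_index_terms(item_xml):
    """Return a sorted, de-duplicated list of index terms for one item."""
    root = ET.fromstring(item_xml)
    terms = set()

    # EAD side: treat every child element of <controlaccess> as an access point.
    for control in root.iter("controlaccess"):
        for access_point in control:
            text = "".join(access_point.itertext()).strip()
            if text:
                terms.add(text)

    # TEI side: treat every <name> element in the transcript as an index term.
    for name in root.iter("name"):
        text = "".join(name.itertext()).strip()
        if text:
            terms.add(text)

    return sorted(terms)


if __name__ == "__main__":
    for term in collect_index_terms(SAMPLE_ITEM):
        print(term)
```

In this sketch "Jane Smith" (from the transcript) and "Smith, Jane" (from <controlaccess>) remain separate entries, because no attempt is made to reconcile different forms of the same name.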