LIFECYCLE METADATA FOR DIGITAL OBJECTS
Danielle Cunniff Plumer
School of Information
The University of Texas at Austin
Summer 2014

Class Introductions
• Identify
  • Yourself
  • Types of digital projects you have worked on or are interested in
  • Your role in these digital projects
  • Your experience with metadata

Metadata
• “Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.”
• Key Concepts:
  • Structured information
  • Ease of use
  • Formal standards
Source: NISO. (2004) Understanding Metadata. Bethesda, MD: NISO Press, p.1

Functions of Metadata
• List:

Metadata
• A good object has associated metadata.
• A good object will have descriptive and administrative metadata, and compound objects will have structural metadata to document the relationships between components of the object and ensure proper presentation and use of the components.
Source: NISO. (2007). “Objects Principle 6” in A Framework of Guidance for Building Good Digital Collections. Bethesda, MD: NISO Press.

Types of Metadata
• Descriptive
• Structural
• Administrative
• Technical
• Rights Management
• Preservation
Source: NISO. (2004) Understanding Metadata. Bethesda, MD: NISO Press, p.1

Metadata Standards
• Interoperability and object exchange require the use of established standards
  • Semantic: Content Standards
  • Syntactic: Metadata Schemas
  • Format: Machine encoding
  • Exchange: Interface protocols
• XML is a common method for exchanging metadata descriptions on the Internet
• Others include: JSON, RDF

Metadata Components
[Diagram: layers running from Human to Machine: Content Standard, Syntax/Schema, Data Format, Data Exchange]

Insert Joke about Standards
"Fortunately, the charging one has been solved now that we've all standardized on mini-USB. Or is it micro-USB? Shit."
http://xkcd.com/927/

Plan of Course: Schedule
Week  Topic                        Lab
1     Introduction                 Command Line Tools (1)
2     Metadata in Digital Objects  Command Line Tools (2)
3     Metadata and Markup          XML and RDF
4     Content Standards            Descriptive Metadata (ASpace)
5     Authority Records            EAC-CPF (ASpace)
6     Controlled vocabularies      Controlled vocabularies (ASpace)
7     Ontologies and Linked Data   Linked Data
8     Digital Forensics            BitCurator
9     Student Presentations
10    Student Presentations

Plan of Course: Readings
• Required:
  • Baca, Murtha, ed. 2008. Introduction to Metadata (Online Edition, Version 3.0). Los Angeles: Getty Publications. Retrieved from http://www.getty.edu/research/publications/electronic_publications/intrometadata/setting.html
  • Society of American Archivists. 2013. Describing Archives: A Content Standard (DACS). Second Edition. Chicago: Society of American Archivists. Retrieved from http://files.archivists.org/pubs/DACS2E-2013.pdf
• Optional:
  • Dow, Elizabeth. 2005. Creating EAD-Compatible Finding Guides on Paper. Scarecrow Press.
  • Roe, Kathleen. 2005. Arranging and Describing Archives and Manuscripts (Archival Fundamentals Series II). Chicago: Society of American Archivists.
• Additional readings will be assigned for specific topics.

Plan of Course: Assignments
Assignment                    Due Date      Percent of Grade
Class participation                         25%
  • Lab work                  Ongoing
Markup Assignment                           25%
  • Authorities (EAC-CPF)     July 9
  • Finding Aid (EAD)         July 21
  • Tutorial (ArchivesSpace)  July 28
Seminar Paper                               25%
  • Presentation              August 4-11
  • Paper                     August 4
Exams
  • Midterm                   July 16       10%
  • Final                     August 13     15%

Questions?

Using Metadata
• Description
  • To (more or less) uniquely identify an item
  • To identify the parts of an item and their relationship to the whole
Author   Davis, Ellis A.
Title    The encyclopedia of Texas, compiled and edited by Ellis A. Davis and Edwin H. Grobe.
Imprint  Dallas, Texas Development Bureau [1922?]

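As a rough illustration of how one description can travel in more than one exchange syntax, the sketch below takes the three fields from the catalog record above (Author, Title, Imprint) and writes them out as JSON and as XML. The element names (`record`, `author`, `title`, `imprint`) and the helper functions are made up for illustration and do not follow any published schema; a real project would take its element names from an established metadata schema.

```python
import json
import xml.etree.ElementTree as ET

# Field values copied from the catalog record on the slide.
# The keys are illustrative names, not taken from any metadata schema.
record = {
    "author": "Davis, Ellis A.",
    "title": ("The encyclopedia of Texas, compiled and edited by "
              "Ellis A. Davis and Edwin H. Grobe."),
    "imprint": "Dallas, Texas Development Bureau [1922?]",
}


def to_json(rec):
    """Serialize the record as a JSON object."""
    return json.dumps(rec, indent=2)


def to_xml(rec):
    """Serialize the record as XML, one child element per field."""
    root = ET.Element("record")
    for name, value in rec.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_json(record))
    print(to_xml(record))
```

Both outputs carry the same three values; only the syntax differs, which is the distinction the Metadata Standards slide draws between content and encoding.
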
Using Metadata
• Location
  • To show where to find the item
    • Call number
    • Archival container
    • Storage unit
    • Uniform Resource Identifier
Library:     UNT Libraries
Location:    WILLIS 4FL TEXANA COLL
Identifier:  976.4 D292t V. 1

Using Metadata
• Condition
  • To document the condition of an item at a given time
  • To record any actions taken with respect to the item’s condition
Binding:           Full-Leather
Book Condition:    Fair
Jacket Condition:  No Jacket

Using Metadata
• Use
  • To explain conditions of use for an item
    • Based on condition
    • Based on rights
Status:  LIB USE ONLY
Rights:  Public Domain based on publication date of 1922.

Digital is Different
• Does the metadata describe the physical item or the digital item, or both?
• Physical item
  • Metadata as “surrogate” for the physical item
  • Metadata for inventory of and access to the physical item
  • Metadata aggregated for use in “union catalogs”
• Digital item
  • Metadata as a component of the digital object itself
  • Metadata used as a way of pointing to the digital object from metadata aggregations

Metadata and Digital Objects
• Metadata can be:
  • In the digital object
    • File headers, e.g., TIFF, EXIF; EAD, TEI headers
  • Near the digital object
    • Same directory, hard drive, network
  • Thousands of miles from the digital object
    • On another network, in another state, in another country

Email Metadata (1 of 3)
Delivered-To: dcplumer@utmail.utexas.edu
Received: by 10.220.251.201 with SMTP id mt9csp166864vcb;
    Mon, 9 Jun 2014 10:47:13 -0700 (PDT)
X-Received: by 10.60.63.110 with SMTP id f14mr27889922oes.8.1402336033290;
    Mon, 09 Jun 2014 10:47:13 -0700 (PDT)
Return-Path: <dcplumer@gmail.com>
Received: from angband.mail.utexas.edu (angband.mail.utexas.edu.
    [146.6.25.8]) by mx.google.com with ESMTPS id eb3si28108314oeb.17.2014.06.09.10.47.12
    for <dcplumer@utmail.utexas.edu>
    (version=TLSv1 cipher=RC4-SHA bits=128/128);
    Mon, 09 Jun 2014 10:47:13 -0700 (PDT)
Received-SPF: pass (google.com: domain of dcplumer@gmail.com designates 209.85.216.170 as permitted sender) client-ip=209.85.216.170;
Authentication-Results: mx.google.com;
    spf=pass (google.com: domain of dcplumer@gmail.com designates 209.85.216.170 as permitted sender) smtp.mail=dcplumer@gmail.com;
    dkim=pass header.i=@gmail.com
X-Utexas-Sender-Group: None
X-IronPort-MID: 87878886
X-SBRS: 4.9
X-Utexas-Seen-Inbound: true

Email Metadata (2 of 3)
Received: from mail-qc0-f170.google.com ([209.85.216.170])
    by angband.mail.utexas.edu with ESMTP/TLS/RC4-SHA; 09 Jun 2014 12:47:06 -0500
Received: by mail-qc0-f170.google.com with SMTP id l6so1040781qcy.1
    for <dcplumer@utexas.edu>; Mon, 09 Jun 2014 10:47:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=gmail.com; s=20120113;
    h=mime-version:sender:from:date:message-id:subject:to:content-type;
    bh=Wwk4SOA8TwweO+8MVa0kj6m8+yIRwdAxIdQANX5SDiA=;
    b=jBD86rqYhja+NT1vAffPy1U5qHwzAGQ3PGy6ITs4hzkBUYATlSt65ebQQAb/3as2V8
    gDBzcgcpDkJNnQEpwHwbqvkFfjTwquSJWXJVYOtbvefIkqFV/NpSfW8qda7NGcOsjT50
    H8iGNnG1CZ3ijW9HvpelUmr7FkqeJX+1RF6q3R+15RsoB837cXmwhTahRXGpNt/hsrWF
    s/PMVXPR55JnOm8CQ3/uRhWmfceiSkvPUL/zYwfxw46TFDp7gY7ubZyICni6sbIyJ06t
    TnXdb6WBbfvkpnxa5pdvpcCi3YVB5tgEfVuub7+c75rez8P3hkb7aBzJ5P721GGwu4BT
    aasg==
X-Received: by 10.224.162.212 with SMTP id w20mr5870278qax.50.1402336025270;
    Mon, 09 Jun 2014 10:47:05 -0700 (PDT)
MIME-Version: 1.0

Email Metadata (3 of 3)
Sender: dcplumer@gmail.com
Received: by 10.229.39.72 with HTTP; Mon, 9 Jun 2014 10:46:25 -0700 (PDT)
From: danielle plumer <danielle@dcplumer.com>
Date: Mon, 9 Jun 2014 12:46:25 -0500
X-Google-Sender-Auth: 5r4G-BoZO-kUgBWPn7LKoFFOomo
Message-ID: <CAAJtrZYB5rPf+2joUY=P5+C=HE9sWwSr=1QqV7VyG_GTH5GTMw@mail.gmail.com>
Subject: Test message
To: dcplumer@utexas.edu
Content-Type:
    multipart/alternative; boundary=089e013cb7cea45fbb04fb6accb2

--089e013cb7cea45fbb04fb6accb2
Content-Type: text/plain; charset=UTF-8

This is a test message to see what kinds of metadata are embedded in an email.

--089e013cb7cea45fbb04fb6accb2
Content-Type: text/html; charset=UTF-8

<div dir="ltr">This is a test message to see what kinds of metadata are embedded in an email.</div>

--089e013cb7cea45fbb04fb6accb2--

Twitter Metadata
http://online.wsj.com/public/resources/documents/TweetMetadata.pdf

Identifiers as Metadata
• A good object will be named with a persistent, globally unique identifier that can be resolved to the current address of the object.
• Good identifiers will at minimum be locally unique, so that resources within the digital collection or repository can be unambiguously distinguished from each other.
• Global uniqueness can then be achieved through the addition of a globally unique prefix element, such as a code representing the organization.
Source: NISO. (2007). “Objects Principle 4” in A Framework of Guidance for Building Good Digital Collections. Bethesda, MD: NISO Press.

Identifiers in Metadata
• The description of a digital object must be specific enough to distinguish it from similar objects
  • Unique identifiers make this easier
• The link between a digital object and its metadata must exist and be maintained over the entire lifespan of the object
  • Persistent identifiers are essential

Exercise: Physical & Digital
• Break into groups
• Choose one of the objects listed (view online)
  • http://goo.gl/BpkS1U
• For each object, create two records:
  • Surrogate record for the physical object in a library, museum, or archives.
  • Metadata record for the digitized object available online.
• Describe the differences between the two records

Plan for Thursday
• Read:
  • Stephenson, Neal. (1999). In the beginning was the command line. Available from http://www.cryptonomicon.com/beginning.html
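
Supplement: reading email headers in code
Looking back at the Email Metadata slides, the headers embedded in a message can also be read programmatically. The following is a minimal sketch using Python's standard `email` package. The sample is a shortened copy of the test message: it keeps only a handful of the headers shown on the slides and only the plain-text part of the body, so its Content-Type differs from the original multipart message. The variable names are illustrative.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Shortened copy of the test message from the slides: a few headers,
# a blank line, then the plain-text body (the HTML part is left out).
raw = """\
Received: by 10.229.39.72 with HTTP; Mon, 9 Jun 2014 10:46:25 -0700 (PDT)
From: danielle plumer <danielle@dcplumer.com>
Date: Mon, 9 Jun 2014 12:46:25 -0500
Subject: Test message
To: dcplumer@utexas.edu
Content-Type: text/plain; charset=UTF-8

This is a test message to see what kinds of metadata are embedded in an email.
"""

msg = message_from_string(raw)

# Header lookup is case-insensitive; parseaddr splits the display
# name from the address, and parsedate_to_datetime keeps the UTC offset.
display_name, address = parseaddr(msg["From"])
sent = parsedate_to_datetime(msg["Date"])

if __name__ == "__main__":
    print("From:", display_name, address)
    print("Sent:", sent.isoformat())
    print("Subject:", msg["Subject"])
    # Each mail server that handles a message adds its own Received header.
    print("Received headers:", len(msg.get_all("Received")))
```

Run against the full message, the same calls would return every Received header in order, which is one way to trace the path the message took between servers.
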