Lifecycle Metadata for Digital Objects
November 6, 2006

Descriptive Metadata: “Modeling the World”

Descriptive metadata for what?
- WWW: now seen as the ONE place to find everything
- Descriptive metadata provides:
  - Unique identification for a resource
  - Information permitting evaluation/selection of a resource
  - Information describing all “essential properties”

WWW: How to find things
- What does search mean on the WWW?
- How to support its multiple purposes?
- Failure of search engines to render precise results (problems of scale; is this true?)
- Failure of HTML metatags (spamming)

Solution
- Local expert cataloging (providing access points)
- Making local cataloging available
- Remote free-text searching (inferring access points)
- Dublin Core and its limitations
- Warwick Framework and RDF
- Universal Semantic Web

Some metadata examples
- Individual objects (Dublin Core and its derivatives)
- Multimedia and/or complex objects (METS/MPEG21)
- Books and other chunks of information (MARC)
- Finding aids (EAD)

Semantic Web
- Berners-Lee’s vision for the Web: basically, machine-understandable metadata about meaning for everything on the Web
- http://www.semaview.com/d/Semweb_Illustrated.pdf

Aside on cataloging
- Cataloging systems as relatively static: relationships remained tacit and externally specified
  - Classification systems
  - Controlled vocabularies
  - Name authorities
- Note that all of these can be represented in XML as specific namespaces (MARC, MODS, etc.)
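To make the namespace point concrete, here is a minimal Python sketch (standard library only) that writes a simple record whose elements live in the Dublin Core 1.1 namespace. The `<record>` wrapper element and the subject values are hypothetical illustrations, not part of Dublin Core or of any cataloging standard.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(fields):
    """Build an XML record whose child elements are in the DC namespace.

    `fields` maps DC element names (e.g. "title") to lists of values.
    The un-namespaced <record> wrapper is a hypothetical container.
    """
    root = ET.Element("record")
    for name, values in fields.items():
        for value in values:  # DC elements are repeatable
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


record = dc_record({
    "title": ["Lifecycle Metadata for Digital Objects"],
    "date": ["2006-11-06"],
    "subject": ["Metadata", "Dublin Core"],  # illustrative values only
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the element names carry a namespace, the same record could sit alongside MARC- or MODS-namespaced elements in one document without name collisions.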
New methods aren’t that different: ontologies for the Semantic Web are also namespaces, but ones that are much more specific about actions

Ontologies
- Like previous classification systems, they are being built by hand
- General (Cyc) and domain-specific (especially for B2B, web services)
- Ontologies establish a joint terminology among members of a community of interest
- Ontologies specify domain knowledge in terms of formal logic that includes actions by and among entities
- Ontologies will be used to guide extraction of semantic content from texts (and perhaps automatic generation of metadata)

Topic Maps
- Representation of information using topics, associations, and occurrences
- Note how this “triples” representation fits well with RDF (entity, relationship, entity)
- An XML representation: XTM
- An (older) ISO standard: ISO/IEC 13250:2003
- Related to ontologies and mind maps; designed to “map” semantic regions

Web Services
- How to provide processing services over the WWW: XML and HTTP infrastructure passing remote procedure calls
- UDDI (Universal Description, Discovery and Integration) is the registry of services
- WSDL (Web Services Description Language) allows “advertisement” of services (in XML, of course) in the UDDI registry
- SOAP (Simple Object Access Protocol) is the XML wrapper for requests sent to services
- Example: DC metadata registry: http://dublincore.org/dcregistry/

Does what we know fit into this?
- DC and derivatives are aimed at the single object (though not always used for it) and are frequently used in WWW contexts (cf. Warwick Framework ≈ RDF namespaces)
- EAD describes descriptions of aggregate chunks of information (chunked in terms of “series” or “collections”) but can describe single objects
- MARC/MODS describes aggregate chunks of information (chunked in the form of “books”)
- METS and MPEG21 are frames for multiple and multimedia objects

Granularity
- Granularity governs the level at which metadata can be descriptive
- Metadata granularity tends to be finer for digital objects
- Digital objects cannot be managed without individual granularity (thank you, David Bearman)

EAD: Describing descriptions
- What is a finding aid?
- Describing a finding aid so it can be searched
- Expanding a finding aid to accommodate individual granularity
- Is it efficient to drill down through a finding aid to individual objects?
- Can EAD be searched from the bottom up?

EAD Schizophrenia
- Because it describes finding aids, it has retained a concern with look and feel
- Mixes granular conceptual description with box/folder lists for physical (and contingent) object arrangements
- Lack of granularity is expressed in the possibility of writing narrative with <p> tags everywhere

MARC: Chunked packages
- International Standard Bibliographic Description (ISBD) as parent of MARC, TEI
- MODS: User-friendly MARC? A subset of MARC elements (20), with language-based tags

MARC as descriptive metadata
- Bibliographical detail for the work
- Bibliographical detail for the specific instance of the work (cf. FRBR)
- Places the work within one or many classificatory systems (ontologies, controlled vocabularies, authority lists)
- But alas! Not consistent!

METS: Multimedia/Multiversion
- METS was developed to express the “archival bond” among objects related to one another as a single work (cf. FRBR, Warwick Framework, RDF)
- Reflects concerns of digital librarians who want to make a wide range of versions available
- Standard form:
  - General descriptive metadata for package
  - Object link
  - Object type
  - Specific descriptive metadata set(s) for specific kinds of objects

What about the single object?
- Is Dublin Core enough? Outdated? (15 elements)
- What about derivatives? Qualified DC, DC profiles
- Australian elements (20)
- Why describe the single object?
- Who will describe at the object level? Zillions of archivists? Authors? Automatic analysis (ontology-driven)?

Wisdom of Crowds vs Long Tail
- The wisdom of crowds: tagging as democratized subject cataloging
- The long tail: specialist cataloging for small niche groups, now visible online
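The METS “standard form” listed earlier (package-level descriptive metadata, object links, object types) can be sketched as a bare METS skeleton built with Python’s standard library. This is a minimal, non-validated sketch: it uses the real METS, XLink, and Dublin Core namespace URIs and the METS sections `dmdSec`, `fileSec`, and `structMap`, but the helper names (`m`, `mets_package`), file IDs, MIME types, and example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS_NS), ("xlink", XLINK_NS), ("dc", DC_NS)):
    ET.register_namespace(prefix, uri)


def m(tag):
    """Qualify a tag name with the METS namespace (hypothetical helper)."""
    return f"{{{METS_NS}}}{tag}"


def mets_package(title, files):
    """Build a METS skeleton; `files` is a list of (file_id, mimetype, href)."""
    root = ET.Element(m("mets"))

    # General descriptive metadata for the whole package, wrapped as DC.
    dmd = ET.SubElement(root, m("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, m("xmlData"))
    ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title

    # Object links and object types: one <file> per version or format.
    file_sec = ET.SubElement(root, m("fileSec"))
    grp = ET.SubElement(file_sec, m("fileGrp"))
    for file_id, mimetype, href in files:
        file_el = ET.SubElement(grp, m("file"), ID=file_id, MIMETYPE=mimetype)
        ET.SubElement(file_el, m("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})

    # The structMap binds every file into one logical work.
    smap = ET.SubElement(root, m("structMap"))
    div = ET.SubElement(smap, m("div"), DMDID="DMD1", LABEL=title)
    for file_id, _, _ in files:
        ET.SubElement(div, m("fptr"), FILEID=file_id)
    return root


package = mets_package(
    "Example work",  # hypothetical title
    [("F1", "image/tiff", "http://example.org/master.tif"),
     ("F2", "image/jpeg", "http://example.org/access.jpg")],
)
print(ET.tostring(package, encoding="unicode"))
```

The design point is the separation the slide describes: one descriptive section for the package, a file inventory for the individual versions, and a structural map that ties them together as a single work.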