Metadata
Bridget Jones
Information Architecture I
February 23, 2009

Metadata
• Terms
• Meta Tag Birth
• Schema
• Implications

Terms
Metadata
Dr. Warwick Cathro
http://www.nla.gov.au/openpublish/index.php/nlasp/article/view/1019/1289

Terms
Metadata
“Metadata is structured data which describes the characteristics of a resource. It shares many similar characteristics to the cataloguing that takes place in libraries, museums and archives… The term "meta" derives from the Greek word denoting a nature of a higher order or more fundamental kind.”
University of Queensland
http://www.library.uq.edu.au/iad/ctmeta4.html

Terms
Metadata Schemes
Each metadata schema will usually have the following characteristics:
– a limited number of elements
– the name of each element
– the meaning of each element
University of Queensland
http://www.library.uq.edu.au/iad/ctmeta4.html

Terms
XML
Designed to structure, store and transport data, XML is a container for other data standards and schemas. It is a markup language. Tags are defined by either a DTD or an XSD file.
IT Lab Short Course

META Tag Birth
1996 W3C Distributed Indexing and Searching Workshop
• Reps from Dublin Core, Lycos, Microsoft, WebCrawler, IEEE and Verity.
• They wanted to embed metadata information in HTML without changing browser software or the way robots collected data.
www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/S6Group2

META Tag Birth
Robots?
META Tag Birth
The convention agreed upon is as follows:
<META NAME="schema_identifier.element_name" CONTENT="string data">
www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/S6Group2

META Tag Birth
<HEAD profile="http://www.acme.com/profiles/core">
  <TITLE>How to complete Memorandum cover sheets</TITLE>
  <META name="author" content="John Doe">
  <META name="copyright" content="&copy; 1997 Acme Corp.">
  <META name="keywords" content="corporate,guidelines,cataloging">
  <META name="date" content="1994-11-06T08:49:37+00:00">
</HEAD>
http://www.w3.org/TR/REC-html40/struct/global.html

META Tag Birth
Thus, a partial Dublin Core citation might be encoded as follows:
<META NAME="DC.title" CONTENT="HTML 2.0 Specification">
<META NAME="DC.author" CONTENT="Tim Berners-Lee">
www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/S6Group2

META Tag Birth
A sequence of META tags describing a resource is incomplete without one such LINK tag for each different prefix appearing in the sequence. The previous example could be considered complete with the addition of these two LINK tags:
<link rel="schema.DC" href="http://purl.org/DC/elements/1.0/">
<link rel="schema.AC" href="http://metadata.net/ac/2.0/">
Kunze (1999)
http://www.ietf.org/rfc/rfc2731.txt

META Tag Birth
In general, the association takes the form
<link rel="schema.PREFIX" href="LOCATION_OF_DEFINITION">
Kunze (1999)
http://www.ietf.org/rfc/rfc2731.txt

META Tag Birth
Metadata with no scheme attached
<meta name="keywords" content="gossip, celebrities, fashion week, reality television">

META Tag Birth
Articles Appear
• Bremser, W. (1997). Gain fame with META tags. Internet World, 8(10), 94-96.
• Rapoza, J. (1997). How to improve site searches. PC Week, 14(19), 27.
• Raeder, A. (1997). Promoting your Web site. Searcher, 5(7), 63-66.
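
The prefix convention and the completeness rule quoted from Kunze (1999) can be checked mechanically. The following is a minimal sketch using only Python's standard library; the names MetaCollector and missing_schemas are hypothetical, and it assumes prefixes are matched case-sensitively and that any META name containing a dot is meant as PREFIX.element.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect META name/content pairs and LINK rel="schema.PREFIX" declarations."""

    def __init__(self):
        super().__init__()
        self.elements = []  # (prefix or None, element name, content)
        self.schemas = {}   # prefix -> location of the scheme definition

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so META and meta match.
        attrs = dict(attrs)
        name = attrs.get("name")
        rel = attrs.get("rel") or ""
        if tag == "meta" and name:
            prefix, dot, element = name.partition(".")
            if not dot:  # no "PREFIX." part, e.g. plain "keywords"
                prefix, element = None, name
            self.elements.append((prefix, element, attrs.get("content") or ""))
        elif tag == "link" and rel.lower().startswith("schema."):
            self.schemas[rel[len("schema."):]] = attrs.get("href") or ""


def missing_schemas(html):
    """Return the prefixes used in META names that have no matching LINK tag."""
    collector = MetaCollector()
    collector.feed(html)
    used = {prefix for prefix, _, _ in collector.elements if prefix}
    return sorted(used - set(collector.schemas))
```

Run against the partial Dublin Core citation above, this would report DC as undeclared until the corresponding schema.DC LINK tag is added; a scheme-less name such as keywords is ignored.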
Schema
MARC - 1970

Schema
Dublin Core Metadata Initiative
“DCMI promotes the adoption of standardized approaches to metadata and the architectures which support the creation and exchange of interoperable metadata. As evidence of this, DCMI metadata has successfully pursued standardization in various national and international standards venues, including the International Organization for Standardization (ISO), the Internet Engineering Task Force (IETF), the European Committee for Standardization (CEN) and the US National Information Standards Organization (NISO).”
http://dublincore.org/about/

Schema
Dublin Core Metadata Initiative
“DCMI registers controlled vocabularies and encoding schemes to promote their use and to facilitate consistent identification within DC metadata… It is important to note that DCMI 'registers' controlled vocabularies, rather than 'approving' them.”
http://dublincore.org/resources/faq/index.shtml

Schema
Dublin Core Metadata Initiative
Elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
http://dublincore.org/resources/faq/index.shtml

Schema
METS - 1996
Metadata Encoding & Transmission Standard
“a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library”
• Based on XML
• Maintained by Library of Congress
http://www.loc.gov/standards/mets/

Schema
MODS - 2002
• Metadata Object Description Schema
• An extension schema to METS
• More elements are repeatable
• Maintained by Library of Congress
• Based on XML
• Carries data from MARC21
• Language-based tags
• Richer element set than Dublin Core
• May not convert back to MARC21 well
http://www.loc.gov/standards/mods/mods-overview.html

Schema
Crosswalks
“A crosswalk is a specification for mapping one metadata standard to another.”
Pierre, M., LaPlant, W.
(1998) http://www.niso.org/publications/white_papers/crosswalk/

Schema
Crosswalk Examples
CanCore to SCORM
ONIX to MARC21
LOM to Dublin Core
GEM to MARC
OCLC
http://www.oclc.org/research/projects/mswitch/1_crosswalks.htm

Implications
• Self-documentation is one way to protect the life of digital objects (JPEG2000)
http://memory.loc.gov/ammem/techdocs/digform/Formats_IST05_paper.pdf
• Language Ambiguity/Controlled Vocabularies
• Who does this? Authors of pages? Indexing staff?

Robots
Robots-NoContent
In May 2007 Yahoo! also introduced the attribute value class="robots-nocontent". This is not a meta tag, but an attribute and value, which can be used throughout Web page tags where needed. Content inside an element that carries this attribute value will be ignored by the Yahoo! crawler and not included in the search engine's index.
http://en.wikipedia.org/wiki/Meta_tag

Robots
Examples for the use of the robots-nocontent attribute value:
<div class="robots-nocontent">excluded content</div>
<span class="robots-nocontent">excluded content</span>
<p class="robots-nocontent">excluded content</p>
http://en.wikipedia.org/wiki/Meta_tag
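
To make the effect of robots-nocontent concrete, here is a toy sketch of how an indexer could honor it. It is not Yahoo!'s implementation; the names IndexableText and indexable_text are hypothetical, and the sketch assumes well-formed markup in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class IndexableText(HTMLParser):
    """Keep page text, skipping anything inside class="robots-nocontent"."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside an excluded element, every nested element is excluded too.
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    """Return the text a crawler honoring robots-nocontent would index."""
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Because the class attribute is split on whitespace, an element such as class="nav robots-nocontent" is excluded as well, which matches the idea that the value can be combined with other classes on any tag.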