Metadata Bridget Jones Information Architecture I February 23, 2009

Metadata
Bridget Jones
Information Architecture I
February 23, 2009
Metadata
• Terms
• Meta Tag Birth
• Schema
• Implications
Terms
Metadata
Dr. Warwick Cathro
http://www.nla.gov.au/openpublish/index.php/nlasp/article/view/1019/1289
Terms
Metadata
“Metadata is structured data which describes the
characteristics of a resource. It shares many
similar characteristics to the cataloguing that
takes place in libraries, museums and
archives…The term "meta" derives from the
Greek word denoting a nature of a higher order or
more fundamental kind.”
University of Queensland
http://www.library.uq.edu.au/iad/ctmeta4.html
Terms
Metadata Schemes
Each metadata schema will usually have the
following characteristics:
– a limited number of elements
– the name of each element
– the meaning of each element
University of Queensland
http://www.library.uq.edu.au/iad/ctmeta4.html
Terms
XML
Designed to structure, store and transport data,
it’s a container for other data standards and
schema. It is a markup language, not a query
language; XPath and XQuery are used to query it.
Tags are defined by either a DTD or XSD file.
IT Lab Short Course
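Because XML only structures and carries data, any XML parser can read a record back into name/value pairs. The following is a minimal sketch using Python's standard library; the record and its element names are invented for illustration and do not come from a real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: these element names are illustrative only.
RECORD = """<record>
  <title>HTML 2.0 Specification</title>
  <creator>Tim Berners-Lee</creator>
</record>"""


def read_record(xml_text):
    """Parse an XML record and return {element name: text} for its children."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


fields = read_record(RECORD)
print(fields)
```

A DTD or XSD would additionally constrain which element names are allowed; this sketch does no validation.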
META Tag Birth
1996
W3C Distributed Indexing and Searching Workshop
• Reps from Dublin Core, Lycos, Microsoft,
WebCrawler, IEEE and Verity.
• They wanted to embed metadata information in
HTML without changing browser software or the
way robots collected data.
www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/S6Group2
META Tag Birth
Robots?
META Tag Birth
The convention agreed upon is as follows:
< META NAME =
"schema_identifier.element_name"
CONTENT = "string data" >
www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/S6Group2
META Tag Birth
<HEAD profile="http://www.acme.com/profiles/core">
<TITLE>How to complete Memorandum cover sheets</TITLE>
<META name="author" content="John Doe">
<META name="copyright" content="© 1997 Acme Corp.">
<META name="keywords" content="corporate,guidelines,cataloging">
<META name="date" content="1994-11-06T08:49:37+00:00">
</HEAD>
http://www.w3.org/TR/REC-html40/struct/global.html
META Tag Birth
Thus, a partial Dublin Core citation might be
encoded as follows:
< META NAME = "DC.title" CONTENT =
"HTML 2.0 Specification" >
< META NAME = "DC.author" CONTENT =
"Tim Berners-Lee" >
www.w3.org/Search/9605-Indexing-Workshop/ReportOutcomes/S6Group2
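The schema_identifier.element_name convention can be read back mechanically, which is what made it usable by robots. Here is a minimal sketch, assuming well-formed tags written without a space after the opening bracket; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect (schema, element, content) triples from META tags."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name is None or content is None:
            return
        # Split "DC.title" into schema "DC" and element "title";
        # a name without a dot carries no schema identifier.
        schema, dot, element = name.partition(".")
        if not dot:
            schema, element = None, name
        self.records.append((schema, element, content))


def collect_meta(html):
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.records


SAMPLE = """<head>
<meta name="DC.title" content="HTML 2.0 Specification">
<meta name="DC.author" content="Tim Berners-Lee">
<meta name="keywords" content="gossip, celebrities">
</head>"""

for record in collect_meta(SAMPLE):
    print(record)
```

The third tag shows the fallback: a bare name such as keywords is kept, but with no schema attached.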
META Tag Birth
A sequence of META tags describing a
resource is incomplete without one such
LINK tag for each different prefix appearing
in the sequence. The previous example
could be considered complete with the
addition of these two LINK tags:
<link rel = "schema.DC" href =
"http://purl.org/DC/elements/1.0/">
<link rel = "schema.AC" href =
"http://metadata.net/ac/2.0/">
Kunze (1999)
http://www.ietf.org/rfc/rfc2731.txt
META Tag Birth
In general, the association takes the form
<link rel = "schema.PREFIX" href =
"LOCATION_OF_DEFINITION">
Kunze (1999)
http://www.ietf.org/rfc/rfc2731.txt
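The completeness rule quoted from Kunze (one LINK tag per prefix) lends itself to an automated check. This is a rough sketch under the assumption that prefixes are compared case-sensitively; the class and function names are hypothetical and not part of RFC 2731.

```python
from html.parser import HTMLParser


class SchemaChecker(HTMLParser):
    """Track prefixes used in META names and prefixes declared by LINK tags."""

    def __init__(self):
        super().__init__()
        self.used = set()
        self.declared = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta":
            name = attr.get("name") or ""
            prefix, dot, _ = name.partition(".")
            if dot:
                self.used.add(prefix)
        elif tag == "link":
            rel = attr.get("rel") or ""
            if rel.lower().startswith("schema.") and attr.get("href"):
                # Keep the prefix exactly as written after "schema.".
                self.declared[rel[len("schema."):]] = attr["href"]


def missing_schema_links(html):
    """Return the sorted prefixes that are used but have no LINK declaration."""
    checker = SchemaChecker()
    checker.feed(html)
    checker.close()
    return sorted(checker.used - set(checker.declared))


PAGE = """<head>
<meta name="DC.title" content="HTML 2.0 Specification">
<meta name="AC.email" content="someone@example.org">
<link rel="schema.DC" href="http://purl.org/DC/elements/1.0/">
</head>"""

print(missing_schema_links(PAGE))
```

An empty result means every prefix in the sequence has its definition location declared.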
META Tag Birth
Metadata with no scheme attached
<meta name="keywords" content="gossip, celebrities, fashion week, reality television">
META Tag Birth
Articles Appear
• Bremser, W. (1997). Gain fame with META tags.
Internet World, 8(10), 94-96.
• Rapoza, J. (1997). How to improve site searches.
PC Week, 14(19), 27.
• Raeder, A. (1997). Promoting your Web site.
Searcher, 5(7), 63-66.
Schema
MARC - 1970
Schema
Dublin Core Metadata Initiative
“DCMI promotes the adoption of standardized
approaches to metadata and the architectures
which support the creation and exchange of
interoperable metadata. As evidence of this,
DCMI metadata has successfully pursued
standardization in various national and
international standards venues, including the
International Organization for Standardization
(ISO), the Internet Engineering Task Force (IETF),
the European Committee for Standardization
(CEN) and the US National Information Standards
Organization (NISO).”
http://dublincore.org/about/
Schema
Dublin Core Metadata Initiative
“DCMI registers controlled vocabularies and
encoding schemes to promote their use and to
facilitate consistent identification within DC
metadata…It is important to note that DCMI
'registers' controlled vocabularies, rather than
'approving' them.”
http://dublincore.org/resources/faq/index.shtml
Schema
Dublin Core Metadata Initiative
Elements:
Title
Creator
Subject
Description
Publisher
Contributor
Date
Type
Format
Identifier
Source
Language
Relation
Coverage
Rights
http://dublincore.org/resources/faq/index.shtml
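To tie the element list back to the META convention, a record can be rendered as tags programmatically. The sketch below is one possible approach, not a DCMI tool: the function name is hypothetical, and the schema location is the one shown in the LINK example earlier in the deck.

```python
from html import escape

# The fifteen element names listed on the slide.
DC_ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)

# Schema location copied from the LINK example earlier in the deck.
DC_SCHEMA_URL = "http://purl.org/DC/elements/1.0/"


def dc_meta_tags(record):
    """Render {element: value} as a LINK tag plus one META tag per element."""
    lines = ['<link rel="schema.DC" href="%s">' % DC_SCHEMA_URL]
    for element, value in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError("not a Dublin Core element: %r" % element)
        # Escape quotes and angle brackets so the value stays inside content="".
        lines.append('<meta name="DC.%s" content="%s">'
                     % (element, escape(value, quote=True)))
    return "\n".join(lines)


print(dc_meta_tags({"Title": "HTML 2.0 Specification",
                    "Creator": "Tim Berners-Lee"}))
```

Rejecting unknown names reflects the "limited number of elements" property of a schema; a looser tool might pass them through instead.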
Schema
METS - 2001
Metadata Encoding & Transmission Standard
“a standard for encoding descriptive,
administrative, and structural metadata
regarding objects within a digital library”
• Based on XML
• Maintained by Library of Congress
http://www.loc.gov/standards/mets/
Schema
MODS - 2002
Metadata Object Description Schema
• An extension schema to METS
• More elements are repeatable
• Maintained by Library of Congress
• Based on XML
• Carries data from MARC21
• Language-based tags
• Richer element set than Dublin Core
• May not convert back to MARC21 well
http://www.loc.gov/standards/mods/mods-overview.html
Schema
Crosswalks
“A crosswalk is a specification for mapping one
metadata standard to another.”
Pierre, M., LaPlant, W. (1998)
http://www.niso.org/publications/white_papers/crosswalk/
Schema
Crosswalk Examples
CanCore to SCORM
ONIX to MARC21
LOM to Dublin Core
GEM to MARC
OCLC
http://www.oclc.org/research/projects/mswitch/1_crosswalks.htm
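At its simplest, a crosswalk can be thought of as a lookup table from one schema's element names to another's. The sketch below assumes an invented in-house photo schema mapped to Dublin Core element names; it is not one of the published crosswalks listed above, and real crosswalks also handle repeatability, qualifiers and value conversion.

```python
# Hypothetical in-house field names mapped to Dublin Core element names.
CROSSWALK = {
    "headline": "Title",
    "photographer": "Creator",
    "caption": "Description",
    "taken_on": "Date",
}


def apply_crosswalk(record, crosswalk):
    """Map a source record to target elements.

    Returns (mapped, unmapped): mapped holds a list of values per target
    element, unmapped holds source fields the crosswalk does not cover.
    """
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


source = {
    "headline": "Fashion Week opens",
    "photographer": "J. Doe",
    "camera_model": "X100",
}
mapped, unmapped = apply_crosswalk(source, CROSSWALK)
print(mapped)
print(unmapped)
```

The unmapped fields illustrate why a richer schema may not convert cleanly into a simpler one, or back again.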
Implications
• Self-documentation is one way to protect the life of digital
objects (JPEG2000)
http://memory.loc.gov/ammem/techdocs/digform/Formats_IST05_paper.pdf
• Language Ambiguity/Controlled Vocabularies
• Who does this? Authors of pages? Indexing staff?
Robots
Robots-NoContent
Yahoo! also introduced in May 2007 the attribute
value: class="robots-nocontent". This is not a
meta tag, but an attribute and value, which can be
used throughout Web page tags where needed.
Content of the page where this attribute is being
used will be ignored by the Yahoo! crawler and not
included in the search engine's index.
http://en.wikipedia.org/wiki/Meta_tag
Robots
Examples of the robots-nocontent attribute value:
<div class="robots-nocontent">excluded content</div>
<span class="robots-nocontent">excluded content</span>
<p class="robots-nocontent">excluded content</p>
http://en.wikipedia.org/wiki/Meta_tag
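To make the effect concrete, here is a rough sketch of how an indexer could skip text inside elements marked robots-nocontent. It is only an illustration and says nothing about how the Yahoo! crawler was actually implemented; it assumes every non-void element is properly closed, and the class name is hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class IndexableText(HTMLParser):
    """Collect text, ignoring anything nested inside class="robots-nocontent"."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return parser.chunks


PAGE = ('<p>keep</p>'
        '<div class="robots-nocontent">drop <b>this</b><br>too</div>'
        '<span>also keep</span>')
print(indexable_text(PAGE))
```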