Metadata.ppt

metadata
considerations for
digital libraries
© Tefko Saracevic, Rutgers University
the Web
• fastest growing technology in history
• explosive growth of WWW provided
– ubiquity of information and access
– but also information chaos & anarchy
• growing difficulty in identifying, searching
& retrieving
• ‘lost in an ocean’ metaphors
problem
• to organize & search the Web needed:
knowledge about the structure of data
– but Web data & databases fuzzy
– structures vary widely; no consistency
– constantly evolve over time
– lack of agreement about meaning of even
simple terms & concepts in structure
solution
• some standardized description or
language to increase functionality
– a mechanism for a more precise
description of things on the Web
• going from machine-readable to
machine-understandable
– missing in original Web architecture
METADATA !
metadata
what?
• metadata: ‘data about data’
– machine understandable information
for the Web - emphasis on machine
– description of what a text (or any
object) part is all about
• e.g. labeling title, author, source …
• many evolving standards suggested
to be applied in various domains
where?
• in volatile digital environments
– metadata describe electronic
resources, texts & multimedia
– metadata exist or have meaning only
in relation to the referenced
document or object
• provide information about the object
why?
• to standardize description of what is
what in electronic resources in order
• to aid in identification, organization, &
location of a great variety
• to enable effective search of variety of
objects (documents) distributed all over
• sometimes also to provide controls (e.g.
validation, rights, provenance, ratings ...)
importance
• standard metadata descriptions
are a prerequisite to
– common use
– effective searching
– ‘intelligent’ roaming by agents
– validation, ratings ...
markup languages
• SGML - granddaddy
(standard in 1986)
– marks elements within documents
• derived from old markups for typesetting
• adapted by communities producing
electronic documents
• machine independent - reason for success
– transportable from one hardware & software to
another; substitutes strings
• many extensions & specific applications
principles
• ALL markup languages must specify
• what markup means
• what markup is allowed
• what markup is required
• how markup is distinguished from text
• all markup languages & applications
follow these principles
• underlying concepts are fairly simple but
they get very confusing real fast.
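The four principles can be made concrete with a very small sketch. It assumes a hypothetical `record` document type whose allowed and required elements are listed in two sets; the element names and the `check_record` helper are illustrative only and do not come from any real DTD. Angle brackets are what distinguish markup from text here.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-specification for a <record> document type.
ALLOWED = {"title", "author", "source", "note"}   # what markup is allowed
REQUIRED = {"title", "author"}                    # what markup is required


def check_record(xml_text):
    """Return a list of problems found in a <record> document."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "record":
        problems.append(f"unexpected root element: {root.tag}")
    seen = {child.tag for child in root}
    for tag in sorted(seen - ALLOWED):
        problems.append(f"element not allowed: {tag}")
    for tag in sorted(REQUIRED - seen):
        problems.append(f"required element missing: {tag}")
    return problems


good = "<record><title>Metadata</title><author>T. Saracevic</author></record>"
bad = "<record><title>Metadata</title><colour>red</colour></record>"
print(check_record(good))
print(check_record(bad))
```

A real DTD or schema also states what each element means and how elements may nest; this sketch checks only the "allowed" and "required" parts.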
specifications
• types of documents defined by DTD
Document Type Definitions
– many types & applications formulated
• vary greatly in complexity and use
• RDF -
Resource Description Framework
– a common syntax, data model &
scheme for describing
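As a rough illustration of the RDF data model, each statement about a resource can be seen as a (resource, property, value) triple. The sketch below writes such triples in the N-Triples line format, using the Dublin Core element namespace for the properties. The resource URI and the `to_ntriples` helper are assumptions made for the example.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def to_ntriples(subject, properties):
    """Serialize {element: literal} for one resource as N-Triples lines."""
    lines = []
    for element, value in properties.items():
        # Escape backslashes and double quotes inside the literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{DC}{element}> "{escaped}" .')
    return lines


# Hypothetical resource URI, for illustration only.
triples = to_ntriples(
    "http://example.org/slides/metadata",
    {"title": "Metadata considerations for digital libraries",
     "creator": "Tefko Saracevic"},
)
print("\n".join(triples))
```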
extensions
• HTML - most famous & successful
– allows for metatags in the Head
• not used much, even discouraged
• in the body could be indirect
• XML - the next big thing
(hopefully)
• data format for structured document
interchange & interoperability on WWW
• increases functionality of SGML &
combines with ease of use of HTML
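To show what metatags in the HTML Head look like to a program, here is a minimal sketch that collects `name`/`content` pairs using Python's standard `html.parser` module. The sample page and the `MetaTagCollector` class are made up for illustration.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.meta[attributes["name"]] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


# Hypothetical page used only for this example.
page = """<html><head>
<title>Sample page</title>
<meta name="description" content="Notes on metadata">
<meta name="keywords" content="metadata, digital libraries">
</head><body><p>Text</p></body></html>"""

collector = MetaTagCollector()
collector.feed(page)
print(collector.meta)
```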
who specifies standards?
• formal groups
– national & international standards
organizations - ISO, ANSI, NISO
• informal groups
– WWW Consortium (W3C)
– Dublin Core
– Library of Congress
proliferation
• currently: proliferation of metadata
standards activities - many domains
– a lot of confusion & incompatibility
– in document description & libraries
• coordination through liaisons & a number
of projects in the U.S. & internationally
– strength: domain experts involvement
– weakness: limited perspective; re-invention
libraries
• in libraries metadata has a very long
tradition long preceding the Web (but
not called metadata)
– cataloging rules, standards
• MARC (Machine Readable Cataloging)
• enabled worldwide exchange of cataloging
records
• but long standing problems with searching
sample of projects
• Encoded Archival Description (EAD)
• Text Encoding Initiative (TEI)
• Federal Geographic Data
Committee (FGDC) - geospatial data
• Z39.50 standards - searching
• crosswalks: mapping e.g. DC to MARC
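A crosswalk is essentially a lookup table from the elements of one scheme to the fields of another. The sketch below maps a few Dublin Core elements to MARC tags and subfields. The pairings are an illustrative subset loosely following the Library of Congress Dublin Core to MARC crosswalk and should be checked against the published table before real use. The `crosswalk` helper is hypothetical.

```python
# Illustrative subset of a Dublin Core -> MARC crosswalk (verify before use).
DC_TO_MARC = {
    "Title": ("245", "a"),
    "Creator": ("720", "a"),
    "Subject": ("653", "a"),
    "Description": ("520", "a"),
    "Publisher": ("260", "b"),
    "Date": ("260", "c"),
    "Language": ("546", "a"),
    "Rights": ("540", "a"),
}


def crosswalk(dc_record):
    """Map a {DC element: value} record to (MARC tag, subfield, value) rows.

    Elements missing from this subset are returned separately so that
    nothing is dropped silently.
    """
    rows = []
    unmapped = []
    for element, value in dc_record.items():
        if element in DC_TO_MARC:
            tag, subfield = DC_TO_MARC[element]
            rows.append((tag, subfield, value))
        else:
            unmapped.append(element)
    return rows, unmapped


record = {
    "Title": "Metadata considerations for digital libraries",
    "Creator": "Tefko Saracevic",
    "Coverage": "Web resources",
}
rows, unmapped = crosswalk(record)
print(rows)
print(unmapped)
```

Reporting unmapped elements makes visible where two schemes fail to line up, which is the semantic side of the interoperability problem.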
Dublin Core (DC)
• international initiative to define a
core set of elements for describing
Web resources
– a set of 15 elements
• Title; Creator; Subject; Description;
Publisher; Contributor; Date; Type;
Format; Identifier; Source; Language;
Relation; Coverage; Rights
• wide interest & a lot of work
– but not widely applied on the Web
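One common way to apply the 15 elements on the Web is to embed them as `meta` tags with a `DC.` prefix in the Head of an HTML page. The sketch below checks a record against the element list and renders such tags. The `dc_meta_tags` helper and the sample values are assumptions made for the example.

```python
from html import escape

# The 15 Dublin Core elements.
DC_ELEMENTS = [
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
]


def dc_meta_tags(record):
    """Render a {element: value} record as HTML <meta> tags."""
    unknown = [name for name in record if name not in DC_ELEMENTS]
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    return [
        f'<meta name="DC.{name.lower()}" content="{escape(value, quote=True)}">'
        for name, value in record.items()
    ]


# Sample values for illustration only.
tags = dc_meta_tags({
    "Title": "Metadata considerations for digital libraries",
    "Creator": "Tefko Saracevic",
    "Language": "en",
})
print("\n".join(tags))
```

All elements are optional and repeatable in Dublin Core; this sketch allows one value per element to stay short.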
library interoperability
• library catalogs bound by
proprietary software & hardware
• middleware needed
– protocols (based on Z39.50) provide
for interaction of clients with many
servers (catalogs)
• problems remain with semantic
interoperability
digitization
• metadata assignment (cataloging) a
key component in digitization or
electronic publishing
• choices: a spectrum of possibilities to
select & apply metadata
• search for automation - e.g. templates
• connection with cataloging, indexing
decisions, decisions
– how & what to plan for metadata
creation in conjunction with a digital library?
– target audience?
– scope and depth?
– what to adopt? plug into an existing scheme?
– how to integrate metadata projects?
– needed skills? training? staffing?
$$$$
• costs of metadata: HUGE
– involved operations
– time, personnel, effort
– learning many new things included
– making decisions complex & involved
• cooperative activities essential
• libraries pushed out of libraries