Information Science601.ppt

Information Science
2005
Tefko Saracevic, PhD
School of Communication, Information and
Library Studies
Rutgers University
New Brunswick, New Jersey USA
http://www.scils.rutgers.edu/~tefko
© Tefko Saracevic
Information science:
a short definition
“the science dealing with the
efficient collection, storage, and
retrieval of information”
Webster
Organization of presentation
1. Big picture – problems, solutions, social place
2. Structure – main areas in research & practice
3. Technology – information retrieval – largest part
4. Information – representation; bibliometrics
5. People – users, use, seeking, context
6. Digital libraries – whose are they anyhow?
7. Paradigm shift – distancing of areas
8. Conclusions – big questions for the future
Scope
α Evolution and state of the field in
the last decade of the old and
first decade of the new century
1.
The big picture
Problems addressed
α Bit of history: Vannevar Bush
(1945):
β Defined problem as “... the massive
task of making more accessible a
bewildering store of knowledge.”
β Problem still with us & growing
… solution
α Bush suggested a machine:
“Memex ... association of ideas ...
duplicate mental processes
artificially.”
α Technological fix to problem
α Still with us:
technological determinant
At the base of information
science:
Problem
Trying to control content in
α Information explosion
β exponential growth of information
artifacts, if not of information itself
PLUS today
α Communication explosion
β exponential growth of means and ways by
which information is communicated,
transmitted, accessed, used
technological solution, BUT …
applying technology to solving problems of
effective use of information
BUT: from a HUMAN & SOCIAL and not only
TECHNOLOGICAL perspective
or a symbolic model
People
Information
Technology
Problems & solutions:
SOCIAL CONTEXT
α Professional practice AND scientific
inquiry related to:
Effective communication of knowledge
records - ‘literature’ - among humans in
the context of social, organizational, &
individual need for and use of
information
α Taking advantage of modern
information technology
or as White & McCain put it:
“modeling the world of publications with a
practical goal of being able to deliver their
content to inquirers [users] on demand.”
Elaboration
α Knowledge records = texts, sounds, images,
multimedia, web ... ‘literature’ in given
domains
β content-bearing structures – central to information
science
α Communication = human-computer-literature
interface
β study of information science is the interface
between people & literatures
α Information need, seeking, and use = raison
d'être
α Effectiveness = relevance, utility
General characteristics
α Interdisciplinarity - relations
with a number of fields, some more
or less predominant
α Technological imperative - driving
force, as in many modern fields
α Information society - social
context and role in evolution shared with many fields
2.
Structure
Composition of the field
α Like many fields, information science
has different areas of
concentration & specialization
α They change, evolve over time
β grow closer, grow apart
β ignore each other, less or more
most importantly different areas…
α receive more or less in funding &
emphasis
β producing great imbalances in work &
progress
β attracting different audiences & fields
α this includes
β vastly different levels of support for
research and
β huge commercial investments &
applications
How to view structure?
by decomposing areas & efforts in
research & practice emphasizing
Technology
or
Information
or
People
Part 3.
Technology
α Identified with information
retrieval (IR)
β by far biggest effort and investment
β international & global
β commercial interest large & growing
Information Retrieval –
definition & objective
“IR: ... intellectual aspects of
description of information, ... search,
... & systems, machines...”
Calvin Mooers, 1951
α How to provide users with relevant
information effectively?
For that objective:
1. How to organize information
intellectually?
2. How to specify the search &
interaction intellectually?
3. What techniques & systems to use
effectively?
Streams in IR Res. & Dev.
1. Information science:
β Services, users, use
β Human-computer interaction
β Cognitive aspects
2. Computer science:
β Algorithms, techniques
β Systems aspects
3. Information industry:
β Products, services, Web
β Market aspects
α Problem:
β relative isolation – discussed later
Contemporary IR research
α Now mostly done within computer
science
β e.g. Special Interest Group on IR,
Association for Computing Machinery
(SIGIR, ACM)
α Spread globally
β e.g. major IR research communities
emerged in China, Korea, Singapore
α Branched outside of information
science - “everybody does information retrieval”
β data mining, machine learning, natural
language processing, artificial
intelligence, computer graphics …
Text REtrieval Conference (TREC)
α Started in 1992, now probably ending
β “support research within the IR community by
providing the infrastructure necessary for large-scale evaluation”
α Methods
β provides large test beds, queries, relevance
judgments, comparative analyses
β essentially using Cranfield 1960’s methodology
β organized around tracks
γ various topics – changing over years
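A minimal sketch of the Cranfield-style evaluation named above, assuming Python and made-up data: a set of queries, a set of relevance judgments per query, and a system's retrieved list per query, scored by precision and recall. The names `qrels`, `run`, and `precision_recall` are illustrative assumptions, not TREC's actual tools or file formats.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: list of document ids returned by the system
    relevant:  set of document ids judged relevant for the query
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy test bed: relevance judgments and one system run.
qrels = {"q1": {"d1", "d3", "d7"}, "q2": {"d2"}}
run = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d2"]}

# Score every query against its judgments.
scores = {q: precision_recall(run[q], qrels[q]) for q in qrels}

for query, (p, r) in sorted(scores.items()):
    print(f"{query}: precision={p:.2f} recall={r:.2f}")
```

Comparative analysis across systems then amounts to computing the same scores for each system's run over the same queries and judgments.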
TREC impact
α International – big impact on creating
research communities
α Annual conferences
β report, exchange results, foster cooperation
α Results
β mostly in reports, available at
http://trec.nist.gov/
β overviews provided as well
β but, only a fraction published in journals or
books
TREC tracks 2004
103 groups from 21 countries
α Genomics with 4 subtracks
α HARD (High Accuracy Retrieval from Documents)
α Novelty (new, nonredundant information)
α Question answering
α Robust (improving poorly performing topics)
α Terabyte (very large collections)
α Web track
α Previous tracks:
β ad-hoc (1992-1999)
β routing (92-97)
β interactive (94-02)
β filtering (95-02)
β cross language (97-02)
β speech (97-00)
β Spanish (94-96)
β video (00-01)
β Chinese (96-97)
β query (98-00)
β and a few more run for two years only
Broadening of IR –
ever changing, ever new areas added
α Cross language IR (CLIR)
α Natural language processing (NLP IR)
α Music IR (MIR)
α Image, video, multimedia retrieval
α Spoken language retrieval
α IR for bioinformatics and genomics
α Summarization; text extraction
α Question answering
α Many human-computer interactions
α XML IR
α Web IR; Web search engines
α DB and IR integration – structured and
unstructured data
Commercial IR
α Search engines based on IR
α But added many elaborations &
significant innovations
β dealing with HUGE numbers of pages fast
β countering spamming & page rank games –
adversarial IR
γ never ending combat of algorithms
α Spread & impact worldwide
β about 2000 engines in over 160 countries
β English was dominant, but not any more
Commercial IR: brave new world
α Large investments & economic sector
β hope for big profits, as yet questionable
α Leading to proprietary, secret IR
β also aggressive hiring of best talent
β new commercial research centers in
different countries (e.g. MS in China)
α Academic research funding is changing
β brain drain from academe
IR successfully effected:
α Emergence & growth of the INFORMATION
INDUSTRY
α Evolution of IS as a PROFESSION &
SCIENCE
α Many APPLICATIONS in many fields
β including on the Web – search engines
α Improvements in HUMAN - COMPUTER
INTERACTION
α Evolution of INTERDISCIPLINARITY
IR has a long, proud history
Part 4.
Information
α Several areas of investigation:
β as basic phenomenon – not much progress
γ measures such as Shannon's not successful
γ concentrated on manifestations and effects
β information representation
γ large area connected with IR, librarianship
γ metadata
β bibliometrics
γ structures of literature
Covered in separate lectures
Part 5.
People
α Professional services
β in organization – moving toward knowledge
management, competitive intelligence
β in industry – vendors, aggregators, Internet
α Research
β user & use studies
β interaction studies
β broadening to information seeking studies,
social context, collaboration
β relevance studies
β social informatics
User & use studies
α Oldest area
β covers many topics, methods,
orientations
β many studies related to IR
γ e.g. searching, multitasking, browsing,
navigation
α Branching into Web use studies
β quantitative & qualitative studies
β emergence of webometrics
Interaction
α Traditional IR model concentrates
on matching, not on the user side &
interaction
α Several interaction models
suggested
β Ingwersen’s cognitive, Belkin’s episode,
Saracevic’s stratified model
β hard to get experiments &
confirmation
α Considered key to providing
β basis for better design
β understanding of use of systems
α Web interactions a major new area
Information seeking
α Concentrates on broader context, not
only IR or interaction: people as
they move in life & work
α Based on concept of social
construction of information
α Most active area, particularly in
Europe, with annual conferences
Information seeking
Sampling of theories, models
α Why people seek information:
β Taylor’s stages of information need
β Dervin’s Sense-Making – gap, bridge
β Belkin’s Anomalous State of Knowledge
β Chatman’s life in the round – inf. poverty
α How people seek information:
β Wilson’s General Model of inf. seeking
β Bates’ berrypicking – acts in searching
β Kuhlthau’s information search process
β Chang’s browsing model
β Benoit’s communicative action - Habermas
Part 7.
Paradigm split in
technology - people
α Split from early 80’s to date into
two orientations
β System-centered
γ algorithms, TREC
γ continue traditional IR model
β Human-(user)-centered
γ cognitive, situational, user studies
γ interaction models, some started in TREC
α These became almost separate
universes – one based in computer
science, the other in information
science & librarianship
Critiques, cultures
α Number of critiques (e.g. Dervin & Nilan)
about isolated systems approach
β calls for user-centered approaches, designs &
evaluation
α But user-centered studies did not deliver
very useful design pointers, guides
α Very different cultures:
β computer science has own, more science &
technology oriented
β information science more humanities oriented
β C.P. Snow’s two cultures
Human vs. system
α Human (user) side:
β often highly critical, even one-sided
β mantra of implications for design
β but does not deliver concretely
α System side:
β mostly ignores user side & studies
β ‘tell us what to do & we will’
α Issue NOT H or S approach
β even less H vs. S
β but how can H AND S work together
β major challenge for the future
Reconciliation?
α Several efforts to provide human-centered design
β but more discussion than real application
α Integration of information seeking and
information retrieval in context
(Ingwersen & Järvelin)
α Research & development toward
β using search context, improving user
search experiences & search quality
β machine learning, incorporating semantics
Funding
α Most funding
goes toward
systems side &
computer science
β most (very large %)
support for system
work
α In the digital
age support is
for digital
α True globally
6.
Digital libraries
LARGE & growing area
α “Hot” area in R&D
β a number of large grants & projects
in the US, European Union, & other
countries up to now;
β will it continue? It is not growing
β but “DIGITAL” big & “libraries” small
α “Hot” area in practice
β building digital collections, hybrid
libraries
β many projects throughout the world
β growing at a high rate
Technical problems
α Substantial - larger & more complex
than anticipated:
β representing, storing & retrieving of
library objects
γ particularly if originally designed to be
printed & then digitized
β operationally managing large collections – issues of scale
β dealing with diverse & distributed
collections
γ interoperability
β assuring preservation & persistence
β incorporating rights management
Digital Library Initiatives in
the US (DLI)
α Research consortia under National Science
Foundation
β DLI 1: 1994-98, 3 agencies, $24M, six large
projects
β DLI 2: 1999-2006, 8 agencies, $60+M, 77 large &
small projects in various categories
α ‘digital library’ left undefined, so as to cover many
topics & stretch ideas
β not constrained by practice
European Union
α DELOS Network of Excellence on
Digital Libraries
β many projects throughout European Union
γ heavily technological
β many meetings, workshops
β resembles DLIs in the US
β well funded, long range
Research issues
β understanding objects in DL
γ representing in many formats
γ non-textual materials
β metadata, cataloging, indexing
β conversion, digitization
β organizing large collections
β federated searching over distributed (various)
collections
β managing collections, scaling
β preservation, archiving
β interoperability, standardization
β accessing, using
DL projects in practice
α Heavily oriented toward a variety
of institutions – primarily
libraries
β but also museums, professional
societies, specific domains, etc.
α Main orientation: institutional
missions, contexts, finances
β sustainability, preservation in real
world
β managing growth, rights, access
Agendas
α The DL research agenda is mostly set from top
down
β from funding agencies to projects
β imprint of the computer science community's
interest & vision
α Most DL practice agendas are set from
bottom up
β from institutions, incl. many libraries
β imprint of institutional missions, interests
& vision
γ providing access to specialized materials and
collections from an institution(s) that are
otherwise not accessible
γ covering in an integral way a domain with a range
of sources
Connection?
α DL research & DL
practice presently are
conducted
β mostly independent of
each other,
β minimally informing each
other,
β & having slight, or no
connection
α Parallel universes with
little connection &
interaction
8.
Conclusions
IS contributions
α IS effected handling of inf. in society
α Developed an organized body of
knowledge & professional competencies
α Applied interdisciplinarity
α IR reached a mature stage
α IR penetrated many fields & human
activities
α Stressed HUMAN in human-computer
interaction
Challenges
α Adjust to the growing & changing social &
organizational role of inf. & related inf.
infrastructure
α Play a positive role in globalization of
information
α Respond to technological imperative in human
terms
α Respond to changes from inf. to
communication explosion - bringing own
experiences to resolutions, particularly to
the INTERNET
α Join competition with quality
α Join DIGITAL with LIBRARIES
Juncture
α IS is at a critical juncture in its
evolution
α Many fields, groups ... moving into
information
β big competition
β entrance of powerful players
β fight for stakes
α To be a major player IS needs to progress
in its:
β research & development
β professional competencies
β educational efforts
β interdisciplinary relations
α Reexamination necessary