Information Science601.ppt

Information Science
2005
Tefko Saracevic, PhD
School of Communication, Information and
Library Studies
Rutgers University
New Brunswick, New Jersey USA
http://www.scils.rutgers.edu/~tefko
© Tefko Saracevic
Information science:
a short definition
“the science dealing with the
efficient collection, storage, and
retrieval of information”
Webster
Organization of presentation
1. Big picture – problems, solutions, social place
2. Structure – main areas in research & practice
3. Technology – information retrieval – largest part
4. Information – representation; bibliometrics
5. People – users, use, seeking, context
6. Paradigm shift – distancing of areas
7. Digital libraries – whose are they anyhow?
8. Conclusions – big questions for the future
Scope
α Evolution and state of the field in
the last decade of the old and
first decade of the new century
Part 1. The big picture
Problems addressed
α Bit of history: Vannevar Bush
(1945):
β Defined problem as “... the massive
task of making more accessible of a
bewildering store of knowledge.”
β Problem still with us & growing
… solution
α Bush suggested a machine:
“Memex ... association of ideas ...
duplicate mental processes
artificially.”
α Technological fix to problem
α Still with us:
technological determinant
At the base of information
science:
Problem
Trying to control content in
α Information explosion
β exponential growth of information
artifacts, if not of information itself
PLUS today
α Communication explosion
β exponential growth of means and ways by
which information is communicated,
transmitted, accessed, used
technological solution, BUT …
applying technology to solving problems of effective use of information
BUT: from a HUMAN & SOCIAL and not only TECHNOLOGICAL perspective
or a symbolic model
People
Information
Technology
Problems & solutions:
SOCIAL CONTEXT
α Professional practice AND scientific
inquiry related to:
Effective communication of knowledge
records - ‘literature’ - among humans in
the context of social, organizational, &
individual need for and use of
information
α Taking advantage of modern
information technology
or as White & McCain put it:
“modeling the world of publications with a practical goal of being
able to deliver their content to inquirers [users] on demand.”
Elaboration
α Knowledge records = texts, sounds, images,
multimedia, web ... ‘literature’ in given
domains
β content-bearing structures – central to information
science
α Communication = human-computer-literature
interface
β study of information science is the interface
between people & literatures
α Information need, seeking, and use = raison
d'être
α Effectiveness = relevance, utility
General characteristics
α Interdisciplinarity - relations
with a number of fields, some more
or less predominant
α Technological imperative - driving
force, as in many modern fields
α Information society - social
context and role in evolution shared with many fields
Part 2. Structure
Composition of the field
α Like many fields, information science
has different areas of
concentration & specialization
α They change, evolve over time
β grow closer, grow apart
β ignore each other, less or more
most importantly different areas…
α receive more or less in funding &
emphasis
β producing great imbalances in work &
progress
β attracting different audiences & fields
α this includes
β vastly different levels of support for
research and
β huge commercial investments &
applications
How to view structure?
by decomposing areas & efforts in
research & practice emphasizing
Technology
or
Information
or
People
Part 3. Technology
α Identified with information
retrieval (IR)
β by far biggest effort and investment
β international & global
β commercial interest large & growing
Information Retrieval –
definition & objective
“IR: ... intellectual aspects of
description of information, ... search,
... & systems, machines...”
Calvin Mooers, 1951
α How to provide users with relevant
information effectively?
For that objective:
1. How to organize information
intellectually?
2. How to specify the search &
interaction intellectually?
3. What techniques & systems to use
effectively?
Streams in IR Res. & Dev.
1. Information science:
β Services, users, use
β Human-computer interaction
β Cognitive aspects
2. Computer science:
β Algorithms, techniques
β Systems aspects
3. Information industry:
β Products, services, Web
β Market aspects
α Problem:
β relative isolation – discussed later
Contemporary IR research
α Now mostly done within computer
science
β e.g. Special Interest Group on IR,
Association for Computing Machinery
(SIGIR, ACM)
α Spread globally
β e.g. major IR research communities
emerged in China, Korea, Singapore
α Branched outside of information
science - “everybody does information retrieval”
β data mining, machine learning, natural
language processing, artificial
intelligence, computer graphics …
Text REtrieval Conference (TREC)
α Started in 1992, now probably ending
β “support research within the IR community by
providing the infrastructure necessary for large-scale evaluation”
α Methods
β provides large test beds, queries, relevance
judgments, comparative analyses
β essentially using Cranfield 1960’s methodology
β organized around tracks
γ various topics – changing over years
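The Cranfield-style method above compares a system's ranked output against relevance judgments for the same queries. A minimal sketch of that comparison, assuming precision and recall at a fixed cutoff as the measures (the slide does not name specific measures); the function name, document IDs, and rankings are hypothetical illustrations, not TREC data:

```python
def precision_recall(ranked_docs, relevant_docs, k):
    """Return (precision, recall) at cutoff k for a single query.

    ranked_docs   -- list of document IDs in the order a system returned them
    relevant_docs -- set of document IDs judged relevant for the query
    """
    retrieved = ranked_docs[:k]
    hits = sum(1 for doc in retrieved if doc in relevant_docs)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_docs) if relevant_docs else 0.0
    return precision, recall


# Hypothetical relevance judgments and two hypothetical system outputs
relevance_judgments = {"d1", "d4", "d7"}
system_a = ["d1", "d2", "d4", "d9", "d7"]
system_b = ["d3", "d1", "d5", "d6", "d8"]

for name, ranking in (("A", system_a), ("B", system_b)):
    p, r = precision_recall(ranking, relevance_judgments, k=5)
    print(f"system {name}: precision@5={p:.2f} recall@5={r:.2f}")
```

Because both systems are scored against the same queries and the same judgments, the numbers are directly comparable, which is the point of a shared test bed.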
TREC impact
α International – big impact on creating
research communities
α Annual conferences
β report, exchange results, foster cooperation
α Results
β mostly in reports, available at
http://trec.nist.gov/
β overviews provided as well
β but, only a fraction published in journals or
books
TREC tracks 2004
103 groups from 21 countries
α Genomics with 4 sub-tracks
α HARD (High Accuracy Retrieval from Documents)
α Novelty (new, nonredundant information)
α Question answering
α Robust (improving poorly performing topics)
α Terabyte (very large collections)
α Web track
α Previous tracks:
β ad-hoc (1992-1999)
β routing (92-97)
β interactive (94-02)
β filtering (95-02)
β cross language (97-02)
β speech (97-00)
β Spanish (94-96)
β video (00-01)
β Chinese (96-97)
β query (98-00)
β and a few more run for two years only
Broadening of IR –
ever changing, ever new areas added
α Cross language IR (CLIR)
α Natural language processing (NLP IR)
α Music IR (MIR)
α Image, video, multimedia retrieval
α Spoken language retrieval
α IR for bioinformatics and genomics
α Summarization; text extraction
α Question answering
α Many human-computer interactions
α XML IR
α Web IR; Web search engines
α DB and IR integration – structured and
unstructured data
Commercial IR
α Search engines based on IR
α But added many elaborations &
significant innovations
β dealing with HUGE numbers of pages fast
β countering spamming & page rank games –
adversarial IR
γ never ending combat of algorithms
α Spread & impact worldwide
β about 2000 engines in over 160 countries
β English was dominant, but not any more
Commercial IR: brave new world
α Large investments & economic sector
β hope for big profits, as yet questionable
α Leading to proprietary, secret IR
β also aggressive hiring of best talent
β new commercial research centers in
different countries (e.g. MS in China)
α Academic research funding is changing
β brain drain from academe
IR successfully effected:
α Emergence & growth of the INFORMATION
INDUSTRY
α Evolution of IS as a PROFESSION &
SCIENCE
α Many APPLICATIONS in many fields
β including on the Web – search engines
α Improvements in HUMAN - COMPUTER
INTERACTION
α Evolution of INTERDISCIPLINARITY
IR has a long, proud history
Part 4. Information
α Several areas of investigation;
β as basic phenomenon – not much progress
γ measures as Shannon's not successful
γ concentrated on manifestations and effects
β information representation
γ large area connected with IR, librarianship
γ metadata
β bibliometrics
γ structures of literature
Covered in separate lecture:
What_is_information.ppt
Part 5. People
α Professional services
β in organization – moving toward knowledge
management, competitive intelligence
β in industry – vendors, aggregators, Internet,
α Research
β user & use studies
β interaction studies
β broadening to information seeking studies,
social context, collaboration
β relevance studies
β social informatics
User & use studies
α Oldest area
β covers many topics, methods,
orientations
β many studies related to IR
γ e.g. searching, multitasking, browsing,
navigation
α Branching into Web use studies
β quantitative & qualitative studies
β emergence of webmetrics
Interaction
α Traditional IR model concentrates
on matching, not on the user side &
interaction
α Several interaction models
suggested
β Ingwersen’s cognitive, Belkin’s episode,
Saracevic’s stratified model
β hard to get experiments &
confirmation
α Considered key to providing
β basis for better design
β understanding of use of systems
α Web interactions a major new area
Information seeking
α Concentrates on broader context, not
only IR or interaction: people as
they move in life & work
α Based on concept of social
construction of information
α Most active area, particularly in
Europe, with annual conferences
Information seeking
Sampling of theories, models
α Why people seek information:
β Taylor’s stages of information need
β Dervin’s Sense-Making – gap, bridge
β Belkin’s Anomalous State of Knowledge
β Chatman’s life in the round – inf. poverty
α How people seek information:
β Wilson’s General Model of inf. seeking
β Bates’ berrypicking – acts in searching
β Kuhlthau’s information search process
β Chang’s browsing model
β Benoit’s communicative action - Habermas
Part 6. Paradigm split in technology - people
α Split from early 80’s to date into
two orientations
β System-centered
γ algorithms, TREC
γ continue traditional IR model
β Human-(user)-centered
γ cognitive, situational, user studies
γ interaction models, some started in TREC
α These became almost separate
universes – one based in computer
science, the other in information
science & librarianship
Critiques, cultures
α Number of critiques (e.g. Dervin & Nilan)
about isolated systems approach
β calls for user-centered approaches, designs &
evaluation
α But user-centered studies did not deliver
very useful design pointers, guides
α Very different cultures:
β computer science has own, more science &
technology oriented
β information science more humanities oriented
β C.P. Snow’s two cultures
Human vs. system
α Human (user) side:
β often highly critical, even one-sided
β mantra of implications for design
β but does not deliver concretely
α System side:
β mostly ignores user side & studies
β ‘tell us what to do & we will’
α Issue NOT H or S approach
β even less H vs. S
β but how can H AND S work together
β major challenge for the future
Reconciliation?
α Several efforts to provide human-centered design
β but more discussion than real application
α Integration of information seeking and
information retrieval in context
(Ingwersen & Järvelin)
α Research & development toward
β using search context, improving user
search experiences & search quality
β machine learning, incorporating semantics
Funding
α Most funding goes toward systems side &
computer science
β most (very large %) support for system work
α In the digital age support is for digital
α True globally
Part 7. Digital libraries
LARGE & growing area
α “Hot” area in R&D
β a number of large grants & projects
in the US, European Union, & other
countries up to now;
β will it continue? It is not growing
β but “DIGITAL” big & “libraries” small
α “Hot” area in practice
β building digital collections, hybrid
libraries,
β many projects throughout the world
β growing at a high rate
Technical problems
α Substantial - larger & more complex
than anticipated:
β representing, storing & retrieving of
library objects
γ particularly if originally designed to be
printed & then digitized
β operationally managing large collections - issues of scale
β dealing with diverse & distributed
collections
γ interoperability
β assuring preservation & persistence
β incorporating rights management
Digital Library Initiatives in
the US (DLI)
α Research consortia under National Science
Foundation
β DLI 1: 1994-98, 3 agencies, $24M, six large
projects
β DLI 2: 1999-2006, 8 agencies, $60+M, 77 large &
small projects in various categories
α ‘digital library’ deliberately left undefined, to cover many
topics & stretch ideas
β not constrained by practice
European Union
α DELOS Network of Excellence on
Digital Libraries
β many projects throughout European Union
γ heavily technological
β many meetings, workshops
β resembles DLIs in the US
β well funded, long range
Research issues
β understanding objects in DL
γ representing in many formats
γ non-textual materials
β metadata, cataloging, indexing
β conversion, digitization
β organizing large collections
β federated searching over distributed (various)
collections
β managing collections, scaling
β preservation, archiving
β interoperability, standardization
β accessing, using
DL projects in practice
α Heavily oriented toward a variety
of institutions – primarily
libraries
β but also museums, professional
societies, specific domains, etc.
α Main orientation: institutional
missions, contexts, finances
β sustainability, preservation in real
world
β managing growth, rights, access
Agendas
α Most DL research agendas are set from top
down
β from funding agencies to projects
β imprint of the computer science community's
interest & vision
α Most DL practice agendas are set from
bottom up
β from institutions, incl. many libraries
β imprint of institutional missions, interests & vision
γ providing access to specialized materials and
collections from an institution(s) that are
otherwise not accessible
γ covering in an integral way a domain with a range
of sources
Connection?
α DL research & DL practice presently are conducted
β mostly independent of each other,
β minimally informing each other,
β & having slight, or no connection
α Parallel universes with little connection & interaction
Part 8. Conclusions
IS contributions
α IS effected handling of inf. in society
α Developed an organized body of
knowledge & professional competencies
α Applied interdisciplinarity
α IR reached a mature stage
α IR penetrated many fields & human
activities
α Stressed HUMAN in human-computer
interaction
Challenges
α Adjust to the growing & changing social &
organizational role of inf. & related inf.
infrastructure
α Play a positive role in globalization of
information
α Respond to technological imperative in human
terms
α Respond to changes from inf. to
communication explosion - bringing own
experiences to resolutions, particularly to
the INTERNET
α Join competition with quality
α Join DIGITAL with LIBRARIES
Juncture
α IS is at a critical juncture in its
evolution
α Many fields, groups ... moving into
information
β big competition
β entrance of powerful players
β fight for stakes
α To be a major player IS needs to progress
in its:
β research & development
β professional competencies
β educational efforts
β interdisciplinary relations
α Reexamination necessary
Thank you Miró!
Bibliography
Bates, M. J. (1999). Invisible Substrate of Information Science.
Journal of the American Society for Information Science, 50, 1043-1050.
Bush, V. (1945). As We May Think. Atlantic Monthly, 176 (11), 101-108. Available:
http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
Hjørland, B. (2000). Library and Information Science: Practice,
Theory, and Philosophical Basis. Information Processing &
Management, 36 (3), 501-531.
Pettigrew, K.E. & McKechnie, L.E.F. (2000). The use of theory in
information science research. Journal of the American Society for
Information Science and Technology, 52 (1), 62-73.
Saracevic, T. (1999). Information Science. Journal of the American
Society for Information Science, 50 (9), 1051-1063. Available:
http://www.scils.rutgers.edu/~tefko/JASIS1999.pdf
Saracevic, T. (2005). How were digital libraries evaluated?
Presentation at the course and conference Libraries in the Digital
Age (LIDA), 30 May-3 June 2005, Dubrovnik, Croatia. Available:
http://www.scils.rutgers.edu/~tefko/DL_evaluation_LIDA.pdf
Webber, S. (2003). Information Science in 2003: A Critique. Journal of
Information Science, 29 (4), 311-330.
White, H. & McCain, K. (1998). Visualizing a Discipline: An Author
Co-citation Analysis of Information Science 1972-1995. Journal of
the American Society for Information Science, 49 (4), 327-355.