Information Science: Where does it come from and where is it going?

Information Science:
Where does it come from
and where is it going?
Tefko Saracevic, PhD
School of Communication, Information and
Library Studies
Rutgers University
New Brunswick, New Jersey USA
http://www.scils.rutgers.edu/~tefko
Gutenberg
1397-1468
© Tefko Saracevic
Information science:
a short definition
“the collection, classification,
storage, retrieval, and
dissemination of recorded knowledge
treated both as a pure and as an
applied science”
Merriam-Webster
Organization of presentation
1. Big picture – problems, solutions, social place
2. Structure – main areas in research & practice
3. Technology – information retrieval – largest part
4. Information – representation; bibliometrics
5. People – users, use, seeking, context
6. Paradigm split – distancing of areas
7. Relations – librarianship, computer science
8. Digital libraries – whose are they anyhow?
9. Conclusions – big questions for the future
Part 1. The big picture
Problems addressed
Bit of history: Vannevar Bush
(1945):
Defined problem as “... the massive
task of making more accessible our
bewildering store of knowledge.”
Problem still with us & growing
Vannevar Bush, 1890-1974
… solution
Bush suggested a machine:
“Memex ... association of ideas ...
duplicate mental processes
artificially.”
Technological fix to problem
Still with us:
technological determinant
At the base of information
science:
Problem
Trying to control content in
 Information explosion
exponential growth of information
artifacts, if not of information itself
PLUS today
 Communication explosion
exponential growth of means and ways by
which information is communicated,
transmitted, accessed, used
technological solution, BUT …
applying technology
to solving
problems of
effective use of
information
BUT:
from a
HUMAN & SOCIAL
and not only
TECHNOLOGICAL
perspective
or a symbolic model
People
Information
Technology
Problems & solutions:
SOCIAL CONTEXT
 Professional practice AND scientific
inquiry related to:
Effective communication of knowledge
records - ‘literature’ - among humans in
the context of social, organizational, &
individual need for and use of
information
 Taking advantage of modern
information technology
or as White & McCain
(1998)
put it:
“modeling the
world of
publications
with a practical
goal of being
able to deliver
their content to
inquirers
[users] on
demand.”
General characteristics
 Interdisciplinarity - relations
with a number of fields, some more
or less predominant
 Technological imperative - driving
force, as in many modern fields
 Information society - social
context and role in evolution shared with many fields
Part 2. Structure
Composition of the field
 Like many fields, information science
has different areas of
concentration & specialization
 They change, evolve over time
grow closer, grow apart
ignore each other, less or more
sometimes fight
most importantly different areas…
 receive more or less in funding &
emphasis
producing great imbalances in work &
progress
attracting different audiences & fields
 this includes
vastly different levels of support for
research and
huge commercial investments &
applications
How to view structure?
by decomposing areas & efforts in
research & practice emphasizing
Technology
or
Information
or
People
Part 3.
Technology
 Identified with information
retrieval (IR)
by far biggest effort and investment
international & global
commercial interest large & growing
Information Retrieval –
definition & objective
“IR: ... intellectual aspects of
description of information, ... search,
... & systems, machines...”
Calvin Mooers (1919-1994), 1951
 How to provide users with relevant
information effectively?
For that objective:
1. How to organize information
intellectually?
2. How to specify the search &
interaction intellectually?
3. What techniques & systems
to use effectively?
Streams in IR Res. & Dev.
1. Information science:
 Services, users, use;
 Human-computer interaction;
 Cognitive aspects
2. Computer science:
 Algorithms, techniques
 Systems aspects; evaluation
3. Information industry:
 Products, services, Web
 search engines – BIG!
 Market aspects
Problem:
 relative isolation – discussed later
IR research
 Started in the US
through government
support & in
information science
 Now mostly done
within computer
science
 e.g. Special Interest
Group on IR,
Association for
Computing Machinery
(SIGIR, ACM)
Gerard Salton
1927-1995
Contemporary IR research
 Spread globally
e.g. major IR research communities
emerged in China, Korea, Singapore
 Branched outside of information
science - “everybody does information
retrieval”
search engines, data mining, natural
language processing, artificial
intelligence, computer graphics …
Testing in IR
 Major component of
IR made it strong &
affected innovation
 Long history –
started with
Cranfield tests in
late 1950’s
 Measures –
precision & recall
based on relevance
Cyril Cleverdon
1914-1997
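The two measures named above can be illustrated with a minimal sketch. It assumes binary relevance judgments and treats the retrieved and relevant documents as sets; the function name and the document identifiers are hypothetical, made up for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    Assumes binary relevance (an item is relevant or it is not).
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query result: 4 documents retrieved, 3 judged relevant overall.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Both measures depend entirely on the relevance judgments, which is why relevance is listed as their basis.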
Text REtrieval Conference (TREC)
 Major research, laboratory effort
 Started in 1992,
 “support research within the IR community by
providing the infrastructure necessary for large-scale evaluation”
 Methods
 provides large test beds, queries, relevance
judgments, comparative analyses
 essentially using Cranfield 1960’s methodology
 organized around tracks
 various topics – changing over years
TREC impact
 International – big impact on creating
research communities
 Annual conferences
 reports, exchange results, foster cooperation
 Results
 mostly in reports, available at
http://trec.nist.gov/pubs.html
 overviews provided as well
 but, only a fraction published in journals
 Book (2005):
TREC: Experiment and Evaluation in Information
Retrieval
Edited by Ellen M. Voorhees and Donna K. Harman
TREC tracks 2007
116 groups from 20 countries
 Genomics
 Spam
 Blog
 Question answering
 Enterprise
 Million query (new)
 Legal
 Previous tracks:
 ad-hoc (1992-1999)
 routing (1992-1997)
 interactive (1994-2002)
 filtering (1995-2002)
 cross language (1997-2002)
 speech (1997-2000)
 Spanish (1994-1996)
 video (2000-2001)
 Chinese (1996-1997)
 query (1998-2000)
 and a few more run for
two years only
Broadening of IR – sample
ever changing, ever new areas added
 Cross language IR (CLIR)
 Natural language processing (NLP IR)
 Music IR (MIR)
 Image, video, multimedia retrieval
 Spoken language retrieval
 IR for bioinformatics and genomics
 Summarization; text extraction
 Question answering
 Many human-computer interactions
 XML IR
 Web IR; Web search engines
 IR in context – big area for major
search engines & newer research
Commercial IR
 Search engines based on IR
 But added many elaborations &
significant innovations
dealing with HUGE number of pages fast
countering spamming & page rank games –
adversarial IR - combat of algorithms
adding context for searching
 Spread & impact worldwide
about 2000 engines in over 160 countries
English was dominant, but not any more
Commercial IR: brave new world
 Large investments & economic sector
hope for big profits, as yet questionable
 Leading to proprietary, secret IR
also aggressive hiring of best talent
new commercial research centers in
different countries (e.g. MS in China)
 Academic research funding is changing
brain drain from academe
 Commercial search engines facing many
challenges – hiring best talent
 and providing brain-drain for academics
IR successfully effected:
 Emergence & growth of the INFORMATION
INDUSTRY
 Evolution of IS as a PROFESSION &
SCIENCE
 Many APPLICATIONS in many fields
 including on the Web – search engines
 Improvements in HUMAN - COMPUTER
INTERACTION
 Evolution of INTERDISCIPLINARITY
IR has a long, proud history
Part 4.
Information
 Several areas of investigation;
as basic phenomenon – not much progress
measures such as Shannon's not successful
concentrated on manifestations and effects
no recent progress in this basic research
information representation
large area connected with IR, librarianship
metadata
bibliometrics
structures of literature
What is information?
Intuitively well understood, but
formally not well stated
Several viewpoints, models emerged
 Shannon: source-channel-destination
signals not content – not really
applicable, despite many tries
 Cognitive: changes in cognitive
structures
content processing & effects
 Social: context, situation
information seeking, tasks
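The "signals not content" point about Shannon's model can be made concrete with a small sketch. It assumes a message is treated as a sequence of characters and computes the standard Shannon entropy in bits per symbol; the function name and the example strings are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Shannon entropy of a message in bits per symbol.

    Only symbol frequencies enter the formula H = -sum(p * log2(p));
    the meaning of the message plays no part.
    """
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Two hypothetical messages built from exactly the same characters
# but with opposite meanings.
a = "dog bites man"
b = "man bites dog"
print(shannon_entropy(a), shannon_entropy(b))
```

The two messages get the same value (up to floating-point rounding) because the measure sees only the distribution of symbols, which is one way to read the slide's remark that it is not really applicable to content.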
Information in information science:
Three senses
(from narrowest to broadest)
1. Information in terms of decision involving
little or no cognitive processing
 signals, bits, straightforward data - e.g. inf.
theory (Shannon), economics
2. Information involving cognitive processing
& understanding
 understanding, matching texts, Brookes
3. Information also as related to context,
situation, problem-at-hand
 USERS, USE, TASK
For information science
(including information retrieval):
third, broadest interpretation necessary
Bibliometrics
“… the quantitative treatment of the properties of
recorded discourse and behavior pertaining to it.”
Fairthorne, 1969
 Many quantitative studies & some laws
 Bradford’s law, Lotka’s law – regularities
 quantity/yield distributions of journals, authors
 also related areas:
Scientometrics
covering science in general, not just
publications
Informetrics
all information objects
Webometrics or cybermetrics
using bibliometric techniques to study the web
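As an illustration of the kind of regularity meant by Lotka's law, here is a minimal sketch. It assumes the classical inverse-square form, in which the number of authors with n papers is roughly the number of single-paper authors divided by n squared; the exponent of 2 is only the textbook value (empirical studies report other values), and the function name and the figure of 100 authors are hypothetical.

```python
def lotka_expected_authors(single_paper_authors, n, exponent=2.0):
    """Expected number of authors with exactly n papers under Lotka's law.

    Assumes the form  authors(n) = authors(1) / n**exponent,
    with exponent = 2 as the classical value.
    """
    return single_paper_authors / (n ** exponent)


# Hypothetical field in which 100 authors published exactly one paper.
for n in range(1, 6):
    print(n, round(lotka_expected_authors(100, n), 1))
```

The steep drop-off is the point: a few authors account for a large share of the literature, and a similar concentration for journals is what Bradford's law describes.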
Part 5.
People
 Professional services
 in organization – moving toward knowledge
management, competitive intelligence
 in industry – vendors, aggregators, Internet,
 Research
 user & use studies
 interaction studies
 broadening to information seeking studies,
social context, collaboration
 relevance studies
 social informatics
User & use studies
 Oldest area
covers many topics, methods,
orientations
many studies related to IR
e.g. searching, multitasking, browsing,
navigation
 theoretical & experimental studies on
relevance
 Branching into Web use studies
quantitative & qualitative studies
emergence of webometrics
Interaction
 Traditional IR model concentrates
on matching but not on user side &
interaction
 Several interaction models
suggested
Ingwersen’s cognitive, Belkin’s episode,
Saracevic’s stratified model
hard to get experiments & confirmation
 Considered key to providing
basis for better design
understanding of use of systems
 Web interactions: a major new area
Information seeking
 Concentrates on broader context not only
IR or interaction, people as they move in
life & work
 Number of models provided
 e.g. Kuhlthau’s information search process,
Järvelin’s information seeking
 Includes studies of ‘life in the round,’
making sense, information encountering,
work life, information discovery
 Based on concept of social construction
of information
Part 6.
Paradigm split in
technology - people
 Split from early 80’s to date into:
System-centered
algorithms, TREC, search engines
continue traditional IR model
Human-(user)-centered
cognitive, situational, user studies
interaction models, some started in
TREC
relevance studies
Human vs. system
 Human (user) side:
 often highly critical, even one-sided
 mantra of implications for design
 but does not deliver concretely
 System side:
 mostly ignores user side & studies
 ‘tell us what to do & we will’
 Issue NOT H or S approach
 even less H vs. S
 but how can H AND S work together
 major challenge for the future
Great separation
 IR in computer
science
 completely technology
oriented
 VERY international
 not aware at all of
the other side
 SIGIR growing a lot:
 2007 subm. 490,
accept. 85, 17%
 2006 subm. 399,
accept. 74, 19%
 1999 subm. 135,
accept. 33, 24%
 IR, user studies,
services in
information science
 mostly people
oriented
 aware, but
participating less
with other side
 only a few LIS
people come to
SIGIR, even fewer
SIGIR to ASIST, none
to ALA
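The SIGIR acceptance percentages quoted above can be checked with a few lines of arithmetic. This is only a sketch: the submission and acceptance counts are the ones on the slide, while the function and variable names are hypothetical.

```python
def acceptance_rate(submitted, accepted):
    """Acceptance rate as a percentage of submitted papers."""
    return 100.0 * accepted / submitted


# (submitted, accepted) counts as given on the slide.
sigir = {2007: (490, 85), 2006: (399, 74), 1999: (135, 33)}

for year, (submitted, accepted) in sorted(sigir.items()):
    print(year, f"{acceptance_rate(submitted, accepted):.1f}%")
```

Rounded to whole percentages, the results agree with the slide's figures: submissions more than tripled between 1999 and 2007 while the acceptance rate fell.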
Calls vs support
 Many calls for user-centered or human-centered design, approaches & evaluation
 Number of works discussing it, but few
proposing concrete solutions
 But: most support for system work
 in the digital age support is for digital
 Recent attempt at combining two views:
Book: Ingwersen, P. and Järvelin, K. (2005). The
Turn: Integration of information seeking and retrieval
in context. Springer.
Part 7.
Relations,
alliances, competition
 With a number of fields...
 Strongest:
1. Librarianship
2. Computer science
Common grounds
IS & librarianship share:
 Social role in information
society
 Concern with effective
utilization of graphic & other
types of records
 Research problems related to a
number of topics
 Transfer to & from information
retrieval
Differences
IS & librarianship differ in:
 Selection & definition of many
problems addressed
 Theoretical questions & framework
 Nature & degree of
experimentation
 Tools and approaches used
 Nature & strength of
interdisciplinary relations
One field or two?
 Point of many debates
 Suggest: TWO fields in strong
interdisciplinary relations
 Not a matter of “better” or “worse” - that matters little
 common arguments between many fields
 Differences matter in:
 problem selection & definition
 agenda, paradigms
 theory, methodology
 practical solutions, systems
 Best example: IR & library automation
Which?
 Librarianship. Information science
 Library and information science
 Libraryandinformationscience
 Michael Buckland’s suggestion
 Information science
 Information sciences
 Information
like in the “Information School”
IS & computer science
 CS primarily about algorithms
 IS primarily about information and its
users and use
 Not in competition, but complementary
 Growing number of computer scientists
active in IS – particularly in IR and
digital libraries
 Concentrating on
 advanced IR algorithms & techniques
 digital library infrastructure & various
domains
 human computer interaction
Interaction and IS
 Two streams:
computer-human interaction
human-computer interaction
 Many studies on:
machine aspects of interaction
human variables in interaction
Problems: little feedback between the two streams;
very hard to evaluate
 Web interactions: a major area
 Another interdisciplinary area
computer sc., cognitive sc.,
ergonomics,
Part 8.
Digital libraries
 LARGE & growing area
 “Hot” area in R&D
a number of large grants & projects in
the US, European Union, & other
countries
but “DIGITAL” big & “libraries” small
 “Hot” area in practice
building digital collections, hybrid
libraries,
many projects throughout the world
but in the US funding drying out
Technical problems
 Substantial - larger & more complex
than anticipated:
 representing, storing & retrieving of
library objects
 particularly if originally designed to be
printed & then digitized
 operationally managing large collections - issues of scale
 dealing with diverse & distributed
collections
 interoperability; federated searching
 assuring preservation & persistence
 incorporating rights management
Research issues
 understanding objects in DL
representing in many formats
 metadata, cataloging, indexing
 conversion, digitization
 organizing large collections
 managing collections, scaling
 preservation, archiving
 interoperability, standardization
 accessing, using, searching
 federated searching of distributed collections
 evaluation of digital libraries
DL projects in practice
 Heavily oriented toward
institutions & their missions
in libraries, but also others
museums, societies, government, commercial
come in many varieties
 Spread globally
including digitization
 U California, Berkeley’s Libweb
“lists over 7700 pages from libraries in over
145 countries”
 Spending increasing significantly
often a trade-off for other resources
Connection?
 DL research & DL
practice presently are
conducted
 mostly independently
of each other
 minimally informing
each other
 and having slight, or
no connection
 Parallel universes with
little connections &
interaction, at present
 not good for either
research or practice
Part 9. Conclusions
IS contributions
 IS affected handling of information in
society
 Developed an organized body of
knowledge & professional competencies
 Applied interdisciplinarity
 IR reached a mature stage
 penetrated many fields & human activities
 Stressed HUMAN in human-computer
interaction
Challenges
 Adjust to the growing & changing social &
organizational role of inf. & related inf.
infrastructure
 Play a positive role in globalization of
information
 Respond to technological imperative in human
terms
 Respond to changes from inf. to
communication explosion - bringing own
experiences to resolutions, particularly to
the web
 Join competition with quality
 Join DIGITAL with LIBRARIES
Juncture
 IS is at a critical juncture in its
evolution
 Many fields, groups ... moving into
information
 big competition
 entrance of powerful players
 fight for stakes
 To be a major player IS needs to progress
in its:
 research & development
 professional competencies
 educational efforts
 interdisciplinary relations
 Reexamination necessary
Thank you Miró!
Thank you Picasso!
Thank you Javier
&
for inviting me!
Bibliography
Bates, M. J. (1999). Invisible Substrate of Information Science.
Journal of the American Society for Information Science, 50 (12), 1043-1050.
Bush, V. (1945). As We May Think. Atlantic Monthly, 176 (1), 101-108. Available:
http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
Hjørland, B. (2000). Library and Information Science: Practice,
Theory, and Philosophical Basis. Information Processing &
Management, 36 (3), 501-531.
Pettigrew, K. E. & McKechnie, L. E. F. (2001). The Use of Theory in
Information Science Research. Journal of the American Society for
Information Science and Technology, 52 (1), 62-73.
Saracevic, T. (1999). Information Science. Journal of the American
Society for Information Science, 50 (12), 1051-1063. Available:
http://www.scils.rutgers.edu/~tefko/JASIS1999.pdf
Saracevic, T. (2005). How Were Digital Libraries Evaluated?
Presentation at the course and conference Libraries in the Digital
Age (LIDA), 30 May-3 June 2005, Dubrovnik, Croatia. Available:
http://www.scils.rutgers.edu/~tefko/DL_evaluation_LIDA.pdf
Webber, S. (2003). Information Science in 2003: A Critique. Journal of
Information Science, 29 (4), 311-330.
White, H. and McCain, K. (1998). Visualizing a Discipline: An Author
Co-citation Analysis of Information Science 1972-1995. Journal of
the American Society for Information Science, 49 (4), 327-355.