Information Seeking Behavior

LIS 510
Introduction
• Every day we are deluged by data
– It is received through our five senses, which are
continuously at work
• Wide variety of input sources
– Written material (hard copy and electronic)
– Auditory (speech, radio, CDs, etc.)
– Imagery (photographs, graphs, etc.)
– Video (TV, movies, etc.)
Information Overload
• “The greatest problem of today is how to
teach people to ignore the irrelevant, how to
refuse to know things, before they are
suffocated. For too many facts are as bad as
none at all.” (W.H. Auden)
Information Theory
• Claude Shannon, 1940s, studying communication
• Ways to measure information
– Communication: producing the same message at its destination as
that seen at its source
– Problem: a “noisy channel” can distort the message
• Between transmitter and receiver, the message must be
encoded
• Semantic aspects are irrelevant
[Diagram: Message source → Transmitter → Channel (subject to Noise) → Receiver → Destination]
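The noisy-channel idea can be sketched in a few lines of Python. This is a minimal illustration, not Shannon's own construction: the repetition code, the fixed flip positions, and all function names are assumptions chosen for clarity. It shows a transmitter encoding a message, a channel distorting it, and a receiver recovering it, plus one way to measure information (entropy in bits per symbol).

```python
from collections import Counter
from math import log2


def entropy_bits(message):
    """Average information per symbol, in bits (Shannon entropy)."""
    counts = Counter(message)
    total = len(message)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def encode(bits, repeat=3):
    """Transmitter: repeat every bit so the receiver can outvote noise."""
    return [b for b in bits for _ in range(repeat)]


def noisy_channel(signal, flipped_positions=()):
    """Channel: flip the bits at the given positions to imitate noise."""
    return [b ^ 1 if i in flipped_positions else b for i, b in enumerate(signal)]


def decode(signal, repeat=3):
    """Receiver: majority vote over each group of repeated bits."""
    groups = [signal[i:i + repeat] for i in range(0, len(signal), repeat)]
    return [1 if sum(g) > repeat // 2 else 0 for g in groups]


message = [1, 0, 1, 1]
received = noisy_channel(encode(message), flipped_positions={1, 9})
print(decode(received) == message)  # did the message survive the noise?
print(entropy_bits("aabb"))         # bits per symbol for a two-symbol message
```

Note that nothing in the code looks at what the message means, which is the sense in which semantic aspects are irrelevant to the theory.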
Information Theory
• Better called “Communication Theory”
• Communication may be over time and space
[Diagram, communication over space: Message source → Encoding → Channel → Decoding → Destination]
[Diagram, communication over time: Message source → Encoding (writing/indexing) → Storage → Decoding (retrieval/reading) → Destination]
What kinds of information are there?
• Text
– books, periodicals, WWW, memos, ads
– published/refereed
• Film
• Photos, other Images
• Broadcast TV, Radio
• Telephone Conversations
• Databases
How much information is there?
Gigabyte   10^9 bytes    1000 megabytes
Terabyte   10^12 bytes   1000 gigabytes
Petabyte   10^15 bytes   1000 terabytes
Exabyte    10^18 bytes   1000 petabytes
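Since the later slides switch freely between terabytes and petabytes, a small conversion helper may be useful. This is a sketch using the decimal (powers of 1000) units listed above; the names UNITS and convert are illustrative, and the megabyte entry is inferred from "1000 megabytes" in the gigabyte row.

```python
# Decimal (powers-of-1000) storage units, as listed on the slide.
UNITS = {
    "megabyte": 10**6,   # inferred: a gigabyte is 1000 megabytes
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount between two of the units above."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


print(convert(1, "terabyte", "gigabyte"))
print(convert(3000, "terabyte", "petabyte"))
```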
How Much Information?
• Stored Information
– Print
– Film
– Optical
– Magnetic
• Communicated
– Internet
– Broadcast
– Phone
– Mail
Print
• Annual Production
– Books 968,735 = 8 Terabytes (compressed image)
– Newspapers 22,643 = 25 Terabytes
– Journals 40,000 = 2 Terabytes
– Magazines 80,000 = 10 Terabytes
– Office Documents 12x10^9 pages = 312 Terabytes
– TOTAL 357 Terabytes (1824 scanned, 35 text)
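The 357-terabyte total can be checked by adding the component figures. The short sketch below does that and also shows how much of the total comes from office documents; the variable names are illustrative, and the numbers are copied from the slide.

```python
# Annual print production in terabytes, copied from the slide.
print_terabytes = {
    "books": 8,
    "newspapers": 25,
    "journals": 2,
    "magazines": 10,
    "office documents": 312,
}

total = sum(print_terabytes.values())
office_share = print_terabytes["office documents"] / total

print(total)                         # total terabytes per year
print(round(office_share * 100, 1))  # office documents as a percentage
```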
Print
• Library of Congress Printed book collection
– About 18 Million books
– About 130 Terabytes (compressed image)
– For all of LC we should also assume
• 13M photographs, 5MB each = 65 TB
• 4M maps, say 200 TB
• 500K files, 1GB each = 500 TB
• 3.5M sound recordings, ~2000 TB
• Grand total: 3 petabytes (~3000 terabytes)
• Books in Print
– 3.2 Million titles
– About 26 Terabytes
Film and Image
• Film
– Photographs = 410 Petabytes per year
– Movies = 16 Terabytes (Commercial
Production of about 4000 films)
– X-Rays = 12 Petabytes
Optical Media
• CD-Music 90,000 items = 58 TB
• CD-ROM 3,000 items = 3 TB
• DVD-Video 5,000 items = 22 TB
• Total 83 TB
Magnetic Media
• Audio Tape 184,200,000 = 184.2 Petabytes
• Video Tape 355,000,000 = 1420 Petabytes
• Floppy disks = 0.07 Petabytes
• Removable disks = 1.69 Petabytes
• Hard Disks = 500 Petabytes
Medium     Type of content      Terabytes/Year   Terabytes/Year
                                   Upper Bound      Lower Bound
Paper      Books                             8                7
           Newspapers                       25               20
           Periodicals                      12               12
           Office documents                312              312
           SUBTOTAL                        357              351
Film       Photographs                 410,000          100,000
           Cinema                           16               16
           X-Rays                       12,000           12,000
           SUBTOTAL                    422,000          112,016
Optical    Music CDs                        58               40
           Data CDs                          3                3
           DVDs                             22               22
           SUBTOTAL                         83               65
Magnetic   Camcorder                   300,000          300,000
           Disk drives               2,555,000        1,000,200
           SUBTOTAL                  2,855,000        1,300,200
TOTAL                                3,277,440        1,412,632
Current Size of Web
• There are an estimated 2.1 Billion pages on the
Web
– About 21 Terabytes
– About 7500 further Terabytes in web-accessed DBs.
• 610 Billion email messages per year = 11285 TB
• Internet traffic is doubling every 100 days
• An estimated 62 million Americans now use the Internet
• Radio took 38 years to get 50 M listeners,
TV took 13 years, the Net took 4 years...
Internet Hosts: 1989-2005
[Chart: Internet hosts per year, 1989 to 2005; vertical axis in hosts, 0 to 1,000,000]
Projected Voice and Data Traffic
[Chart: Voice and Data traffic per year, 1996 to 2002; vertical axis 0 to 30,000, units not given]
Language Distribution of Web Content
[Pie chart: share of Web content by language; the legend lists the languages tabulated below]
Language Distribution on a 634 Million Web Pages Corpus
Language            Number of Docs  Percentage
English                453,685,690    71.5288%
Japanese                43,271,080     6.8222%
German                  32,253,563     5.0851%
French                  11,107,994     1.7513%
Chinese                  9,642,450     1.5202%
Spanish                  6,965,560     1.0982%
Italian                  5,638,827     0.8890%
Swedish                  4,392,709     0.6926%
Malay                    3,619,227     0.5706%
Korean                   3,200,762     0.5046%
Portuguese               3,014,294     0.4752%
Dutch                    2,745,610     0.4329%
Danish                   1,911,677     0.3014%
Czech                    1,428,385     0.2252%
Finnish                  1,312,932     0.2070%
Russian                  1,150,127     0.1813%
Polish                     952,716     0.1502%
Hungarian                  760,162     0.1198%
Norwegian                  607,211     0.0957%
Estonian                   456,613     0.0720%
Greek                      393,360     0.0620%
Bulgarian                  392,777     0.0619%
Croatian                   310,237     0.0489%
Basque                     258,074     0.0407%
Thai                        99,691     0.0157%
Turkish                     81,218     0.0128%
Arabic                      38,167     0.0060%
Albanian                    17,779     0.0028%
Others & Unknown        44,561,062     7.0256%
Total                  634,269,953        100%
Human Memory
– Landauer 86: Human brain holds 200MB
• looked at rate of information intake and rate of
forgetting, and amount of information adults need
for normal tasks
– 6B people on earth implies total memory of all
people alive about 1,200 petabytes
– Another way:
• estimate that people take in a byte/sec
• lifetime of about 25,000 days, or 2B sec
• result is 2 GB (doesn’t count synthesizing new info)
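Both estimates are simple back-of-envelope products, sketched below. One assumption to flag: the lifetime is taken as roughly 25,000 days (about 68 years), which is the figure consistent with 2 billion seconds. The variable names and the decimal unit sizes are illustrative.

```python
BYTES_PER_MB = 10**6
BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15

# Landauer's per-person estimate scaled to the world population.
per_person_bytes = 200 * BYTES_PER_MB
population = 6 * 10**9
world_memory_pb = per_person_bytes * population / BYTES_PER_PB

# Intake-rate estimate: one byte per second over a lifetime.
lifetime_days = 25_000  # assumption: about 68 years
lifetime_seconds = lifetime_days * 24 * 60 * 60
lifetime_intake_gb = lifetime_seconds * 1 / BYTES_PER_GB

print(world_memory_pb)     # petabytes across all people
print(lifetime_seconds)    # roughly 2 billion
print(lifetime_intake_gb)  # roughly 2 GB per person
```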
Data and Information
• These two terms are quite often used
interchangeably
– used without any definitions or explanation
• There are no standard definitions for these
two terms
• Two possible definitions:
Data and Information (cont.)
• Data
– items such as text, facts, numbers, images or
sounds that may or may not be useful for a
particular purpose
• Information
– data which has been processed so that its form
and content are appropriate for a particular
purpose
Intuitive Notion
• Information must
– Be something, although the exact nature (substance,
energy, or abstract concept) is not clear;
– Be “new”: repetition of previously received messages is
not informative
– Be “true”: false or counterfactual information is “misinformation”
– Be “about” something
• This human-centered approach emphasizes
meaning and use of message
Knowledge
• Quite often the terms information and
knowledge are used interchangeably
• One possible definition of knowledge
– a combination of information, instincts, rules,
ideas, procedures and experience that guide
actions and decisions
Knowledge (cont.)
• Two types of knowledge
– Tacit
• also called implicit, private or personal knowledge
• knowledge held by an individual; may not have been
articulated or may not be articulatable
– For example, how does Michael Jordan accomplish his
“slam dunks”?
Knowledge (cont.)
– Explicit
• also called public or social knowledge
• expressed in a form that makes it available to others
– usually in a written form, but may be in other forms such
as verbal
Continuum
• Quite often data, information and
knowledge are expressed as a continuum:
• Data => Information => Knowledge
Pyramid
• Data, information and knowledge are also
depicted as a pyramid
– a distillation occurs as we move up the pyramid
• data is “raw material”
• as data is processed, information is distilled from it
and the resulting amount is smaller in size; the same
result is experienced in going from information to
knowledge
Wisdom
Long term goal should be the acquisition of
wisdom
– but there is not much discussion in the literature or
in the media
The current situation was aptly described by
T.S. Eliot:
– “Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in
information?”
Wisdom (cont.)
Wisdom connotes the ability to acquire and use
knowledge and information judiciously,
possessing the power of judging rightly and
following the soundest course of action based
on knowledge, skill, experience and
understanding.
Information Hierarchy
• Data
– The raw material of information
• Information
– Data organized and presented by someone
• Knowledge
– Information read, heard or seen and understood
• Wisdom
– Distilled and integrated knowledge and
understanding
What is Data?
• Represented by shapes or symbols that require
cognitive skill to decipher
• May not provide a context to fully understand its
meaning
e.g. 10,000,000; 5,000,000
What is Information?
• Involves process of reception, recognition and
conversion
• May involve a ‘novelty’ factor--a new piece of data
• May have multiple interpretations resulting in
‘public’ and ‘private’ information
e.g. Joe won $10,000,000 in the lottery last year and
$5,000,000 more this year.
What is Knowledge?
• Is created/acquired from a collection of information
• Knowledge builds on a foundation of accurate
information and can be passed on to others
e.g. Joe has been paying a lot of taxes because of his
lottery winnings and the brand new mansion he bought.
What is Wisdom/Insight?
• Represents highest level of complexity in chain of
concepts
• Difficult to impart via a storage medium
• Argued to exist only within an individual
e.g. He who has money has friends.
Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in
information?
-- T.S. Eliot, “The Rock”
Where is the information we have lost in data?
Whom Do People Ask for
Information?
• People immediately present
• People they know
• People they trust
– “Gatekeepers”
– People in authority generally
– People with cognitive authority
• Teachers
• Librarians
How Do People Ask for
Information?
• At the moment of need
• By the easiest available route
• By what they expect will give them the
most suitable answer
• By what they expect will give them the
most accessible answer
Information and People
• Information reinforces social bonds
• People exchange familiar information
• People continue to believe erroneous
information
• People say they value information (more
than they use it)
• People want a known available source
Limits to Information
• People do not want information that will upset
them
• People do not want information that might upset
them
• People do not want more information than they
can store
• People do not want more information than they
can process
• People must eventually stop getting information
and act on what they know
Dangers of Information
• Information might be erroneous
• Information might be deliberately misleading
• Information might be contradictory
• Information might be so excessive as to paralyze action
• Information may cost more than it’s worth
• Relying on authority may be better than information
• Possessing information may make one too conspicuous a
social figure
• Possessing information may make one a challenge to
authority
Storing Information
• People do not want more information than they
can store
• Immediate storage:
– Short and long term memory
– Active knowledge
• People need more information than they can store
immediately:
– “At hand”
– “In the library”
– “On the web”
Information Wants and Needs
• What people truly need
• What people recognize they need
• What people are willing to admit they need
• What people truly want now
• What people think they want now
• What people say they want now
The Standard Retrieval
Interaction Model
Standard Model Assumptions
• Maximizing precision and recall
simultaneously
• The information need remains static
• The value is the resulting document set
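Precision and recall have standard definitions, and a short sketch shows why maximizing both at once is hard: a narrow search tends to score well on precision, a broad one on recall. The document ids and the function name below are hypothetical, for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document ids, for illustration only.
relevant = {"d1", "d2", "d3", "d4"}
narrow_search = ["d1", "d2"]
broad_search = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]

print(precision_recall(narrow_search, relevant))  # precise but incomplete
print(precision_recall(broad_search, relevant))   # complete but noisy
```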
Problems with Standard Model
• Users learn during the search process
– Scanning titles of retrieved documents
– Reading retrieved documents
– Viewing lists of related topics
– Navigating hyperlinks
• Some users don’t like long disorganized
lists of documents
Berry-Picking as an Information
Seeking Strategy
• Standard IR model
– Assumes the information need remains the
same throughout the search process
• Berry-picking model
– Interesting information is scattered like berries
among bushes
– The query is continually shifting
Berry-Picking Model (cont.)
• The query is continually shifting
• New information may yield new ideas and
new directions
• The information need
– Is not satisfied by a single, final retrieved set
– Is satisfied by a series of selections and bits of
information found along the way
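The contrast between the two models can be made concrete with a toy sketch. Everything here is hypothetical: the four-document collection, the "suggests" field standing in for a new idea sparked by a document, and the function names. It is one possible reading of berry-picking, not a definitive formalization.

```python
# Hypothetical toy collection: each document has index terms and may
# suggest a new term that shifts the searcher's query.
DOCS = {
    "d1": {"terms": {"berries"}, "suggests": "foraging"},
    "d2": {"terms": {"foraging"}, "suggests": "browsing"},
    "d3": {"terms": {"browsing"}, "suggests": None},
    "d4": {"terms": {"berries", "jam"}, "suggests": None},
}


def search(term):
    """Standard model: one query, one final retrieved set."""
    return sorted(d for d, info in DOCS.items() if term in info["terms"])


def berry_pick(start_term, max_steps=5):
    """Berry-picking: follow a shifting query, keeping finds along the way."""
    basket, term, seen_terms = [], start_term, set()
    for _ in range(max_steps):
        if term is None or term in seen_terms:
            break
        seen_terms.add(term)
        next_term = None
        for doc in search(term):
            if doc not in basket:
                basket.append(doc)
            # A picked document may point the searcher somewhere new.
            next_term = next_term or DOCS[doc]["suggests"]
        term = next_term
    return basket


print(search("berries"))      # the single retrieved set
print(berry_pick("berries"))  # selections gathered as the query shifts
```

In this sketch the one-shot search never reaches the documents that only become relevant after the query has shifted.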
Systems View (cont.)
Data enter the system and are converted into
information through a process of formatting,
filtering and summarizing.
– knowledge is used to determine how to format, filter
and summarize data
Guided by knowledge, the resulting
information is interpreted
– this leads to decisions and actions
Systems View (cont.)
The actions generate results.
Comparison of actions and results helps
accumulate new knowledge
– this improves the process of interpreting
information, making decisions and taking new
actions.
The information search process
• the user’s constructive activity of finding
meaning from information in order to
extend his or her state of knowledge
• the process of sense-making within a
personal frame of reference
The user and information-seeking
behavior
• There is a long history of studying human
behavior in seeking and using information
• Systems-oriented studies and information-as-object-oriented systems
• User-oriented studies and user-oriented
systems
Systems orientation
• Information is viewed as:
– an external objective entity
– having a content-based reality
– existing independently of users or social
systems.
User-centered
• Information is viewed as:
– a subjective construction
– that is created internally in the minds of the users
• User orientation
– Users, looking for information to aid
problem solving and decision-making, have
inadequacies in their state of knowledge - gaps
or uncertainties
• sometimes they know what they need to find out;
sometimes they don’t
User-centered (cont.)
– Information systems should be designed to
assist users in discovering and representing
their knowledge of a problem situation
• User model
– A general user model of information seeking
behavior must encompass both the user and his
context, i.e., the information behavior of the
user and the environment in which this
behavior occurs.
Information Behavior
• Information needs
• Information seeking
• Information use
Dervin’s Sense-Making Model
• Dervin’s sense-making model focuses on
the user’s cognitive needs
– the user moves through space and time
– making sense of his/her actions, the
environment and the information system’s
inputs
• As long as everything is meaningful,
movement ahead is possible.
Dervin’s Sense-Making Model
• But, movement ahead may be blocked by
stops or cognitive gaps
• And user must define the nature of the gap
or the cause of the stop
• Based on user’s assessment, he selects
tactics and information to bridge the gap.
Kuhlthau’s model
• Distinguished stages in the information
search and use process, each stage
characterized by the user’s behavior in three
realms of experience
– the affective (feelings)
– the cognitive (thought)
– the physical (action)
Kuhlthau’s model (cont.)
• Six stages of the information search
process:
– initiation
– selection
– exploration
– formulation
– collection
– presentation
Communication between the user and the
information retrieval system
• each has its own language (concepts vs.
symbols)
• user must “translate” his information
need into one the information system
will understand OR
• the information system must interpret the
information need of the user and translate
the user’s request into one that the system
can process
How is this communication
accomplished?
• Different ways of searching
– controlled vocabulary
– natural language
Information ecology
• User behavior and user environments
– part of what Davenport calls the “information
ecology” of an organization
• internal environments and
• external environments
Information ecology (Davenport)
• Davenport views an information ecology as
encompassing six components:
– Information strategy
– Information politics
– Information behavior and culture
– Information staff
– Information processes (use)
– Information architecture
Information Systems
• An information system is a combination of
work practices, information, people, and
information technology organized to
accomplish goals in an organization
– goals are actually outside the information
system
Forms of Information Systems
• database systems
• information storage and retrieval systems
• transaction processing systems
• management information systems
• decision support systems
• knowledge management systems
Components of information
systems
• Work practices
• Information
• People
• Information technology
Information Life Cycle
• A useful way to envision information is in
terms of its life cycle
– the life cycle identifies the phases through
which information passes from creation to final
disposition
• Life cycle phases
– Creating (Authoring)
– Distribution (Networking)
Information Life Cycle (cont.)
• Life cycle phases (cont.)
– Use
– Organizing/Indexing
– Storing/Retrieving
– Accessing/Retrieving
– Reusing/Modifying
– Disposition