Understanding Web Searching

WIRED Week 7
• Quick review of Information Seeking
• Readings Review
- Questions & Comments
- How does this affect IR system use?
- How would this change evaluating IR systems?
• Topic Discussions
• Web search lab game!
What Is Information Seeking?
• “a process in which humans purposefully
engage in order to change their state of
knowledge.” p. 5
• “a process driven by human’s need for
information so that they can interact with
the environment.” p. 28
• “begins with recognition and acceptance
of the problem and continues until the
problem is resolved or abandoned” p. 49
Marchionini
• more than just representation, storage and
systematic retrieval
Information Seeking in Context
• Learning
• Information Seeking
• Information Retrieval
• Analytical Strategy
• Browsing Strategy
How do we search?
• Analytical
- careful planning
- recall of query terms
- iterative query reformulations
- examination of results
- batched
• Browsing
- heuristic
- opportunistic
- recognizing relevant information
- interactive (as can be)
Iseek - WebTracker study
• Corporate IT and knowledge workers
- In work environment
- Own browser and network connection
• Long-term study (weeks)
• Overall Web use analyzed
• Bookmarks, printed pages
• How sites/pages found
• Frequency of page visits
Web Study Methodology
• Surveys
• Interviews
• Web Use Data*
- History Files
- WebTracker
- Server Logs
• Bookmarks*
• Printouts
Study Elements
- Research Design
• Field Work
• Field Workers
- Data Collection
1. Questionnaire survey
2. WebTracker application (and Proxy
Server)
3. Personal interviews
Collecting Web Client Data
• Modified client
- Pitkow and Catledge 1995
• Bookmarks
- Chosen Web sites are personal information space
- Most valuable data file on user’s system
- Automatically organizing bookmarks
• History logs
- The history mechanism
- Most promising source for usage data
WebTracker Expanded Window
WebTracker Log
User | Browser Action | URL Visited | Page Title | Date and Time
DT | STARTUP | | | 11/1/98 16:29:44
DT | LINK_TO | http://donturn.fis.utoronto.ca/test/index.html | Home | 11/1/98 16:34:59
DT | LINK_TO | http://donturn.fis.utoronto.ca/test/test1.html | Page 1 | 11/1/98 16:35:08
DT | Button back | http://donturn.fis.utoronto.ca/test/test1.html | Page 1 | 11/1/98 16:35:09
DT | LINK_TO | http://donturn.fis.utoronto.ca/test/index.html | Home | 11/1/98 16:35:17
DT | Key add bkmk | http://donturn.fis.utoronto.ca/test/index.html | Home | 11/1/98 16:35:17
DT | Key open page | http://donturn.fis.utoronto.ca/test/index.html | Home | 11/1/98 16:35:28
DT | LINK_TO | http://www8.org | WWW8 | 11/1/98 16:35:35
DT | Key back | http://www8.org | WWW8 | 11/1/98 16:35:41
DT | LINK_TO | http://donturn.fis.utoronto.ca/test/index.html | Home | 11/1/98 16:35:44
DT | Menu save as | http://donturn.fis.utoronto.ca/test/index.html | Home | 11/1/98 16:35:46
DT | LINK_TO | http://donturn.fis.utoronto.ca/test/printme.html | Print | 11/1/98 16:36:11
DT | Menu print | http://donturn.fis.utoronto.ca/test/printme.html | Print | 11/1/98 16:36:15
DT | Button back | http://donturn.fis.utoronto.ca/test/printme.html | Print | 11/1/98 16:36:46
DT | LINK_TO | http://donturn.fis.utoronto.ca/test/index.html | Home | 11/1/98 16:36:56
DT | Button reload | http://donturn.fis.utoronto.ca/test/index.html | Home | 11/1/98 16:36:59
DT | LINK_TO | http://donturn.fis.utoronto.ca/test/index.html | Home | 11/1/98 16:37:14
DT | SHUTDOWN | | | 11/1/98 16:41:11
Data Analysis
• Log files tabulated into spreadsheets
• Examined for clusters or patterns of
behavior
• Selection of episodes of Information
Seeking behavior
- a highlighting of the episode by the participant during the
personal interview;
- evidence of the episode having consumed a relatively substantial
amount of time and effort;
- evidence that the episode was a recurrent activity.
• Determined the modes of scanning &
moves exercised by the participants
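The tabulation step might look roughly like the Python sketch below. It is only an illustration, not the study's actual tooling: the sample rows are transcribed from the WebTracker Log slide (their column alignment is an assumption), the date format is assumed to be month/day/year, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Sample rows as (browser action, URL, timestamp), transcribed from the
# WebTracker Log slide. The column alignment is an assumption.
LOG_ROWS = [
    ("LINK_TO", "http://donturn.fis.utoronto.ca/test/test1.html", "11/1/98 16:35:08"),
    ("Button back", "http://donturn.fis.utoronto.ca/test/test1.html", "11/1/98 16:35:09"),
    ("LINK_TO", "http://donturn.fis.utoronto.ca/test/index.html", "11/1/98 16:35:17"),
    ("Key add bkmk", "http://donturn.fis.utoronto.ca/test/index.html", "11/1/98 16:35:17"),
    ("LINK_TO", "http://www8.org", "11/1/98 16:35:35"),
]


def parse_time(text):
    """Parse a log timestamp, assuming month/day/two-digit-year order."""
    return datetime.strptime(text, "%m/%d/%y %H:%M:%S")


def tabulate_actions(rows):
    """Count how often each browser action occurs in the log rows."""
    return Counter(action for action, _url, _timestamp in rows)


def seconds_between(rows):
    """Return the elapsed seconds between each pair of consecutive rows."""
    times = [parse_time(timestamp) for _action, _url, timestamp in rows]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


action_counts = tabulate_actions(LOG_ROWS)
gaps = seconds_between(LOG_ROWS)
```

Action counts and time gaps like these are the kind of tabulated values that could then be examined for clusters or patterns of behavior.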
Behavioral Model
• Recurring Web behavioral patterns
that relate people’s browser actions
(Web moves) to their
browsing/searching context (Web
modes)
• Modes of scanning: Aguilar (1967) &
Weick & Daft (1983, 1984)
• Moves in information seeking
behavior: Ellis (1989) & Ellis et al.
(1993, 1997)
Modes of Scanning
• Undirected Viewing
- Information need: general areas of interest; specific need to be revealed
- Information use: serendipitous discovery (“Sensing”)
- Amount of targeted effort: minimal
- Number of sources: many
- Tactics: scan broadly a diversity of sources, taking advantage of what’s easily accessible (“Touring”)
• Conditioned Viewing
- Information need: able to recognize topics of interest
- Information use: increase understanding (“Sensemaking”)
- Amount of targeted effort: low
- Number of sources: few
- Tactics: browse in pre-selected sources on pre-specified topics of interest (“Tracking”)
• Informal Search
- Information need: able to formulate queries
- Information use: increase knowledge within narrow limits (“Learning”)
- Amount of targeted effort: medium
- Number of sources: few
- Tactics: search is focused on an issue or event, but a good-enough search is satisfactory (“Satisficing”)
• Formal Search
- Information need: able to specify targets
- Information use: formal use of information for planning, acting (“Deciding”)
- Amount of targeted effort: high
- Number of sources: many
- Tactics: systematic gathering of information on a target, following some method or procedure (“Retrieving”)
Modes of Scanning for Information
Scanning Mode | Information Need | Information Seeking | Information Use
Undirected Viewing | General areas of interest | “Sweeping” | “Browsing”
Conditioned Viewing | Able to recognize topics of interest | “Discriminating” | “Learning”
Informal Search | Able to formulate simple queries | “Satisficing” | “Selecting”
Formal Search | Able to specify targets in detail | “Optimizing” | “Retrieving”
ISeek Behaviors & Web Moves
Modes & Moves Model
• Undirected Viewing
- Starting: identifying, selecting starting pages, sites
- Chaining: following links on initial pages
• Conditioned Viewing
- Browsing: browsing entry pages, headings, site maps
- Differentiating: bookmarking, printing, copying; going directly to known site
- Monitoring: revisiting ‘favorite’ or bookmarked sites for new information
• Informal Search
- Differentiating: bookmarking, printing, copying; going directly to known site
- Monitoring: revisiting ‘favorite’ or bookmarked sites for new information
- Extracting: using (local) search engines to extract information
• Formal Search
- Monitoring: revisiting ‘favorite’ or bookmarked sites for new info
- Extracting: using search engines to extract information
Behavioral Model Verification
• 61 identifiable episodes
- Undirected Viewing: 12 Episodes
- Conditioned Viewing: 18 Episodes
- Informal Search: 23 Episodes
- Formal Search: 8 Episodes
• Moves: Starting, Chaining, Browsing, Differentiating, Monitoring, Extracting
Behavioral Model Results
• People who use the Web engage in 4
complementary modes of information
seeking
• Certain browser based actions & events
indicate a particular mode of
information seeking
• Surprises
- No Explicit Instances of Monitoring to Support
Formal Searching
- Very Few Instances of “Push” Monitoring
- Extracting Involved Basic Search Strategies Only
Interview Highlights
• Most useful work-related sites:
1. Resource sites by associations & user groups
2. News sites
3. Company sites
4. Search engines
• Most people do not avidly search for new Web
sites
• Criteria for bookmarking are largely based on a site
providing relevant & up-to-date information
• Learning about new Web sites:
1. Search engines
2. Magazines & newsletters
3. Other people/colleagues
Survey Highlights
• The Web was the 3rd most frequently used
source
• Participants spent about 20% of their work
hours using the Web
• Majority looked for technical information on the
Web
• Quality of Web information was perceived to be
“very high” (reliable)
• The Web was perceived to be as accessible as other
“internal” sources, but less accessible
than mass media sources
• Few participants deliberately set out to search
for new sites
Study 1 Summary
• Behavioral model of information seeking on
the Web
• People who use the Web engage in
complementary modes of information seeking
• Certain browser based actions & events
indicate particular moves in information
seeking
• The study suggests:
- that a behavioral framework that relates user motivations and
Web moves may be helpful in analyzing Web-based
Information Seeking
- that multiple, complementary methods of collecting
qualitative and quantitative data may help compose a richer
portrayal of how individuals use Web-based information in
their natural work settings
Study Recommendations
Web Use Mode: Enhancing Web Use
• Undirected viewing: starting and chaining
- Introduce systems that search/recommend jump sites
- Encourage groups to share bookmarks, Web pages, URLs
- Design portals to support undirected, serendipitous viewing
• Conditioned viewing: browsing, differentiating, monitoring
- Train users to evaluate and escalate priority or importance of info
- Develop ways of sharing Web-based info via email / online forums
- Provide ways of telling users about new content on Web pages
• Informal search: differentiating, monitoring, extracting
- Pre-select sources & search engines for quick, informal searches
- Prepackage search strategies developed by subject matter experts
- Educate users on how to evaluate info provenance and quality
• Formal search: extracting
- Use multiple info sources for comprehensive searching
- Educate users about when to use different information services
- Train users on advanced search techniques
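The pairing of Web use modes with moves in this recommendations table can be written as a simple lookup. The Python sketch below is an illustration under assumptions, not the study's coding procedure: the dictionary copies the mode and move names from the table, and `candidate_modes` is a hypothetical helper that lists the modes whose moves cover everything observed in an episode.

```python
# Mode-to-moves pairing as listed in the Study Recommendations table.
MODE_MOVES = {
    "undirected viewing": {"starting", "chaining"},
    "conditioned viewing": {"browsing", "differentiating", "monitoring"},
    "informal search": {"differentiating", "monitoring", "extracting"},
    "formal search": {"extracting"},
}


def candidate_modes(observed_moves):
    """Return the modes whose move set contains every observed move.

    Hypothetical helper: the result only narrows down the possible modes,
    since several modes share moves such as monitoring and extracting.
    """
    observed = {move.lower() for move in observed_moves}
    if not observed:
        return []
    return [mode for mode, moves in MODE_MOVES.items() if observed <= moves]


undirected = candidate_modes(["Starting", "Chaining"])
ambiguous = candidate_modes(["Extracting"])
```

Because moves overlap across modes, a lookup like this can only shortlist modes. In the study, interviews and other context were used alongside the logged moves.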
Iseek Expanded Study (2)
• Larger Dataset
• One Organization
• Longer Duration
• Open-ended Interviews
• IT Survey
• More Quantitative Modeling
- Glassman (1994);
- Catledge & Pitkow (1995);
- Tauscher & Greenberg (1997a, 1997b);
- Huberman, Pirolli, Pitkow, & Lukose (1998)
New Types Data Collection
• Sources
- Modified Logs
- Interviews (More Focused)
- Survey (Broader Focus)
- Field Observation (Cube Work)
• Volume
- Over 1400 Consistent Users
- Over a Month of Web Use
- 8+ GB of data
Collecting Web Server Data
- Web Server Log Accuracy
• Hit - a single file is requested from the Web server
• View - all of the information contained on a single Web page
• Visit - one series of views at a particular Web site
- Proxy Server Logs
• Day sampling - stop caching and analyze data
• IP sampling - cancel caching for particular Web users and measure only their results
• Continuous sampling - use cookie files to track a particular user(s)
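The view and visit definitions can be illustrated with a short Python sketch. It rests on a simplifying assumption: a visit is taken to be a run of consecutive page views on the same host, with no time limit. The function name `group_views_into_visits` is hypothetical, and the URLs are the test addresses from the WebTracker example.

```python
from itertools import groupby
from urllib.parse import urlsplit


def group_views_into_visits(view_urls):
    """Group consecutive page views by host.

    Each run of views on one host is treated as one visit (an assumption;
    real log analysis usually also applies an inactivity timeout).
    Returns a list of (host, number_of_views) pairs in visit order.
    """
    hosts = [urlsplit(url).netloc for url in view_urls]
    return [(host, len(list(run))) for host, run in groupby(hosts)]


views = [
    "http://donturn.fis.utoronto.ca/test/index.html",
    "http://donturn.fis.utoronto.ca/test/test1.html",
    "http://www8.org",
    "http://donturn.fis.utoronto.ca/test/index.html",
]
visits = group_views_into_visits(views)
```

Hits are not modeled here. A single view may involve several hits, because a page and each file embedded in it are requested separately.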
KDD
Survey Highlights
• Users not motivated to change/update
browser versions or startup page
• IT made no modifications of browser
until recently, primarily for system
access testing
• Most of the most frequent users were from
technical departments
• All IT system work now Web-specific
Interview Highlights
• Corporate adoption of Internet access driven
by Intranet development
• Local portrayals of successful Web work drove
rapid adoption
• Use of Intranet viewed as both resource
conservation and expanded work
• Logging of Web use data not a high concern
• Open to recommendations to improve Web use
• “Webify”ing Everything seen as good
KDD Highlights
• Extremely High Data Collection Reliability
• Tightly-focused Web Use (business sites)
• Very Small (Determinable) Inappropriate Use
( >.001%)
• Lower than Expected Search Engine Use
- Influenced by Startup Page
- Internal Search Results Pages Used
• Higher than Expected (Average) Use of
Intranet
KDD Use Highlights
• 40,000+ episodes
• 11:15 average episode length
• Search term mode of 1
- Not dominantly work-related terms
- Use of intranet search results
influential
Updated Behavioral Model
• 32,512 identifiable episodes
- Episode counts in the model diagram: 3079, 1924, 5170, 13,992, 8347
• Modes: Undirected Viewing, Conditioned Viewing, Informal Search, Formal Search
• Moves: Starting, Chaining, Browsing, Differentiating, Monitoring, Extracting
Behaviors Breakdown
Other Studies
• Tend to focus on server logs, a broad
range of Web users, general Web seeking
activity, quantitative methods
- Glassman (1994): Proxy Study
- Catledge & Pitkow (1995): Surveys and Client
tool;
- Tauscher & Greenberg (1997a, 1997b): The
Back button;
- Ingwersen (1995 & 1997): Informetrics
- Huberman, Pirolli, Pitkow, & Lukose (1998):
Information Foraging, “Law of Surfing”
- Huberman “Laws of the Web” (2001)
Study 2 Summary
• Behavioral Model Scales Up
• Server Logs Provide Significant
Gains in Quantity
• Server Logs Provide Challenges in
Deriving Quality
• Organizations Provide Focused View
of Overall Web Use
• Knowledge Workers Collaborate (But
Not Enough)
Summary
• (New) Methodology
• Provide new ideas for data collection &
cleaning tools
• Verify models of Information Seeking
and Web Use
• Discover models of Web usage
• Find different types of Web users
• Gain rich descriptions of perception of
Web & Web use
• Evoke new system & interface designs
Other Tools for Web Studies
• Pete Pirolli, Rob Reeder, Ed Chi, et al. (UIR
Group Xerox PARC) Web Logger
• Eytan Adar, Bernardo Huberman (Web
Ecology Group @ PARC, now HP)
• Andy Edmonds – Uzilla.net
• Vividence
• Web Evaluation Tool (WET)
• Eye Tracking (*)
Improving Web Use
• Expert Systems - SNLP
• Multimedia Databases & Metadata
• Display Technology
• Better GUIs
• Better, More Available Search Engines/query
Syntax
- Desktop Search
- Ranking
- Relevance
• Help expert users get more expert
Web Activities Taxonomies
• What types of activities on the Web have impact?
• What we do vs. what seems significant
• Purpose of people’s search
- Find
• Get a fact or document
• Download information
• Find out about a product
- Compare/Choose: 51%
• Methods used to find information
- Explore, Monitor, Find, Collect: 71%
• Content for which they are searching
- Medical: 18%, People: 13%, …
Berrypicking & IR Flexibility
• IR systems are rational, users aren’t (always)
• We don’t search in a linear model
- Single query, one good result
• We gradually build on what we know, how we find it
- Footnote chasing (backward chaining)
- Citation searching (forward chaining)
- Journal run (favorite sites)
- Area scanning (browsing)
- Subject searches in bibliographies, abstracts & indices
- Author searching
• We combine all of these when searching
• Interface support for each & combinations
Berrypicking Paths
Web Search Studies Framework
• Web IR is still relatively new
- Differences in users & information
- Changes in IR systems are rapid
• Who doesn’t search now?
• “A Web searching study focuses on isolating searching characteristics
of searchers using a Web IR system via analysis of data, typically
gathered from transaction logs.” p 3
• Studying Search Engine use
- AltaVista, Excite
• Web Searching Studies
- Single & Multiple Web sites
Characterizing Browsing
• Modified XMosaic to learn Web browser
behavior
• Path lengths key (but changed)
• Types of users:
- Serendipitous browsers – little repetition, short
sequences
- General purpose browsers – average, repeated
actions
- Searchers – long navigational sequences
Cognitive Strategies in Web Search
• Systems help with:
- re-representation - different external representations
that have the same abstract structure make problem-solving
easier or more difficult. It also refers to how
different strategies and representations vary in
their efficiency for solving a problem.
- graphical constraining - constrain the kinds of
inferences that can be made about the underlying
represented concept.
- temporal and spatial constraining - different
representations make relevant aspects of processes
and events more salient when distributed over time
and space.
Cognitive Strategies
• Searching Conditions
- Dispersed or Category Structures
• Fact finding
• Exploratory searching
• Novice & experienced users
• Top-down, bottom-up & mixed
Reading Time, Scrolling & Interaction
• Can implicit feedback improve relevancy?
- 561 documents, 6 subjects
- Read documents & score them
• Better than reading, saving & printing?
- Measure use now vs. later
- Focused on document, not activity
• How do you know the user is reading?
• Is saving a relevance measure?
• No differences noted in scrolling (4.28)
• What about following links?
• Finding, highlighting, copying?
How do we really use the Web?
• People don’t read, they scan Web pages
• We move quickly, we know we can go back
• Quick experimentation & short memory
• Behaviors that work are reinforced &
continued
• Satisficing makes measures of quality difficult
• Web pages as Billboards?
• What’s billboard information for IR systems?
Revisitation Patterns on WWW
• Mostly Re-Visits (58%)
• Continually Visit New Pages
• Access Only A Few Pages Frequently
• Clusters (Sets) & Short Paths of URLs
- Frequency
- Recency
- “Distance”
• Types of Navigation
- Hub and Spoke
- Depth Searching (lots of links before returning, if at all)
- Guided Tour (Tasks)
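A revisit share such as the 58% figure can be computed from a sequence of visited URLs. The Python sketch below shows one common way to do it: the fraction of page visits that return to an already-seen URL. Whether this matches the exact measure used in the revisitation study is an assumption, and the function name and sample data are hypothetical.

```python
def recurrence_rate(visited_urls):
    """Fraction of visits that go to a URL already seen in the sequence.

    Computed as (total visits - distinct URLs) / total visits.
    Returns 0.0 for an empty history.
    """
    total = len(visited_urls)
    if total == 0:
        return 0.0
    return (total - len(set(visited_urls))) / total


# Hypothetical history: page "a" is visited three times.
history = ["a", "b", "a", "c", "a"]
rate = recurrence_rate(history)
```

With five visits to three distinct pages, two of the five visits are revisits, so the rate is 0.4.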
Revisitation Patterns 2
• Back Button Use Affects Everything (Even More Since
Study)
• Navigation Methods Differ
• Reasons for Revisiting
- Explore Further
- Use Feature (Search or Home Page)
- “On the Way” to another Page (IA Problem)
• Users Don’t Understand Browser History Very Well or
Do They Misunderstand Page/Site Navigation?
• Provide Navigation Support
• Work with the Back Button – Don’t Break its
Functionality
Web search lab game
• Break into groups
• Answer a set of questions
• Different rules for each search
1. Search as you would
2. Talk & decide before each move
3. No typing this time!
4. Search as you would again
5. Fast as possible