Interfaces for Information Retrieval
Ray Larson & Warren Sack
IS202: Information Organization and Retrieval
Fall 2001
UC Berkeley, SIMS
lecture authors: Marti Hearst, Ray Larson, Warren Sack
10/4/01
IS202: Information Organization & Retrieval
Today
• What is HCI?
• Interfaces for IR using the standard
model of IR
• Interfaces for IR using new models of IR
and/or different models of interaction
Human-Computer Interaction
(HCI)
• Human
– the end-user of a program
• Computer
– the machine the program runs on
• Interaction
– the user tells the computer what they want
– the computer communicates results
(slide adapted from James Landay)
IS202: Information Organization & Retrieval
What is HCI?
[Diagram (slide by James Landay) relating Design, Human, Technology, Task, and Organizational & Social Issues]
Shneiderman on HCI
• Well-designed interactive computer systems:
  – Promote positive feelings of success, competence, and mastery.
  – Allow users to concentrate on their work, rather than on the system.
Usability Design Goals
• Ease of learning
– faster the second time and so on...
• Recall
– remember how from one session to the next
• Productivity
– perform tasks quickly and efficiently
• Minimal error rates
– if they occur, good feedback so user can recover
• High user satisfaction
– confident of success
(slide by James Landay)
Who builds UIs?
• A team of specialists
– graphic designers
– interaction / interface designers
– technical writers
– marketers
– test engineers
– software engineers
(slide by James Landay)
How to Design and Build UIs
• Task analysis
• Rapid prototyping
• Evaluation
• Implementation
• Iterate at every stage!
[Cycle diagram: Design → Prototype → Evaluate]
(slide adapted from James Landay)
Task Analysis
• Observe existing work practices
• Create examples and scenarios of
actual use
• Try out new ideas before building
software
Task = Information Access
• The standard interaction model for
information access
– (1) start with an information need
– (2) select a system and collections to search on
– (3) formulate a query
– (4) send the query to the system
– (5) receive the results
– (6) scan, evaluate, and interpret the results
– (7) stop, or
– (8) reformulate the query and go to step 4
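The eight steps above can be sketched as a loop. This is a minimal sketch, not anything from the lecture: every name here (`information_access`, the `system`, `formulate`, and `evaluate` callables) is a hypothetical stand-in for user and system behavior.

```python
def information_access(need, system, formulate, evaluate, max_rounds=5):
    """Sketch of the standard interaction model for information access.

    need      -- the information need (step 1)
    system    -- chosen search system over some collections (step 2)
    formulate -- turns a need into a query (step 3)
    evaluate  -- scans/interprets results, returns (satisfied, new_query)
                 (steps 6-8)
    """
    query = formulate(need)                      # (3) formulate a query
    results = []
    for _ in range(max_rounds):
        results = system(query)                  # (4)-(5) send query, get results
        satisfied, query = evaluate(results, query)  # (6) evaluate, (8) reformulate
        if satisfied:
            break                                # (7) stop
    return results
```

A toy run: a substring-matching "system" over three documents, with an `evaluate` that reformulates a failed query once.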
HCI Interface questions using the
standard model of IR
• Where does a user start? Faced with a
large set of collections, how can a user
choose one to begin with?
• How will a user formulate a query?
• How will a user scan, evaluate, and
interpret the results?
• How can a user reformulate a query?
Interface design: Is it always
HCI or the highway?
• No, there are other ways to design interfaces,
including using methods from
– Art
– Architecture
– Sociology
– Anthropology
– Narrative theory
– Geography
Information Access: Is the
standard IR model always the
model?
• No, other models have been proposed and
explored including
– Berrypicking (Bates, 1989)
– Sensemaking (Russell et al., 1993)
– Orienteering (O'Day and Jeffries, 1993)
– Intermediaries (Maglio and Barrett, 1996)
– Social Navigation (Dourish and Chalmers, 1994)
– Agents (e.g., Maes, 1992)
– And don't forget experiments like Blair and Maron (1985)
IR+HCI
Question 1: Where does the user start?
Dialog box for choosing sources in the old Lexis-Nexis interface
Where does a user start?
• Supervised (Manual) Category Overviews
– Yahoo!
– HiBrowse
– MeSHBrowse
• Unsupervised (Automated) Groupings
– Clustering
– Kohonen Feature Maps
Incorporating Categories into
the Interface
• Yahoo is the standard method
• Problems:
– Hard to search, meant to be navigated.
– Only one category per document (usually)
More Complex Example:
MeSH and MedLine
• MeSH Category Hierarchy
  – Medical Subject Headings
  – ~18,000 labels
  – manually assigned
  – ~8 labels/article on average
  – avg depth: 4.5, max depth: 9
• Top-level categories: anatomy, animals, disease, drugs, diagnosis, related disc., psych, technology, biology, humanities, physics
MeSHBrowse (Korn & Shneiderman, 1995)
Only the relevant subset of the hierarchy is
shown at one time.
HiBrowse (Pollitt, 1997)
Browsing several different subsets of category
metadata simultaneously.
Large Category Sets
• Problems for user interfaces:
  – Too many categories to browse
  – Too many docs per category
  – Docs belong to multiple categories
  – Need to integrate search
  – Need to show the documents
Text Clustering
• Finds overall similarities among groups
of documents
• Finds overall similarities among groups
of tokens
• Picks out some themes, ignores others
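A minimal sketch of how a text-clustering pass can find overall similarities among documents: term-frequency vectors compared by cosine similarity, grouped by a greedy single-pass rule. This stands in for the k-means-style algorithms used in real systems; the function names and the threshold value are illustrative, not from the lecture.

```python
from collections import Counter
from math import sqrt

def tf_vector(text):
    # Bag-of-words term-frequency vector for one document.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering: each document joins the first
    existing cluster whose representative vector is similar enough,
    otherwise it starts a new cluster."""
    clusters = []  # list of (representative_vector, [doc indices])
    for i, d in enumerate(docs):
        v = tf_vector(d)
        for rep, members in clusters:
            if cosine(v, rep) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((v, [i]))
    return [members for _, members in clusters]
```

On a toy collection, documents sharing vocabulary fall into the same group while unrelated "themes" separate, which is the behavior the slide describes.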
Scatter/Gather
(Cutting, Pedersen, Tukey & Karger '92, '93; Hearst & Pedersen '95)
• How it works
– Cluster sets of documents into general “themes”, like a
table of contents
– Display the contents of the clusters by showing topical
terms and typical titles
– User chooses subsets of the clusters and re-clusters
the documents within
– Resulting new groups have different “themes”
• Originally used to give collection overview
• Evidence suggests more appropriate for
displaying retrieval results in context
Another use of clustering
• Use clustering to map the entire huge
multidimensional document space into a
huge number of small clusters.
• “Project” these onto a 2D graphical
representation
– Group by doc: SPIRE/Kohonen maps
– Group by words: Galaxy of
News/HotSauce/Semio
Clustering Multi-Dimensional
Document Space
(image from Wise et al 95)
Kohonen Feature Maps on Text
(from Chen et al., JASIS 49(7))
Summary: Clustering
• Advantages:
– Get an overview of main themes
– Domain independent
• Disadvantages:
– Many of the ways documents could group together
are not shown
– Not always easy to understand what they mean
– Different levels of granularity
IR+HCI
Question 2: How will a user formulate a query?
Query Specification
• Interaction Styles (Shneiderman 97)
– Command Language
– Form Fill
– Menu Selection
– Direct Manipulation
– Natural Language
• What about gesture, eye-tracking, or
implicit inputs like reading habits?
Command-Based Query
Specification
• command attribute value connector …
– find pa shneiderman and tw user#
• What are the attribute names?
• What are the command names?
• What are allowable values?
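The example command above follows a `command attribute value [and attribute value]` shape. A toy parser for that shape is sketched below; it is illustrative only. The attribute codes (`pa`, `tw`) are system-specific, and the `#` truncation operator is left uninterpreted here, exactly as the slide's open questions suggest.

```python
import re

def parse_command(line):
    """Toy parser for command-language queries such as
    'find pa shneiderman and tw user#'.
    Returns (command, [(attribute, value), ...])."""
    m = re.match(r"(\w+)\s+(.*)", line.strip())
    command, rest = m.group(1), m.group(2)
    clauses = []
    for part in re.split(r"\s+and\s+", rest):  # 'and' as the connector
        attr, value = part.split(None, 1)      # first token = attribute code
        clauses.append((attr, value))
    return command, clauses
```

Running it on the slide's example pulls apart the command name, attribute codes, and values, which makes visible exactly what a user must already know to type the query.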
Form-Based Query Specification
(AltaVista)
Form-Based Query Specification
(Melvyl)
Form-based Query Specification
(Infoseek)
Menu-based Query Specification
(Young & Shneiderman 93)
IR+HCI
Question 3: How will a user scan,
evaluate, and interpret the results?
Display of Retrieval Results
Goal: minimize time/effort for deciding
which documents to examine in detail
Idea: show the roles of the query terms in
the retrieved documents, making use of
document structure
Putting Results in Context
• Interfaces should
– give hints about the roles terms play in the
collection
– give hints about what will happen if various
terms are combined
– show explicitly why documents are
retrieved in response to the query
– summarize compactly the subset of
interest
Putting Results in Context
• Visualizations of Query Term Distribution
– KWIC, TileBars, SeeSoft
• Visualizing Shared Subsets of Query Terms
– InfoCrystal, VIBE, Lattice Views
• Table of Contents as Context
– Superbook, Cha-Cha, DynaCat
• Organizing Results with Tables
– Envision, SenseMaker
• Using Hyperlinks
– WebCutter
KWIC (Keyword in Context)
• An old standard, ignored by internet search engines
– used in some intranet engines, e.g., Cha-Cha
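A KWIC display can be generated with a few lines of code. The sketch below is illustrative (the `kwic` function and its `width` parameter are assumptions, not any particular engine's implementation): it finds each occurrence of the keyword and keeps a window of surrounding text.

```python
def kwic(text, keyword, width=20):
    """Keyword-in-context: return one snippet per occurrence of
    `keyword`, with up to `width` characters of context per side.
    Matching is case-insensitive; the match is bracketed."""
    snippets = []
    low, kw = text.lower(), keyword.lower()
    start = 0
    while True:
        i = low.find(kw, start)
        if i < 0:
            break
        left = text[max(0, i - width):i]
        hit = text[i:i + len(kw)]
        right = text[i + len(kw):i + len(kw) + width]
        snippets.append(f"...{left}[{hit}]{right}...")
        start = i + len(kw)
    return snippets
```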
TileBars
Graphical Representation of Term
Distribution and Overlap
Simultaneously Indicate:
– relative document length
– query term frequencies
– query term distributions
– query term overlap
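The essence of a TileBar row is a term-by-segment count matrix: divide the document into tiles and count each query term per tile. A minimal sketch, assuming a `tilebar` function of my own naming (the real interface additionally renders counts as shaded squares):

```python
def tilebar(doc_words, query_terms, n_tiles=8):
    """Return one list of per-tile counts for each query term.
    Relative document length shows in how many tiles are non-empty;
    term frequency, distribution, and overlap show in the counts."""
    seg_len = max(1, len(doc_words) // n_tiles)
    bar = []
    for term in query_terms:
        counts = []
        for s in range(0, len(doc_words), seg_len):
            segment = doc_words[s:s + seg_len]
            counts.append(sum(1 for w in segment if w == term))
        bar.append(counts[:n_tiles])
    return bar
```

For a document that discusses "dbms" in its first half and "bank" in its second, the two rows light up in disjoint regions, which is exactly the pattern the TileBars example slides below are illustrating.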
TileBars Example
Query terms:
DBMS (Database Systems)
Reliability
What roles do they play in retrieved documents?
Mainly about both DBMS
& reliability
Mainly about DBMS, discusses
reliability
Mainly about, say, banking, with
a subtopic discussion on
DBMS/Reliability
Mainly about high-tech layoffs
SeeSoft: Showing Text Content using a linear
representation and brushing and linking (Eick &
Wills 95)
David Small:
Virtual Shakespeare
Other Approaches
Show how often each query term
occurs in retrieved documents
– VIBE (Korfhage ‘91)
– InfoCrystal (Spoerri ‘94)
VIBE (Olson et al. 93, Korfhage 93)
InfoCrystal (Spoerri 94)
Problems with InfoCrystal
– can’t see overlap of terms within docs
– quantities not represented graphically
– more than 4 terms hard to handle
– no help in selecting terms to begin with
Cha-Cha (Chen & Hearst 98)
• Shows “table-of-contents”-like view, like Superbook
• Takes advantage of human-created structure within
hyperlinks to create the TOC
IR+HCI
Question 4: How can a user reformulate a
query?
[Diagram of the standard IR pipeline: an information need is formulated as a query and parsed; collections are pre-processed as text input and indexed; documents are ranked against the query; query modification feeds back into a revised query]
Query Modification
• Problem: how to reformulate the query?
– Thesaurus expansion:
• Suggest terms similar to query terms
– Relevance feedback:
• Suggest terms (and documents) similar to retrieved
documents that have been judged to be relevant
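The slides do not commit to a particular reformulation formula; one standard relevance-feedback update is Rocchio's, sketched here over dict-based term vectors (the function name and default weights are illustrative choices, not from the lecture): move the query toward the centroid of relevant documents and away from the centroid of non-relevant ones.

```python
def rocchio(query_vec, relevant, nonrelevant,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio query reformulation. Vectors are dicts term -> weight.
    alpha keeps the original query, beta rewards terms from relevant
    docs, gamma penalizes terms from non-relevant docs."""
    terms = set(query_vec)
    for d in relevant + nonrelevant:
        terms |= set(d)
    new_q = {}
    for t in terms:
        w = alpha * query_vec.get(t, 0.0)
        if relevant:
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        new_q[t] = max(w, 0.0)  # negative weights are conventionally clipped
    return new_q
```

Note how the update both reweights existing query terms and introduces new terms from the judged-relevant documents, which is the "suggest terms similar to retrieved documents" behavior the slide describes.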
Using Relevance Feedback
• Known to improve results
– in TREC-like conditions (no user involved)
• What about with a user in the loop?
Terms available for relevance feedback made visible (from Koenemann & Belkin, 1996)
How much of the guts should
the user see?
• Opaque (black box)
  – (like web search engines)
• Transparent
  – (user sees available terms after the R.F.)
• Penetrable
  – (user sees suggested terms before the R.F.)
• Which do you think worked best?
Effectiveness Results
• Subjects with R.F. performed 17-34% better than subjects without R.F.
• Subjects in the penetrable case performed 15% better as a group than those in the opaque and transparent cases.
Summary: HCI Interface questions
using the standard model of IR
• Where does a user start? Faced with a
large set of collections, how can a user
choose one to begin with?
• How will a user formulate a query?
• How will a user scan, evaluate, and
interpret the results?
• How can a user reformulate a query?
Standard Model
• Assumptions:
– Maximizing precision and recall
simultaneously
– The information need remains static
– The value is in the resulting document set
Problem with Standard Model:
• Users learn during the search process:
– Scanning titles of retrieved documents
– Reading retrieved documents
– Viewing lists of related topics/thesaurus
terms
– Navigating hyperlinks
• Some users don’t like long disorganized
lists of documents
“Berrypicking” as an Information
Seeking Strategy (Bates 90)
• Standard IR model
– assumes the information need remains the
same throughout the search process
• Berrypicking model
– interesting information is scattered like
berries among bushes
– the query is continually shifting
– People are learning as they go
A sketch of a searcher… “moving through
many actions towards a general goal of
satisfactory completion of research related to an
information need.” (after Bates 89)
[Diagram: the searcher's path through a sequence of shifting queries, Q0 through Q5]
Implications
• Interfaces should make it easy to store
intermediate results
• Interfaces should make it easy to follow
trails with unanticipated results
Information Access: Is the
standard IR model always the
model?
• No, other models have been proposed and
explored including
– Berrypicking (Bates, 1989)
– Sensemaking (Russell et al., 1993)
– Orienteering (O'Day and Jeffries, 1993)
– Intermediaries (Maglio and Barrett, 1996)
– Social Navigation (Dourish and Chalmers, 1994)
– Agents (e.g., Maes, 1992)
– And don't forget experiments like Blair and Maron (1985)
Next Time
• Abbe Don, Guest speaker
– Information architecture and novel interfaces for
information access.
– See Apple Guides paper listed on IS202
assignments page, along with other readings
– Also, here is a request from Abbe:
• look at the following websites
– www.disney.com
– www.sony.com
– www.nickelodeon.com
• go at least "3 levels" deep to get a sense of how the sites
are organized.