clustering04 - UC Berkeley School of Information

The Failure of Clustering in Search
Interfaces …
or
When/How/Why Clustering can be
Successful in Search Interfaces
Marti Hearst
UC Berkeley
Oct 6, 2004
http://www.sims.berkeley.edu/~hearst
1
Main Points
• Grouping search results is desirable
• However, getting good groups is difficult
• Furthermore, incorporation of groups into
interfaces has not been done well
• Good news: improvements are happening
2
Talk Outline
• Why search interfaces are difficult to define
• Definition of categories and clusters
• Studies showing failure of clustering in
interfaces
• A new development in clustering in web
search
• How to remedy these problems
3
Clustering Interface Problems
• Big problem:
– Clusters used primarily as part of a visualization
• This just doesn’t work
– Every usability study says so
– Lots of dots scattered about the screen is meaningless to
users
– There is no inherent spatial relationship among the
documents
– Need text to understand content
• Another big problem:
– Clustering images according to an approximation of visual
similarity
• This just doesn’t work
– What limited studies have been done say so
– Instead: group according to textual categories
4
Search interfaces are
difficult to design
• Content and queries are hugely varying
– The scope of what people search for is all of human
knowledge and experience (!)
– Interfaces must accommodate human differences in
• Knowledge / life experience
• Cultural background and expectations
• Reading / scanning ability and style
• Methods of looking for things (pilers vs. filers)
5
Abstractions Are Difficult to
Represent
• Text describes abstract concepts
– Difficult to show the contents of text in a visual or
compact manner
• Exercise:
– How would you show the preamble of the US
Constitution visually?
– How would you show the contents of Joyce’s Ulysses
visually? How would you distinguish it from Homer’s
The Odyssey or McCourt’s Angela’s Ashes?
• The point: it is difficult to show text without
using text
6
Lack of Technical Understanding
• Most people don’t understand the underlying
methods by which search engines work.
– Without appropriate explanations, most of 14 people
had strong misconceptions about:
• ANDing vs ORing of search terms
– Some assumed ANDing search engine indexed a smaller
collection; most had no explanation at all
• For empty results for query “to be or not to be”
– 9 of 14 could not explain in a method that remotely
resembled stop word removal
• For term order variation “boat fire” vs. “fire boat”
– Only 5 out of 14 expected different results
Muramatsu & Pratt, “Transparent Queries: Investigating Users’
Mental Models of Search Engines,” SIGIR 2001.
7
Other Issues
• Vocabulary Disconnect
– If you ask a set of people to describe a set of things
there is little overlap in the results.
• If one person assigns a name, the probability of it NOT
matching with another person’s is about 80%
• It is difficult to represent content compactly
• Small details matter
• People are reluctant to change search
interfaces
Furnas, et al: The Vocabulary Problem in Human-System
Communication. Commun. ACM 30(11): 964-971 (1987)
8
The Need to Group
• Interviews with lay users often reveal a desire
for better organization of retrieval results
• Useful for suggesting where to look next
– People prefer links over generating search terms
– But only when the links are for what they want
• Two main approaches for text and images:
– Group items according to pre-defined categories
– Group items into automatically-created clusters
Ojakaar and Spool, Users Continue After Category Links, UIETips
Newsletter, http://world.std.com/~uieweb/Articles/, 2001
9
Categories
• Human-created
– But often automatically assigned to items
• Arranged in hierarchy, network, or facets
– Can assign multiple categories to items
– Or place items within categories
• Usually restricted to a fixed set
– So help reduce the space of concepts
• Intended to be readily understandable
– To those who know the underlying domain
– Provide a novice with a conceptual structure
• There are many already made up!
• However, until recently, their use in interfaces has been
– Under-investigated
– Not met their promise
10
Category System Examples
11
Category
System
Examples
12
Category
System
Examples
eat.epicurious.com
13
Category
System
Examples
eat.epicurious.com
14
Example of Faceted Metadata:
Medical Subject Headings (MeSH)
Facets
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]
15
Each Facet Has Hierarchy
1. Anatomy [A]
     Body Regions [A01]
     Musculoskeletal System [A02]
     Digestive System [A03]
     Respiratory System [A04]
     Urogenital System [A05]
     ……
2. [B]
3. [C]
4. [D]
5. [E]
6. [F]
7. [G]
8. Physical Sciences [H]
9. [I]
10. [J]
11. [K]
12. [L]
13. [M]
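One way to picture how a hierarchical facet can be used for grouping: the bracketed codes on the slide (A, A01, A02, ...) nest by prefix, so selecting a node in a facet can be treated as a prefix match on the codes assigned to an item. The following is a minimal sketch under that assumption. The item records and the helper name `items_under` are hypothetical and are not taken from MeSH or from the talk.

```python
# Hypothetical items, each assigned one or more category codes.
# Multiple codes per item reflect "can assign multiple categories to items".
items = {
    "doc1": ["A02", "C"],
    "doc2": ["A04"],
    "doc3": ["D", "A02"],
}

def items_under(code, assignments):
    """Return ids of items with at least one code at or below `code`.

    Assumes the hierarchy is encoded by prefix (A01 sits under A).
    """
    return sorted(
        item
        for item, codes in assignments.items()
        if any(c.startswith(code) for c in codes)
    )

# Whole Anatomy facet, then a narrower node within it.
anatomy = items_under("A", items)
musculoskeletal = items_under("A02", items)

# Combining two facets is an intersection of the matching item sets.
musculoskeletal_and_drugs = sorted(
    set(items_under("A02", items)) & set(items_under("D", items))
)
```

Because each facet is matched independently, narrowing within one facet never disturbs a selection made in another, which is the property that makes faceted categories convenient in an interface.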
16
Clustering
• “The art of finding groups in data”
– Kaufman and Rousseeuw
• Groups are formed according to associations
and commonalities among the data’s features.
– There are dozens of algorithms, more all the time
– Most need a way of determining similarity or difference
between a pair of items
– In text clustering, documents usually represented as
a vector of weighted features which are some
transformation on the words
– Similarity between documents is a weighted measure
of feature overlap
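The vector representation and overlap-based similarity described above can be sketched as follows. This is a minimal illustration, assuming raw word counts as the feature weights and cosine similarity as the overlap measure. The slides specify neither, and the function names are hypothetical.

```python
import math
from collections import Counter

def to_vector(text):
    """Represent a document as lowercase word counts (an assumed weighting)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Weighted feature overlap, normalized by the two vector lengths."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document overlaps with nothing
    return dot / (norm_a * norm_b)

doc1 = to_vector("star cluster in the galaxy")
doc2 = to_vector("galaxy star formation")
doc3 = to_vector("film star wins award")

# doc1 shares two words with doc2 but only one with doc3,
# so the astronomy pair scores higher than the mixed pair.
sim_astronomy = cosine_similarity(doc1, doc2)
sim_mixed = cosine_similarity(doc1, doc3)
```

In practice the weights would usually be some transformation of the counts rather than the counts themselves, as the slide notes.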
17
Clustering
• Potential benefits:
– Find the main themes in a set of documents
• Potentially useful if the user wants a summary of the
main themes in the subcollection
• Potentially harmful if the user is interested in less
dominant themes
– More flexible than pre-defined categories
• There may be important themes that have not been
anticipated
– Disambiguate ambiguous terms
• ACL
– Clustering retrieved documents tends to group those
relevant to a complex query together
Hearst, Pedersen, Revisiting the Cluster Hypothesis, SIGIR’96
18
Scatter/Gather Clustering
• Developed at PARC in the late 80’s/early 90’s
• Top-down approach
– Start with k seeds (documents) to represent k clusters
– Each document assigned to the cluster with the most similar
seeds
• To choose the seeds:
– Cluster in a bottom-up manner
– Hierarchical agglomerative clustering
• Start with n documents, compare all by pairwise similarity,
combine the two most similar documents to make a cluster
• Now compare both clusters and individual documents to find the
most similar pair to combine
• Continue until k clusters remain
• Use the centroid of each of these as seeds
– Centroid: average of the weighted vectors
• Can recluster a cluster to produce a hierarchy of clusters
Pedersen, Cutting, Karger, Tukey, Scatter/Gather: A Cluster-based
Approach to Browsing Large Document Collections, SIGIR 1992
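The seed-selection and assignment steps listed on this slide can be sketched roughly as below. This is an illustrative sketch, not the PARC implementation: it assumes word-count vectors, cosine similarity, and centroid-to-centroid comparison when merging clusters, none of which the slide specifies. All function names are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Centroid: average of the weighted vectors in a cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds weights term by term
    return {t: w / len(vectors) for t, w in total.items()}

def choose_seeds(vectors, k):
    """Bottom-up step: merge the most similar pair until k clusters remain."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]

def assign(vectors, seeds):
    """Top-down step: put each document with its most similar seed."""
    groups = [[] for _ in seeds]
    for idx, v in enumerate(vectors):
        nearest = max(range(len(seeds)), key=lambda s: cosine(v, seeds[s]))
        groups[nearest].append(idx)
    return groups

docs = [
    "star galaxy telescope",
    "galaxy star astronomy",
    "film star hollywood",
    "hollywood film review",
]
vectors = [Counter(d.split()) for d in docs]
seeds = choose_seeds(vectors, k=2)
groups = assign(vectors, seeds)  # lists of document indices per cluster
```

The all-pairs comparison in `choose_seeds` is quadratic in the number of clusters at every merge, so a sketch like this is only practical on a small set of documents. Reclustering a single group to build a hierarchy amounts to calling the same two functions on that group's vectors.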
19
(Diagram labels: query, Collection, Rank, Cluster)
The Scatter/Gather Interface
S/G Example: query on “star”
Encyclopedia text
8 symbols
68 film, tv (p)
97 astrophysics
67 astronomy(p)
10 flora/fauna
14 sports
47 film, tv
7 music
12 stellar phenomena
49 galaxies, stars
29 constellations
7 miscellaneous
Clustering and re-clustering is entirely automated
22
S/G Example: query on “star”
Newspaper/Magazine text
22 products / business
41 software / computers
58 restaurants / food (reviews)
98 movies / tv (reviews)
31 wall street / finance
35 hollywood
54 astronomers/movies
9 film mini-reviews
Topics quite different from encyclopedia text
25
Two Queries: Two Clusterings
AUTO, CAR, ELECTRIC
8 control drive accident …
25 battery california technology …
48 import j. rate honda toyota …
16 export international unit japan
3 service employee automatic …

AUTO, CAR, SAFETY
6 control inventory integrate …
10 investigation washington …
12 study fuel death bag air …
61 sale domestic truck import …
11 japan export defect unite …
The main differences are the clusters that are central to the query
26
Clustering Example:
Medical Text
• Query: “mastectomy” on a breast cancer collection
• 250 documents retrieved
• Summary of cluster themes (subjective):
– prophylactic mastectomy (preventative)
– prostheses and reconstruction
– conservative vs radical surgery
– side effects of surgery
– psychological effects of surgery
• The first two clusters found themes for which there was
no corresponding MeSH category
Hearst, The Use of Categories and Clusters for Organizing Retrieval
Results, in Natural Language Information Retrieval, Kluwer, 1999
27
A Clustering Failure
• Query: “implant” and “prosthesis”
• Four clusters returned:
– use of implants to administer radiation dosages
– complications resulting from breast implants
– other issues surrounding breast implants
– other kinds of prostheses
• Reclustering clusters 2 and 3 does not find cohesive
subgroups
– An examination of the documents indicates that a valid
subdivision was possible
• type of surgical procedure
• risk factors
– This seems to happen when there are too many features in
common
– Perhaps a better clustering algorithm can help in this case
28
Clustering Algorithm Problems
• Doesn’t work well if data is too homogeneous
or too heterogeneous
• Often is difficult to interpret quickly
– Automatically generated labels are unintuitive and
occur at different levels of description
• Often the top-level can be ok, but the
subsequent levels are very poor
• Need a better way to handle items that fall
into more than one cluster
29
Visualizing Clustering Results
• Use clustering to map the entire huge
multidimensional document space into a huge
number of small clusters.
• Use dimension reduction and then project
these onto a 2D/3D graphical representation
30
Clustering Multi-Dimensional
Document Space
(image from Wise et al 95)
31
Clustering Multi-Dimensional
Document Space
(image from Wise et al 95)
32
33
(from Chen et al., JASIS 49(7))
Kohonen Feature Maps on Text
Is it useful?
• 4 Clustering Visualization Usability Studies
34
Clustering for Search Study 1
• This study compared
– a system with 2D graphical clusters
– a system with 3D graphical clusters
– a system that shows textual clusters
• Novice users
• Only textual clusters were helpful (and they
were difficult to use well)
Kleiboemer, Lazear, and Pedersen. Tailoring a retrieval system for naive
users. SDAIR’96
35
Clustering Study 2:
Kohonen Feature Maps
• Comparison: Kohonen Map and Yahoo
• Task:
– “Window shop” for interesting home page
– Repeat with other interface
• Results:
– Starting with map could repeat in Yahoo (8/11)
– Starting with Yahoo unable to repeat in map (2/14)
Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept
Space Techniques. JASIS 49(7): 582-603 (1998)
36
37
(Lin 92, Chen et al.
97)
Kohonen Feature Maps
Study 2 (cont.)
• Participants liked:
– Correspondence of region size to # documents
– Overview (but also wanted zoom)
– Ease of jumping from one topic to another
– Multiple routes to topics
– Use of category and subcategory labels
Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept
Space Techniques. JASIS 49(7): 582-603 (1998)
38
Study 2 (cont.)
• Participants wanted:
– hierarchical organization
– other ordering of concepts (alphabetical)
– integration of browsing and search
– correspondence of color to meaning
– more meaningful labels
– labels at same level of abstraction
– fit more labels in the given space
– combined keyword and category search
– multiple category assignment (sports+entertain)
• (These can all be addressed with faceted hierarchical categories)
Chen, Houston, Sewell, Schatz, Internet Browsing and Searching: User Evaluations of Category Map and Concept
Space Techniques. JASIS 49(7): 582-603 (1998)
39
Clustering Study 3: NIRVE
Each rectangle is a cluster. Larger clusters closer to the “pole”. Similar clusters near one
another. Opening a cluster causes a projection that shows the titles.
40
Study 3
• This study compared:
– 3D graphical clusters
– 2D graphical clusters
– textual clusters
• 15 participants, between-subject design
• Tasks
– Locate a particular document
– Locate and mark a particular document
– Locate a previously marked document
– Locate all clusters that discuss some topic
– List more frequently represented topics
Visualization of search results: a comparative evaluation of text, 2D, and 3D interfaces
Sebrechts, Cugini, Laskowski, Vasilakis and Miller, SIGIR ‘99.
41
Study 3
• Results (time to locate targets)
– Text clusters fastest
– 2D next
– 3D last
– With practice (6 sessions) 2D neared text results; 3D still
slower
– Computer experts were just as fast with 3D
• Certain tasks equally fast with 2D & text
– Find particular cluster
– Find an already-marked document
• But anything involving text (e.g., find title) much faster
with text.
– Spatial location rotated, so users lost context
• Helpful viz features
– Color coding (helped text too)
– Relative vertical locations
Visualization of search results: a comparative evaluation of text, 2D, and 3D interfaces
Sebrechts, Cugini, Laskowski, Vasilakis and Miller, SIGIR ‘99.
42
Clustering Study 4
• Compared several factors
• Findings:
– Topic effects dominate (this is a common finding)
– Strong difference in results based on spatial ability
– No difference between librarians and other people
– No evidence of usefulness for the cluster visualization
Aspect windows, 3-D visualizations, and indirect comparisons of information retrieval systems,
Swan & Allan, SIGIR 1998.
43
Summary:
Visualizing for Search Using Clusters
• Huge 2D maps may be inappropriate focus for
information retrieval
– cannot see what the documents are about
– space is difficult to browse for IR purposes
– (tough to visualize abstract concepts)
• Perhaps more suited for pattern discovery and
gist-like overviews
44
How do people want to search
and browse images?
• Ethnographic studies of people who use
images intensely find:
– Finding specific objects is easy
• Find images of the Empire State Building
– Browsing is hard
• In a usability study with architects, to our
surprise we found their response to an image-browsing
interface mock-up was that they wanted
to see more text (categories).
Elliott, A. (2001). "Flamenco Image Browser: Using Metadata to Improve Image Search During Architectural
Design," in the Proceedings of CHI 2001.
45
Clustering in Image Search
• Using Visual “Content”
– Extract color, texture, shape
• QBIC (Flickner et al. ‘95)
• Blobworld (Carson et al. ‘99)
• Body Plans (Forsyth & Fleck ‘00)
• Piction: images + text (Srihari et al. ’91 ’99)
– Two uses:
• Show a clustered similarity space
• Show those images similar to a selected one
46
K. Rodden, Evaluating Similarity-Based Visualisations as Interfaces for Image Browsing, PhD thesis, 2001
K. Rodden, W. Basalaj, D. Sinclair, and K. Wood, Does Organisation by Similarity Assist Image Browsing?, CHI 2001
47
48
49
Image Clustering Study Results
• Searching was faster with the random
arrangement
• Preference for the clustered arrangement was
not overwhelmingly stronger than random
– 2 out of 10 participants preferred random and 3 had
no preference
– Median satisfaction for clustered was 4.5 and for
random was 4.0
K. Rodden, Evaluating Similarity-Based Visualisations as Interfaces for Image Browsing, PhD thesis, 2001
K. Rodden, W. Basalaj, D. Sinclair, and K. Wood, Does Organisation by Similarity Assist Image Browsing?, CHI 2001
50
An Alternative
• In the Flamenco project, we have shown that
hierarchical faceted metadata, paired with a
good interface, is highly effective for browsing
image collections
– Flamenco.berkeley.edu
• (But that’s a different talk)
51
Study 5: Comparing Textual Cluster
Interfaces to Category Interfaces
• DynaCat system
• Decide on important question types in
advance
– What are the adverse effects of drug D?
– What is the prognosis for treatment T?
• Make use of MeSH categories
• Retain only those types of categories known to
be useful for this type of query.
Pratt, W., Hearst, M, and Fagan, L. A Knowledge-Based Approach to Organizing Retrieved Documents. AAAI-99
52
DynaCat Interface
Pratt, W., Hearst, M, and Fagan, L. A Knowledge-Based Approach to Organizing Retrieved Documents. AAAI-99
53
DynaCat Study
• Design
– Three queries
– 24 cancer patients
– Compared three interfaces
• ranked list, clusters, categories
• Results
– Participants strongly preferred categories
– Participants found more answers using categories
– Participants took same amount of time with all three
interfaces
Pratt, W., Hearst, M, and Fagan, L. A Knowledge-Based Approach to Organizing
Retrieved Documents. AAAI-99
54
Study 6: Categories vs. Lists
• One study found users preferred one level of categories
over lists, and were faster at finding answers
– Only 13 top-level categories shown
– Secondary-level categories not very accurate
• However, the queries appeared to be somewhat set up to
optimize the usefulness of the clusters
– Example:
• Query word: “indian”
• Task: find indian motorcycles
• Query: “alaska”
• Task: find yachting adventures in alaska
Chen, Dumais, Bringing order to the web: Automatically categorizing search results.
CHI 2000
55
What about Textual Displays of
Clusters?
• Text-based clustering is more promising
• Text-based clustering on the Web
– In the early days, Excite had a mockup on about 10
documents that pretended to do Scatter/Gather (when
it was called Architext)
• Quickly removed it and started providing standard search
– For a while NorthernLight had a clustering interface
• Didn’t really get anywhere
– The latest entry is Vivisimo
• Has a lot of problems
• BUT … there’s a new development from Vivisimo called
Clusty
• Seems to have much improved clustering and interface
56
An Analysis of Vivisimo
• Query: barcelona
• Query: dog pregnancy
57
58
59
60
An Analysis of Vivisimo
• Query: barcelona
– Hotels and Travel Guide are both at top level
– Also, Barcelona City
– But Travel Guide contains
• Hotels
• Spain, Spanish
– Not really helping to make useful distinctions
61
62
63
An Analysis of Vivisimo
• Query: pregnant dog
– What does the category pregnant mean here?
– Why does it have a subcategory of whelping, when
there is also a main category of whelping?
– And what is the relationship to Pregnancy and Birth?
– The pages shown don’t seem strongly related to one
another
• How to follow up?
– There is a “find in clusters” box, but not very helpful
because no hints about which words might work
64
Search within Results
65
Then along came Clusty …
• Announced less than a week ago
• Produced by Vivisimo
• Much better interface
• Much better clusters
66
67
68
69
70
71
Clusty Improvements
• Labels tend to be more at the same level of description
• Subcategories are more cautious, reflecting groups of
very similar documents
– Do a better job of really showing subcategories
• Nice interface touches
– Better use of color for distinguishing
– Small icons are inviting
– Incorporation of encyclopedia results high up
• Search results are better
– (Not always – pregnant dog not much better)
– Using metasearch
– May be throwing out some docs to get more distribution in
the types of results found
– Looks like they are focusing on term proximity to get more
meaningful grouping
– Don’t allow very many results
72
73
74
75
Clusty Improvements
• Doing sense disambiguation for abbreviations like ACL
– However, no good follow-up for how to make use of this
– E.g., to search on ACL (meaning comp ling) plus some
other concepts
– On the other hand, using multiple terms is how most
disambiguation is done now
• ACL + disambiguation
• Jaguar + prey
– So not clear if there is a net benefit
• Trying to approximate faceted queries
– Under Jaguar query, for history, show both history of band
with history of car and video game
76
77
Analysis
• Is it really helping? Or are the categories now
too general and overlapping?
• The main effect seems to be that the search
results are better due to the metasearch and
term proximity
78
79
More Analysis
• Reflects the frequency of topics in the data
– So no discussion of nukes in the Spain categories
– No discussion of hotels in the North Korea categories
– Is this good or bad? It depends.
80
81
82
83
84
85
86
More analysis
• Adding a related term (Degas, Cezanne)
brings up relations between the two that don’t
appear with the general term Degas alone
– Impressionists
– Pissarro, in particular (should be under
impressionists)
• Also leads to messier results
87
Summary
• Grouping search results is desirable
– Often requested by lay users
– Very positive results for category interface
• However, getting good groups is difficult
– Two main approaches:
• Predefined category sets
• Automatically created clusters
• Furthermore, incorporation of groups into interfaces has not
been done well
– Notable Failures in Search Interfaces:
• Visualization of clusters
• Unintuitive clusters and labels
• Clustering of images according to visual attributes
• Poor incorporation of categories into search interfaces (not covered)
• Good news: improvements are happening
– Improved clustering that takes better account of good display
principles as seen in Clusty
– Flamenco: Flexible search and navigation via faceted category
hierarchies (not discussed here)
88
A Promising Direction:
Combining Categories and Clusters
• Mehran Sahami’s work on combining categories
and clusters
• Ray Larson’s work on clustering results of
categorization
• Would be interesting to cluster MeSH category
labels
– Work using UMLS to select subsets of MeSH has been
successful for analysis tasks
89
Conclusions
• In order to use clustering in an interface, must
pay attention to what makes the groupings
intuitive
• Much work has been too much of a “science
project”
• Up to now, clustering hasn’t succeeded on web
search results, but Clusty shows marked
improvements that are promising
90
Thank you!
Marti Hearst
www.sims.berkeley.edu/~hearst
91
More Recent Attempts
• Analyzing retrieval results
– KartOO: http://www.kartoo.com/
– Grokker: http://www.groxis.com/service/grok
92
93
94
95
96
References
Chen, Houston, Sewell, and Schatz, JASIS 49(7)
Chen and Yu, Empirical studies of information visualization: a meta-analysis,
IJHCS 53(5), 2000
Dumais, Cutrell, Cadiz, Jancke, Sarin and Robbins, Stuff I've Seen: A system
for personal information retrieval and re-use. SIGIR 2003.
Hearst, English, Sinha, Swearingen, Yee. Finding the Flow in Web Site Search,
CACM 45(9), 2002.
Hearst, User Interfaces and Visualization, Chapter 10 of Modern Information
Retrieval, Baeza-Yates and Ribeiro-Neto (Eds), Addison-Wesley 1999.
Johnson, Manning, Hagen, and Dorsey. Specialize Your Site's Search. Forrester
Research, (Dec. 2001), Cambridge, MA
97
References
Sebrechts, Cugini, Laskowski, Vasilakis and Miller, Visualization of search
results: a comparative evaluation of text, 2D, and 3D interfaces, SIGIR ‘99.
Swan and Allan, Aspect windows, 3-D visualizations, and indirect comparisons
of information retrieval systems, SIGIR 1998.
Yee, Swearingen, Li, Hearst, Faceted Metadata for Image Search and Browsing,
Proceedings of CHI 2003
98