Measuring Information Architecture

Faceted Metadata in
Image Search & Browsing
Using Words to Browse a Thousand Images
Ka-Ping Yee, Kirsten Swearingen, Kevin Li, Marti Hearst
Group for User Interface Research
UC Berkeley
CHI 2003
Research funded by:
NSF CAREER Grant IIS-9984741
IBM Faculty Fellowship
Outline
• How do people search and browse for images?
• Current approaches:
– Keywords
– Spatial similarity
• Our approach:
– Hierarchical Faceted Metadata
– Very careful UI design and testing
• Usability Study
• Conclusions
M. Hearst
Faceted Metadata in Search
How do people want to search and
browse images?
Ethnographic studies of people who use
images intensely:
– Finding specific objects is easy
– Find images of the Empire State Building
– Browsing is difficult
– People want to use rich descriptions.
Ethnographic Study
• Markkula & Sormunen ’00
– Journalists and newspaper editors
– Choosing photos from a digital archive
• Searching for specific objects is trivial
• Stressed a need for browsing
• Photos need to deal with themes, places, types
of objects, views
– Had access to a powerful interface, but it
had 40 entry forms and was generally hard
to use; no one used it.
Markkula & Sormunen ’00
Query Study
• Armitage & Enser ’97
– Analyzed 1,749 queries submitted to 7
image and film archives
– Classified queries into a 3x4 facet matrix
• Rio Carnivals: Geo Location x Kind of Event
– Concluded that users want to search
images according to combinations of
topical categories.
Ethnographic Study
• Ame Elliot ’02
– Architects
• Common activities:
– Use images for inspiration
• Browsing during early stages of design
– Collage making, sketching, pinning up on walls
• This is different from illustrating a PowerPoint presentation
• Maintain sketchbooks & shoeboxes of images
– Young professionals have ~500, older ~5k
• No formal organization scheme
– None of 10 architects interviewed about their image
collections used indexes
• Do not like to use computers to find images
Current Approaches to Image Search
• Keyword based
– WebSeek (Smith and Jain ’97)
– Commercial web image search systems
– Commercial image vendors (Corbis, Getty)
– Museum web sites
Current Approaches to Image Search
• Using Visual “Content”
– Extract color, texture, shape
• QBIC (Flickner et al. ’95)
• Blobworld (Carson et al. ’99)
• Piction: images + text (Srihari et al. ’91, ’99)
– Two uses:
• Show a clustered similarity space
• Show those images similar to a selected one
– Usability studies:
• Rodden et al.: a series of studies
• Clusters don’t work; showing textual labels is promising.
Rodden et al., CHI 2001
How Best to Support Browsing?
• To support serendipity, want to view
images that are related along multiple
dimensions.
• But clusters are not comprehensible.
• Instead, allow users to “steer” through
the multi-dimensional category space in
a flexible manner.
Some Challenges
• Users don’t like new search interfaces.
• How to show lots more information
without overwhelming or confusing?
Our Approach
• Integrate the search seamlessly into the
information architecture.
– Use proper HCI methodologies.
• Use faceted metadata:
– More flexible than canned hyperlinks
– Less complex than full search
– Help users see where to go next and
return to what happened previously
Metadata: data about data
Facets: orthogonal categories
GeoRegion + Time/Date + Topic
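As a rough illustration of facets as orthogonal categories, the sketch below tags each image with one value per facet and selects images by combining constraints across facets. The `Image` class, the `select` function, the facet keys and the sample records are all hypothetical; none of them come from the talk or from the Flamenco code.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    title: str
    facets: dict = field(default_factory=dict)  # facet name -> category value


def select(images, **constraints):
    """Return the images whose metadata matches every given facet constraint."""
    return [
        img for img in images
        if all(img.facets.get(name) == value for name, value in constraints.items())
    ]


# Hypothetical records, for illustration only.
IMAGES = [
    Image("Carnival parade", {"geo_region": "Rio de Janeiro", "date": "1990s", "topic": "carnival"}),
    Image("Harbor view", {"geo_region": "Rio de Janeiro", "date": "1950s", "topic": "landscape"}),
    Image("Mardi Gras float", {"geo_region": "New Orleans", "date": "1990s", "topic": "carnival"}),
]

# Combine two independent facets: GeoRegion x Topic.
rio_carnivals = select(IMAGES, geo_region="Rio de Janeiro", topic="carnival")
print([img.title for img in rio_carnivals])
```

Because the facets are independent, any subset of them can be constrained, in any order, without changing the result.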
Hierarchical Faceted Metadata
Example: Biological Subject Headings
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]
Hierarchical Faceted Metadata
1. Anatomy [A]
   – Body Regions [A01]
      • Abdomen [A01.047]
      • Back [A01.176]
      • Breast [A01.236]
      • Extremities [A01.378]
      • Head [A01.456]
      • Neck [A01.598]
      • ….
   – Musculoskeletal System [A02]
   – Digestive System [A03]
   – Respiratory System [A04]
   – Urogenital System [A05]
   – ……
2. [B]
3. [C]
4. [D]
5. [E]
6. [F]
7. [G]
8. Physical Sciences [H]
   – Electronics
      • Amplifiers
      • Electronics, Medical
      • Transducers
   – Astronomy
   – Nature
   – Time
   – Weights and Measures
      • Calibration
      • Metric System
      • Reference Standard
   – ….
9. [I]
10. [J]
11. [K]
12. [L]
13. [M]
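The bracketed codes in the example encode each heading's position in its hierarchy: A01.047 sits under A01, which sits under A. The sketch below assumes that prefix convention and uses it to walk up the tree and to collect everything under a chosen category. The helper names (`parent`, `ancestors`, `under`) are hypothetical, and the table holds only codes that appear on the slides.

```python
# Codes and labels copied from the slide example; the helper functions are illustrative.
HEADINGS = {
    "A": "Anatomy",
    "A01": "Body Regions",
    "A01.047": "Abdomen",
    "A01.176": "Back",
    "A01.236": "Breast",
    "A01.378": "Extremities",
    "A01.456": "Head",
    "A01.598": "Neck",
    "A02": "Musculoskeletal System",
    "A03": "Digestive System",
}


def parent(code):
    """Return the code one level up, or None for a top-level facet."""
    if "." in code:
        return code.rsplit(".", 1)[0]   # A01.047 -> A01
    if len(code) > 1:
        return code[0]                  # A01 -> A
    return None                         # A has no parent


def ancestors(code):
    """List every category above `code`, nearest first."""
    chain = []
    current = parent(code)
    while current is not None:
        chain.append(current)
        current = parent(current)
    return chain


def under(codes, category):
    """Return the codes that fall anywhere below `category`."""
    return [c for c in codes if category in ancestors(c)]


print(ancestors("A01.047"))
print([HEADINGS[c] for c in under(HEADINGS, "A01")])
```

Selecting a category high in the tree therefore retrieves items labeled with any of its descendants, which is what lets a user start broad and narrow step by step.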
The Interface Design
• Chess metaphor
– Opening
– Middle game
– End game
The Interface Design
• Tightly Integrated Search
• Supports Expand as well as Refine
• Dynamically Generated Pages
– Paths can be taken in any order
• Consistent Color Coding
• Consistent Backup and Bookmarking
• Standard HTML
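The refine and expand operations, together with the query previews mentioned later in the talk, can be sketched as operations on a set of facet constraints. This is only a minimal sketch under assumed names (`matches`, `refine`, `expand`, `previews`) with made-up records; it is not the Flamenco implementation.

```python
from collections import Counter

# Hypothetical collection: each item carries one value per facet.
ITEMS = [
    {"media": "woodcut", "location": "US", "date": "1930s"},
    {"media": "woodcut", "location": "US", "date": "1940s"},
    {"media": "woodcut", "location": "Japan", "date": "1930s"},
    {"media": "etching", "location": "US", "date": "1930s"},
]


def matches(item, query):
    """True if the item satisfies every facet constraint in the query."""
    return all(item.get(facet) == value for facet, value in query.items())


def refine(query, facet, value):
    """Narrow the result set by adding a constraint."""
    return {**query, facet: value}


def expand(query, facet):
    """Broaden the result set by dropping the constraint on one facet."""
    return {f: v for f, v in query.items() if f != facet}


def previews(items, query, facet):
    """Count how many current results fall under each value of `facet`."""
    return Counter(item[facet] for item in items if matches(item, query))


query = refine({}, "media", "woodcut")
print(previews(ITEMS, query, "location"))   # counts shown next to each link
query = refine(query, "location", "US")
broader = expand(query, "media")            # all US items, any media
print(sum(matches(item, broader) for item in ITEMS))
```

Because the query is just a set of constraints, pages can be generated on demand for any combination, and constraints can be added or removed in any order.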
What is Tricky About This?
• It is easy to do it poorly
– Yahoo directory structure
• It is hard not to be overwhelming
– Most users prefer simplicity unless
complexity really makes a difference
• It is hard to “make it flow”
– Can it feel like “browsing the shelves”?
Project History
• Identify Target Population
– Architects, city planners
• Needs assessment.
– Interviewed architects and conducted contextual inquiries.
• Lo-fi prototyping.
– Showed paper prototype to 3 professional architects.
• Design / Study Round 1.
– Simple interactive version. Users liked metadata idea.
• Design / Study Round 2:
– Developed 4 different detailed versions; evaluated with 11
architects; results somewhat positive but many problems
identified. Matrix emerged as a good idea.
• Metadata revision.
– Compressed and simplified the metadata hierarchies
Project History
• Design / Study Round 3.
– New version based on results of Round 2
– Highly positive user response
• Identified new user population/collection
– Students and scholars of art history
– Fine arts images
• Study Round 4
– Compare the metadata system to a strong,
representative baseline
New Usability Study
• Participants & Collection
– 32 Art History Students
– ~35,000 images from SF Fine Arts Museum
• Study Design
– Within-subjects
• Each participant sees both interfaces
• Balanced in terms of order and tasks
– Participants assess each interface after use
– Afterwards they compare them directly
• Data recorded in behavior logs, server logs, paper surveys; one or two experienced testers at each trial.
• Used 9-point Likert scales.
• Session took about 1.5 hours; pay was $15/hour
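One way to balance a within-subjects design for order and tasks is to cross the interface order with the task-set pairing and cycle participants through the resulting conditions. The slides do not describe the actual assignment scheme, so the sketch below is an assumption throughout, including the task-set labels.

```python
from collections import Counter
from itertools import product

INTERFACES = ("Baseline", "FC")
TASK_SETS = ("task set A", "task set B")  # hypothetical labels


def conditions():
    """All (first session, second session) pairings of interface and task set."""
    result = []
    for first_ui, first_tasks in product(INTERFACES, TASK_SETS):
        second_ui = INTERFACES[1 - INTERFACES.index(first_ui)]
        second_tasks = TASK_SETS[1 - TASK_SETS.index(first_tasks)]
        result.append(((first_ui, first_tasks), (second_ui, second_tasks)))
    return result


def assign(n_participants):
    """Cycle through the conditions so each one is used equally often."""
    conds = conditions()
    return [conds[i % len(conds)] for i in range(n_participants)]


schedule = assign(32)
per_condition = Counter(schedule)
first_ui_counts = Counter(first[0] for first, _ in schedule)
print(per_condition)
print(first_ui_counts)
```

With 32 participants and four conditions, each condition is used eight times, so each interface is seen first by half of the participants and paired with each task set equally often.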
The Baseline System
• Floogle
• Take the best of the existing keyword-based image search systems
Comparison of Common Image Search Systems
System      Collection          # Results/page   Categories?   # Familiar
Google      Web                 20               No            27
AltaVista   Web                 15               No            8
Corbis      Photos              9-36             No            8
Getty       Photos, Art         12-90            Yes           6
MS Office   Photos, Clip art    6-100            Yes           N/A
Thinker     Fine arts images    10               Yes           4
BASELINE    Fine arts images    40               Yes           N/A
Evaluation Quandary
• How to assess the success of browsing?
– Timing is usually not a good indicator
– People often spend longer when browsing
is going well.
• Not the case for directed search
– Can look for comprehensiveness and
correctness (precision and recall) …
– … But subjective measures seem to be
most important here.
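For directed search, precision and recall have their standard definitions: the fraction of retrieved items that are relevant, and the fraction of relevant items that were retrieved. A minimal sketch, with a hypothetical function name and made-up item identifiers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up identifiers: 4 images retrieved, 3 relevant, 2 in common.
p, r = precision_recall(retrieved={1, 2, 3, 4}, relevant={2, 3, 5})
print(p, r)
```

Here precision is 2/4 and recall is 2/3. Neither number says whether the participant enjoyed or trusted the session, which is why the study also relies on subjective ratings.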
Hypotheses
• We attempted to design tasks to test the
following hypotheses:
– Participants will experience greater search
satisfaction, feel greater confidence in the results,
produce higher recall, and encounter fewer dead
ends using FC over Baseline
– FC will be perceived to be more useful and flexible
than Baseline
– Participants will feel more familiar with the
contents of the collection after using FC
– Participants will use FC to create multi-faceted
queries
Four Types of Tasks
– Unstructured (3): Search for images of interest
– Structured Task (11-14): Gather materials for an
art history essay on a given topic, e.g.
• Find all woodcuts created in the US
• Choose the decade with the most
• Select one of the artists in this period and show all of
their woodcuts
• Choose a subject depicted in these works and find
another artist who treated the same subject in a different
way.
– Structured Task (10): compare related images
• Find images by artists from 2 different countries that
depict conflict between groups.
– Unstructured (5): search for images of interest
Other Points
• Participants were NOT walked through the interfaces.
• The wording of Task 2 reflected the metadata; not the
case for Task 3
• Within tasks, queries were not different in difficulty
(t’s<1.7, p >0.05 according to post-task questions)
• Flamenco is an order of magnitude slower than
Floogle on average.
– In task 2 users were allowed 3 more minutes in FC than in
Baseline.
– Time spent in tasks 2 and 3 was significantly longer in FC
(about 2 min more).
Results
• Participants felt significantly more confident
they had found all relevant images using FC
(Task 2: t(62)=2.18, p<.05; Task 3: t(62)=2.03, p<.05)
• Participants felt significantly more satisfied
with the results
(Task 2: t(62)=3.78, p<.001; Task 3: t(62)=2.03, p<.05)
• Recall scores:
– Task 2a: In Baseline 57% of participants found all
relevant results, in FC 81% found all.
– Task 2b: In Baseline 21% found all relevant, in FC
77% found all.
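The slides do not name the statistical test behind the t(62) values. The reported 62 degrees of freedom are consistent with a pooled two-sample t-test over 32 ratings per interface (32 + 32 − 2 = 62), so the sketch below computes that statistic under this assumption. The ratings are invented purely to make the example run; they are not study data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Invented 9-point ratings for 32 participants per interface (not study data).
fc_ratings = [7, 8, 6, 9] * 8
baseline_ratings = [5, 6, 4, 7] * 8

t, df = pooled_t(fc_ratings, baseline_ratings)
print(f"t({df}) = {t:.2f}")
```

A paired test on the same 32 participants would instead have 31 degrees of freedom, which is one reason to read the reported values as coming from a two-sample comparison.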
Post-Interface Assessments
All significant at p<.05 except simple and overwhelming
Perceived Uses of Interfaces
[Bar chart: "What is interface useful for?" Ratings on a 0-9 scale for Baseline and FC across four uses: useful for my coursework; useful for exploring an unfamiliar collection; useful for finding a particular image; useful for seeing relationships b/w images. Bar values as extracted (bar assignment not preserved): 7.97, 7.91, 6.64, 6.44, 5.47, 6.16, 5.91, 4.91.]
Post-Test Comparison
Which Interface Preferable For:             Baseline   FC
Find images of roses                        15         16
Find all works from a given period          2          30
Find pictures by 2 artists in same media    1          29
Overall Assessment:
More useful for your tasks                  4          28
Easiest to use                              8          23
Most flexible                               6          24
More likely to result in dead ends          28         3
Helped you learn more                       1          31
Overall preference                          2          29
Facet Usage
• Facets driven largely by task content
– Multiple facets 45% of time in structured tasks
• For unstructured tasks,
– Artists (17%)
– Date (15%)
– Location (15%)
– Others ranged from 5-12%
– Multiple facets 19% of time
• From end game, expansion from
– Artists (39%)
– Media (29%)
– Shapes (19%)
Qualitative Observations
• Baseline:
– Simplicity, similarity to Google a plus
– Also noted the usefulness of the category links
• FC:
– Starting page “well-organized”, gave “ideas for what to search
for”
– Query previews were commented on explicitly by 9 participants
– Commented on matrix prompting where to go next
• 3 were confused about what the matrix shows
– Generally liked the grouping and organizing
– End game links seemed useful; 9 explicitly remarked positively
on the guidance provided there.
– Often get requests to use the system in future
Study Results Summary
• Strongly positive results for the faceted
metadata interface.
• Moderate use of multiple facets.
• Strong preference over the current state of
the art.
– Chair of Architecture Dept: “It felt like I was
browsing the shelves!”
– This kind of enthusiasm is not seen in similarity-based image search interfaces.
• Hypotheses are supported.
Implementation
• All open source code
– MySQL database
– Python web server (Webkit)
– Python code
– Lucene search engine (Java)
Metadata Availability
• Many collections already have rich
metadata associated with them.
• Automated methods are improving.
• This tool may be helpful for resolving
metadata creation wars.
Summary
• Usability studies done on 3 collections:
– Recipes: 13,000 items
– Architecture Images: 40,000 items
– Fine Arts Images: 35,000 items
• Conclusions:
– Users like and are successful with the dynamic
faceted hierarchical metadata, especially for
browsing tasks
– Very positive results, in contrast with studies on
earlier iterations
– Note: it seems you have to care about the
contents of the collection to like the interface
Other Domains
• Applying this to
– Text
• Tobacco Documents Archives
• Medline biomedical texts
– Products/Catalogs
• Don’t have a collection; would like one
Future Work
• What about information visualization?
• How to integrate with relevance
feedback (more like this)?
• How to incorporate user preferences
and past behavior?
• How to combine facets to reflect tasks?
Try the Demo:
flamenco.berkeley.edu
Thanks to:
Andrea Sahli
Rashmi Sinha
NSF CAREER Grant IIS-9984741
IBM Faculty Fellowship