Some Thoughts on Tagging Marti Hearst UC Berkeley

Some Thoughts on Tagging
Marti Hearst
UC Berkeley
Outline
 What are Tags?
 Organizing Tags for Navigation
 Facets and faceted navigation
 How to (semi)automatically create facet hierarchies
 What’s up with Tag Clouds?
Marti Hearst, MIT HCI ‘07
Social Tagging
 Metadata assignment without all the bother
 Spontaneous, easy, and tends towards single terms
 Usually used in the context of social media
The Tagging Opportunity
 At last! Content-oriented metadata in the large!
 Attempts at metadata standardization always end
up with something like the Dublin Core
 author, date, publisher, … yaaawwwwnnn.
 I’ve always thought the action was in the subject
metadata, and have focused on how to navigate
collections given such data.
The Tagging Opportunity
 Tags are inherently faceted !
 It is assumed that multiple labels will be assigned to
each item
 Rather than placing them into a folder
 Rather than placing them into a hierarchy
 Concepts are assigned from many different content
categories
 Helps alleviate the metadata wars:
 Allows for both splitters and lumpers
    Is this a bird or a robin?
    Doesn’t matter, you can do both!
 Allows for differing organizational views
    Does NASCAR go under sports or entertainment?
    Doesn’t matter, you can do both!
Tagging Problems
 Tags aren’t organized
 Thorough coverage isn’t controlled for
 The haphazard assignments lead to problems with
    Synonymy
    Homonymy
 See how this author attempts to compensate:
Tagging Problems / Opportunities
 Some tags are fleeting in meaning or too personal
 toread, todo
 Tags are not “professional”
 (I personally don’t think this matters)
 Great example from Trant:
 “Anecdotal evidence also shows that ‘professional’ cataloguers
find the basic description of visual elements surprisingly
difficult: a curator exhibited significant discomfort during this
description task. When asked what was wrong, he blurted out
‘everything I know isn’t in the picture’.”
“Investigating social tagging and folksonomy in the art museum with steve.museum”, J.
Trant, B. Wyman, WWW 2006 Collaborative Tagging Workshop
What about Browsing?
 I think tags need some organization
 Currently most tags are used as a direct index into items
 Click on tag, see items assigned to it, end of story
 Co-occurring tags are not shown
 Grouping into small hierarchies is not usually done
 del.icio.us now has bundles, but navigation isn’t good
 IBM’s dogear and RawSugar come the closest
 I think the solution is to organize tags into faceted
hierarchies and do browsing in the standard way
Faceted Navigation and Flamenco
The Problem With Hierarchy
 Most things can be classified in more than one way.
 Most organizational systems do not handle this well.
 Example: Animal Classification
[Figure: the same animals (robin, penguin, otter, seal, salmon, wolf, cobra, bat) grouped three different ways: by Skin Covering, by Locomotion, and by Diet]
The Problem with Hierarchy
 Inflexible
 Force the user to start with a particular category
 What if I don’t know the animal’s diet, but the
interface makes me start with that category?
 Wasteful
 Have to repeat combinations of categories
 Makes for extra clicking and extra coding
 Difficult to modify
 To add a new category type, must duplicate it
everywhere or change things everywhere
The Problem With Hierarchy
[Figure: a single hierarchy that starts with Locomotion (swim, fly, run, slither), repeats every Covering (fur, scales, feathers) under each locomotion type, and repeats every Diet (fish, rodents, insects, …) under each covering, ending in leaves such as otter, salmon, bat, robin, wolf]
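The repetition in such a hierarchy can be counted directly. The following is a minimal sketch, assuming only the three animal facets and the label values visible on this slide; the names FACETS, hierarchy_leaves and label_count are illustrative and not from the talk.

```python
from itertools import product
from math import factorial

# Facet labels as they appear on the slide (the diet list is truncated there).
FACETS = {
    "Locomotion": ["swim", "fly", "run", "slither"],
    "Covering": ["fur", "scales", "feathers"],
    "Diet": ["fish", "rodents", "insects"],
}

def hierarchy_leaves(facets):
    """All root-to-leaf paths of one hierarchy that nests the facets in dict order."""
    return list(product(*facets.values()))

def label_count(facets):
    """Number of labels needed when each facet is kept as its own flat list."""
    return sum(len(labels) for labels in facets.values())

leaf_paths = hierarchy_leaves(FACETS)
# Number of ways to choose which facet comes first, second, and third.
orderings = factorial(len(FACETS))

print(len(leaf_paths))      # 36 paths in a single fixed-order hierarchy
print(label_count(FACETS))  # 10 labels when the facets stay independent
print(orderings)            # 6 different hierarchies, one per facet ordering
```

A single hierarchy needs one path per combination and still commits to one starting facet, while independent facets need only the sum of their labels.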
The Idea of Facets
 Facets are a way of labeling data
 A kind of Metadata (data about data)
 Can be thought of as properties of items
 Facets vs. Categories
 Items are placed INTO a category system
 Multiple facet labels are ASSIGNED TO items
The Idea of Facets
 Create INDEPENDENT categories (facets)
 Each facet has labels (sometimes arranged in a hierarchy)
 Assign labels from the facets to every item
 Example: recipe collection
    Ingredient: Chicken, Bell Pepper
    Cooking Method: Stir-fry, Curry
    Course: Main Course
    Cuisine: Thai
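One way to picture "labels assigned to items" is a mapping from each item to its labels per facet. This is a minimal sketch with a made-up three-recipe collection; the recipe names, the extra labels (Bake, American), and the functions select and refinement_counts are assumptions for illustration, not part of Flamenco.

```python
from collections import Counter

# Hypothetical mini collection: every item carries labels from several
# independent facets instead of living in exactly one folder.
RECIPES = {
    "Thai chicken stir-fry": {
        "Ingredient": {"Chicken", "Bell Pepper"},
        "Cooking Method": {"Stir-fry"},
        "Course": {"Main Course"},
        "Cuisine": {"Thai"},
    },
    "Thai green curry": {
        "Ingredient": {"Chicken"},
        "Cooking Method": {"Curry"},
        "Course": {"Main Course"},
        "Cuisine": {"Thai"},
    },
    "Stuffed bell peppers": {
        "Ingredient": {"Bell Pepper"},
        "Cooking Method": {"Bake"},
        "Course": {"Main Course"},
        "Cuisine": {"American"},
    },
}

def select(items, constraints):
    """Names of the items that carry every (facet, label) pair in constraints."""
    return [
        name
        for name, facets in items.items()
        if all(label in facets.get(facet, set()) for facet, label in constraints)
    ]

def refinement_counts(items, constraints, facet):
    """For the current selection, count how many items carry each label of facet."""
    counts = Counter()
    for name in select(items, constraints):
        counts.update(items[name].get(facet, set()))
    return counts

thai_chicken = select(RECIPES, [("Cuisine", "Thai"), ("Ingredient", "Chicken")])
methods_for_thai = refinement_counts(RECIPES, [("Cuisine", "Thai")], "Cooking Method")
```

Constraints can be applied in any order, and the counts give the labels that are still available for narrowing the current selection.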
The Flamenco Interface
Fine Arts Museum Example
Advantages of the Approach
 Systematically integrates search results:
 reflect the structure of the info architecture
 retain the context of previous interactions
 Gives users control and flexibility
 Over order of metadata use
 Over when to navigate vs. when to search
 Allows integration with advanced methods
 Collaborative filtering, predicting users’
preferences
Advantages of Facets
 Can’t end up with empty result sets
 (except with keyword search)
 Helps avoid feelings of being lost.
 Easier to explore the collection.
 Helps users infer what kinds of things are in the
collection.
 Evokes a feeling of “browsing the shelves”
 Is preferred over standard search for collection
browsing in usability studies.
 (Interface must be designed properly)
Related Work:
Automated Tag Organization
 Some efforts are on tag prediction:
    Mishne ’06: Uses IR techniques to find the closest tagged documents, then uses
     their tags to assign new tags. Measures how well the new tags were predicted.
    Xu et al. ’06: Use tags that have already been predicted for a document to predict
     which to show to a new user who is tagging the document.
 Some efforts on tag organization:
    Brooks & Montanez ’06: Tries to see if tags can predict document clusters, which
     in my book aren’t really categories. After clustering based on text, they try to
     induce a tag hierarchy by agglomerative clustering of the text. Results not
     described in detail.
    Begelman et al. ’06: Use clustering and tag co-occurrence to find associated tags.
     Not clear what the organizational goal is.
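Tag co-occurrence itself is simple to compute. The sketch below counts how often two tags are assigned to the same item and ranks the tags associated with a given tag. It is a generic illustration on made-up data, not the method of any of the papers listed; cooccurrence and related are hypothetical names.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(tagged_items):
    """Count, for every unordered tag pair, the items on which both tags appear."""
    pair_counts = Counter()
    for tags in tagged_items:
        # sorted(set(...)) gives each pair one canonical (a, b) order per item.
        for a, b in combinations(sorted(set(tags)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def related(tag, pair_counts, top=3):
    """Tags that most often co-occur with tag, most frequent first."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    return [other for other, _ in scores.most_common(top)]

# Made-up tag assignments, one list per bookmarked item.
ITEMS = [
    ["python", "programming", "tutorial"],
    ["python", "programming"],
    ["python", "snake"],
    ["recipe", "thai"],
]
PAIRS = cooccurrence(ITEMS)
```

Here related("python", PAIRS, top=1) picks out "programming"; the "snake" pairing also shows how homonymy surfaces in raw co-occurrence counts.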
RawSugar
 A company/website that organizes tags from
blogs into facets
 They are undergoing a revamp, will move to
channels
 However, nothing published on this
 (presumably, patents filed)
How to Create Facet Hierarchies?
Our Approach: Castanet
(Stoica & Hearst, to appear at HLT-NAACL ’07)
Example: Recipes
(3500 docs)
Castanet Output (shown in Flamenco)
Example: Biology Journal Titles
Castanet Output (shown in Flamenco)
Castanet Algorithm
 Leverage the structure of WordNet
[Diagram: Documents → Select terms → Get hypernym paths (from WordNet) → Build tree → Compress tree → Divide into facets]
1. Select Terms
 Select well-distributed terms from the collection
    Example terms: red, blue
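The slide does not say how "well distributed" is measured. As a minimal sketch, assume it means a term's document frequency falls inside a band: present in at least min_docs documents but in no more than max_fraction of the collection. The thresholds, the stopword list, and the function name select_terms are all assumptions.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"a", "and", "for", "with"})  # tiny illustrative list

def select_terms(documents, min_docs=2, max_fraction=0.8, stopwords=STOPWORDS):
    """Return terms whose document frequency lies within the assumed band."""
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document (document frequency, not term frequency).
        words = set(re.findall(r"[a-z]+", doc.lower()))
        doc_freq.update(words - stopwords)
    limit = max_fraction * len(documents)
    return sorted(term for term, df in doc_freq.items() if min_docs <= df <= limit)

# Made-up documents for illustration.
DOCS = [
    "red curry recipe with chicken",
    "recipe for blue cheese and red wine sauce",
    "grilled chicken recipe with blue cheese",
    "a simple salad recipe",
    "recipe for a red and blue cake",
]
TERMS = select_terms(DOCS)
```

On this toy collection "recipe" is dropped because it occurs in every document, and one-off words such as "salad" are dropped for being too rare.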
2. Get Hypernym Path
    red: abstraction → property → visual property → color → chromatic color → red, redness → red
    blue: abstraction → property → visual property → color → chromatic color → blue, blueness → blue
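A hypernym path is just the chain of "is a kind of" links from a term up to the root. The sketch below uses a small hand-written table that copies the nodes shown on this slide as a stand-in for WordNet. Real WordNet entries can have several senses and several hypernyms, which this single-parent table ignores; HYPERNYM and hypernym_path are illustrative names.

```python
# Toy stand-in for WordNet: each node maps to its single hypernym (parent).
HYPERNYM = {
    "red": "red, redness",
    "red, redness": "chromatic color",
    "blue": "blue, blueness",
    "blue, blueness": "chromatic color",
    "chromatic color": "color",
    "color": "visual property",
    "visual property": "property",
    "property": "abstraction",
}

def hypernym_path(term, hypernym=HYPERNYM):
    """Return the path from the root down to term, following parent links."""
    path = [term]
    while path[-1] in hypernym:
        path.append(hypernym[path[-1]])
    path.reverse()  # root first, term last
    return path

RED_PATH = hypernym_path("red")
BLUE_PATH = hypernym_path("blue")
```

Both paths share everything down to "chromatic color" and differ only in their last two nodes.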
3. Build Tree
[Diagram: the two hypernym paths are merged into one tree at their shared ancestors]
    abstraction
      property
        visual property
          color
            chromatic color
              red, redness
                red
              blue, blueness
                blue
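Merging root-to-term paths into one tree can be sketched with nested dictionaries, where each shared prefix is stored once. This is an illustration of the idea on the red/blue example, not the Castanet implementation; build_tree and render are hypothetical helpers.

```python
# The two hypernym paths from the red/blue example, root first.
PATHS = [
    ["abstraction", "property", "visual property", "color",
     "chromatic color", "red, redness", "red"],
    ["abstraction", "property", "visual property", "color",
     "chromatic color", "blue, blueness", "blue"],
]

def build_tree(paths):
    """Merge root-first paths into a nested dict: label -> subtree."""
    root = {}
    for path in paths:
        node = root
        for label in path:
            # Reuse the existing child if this prefix was already inserted.
            node = node.setdefault(label, {})
    return root

def render(tree, depth=0):
    """Return the tree as indented lines, two spaces per level."""
    lines = []
    for label, children in tree.items():
        lines.append("  " * depth + label)
        lines.extend(render(children, depth + 1))
    return lines

TREE = build_tree(PATHS)
```

Printing "\n".join(render(TREE)) gives one indented line per node, with the shared ancestors appearing only once.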
4. Compress Tree
    Before: color → chromatic color → {red, redness → red; blue, blueness → blue; green, greenness → green}
    After: color → chromatic color → {red; blue; green}
4. Compress Tree (cont.)
    Before: color → chromatic color → {red; blue; green}
    After: color → {red; blue; green}
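The slides show the effect of compression but not its rules. The sketch below uses two simplified rules chosen only because they reproduce the color example: splice out a node that has exactly one child, and splice out a node whose label contains its parent's label. Both rules, the crude substring test, and the name compress are assumptions; the published algorithm may differ.

```python
def compress(tree, parent_label=None):
    """Return a compressed copy of a nested-dict tree (label -> subtree)."""
    result = {}
    for label, children in tree.items():
        children = compress(children, label)  # compress bottom-up
        single_child = len(children) == 1
        repeats_parent = parent_label is not None and parent_label in label
        if single_child or repeats_parent:
            # Splice the node out: its children take its place.
            result.update(children)
        else:
            result[label] = children
    return result

# The uncompressed color subtree from the slide.
COLOR_TREE = {
    "color": {
        "chromatic color": {
            "red, redness": {"red": {}},
            "blue, blueness": {"blue": {}},
            "green, greenness": {"green": {}},
        }
    }
}
COMPRESSED = compress(COLOR_TREE)
```

On this input the synset nodes such as "red, redness" disappear under the single-child rule, and "chromatic color" disappears because its label repeats "color".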
5. Divide into Facets
Disambiguation
 Ambiguity in:
    Word senses (2 paths for same word)
    Paths up the hypernym tree (2 paths for same sense)
Sense 1 for word “tuna”
    organism, being
     => plant, flora
      => vascular plant
       => succulent
        => cactus
         => tuna
Sense 2 for word “tuna”
    organism, being
     => fish
      => food fish
       => tuna
      => bony fish
       => spiny-finned fish
        => percoid fish
         => tuna
How to Select the Right Senses and Paths?

First: build core tree
    (1) Create paths for words with only one sense
    (2) Use Domains
        WordNet has 212 Domains
          medicine, mathematics, biology, chemistry, linguistics, soccer, etc.
        Automatically scan the collection to see which domains apply
        The user selects which of the suggested domains to use or may add own
        Paths for terms that match the selected domains are added to the core tree
Then: add remaining terms to the core tree.
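The order of these steps can be sketched as follows. The sense inventory and its domain labels are made up, and the slide does not say how the remaining terms are attached, so the last step uses an assumed heuristic: keep the path that shares the longest prefix with a path already in the core. The names SENSES, shared_prefix and choose_paths are hypothetical.

```python
# Hypothetical sense inventory: word -> list of (domain, root-first path).
SENSES = {
    "salmon": [(None, ("organism, being", "fish", "food fish", "salmon"))],
    "tuna": [
        ("botany", ("organism, being", "plant, flora", "vascular plant",
                    "succulent", "cactus", "tuna")),
        ("zoology", ("organism, being", "fish", "food fish", "tuna")),
    ],
}

def shared_prefix(path, core_paths):
    """Length of the longest prefix path shares with any path in core_paths."""
    best = 0
    for other in core_paths:
        n = 0
        for a, b in zip(path, other):
            if a != b:
                break
            n += 1
        best = max(best, n)
    return best

def choose_paths(senses, selected_domains):
    """Pick one path per word: unambiguous words, then domains, then the rest."""
    core = {}
    # (1) Words with only one sense go straight into the core.
    for word, options in senses.items():
        if len(options) == 1:
            core[word] = options[0][1]
    # (2) Ambiguous words whose sense matches a selected domain.
    for word, options in senses.items():
        if word not in core:
            matches = [path for domain, path in options if domain in selected_domains]
            if matches:
                core[word] = matches[0]
    # Then: remaining words take the path closest to the core (assumed heuristic).
    for word, options in senses.items():
        if word not in core:
            core[word] = max((path for _, path in options),
                             key=lambda p: shared_prefix(p, core.values()))
    return core

NO_DOMAINS = choose_paths(SENSES, set())
BOTANY = choose_paths(SENSES, {"botany"})
```

With no domain selected, "tuna" follows "salmon" into the fish branch; selecting the made-up "botany" domain sends it to the cactus sense instead.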
Castanet Evaluation Method
 Information architects assessed the category
systems
 For each of 2 systems’ output:
 Examined and commented on top-level
 Examined and commented on two sub-levels
 Also compared to a baseline system
 Then comment on overall properties
 Meaningful?
 Systematic?
 Likely to use in your work?
CastaNet Evaluation Results
 Results on recipes collection for
“Would you use this system in your work?”
 # “Yes in some cases” or “yes, definitely”:




Castanet:
LDA:
Subsumption:
Baseline:
29/34
0/18
6/16
25/34
 Average response to questions about quality
(4 = “strongly agree”)
Will Castanet Work on Tags?
 Class project by Simon King and Jeff Towle, 2004
 1650 captions captured from mobile phones
 “Blocks with Grandpa”, “Weezer” , “A veterans day
tour of berkeley in front of south hall.”, “Bad photo”,
“Kitchen”, “Jgj ”
 Wanted to organize them.
 Use the CastaNet WordNet-based facet-hierarchy
creation algorithm
 by Stoica & Hearst, to appear at HLT-NAACL ’07
 Had to first remove proper names
Example Photos & Captions (King & Towle)
very scary x-mas tree
chasing a cat in the dark
Hp presentation
My cat
instrumentality (112)
  vehicle (26)
    car (9)
    bike (8)
    vessel, watercraft (4)
      mayflower (2)
      ferry (1)
      gig (1)
    truck (3)
    airplane (2)
  device (20)
    machine (7)
      computer (4)
        laptop (1)
      sander (1)
  container (16)
    vessel (7)
      bottle (5)
        water_bottle (2)
        jug (1)
        pill_bottle (1)
      bath (2)
      bowl (1)
    can (2)
    backpack (1)
    bumper (1)
    empty (1)
    salt_shaker (1)
  furniture, piece of furniture, article of furniture (12)
    seat (8)
      bench (2)
      chair (2)
      couch (2)
      lounge (1)
    bed (4)
    desk (1)
Research Questions for Tags & Search
 The role of interface on tag convergence
 There seems to be a big effect
 Would be really interesting to experiment with this
 Also, for facet grouping
 Anchor text vs. tags?
 How are they the same; how do they differ?
 How to get tag expertise?
 Right now, in many cases it is least-common-denominator
 ESP-game
What’s up with
Tag Clouds?
What does a typical tag cloud look like?
Definition
Tag Cloud: A visual representation of social tags,
organized into paragraph-style layout, usually in
alphabetical order, where the relative size and
weight of the font for each tag corresponds to the
relative frequency of its use.
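The definition can be turned into a few lines of code. This is a minimal sketch assuming a linear mapping from frequency to font size between two arbitrary point sizes; the 10 to 32 point range, the linear scale, and the name tag_cloud are assumptions, and font weight is left out.

```python
def tag_cloud(tag_counts, min_pt=10, max_pt=32):
    """Return (tag, font size in points) pairs in alphabetical order.

    The least used tag gets min_pt, the most used gets max_pt, and the
    others are placed linearly in between.
    """
    if not tag_counts:
        return []
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return [
        (tag, round(min_pt + (count - lo) / span * (max_pt - min_pt)))
        for tag, count in sorted(tag_counts.items())
    ]

# Made-up usage counts.
CLOUD = tag_cloud({"web": 11, "design": 1, "python": 6})
```

The output is only a list of sizes; the paragraph-style layout is whatever the page does when it wraps these words at their assigned sizes.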
flickr’s tag cloud
del.icio.us
blogs
ma.gnolia.com
NYTimes.com: tags from most frequent search terms
IBM’s manyeyes project
Amazon.com: Tag clouds on term frequencies
Alternative: “Semantic” Layout

 “Improving Tag-Clouds as Visual Information Retrieval Interfaces”, Yusef Hassan-Montero and Víctor Herrero-Solana, InSciT2006
 Tags grouped by “similarity, based on clustering techniques and co-occurrence analysis”
I was puzzled by the questions:
 What are designers’ and authors’ intentions in
creating or using tag clouds?
 How do they expect their readers to use them?
On the positive side:
 Compact
 Draws the eye towards the most frequent
(important?) tags
 You get three dimensions simultaneously!
 alphabetical order
 size indicating importance
 the tags themselves
Weirdnesses
 Initial encounters unencouraging
 Some reports from industry:
 Is the computer broken?
 Is this a ransom note?
Weirdnesses
 Violates principles of perceptual design
 Longer words grab more attention than shorter
 Length of tag is conflated with its size
 White space implies meaning when there is none
intended
 Ascenders and descenders can also affect focus
 Eye moves around erratically, no flow or guides for
visual focus
 Proximity does not hold meaning
 The paragraph-style layout makes it quite arbitrary which
terms are above, below, and otherwise near which other terms
 Position within paragraph has saliency effects
 Visual comparisons difficult (see Tufte)
Weirdnesses
 Meaningful associations are lost
 Where are the different country names in this tag cloud?
Weirdnesses
Which operating systems are mentioned?
Tag Cloud Study (1)
 First part compared tag cloud layouts
    Independent Variables:
       Tag size
       Tag proximity to a large font
       Tag quadrant position
    Task: recall after a distractor task
    13 participants; effects for size and quadrant
 Second part compared tag clouds to lists
    11 participants
    Tested recognition (from a set of like words) and impression formation
    Alphabetical lists were best for the latter; no differences for the former
“Getting our head in the clouds: Toward evaluation studies of tagclouds”, Walkyria
Rivadeneira, Daniel M. Gruen, Michael J. Muller, David R. Millen, CHI 2007 note
Tag Cloud Study (2)
 62 participants did a selection task
    (find this country out of a list of 10 countries)
 Independent Variables:
    Horizontal list
    Horizontal list, alphabetical
    Vertical list
    Vertical list, alphabetical
    Spatial tag cloud
    Spatial tag cloud, alphabetical
 Order for non-alphabetical not described
 Alphabetical fastest in all cases, lists faster than spatial
 May have used poor clouds (some people couldn’t “see” larger font answers)
“An Assessment of Tag Presentation Techniques”, Martin Halvey, Mark
Keane, poster at WWW 2007.
A Justifying Claim
 You get three dimensions simultaneously!
 alphabetical order
 size indicating importance
 the tags themselves
… but is this really a conscious design decision?
Solution: Celebrity Interviews
 I was really confused about tag clouds, so I
decided to ask the people behind the puffs
 15 interviews, conducted at foocamp’06
 Several web 2.0 leaders
 5 more interviews at Google and Berkeley
A Surprise
 7 interviewees DID NOT REALIZE that alphabetical
ordering is standard.
 2 of these people were in charge of such sites but had
had others write the code
 What was the answer given to “what order are
tags shown in?”
    hadn’t thought about it
    don’t think about tag clouds that way
    random order
    ordered by semantic similarity
 Suggests that perhaps people are too distracted by
the layout to use the alphabetical ordering
Suggested main purposes:
 To signal the presence of tags on the site
 A good way to get the gist of the site
 An inviting and fun way to get people interacting
with the site
 To show what kinds of information are on the site
 Some of these said they are good for navigation
 Easy to implement
Tag Clouds as Self-Descriptions
 Several noted that a tag cloud showing one’s own
tags can be evocative
 A good summary of what one is thinking and reading
about
 Useful for self-reflection
 Useful for showing others one’s thoughts
 One example: comparing someone else’s tags to one’s own to
see what you have in common, and what special interests
differentiate you
 Useful for tracking changes in friends’ lives
    Oh, a new girl’s name has gotten larger; he must have a new
    girlfriend!
Tag Clouds as showing “Trends”
 Several people used this term, that tag clouds
show trends in someone’s behavior
 Trends are usually patterns across time, which are not
inherently visible in tag clouds
 To note a trend using a tag cloud, one must remember
what was there at an earlier time, and what changed
 tracking the girls’ names example
 This suggests a reason for the importance of the large
tags: they draw one’s attention to what is big now versus
what used to be large.
 Suggests also why it doesn’t matter that you can’t see
small tags.
New Perspective: Tag Clouds are Social!
 It’s not about the “information”!
 Not surprising in retrospect; tagging is in large
part about the social aspect
 Seems to work mainly when the tags can be seen by
many
 Even better when items can be tagged by many and
seen by many
 What does this mean though when tag clouds are
applied to non-social information?
Follow-up Study
 Informed by the interview results, we searched for, read, and
coded web pages that mentioned tag clouds.
 Looked at about 140 discussions
 Developed 21 codes
 Looked at another 90 discussions
 Used web queries: “tag clouds”, usability tag clouds, etc.
 Sampled every 10th URL
    58% personal blogs
    20% commercial blogs
    10% commercial web pages
    rest from group blogs and discussion lists
 Doesn’t tell us what people who don’t write about tag
clouds think.
The Role of Popularity
 Popularity in the sense that tag clouds (and tagging) are
trendy and popular.
 Some people liked the visualization, but their popularity made
them less appealing
    Famous post: “Tag clouds are the new mullets”
    Led to self-consciousness about liking them
 Many complained about unaesthetic cloud designs
 Little consensus on whether they are a fad or have staying power
 Popularity also in the sense of the large font size for more
popular tags
 Many people like the prominence of large tags, but several
commented on the tyranny of the popular
The Role of Navigation
 Opinions vary
 Many simply state they are useful for navigation, but
with no support for this claim
 Some claim the compactness makes navigation easier
than a vertical list
 Some object to the varying font size on scannability grounds
 Others object to the lack of organization
 Overall, there is no evidence either way that we could
find in the blog community
Aesthetic Considerations
 Disagreement on the aesthetic and emotional
appeal, especially for lay users.
 Those who like them find them fun and appealing
 Those who don’t find them messy, strange, like a
ransom note
 Informal reports with first time users who are not
in the Web 2.0 community are negative
Trends again
 As in the interviews, the benefit of “trends” was
mentioned many times.
 There is another sense of “trend” as “tendency or
inclination,” and this might be what people mean.
Summary of Stated Reasons for Tag Clouds
(Note: some refuted by studies)
Tag Clouds as Social Information
 An emphasis that tag clouds are meant to show
human behavior.
 We found reports of people commenting on other
uses that were invalid because they did not reflect
live user input:
 One blogger noted the incongruity of an online library
using keyword frequencies in a tag cloud rather than
having it reflect patrons’ usage of the collection.
 An online community noticed one site’s cloud didn’t
change over time and realized the sizes were decided
by marketing. This was greeted with derision.
Implications
 Assume tag clouds are meant to reflect human
mental activity (individual or group)
 Then what might seem design flaws from an
information conveyance perspective may not be
 A large part of the appeal is the fun and liveliness.
 The informality of the layout reflects the human
activity beneath it.
Judith Donath, CACM 45(4), 2002
“Traditional data visualization focuses on making
abstract numbers and relationships into concrete,
spatialized images; the goal is to highlight important
patterns while also representing the data accurately.
This is a fine approach for social scientists studying the
dynamics of online interactions. Yet for our purpose it is
also important that the visualization evoke an
appropriate intuitive response representing the feel of
the conversation as well as depicting its dynamics”
Judith Donath, CACM 45(4), 2002
“[O]ne argument for deliberately designing evocative
visualizations for online social environments is the
existing default textual interfaces are themselves
evocative, they simply evoke an aura of business-like
monotony rather than the lively social scene that
actually exists.”
Tag Cloud Alternatives
Provided by Martin Wattenberg
Conclusions
 Social tagging is, in my view, a terrific way to get good
content metadata.
 I think automated techniques can do a lot to help clean
them up and organize them.
 They are an inherently social phenomenon, part of
social media, which is a really exciting area.
 The socialness of social media can yield surprises, like
tag clouds.