ppt - The BioText Project

Text, Tags and Thumbnails:
Latest Trends in Bioscience Literature Search
Marti A. Hearst
Associate Professor
UC Berkeley
Special Libraries Association
Pharmaceutical & Health Technologies Division Spring Meeting
March 22, 2009
Some research reported here supported by
NSF DBI-0317510 and a gift from Genentech
Tutorial Outline
• Fundamentals of User Interface Design
• Search Interfaces
  - Faceted navigation
  - Specific to bioscience literature
    - Term suggestions
    - Showing figures in search results
• Social Tagging
Marti Hearst
SLA’09 Spring Meeting
Let’s get acquainted
Fundamentals of UI Design
Principles of HCI
(Human-Computer Interaction)
• Design for the user
  - AKA: user-centered design
  - Not for the designers
  - Not for the system
• Make use of cognitive principles where available
  - Important guidelines for search:
    - Reduce memory load
    - Speak the user’s language
    - Provide helpful feedback
    - Respect perceptual principles
What makes for a good/bad
user experience?
Your examples?
My (subtle) example
Paying my taxes online, March 2009.
Problems
• Biggest problem: I will pay taxes for the wrong year, requiring a very costly repair.
  - They have a special option in the phone tree for this error (proof of a usability problem!)
• Other problems:
  - What does this mean? What do I do?
  - Why are there so few tax forms to choose from, and what the heck are they?
Problems
• Biggest problem: I will pay taxes for the wrong year, requiring a very costly repair.
  - They have a special option in the phone tree for this error (proof of a usability problem!)
  - Yes; I made this error last year and it still isn’t fixed!
  - Violates: avoid errors, provide good defaults
Problems
• What does this mean? What do I do?
• Violates:
  - Speak the user’s language
  - Provide help.
Problems
• Which form am I selecting?
• What if my choice is missing?
• Entering in a form number doesn’t work.
• Violates:
  - Provide useful labels
  - Match the user’s task
User-Centered Design
• Needs assessment
  - Find out
    - who users are
    - what their goals are
    - what tasks they need to perform
  - Task Analysis
    - Characterize what steps users need to take
    - Create scenarios of actual use
    - Decide which users and tasks to support
• Iterate between
  - Designing
  - Evaluating
User Interface Design is an Iterative Process
[Figure: a cycle of Design, Prototype, and Evaluate]
Rapid Prototyping
• Build a mock-up of design
• Low fidelity techniques
  - paper sketches
  - cut, copy, paste
  - video segments
Telebears example
Why Do Prototypes?
• Get feedback on the design faster
• Experiment with alternative designs
• Fix problems before code is written
• Keep the design centered on the user
Evaluation
• Test with real users (participants)
  - Formally or Informally
• “Discount” techniques
  - Potential users interact with paper computer
  - Expert evaluations (heuristic evaluation)
  - Expert walkthroughs
Design Guidelines
• What are they?
  - Rules of thumb for how to design
  - Bloopers book has many recommendations
• Examples:
  - Provide informative feedback
  - Support recognition over recall
  - Provide for user control and understanding
• Heuristic Evaluation:
  - An expert measures the mock-ups against well-known design guidelines.
Results of Using Heuristic Evaluation
• Single evaluator achieves poor results
  - only finds 35% of usability problems
  - 5 evaluators find ~75% of usability problems
  - why not more evaluators? 10? 20?
    - adding evaluators costs more
    - adding more evaluators doesn’t increase the number of unique problems found
Decreasing Returns
[Figure: two graphs, “problems found” and “benefits / cost”]
• (from Nielsen)
• Caveat: these graphs are for a specific example
• This is a controversial point.
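The decreasing returns can be illustrated with a simple independence model: if each evaluator independently finds a fraction p of the problems, n evaluators together find 1 - (1 - p)^n. The sketch below is only that idealized model; the independence assumption and the function names are ours, not something stated on the slides. Real evaluators overlap more than independence predicts, which is one reason the model overshoots the ~75% quoted for five evaluators.

```python
def fraction_found(n_evaluators: int, p_single: float = 0.35) -> float:
    """Idealized share of problems found by n independent evaluators.

    p_single = 0.35 is the single-evaluator figure quoted on the slide;
    independence between evaluators is an assumption of this sketch.
    """
    if n_evaluators < 0 or not 0.0 <= p_single <= 1.0:
        raise ValueError("need n_evaluators >= 0 and 0 <= p_single <= 1")
    return 1.0 - (1.0 - p_single) ** n_evaluators


def marginal_gain(n_evaluators: int, p_single: float = 0.35) -> float:
    """Extra share of problems contributed by the n-th evaluator."""
    return fraction_found(n_evaluators, p_single) - fraction_found(
        n_evaluators - 1, p_single
    )


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(n, round(fraction_found(n), 3), round(marginal_gain(n), 3))
```

Each added evaluator contributes less than the one before, while the cost per evaluator stays roughly constant, which is the shape of the benefits / cost argument.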
Affordances
• The perceived properties of an object that determine how it can be used. (Don Norman)
  - Knobs are for turning.
  - Buttons are for pushing.
• Some affordances are obvious, some learned
  - Glass can be seen through.
  - Glass breaks easily.
• Sometimes visual plus physical feedback
  - Floppy disk example
    - Rectangular – can’t insert sideways
    - Tabs on the disk prevent the drive from letting it be fully inserted backwards
Affordances of a Teapot?
Affordances of an iPod?
Small Details Matter
• UIs for search especially require great care in small details
  - In part due to the text-heavy nature of search
  - A tension between more information and introducing clutter
• How and where to place things is important
  - People tend to scan or skim
  - Only a small percentage reads instructions
Small Details Matter
Example:
  - In an earlier version of the Google Spellchecker, people didn’t always see the suggested correction
  - Used a long sentence at the top of the page: “If you didn’t find what you were looking for …”
  - People complained they got results, but not the right results.
  - In reality, the spellchecker had suggested an appropriate correction.
Small Details Matter
• The fix:
  - Analyzed logs, saw people didn’t see the correction:
    - clicked on first search result,
    - didn’t find what they were looking for (came right back to the search page)
    - scrolled to the bottom of the page, did not find anything
    - and then complained directly to Google
  - Solution was to repeat the spelling suggestion at the bottom of the page.
• More adjustments:
  - The message is shorter, and different on the top vs. the bottom
Interview with Marissa Mayer by Mark Hurst:
http://www.goodexperience.com/columns/02/1015google.html
Time for a Break!
Searching Bioscience Literature
Double Exponential Growth in Bioscience Journal Articles
From Hunter & Cohen, Molecular Cell 21, 2006
BioText Project Goals
• Provide flexible, useful, appealing search for bioscientists.
• Focus on:
  - Full text journal articles
  - New language analysis algorithms
  - New search interfaces
• Supported by the NSF
  - http://biotext.berkeley.edu
The Importance of Figures and Captions
• Observations of biologists’ reading habits:
  - It has often been observed that biologists focus on figures+captions along with title and abstract.
• KDD Cup 2002
  - The objective was to extract only the papers that included experimental results regarding expression of gene products and
  - to identify the genes and products for which experimental results were provided.
  - ClearForest+Celera did well in part by focusing on figure captions, which contain critical experimental evidence.
Our Idea
• Make a full text search engine for journal articles that focuses on showing figures
• Make it possible to search over caption text (and text that refers to captions)
• Try to group the figures intelligently
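To make the caption-search idea concrete, here is a minimal sketch of an inverted index over caption text, with hits grouped by article. It is an illustration only, not the BioText implementation: the record fields (`article`, `figure`, `caption`), the sample captions, and the choice to group by article are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_caption_index(figures):
    """Map each caption token to the positions of the figures containing it."""
    index = defaultdict(set)
    for pos, fig in enumerate(figures):
        for tok in tokenize(fig["caption"]):
            index[tok].add(pos)
    return index


def search_captions(index, figures, query):
    """AND all query terms together and group matching figures by article."""
    terms = tokenize(query)
    if not terms:
        return {}
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    grouped = defaultdict(list)
    for pos in sorted(hits):
        grouped[figures[pos]["article"]].append(figures[pos]["figure"])
    return dict(grouped)


# Hypothetical records, invented for this example.
FIGURES = [
    {"article": "A1", "figure": "Fig. 1", "caption": "Expression of gene X in zebrafish retina"},
    {"article": "A1", "figure": "Fig. 2", "caption": "Western blot of protein Y"},
    {"article": "A2", "figure": "Fig. 3", "caption": "Gene X expression after drug treatment"},
]
INDEX = build_caption_index(FIGURES)

if __name__ == "__main__":
    print(search_captions(INDEX, FIGURES, "gene expression"))
```

Grouping by article is only the simplest possible reading of "group the figures intelligently"; other groupings would plug into the same index.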
Developing the BioText Search Interface
• Main idea: a search interface that meets the unique needs of bioscientists.
• Hypothesis: the articles’ figures should be exposed in the interface.
• Process:
  - Did interviews, designed mock-up
  - Made an initial prototype
  - Did a pilot study
  - Used these results to redesign
  - Evaluated the new design
• Results: highly positive responses.
Related Work
• Cohen & Murphy:
  - Parsed structure of image captions
  - Extract facts about subcellular localization
• Yu et al.
  - Created a small image taxonomy; classified images according to it with SVMs
• Yu & Lee:
  - BioEx: Link sentences from an abstract to images in the same paper; show those when displaying a paper.
  - Not focused on a full search interface; can’t search over caption text.
Pilot Usability Study
• Primary Goal:
  - Determine whether biological researchers would find the idea of caption search and figure display to be useful or not.
• Secondary Goal:
  - Should caption search and figure display be useful, how best to support these features in the interface.
Method
• Told participants we were evaluating a new search interface
  - (tip: don’t say “our” interface)
• Asked them to use each design on their own queries
  - (order of presentation was varied)
• Had them fill out a questionnaire after each interface session
• Also had open-ended discussions about the designs
Participants
Captions + Figure View
Captions + Figure & Thumbnails
Results
Captions + Figure View
[Figure: ratings plotted by participant #; 7 = strongly agree, 1 = strongly disagree]
Results
• 7 out of 8 said they would want to use either CF (Captions + Figure) or CFT (Captions + Figure & Thumbnails) in their bioscience journal article searches
  - The 8th thought figures would not be useful in their tasks
• Many participants noted that caption search would be better for some tasks than others
• Two of the participants preferred CFT to CF; the rest thought CFT was too busy.
  - Best to show all the thumbnails that correspond to a given article after full text search
  - Best to show only the figure that corresponds to the caption in the caption search view
Results, cont.
• All four participants who saw the Grid view liked it, but noted that the metadata shown was insufficient.
• If it were changed to include title and other bibliographic data, 2 of the 4 who saw Grid said they would prefer that view over the CF view.
Current Design
http://biosearch.berkeley.edu
Current Design
• Indexes the PubMed Central open access journal article collection, with more than:
  - 300 journals
  - 129,000 articles
  - 247,000 figures
  - 104,000 tables
Second Study
• Modified, improved interface
• 20 participants
  - 6 grad students, 6 postdocs, 1 faculty, 7 other
  - Cell or molecular biology, genetics or genomics, biochemistry, evolutionary biology, bioinformatics.
  - All use PubMed, most as primary tool
Second Study
• Procedure:
  - Session lasted ~1 hour
  - Participants were shown the interface and its views, and then asked to use it and respond.
  - They then assessed the interfaces explicitly.
• Measures:
  - Focus on subjective responses.
  - Intent to use is a reliable indicator of actual usage. (Venkatesh & Morris 03, Sun & Zhang 06)
How Likely to Use Interface?
Full Text View: Favorable Aspects
Full Text View: Unfavorable Aspects
Figure Caption Views: Favorable Aspects
Figure Caption Views: Unfavorable Aspects
Table View: Favorable Aspects
Table View: Unfavorable Aspects
Showing Related Terms in
Bioscience Literature Search
Needs assessment and low-fi evaluation
First Questionnaire
• General information about how they search and what related information they want to see.
• 38 participants
  - 22 grad students, 6 postdocs, 5 faculty, 5 other
  - Systems biology, bioinformatics, genomics, biochemistry, cellular and evolutionary biology, microbiology, physiology, …
Participants’ Characteristics
Results
Rating scale: 1 (Do NOT want this), 2, 3 (Neutral), 4, 5 (REALLY want this)

Related Information Type                        Avg rating   # selecting 1 or 2
Gene’s Synonyms                                 4.4          2
Gene’s Synonyms refined by organism             4.0          2
Gene’s Homologs                                 3.7          5
Genes from same family: parents                 3.4          7
Genes from same family: children                3.6          4
Genes from same family: siblings                3.2          9
Genes this gene interacts with                  3.7          4
Diseases this gene is associated with           3.4          6
Chemicals/drugs this gene is associated with    3.2          8
Localization information for this gene          3.7          3
Second Questionnaire
• Evaluating 4 designs for gene/protein name suggestions
• 19 participants
  - 4 grad students, 7 postdocs, 3 faculty, 5 other
  - Wide range of specializations
Design 1: Baseline
Design 2: Links
Design 3: Checkboxes
Design 4: Grouped Links
Results
Participants who rated the design 1st or 2nd, and average rating (1=low, 4=high):

Design               #     %     Average rating
3 (checkboxes)       15    79    3.3
4 (grouped links)    10    53    2.6
2 (links)            9     47    2.5
1 (baseline)         0     0     1.6
Results: More Detail
• Strong desire for the search system to suggest information closely related to gene/protein names.
• Some interest in less closely related information.
• Most participants want to see organism names in conjunction with gene names.
• A majority of participants prefer to see term suggestions grouped by type (synonyms, homologs, etc).
Results: More Detail
• Split in preference between single-click hyperlink interaction (categories or single terms) and checkbox-style interaction.
• The majority of participants prefer to have the option to choose either individual names or whole groups with one click.
• Split in preference between the system suggesting only names that it is highly confident are related, and also including names that it is less confident about under a “show more” link.
Summary: BioText Search Studies
• Nearly all participants strongly desire
  - Full text search
  - Figure display in search results
• Impediments to adoption
  - Needs to index all articles
  - Needs to be in the primary search tool(s)
• Participants also want to see term suggestions that are closely related to their query.
Time for a Break!
More on Search Interfaces
Useful Search Interface Tropes
• Dynamic query term suggestions
  - Others’ queries
  - Metadata or text from the Collection
Useful Search Interface Tropes
• Grouping of retrieval results
  - By meaningful categories
  - By genre
NextBio
Oops …
Faceted Navigation
Improving collection search interfaces
What we want to Achieve
• Integrate browsing and searching seamlessly
• Support exploration and learning
• Avoid dead-ends, “pogo’ing”, and “lostness”
Main Idea
• Use hierarchical faceted metadata
• Design the interface to:
  - Allow flexible navigation
  - Provide previews of next steps
  - Organize results in a meaningful way
  - Support both expanding and refining the search
The Problem With Hierarchy
• Most things can be classified in more than one way.
• Most organizational systems do not handle this well.
• Example: Animal Classification
[Figure: the same animals (robin, penguin, otter, seal, salmon, wolf, cobra, bat) grouped three different ways: by Skin Covering, by Locomotion, and by Diet]
The Problem with Hierarchy
• Inflexible
  - Forces the user to start with a particular category
  - What if I don’t know the animal’s diet, but the interface makes me start with that category?
• Wasteful
  - Have to repeat combinations of categories
  - Makes for extra clicking and extra coding
• Difficult to modify
  - To add a new category type, must duplicate it everywhere or change things everywhere
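The "wasteful" and "difficult to modify" points can be put in numbers. The sketch below compares how many category nodes one fixed hierarchy needs with how many labels independent facets need, using the locomotion, skin covering, and diet values from the animal example. The nesting order, the function names, and the extra "habitat" facet are assumptions made purely for illustration.

```python
# Facet values taken from the animal example; the nesting order is an assumption.
FACETS = {
    "locomotion": ["swim", "fly", "run", "slither"],
    "skin covering": ["fur", "scales", "feathers"],
    "diet": ["fish", "rodents", "insects"],
}


def hierarchy_node_count(facets):
    """Nodes in one fixed hierarchy that nests each facet under the previous one.

    Every value of a facet is repeated under every combination of the
    facets above it, so level widths multiply.
    """
    total, width = 0, 1
    for values in facets.values():
        width *= len(values)
        total += width
    return total


def facet_label_count(facets):
    """Labels needed when the facets are kept independent: widths simply add."""
    return sum(len(values) for values in facets.values())


if __name__ == "__main__":
    print(hierarchy_node_count(FACETS), facet_label_count(FACETS))
    # Adding a hypothetical fourth category type shows the maintenance cost.
    extended = dict(FACETS, habitat=["land", "water"])
    print(hierarchy_node_count(extended), facet_label_count(extended))
```

With three category types the single hierarchy already repeats each diet label a dozen times, and every new category type multiplies the bottom level again, while the faceted version only grows by the number of new labels.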
The Problem With Hierarchy
[Figure: a single hierarchy rooted at “start” that branches on locomotion (swim, fly, run, slither, …), then on skin covering (fur, scales, feathers), then on diet (fish, rodents, insects), with animals such as salmon, bat, robin, and wolf at the leaves; the same labels repeat under every branch]
The Idea of Facets
• Facets are a way of labeling data
  - A kind of Metadata (data about data)
  - Can be thought of as properties of items
• Facets vs. Categories
  - Items are placed INTO a category system
  - Multiple facet labels are ASSIGNED TO items
The Idea of Facets
• Create INDEPENDENT categories (facets)
  - Each facet has labels (sometimes arranged in a hierarchy)
• Assign labels from the facets to every item
  - Example: bioscience journal articles
[Figure: an article labeled from four facets: Drug (Rx1, Rx2, Rx3), Disease (Glaucoma), Anatomy (Eye), Species (Zebrafish)]
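A minimal sketch of this idea as a data structure: each article carries a set of labels per facet, refinement is a filter over (facet, label) pairs, and the preview counts shown next to each label are tallies over the current result set. The facet names come from the example above; the article records, the "Mouse" label, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical articles; each facet maps to the set of labels assigned to the item.
ARTICLES = [
    {"id": "a1", "Drug": {"Rx1"}, "Disease": {"Glaucoma"}, "Anatomy": {"Eye"}, "Species": {"Zebrafish"}},
    {"id": "a2", "Drug": {"Rx2"}, "Disease": {"Glaucoma"}, "Anatomy": {"Eye"}, "Species": {"Mouse"}},
    {"id": "a3", "Drug": {"Rx1", "Rx3"}, "Disease": set(), "Anatomy": {"Eye"}, "Species": {"Zebrafish"}},
]


def refine(items, selections):
    """Keep the items that carry every selected (facet, label) pair."""
    return [
        item
        for item in items
        if all(label in item.get(facet, set()) for facet, label in selections)
    ]


def preview_counts(items, facet):
    """Count how many of the given items carry each label of one facet."""
    counts = Counter()
    for item in items:
        counts.update(item.get(facet, set()))
    return counts


if __name__ == "__main__":
    current = refine(ARTICLES, [("Disease", "Glaucoma")])
    print([item["id"] for item in current])
    print(preview_counts(current, "Drug"))
```

Because the facets are independent, the selections can be applied in any order and the result is the same.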
Example:
Nobel Prize Winners Collection
(Before and After Facets)
Only One Way to View Laureates
First, Choose Prize Type
Next, view the list!
The user must first choose an Award type (literature), then browse through the laureates in chronological order.
No choice is given to, say, organize by year and then award, or by country, then decade, then award, etc.
Flamenco Interface: Using Hierarchical Faceted Metadata
Opening View
Select literature from PRIZE facet
Group results by YEAR facet
Select 1920’s from YEAR facet
Current query is PRIZE > literature AND YEAR: 1920’s. Now remove PRIZE > literature
Now Group By YEAR > 1920’s
Hierarchy Traversal: Group By YEAR > 1920’s, and drill down to 1921
Select an individual item
Use Endgame to expand out
Or use “More like this” to find similar items
Start a new search using keyword “California”
Note that category structure remains after the keyword search
The query is now a keyword ANDed with a facet subhierarchy
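The final query in this walk-through, a keyword ANDed with a facet subhierarchy, can be sketched as follows. This is not Flamenco's code: the records are placeholders rather than real laureates, the field names are invented, and representing a hierarchical facet value as a path tuple is simply one convenient assumption.

```python
# Placeholder records; facet values are paths from the top of the facet downward.
LAUREATES = [
    {"name": "Laureate A", "text": "studied in California", "PRIZE": ("literature",), "YEAR": ("1920s", "1921")},
    {"name": "Laureate B", "text": "worked in Paris", "PRIZE": ("physics",), "YEAR": ("1920s", "1925")},
    {"name": "Laureate C", "text": "taught in California", "PRIZE": ("chemistry",), "YEAR": ("1930s", "1934")},
]


def under(path, prefix):
    """True if a facet path lies at or below the selected node of the hierarchy."""
    return path[: len(prefix)] == prefix


def run_query(items, keyword=None, facet_paths=None):
    """AND an optional keyword with any number of facet subhierarchy selections."""
    facet_paths = facet_paths or {}
    results = []
    for item in items:
        if keyword and keyword.lower() not in item["text"].lower():
            continue
        if all(under(item.get(facet, ()), prefix) for facet, prefix in facet_paths.items()):
            results.append(item["name"])
    return results


if __name__ == "__main__":
    print(run_query(LAUREATES, keyword="California", facet_paths={"YEAR": ("1920s",)}))
    print(run_query(LAUREATES, facet_paths={"YEAR": ("1920s", "1921")}))
```

Removing a facet constraint, as in the step that drops PRIZE > literature, is just deleting one key from `facet_paths`; drilling down is lengthening a path.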
Advantages of Faceted Navigation
• Gives users control and flexibility
• Can’t end up with empty result sets
  - (except with keyword search)
• Helps avoid feelings of being lost.
• Easier to explore the collection.
  - Helps users infer what kinds of things are in the collection.
  - Evokes a feeling of “browsing the shelves”
• Is preferred over standard search for collection browsing in usability studies.
  - (Interface must be designed properly)
Advantages of Faceted Metadata

• Helps alleviate the metadata wars:
  - Allows for both splitters and lumpers
    - Is this a bird or a robin?
    - Doesn’t matter, you can do both!
  - Allows for differing organizational views
    - Does NASCAR go under sports or entertainment?
    - Doesn’t matter, you can do both!
MeSH (Medical Subject Headings)
• NLM’s MeSH category labels are assigned to Medline Articles
• But it is hard to browse.
• We converted it to a faceted structure, but haven’t used it yet.
Aquabrowser:
Faceted Navigation in a DL
Shown on lens.lib.uchicago.edu
Results after Refinement
Time for a Break!
Tags and other Social Media
Two Main Points
1. Massive user behavior is aiding search algorithms in interesting ways.
2. Going deeper: An examination of social tagging:
  - The controversy
  - Research questions
  - Our work on automating creation of metadata structure
User-contributed content is exploding!
Social Information & Search
• Trend: human behavioral information is getting “baked in” to search algorithms.
• In many cases, the actions of the many are more useful than the actions of the individual.
• Three examples follow.
Actions of the Many vs. Individual
1. Anchor text for improved ranking.
  - vs. author-supplied meta-tags
Actions of the Many vs. Individual
2. “Clickthrough” to improve ranking.
  - vs. an individual’s prior clicks
  - Joachims et al. and Agichtein et al. found that human selections of links from search results could improve rankings for popular queries.
  - Some surprising rules:
    - Assign negative weight to an unclicked link that appears above and below a clicked link
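As a rough illustration of how clicks can be turned into ranking evidence, the sketch below implements only the simpler "click > skip above" heuristic commonly associated with Joachims' work: a clicked result is taken to be preferred over every unclicked result ranked above it. This is narrower than the rule quoted on the slide, which also looks at the link below, and the function name and document ids are made up for the example.

```python
def click_skip_above(ranking, clicked):
    """Derive (preferred, less_preferred) pairs from one result list.

    ranking: document ids in the order they were shown.
    clicked: ids the user clicked.
    Each clicked result is preferred over every unclicked result above it.
    """
    clicked = set(clicked)
    pairs = []
    for position, doc in enumerate(ranking):
        if doc in clicked:
            pairs.extend(
                (doc, skipped)
                for skipped in ranking[:position]
                if skipped not in clicked
            )
    return pairs


if __name__ == "__main__":
    print(click_skip_above(["d1", "d2", "d3", "d4"], {"d3"}))
```

Pairs like these, pooled over many users, are the kind of signal that could feed a learned ranking function; a single user's clicks on their own yield too few pairs to be reliable.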
Actions of the Many vs. the
Individual
3. Query auto-suggest based on other users’ queries
  - vs. based on one’s prior queries alone
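A minimal sketch of auto-suggest driven by other users' queries: count how often each query appears in a shared log, then offer the most frequent queries that start with what has been typed so far. The log contents, the normalization, and the tie-breaking rule are assumptions made for the example; production systems are far more elaborate.

```python
from collections import Counter


def build_suggester(query_log):
    """Return a suggest(prefix, k) function backed by pooled query counts."""
    counts = Counter(q.strip().lower() for q in query_log if q.strip())

    def suggest(prefix, k=3):
        prefix = prefix.strip().lower()
        matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
        # Most frequent first; alphabetical order breaks ties.
        matches.sort(key=lambda pair: (-pair[1], pair[0]))
        return [q for q, _ in matches[:k]]

    return suggest


# Hypothetical pooled log from many users.
LOG = ["p53 mutation", "p53 apoptosis", "p53 mutation", "pten loss", "P53 mutation "]
suggest = build_suggester(LOG)

if __name__ == "__main__":
    print(suggest("p5"))
```

Swapping the pooled log for one person's own history gives the "individual" variant; the code is identical, only the evidence is thinner.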
Social Tagging
• Metadata assignment without all the bother
• Spontaneous, easy, low cognitive overhead
• Usually used in the context of social media
Popular pages on del.icio.us
Visitor tagging at Powerhouse Museum
Tagging is Controversial!
• Sloppy!
• Disorganized!
• Incorrect!
• Power to the people!
• Easy!
• Cheap!
Professional Cataloguer: “Everything I know isn't in the picture!”
"Investigating social tagging and folksonomy in the art museum with steve.museum", J. Trant, B. Wyman, WWW 2006 Collaborative Tagging Workshop
The Tagging Opportunity
• At last! Content-oriented metadata in the large!
• Attempts at metadata standardization always end up with something like the Dublin Core
  - author, date, publisher, ....
• I think the action is in the subject metadata, and have focused on how to navigate collections given such data.
The Tagging Opportunity
• Tags are inherently faceted!
  - Multiple labels are assigned to each item
    - Rather than placing them into a folder
    - Rather than placing them into a hierarchy
  - Concepts are assigned from many different content categories
Tagging Problems
• The haphazard assignments lead to problems with
  - Synonymy
  - Homonymy
  - Unpredictability
See how this author attempts to compensate:
Tagging Problems
• Some tags are fleeting in meaning or too personal
  - toread todo
• Tags don’t “cover” all the concepts
• Tags are disorganized
• Tags are not “professional”
  - (I personally don’t think this matters)
Research Questions for Tags & Search
• How to improve tag convergence?
• How to group tags meaningfully? How to eliminate uninteresting tags?
  - What is the role of user interface on tag convergence?
    - Preliminary evidence suggests there is a big effect
    - There are some good ideas out there
    - More experimentation is needed.
  - What algorithms can we use to clean up the tags after they are assigned?
    - There is some work here, much more can be done.
      - TagAssist: Automatic Tag Suggestion for Blog Posts, Sood et al., ICWSM 2007
Interface for adding tags on del.icio.us
Effects of Interface
On the Structure, Properties and Utility of Internal Corporate Blogs, Kolari et al., ICWSM 2007
Research Questions for Tags & Search
How to get tag expertise?
Who will identify the plant species in this image?
Tags: office desk plants windows shadows
Research Questions for Tags & Search
• What is the relationship of social tags to automated content extraction?
• Are tags more informative, or differently informative, than other labeling methods?
Research Questions for Tags & Society
• What motivates people to tag?
• Who owns the tags?
• Privacy and sharing of tags?
Research Questions for Tags & Search
• How to use tags for browsing / navigation?
  - Currently most tags are used as a direct index into items
    - Click on tag, see items assigned to it, end of story
  - Grouping into small hierarchies is not usually done
    - del.icio.us now has bundles, but navigation isn’t good
    - IBM’s dogear comes the closest
• One solution: organize tags into faceted hierarchies, use faceted navigation.
How to Create Faceted Hierarchies?
Our Approach: Castanet
(Stoica, Hearst, & Richardson, HLT-NAACL ’07)
Example: Biology Journal Titles
Castanet Output (shown in Flamenco)
Castanet Algorithm
• Leverage the structure of WordNet
[Figure: pipeline from Documents through Select terms, Get hypernym paths (using WordNet), Build tree, and Compress tree, ending with Divide into facets]
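The middle steps of this pipeline (get hypernym paths, build tree, compress tree) can be illustrated with a toy sketch. It is not the Castanet implementation: a small hand-written child-to-parent map stands in for WordNet, term selection and the final division into facets are skipped, and the compression rule used here (splice out any non-term node with exactly one child) is a deliberate simplification.

```python
# Hypothetical, hand-written stand-in for WordNet hypernym links (child -> parent).
HYPERNYM = {
    "car": "motor_vehicle",
    "truck": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "bike": "wheeled_vehicle",
    "wheeled_vehicle": "vehicle",
    "vehicle": "instrumentality",
    "chair": "seat",
    "bench": "seat",
    "seat": "furniture",
    "furniture": "instrumentality",
}

TERMS = ["car", "truck", "bike", "chair", "bench"]


def hypernym_path(term):
    """Follow parent links upward and return the path from root to term."""
    path = [term]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return list(reversed(path))


def build_tree(terms):
    """Merge the hypernym paths of all terms into one nested-dict tree."""
    tree = {}
    for term in terms:
        node = tree
        for label in hypernym_path(term):
            node = node.setdefault(label, {})
    return tree


def compress(tree, terms):
    """Splice out nodes that are not selected terms and have a single child."""
    out = {}
    for label, children in tree.items():
        sub = compress(children, terms)
        if label not in terms and len(sub) == 1:
            out.update(sub)  # promote the lone child in place of its parent
        else:
            out[label] = sub
    return out


if __name__ == "__main__":
    print(compress(build_tree(TERMS), set(TERMS)))
```

In this toy run the intermediate nodes `wheeled_vehicle` and `furniture` disappear because each leads to only one branch, while `vehicle` and `seat` survive as useful grouping labels.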
Will Castanet Work on Tags?
• Class project by Simon King and Jeff Towle, 2004
• 1650 captions captured from mobile phones
• Wanted to organize them.
• Used the Castanet algorithm
  - Had to first remove proper names
Example Photos & Captions
(King & Towle)
Captions: very scary x-mas tree; chasing a cat in the dark; Hp presentation; My cat
• instrumentality (112)
  - vehicle (26)
    - car (9)
    - bike (8)
    - vessel, watercraft (4)
      - mayflower (2)
      - ferry (1)
      - gig (1)
    - truck (3)
    - airplane (2)
  - device (20)
    - machine (7)
      - computer (4)
      - laptop (1)
      - sander (1)
    - bumper (1)
  - container (16)
    - vessel (7)
      - bottle (5)
        - water_bottle (2)
        - jug (1)
        - pill_bottle (1)
      - bath (2)
      - bowl (1)
    - can (2)
    - backpack (1)
    - empty (1)
    - salt_shaker (1)
  - furniture, piece of furniture, article of furniture (12)
    - seat (8)
      - bench (2)
      - chair (2)
      - couch (2)
      - lounge (1)
    - bed (4)
    - desk (1)
Tag Clouds
Explained
What does a typical tag cloud look like?
Definition
Tag Cloud: A visual representation of social tags,
organized into paragraph-style layout, usually in
alphabetical order, where the relative size and
weight of the font for each tag corresponds to
the relative frequency of its use.
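The definition translates almost directly into code. The sketch below counts tag uses, orders the tags alphabetically, and maps each tag's frequency to a font size. The linear scaling between a minimum and maximum point size is an assumption made for the example (the definition only says size corresponds to relative frequency), as are the function name and the sample tags.

```python
from collections import Counter


def tag_cloud(tags, min_pt=10, max_pt=32):
    """Return (tag, font_size) pairs in alphabetical order.

    Font size grows linearly with how often the tag was used, from
    min_pt for the rarest tag to max_pt for the most frequent one.
    """
    counts = Counter(tags)
    if not counts:
        return []
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    cloud = []
    for tag in sorted(counts, key=str.lower):
        weight = (counts[tag] - lowest) / span if span else 1.0
        cloud.append((tag, round(min_pt + weight * (max_pt - min_pt))))
    return cloud


# Hypothetical tag assignments pooled across users.
SAMPLE_TAGS = ["search"] * 5 + ["ui"] * 3 + ["biology"]

if __name__ == "__main__":
    for tag, size in tag_cloud(SAMPLE_TAGS):
        print(f"{tag}: {size}pt")
```

Only the ordering and the sizes are computed here; the paragraph-style layout is left to whatever renders the pairs.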
del.icio.us
blogs
I was puzzled by the questions:
• What are designers’ and authors’ intentions in creating or using tag clouds?
• How do they expect their readers to use them?
On the positive side:
• Compact
• Draws the eye towards the most frequent (important?) tags
• You get three dimensions simultaneously!
  - alphabetical order
  - size indicating importance
  - the tags themselves
Weirdnesses
• Initial encounters unencouraging
  - Some reports from industry:
    - Is the computer broken?
    - Is this a ransom note?
Violates Principles of Perceptual Design
• Eye moves around erratically
• Longer words grab more attention
• White space caused by ascenders & descenders isn’t meaningful
• Proximity doesn’t hold meaning
• Paragraph position has saliency effects
• Should allow for visual comparisons (Tufte)
Weirdnesses
• Meaningful associations are lost
  - Where are the different country names in this tag cloud?
  - Which operating systems are mentioned?
Two Studies of Use in Information Analysis
• Both found that the spatial organization and varying font sizes were inferior for:
  - Finding items in list
  - Getting the gist of the tags
Interviews
• I was really confused about tag clouds, so I decided to ask the people behind the puffs
  - 15 interviews, conducted at foocamp’06
    - Several web 2.0 leaders
  - 5 more interviews at Google and Berkeley
A Surprise
• 7 interviewees DID NOT REALIZE that alphabetical ordering is standard.
  - 2 of these people were in charge of such sites but had had others write the code
• What was the answer given to “what order are tags shown in?”
  - hadn’t thought about it
  - don’t think about tag clouds that way
  - random order
  - ordered by semantic similarity
• Suggests that perhaps people are too distracted by the layout to use the alphabetical ordering
Suggested main purposes:
• To signal the presence of tags on the site
• To show what kinds of information are on the site
• A good way to get the gist of the site
• An inviting and fun way to get people interacting with the site
  - Some of these said they are good for navigation
• Easy to implement
Tag Clouds as Self-Descriptions
• Several noted that a tag cloud showing one’s own tags can be evocative
  - A good summary of what one is thinking and reading about
  - Useful for self-reflection
  - Useful for showing others one’s thoughts
    - One example: comparing someone else’s tags to one’s own to see what you have in common, and what special interests differentiate you
    - Useful for tracking changes in friends’ lives
      - Oh, a new girl’s name has gotten larger; he must have a new girlfriend!
Tag Clouds as showing “Trends”
• Several people used this term, that tag clouds show trends in someone’s behavior
  - Trends are usually patterns across time, which are not inherently visible in tag clouds
  - To note a trend using a tag cloud, one must remember what was there at an earlier time, and what changed
    - tracking the girls’ names example
  - This suggests a reason for the importance of the large tags: they draw one’s attention to what is big now versus what used to be large.
  - Suggests also why it doesn’t matter that you can’t see small tags.
New Perspective: Tag Clouds are Social!
• It’s not about the “information”!
• Not surprising in retrospect; tagging is in large part about the social aspect
  - Seems to work mainly when the tags can be seen by many
  - Even better when items can be tagged by many and seen by many
• What does this mean though when tag clouds are applied to non-social information?
Follow-up Study
• Informed by the interview results, we searched for, read, and coded web pages that mentioned tag clouds.
  - Looked at about 140 discussions
  - Developed 21 codes
  - Looked at another 90 discussions
  - Used web queries: “tag clouds”, usability tag clouds, etc.
  - Sampled every 10th url
    - 58% personal blogs
    - 20% commercial blogs
    - 10% commercial web pages
    - rest from group blogs and discussion lists
  - Doesn’t tell us what people who don’t write about
The Role of Popularity
• Popularity in the sense that tag clouds (and tagging) are trendy and popular.
  - Some people liked the visualization, but their popularity made them less appealing
    - Famous post: “Tag clouds are the new mullets”
    - Led to self-consciousness about liking them
  - Many complained about unaesthetic cloud designs
  - Little consensus on if they are a fad or have staying power
• Popularity also in the sense of the large font size for more popular tags
  - Many people like the prominence of large tags, but several commented on the tyranny of the popular
The Role of Navigation
• Opinions vary
  - Many simply state they are useful for navigation, but with no support for this claim
  - Some claim the compactness makes navigation easier than a vertical list
  - Some object that the varying font size hurts scannability
  - Others object to the lack of organization
  - Overall, there is no evidence either way that we could find in the blog community
Aesthetic Considerations
• Disagreement on the aesthetic and emotional appeal, especially for lay users.
• Those who like them find them fun and appealing
• Those who don’t find them messy, strange, like a ransom note
• Informal reports with first time users who are not in the Web 2.0 community are negative
Trends again
• As in the interviews, the benefit of “trends” was mentioned many times.
• There is another sense of “trend” as “tendency or inclination,” and this might be what people mean.
Tag Clouds as Social Information
• An emphasis that tag clouds are meant to show human behavior.
• We found reports of people commenting on other uses that were invalid because they did not reflect live user input:
  - One blogger noted the incongruity of an online library using keyword frequencies in a tag cloud rather than having it reflect patrons’ usage of the collection.
  - An online community noticed one site’s cloud didn’t change over time and realized the sizes were decided by marketing. This was greeted
Implications
• Assume tag clouds are meant to reflect human mental activity (individual or group)
• Then what might seem design flaws from an information conveyance perspective may not be
• A large part of the appeal is the fun and liveliness.
  - The informality of the layout reflects the human activity beneath it.
Conclusions on Tagging
• Social tagging is, in my view, a terrific way to get good content metadata.
• I think automated techniques can do a lot to help clean them up and organize them.
• They are an inherently social phenomenon, part of social media, which is a really exciting area.
• The socialness of social media can yield surprises, like tag clouds.