Text, Tags and Thumbnails: Latest Trends in Bioscience Literature Search
Marti A. Hearst, Associate Professor, UC Berkeley
Special Libraries Association, Pharmaceutical & Health Technologies Division, Spring Meeting, March 22, 2009
Some research reported here was supported by NSF DBI-0317510 and a gift from Genentech.

Tutorial Outline
• Fundamentals of User Interface Design
• Search Interfaces
  - Faceted navigation
  - Specific to bioscience literature: term suggestions, showing figures in search results
• Social Tagging

Marti Hearst SLA’09 Spring Meeting

Let’s get acquainted

Fundamentals of UI Design

Principles of HCI (Human-Computer Interaction)
• Design for the user
  - AKA: user-centered design
  - Not for the designers
  - Not for the system
• Make use of cognitive principles where available
• Important guidelines for search:
  - Reduce memory load
  - Speak the user’s language
  - Provide helpful feedback
  - Respect perceptual principles

What makes for a good/bad user experience?
• Your examples?

My (subtle) example
• Paying my taxes online, March 2009.

Problems
• Biggest problem: I will pay taxes for the wrong year, requiring a very costly repair.
  - They have a special option in the phone tree for this error (proof of a usability problem!)
• Other problems:
  - What does this mean? What do I do?
  - Why are there so few tax forms to choose from, and what the heck are they?

Problems
• Biggest problem (paying taxes for the wrong year):
  - Yes; I made this error last year and it still isn’t fixed!
  - Violates: avoid errors, provide good defaults

Problems
• What does this mean? What do I do?
• Violates:
  - Speak the user’s language
  - Provide help

Problems
• Which form am I selecting?
• What if my choice is missing?
• Entering a form number doesn’t work.
• Violates:
  - Provide useful labels
  - Match the user’s task

User-Centered Design
• Needs assessment
  - Find out who the users are, what their goals are, and what tasks they need to perform
• Task analysis
  - Characterize what steps users need to take
  - Create scenarios of actual use
  - Decide which users and tasks to support
• Iterate between
  - Designing
  - Evaluating

User Interface Design is an Iterative Process
[Figure: iterative cycle linking Design, Prototype, and Evaluate]

Rapid Prototyping
• Build a mock-up of the design
• Low fidelity techniques
  - paper sketches
  - cut, copy, paste
  - video segments

Telebears example

Why Do Prototypes?
• Get feedback on the design faster
• Experiment with alternative designs
• Fix problems before code is written
• Keep the design centered on the user

Evaluation
• Test with real users (participants)
  - Formally or informally
• “Discount” techniques
  - Potential users interact with a paper computer
  - Expert evaluations (heuristic evaluation)
  - Expert walkthroughs

Design Guidelines
• What are they?
  - Rules of thumb for how to design
  - The Bloopers book has many recommendations
• Examples:
  - Provide informative feedback
  - Support recognition over recall
  - Provide for user control and understanding
• Heuristic Evaluation:
  - An expert measures the mock-ups against well-known design guidelines.

Results of Using Heuristic Evaluation
• A single evaluator achieves poor results
  - only finds 35% of usability problems
• 5 evaluators find ~75% of usability problems
• Why not more evaluators? 10? 20?
  - adding evaluators costs more
  - adding more evaluators doesn’t increase the number of unique problems found by much

Decreasing Returns
[Figure: two graphs from Nielsen, one of problems found and one of benefits / cost]
• Caveat: these graphs are for a specific example
• This is a controversial point.

Affordances
• The perceived properties of an object that determine how it can be used. (Don Norman)
  - Knobs are for turning.
  - Buttons are for pushing.
• Some affordances are obvious, some learned
  - Glass can be seen through.
  - Glass breaks easily.
• Sometimes visual plus physical feedback
  - Floppy disk example
  - Rectangular: can’t insert sideways
  - Tabs on the disk prevent the drive from letting it be fully inserted backwards

Affordances of a Teapot?

Affordances of an iPod?

Small Details Matter
• UIs for search especially require great care in small details
  - In part due to the text-heavy nature of search
  - A tension between more information and introducing clutter
• How and where to place things is important
  - People tend to scan or skim
  - Only a small percentage read instructions

Small Details Matter
• Example: In an earlier version of the Google Spellchecker, people didn’t always see the suggested correction
  - Used a long sentence at the top of the page: “If you didn’t find what you were looking for …”
  - People complained they got results, but not the right results.
  - In reality, the spellchecker had suggested an appropriate correction.
Small Details Matter
• The fix: analyzed logs, saw that people didn’t see the correction. They:
  - clicked on the first search result, didn’t find what they were looking for (came right back to the search page)
  - scrolled to the bottom of the page, did not find anything
  - and then complained directly to Google
• Solution was to repeat the spelling suggestion at the bottom of the page.
• More adjustments:
  - The message is shorter, and different on the top vs. the bottom
• Interview with Marissa Mayer by Mark Hurst: http://www.goodexperience.com/columns/02/1015google.html

Time for a Break!

Searching Bioscience Literature

Double Exponential Growth in Bioscience Journal Articles
• From Hunter & Cohen, Molecular Cell 21, 2006

BioText Project Goals
• Provide flexible, useful, appealing search for bioscientists.
• Focus on:
  - Full text journal articles
  - New language analysis algorithms
  - New search interfaces
• Supported by the NSF
• http://biotext.berkeley.edu

The Importance of Figures and Captions
• Observations of biologists’ reading habits:
  - It has often been observed that biologists focus on figures+captions along with title and abstract.
• KDD Cup 2002
  - The objective was to extract only the papers that included experimental results regarding expression of gene products and to identify the genes and products for which experimental results were provided.
  - ClearForest+Celera did well in part by focusing on figure captions, which contain critical experimental evidence.
Our Idea
• Make a full text search engine for journal articles that focuses on showing figures
• Make it possible to search over caption text (and text that refers to captions)
• Try to group the figures intelligently

Developing the BioText Search Interface
• Main idea: a search interface that meets the unique needs of bioscientists.
• Hypothesis: the articles’ figures should be exposed in the interface.
• Process:
  - Did interviews, designed mock-up
  - Made an initial prototype
  - Did a pilot study
  - Used these results to redesign
  - Evaluated the new design
• Results: highly positive responses.

Related Work
• Cohen & Murphy:
  - Parsed structure of image captions
  - Extracted facts about subcellular localization
• Yu et al.:
  - Created a small image taxonomy; classified images according to it with SVMs
• Yu & Lee:
  - BioEx: links sentences from an abstract to images in the same paper; shows those when displaying a paper.
  - Not focused on a full search interface; can’t search over caption text.

Pilot Usability Study
• Primary Goal:
  - Determine whether biological researchers would find the idea of caption search and figure display to be useful or not.
• Secondary Goal:
  - If caption search and figure display are useful, determine how best to support these features in the interface.
Method
• Told participants we were evaluating a new search interface
  - (tip: don’t say “our” interface)
• Asked them to use each design on their own queries (order of presentation was varied)
• Had them fill out a questionnaire after each interface session
• Also had open-ended discussions about the designs

Participants

Captions + Figure View

Captions + Figure & Thumbnails

Results
[Chart: ratings of the Captions + Figure View by participant #; 7 = strongly agree, 1 = strongly disagree]

Results
• 7 out of 8 said they would want to use either CF (Captions + Figure) or CFT (Captions + Figure & Thumbnails) in their bioscience journal article searches
  - The 8th thought figures would not be useful in their tasks
• Many participants noted that caption search would be better for some tasks than others
• Two of the participants preferred CFT to CF; the rest thought CFT was too busy.
  - Best to show all the thumbnails that correspond to a given article after full text search
  - Best to show only the figure that corresponds to the caption in the caption search view

Results, cont.
• All four participants who saw the Grid view liked it, but noted that the metadata shown was insufficient.
• If it were changed to include title and other bibliographic data, 2 of the 4 who saw Grid said they would prefer that view over the CF view.
Current Design
• http://biosearch.berkeley.edu
• Indexes the PubMedCentral open access journal article collection, with more than:
  - 300 journals
  - 129,000 articles
  - 247,000 figures
  - 104,000 tables

Second Study
• Modified, improved interface
• 20 participants
  - 6 grad students, 6 postdocs, 1 faculty, 7 other
  - Cell or molecular biology, genetics or genomics, biochemistry, evolutionary biology, bioinformatics.
  - All use PubMed, most as primary tool

Second Study
• Procedure:
  - Session lasted ~1 hour
  - Participants were shown the interface and its views, and then asked to use it and respond.
  - They then assessed the interfaces explicitly.
• Measures:
  - Focus on subjective responses.
  - Intent to use is a reliable indicator of actual usage. (Venkatesh & Morris 03, Sun & Zhang 06)

How Likely to Use Interface?

Full Text View: Favorable Aspects

Full Text View: Unfavorable Aspects

Figure Caption Views: Favorable Aspects

Figure Caption Views: Unfavorable Aspects

Table View: Favorable Aspects

Table View: Unfavorable Aspects

Showing Related Terms in Bioscience Literature Search
Needs assessment and low-fi evaluation

First Questionnaire
• General information about how they search and what related information they want to see.
• 38 participants
  - 22 grad students, 6 postdocs, 5 faculty, 5 other
  - Systems biology, bioinformatics, genomics, biochemistry, cellular and evolutionary biology, microbiology, physiology, …

Participants’ Characteristics

Results
Rating scale: 1 (Do NOT want this), 2, 3 (Neutral), 4, 5 (REALLY want this)

Related Information Type | Avg rating | # selecting 1 or 2
Gene’s Synonyms | 4.4 | 2
Gene’s Synonyms refined by organism | 4.0 | 2
Gene’s Homologs | 3.7 | 5
Genes from same family: parents | 3.4 | 7
Genes from same family: children | 3.6 | 4
Genes from same family: siblings | 3.2 | 9
Genes this gene interacts with | 3.7 | 4
Diseases this gene is associated with | 3.4 | 6
Chemicals/drugs this gene is associated with | 3.2 | 8
Localization information for this gene | 3.7 | 3

Second Questionnaire
• Evaluating 4 designs for gene/protein name suggestions
• 19 participants
  - 4 grad students, 7 postdocs, 3 faculty, 5 other
  - Wide range of specializations

Design 1: Baseline

Design 2: Links

Design 3: Checkboxes

Design 4: Grouped Links

Results

Design | # who rated design 1st or 2nd | % who rated design 1st or 2nd | Average rating (1=low, 4=high)
3 (checkboxes) | 15 | 79 | 3.3
4 (grouped links) | 10 | 53 | 2.6
2 (links) | 9 | 47 | 2.5
1 (baseline) | 0 | 0 | 1.6

Results: More Detail
• Strong desire for the search system to suggest information closely related to gene/protein names.
• Some interest in less closely related information.
• Most participants want to see organism names in conjunction with gene names.
• A majority of participants prefer to see term suggestions grouped by type (synonyms, homologs, etc).
Results: More Detail
• Split in preference between single-click hyperlink interaction (categories or single terms) and checkbox-style interaction.
• The majority of participants prefer to have the option to choose either individual names or whole groups with one click.
• Split in preference between the system suggesting only names that it is highly confident are related and also including names that it is less confident about under a “show more” link.

Summary: BioText Search Studies
• Nearly all participants strongly desire
  - Full text search
  - Figure display in search results
• Impediments to adoption
  - Needs to index all articles
  - Needs to be in the primary search tool(s)
• Participants also want to see term suggestions that are closely related to their query.

Time for a Break!

More on Search Interfaces

Useful Search Interface Tropes
• Dynamic query term suggestions
  - Others’ queries
  - Metadata or text from the collection

Useful Search Interface Tropes
• Grouping of retrieval results
  - By meaningful categories
  - By genre

NextBio

Oops …

Faceted Navigation
Improving collection search interfaces

What We Want to Achieve
• Integrate browsing and searching seamlessly
• Support exploration and learning
• Avoid dead-ends, “pogo’ing”, and “lostness”

Main Idea
• Use hierarchical faceted metadata
• Design the interface to:
  - Allow flexible navigation
  - Provide previews of next steps
  - Organize results in a meaningful way
  - Support both expanding and refining the search
The Problem With Hierarchy
• Most things can be classified in more than one way.
• Most organizational systems do not handle this well.
• Example: Animal Classification
[Figure: the same animals (robin, penguin, otter, seal, salmon, wolf, cobra, bat) grouped three different ways: by Skin Covering, by Locomotion, and by Diet]

The Problem with Hierarchy
• Inflexible
  - Forces the user to start with a particular category
  - What if I don’t know the animal’s diet, but the interface makes me start with that category?
• Wasteful
  - Have to repeat combinations of categories
  - Makes for extra clicking and extra coding
• Difficult to modify
  - To add a new category type, must duplicate it everywhere or change things everywhere

The Problem With Hierarchy
[Figure: a single tree that begins at “start” and branches by locomotion (swim, fly, run, slither, …), then by skin covering (fur, scales, feathers), then by diet (fish, rodents, insects), repeating the same categories under every branch before reaching animals such as salmon, bat, robin, and wolf]

The Idea of Facets
• Facets are a way of labeling data
  - A kind of metadata (data about data)
  - Can be thought of as properties of items
• Facets vs.
Categories
  - Items are placed INTO a category system
  - Multiple facet labels are ASSIGNED TO items

The Idea of Facets
• Create INDEPENDENT categories (facets)
  - Each facet has labels (sometimes arranged in a hierarchy)
• Assign labels from the facets to every item
  - Example: bioscience journal articles
  - Drug: Rx1, Rx2, Rx3
  - Disease: Glaucoma
  - Anatomy: Eye
  - Species: Zebrafish

Example: Nobel Prize Winners Collection (Before and After Facets)

Only One Way to View Laureates

First, Choose Prize Type

Next, view the list!
• The user must first choose an award type (literature), then browse through the laureates in chronological order.
• No choice is given to, say, organize by year and then award, or by country, then decade, then award, etc.

Flamenco Interface: Using Hierarchical Faceted Metadata

Opening View

Select literature from PRIZE facet

Group results by YEAR facet

Select 1920’s from YEAR facet

Current query is PRIZE > literature AND YEAR: 1920’s.
Now remove PRIZE > literature

Now Group By YEAR > 1920’s

Hierarchy Traversal: Group By YEAR > 1920’s, and drill down to 1921

Select an individual item

Use Endgame to expand out

Or use “More like this” to find similar items

Start a new search using keyword “California”

Note that category structure remains after the keyword search

The query is now a keyword ANDed with a facet subhierarchy

Advantages of Faceted Navigation
• Gives users control and flexibility
• Can’t end up with empty result sets
  - (except with keyword search)
• Helps avoid feelings of being lost.
• Easier to explore the collection.
  - Helps users infer what kinds of things are in the collection.
  - Evokes a feeling of “browsing the shelves”
• Is preferred over standard search for collection browsing in usability studies.
  - (Interface must be designed properly)

Advantages of Faceted Metadata
• Helps alleviate the metadata wars:
  - Allows for both splitters and lumpers
    Is this a bird or a robin? Doesn’t matter, you can do both!
  - Allows for differing organizational views
    Does NASCAR go under sports or entertainment? Doesn’t matter, you can do both!

MeSH (Medical Subject Headings)
• NLM’s MeSH category labels are assigned to Medline articles
• But it is hard to browse.
• We converted it to a faceted structure, but haven’t used it yet.
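Sketch: faceted navigation in code
The walkthrough above (assign facet labels to items, AND a keyword with facet selections, preview the next steps) can be illustrated with a small Python sketch. This is not the Flamenco implementation; the record layout, the function names (`has_label`, `search`, `preview_counts`), and the five sample records are assumptions made only for illustration.

```python
from collections import Counter

# Toy collection: every item carries labels from independent facets.
# A label is a path (tuple) so a facet can be hierarchical, e.g.
# YEAR > 1920s > 1921. Illustrative records, not the Flamenco data.
ITEMS = [
    {"name": "Albert Einstein", "text": "theoretical physics",
     "facets": {"PRIZE": [("physics",)], "YEAR": [("1920s", "1921")]}},
    {"name": "Niels Bohr", "text": "structure of atoms",
     "facets": {"PRIZE": [("physics",)], "YEAR": [("1920s", "1922")]}},
    {"name": "W. B. Yeats", "text": "Irish poetry",
     "facets": {"PRIZE": [("literature",)], "YEAR": [("1920s", "1923")]}},
    {"name": "Thomas Mann", "text": "German novels",
     "facets": {"PRIZE": [("literature",)], "YEAR": [("1920s", "1929")]}},
    {"name": "John Steinbeck", "text": "novels set in California",
     "facets": {"PRIZE": [("literature",)], "YEAR": [("1960s", "1962")]}},
]


def has_label(item, facet, path):
    """True if one of the item's labels in `facet` lies at or below `path`."""
    return any(label[:len(path)] == path
               for label in item["facets"].get(facet, []))


def search(items, selections=None, keyword=None):
    """AND together an optional keyword and one selected path per facet."""
    selections = selections or {}
    results = []
    for item in items:
        if keyword and keyword.lower() not in item["text"].lower():
            continue
        if all(has_label(item, f, p) for f, p in selections.items()):
            results.append(item)
    return results


def preview_counts(results, facet, depth=1):
    """Count results under each label of `facet`, truncated to `depth` levels.

    Only labels with at least one result appear, so following a preview
    link cannot lead to an empty result set.
    """
    counts = Counter()
    for item in results:
        labels = item["facets"].get(facet, [])
        for prefix in {lab[:depth] for lab in labels if len(lab) >= depth}:
            counts[prefix] += 1
    return dict(counts)


# PRIZE > literature AND YEAR > 1920s, then remove the PRIZE constraint.
refined = search(ITEMS, {"PRIZE": ("literature",), "YEAR": ("1920s",)})
expanded = search(ITEMS, {"YEAR": ("1920s",)})
print([r["name"] for r in refined])
print(preview_counts(expanded, "PRIZE"))
```

Because each facet is stored separately on the item, removing one selection (expanding) or adding another (refining) is just a change to the `selections` dictionary, with no fixed order in which the facets must be chosen.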
Aquabrowser: Faceted Navigation in a DL
• Shown on lens.lib.uchicago.edu

Results after Refinement

Time for a Break!

Tags and other Social Media

Two Main Points
1. Massive user behavior is aiding search algorithms in interesting ways.
2. Going deeper: an examination of social tagging:
  - The controversy
  - Research questions
  - Our work on automating creation of metadata structure

User-contributed content is exploding!

Social Information & Search
• Trend: human behavioral information is getting “baked in” to search algorithms.
• In many cases, the actions of the many are more useful than the actions of the individual.
• Three examples follow.

Actions of the Many vs. the Individual
1. Anchor text for improved ranking.
  - vs. author-supplied meta-tags

Actions of the Many vs. the Individual
2. “Clickthrough” to improve ranking.
  - vs. an individual’s prior clicks
  - Joachims et al. and Agichtein et al. found that human selections of links from search results could improve rankings for popular queries.
  - Some surprising rules: assign negative weight to an unclicked link that appears above and below a clicked link

Actions of the Many vs. the Individual
3. Query auto-suggest based on other users’ queries
  - vs. based on one’s prior queries alone

Social Tagging
• Metadata assignment without all the bother
• Spontaneous, easy, low cognitive overhead
• Usually used in the context of social media

Popular pages on del.icio.us

Visitor tagging at Powerhouse Museum

Tagging is Controversial!
• Sloppy!
• Disorganized!
• Incorrect!
• Power to the people!
• Easy!
• Cheap!
• Professional cataloguer: “Everything I know isn’t in the picture!”
  - “Investigating social tagging and folksonomy in the art museum with steve.museum”, J. Trant, B. Wyman, WWW 2006 Collaborative Tagging Workshop

The Tagging Opportunity
• At last! Content-oriented metadata in the large!
• Attempts at metadata standardization always end up with something like the Dublin Core
  - author, date, publisher, ....
• I think the action is in the subject metadata, and have focused on how to navigate collections given such data.

The Tagging Opportunity
• Tags are inherently faceted!
  - Multiple labels are assigned to each item
    Rather than placing them into a folder
    Rather than placing them into a hierarchy
  - Concepts are assigned from many different content categories

Tagging Problems
• The haphazard assignments lead to problems with
  - Synonymy
  - Homonymy
  - Unpredictability
• See how this author attempts to compensate:

Tagging Problems
• Some tags are fleeting in meaning or too personal
  - toread
  - todo
• Tags don’t “cover” all the concepts
• Tags are disorganized
• Tags are not “professional”
  - (I personally don’t think this matters)

Research Questions for Tags & Search
• How to improve tag convergence?
• How to group tags meaningfully?
• How to eliminate uninteresting tags?
• What is the role of user interface on tag convergence?
  - Preliminary evidence suggests there is a big effect
  - There are some good ideas out there
  - More experimentation is needed.
• What algorithms can we use to clean up the tags after they are assigned?
  - There is some work here, much more can be done.
  - TagAssist: Automatic Tag Suggestion for Blog Posts, Sood et al., ICWSM 2007

Interface for adding tags on del.icio.us

Effects of Interface
• On the Structure, Properties and Utility of Internal Corporate Blogs, Kolari et al., ICWSM 2007

Research Questions for Tags & Search
• How to get tag expertise?
  - Who will identify the plant species in this image?
  - [Image with tags: office, desk, plants, windows, shadows]

Research Questions for Tags & Search
• What is the relationship of social tags to automated content extraction?
• Are tags more informative, or differently informative, than other labeling methods?

Research Questions for Tags & Society
• What motivates people to tag?
• Who owns the tags?
• Privacy and sharing of tags?

Research Questions for Tags & Search
• How to use tags for browsing / navigation?
  - Currently most tags are used as a direct index into items
    Click on tag, see items assigned to it, end of story
  - Grouping into small hierarchies is not usually done
    del.icio.us now has bundles, but navigation isn’t good
    IBM’s dogear comes the closest
• One solution: organize tags into faceted hierarchies, use faceted navigation.

How to Create Faceted Hierarchies?
• Our Approach: Castanet (Stoica, Hearst, & Richardson, HLT-NAACL ’07)

Example: Biology Journal Titles
• Castanet Output (shown in Flamenco)

Castanet Algorithm
• Leverage the structure of WordNet
[Figure: pipeline from Documents through: Select terms, Get hypernym paths (from WordNet), Build tree, Compress tree, Divide into facets]

Will Castanet Work on Tags?
• Class project by Simon King and Jeff Towle, 2004
• 1650 captions captured from mobile phones
• Wanted to organize them.
• Used the Castanet algorithm
  - Had to first remove proper names

Example Photos & Captions (King & Towle)
• very scary x-mas tree
• chasing a cat in the dark
• Hp presentation
• My cat

[Castanet output for the captions: a hierarchy rooted at “instrumentality” (112), flattened in extraction; item counts in parentheses]
vehicle (26), mayflower (2), ferry (1), gig (1), truck (3), airplane (2), machine (7), computer (4), laptop (1), sander (1), can (2), backpack (1), bumper (1), empty (1), salt_shaker (1), furniture, piece of furniture, article of furniture (12), seat (8), bottle (5), water_bottle (2), jug (1), pill_bottle (1), bath (2), bowl (1), device (20), container (16), vessel (7), car (9), bike (8), vessel, watercraft (4), bench (2), chair (2), couch (2), lounge (1), bed (4), desk (1)

Tag Clouds Explained

What does a typical tag cloud look like?

Definition
• Tag Cloud: A visual representation of social tags, organized into paragraph-style layout, usually in alphabetical order, where the relative size and weight of the font for each tag corresponds to the relative frequency of its use.

del.icio.us

blogs

I was puzzled by the questions:
• What are designers’ and authors’ intentions in creating or using tag clouds?
• How do they expect their readers to use them?

On the positive side:
• Compact
• You get three dimensions simultaneously!
• Draws the eye towards the most frequent (important?)
tags
• The three dimensions: alphabetical order, size indicating importance, the tags themselves

Weirdnesses
• Initial encounters unencouraging
• Some reports from industry:
  - Is the computer broken?
  - Is this a ransom note?

Violates Principles of Perceptual Design
• Eye moves around erratically
• Longer words grab more attention
• White space caused by ascenders & descenders isn’t meaningful
• Proximity doesn’t hold meaning
• Paragraph position has saliency effects
• Should allow for visual comparisons (Tufte)

Weirdnesses
• Meaningful associations are lost
  - Where are the different country names in this tag cloud?

Weirdnesses
• Which operating systems are mentioned?

Two Studies of Use in Information Analysis
• Both found that the spatial organization and varying font sizes were inferior for:
  - Finding items in a list
  - Getting the gist of the tags

Interviews
• I was really confused about tag clouds, so I decided to ask the people behind the puffs
  - 15 interviews, conducted at foocamp’06
    Several web 2.0 leaders
  - 5 more interviews at Google and Berkeley

A Surprise
• 7 interviewees DID NOT REALIZE that alphabetical ordering is standard.
• 2 of these people were in charge of such sites but had had others write the code
• What was the answer given to “what order are tags shown in?”
  - hadn’t thought about it
  - don’t think about tag clouds that way
  - random order
  - ordered by semantic similarity
• Suggests that perhaps people are too distracted by the layout to use the alphabetical ordering

Suggested main purposes:
• To signal the presence of tags on the site
• To show what kinds of information are on the site
• A good way to get the gist of the site
• An inviting and fun way to get people interacting with the site
• Some of these said they are good for navigation
• Easy to implement

Tag Clouds as Self-Descriptions
• Several noted that a tag cloud showing one’s own tags can be evocative
  - A good summary of what one is thinking and reading about
  - Useful for self-reflection
  - Useful for showing others one’s thoughts
    One example: comparing someone else’s tags to one’s own to see what you have in common, and what special interests differentiate you
  - Useful for tracking changes in friends’ lives
    Oh, a new girl’s name has gotten larger; he must have a new girlfriend!

Tag Clouds as showing “Trends”
• Several people used this term, saying that tag clouds show trends in someone’s behavior
  - Trends are usually patterns across time, which are not inherently visible in tag clouds
  - To note a trend using a tag cloud, one must remember what was there at an earlier time, and what changed (as in the example of tracking the girls’ names)
  - This suggests a reason for the importance of the large tags: they draw one’s attention to what is big now versus what used to be large.
  - Suggests also why it doesn’t matter that you can’t see small tags.

New Perspective: Tag Clouds are Social!
• It’s not about the “information”!
• Not surprising in retrospect; tagging is in large part about the social aspect
  - Seems to work mainly when the tags can be seen by many
  - Even better when items can be tagged by many and seen by many
• What does this mean though when tag clouds are applied to non-social information?

Follow-up Study
• Informed by the interview results, we searched for, read, and coded web pages that mentioned tag clouds.
  - Looked at about 140 discussions
  - Developed 21 codes
  - Looked at another 90 discussions
  - Used web queries: “tag clouds”, usability tag clouds, etc.
  - Sampled every 10th url
• Sources:
  - 58% personal blogs
  - 20% commercial blogs
  - 10% commercial web pages
  - rest from group blogs and discussion lists
• Doesn’t tell us what people who don’t write about

The Role of Popularity
• Popularity in the sense that tag clouds (and tagging) are trendy and popular.
  - Some people liked the visualization, but their popularity made them less appealing
    Famous post: “Tag clouds are the new mullets”
    Led to self-consciousness about liking them
  - Many complained about unaesthetic cloud designs
  - Little consensus on whether they are a fad or have staying power
• Popularity also in the sense of the large font size for more popular tags
  - Many people like the prominence of large tags, but several commented on the tyranny of the popular

The Role of Navigation
• Opinions vary
  - Many simply state they are useful for navigation, but with no support for this claim
  - Some claim the compactness makes navigation easier than a vertical list
  - Some object that the varying font size hurts scannability
  - Others object to the lack of organization
  - Overall, there is no evidence either way that we could find in the blog community

Aesthetic Considerations
• Disagreement on the aesthetic and emotional appeal, especially for lay users.
• Those who like them find them fun and appealing
• Those who don’t find them messy, strange, like a ransom note
• Informal reports with first time users who are not in the Web 2.0 community are negative

Trends again
• As in the interviews, the benefit of “trends” was mentioned many times.
• There is another sense of “trend” as “tendency or inclination,” and this might be what people mean.

Tag Clouds as Social Information
• An emphasis that tag clouds are meant to show human behavior.
• We found reports of people commenting on other uses that were invalid because they did not reflect live user input:
  - One blogger noted the incongruity of an online library using keyword frequencies in a tag cloud rather than having it reflect patrons’ usage of the collection.
  - An online community noticed one site’s cloud didn’t change over time and realized the sizes were decided by marketing. This was greeted

Implications
• Assume tag clouds are meant to reflect human mental activity (individual or group)
• Then what might seem design flaws from an information conveyance perspective may not be
• A large part of the appeal is the fun and liveliness.
  - The informality of the layout reflects the human activity beneath it.

Conclusions on Tagging
• Social tagging is, in my view, a terrific way to get good content metadata.
• I think automated techniques can do a lot to help clean them up and organize them.
• They are an inherently social phenomenon, part of social media, which is a really exciting area.
• The socialness of social media can yield surprises, like tag clouds.
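
Sketch: computing a tag cloud
As a concrete companion to the tag cloud definition given earlier in this tutorial (alphabetical order, font size tied to relative frequency of use), here is a minimal Python sketch. The linear scaling, the point-size range, the function name `tag_cloud`, and the sample counts are all assumptions for illustration; real sites may scale sizes differently (for example logarithmically).

```python
def tag_cloud(counts, min_pt=10, max_pt=32):
    """Return (tag, font_size_pt) pairs in alphabetical order.

    Font size grows linearly with how often a tag was used, from
    min_pt for the least-used tag to max_pt for the most-used one.
    Linear scaling is an assumption of this sketch.
    """
    if not counts:
        return []
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    cloud = []
    for tag in sorted(counts, key=str.lower):  # alphabetical, case-insensitive
        # If every tag has the same count, show them all at full size.
        scale = (counts[tag] - lo) / span if span else 1.0
        cloud.append((tag, round(min_pt + scale * (max_pt - min_pt))))
    return cloud


# Hypothetical usage counts for three tags.
for tag, size in tag_cloud({"toread": 1, "design": 12, "web": 23}):
    print(f"{tag}: {size}pt")
```

Note that the output carries only order and size; nothing in it encodes which tags are related, which is one way to see why proximity in a tag cloud does not hold meaning.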