Organizing Search Results
Susan Dumais, Microsoft Research
Sackler – May 11, 2003

Organizing Search Results
  Algorithms and interfaces that improve the effectiveness of search
    Beyond ranked lists
    Main goal: to support search
    Also information analysis and discovery
  Example applications
    SWISH: results classification
    GridViz: results summarization
    SIS: personal landmarks for context

Searching with Information Structured Hierarchically (SWISH)
  Collaborators
    Edward Cutrell, Hao Chen (Berkeley)
  Key Themes
    Going beyond long lists of results
    Classification algorithms
    UI techniques
  More about it
    http://research.microsoft.com/~sdumais

Organizing Search Results
  Query: “jaguar”
  List Organization
    => Shopping
    => Automotive
    => Computers
    => Automotive
  SWISH Category Organization

Web Directory
  LookSmart Directory Structure
    ~400k pages; 17k categories; 7 levels
    13 top-level categories; 150 second-level categories
  Top-level Categories
    Automotive
    Business & Finance
    Computers & Internet
    Entertainment & Media
    Health & Fitness
    Hobbies & Interests
    Home & Family
    People & Chat
    Reference & Education
    Shopping & Services
    Society & Politics
    Sports & Recreation
    Travel & Vacations
  Example second-level categories (Automotive)
    Buy or Sell a Car
    Chat
    Finance & Insurance
    Magazines & Books
    Maintenance & Repair
    Makes, Models & Clubs
    Motorcycles
    New Car Showrooms
    Off-Road, 4X4 & RVs
    Other Auto Interests
    Shows & Museums
    Trucks & Tractors
    Vintage & Classic

SWISH System
  Combines the advantages of
    Directories: manually crafted structure but small (~3 million pages)
    Search engines: broad coverage but limited metadata (~3 billion pages)
  Project search engine results to category structure
  Two main components
    Text classification models
    UI for integrating search results and structure
  Context (category structure) plus focus (search results)

SWISH Architecture
  [Diagram labels: Train (offline); manually classified web pages; Classify (online); SVM model; web search results; local search results; ...]

Learning & Classification
  Support Vector Machine (SVM)
    Accurate and efficient for text classification (Dumais et al., Joachims)
    Model = weighted vector of words
      “Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche …
      “Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads ...
  Hierarchical models for the LookSmart (LS) directory
    1 model for the top level; N models for the second level
    Very useful in conjunction w/ user interaction

User Interface Experiments
  [Table of interface conditions. Labels: List Organization, Category Organization; No Cat Names, + Cat Names; Browse, Hover, Inline]
  [Screenshots: Group Interface, List Interface]

Effect of Query Difficulty
  [Chart: Group vs. List interface, hard vs. easy queries]
  Easy queries are faster (p<0.01)
  Group faster than List (p<0.01)
  Benefit is larger for hard queries (p<0.06)

SWISH: Summary and Design Implications
  Text Classification
    Learn accurate category models
    Classify new web pages on-the-fly
    Organize search results
  User Interface
    Tightly couple search results with category structure
    User manipulation of presentation of category structure

GridViz
  Collaborators
    George Robertson, Edward Cutrell, Jeremy Goecks (Georgia Tech)
  Key Themes
    Abstract beyond individual results
    Highly interactive interface to support understanding of trends and relationships
  More about it
    http://research.microsoft.com/~sdumais

GridViz
  Summarize the results of a search
  Grid-based design
    Axes represent topic, time, people
    Cells encode frequency, recency
  Supports activities like:
    What newsgroups are active (on topic x)?
    What people are active, authoritative (on topic x)?
    When did I last interact w/ people?
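
The grid-based design above (axes for topic, time, or people; cells that encode frequency and recency) can be illustrated with a small aggregation sketch. This is not the GridViz implementation: the Result fields, the summarize function, and the sample data are hypothetical, and the sketch covers only the summarization step, not the interactive display.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Result:
    """One search result; the field names are hypothetical."""
    topic: str
    person: str
    seen: date


@dataclass
class Cell:
    """Summary of all results that fall into one grid cell."""
    frequency: int = 0
    most_recent: Optional[date] = None


def summarize(results, row_key, col_key):
    """Group results into a grid keyed by (row value, column value).

    row_key and col_key map a Result to an axis value, so the same code
    can put topic, time, or people on either axis.
    """
    grid = defaultdict(Cell)
    for r in results:
        cell = grid[(row_key(r), col_key(r))]
        cell.frequency += 1
        if cell.most_recent is None or r.seen > cell.most_recent:
            cell.most_recent = r.seen
    return dict(grid)


# Made-up sample data, for illustration only.
results = [
    Result("search", "alice", date(2003, 4, 2)),
    Result("search", "alice", date(2003, 5, 1)),
    Result("search", "bob", date(2003, 3, 15)),
    Result("visualization", "bob", date(2003, 5, 9)),
]

# People on the rows, topics on the columns.
grid = summarize(results, lambda r: r.person, lambda r: r.topic)
for (person, topic), cell in sorted(grid.items()):
    print(person, topic, cell.frequency, cell.most_recent)
```

Swapping the key functions (for example, lambda r: r.seen.month for a time axis) changes what the grid summarizes without touching the aggregation code.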

GridViz Demo

User Interface Experiments
  [Chart: List View vs. GridViz; axis scale 0 to 40, measure not labeled]

GridViz Summary
  Abstracting beyond individual results
  Highly interactive interface
  Grid-based design
    Axes represent people, topic, time
    Cells encode frequency, recency
  Preliminary but promising

Stuff I’ve Seen (SIS)
  Collaborators
    Edward Cutrell, Raman Sarin, JJ Cadiz, Gavin Jancke, Daniel Robbins, Merrie Ringel (Stanford)
  Key Themes
    Your content
    Information re-use
    Integration across sources
  More about it … internal for now

Search Today …
  Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, intranet)
  Often slow

Search with SIS
  Unified index of stuff you’ve seen
    Unify access to information regardless of source – mail, archives, calendar, files, web pages, etc.
    Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)
    Automatic and immediate update of index
    Rich UI possibilities, since it’s your content
  Architecture
    Client side indexing and storage
    Built using MS Search components

SIS Demo

SIS Alpha Observations
  800+ internal users
  Usage logs (incl. different interfaces), survey data
  File types opened
    76% Email
    14% Web pages
    10% Files
  Age of items accessed
    7% today
    22% within the last week
    46% within the last month
  [Chart: Item Access Distribution; x-axis: Days Since Item First Seen (0 to 2500); y-axis: Frequency (0 to 120)]

SIS Alpha Observations
  Use of other search tools
    Non-SIS search for web, email, and files decreases
    [Chart: Pre-usage vs. Post-usage for Files, Email, Web Pages; scale 0 to 6]
  Importance of people
    25% of the queries involve people’s names
  Importance of time
    Date by far the most popular sort field, followed by rank, author, title
    Even when rank is the default

SIS UI Innovations: Timeline w/ Landmarks
  Importance of time
    Timeline interface
  Contextualize results using important landmarks as pointers into human memory
    General: holidays, world events
    Personal: important photos, appointments

Milestones in Time Demo

Milestones in Timeline
  [Chart: Search Time (s), scale 0 to 30; Landmarks + Dates vs. Dates Only]

SIS Summary
  Unified index of stuff you’ve seen
    Fast access to full-text and metadata, from heterogeneous sources
    Automatic and immediate update of index
    Rich UI possibilities
  Next steps
    Better support for tagging -> “flatland”
    Implicit queries for finding related info, and identifying “Stuff I Should See”
    Integration with richer activity-based info, Eve

Organizing Search Results
  Algorithms and interfaces to improve search
  Examples and key themes
    SWISH … grouping
    GridViz … abstraction
    SIS … personal content and landmarks
    Important attributes: People, topics, time
    Interaction
    Evaluation
  Also
    Use structure and context
  More information
    http://research.microsoft.com/~sdumais
    sdumais@microsoft.com
  Christopher Lee of (SIG)IR … http://www.cdvp.dcu.ie/SIGIR/index.html
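
The recap names grouping as the SWISH theme. As a rough illustration of projecting search results onto a category structure with a "weighted vector of words" per category, here is a minimal sketch. It only applies a linear model of the kind a linear SVM produces; it does not train one. The weight values, the THRESHOLD cutoff, the "Uncategorized" fallback, and all function names are assumptions: the slides list the words in each category model but give no weights, and in a real system the weights would be learned from the manually classified directory pages.

```python
import re

# Hypothetical weights; the words come from the slides, the numbers do not.
CATEGORY_MODELS = {
    "Automotive": {
        "car": 1.2, "auto": 1.0, "vehicle": 0.9, "porsche": 0.8, "parts": 0.5,
    },
    "Computers & Internet": {
        "software": 1.1, "windows": 0.9, "pc": 0.9, "os": 0.8, "downloads": 0.6,
    },
}
THRESHOLD = 0.5  # assumed minimum score for assigning a category


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(text, weights):
    """Linear score: sum the weights of the distinct model words present."""
    return sum(weights.get(word, 0.0) for word in set(tokenize(text)))


def classify(text):
    """Return the best-scoring category, or 'Uncategorized' below the cutoff."""
    best_category, best_score = "Uncategorized", THRESHOLD
    for category, weights in CATEGORY_MODELS.items():
        s = score(text, weights)
        if s > best_score:
            best_category, best_score = category, s
    return best_category


def group_results(snippets):
    """Group result snippets by predicted category, keeping the input order."""
    groups = {}
    for snippet in snippets:
        groups.setdefault(classify(snippet), []).append(snippet)
    return groups


# Made-up result snippets for the ambiguous query "jaguar".
snippets = [
    "Jaguar car dealers: new vehicle showroom and parts",
    "Jaguar OS software downloads for your PC",
    "Jaguar facts: habitat and diet of the big cat",
]

for category, items in group_results(snippets).items():
    print(category)
    for item in items:
        print("   ", item)
```

A hierarchical version in the spirit of the slides would apply one such model set at the top level and a second set within the chosen top-level category.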