Understanding Web Searching
Secondary Readings and So On…
Will Meurer, for WIRED, October 7, 2004

Introduction
• Why do we care about how people use the Web?
• Today's topics (for 10/7, not the present age):
  – Implicit vs. explicit feedback
  – Representation effectiveness
  – Browser-based activities
  – History mechanisms
  – How do we cater to the people?
  – Resources
  – Research

Implicit vs. Explicit Feedback: Reading Time, Scrolling and… (Kelly & Belkin, 2001)
• The implicit-feedback hypothesis (Morita & Shinoda): time spent on a page is directly related to user interest. Backed by many later studies.
• This study, which checked reading time against explicit relevance judgments: time spent on a page was similar for relevant and irrelevant content (see the sketch below).
• Results suggest:
  – "Generalizability" is severely affected by explicit feedback methods.
  – Spend time choosing the right feedback type!
• Why do the results differ?
  – Relevance was difficult to distinguish this time.
  – Participants in the earlier studies were genuinely interested in the content.
  – Users may have rushed to complete the tasks in this experimental context.
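To make the reading-time signal concrete, here is a minimal sketch (not from either paper) of logging dwell time per page and checking it against explicit relevance judgments. The event format, URLs, and the 30-second threshold are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical click-log entries: (timestamp in seconds, url).
events = [
    (0, "https://example.com/a"),
    (45, "https://example.com/b"),
    (160, "https://example.com/c"),
    (200, "https://example.com/d"),
]

# Explicit judgments collected separately (assumed format).
explicit_relevance = {
    "https://example.com/a": True,
    "https://example.com/b": False,
    "https://example.com/c": True,
}

def dwell_times(events):
    """Approximate time on each page as the gap until the next event."""
    times = defaultdict(float)
    for (t, url), (t_next, _) in zip(events, events[1:]):
        times[url] += t_next - t
    return times

times = dwell_times(events)

# Implicit signal per the Morita & Shinoda hypothesis: long dwell => interest.
DWELL_THRESHOLD = 30.0  # seconds; arbitrary illustrative cutoff
implicit_relevant = {url for url, t in times.items() if t >= DWELL_THRESHOLD}

# Kelly & Belkin's question: does the implicit signal actually agree with
# what users explicitly judged to be relevant?
for url, judged_relevant in explicit_relevance.items():
    inferred = url in implicit_relevant
    print(f"{url}: explicit={judged_relevant}, inferred-from-dwell={inferred}")
```

In this toy data, page "b" gets a long dwell time but a negative explicit judgment, which is exactly the kind of disagreement the study highlights.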
Representation Effectiveness: How We Really Use the Web (Krug, 2000)
Three "facts of life":
1. "We don't read pages. We scan them."
  – Why? Hurry, necessity, habit.
  – If we intend to read a page in its entirety, we save or print it! (cf. the ClearType project)
2. "We don't make optimal choices. We satisfice."
  – Why? Hurry, quick access back and forth, less work than thinking.
  – Generally, it's more productive to guess.
3. "We don't figure out how things work."
  – Why? Not important; "if it ain't broke (baroque)…"
  – Is it important to us whether the user understands how it works or not? Why?

Representation Effectiveness: Cognitive Strategies in Web… (Navarro-Prieto et al., 1999)
• Users get lost on the Web. Why?
• It is not just the interactivity between user and system; it is the interplay of user, task, and information.
• An analysis structure for browsing behavior is presented and tested: "The Interactivity Framework," or "How we should analyze cognitive strategies."
• The Interactivity Framework
  – User level: Web experience, cognitive processes, cognitive style, knowledge (e.g. CS majors knew more about search-engine processes).
  – User strategies: based on the structure of the search space (or the lack of it) and the nature of the task.
• Example tasks by searching condition:
  – Dispersed structure
    • Fact finding: look for a database algorithm in Java; look for criteria for the diagnosis of diseases.
    • Exploratory: find all the available jobs for a profession.
  – Category structure
    • Fact finding: look for a word definition.
    • Exploratory: find all information about the 1997 Nobel Prize for Literature.
• Information structure
  – Internal (the user's) representation vs. external (the system's) representation.
  – Computational offloading: how much work does the user have to do to understand, and how much does the representation help?
  – Re-representation: how much a representation makes problem solving easier or more difficult.
  – Graphical constraining: how a representation constrains inferences.
  – Temporal and spatial constraining: how a representation helps when information is distributed over time and space.
• Observed strategies by task, information structure, and experience:
  – Specific fact finding (e.g. find criteria for a psychological disease)
    • Experienced Web participants: bottom-up (dispersed structure); a mixed strategy at the beginning, then bottom-up (category structure).
    • Novice Web participants: start top-down and change to bottom-up at the end; or start typing without knowing why.
  – Exploratory (e.g. find a job opening)
    • Experienced Web participants: top-down (dispersed structure); a mixed strategy at the beginning, then top-down (category structure).
    • Novice Web participants: top-down, following browser categories; or start bottom-up and change to top-down.
• More results
  – Experienced users searched with a plan; having a plan keeps the representation more internal and focuses the search.
  – Inexperienced users were more influenced by external representations.
  – Computational-offloading results must be explained: how have these issues changed?
• Conclusions
  – The cognitive strategies used by the participants depend on how the information is structured.
  – Interaction is a multi-dimensional concept.
  – Search-engine interfaces should be designed with less restrictive external representations.

Browser-based Activities: Characterizing Browsing… (Catledge & Pitkow, 1995)
• User study of browsing events at Georgia Tech (xMosaic browser).
• Three main browsing strategies identified:
  – Search browsing: directed search, goal known.
  – General-purpose browsing: consulting highly likely sources for needed information (e.g. dictionary.com).
  – Serendipitous browsing: random.
  – Most people use a combination of these.
• Results
  – Users were patient 99% of the time for long page loads.
  – 1,222 unique sites were accessed outside of GATech (~16% of Web servers).
  – Paths (sequences of page navigation) were calculated (see the sketch below):
    • Per session, paths of 7 different sites occurred 5 times.
    • Per user, paths of 8 different sites occurred 9 times.
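As an illustration of how such navigation paths can be derived from a click log (this is a sketch, not the authors' code), the following splits a time-ordered log into sessions at long idle gaps and counts recurring site-level paths. The log entries and helper names are assumptions; the roughly 25.5-minute session cutoff is the one commonly attributed to Catledge & Pitkow.

```python
from collections import Counter
from urllib.parse import urlparse

SESSION_TIMEOUT = 25.5 * 60  # seconds; cutoff usually attributed to the paper

# Hypothetical time-ordered log for one user: (timestamp in seconds, url).
log = [
    (0, "http://www.gatech.edu/"),
    (30, "http://www.gatech.edu/cc/"),
    (90, "http://www.ncsa.uiuc.edu/"),
    (5000, "http://www.gatech.edu/"),   # gap > timeout starts a new session
    (5060, "http://www.ncsa.uiuc.edu/"),
]

def split_sessions(log, timeout=SESSION_TIMEOUT):
    """Split a time-ordered click log into sessions at long idle gaps."""
    current, out, last_t = [], [], None
    for t, url in log:
        if last_t is not None and t - last_t > timeout:
            out.append(current)
            current = []
        current.append(url)
        last_t = t
    if current:
        out.append(current)
    return out

def site_path(urls):
    """Collapse a URL sequence to the sequence of distinct hosts visited."""
    hosts = [urlparse(u).netloc for u in urls]
    return tuple(h for i, h in enumerate(hosts) if i == 0 or h != hosts[i - 1])

# Count how often the same site-level path recurs across sessions.
path_counts = Counter(site_path(s) for s in split_sessions(log))
for path, n in path_counts.most_common():
    print(n, " -> ".join(path))
```

Collapsing to hosts gives a site-level view of a path; a page-level analysis would simply skip the urlparse step.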
• More results
  – 2% of the retrieved pages were saved or printed.
  – Browsing-strategy categories were assigned based on each user's slope; the slope can also categorize usage patterns of Web documents.
  – Users tended to operate in one small area of a site.
• Design strategies
  – Users averaged 10 pages per server:
    • Make the most important information reachable within 2 or 3 jumps from the index page.
    • Do not put too many links on one page; it increases search time (back, forward, back, site map, etc.).
  – Facilitate the likely visitor browsing patterns:
    • Maybe make more than one version of your page?
    • Most users work well in a "hub and spoke" environment.
• The future
  – Offer a site tour based on the most frequently traveled paths.
  – Alter page design dynamically based on site trends.

History Mechanisms (in Browsers): Revisitation Patterns in… (Tauscher & Greenberg, 1997)
• Purpose: provide empirical data to aid the development of effective history mechanisms.
  – Understand revisitation patterns.
  – Evaluate current mechanisms and suggest best practices and methods.
• Data collection
  – An altered version of xMosaic recorded activity.
  – A survey of users afterward.
• Revisitation results
  – 58% recurrence rate (so more than 40% of page visits are to new pages!).
  – As people search, they build their vocabulary.
  – Seven browsing strategies:
    • First-time visits to a cluster of pages.
    • Revisits to pages.
    • Authoring of pages (high reload percentage).
    • Regular use of Web-based applications.
    • Hub-and-spoke (breadth-first approach).
    • Guided tour (e.g. "next page" links).
    • Depth-first search (following links deeply before returning to the index).
  – Visit frequency as a function of distance:
    • Users mostly revisit recently visited pages (within about 6 jumps).
    • There is a 39% chance that the next URL will match one of the previous 6 pages visited.
  – Access frequency:
    • 60% of pages were visited only once, 19% twice, 8% three times, 4% four times.
  – Locality (not valuable for predicting the next page):
    • Most locality sets were small, only 2.5 to 4.5 URLs per set.
    • Only 15% of pages were part of a locality set.
  – Paths (not valuable for predicting the next page):
    • Could these be captured and offered in a history mechanism?
    • Time per page could indicate a path.
• Mechanism types
  – Recency ordered
    • Sequential order based on time accessed.
    • Repeated entries for revisited pages.
    • "Pruned" by keeping only the first instance or only the last.
    • Simple for users to understand (they remember paths).
  – Frequency ordered
    • Most visited at the top, least visited at the bottom.
    • User interest changes, yet the latest URLs have not yet built up frequency.
    • How to break ties: last visited, or earliest visited?
    • Suffers when few items are on the list.
    • Difficult for users to understand.
  – Stack-based (see the sketch after this list)
    • Recently visited at the top; duplicates are kept.
    • Order and availability depend on three actions: loading causes the page to be added to the top; recalling changes the pointer to the currently displayed page; revisiting (the user reloads the page) has no effect on the stack.
    • Non-persistent vs. persistent (between sessions).
    • Better than recency at short distances.
    • Users have difficulty understanding this model.
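To clarify the stack-based model just described, here is a toy sketch; the class and method names are my own, not from the paper. It also includes the usual stack behavior that the summary above glosses over: branching from the middle of the stack discards the entries above the pointer.

```python
class StackHistory:
    """Toy model of a stack-based browser history."""

    def __init__(self):
        self.stack = []   # oldest entry at index 0, top of the stack at the end
        self.pos = -1     # pointer to the currently displayed page

    def load(self, url):
        """Follow a link or type a URL: push the page on top of the stack.

        Branching from mid-stack discards the entries above the pointer,
        which is what makes this model stack-like.
        """
        del self.stack[self.pos + 1:]
        self.stack.append(url)          # duplicates are kept
        self.pos = len(self.stack) - 1

    def back(self):
        """Recall an earlier page: only the pointer moves."""
        if self.pos > 0:
            self.pos -= 1
        return self.current()

    def forward(self):
        if self.pos < len(self.stack) - 1:
            self.pos += 1
        return self.current()

    def reload(self):
        """Revisiting the displayed page has no effect on the stack."""
        return self.current()

    def current(self):
        return self.stack[self.pos] if self.stack else None


h = StackHistory()
h.load("a"); h.load("b"); h.load("c")
h.back()                    # pointer now at "b"; stack is still [a, b, c]
h.load("d")                 # branching discards "c"; stack becomes [a, b, d]
print(h.stack, h.current())  # ['a', 'b', 'd'] d
```

A recency-ordered list would still contain "c" in the example above, which is why the two models diverge after branching and why users find the stack model harder to predict.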
• Mechanism types (continued)
  – Hierarchically structured
    • Recency-ordered hyperlink sublists: like recency with the latest position saved; each URL has its own sublist of links followed from that page; helps with common linking paths; easier to understand.
    • Context-sensitive Web subspace: somewhat a combination of the approaches above and the stack-based model; gives the user a better understanding of the context of his or her searches; it may be hard to remember where a certain URL was. I think this approach would be a great tool.
• Do users actually use history mechanisms?
  – Less than 1% of navigation actions use the history list.
  – 3% involve favorites (bookmarks).
  – 30% of navigation was back-button usage.

How Do We Cater to the People?
• Inter-site browsing strategies are not easy to tackle. How would you control that?
• Why should we attempt to understand user behavior and search strategies?
  – To formulate general design principles (e.g. a three-level depth limit).
  – To design for multiple searching personalities.
  – To understand how to survey your intended users or gather feedback most appropriately.
  – To identify the importance of all aspects of the development process and allocate resources accordingly.
• Some bright ideas
  – Personalized search: learning systems; "You might also like…"; www.a9.com (history, favorites, personalized interface).
  – But what about adapting to different types of user behavior based on the user's path history on your server? (A sketch of this idea closes these notes.)
    • Researched since 1995 and earlier! What has resulted?
    • Microsoft ASP.NET 2.0 Web Parts.

What Resources Are Out There?
• xMosaic 2.6 download, for those of you so excited.
• Architecture of the World Wide Web: http://www.w3.org/TR/webarch/
• Some Sun suggestions on writing for the Web: http://www.sun.com/980713/webwriting/
• Jakob Nielsen's research on content usability: http://useit.com/alertbox/9710a.html

Research
• Vox Populi: The Public Searching of the Web (2001): compares statistics from two studies and shows how public searching changed from 1997 to 1999.
• Usage Patterns of a Web-Based Library Catalog (2001), Michael D. Cooper.
• Real Life, Real Users, and Real Needs: A Study and Analysis of User Queries on the Web (2000), Jansen, Spink & Saracevic.
• Redefining the Browser History in Hypertext Terms, Mark Ollerenshaw.
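To close, here is the sketch promised in the "bright ideas" section above: a minimal, first-order (Markov-style) illustration of suggesting likely next pages from per-session path history observed on a server. The session data, page names, and function are hypothetical; this is a sketch of the idea, not anyone's production system.

```python
from collections import Counter, defaultdict

# Hypothetical per-session page sequences observed on one server.
sessions = [
    ["/", "/products", "/products/widgets", "/cart"],
    ["/", "/about", "/products", "/cart"],
    ["/", "/products", "/cart", "/checkout"],
]

# First-order model: count transitions page -> next page.
transitions = defaultdict(Counter)
for pages in sessions:
    for here, nxt in zip(pages, pages[1:]):
        transitions[here][nxt] += 1

def suggest_next(page, k=2):
    """Return the k most frequent follow-up pages after `page`."""
    return [p for p, _ in transitions[page].most_common(k)]

print(suggest_next("/products"))   # ['/cart', '/products/widgets']
```

A real system would need to handle sparse data, changing interests, and users who do not match any observed pattern, but even this simple counting captures the "alter the page based on site trends" idea raised above.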