WIRED Week 7
• Quick review of Information Seeking
• Readings Review
  - Questions & Comments
  - How does this affect IR system use?
  - How would this change evaluating IR systems?
• Topic Discussions
• Web search lab game!

What Is Information Seeking?
• “a process in which humans purposefully engage in order to change their state of knowledge.” p. 5
• “a process driven by human’s need for information so that they can interact with the environment.” p. 28
• “begins with recognition and acceptance of the problem and continues until the problem is resolved or abandoned” p. 49
(Marchionini)
• More than just representation, storage and systematic retrieval

Information Seeking in Context
[Diagram: Learning, Information Seeking, Information Retrieval; Analytical Strategy vs. Browsing Strategy]

How do we search?
• Analytical
  - careful planning
  - recall of query terms
  - iterative query reformulations
  - examination of results
  - batched
• Browsing
  - heuristic
  - opportunistic
  - recognizing relevant information
  - interactive (as can be)

ISeek - WebTracker Study
• Corporate IT and knowledge workers
  - In work environment
  - Own browser and network connection
• Long-term study (weeks)
• Overall Web use analyzed
• Bookmarks, printed pages
• How sites/pages found
• Frequency of page visits

Web Study Methodology
• Surveys
• Interviews
• Web Use Data*
  - History Files
  - WebTracker
  - Server Logs
• Bookmarks*
• Printouts

Study Elements
• Research Design
• Field Work
• Field Workers
• Data Collection
  1. Questionnaire survey
  2. WebTracker application (and Proxy Server)
  3. Personal interviews

Collecting Web Client Data
• Modified client
  - Pitkow and Catledge 1995
• Bookmarks
  - Chosen Web sites are a personal information space
  - Most valuable data file on the user’s system
  - Automatically organizing bookmarks
• History logs
  - The history mechanism
  - Most promising source for usage data

WebTracker Expanded Window
[Screenshot]

WebTracker Log
User  Date and Time     Browser Action  URL Visited                                       Page Title
DT    11/1/98 16:29:44  STARTUP
DT    11/1/98 16:34:59  LINK_TO         http://donturn.fis.utoronto.ca/test/index.html    Home
DT    11/1/98 16:35:08  LINK_TO         http://donturn.fis.utoronto.ca/test/test1.html    Page 1
DT    11/1/98 16:35:09  Button back     http://donturn.fis.utoronto.ca/test/test1.html    Page 1
DT    11/1/98 16:35:17  LINK_TO         http://donturn.fis.utoronto.ca/test/index.html    Home
DT    11/1/98 16:35:17  Key add bkmk    http://donturn.fis.utoronto.ca/test/index.html    Home
DT    11/1/98 16:35:28  Key open page   http://donturn.fis.utoronto.ca/test/index.html    Home
DT    11/1/98 16:35:35  LINK_TO         http://www8.org                                   WWW8
DT    11/1/98 16:35:41  Key back        http://www8.org                                   WWW8
DT    11/1/98 16:35:44  LINK_TO         http://donturn.fis.utoronto.ca/test/index.html    Home
DT    11/1/98 16:35:46  Menu save as    http://donturn.fis.utoronto.ca/test/index.html    Home
DT    11/1/98 16:36:11  LINK_TO         http://donturn.fis.utoronto.ca/test/printme.html  Print
DT    11/1/98 16:36:15  Menu print      http://donturn.fis.utoronto.ca/test/printme.html  Print
DT    11/1/98 16:36:46  Button back     http://donturn.fis.utoronto.ca/test/printme.html  Print
DT    11/1/98 16:36:56  LINK_TO         http://donturn.fis.utoronto.ca/test/index.html    Home
DT    11/1/98 16:36:59  Button reload   http://donturn.fis.utoronto.ca/test/index.html    Home
DT    11/1/98 16:37:14  LINK_TO         http://donturn.fis.utoronto.ca/test/index.html    Home
DT    11/1/98 16:41:11  SHUTDOWN

Data Analysis
• Log files tabulated into spreadsheets
• Examined for clusters or patterns of behavior
• Selection of episodes of Information Seeking behavior, based on:
  - a highlighting of the episode by the participant during the personal interview;
  - evidence of the episode
    having consumed a relatively substantial amount of time and effort;
  - evidence that the episode was a recurrent activity.
• Determined the modes of scanning & moves exercised by the participants

Behavioral Model
• Recurring Web behavioral patterns that relate people’s browser actions (Web moves) to their browsing/searching context (Web modes)
• Modes of scanning: Aguilar (1967) & Weick & Daft (1983, 1984)
• Moves in information seeking behavior: Ellis (1989) & Ellis et al. (1993, 1997)

Modes of Scanning
• Undirected Viewing
  - Information need: general areas of interest; specific need to be revealed
  - Information use: serendipitous discovery (“Sensing”)
  - Amount of targeted effort: minimal
  - Number of sources: many
  - Tactics: scan broadly a diversity of sources, taking advantage of what’s easily accessible (“Touring”)
• Conditioned Viewing
  - Information need: able to recognize topics of interest
  - Information use: increase understanding (“Sensemaking”)
  - Amount of targeted effort: low
  - Number of sources: few
  - Tactics: browse in pre-selected sources on pre-specified topics of interest (“Tracking”)
• Informal Search
  - Information need: able to formulate queries
  - Information use: increase knowledge within narrow limits (“Learning”)
  - Amount of targeted effort: medium
  - Number of sources: few
  - Tactics: search is focused on an issue or event, but a good-enough search is satisfactory (“Satisficing”)
• Formal Search
  - Information need: able to specify targets
  - Information use: formal use of information for planning, acting (“Deciding”)
  - Amount of targeted effort: high
  - Number of sources: many
  - Tactics: systematic gathering of information on a target, following some method or procedure (“Retrieving”)

Modes of Scanning for Information
Scanning Mode        Information Need                      Information Seeking   Information Use
Undirected Viewing   General areas of interest             “Sweeping”            “Browsing”
Conditioned Viewing  Able to recognize topics of interest  “Discriminating”      “Learning”
Informal Search      Able to formulate simple queries      “Satisficing”         “Selecting”
Formal Search        Able to specify targets in detail     “Optimizing”          “Retrieving”

ISeek Behaviors & Web Moves (Modes & Moves Model)
• Undirected Viewing
  - Starting: identifying, selecting starting pages, sites
  - Chaining: following links on initial pages
• Conditioned Viewing
  - Browsing: browsing entry pages, headings, site maps
  - Differentiating: bookmarking, printing, or copying; going directly to known site
  - Monitoring: revisiting ‘favorite’ or bookmarked sites for new information
• Informal Search
  - Differentiating: bookmarking, printing, or copying; going directly to known site
  - Monitoring: revisiting ‘favorite’ or bookmarked sites for new information
  - Extracting: using (local) search engines to extract information
• Formal Search
  - Monitoring: revisiting ‘favorite’ or bookmarked sites for new info
  - Extracting: using search engines to extract information

Behavioral Model Verification
• 61 identifiable episodes
  - Undirected Viewing: 12 episodes
  - Conditioned Viewing: 18 episodes
  - Informal Search: 23 episodes
  - Formal Search: 8 episodes

Behavioral Model Results
• People who use the Web engage in 4 complementary modes of information seeking
• Certain browser-based actions & events indicate a particular mode of information seeking
• Surprises
  - No Explicit Instances of Monitoring to Support Formal Searching
  - Very Few Instances of “Push” Monitoring
  - Extracting Involved Basic Search Strategies Only

Interview Highlights
• Most useful work-related sites:
  1. Resource sites by associations & user groups
  2. News sites
  3. Company sites
  4. Search engines
• Most people do not avidly search for new Web sites
• The criterion for bookmarking is largely whether a site provides relevant & up-to-date information
• Learning about new Web sites:
  1. Search engines
  2. Magazines & newsletters
  3. Other people/colleagues

Survey Highlights
• The Web was the 3rd most frequently used source
• Participants spent about 20% of their work hours using the Web
• The majority looked for technical information on the Web
• Quality of Web information was perceived to be “very high” (reliable)
• The Web was perceived as being as accessible as other “internal” sources, but less accessible than mass media sources
• Few participants deliberately set out to search for new sites

Study 1 Summary
• Behavioral model of information seeking on the Web
• People who use the Web engage in complementary modes of information seeking
• Certain browser-based actions & events indicate particular moves in information seeking
• The study suggests:
  - that a behavioral framework relating user motivations to Web moves may be helpful in analyzing Web-based information seeking
  - that multiple, complementary methods of collecting qualitative and quantitative data may help compose a richer portrayal of how individuals use Web-based information in their natural work settings

Study Recommendations
Enhancing Web use, by Web use mode:
• Undirected viewing (starting and chaining)
  - Introduce systems that search/recommend jump sites
  - Encourage groups to share bookmarks, Web pages, URLs
  - Design portals to support undirected, serendipitous viewing
• Conditioned viewing (browsing, differentiating, monitoring)
  - Train users to evaluate and escalate the priority or importance of info
  - Develop ways of sharing Web-based info via email / online forums
  - Provide ways of telling users about new content on Web pages
• Informal search (differentiating, monitoring, extracting)
  - Pre-select sources & search engines for quick, informal searches
  - Prepackage search strategies developed by subject matter experts
  - Educate users on how to evaluate info provenance and quality
• Formal search (extracting)
  - Use multiple info sources for comprehensive searching
  - Educate users about when to use different information services
  - Train users on advanced search techniques

ISeek Expanded Study (2)
• Larger Dataset
• One Organization
• Longer Duration
• Open-ended Interviews
• IT Survey
• More Quantitative Modeling
  - Glassman (1994)
  - Catledge & Pitkow (1995)
  - Tauscher & Greenberg (1997a, 1997b)
  - Huberman, Pirolli, Pitkow, & Lukose (1998)

New Types of Data Collection
• Sources
  - Modified Logs
  - Interviews (More Focused)
  - Survey (Broader Focus)
  - Field Observation (Cube Work)
• Volume
  - Over 1400 Consistent Users
  - Over a Month of Web Use
  - 8+ GB of data

Collecting Web Server Data
• Web server log accuracy
  - Hit: a single file is requested from the Web server
  - View: all of the information contained on a single Web page
  - Visit: one series of views at a particular Web site
• Proxy server logs
  - Day sampling: stop caching and analyze data
  - IP sampling: cancel caching for particular Web users and measure only their results
  - Continuous sampling: use cookie files to track particular user(s)

KDD Survey Highlights
• Users not motivated to change/update browser versions or startup page
• IT made no modifications to the browser until recently, primarily for system access testing
• Most of the most frequent users were from technical departments
• All IT system work is now Web-specific

Interview Highlights
• Corporate adoption of Internet access driven by Intranet development
• Local portrayals of successful Web work drove rapid adoption
• Use of the Intranet viewed as both resource conservation and expanded work
• Logging of Web use data not a high concern
• Open to recommendations to improve Web use
• “Webify”ing everything seen as good

KDD Highlights
• Extremely High Data Collection Reliability
• Tightly Focused Web Use (business sites)
• Very Small (Determinable) Inappropriate Use (>.001%)
• Lower than Expected Search Engine Use
  - Influenced by Startup Page
  - Internal Search Results Pages Used
• Higher than Expected (Average) Use of Intranet

KDD Use Highlights
• 40,000+ episodes
• 11:15 average episode length
• Search term
  mode of 1
  - Not dominantly work-related terms
  - Use of intranet search results influential

Updated Behavioral Model
[Diagram: modes (Undirected Viewing, Conditioned Viewing, Informal Search, Formal Search) by moves (Starting, Chaining, Browsing, Differentiating, Monitoring, Extracting), with episode counts 3079, 1924, 5170, 13,992 and 8347]
• 32,512 identifiable episodes

Behaviors Breakdown
[Chart]

Other Studies
• Tend to focus on server logs, a broad range of Web users, general Web-seeking activity, and quantitative methods
  - Glassman (1994): Proxy study
  - Catledge & Pitkow (1995): Surveys and client tool
  - Tauscher & Greenberg (1997a, 1997b): The Back button
  - Ingwersen (1995 & 1997): Informetrics
  - Huberman, Pirolli, Pitkow, & Lukose (1998): Information Foraging, “Law of Surfing”
  - Huberman, “Laws of the Web” (2001)

Study 2 Summary
• Behavioral Model Scales Up
• Server Logs Provide Significant Gains in Quantity
• Server Logs Provide Challenges in Deriving Quality
• Organizations Provide a Focused View of Overall Web Use
• Knowledge Workers Collaborate (But Not Enough)

Summary
• (New) Methodology
• Provide new ideas for data collection & cleaning tools
• Verify models of Information Seeking and Web Use
• Discover models of Web usage
• Find different types of Web users
• Gain rich descriptions of perception of the Web & Web use
• Evoke new system & interface designs

Other Tools for Web Studies
• Pete Pirolli, Rob Reeder, Ed Chi, et al. (UIR Group, Xerox PARC): Web Logger
• Eytan Adar, Bernardo Huberman (Web Ecology Group @ PARC, now HP)
• Andy Edmonds: Uzilla.net
• Vividence
• Web Evaluation Tool (WET)
• Eye Tracking (*)

Improving Web Use
• Expert Systems
  - SNLP
• Multimedia Databases & Metadata
• Display Technology
• Better GUIs
• Better, More Available Search Engines/Query Syntax
  - Desktop Search
  - Ranking
  - Relevance
• Help expert users get more expert

Web Activities Taxonomies
• What types of activities on the Web have impact?
• What we do vs.
  what seems significant
• Purpose of people’s search
  - Find
    • Get a fact or document
    • Download information
    • Find out about a product
  - Compare/Choose: 51%
• Methods used to find information
  - Explore, Monitor, Find, Collect: 71%
• Content for which they are searching
  - Medical: 18%, People: 13%, …

Berrypicking & IR Flexibility
• IR systems are rational, users aren’t (always)
• We don’t search in a linear model
  - Single query, one good result
• We gradually build on what we know and how we find it
  - Footnote chasing (backward chaining)
  - Citation searching (forward chaining)
  - Journal run (favorite sites)
  - Area scanning (browsing)
  - Subject searches in bibliographies, abstracts & indices
  - Author searching
• We combine all of these when searching
• Interface support for each & combinations

Berrypicking Paths
[Diagram]

Web Search Studies Framework
• Web IR is still relatively new
  - Differences in users & information
  - Changes in IR systems are rapid
• Who doesn’t search now?
• “A Web searching study focuses on isolating searching characteristics of searchers using a Web IR system via analysis of data, typically gathered from transaction logs.” p. 3
• Studying Search Engine use
  - AltaVista, Excite
• Web Searching Studies
  - Single & Multiple Web sites

Characterizing Browsing
• Modified XMosaic to learn Web browser behavior
• Path lengths key (but changed)
• Types of users:
  - Serendipitous browsers: little repetition, short sequences
  - General-purpose browsers: average, repeated actions
  - Searchers: long navigational sequences

Cognitive Strategies in Web Search
Systems help with:
• re-representation: different external representations that have the same abstract structure make problem-solving easier or more difficult. It also refers to how different strategies and representations vary in their efficiency for solving a problem.
• graphical constraining: constrains the kinds of inferences that can be made about the underlying represented concept.
• temporal and spatial constraining: different representations make relevant aspects of processes and events more salient when distributed over time and space.

Cognitive Strategies
• Searching Conditions
  - Dispersed or Category Structures
• Fact finding
• Exploratory searching
• Novice & experienced users
• Top-down, bottom-up & mixed

Reading Time, Scrolling & Interaction
• Can implicit feedback improve relevancy?
  - 561 documents, 6 subjects
  - Read documents & score them
• Better than reading, saving & printing?
  - Measure use now vs. later
  - Focused on document, not activity
• How do you know the user is reading?
• Is saving a relevance measure?
• No differences noted in scrolling (4.28)
• What about following links?
• Finding, highlighting, copying?

How do we really use the Web?
• People don’t read, they scan Web pages
• We move quickly; we know we can go back
• Quick experimentation & short memory
• Behaviors that work are reinforced & continued
• Satisficing makes measures of quality difficult
• Web pages as Billboards?
• What’s billboard information for IR systems?

Revisitation Patterns on WWW
• Mostly Re-Visits (58%)
• Continually Visit New Pages
• Access Only A Few Pages Frequently
• Clusters (Sets) & Short Paths of URLs
  - Frequency
  - Recency
  - “Distance”
• Types of Navigation
  - Hub and Spoke
  - Depth Searching (lots of links before returning, if at all)
  - Guided Tour (Tasks)

Revisitation Patterns 2
• Back Button Use Affects Everything (Even More Since the Study)
• Navigation Methods Differ
• Reasons for Revisiting
  - Explore Further
  - Use Feature (Search or Home Page)
  - “On the Way” to Another Page (IA Problem)
• Users Don’t Understand Browser History Very Well, or Do They Misunderstand Page/Site Navigation?
• Provide Navigation Support
• Work with the Back Button; Don’t Break Its Functionality

Web search lab game
• Break into groups
• Answer a set of questions
• Different rules for each search:
  1. Search as you would
  2. Talk & decide before each move
  3. No typing this time!
  4. Search as you would again
  5. As fast as possible
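
Back to the revisitation slides: the “Mostly Re-Visits (58%)” figure is usually reported as a recurrence rate. As we understand Tauscher & Greenberg’s papers, it is the share of page visits that go to a URL already seen in the session: (total URLs visited - distinct URLs visited) / total URLs visited x 100. The Python sketch below computes that rate, along with two of the clustering cues listed on the same slide: per-URL visit frequency and the recency distance of each revisit (how many visits back the URL last appeared). The function names and the example visit sequence are hypothetical. A real analysis would first have to decide which logged actions count as visits, for example only LINK_TO page loads in a WebTracker-style log.

```python
from collections import Counter


def recurrence_rate(urls: list[str]) -> float:
    """Percentage of visits that return to an already-seen URL.

    Assumed definition (Tauscher & Greenberg style):
    (total visits - distinct URLs) / total visits * 100.
    """
    if not urls:
        raise ValueError("need at least one visit")
    total = len(urls)
    distinct = len(set(urls))
    return (total - distinct) / total * 100


def revisit_profile(urls: list[str]) -> tuple[Counter, list[int]]:
    """Return (visit count per URL, recency distance of every revisit).

    A recency distance of 1 means the URL was the immediately preceding
    visit (e.g. a reload); larger values mean other pages came in between.
    """
    counts = Counter(urls)
    last_seen: dict[str, int] = {}  # URL -> position of its latest visit
    distances: list[int] = []
    for position, url in enumerate(urls):
        if url in last_seen:
            distances.append(position - last_seen[url])
        last_seen[url] = position
    return counts, distances


if __name__ == "__main__":
    # Hypothetical hub-and-spoke session: the user keeps returning to "home".
    visits = ["home", "page1", "home", "www8", "home", "print", "home"]
    print(f"recurrence rate: {recurrence_rate(visits):.1f}%")
    counts, distances = revisit_profile(visits)
    print("most visited:", counts.most_common(1))
    print("recency distances:", distances)
```

In this made-up session, 7 visits touch only 4 distinct URLs, so about 42.9% of visits are revisits. The constant recency distance of 2 is the signature of a hub-and-spoke pattern, where every excursion is one page deep before the user returns to the hub.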