SIMS 202
Information Organization and Retrieval
Prof. Marti Hearst and Prof. Ray Larson
UC Berkeley SIMS
Tues/Thurs 9:30-11:00am
Fall 2000

Today
  Modern IR textbook topics
  The Information Seeking Process

Textbook Topics

More Detailed View

What We’ll Cover
  A Lot
  A Little

Search and Retrieval
Outline of Part I of SIMS 202
  The Search Process
  Information Retrieval Models
  Content Analysis/Zipf Distributions
  Evaluation of IR Systems
    – Precision/Recall
    – Relevance
    – User Studies
  System and Implementation Issues
  Web-Specific Issues
  User Interface Issues
  Special Kinds of Search

What is an Information Need?

The Standard Retrieval Interaction Model

Standard Model
  Assumptions:
    – Maximizing precision and recall simultaneously
    – The information need remains static
    – The value is in the resulting document set

Problem with Standard Model:
  Users learn during the search process:
    – Scanning titles of retrieved documents
    – Reading retrieved documents
    – Viewing lists of related topics/thesaurus terms
    – Navigating hyperlinks
  Some users don’t like long disorganized lists of documents

Search is an Iterative Process
  [Figure labels: Repositories, Goals, Workspace]

“Berry-Picking” as an Information Seeking Strategy (Bates 90)
  Standard IR model
    – assumes the information need remains the same throughout the search process
  Berry-picking model
    – interesting information is scattered like berries among bushes
    – the query is continually shifting

A sketch of a searcher… “moving through many actions towards a general goal of satisfactory completion of research related to an information need.” (after Bates 89)
  [Figure: search path with query points Q0 through Q5]

Berry-picking model (cont.)
  The query is continually shifting
  New information may yield new ideas and new directions
  The information need
    – is not satisfied by a single, final retrieved set
    – is satisfied by a series of selections and bits of information found along the way.
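
The contrast between the standard model and the berry-picking model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy collection `DOCS`, the word-overlap `search` function, and the fixed list of queries are hypothetical and do not come from Bates or from any real IR system. In a real session each query would be shaped by what the searcher had just read; here the sequence is fixed in advance to keep the example small.

```python
from typing import Dict, List, Set

# Hypothetical toy collection: document id -> text (illustration only).
DOCS: Dict[str, str] = {
    "d1": "zipf distribution of word frequencies",
    "d2": "precision and recall in retrieval evaluation",
    "d3": "user studies of search interfaces",
    "d4": "thesaurus terms for query reformulation",
}


def search(query: str, docs: Dict[str, str]) -> Set[str]:
    """Return ids of documents that share at least one word with the query."""
    terms = set(query.lower().split())
    return {doc_id for doc_id, text in docs.items() if terms & set(text.split())}


def berry_pick(queries: List[str], docs: Dict[str, str]) -> Set[str]:
    """Accumulate results over a sequence of shifting queries (Q0, Q1, ...)."""
    basket: Set[str] = set()
    for query in queries:
        # Each query contributes its own "berries" to the basket.
        basket |= search(query, docs)
    return basket


# A shifting sequence of queries (simplification: fixed in advance).
queries = ["word frequencies", "retrieval evaluation", "search interfaces"]

# Standard-model view: only the final retrieved set counts.
final_only = search(queries[-1], DOCS)

# Berry-picking view: value accumulates along the way.
accumulated = berry_pick(queries, DOCS)

print(sorted(final_only))   # ['d3']
print(sorted(accumulated))  # ['d1', 'd2', 'd3']
```

The final query alone retrieves one document, while the accumulated basket holds three, which is the sense in which the information need is satisfied by selections made along the way and not by a single final set.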
Information Seeking Behavior
  Two parts of a process:
    » search and retrieval
    » analysis and synthesis of search results
  This is a fuzzy area; we will look at several different working theories.

Search Tactics and Strategies
  Search Tactics
    – Bates 79
  Search Strategies
    – Bates 89
    – O’Day and Jeffries 93

Tactics vs. Strategies
  Tactic: short term goals and maneuvers
    – operators, actions
  Strategy: overall planning
    – link a sequence of operators together to achieve some end

Information Search Tactics (after Bates 79)
  Monitoring tactics
    – keep search on track
  Source-level tactics
    – navigate to and within sources
  Term and Search Formulation tactics
    – designing search formulation
    – selection and revision of specific terms within search formulation

Term Tactics
  Move around the thesaurus
    – superordinate, subordinate, coordinate
    – neighbor (semantic or alphabetic)
    – trace: pull out terms from information already seen as part of search (titles, etc.)
    – morphological and other spelling variants
    – antonyms (contrary)

Source-level Tactics
  “Bibble”:
    – look for a pre-defined result set
    – e.g., a good link page on web
  Survey:
    – look ahead, review available options
    – e.g., don’t simply use the first term or first source that comes to mind
  Cut:
    – eliminate large proportion of search domain
    – e.g., search on rarest term first

Source-level Tactics (cont.)
  Stretch
    – use source in unintended way
    – e.g., use patents to find addresses
  Scaffold
    – take an indirect route to goal
    – e.g., when looking for references to obscure poet, look up contemporaries
  Cleave
    – binary search in an ordered file

Monitoring Tactics (strategy-level)
  Check
    – compare original goal with current state
  Weigh
    – make a cost/benefit analysis of current or anticipated actions
  Pattern
    – recognize common strategies
  Correct Errors
  Record
    – keep track of (incomplete) paths

Additional Considerations (Bates 79)
  Add a Sort tactic!
  More detail is needed about short-term cost/benefit decision rule strategies
  When to stop?
    – How to judge when enough information has been gathered?
    – How to decide when to give up an unsuccessful search?
    – When to stop searching in one source and move to another?

Lexis-Nexis Interface
  What tactics did you use?
  What strategies did you use?

Implications
  Interfaces should make it easy to store intermediate results
  Interfaces should make it easy to follow trails with unanticipated results
  Makes evaluation more difficult.

Orienteering (O’Day & Jeffries 93)
  Interconnected but diverse searches on a single, problem-based theme
  Focus on information delivery rather than search performance
  Classifications resulting from an extended observational study:
    – 15 clients of professional intermediaries
    – financial analyst, venture capitalist, product marketing engineer, statistician, etc.

Orienteering (O’Day & Jeffries 93)
  Identified three main search types:
    – Monitoring
    – Following a plan
    – Exploratory
  A series of interconnected but diverse searches on one problem-based theme
    – Changes in direction caused by “triggers”
  Each stage followed by reading, assimilation, and analysis of resulting material.
Orienteering (O’Day & Jeffries 93)
  Defined three main search types
    – monitoring
      » a well-known topic over time
      » e.g., research four competitors every quarter
    – following a plan
      » a typical approach to the task at hand
      » e.g., improve business process X
    – exploratory
      » explore topic in an undirected fashion
      » get to know an unfamiliar industry

Orienteering (O’Day & Jeffries 93)
  Trends:
    – A series of interconnected but diverse searches on one problem-based theme
    – This happened in all three search modes
    – Each analyst did at least two search types
  Each stage followed by reading, assimilation, and analysis of resulting material

Orienteering (O’Day & Jeffries 93)
  *Searches tended to trigger new directions
    – Overview, then detail, repeat
    – Information need shifted between search requests
    – Context of problem and previous searches were carried to next stage of search
  *The value was contained in the accumulation of search results, not the final result set
  *These observations verified Bates’ predictions.
Orienteering (O’Day & Jeffries 93)
  Triggers: motivation to switch from one strategy to another
    – next logical step in a plan
    – encountering something interesting
    – explaining change
    – finding missing pieces

Stop Conditions (O’Day & Jeffries 93)
  Stopping conditions not as clear as for triggers
  People stopped searching when
    – no more compelling triggers
    – finished an appropriate amount of searching for the task
    – specific inhibiting factor
      » e.g., learning market was too small
    – lack of increasing returns
      » 80/20 rule
  Missing information/inferences ok
    – business world different from scholarship

After the Search: Analyzing and Synthesizing Search Results
  Orienteering Post-Search Behaviors:
    – Read and Annotate
    – Analyze: 80% fell into six main types

Post-Search Analysis Types (O’Day & Jeffries 93)
  Trends
  Comparisons
  Aggregation and Scaling
  Identifying a Critical Subset
  Assessing
  Interpreting
  The rest:
    » cross-reference
    » summarize
    » find evocative visualizations
    » miscellaneous

Sensemaking (Russell et al. 93)
  The process of encoding retrieved information to answer task-specific questions
  Combine
    – internal cognitive resources
    – external retrieved resources
  Create a good representation
    – an iterative process
    – contend with a cost/benefit tradeoff

Sensemaking (Russell et al. 93)
  Most of the effort is in the synthesis of a good representation
    – covers the data
    – increase usability
    – decrease cost-of-use

Summary
  The information access process
    – Berry picking/orienteering offer an alternative to the standard IR model
    – More difficult to assess results
    – Interactive search behavior can be analyzed in terms of tactics and strategies
  Sensemaking:
    – Combining searching with the use of the results of search.

Next Time
  IR Systems Overview
  Query Languages
    – Boolean Model
    – Boolean Queries