EVALUATION IN SEARCHING
IR systems · Digital libraries · Reference sources · Web sources
© Tefko Saracevic, Rutgers University

Importance of evaluation
- Evaluation is an integral part of searching: it is always there, wanted or not, informal or formal.
- It is a growing problem for everyone: the information explosion makes finding "good" stuff very difficult.
- Formal evaluation is part of the professional job & skills: it requires knowledge of evaluation criteria, measures & methods, and is more & more prized.

Information systems
Considered here:
- information retrieval (IR) systems, e.g. Dialog, Nexis …
- sources included in digital libraries, e.g. at Rutgers
- reference services, e.g. in libraries or commercial services on the Web
- Web sources, e.g. as found in many domain sites
Many approaches, criteria, measures & methods are similar and can be adapted for a specific source or information system.

Broad context
Evaluating the role that an information system plays as related to:
- SOCIETY: community, culture, discipline …
- INSTITUTION: university, organization, company …
- INDIVIDUALS: users & potential users (nonusers)
These roles lead to broad but hard questions as to what context to choose for evaluation.

Context (cont.)
- Social: how well does an information system support social demands & roles?
  • hardest to evaluate
- Institutional: how well does it support the institutional/organizational mission & objectives?
  • tied to the objectives of the institution
  • also hard to evaluate
- Individual: how well does it support the information needs & activities of people?
  • most evaluations are done in this context

Approaches to evaluation
- Many approaches exist: quantitative, qualitative … effectiveness, efficiency … each has strong & weak points.
- The systems approach is prevalent. Effectiveness: how well does a system perform that for which it was designed? Evaluation is related to objective(s) and requires choices:
  • which objective or function to evaluate?

Approaches … (cont.)
- Economics approach:
  • Efficiency: at what cost?
  • Cost-effectiveness: the cost for a given level of effectiveness
- Ethnographic approach:
  • practices & effects within an organization or community
  • learning & using practices, & comparisons

Basic requirements for evaluation
Once a context is selected, all five of these must be specified:
1. Construct: a system, process, or source
   • e.g. a given IR function or system; a Web site; a digital-library source
2. Criteria, to reflect the objective(s)
   • e.g. relevance, utility, satisfaction, accuracy, completeness, time, costs
3. Measure(s), to reflect the criteria
   • e.g. precision, recall, various Likert scales, $$$ …

Requirements … (cont.)
4. Measuring instrument
   • e.g. judgments by users on relevance or on a scale; cost per function
5. Methodology: the procedures for collecting & analyzing data
No evaluation can proceed unless ALL five are specified! Sometimes the specification of some is informal & implied, but they are always there. (A small sketch of such a specification follows.)
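To make the five requirements concrete, here is a minimal, illustrative sketch of an evaluation specification written out in Python. Everything in it, the EvaluationSpec class and all field values, is a hypothetical example constructed for this deck, not a prescribed format.

```python
# A minimal way to force all five requirements to be stated up front.
# The class and every value below are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class EvaluationSpec:
    construct: str         # 1. the system, process, or source being evaluated
    criteria: list[str]    # 2. what "good" means, reflecting the objective(s)
    measures: list[str]    # 3. the numbers/scales that reflect the criteria
    instrument: str        # 4. who or what produces the judgments
    methodology: str       # 5. how data are collected & analyzed

spec = EvaluationSpec(
    construct="searching a bibliographic file on an online service",
    criteria=["relevance", "satisfaction"],
    measures=["precision", "recall", "satisfaction on a 1-5 Likert scale"],
    instrument="end users judging each retrieved item relevant / not relevant",
    methodology="observe searches for a set of requests; collect & analyze judgments",
)
print(spec)
```

Writing the specification down, even informally, exposes the choices the slides warn about: a missing entry for any of the five means the evaluation cannot proceed.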
In IR: 1. Construct
In research, most evaluation is done on test collections & test questions:
- the Text REtrieval Conference (TREC)
  • evaluation of algorithms, interactions
  • reported in the research literature
In practice, at the use & user level, most evaluation is done on operational collections & systems, e.g. Dialog, Nexis, various files:
  • evaluation & comparison of various procedures, commands, contents
  • user proficiencies & characteristics
  • evaluation of interactions
  • reported in the professional literature

2. Criteria
- Relevance: the basic & most used criterion
  • has strengths & weaknesses
  • relevance as an area of study: a basic notion in information science
- At the use & user level, many others: utility, satisfaction, success, time, value, impact …
- Market evaluations: those, plus quality, fitness for use, penetration …

3. Measures
- Precision & recall are preferred.
- The problem with recall: how to measure it? It requires methodological "tricks"; some consider it metaphysical. (A worked precision/recall sketch appears at the end of this deck.)
- At the use & user level:
  • Likert scales & differentials for many criteria, e.g. satisfaction on a scale of 1 to x (1 = not satisfied, x = satisfied)
  • observational measures, e.g. overlap, consistency

4. Instruments
- People are used as instruments: they judge relevance, rate on a scale …
- But which people? Users, surrogates, analysts, domain experts, librarians …
- How do the relevance, utility … judges affect the results? Who knows?
- Reliability of judgments: about 50-60% for experts. (A judge-overlap sketch appears at the end of this deck.)

5. Methods
Includes the design & procedures for observations, experiments & the analysis of results.
Challenges: validity? reliability? reality?
- Collection: selection? size?
- Requests: generation?
- Searching: conduct?
- Results: obtaining? judging? feedback?
- Analysis: conduct? tools?
- Interpretation: warranted? generalizable?

Criteria for evaluation of information sources
Includes digital-library & Web sources.
What? Content:
- Subject? Topic? Level? Depth? Exhaustivity? Specificity? Organization?
- Timeliness of content? Up-to-date? Revisions? Accuracy?
Why?
- Purpose? Scope? Viewpoint?
For?
- Intended audience? What need is satisfied? Appropriateness?

Criteria … (cont.)
Who done it? Author(s), institution, company, publisher, creator:
- Authority? Reputation? Credibility? Persistence? Trustworthiness?
- Refereeing? Transparency?
How? Content treatment:
- Readability? Style? Organization? Clarity?
Physical treatment:
- Format? Layout? Legibility? Visualization?
Where?
- Availability? Accessibility?

Criteria … (cont.)
How? Searching, navigation, browsing?
- Feedback? Links?
Output:
- Organization? Features? Variations? Control?
- Effort? Learning factors?
How much?
- Price? Total costs? Cost-benefits?
In comparison to what?
- Other similar sources?

Conclusions
- Evaluation is a complex task, but also an essential part of being an information professional.
- Traditional approaches & criteria still apply, but new ones are added or adapted to fit new sources & new methods of access & use.
- Evaluation skills are in growing demand, particularly because the Web is value-neutral.
- A great professional skill to sell!
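Appendix: worked sketches

As referenced under "3. Measures", here is a minimal worked sketch of precision & recall in Python. The document identifiers and judgments are invented for illustration. It also shows why recall is the troublesome measure: its denominator assumes we know every relevant item in the collection, which real evaluations only approximate through methodological "tricks".

```python
# Precision = relevant retrieved / all retrieved
# Recall    = relevant retrieved / all relevant in the collection
# The document identifiers below are invented for illustration.

retrieved = {"d1", "d2", "d3", "d4", "d5"}   # items the search returned
relevant  = {"d2", "d4", "d7", "d9"}         # ALL items judged relevant; knowing
                                             # this full set is the hard part,
                                             # hence the "tricks" for recall

hits = retrieved & relevant                  # relevant items that were retrieved

precision = len(hits) / len(retrieved)       # 2 / 5 = 0.40
recall    = len(hits) / len(relevant)        # 2 / 4 = 0.50

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Precision can always be computed from the output alone; recall cannot, because its denominator lies outside the retrieved set. That asymmetry is exactly the "problem with recall" noted on the Measures slide.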
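As referenced under "4. Instruments", one simple way to read a 50-60% reliability figure is as the overlap between two judges' sets of relevant items. This sketch, with invented judgments, computes overlap as the size of the intersection over the size of the union; that formula is one common way of expressing such agreement, assumed here for illustration rather than taken from the slides.

```python
# Overlap between two judges' "relevant" sets: |A ∩ B| / |A ∪ B|.
# The judgments below are invented for illustration.

judge_a = {"d1", "d2", "d4", "d5", "d8"}   # items judge A called relevant
judge_b = {"d2", "d3", "d4", "d5", "d9"}   # items judge B called relevant

agreement = len(judge_a & judge_b) / len(judge_a | judge_b)
print(f"overlap = {agreement:.2f}")        # 3 / 7 ≈ 0.43
```

Numbers in this range are what the Instruments slide means: even conscientious judges often agree on only about half of the items, and who the judges are can therefore shift an evaluation's results.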