Evaluation of library and information services (LIS): an overview
• Contexts
• Approaches
• Levels
• Requirements
• Measures
© Tefko Saracevic, Rutgers University

Why evaluate?
The importance of evaluating LIS is increasing because:
• the social importance of information is changing
• services are in transition from a "just-in-case" to a "just-in-time" model, with stress on access
• competition has increased, with many new players competing for resources
• electronic information resources & networks are growing
• demands for justification by funders are growing, in practice & in research

Broad context
The role that LIS play relates to:
• SOCIETY - community, culture, discipline ...
• INSTITUTIONS - universities, organizations, companies ...
• INDIVIDUALS - users & potential users (nonusers)
These roles lead to broad but hard questions as to which context to choose for evaluation. Each context demands different criteria, measures & methodologies.

Context questions
Social: how well do LIS support the information demands, needs & roles of society or a community?
 - hardest to evaluate
Institutional: how well do LIS support the institutional/organizational mission & objectives?
 - tied to the objectives of the institution
 - also hard to evaluate
Individual: how well do LIS support the information needs & activities of people?
 - most evaluations are done in this context

Approaches to evaluation
Many approaches exist:
• quantitative, qualitative ...
• effectiveness, efficiency ...
• each has strong & weak points
The systems approach is prevalent:
• effectiveness: how well does a system perform that for which it was designed?
• evaluation is related to objective(s)
• it requires choices: which objective or function to evaluate?

Approaches (cont.)
Economics approach:
• efficiency: at what cost?
• cost-effectiveness: cost for a given level of effectiveness
Ethnographic approach:
• practices & effects within an organization or community
• learning & using practices, and comparisons

Approaches ...
Distinction between:
Effectiveness:
• how well does a LIS achieve that for which it was designed?
 - relates to objectives
Efficiency:
• what are the costs of operating a LIS?
 - relates to costs ($$$), time, effort ...
Cost-effectiveness:
• what are the costs for a given level of effectiveness?
 - relates effectiveness & efficiency to each other

Levels of evaluation
System-centered:
1. Engineering: hardware & software; reliability, errors
2. Input: contents, coverage
3. Processing: procedures, techniques, algorithms
User-centered:
4. Output: search, interaction
5. Use & user: application to tasks; market; fitness for use
6. Social: effect on research, productivity, organization ...
Danger: isolation of levels

Requirements for evaluation
Once a context is selected, all five of the following need to be specified:
1. Construct - a system, process, or source
 • e.g. a given IR function or system; a Web site; a digital library source
2. Criteria - to reflect the objective(s)
 • e.g. relevance, utility, satisfaction, accuracy, completeness, time, costs
3. Measure(s) - to reflect the criteria
 • e.g. precision, recall, various Likert scales, costs ($$$) ...

Requirements ... (cont.)
4. Measuring instrument
 • e.g. judgments by users on relevance or on a scale; cost per function
5. Methodology - procedures for collecting & analyzing data
No evaluation can proceed unless ALL of these are specified! Sometimes the specification of some of them is informal & implied, but they are always there. (A minimal sketch of such a specification is given below.)
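To make the five requirements concrete, here is a minimal, hypothetical sketch in Python. The construct, instrument and methodology named in it are illustrative assumptions, not part of the original overview; the measure computed is precision (relevant items retrieved divided by items retrieved), one of the measures listed above.

```python
# Hypothetical sketch: the five requirements of an evaluation, spelled out.
# The construct, criterion, instrument and methodology below are illustrative
# assumptions, not a prescribed design.

construct = "a given IR system (hypothetical)"           # 1. what is evaluated
criterion = "relevance"                                   # 2. reflects the objective
measure = "precision"                                     # 3. reflects the criterion
instrument = "user judgments of relevance (binary)"       # 4. how judgments are captured
methodology = "collect judgments on retrieved items, then compute the measure"  # 5.

def precision(judgments):
    """Precision = relevant items retrieved / items retrieved."""
    if not judgments:
        return 0.0
    return sum(judgments) / len(judgments)

# Example: a user judged 7 of 10 retrieved items relevant (1 = relevant, 0 = not).
sample_judgments = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
print(f"{measure} for {construct}: {precision(sample_judgments):.2f}")  # 0.70
```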
LIS functions
When evaluating, we have to consider processes/functions:
• each function calls for different evaluation approaches
Major LIS functions:
• AVAILABILITY - acquisition of information materials & resources; holdings
• ORGANIZATION - intellectual, physical
• ACCESS - physical & intellectual; searching, retrieval
• OUTPUTS - dissemination, use

Availability
Social: how good is the coverage?
• of a field, problem area, community
• Criteria: representativeness, depth, breadth, up-to-dateness ...
• Measures: degree, duplication
• Methods: comparison, survey
Institutional: how well do information resources satisfy the mission, needs, plans ... ?
• for education, research, work ...
• Criteria: matching, attributes
• Methods: survey, functional comparison (e.g. against a curriculum)

Availability (cont.)
Individual: how well are users served & satisfied?
• Criteria: awareness, expectations, satisfaction, success & failure rate
• Measures: scales, branching diagrams (success or failure at each point of user action)
• Methods: surveys, counting & statistical analyses, probability of success (e.g. requests made vs. requests fulfilled)

Organization
Processing level: how well is a collection/database represented & organized?
• Criteria: depth, breadth, type, relevance, quality, errors, time, effort, costs ...
• Measures: degree, precision, recall, quality benchmarks (standards), error rate, time per process, costs ($$$) ...
• Methods: comparative processing, user or expert evaluation, quality analyses, economic analyses

Access
Individual: how well did users interact with a service? This concerns users' reactions to interaction with the system.
• Criteria: accessibility, effort, convenience, facilities (ease, adequacy), staff (helpfulness, efficiency), frustration, errors, difficulties ...
• Measures: scales, indicators
• Methods: surveys, interviews, observations, experiments, transaction log analysis

Access: searching, retrieval
Individual: how well did users retrieve relevant answers? Related to user needs & tasks.
• Criterion: relevance
 - a few others have been proposed, e.g. satisfaction
• Measures: recall, precision
 - but evaluations often concentrate on system algorithms, human-computer interactions, etc.
 - others: overlap, consistency, Likert scales
• Methods: laboratory experiments (e.g. TREC), observation, ...

Dissemination & use
Individual: how did users perceive the results of use? Related to users' tasks.
• Criteria: cognitive (learning ...), affective (satisfaction ...), accomplishment (of a task), expectations (getting ...), time (saving, worth ...), money (cost, value ...)
• Measures: scales, numbers (a small sketch of a scale-based measure follows below)
• Methods: survey, interviews, critical incident technique, impact estimate
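As a concrete illustration of the scale-based measures mentioned above, here is a minimal sketch that summarizes hypothetical responses on a 5-point satisfaction scale; the scale anchors and the response values are assumptions for illustration only, not data from any study.

```python
# Hypothetical sketch: summarizing user satisfaction collected on a 5-point
# Likert scale (1 = very dissatisfied ... 5 = very satisfied).
from collections import Counter

# Assumed illustrative responses from a survey; not real data.
responses = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]

mean_score = sum(responses) / len(responses)
distribution = Counter(responses)

print(f"Mean satisfaction: {mean_score:.2f} on a 1-5 scale")
for point in range(1, 6):
    print(f"  scale point {point}: {distribution.get(point, 0)} responses")
```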
Operational & quality criteria (Say, Seaman & Cohen)
Reliability - delivery of a LIS accurately & dependably
• correct, relevant answers
• consistency
Responsiveness - readiness to provide service
• minimizing turnaround time
• callbacks
Assurance - knowledge, ability & courtesy of staff
• understanding of the collection, technology
• providing individual attention

Quality criteria (cont.)
Access - sufficiency of staff, equipment, hours of operation
• waiting time
• access policies; location
Communication - informing & listening; adjusting language
• question negotiation
• teaching users; instructing
Security - freedom from danger, risk or doubt
• safety; confidentiality
Tangibles - physical facilities
• condition & layout of the building etc.
• condition of equipment

Branching method
Reasons for satisfying (or not satisfying) a known-item request: success & failure analysis. At each branch a request is either lost or passed on to the next stage:
• Total requests (T): lost if not acquired, otherwise passed to the circulation check (C)
• Circulation (C): lost if in circulation, otherwise passed to the library function (L)
• Library function (L): lost through a library malfunction, otherwise passed to the user function (U)
• User function (U): lost through a user malfunction, otherwise counted among satisfied requests (S)
Satisfaction rate (percentage) = S/T

Branching ...
Example from a study of requests for specific books from an academic library:
• T = 437 (not acquired = 38)
• C = 399 (in circulation = 52)
• L = 347 (library malfunction = 48)
• U = 299 (user malfunction = 54)
• S = 245

Branching ...
Calculation of performance rates (a short computational sketch of this calculation appears at the end of this overview):
• Satisfaction rate = 245/437 = .56 = 56%
• Acquisition performance = 399/437 = 91%, i.e. the library held 91% of the requested books
• Circulation performance = 347/399 = 87%; 13% of acquired books were in circulation
• Library performance = 299/347 = 86%; 14% of the books not in circulation were not found because of some library malfunction
• User performance = 245/299 = 82%; 18% of the books that were on the shelf were not found by users because of their own error
• Satisfaction rate (by probabilities) = .91 (A) x .87 (C) x .86 (L) x .82 (U) = .56, or 56%

Conclusions
In practice, the need for & importance of evaluation are increasing.
In research, it is an ever-present need:
• new systems, approaches
Evaluation is essential for improvements, decisions & resource allocation.
But evaluation requires:
• commitment by management & staff; hard work
• financial & human resources
• knowledge of how to do it
• a continuous, not a one-shot, effort
If we do not evaluate, others will.
© Tefko Saracevic, Rutgers University
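Computational sketch of the branching calculation, referenced above. It uses the example figures from the academic-library study of known-item book requests; the variable and label names are illustrative.

```python
# Sketch of the branching (success & failure) calculation, using the example
# figures from the academic-library study of known-item book requests.
T = 437   # total requests
C = 399   # acquired (38 not acquired)
L = 347   # not in circulation (52 in circulation)
U = 299   # found by the library function (48 library malfunctions)
S = 245   # found by the user (54 user malfunctions)

acquisition_perf = C / T     # 0.91 -> library held 91% of requested books
circulation_perf = L / C     # 0.87
library_perf     = U / L     # 0.86
user_perf        = S / U     # 0.82

satisfaction_rate = S / T    # 0.56, i.e. 56%
# The same rate follows from multiplying the stage probabilities together:
by_probabilities = acquisition_perf * circulation_perf * library_perf * user_perf

print(f"Satisfaction rate: {satisfaction_rate:.0%}")
print(f"By probabilities:  {by_probabilities:.0%}")
```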