Measuring system performance

The library: a system view
Environment: Inputs (energy, money, materials, personnel, information) → Transformational process → Outputs (products, services) → Users

System performance measures
• Recall
• Precision
• Relevance

Robert Taylor's four levels of question formation
Q1 The actual but unexpressed need for information (the visceral need)
Q2 The conscious, within-brain description of the need (the conscious need)
Q3 The formal statement of the need (the formalized need)
Q4 The question as presented to the information system (the compromised need)
Taylor, Robert S. 1968. Question-negotiation and information seeking in libraries. College & Research Libraries 29(3): 178-194 (May 1968).

System-defined relevance
"My feet are killing me."
Search: health AND feet
Retrieved (90%): The health of the lumber industry in terms of cubic feet of lumber produced

Information retrieval process
Question formulation → Relevancy determination
• System: Which documents are relevant to the query?
• User: Are these documents relevant to my needs?

Defining relevance: system-defined vs. user-defined relevance
• System-defined relevance: objective, often topical. Does it match the query?
• User-defined relevance: subjective, situational. Is it useful?

User-defined relevance
"My feet are killing me."
• The effect of lysergic acid diethylamide ingestion on toenail fungus in cloned mice
• Soothing remedies for aching feet
• Controlling the body by controlling the mind: meditative techniques for dealing with pain

Determining topical relevance
• Analyze the work to determine what it is about
• Assign to the document one or more terms from a finite list of topics
• Users can then search on those topic indicators

Recall
$$\text{Recall} = \frac{\text{No. of relevant documents retrieved}}{\text{Total no. of relevant documents in the file}}$$

Precision
$$\text{Precision} = \frac{\text{No. of relevant documents retrieved}}{\text{Total no. of documents retrieved from the file}}$$

Precision vs. Recall: an inverse relationship
As the level of recall rises, the level of precision generally declines, and vice versa.
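To make the recall and precision formulas concrete, here is a minimal Python sketch. The document IDs, the set of relevant documents, and the ranked result list are all hypothetical; returning 0.0 when a denominator is empty is a convention chosen here, not part of the definitions.

```python
# Recall and precision computed over sets of document IDs.

def recall(retrieved, relevant):
    """Relevant documents retrieved / total relevant documents in the file."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # convention for an empty relevant set (assumption)
    return len(set(retrieved) & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Relevant documents retrieved / total documents retrieved from the file."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0  # convention for an empty result set (assumption)
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical file of ten documents, four of which were judged relevant.
relevant_docs = {"d1", "d2", "d5", "d9"}
# Hypothetical ranked output; taking more of the list widens the search.
ranking = ["d1", "d2", "d3", "d5", "d4", "d6", "d9", "d7", "d8", "d10"]

for cutoff in (2, 5, 10):
    retrieved = ranking[:cutoff]
    print(f"top {cutoff:>2}: recall={recall(retrieved, relevant_docs):.2f}  "
          f"precision={precision(retrieved, relevant_docs):.2f}")
```

Reading further down this hypothetical ranked list raises recall from 0.50 to 1.00 while precision drops from 1.00 to 0.40, mirroring the inverse relationship; with real systems this is a general tendency, not a guarantee.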
The Cranfield experiments (1957 & 1962)
Cyril Cleverdon, principal investigator

Precision vs. Recall
Subject: sexual dimorphism
• Word stemming (sex, sexy, sexes, sexier, sexual, sexiest): recall rises, precision falls
• Field-specific searches (DE,TI/sexual()dimorphism): precision rises, recall falls

User-defined relevance
"Relevance appears to be a subjective quality, unique between the individual and a given document supporting the assumption that relevance can only be judged by the information user." (Miranda Pao)

Years later
"My feet are still killing me."
• The effect of lysergic acid diethylamide ingestion on toenail fungus in cloned mice
• Soothing remedies for aching feet
• Controlling the body by controlling the mind: meditative techniques for dealing with pain

Factors affecting relevance (1)
• Purpose of the information
• Situation of the user
• Level at which the information source is written
  – Journal of the American Medical Association
  – Healthy times

Factors affecting relevance (2)
• Subject knowledge of the user
  – Is the data new to the user?
  – Does the information relate to the user's prior knowledge?
• Values: ethical, social, philosophical, political, religious, legal

User-defined relevance
Subjectivity and fluidity make user-defined relevance difficult to use as a measuring tool for system performance.

Incorporating user-defined relevance into information retrieval systems (1)
• User performs search
• System retrieves results . . .

Incorporating user-defined relevance into information retrieval systems (2)
• System asks user if he/she would like to retrieve similar documents
  – Search for other documents with similar word frequencies
  – Search for other documents with same subject descriptors

Example catalog record
Main Author: Gribbin, John R.
Title: In search of Schrodinger's cat : quantum physics and reality / by John Gribbin.
Subject(s): Schrodinger, Erwin, 1887-1961. Quantum theory -- History. Reality.
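The two "retrieve similar documents" options (similar word frequencies, same subject descriptors) can be sketched in Python. Assumptions: word-frequency similarity is measured here with cosine similarity over raw word counts, which is one common choice and not necessarily what any particular catalog uses; descriptor similarity is a simple count of shared subject headings. All function names and sample titles are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase word occurrences (no stemming, no stop-word removal)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word-frequency Counters; 0.0 means no shared words."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_by_word_frequency(chosen_text, collection):
    """Rank documents by word-frequency similarity to the one the user judged relevant."""
    target = term_frequencies(chosen_text)
    scores = [(doc_id, cosine(target, term_frequencies(text)))
              for doc_id, text in collection.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def similar_by_descriptors(chosen_descriptors, collection):
    """Rank documents by how many subject descriptors they share with the chosen record."""
    overlaps = [(doc_id, len(chosen_descriptors & descs))
                for doc_id, descs in collection.items()]
    return sorted([p for p in overlaps if p[1] > 0], key=lambda pair: pair[1], reverse=True)


# Hypothetical collection; the user marked "Soothing remedies for aching feet" as relevant.
titles = {
    "a": "Remedies for sore and aching feet",
    "b": "Cubic feet of lumber produced by the lumber industry",
    "c": "Meditative techniques for dealing with pain",
}
print(similar_by_word_frequency("Soothing remedies for aching feet", titles))
```

This ranks "a" first, then "c", then "b". Note that "c" edges out "b" only because both share a single word with the chosen title ("for" versus "feet") and "c" is shorter, which shows why stop-word removal matters for frequency-based matching.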
Example: Amazon.com (screenshots)

Assisting users in determining relevancy
• Title
• Abstract
• Indexing terms
• Citation data
Source: Barry, Carol L. 1998. Document representations and clues to document relevance. Journal of the American Society for Information Science 49(14):1293-1303.

Document representation research: titles vs. full text
How relevant are these?
Title: Getting good grades in graduate school
Title: How to impress your advisor in graduate school
Title: Writing a dissertation
Title: The well-written graduate paper

Full text
Getting good grades in graduate school: "The best way to get good grades is to study hard…"
How to impress your advisor in graduate school: "Never show up late for a meeting with your advisor…"
Writing a dissertation: "The first thing to do is to pick a topic that truly interests you…"
The well-written graduate paper: "Before finalizing your topic do a preliminary search on…"

Document representation research: titles, citation data, indexing terms, and abstracts, each compared with full text
How relevant are these?
Utility studies - Indications that user found relevant materials
• Citation & abstract databases
  – User requests citations be formatted for printing
  – User requests citations be sent by e-mail
  – User downloads citations
• Full-text databases
  – Pull up the full text
  – Print the article
  – Download the article to their Blackberry

Utility studies - Indications that user found relevant materials
Search ("chocolate") → Short list
If the user stops here, he/she may not have found a relevant article.

Utility studies - Indications that user found relevant materials
Search → Short list → View full citation data for article → View full text of article → Download or print article → Assume that user found article relevant
Alternative path: user modifies search

Characteristics of searches that produce relevant materials
• Subject searching
• Use of Boolean operators
• Search modification
• Increased time in display activities
• Use of a greater number of databases
Cooper, Michael D. and Hui-Min Chen. 2001. Predicting the relevance of a library catalog search. Journal of the American Society for Information Science and Technology 52(10):813-827.

Importance of abstract (1)
• Indication as to depth/scope of the article
  Example: "Authors studied leg-hair count variations of Drosophila in Kawainui Marsh"
• Delineates methodology: indication of reliability and validity
  Example: "Random sampling in 40 sectors during March, June, September & December"
• Gives indication as to content novelty
  Example: "Greater variation in June"

Importance of abstract (2)
• Basis for research may indicate recency
  Example: "American housing market was selected because it is always robust."
• Delineation of results indicates "tangibility" (important, useful data)
  Example: "Authors concluded that American teenagers listen to rock music."

Types of abstracts
• Indicative
• Informative
• Critical (evaluative) (not common in library databases)

Indicative abstract
Indicates what the document is about but doesn't report findings
Title: A review of the current literature on relevance.
Abstract: The author reviews the current literature on relevance.

Informative abstract
Acts as a substitute for the document
Title: The effects of library school on the mental health of library students
Abstract: The authors performed longitudinal studies on 32 graduate students in 8 library and information science programs and found a significant increase in aberrant psychological traits over time.
(fictitious titles and abstracts)

Abstract creation
• Author-produced
• Vendor-added
• Automated abstracting

Automated abstracting
1. Word counts
2. Remove stop words
3. Weight remaining words according to frequency
4. Search for sentences with the highest density of the most frequently occurring words

1. Word count
Title: Seasonal variations in the feral cat population of Fargo
the (81), is (68), a (56), to (42), cats (61), number (45), season (27), winter (11)
summer (11), spring (11), fall (11), monthly (10), temperature (61), variation (12), food (10), availability (10)
average (9), concept (7), per (8), over (9), immediate (5), implement (3), mortality (8), survival (9)

2. Eliminate stop words
Title: Seasonal variations in the feral cat population of Fargo
Stop words removed: the (81), is (68), a (56), to (42), per (8), over (9)

3. Rank by frequency
Title: Seasonal variations in the feral cat population of Fargo
cats (61), temperature (61), number (45), seasonal (27), variation (12), winter (11)
summer (11), spring (11), fall (11), monthly (10), food (10), availability (10)
average (9), survival (9), mortality (8), concept (7), immediate (5), implement (3)

4. Search for sentences with the highest density of high-frequency words
Title: Seasonal variations in the feral cat population of Fargo
"We found a significant seasonal variation in the number of cats. The highest number of cats are found in the summer, the lowest number of cats in the winter."

Automated abstract:
The Children's Internet Protection Act (CIPA) sets conditions on public libraries' receipt of federal financial assistance for Internet access. ... It would not have been possible for the broadcasting station to limit the use of federal funds to all non-editorializing activities. ... The instant Court distinguished Velazquez, restricting its holding to situations in which the grantee is "pit[ted] . . . against the Government. ... " Justice Stevens asserted that the filtering condition was unconstitutional because it distorted the normal usage of library Internet terminals as sources of a wide array of information. ... A condition mandating Internet filters distorts this mission by "deny[ing] patrons access to constitutionally protected speech that libraries would otherwise provide. ...

Relevance and information overload
In this age of information overload, tools to aid the user in determining relevance are increasingly critical.
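The four automated-abstracting steps listed earlier (word counts, stop-word removal, frequency weighting, sentence selection) can be sketched in Python. This is a minimal sketch under stated assumptions: the short STOP_WORDS list, treating the top_k most frequent words as "significant", and scoring a sentence by the sum of those words' frequencies divided by its length are illustrative choices, not the documented method of any abstracting service. The sample text is hypothetical, modeled on the feral-cat example.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); real systems use far longer lists.
STOP_WORDS = {"the", "is", "a", "an", "to", "per", "over", "of", "in", "and", "are", "we"}


def words(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def auto_abstract(text, n_sentences=2, top_k=10):
    # Step 1: word counts
    counts = Counter(words(text))
    # Step 2: remove stop words
    for stop in STOP_WORDS:
        counts.pop(stop, None)
    # Step 3: weight remaining words by frequency; keep the top_k as "significant"
    weights = dict(counts.most_common(top_k))

    # Step 4: score each sentence by the density of significant words it contains
    def density(sentence):
        tokens = words(sentence)
        return sum(weights.get(t, 0) for t in tokens) / len(tokens) if tokens else 0.0

    sentences = split_sentences(text)
    best = sorted(range(len(sentences)), key=lambda i: density(sentences[i]), reverse=True)
    # Emit the chosen sentences in their original document order
    return " ".join(sentences[i] for i in sorted(best[:n_sentences]))


# Hypothetical sample text modeled on the feral-cat example.
sample = (
    "We found a significant seasonal variation in the number of cats. "
    "Funding came from a local grant. "
    "The highest number of cats are found in the summer, the lowest number of cats in the winter. "
    "The office is on Main Street."
)
print(auto_abstract(sample, n_sentences=2, top_k=5))
```

With these settings the sketch selects the first and third sentences, the same two shown in step 4 of the cat example; the sentences about funding and the office contain none of the five top-weighted words and score zero.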