Why is relevance still the basic notion in information science?
(Despite great advances in information technology & applications)
Tefko Saracevic, Ph.D.
Rutgers University, USA
tefkos@rutgers.edu
Tefko Saracevic. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License

Fundamental concepts
Every scholarly field has a fundamental, basic notion, concept, idea ...
Relevance is a fundamental concept or notion in information science
• It was, but is it still?

Definition
relevance
1 a: relation to the matter at hand
2: the ability (as of an information retrieval system) to retrieve material that satisfies the needs of the user

What is “matter at hand”?
• Context in relation to which
  – a problem is addressed
  – an information need is expressed as a question
  – a query for searching is formulated
  – interaction is taking place
• No such thing as relevance without a context
• Axiom: One cannot not have a context in information interaction. Relevance is ALWAYS contextual

Context has to be there

Relevance – by any other name...
Many names connote relevance, e.g.: pertinent; useful; applicable; significant; germane; material; bearing; proper; related; important; fitting; suited; apropos; ... & nowadays even truthful
“A rose by any other name would smell as sweet” (Shakespeare, Romeo and Juliet)
Connotations may differ but the concept is still relevance

Two worlds in information science
• Information retrieval (IR) systems offer as answers their version of what may be relevant
  – by ever improving algorithms
• People go their way & assess relevance
  – by their problem at hand, context & criteria
• The two worlds interact
Considered here: human world of relevance
NOT covered: how IR deals with relevance

Two large questions
Why? (Part I) Why still?
(Part II)
• Part I: Why did relevance become a central notion of information science?
• Part II: Why did relevance still remain a central notion, despite advances in technology?

Part I
WHY RELEVANCE?

Bit of history
• Vannevar Bush (1890-1974): Article “As we may think”, 1945
  – Defined the problem as “... the massive task of making more accessible our bewildering store of knowledge.”
    • problem still with us & growing
  – Suggested a solution, a machine: “Memex ... association of ideas ... duplicate mental processes artificially.”
    • Technological fix to problem

Information Retrieval (IR) – definition
• Term “information retrieval” coined & defined by Calvin Mooers (1919-1994), 1951
“IR: ... intellectual aspects of description of information, ... and its specification for search ... and systems, technique, or machines... [to provide information] useful to user”

Technological determinant
• In IR emphasis was not only on organization but even more on searching
  – information technology was eminently suitable for searching
    • particularly computers
• Technological fix to the problem of information explosion

Two important pioneers
Hans Peter Luhn (1896-1964)
• at IBM pioneered many IR computer applications
  – first to describe searching using Venn diagrams
Mortimer Taube (1910-1965)
• at Documentation Inc. pioneered coordinate indexing
  – first to describe searching as Boolean algebra

Searching & relevance
• Searching became a key component of information retrieval
• And searching is about retrieval of relevant answers
  – extensive theoretical & practical concern with searching
  – technology uniquely suitable for searching
Thus RELEVANCE emerged as a key notion

Basic

Why relevance?
Aboutness
• A fundamental notion related to organization of information
• Relates to subject & in a broader sense to epistemology
Relevance
• A fundamental notion related to searching for information
• Relates to problem-at-hand and context & in a broader sense to pragmatism
Relevance emerged as a central notion in information science because of practical & theoretical concerns with searching

Aboutness vs. relevance

Claims & counterclaims in IR
• Historically & from the outset: “My system is better than your system!”
• Well, which one is it? OK: Let's test it. But:
  – what criterion to use?
  – what measure(s) based on the criterion?
• Things got settled by the end of the 1950s and remain mostly the same to this day

Relevance & IR testing
• In 1955 Allen Kent (1921-2014) & James W. Perry (1907-1971) were the first to propose two measures for testing IR systems:
  – “relevance”, later renamed “precision”, & “recall”
• A scientific & engineering approach to testing

Relevance as criterion for measures
Precision
• Probability that what is retrieved is relevant
  – conversely: how much junk is retrieved?
Recall
• Probability that what is relevant in a file is retrieved
  – conversely: how much relevant stuff is missed?
Both express the probability of agreement between what the system retrieved/not retrieved as relevant (systems relevance) & what the user assessed as relevant (user relevance), where user relevance is the gold standard for comparison

User relevance still ...

Part II
WHY STILL RELEVANCE?

changing dramatically, globally
• Many new applications
• Transformations
• Impacts
• Connections
• New, newer, newest

Social media ...
• Twitter
• Facebook
• Instagram
• LinkedIn
• Tumblr
• YouTube
• Google+
• Pinterest

Search engines ...
Discovery tools

And of course ...

Societies, Journals, Conferences ...

Users
Up to the time of the Web
• Primary & almost exclusive users were
  – scientists
  – professionals
  – businesses
  – policy makers
• Professionals searched
• And everything reflected their needs, behavior
After the Web took over the world
• Everybody is a user
• Everybody searches for everything
• And everything reflects their needs, fashion, behavior
• People ... all over the globe

Everyone ...

As the world is changing - so is research

Relevance experiments – then
• First experiments reported in 1960 & 61
  – by an IBM group
  – compared effects on relevance judgements of various representations
• In the next 50 years some 300 or so experiments conducted
• A variety of factors in human judgments of relevance addressed

Relevance experiments move on
• Eye-tracking studies in information science first reported in 2003
• Continued with studies of web and online searching
• Moved to include relevance in 2012

Jacek Gwizdka, Gmunden Retreat on NeuroIS 2012 (Neuro Information Systems)
Marrying neuro-cognitive methods & information science
• Cognitive aspects of human information interaction make information science a good field for application of neuroscience theories & tools
• Hypothesis for relevance experiments:
  – fundamental neural processes are associated with relevance decisions & these processes can be detected by EEG or fMRI.

Symbolically

Types of techniques
• Eye-tracking – measurement of eye activity. Where do we look? What do we ignore?
• Functional magnetic resonance imaging (fMRI) – measures brain activity by detecting associated changes in blood flow
• Electro-encephalography (EEG) – detects electrical activity in the brain using small, flat metal discs (electrodes) attached to the scalp

Jacek Gwizdka, Gmunden Retreat on NeuroIS 2013
• Experiment detecting brain activity related to information relevance judgments
  – 10 subjects given news stories; looked for factual relevant information
• Provides experimental design & conduct, but results are in the next paper
• Won the Dr. Hermann Zemlicka Award (“most visionary paper”)
  – among 23 papers

• Does the degree of relevance of a text document affect how it is read? YES
“relevant documents tend to be read more coherently, whereas irrelevant documents tend to be scanned.”

EEG = Electro-encephalography

A study from Finland – in SIGIR
Forty participants viewed six terms: which is relevant for given topics? (relevant and irrelevant terms defined by “experts”)
Findings: “... showed improvement up to 17% in relevance prediction based on brain signals alone.”

Another study from Finland
MEG = magnetoencephalography
Nine subjects viewed images – which is relevant for a task?
Findings: “the relevance of an image a subject looks at can be decoded from MEG signals with performance significantly better than chance ...”

“There is one more thing...”
“I think the biggest innovation of the twenty first century will be the intersection of biology and technology. A new era is beginning ...”

But also a reality in numbers
• No. of people in the world: 7.3 billion
• No. of people using the Internet: 2.9 billion
• No. of people NOT connected to the Internet: 4.4 billion (60%)
  – of these, 3 billion live in only 20 countries

...... different technology...
and relevance in its use

Christian Franjo
Thank you for inviting me!

FYI – For Your Information
Presentation and paper at: http://comminfo.rutgers.edu/~tefko/articles.htm
URLs and references are in PowerPoint Notes – accessible after download