Information Science 2005

Tefko Saracevic, PhD
School of Communication, Information and Library Studies
Rutgers University
New Brunswick, New Jersey USA
http://www.scils.rutgers.edu/~tefko
© Tefko Saracevic

Information science: a short definition
“the science dealing with the efficient collection, storage, and retrieval of information” (Webster)

Organization of presentation
1. Big picture – problems, solutions, social place
2. Structure – main areas in research & practice
3. Technology – information retrieval – largest part
4. Information – representation; bibliometrics
5. People – users, use, seeking, context
6. Paradigm shift – distancing of areas
7. Digital libraries – whose are they anyhow?
8. Conclusions – big questions for the future

Scope
- Evolution and state of the field in the last decade of the old and first decade of the new century

1. The big picture

Problems addressed
- Bit of history: Vannevar Bush (1945):
  - Defined problem as “... the massive task of making more accessible of a bewildering store of knowledge.”
  - Problem still with us & growing

… solution
- Bush suggested a machine: “Memex ... association of ideas ...
  duplicate mental processes artificially.”
- Technological fix to problem
- Still with us: technological determinant

At the base of information science: Problem
Trying to control content in
- Information explosion
  - exponential growth of information artifacts, if not of information itself
PLUS today
- Communication explosion
  - exponential growth of means and ways by which information is communicated, transmitted, accessed, used

… technological solution, BUT …
applying technology to solving problems of effective use of information
BUT: from a HUMAN & SOCIAL and not only TECHNOLOGICAL perspective

or a symbolic model
[Diagram: People, Information, Technology]

Problems & solutions: SOCIAL CONTEXT
- Professional practice AND scientific inquiry related to:
  Effective communication of knowledge records - ‘literature’ - among humans in the context of social, organizational, & individual need for and use of information
- Taking advantage of modern information technology

or as White & McCain put it:
“modeling the world of publications with a practical goal of being able to deliver their content to inquirers [users] on demand.”

Elaboration
- Knowledge records = texts, sounds, images, multimedia, web ... ‘literature’ in given domains
  - content-bearing structures – central to information science
- Communication = human-computer-literature interface
  - study of information science is the interface between people & literatures
- Information need, seeking, and use = raison d'être
- Effectiveness = relevance, utility

General characteristics
- Interdisciplinarity - relations with a number of fields, some more or less predominant
- Technological imperative - driving force, as in many modern fields
- Information society - social context and role in evolution shared with many fields

2. Structure

Composition of the field
- Like many fields, information science has different areas of concentration & specialization
- They change, evolve over time
  - grow closer, grow apart
  - ignore each other, less or more

most importantly different areas…
- receive more or less in funding & emphasis
  - producing great imbalances in work & progress
  - attracting different audiences & fields
- this includes
  - vastly different levels of support for research and
  - huge commercial investments & applications

How to view structure?
by decomposing areas & efforts in research & practice emphasizing
Technology or Information or People

Part 3. Technology
- Identified with information retrieval (IR)
  - by far biggest effort and investment
  - international & global
  - commercial interest large & growing

Information Retrieval – definition & objective
“IR: ... intellectual aspects of description of information, ... search, ... & systems, machines...” (Calvin Mooers, 1951)
- How to provide users with relevant information effectively?
For that objective:
1. How to organize information intellectually?
2. How to specify the search & interaction intellectually?
3. What techniques & systems to use effectively?

Streams in IR Res. & Dev.
1. Information science:
  - Services, users, use
  - Human-computer interaction
  - Cognitive aspects
2. Computer science:
  - Algorithms, techniques
  - Systems aspects
3. Information industry:
  - Products, services, Web
  - Market aspects
- Problem:
  - relative isolation – discussed later

Contemporary IR research
- Now mostly done within computer science
  - e.g. Special Interest Group on IR, Association for Computing Machinery (SIGIR, ACM)
- Spread globally
  - e.g.
    major IR research communities emerged in China, Korea, Singapore
- Branched outside of information science - “everybody does information retrieval”
  - data mining, machine learning, natural language processing, artificial intelligence, computer graphics …

Text REtrieval Conference (TREC)
- Started in 1992, now probably ending
  - “support research within the IR community by providing the infrastructure necessary for large-scale evaluation”
- Methods
  - provides large test beds, queries, relevance judgments, comparative analyses
  - essentially using Cranfield 1960’s methodology
  - organized around tracks
    - various topics – changing over years

TREC impact
- International – big impact on creating research communities
- Annual conferences
  - report, exchange results, foster cooperation
- Results
  - mostly in reports, available at http://trec.nist.gov/
  - overviews provided as well
  - but, only a fraction published in journals or books

TREC tracks 2004
103 groups from 21 countries
- Genomics with 4 sub tracks
- HARD (High Accuracy Retrieval from Documents)
- Novelty (new, non-redundant information)
- Question answering
- Robust (improving poorly performing topics)
- Terabyte (very large collections)
- Web track
- Previous tracks:
  - ad-hoc (1992-1999)
  - routing (92-97)
  - interactive (94-02)
  - filtering (95-02)
  - cross language (97-02)
  - speech (97-00)
  - Spanish (94-96)
  - video (00-01)
  - Chinese (96-97)
  - query (98-00)
  - and a few more run for two years only

Broadening of IR – ever changing, ever new areas added
- Cross language IR (CLIR)
- Natural language processing (NLP IR)
- Music IR (MIR)
- Image, video, multimedia retrieval
- Spoken language retrieval
- IR for bioinformatics and genomics
- Summarization; text extraction
- Question answering
- Many human-computer interactions
- XML IR
- Web IR; Web search engines
- DB and IR integration – structured and unstructured data
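
Illustration: Cranfield-style evaluation
The TREC methods noted earlier (test beds, queries, relevance judgments, comparative analyses) rest on measures such as precision and recall. The sketch below is only a rough illustration of that idea for a single query. The document IDs, the judgments, the two runs, and the function name `precision_recall` are all hypothetical, invented for this example; TREC itself relies on its own tooling and a wider set of measures.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: list of document IDs returned by a system (hypothetical data)
    relevant:  set of document IDs judged relevant by assessors (hypothetical data)
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)  # retrieved documents that were judged relevant
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgments and two hypothetical system runs for one query.
judged_relevant = {"d1", "d4", "d7", "d9"}
run_a = ["d1", "d2", "d4", "d5"]
run_b = ["d1", "d4", "d7", "d8", "d9"]

for name, run in (("run_a", run_a), ("run_b", run_b)):
    p, r = precision_recall(run, judged_relevant)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Comparing the two runs against the same judgments is the "comparative analysis" step in miniature: the collection and judgments stay fixed, and only the system output varies.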
Commercial IR
- Search engines based on IR
- But added many elaborations & significant innovations
  - dealing with HUGE numbers of pages fast
  - countering spamming & page rank games – adversarial IR
    - never ending combat of algorithms
- Spread & impact worldwide
  - about 2000 engines in over 160 countries
  - English was dominant, but not any more

Commercial IR: brave new world
- Large investments & economic sector
  - hope for big profits, as yet questionable
- Leading to proprietary, secret IR
  - also aggressive hiring of best talent
  - new commercial research centers in different countries (e.g. MS in China)
- Academic research funding is changing
  - brain drain from academe

IR successfully effected:
- Emergence & growth of the INFORMATION INDUSTRY
- Evolution of IS as a PROFESSION & SCIENCE
- Many APPLICATIONS in many fields
  - including on the Web – search engines
- Improvements in HUMAN - COMPUTER INTERACTION
- Evolution of INTERDISCIPLINARITY
IR has a long, proud history

Part 4. Information
- Several areas of investigation:
  - as basic phenomenon – not much progress
    - measures such as Shannon's not successful
    - concentrated on manifestations and effects
  - information representation
    - large area connected with IR, librarianship
    - metadata
  - bibliometrics
    - structures of literature
Covered in separate lecture: What_is_information.ppt

Part 5. People
- Professional services
  - in organization – moving toward knowledge management, competitive intelligence
  - in industry – vendors, aggregators, Internet
- Research
  - user & use studies
  - interaction studies
  - broadening to information seeking studies, social context, collaboration
  - relevance studies
  - social informatics

User & use studies
- Oldest area
  - covers many topics, methods, orientations
  - many studies related to IR
    - e.g.
      searching, multitasking, browsing, navigation
- Branching into Web use studies
  - quantitative & qualitative studies
  - emergence of webmetrics

Interaction
- Traditional IR model concentrates on matching, not on user side & interaction
- Several interaction models suggested
  - Ingwersen’s cognitive, Belkin’s episode, Saracevic’s stratified model
  - hard to get experiments & confirmation
- Considered key to providing
  - basis for better design
  - understanding of use of systems
- Web interactions a major new area

Information seeking
- Concentrates on broader context, not only IR or interaction: people as they move in life & work
- Based on concept of social construction of information
- Most active area, particularly in Europe, with annual conferences

Information seeking
Sampling of theories, models
- Why people seek information:
  - Taylor’s stages of information need
  - Dervin’s Sense-Making – gap, bridge
  - Belkin’s Anomalous State of Knowledge
  - Chatman’s life in the round – inf. poverty
- How people seek information:
  - Wilson’s General Model of inf. seeking
  - Bates’ berrypicking – acts in searching
  - Kuhlthau’s information search process
  - Chang’s browsing model
  - Benoit’s communicative action - Habermas

Part 6. Paradigm split in technology - people
- Split from early 80’s to date into two orientations
  - System-centered
    - algorithms, TREC
    - continue traditional IR model
  - Human-(user)-centered
    - cognitive, situational, user studies
    - interaction models, some started in TREC
- These became almost separate universes – one based in computer science, the other in information science & librarianship

Critiques, cultures
- Number of critiques (e.g.
  Dervin & Nilan) about isolated systems approach
  - calls for user-centered approaches, designs & evaluation
- But user-centered studies did not deliver very useful design pointers, guides
- Very different cultures:
  - computer science has own, more science & technology oriented
  - information science more humanities oriented
  - C.P. Snow’s two cultures

Human vs. system
- Human (user) side:
  - often highly critical, even one-sided
  - mantra of implications for design
  - but does not deliver concretely
- System side:
  - mostly ignores user side & studies
  - ‘tell us what to do & we will’
- Issue NOT H or S approach
  - even less H vs. S
  - but how can H AND S work together
  - major challenge for the future

Reconciliation?
- Several efforts to provide human-centered design
  - but more discussion than real application
- Integration of information seeking and information retrieval in context (Ingwersen & Järvelin)
- Research & development toward
  - using search context, improving user search experiences & search quality
  - machine learning, incorporating semantics

Funding
- Most funding goes toward systems side & computer science
  - most (very large %) support for system work
- In the digital age support is for digital
- True globally

Part 7. Digital libraries
LARGE & growing area
- “Hot” area in R&D
  - a number of large grants & projects in the US, European Union, & other countries up to now;
  - will it continue?
    It is not growing
  - but “DIGITAL” big & “libraries” small
- “Hot” area in practice
  - building digital collections, hybrid libraries
  - many projects throughout the world
  - growing at a high rate

Technical problems
- Substantial - larger & more complex than anticipated:
  - representing, storing & retrieving of library objects
    - particularly if originally designed to be printed & then digitized
  - operationally managing large collections – issues of scale
  - dealing with diverse & distributed collections
    - interoperability
  - assuring preservation & persistence
  - incorporating rights management

Digital Library Initiatives in the US (DLI)
- Research consortia under National Science Foundation
  - DLI 1: 1994-98, 3 agencies, $24M, six large projects
  - DLI 2: 1999-2006, 8 agencies, $60+M, 77 large & small projects in various categories
- ‘digital library’ not defined, in order to cover many topics & stretch ideas
  - not constrained by practice

European Union
- DELOS Network of Excellence on Digital Libraries
  - many projects throughout European Union
    - heavily technological
  - many meetings, workshops
  - resembles DLIs in the US
  - well funded, long range

Research issues
- understanding objects in DL
  - representing in many formats
  - non-textual materials
- metadata, cataloging, indexing
- conversion, digitization
- organizing large collections
- federated searching over distributed (various) collections
- managing collections, scaling
- preservation, archiving
- interoperability, standardization
- accessing, using

DL projects in practice
- Heavily oriented toward a variety of institutions – primarily libraries
  - but also museums, professional societies, specific domains, etc.
- Main orientation: institutional missions, contexts, finances
  - sustainability, preservation in real world
  - managing growth, rights, access

Agendas
- Most DL research
  agenda is set from top down
  - from funding agencies to projects
  - imprint of the computer science community's interest & vision
- Most DL practice agendas are set from bottom up
  - from institutions, incl. many libraries
  - imprint of institutional missions, interests & vision
    - providing access to specialized materials and collections from an institution(s) that are otherwise not accessible
    - covering in an integral way a domain with a range of sources

Connection?
- DL research & DL practice presently are conducted
  - mostly independent of each other,
  - minimally informing each other,
  - & having slight, or no connection
- Parallel universes with little connection & interaction

8. Conclusions

IS contributions
- IS effected handling of inf. in society
- Developed an organized body of knowledge & professional competencies
- Applied interdisciplinarity
- IR reached a mature stage
- IR penetrated many fields & human activities
- Stressed HUMAN in human-computer interaction

Challenges
- Adjust to the growing & changing social & organizational role of inf. & related inf. infrastructure
- Play a positive role in globalization of information
- Respond to technological imperative in human terms
- Respond to changes from inf. to communication explosion - bringing own experiences to resolutions, particularly to the INTERNET
- Join competition with quality
- Join DIGITAL with LIBRARIES

Juncture
- IS is at a critical juncture in its evolution
- Many fields, groups ... moving into information
  - big competition
  - entrance of powerful players
  - fight for stakes
- To be a major player IS needs to progress in its:
  - research & development
  - professional competencies
  - educational efforts
  - interdisciplinary relations
- Reexamination necessary

Thank you
Miró!

Bibliography
Bates, M. J. (1999).
Invisible Substrate of Information Science. Journal of the American Society for Information Science, 50, 1043-1050.
Bush, V. (1945). As We May Think. Atlantic Monthly, 176 (1), 101-108. Available: http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm
Hjørland, B. (2000). Library and Information Science: Practice, Theory, and Philosophical Basis. Information Processing & Management, 36 (3), 501-531.
Pettigrew, K.E. & McKechnie, L.E.F. (2000). The use of theory in information science research. Journal of the American Society for Information Science and Technology, 52 (1), 62-73.
Saracevic, T. (1999). Information Science. Journal of the American Society for Information Science, 50 (9), 1051-1063. Available: http://www.scils.rutgers.edu/~tefko/JASIS1999.pdf
Saracevic, T. (2005). How were digital libraries evaluated? Presentation at the course and conference Libraries in the Digital Age (LIDA), 30 May-3 June 2005, Dubrovnik, Croatia. Available: http://www.scils.rutgers.edu/~tefko/DL_evaluation_LIDA.pdf
Webber, S. (2003). Information Science in 2003: A Critique. Journal of Information Science, 29 (4), 311-330.
White, H. and McCain, K. (1998). Visualizing a Discipline: An Author Co-citation Analysis of Information Science 1972-1995. Journal of the American Society for Information Science, 49 (4), 327-355.