Principles of Searching [17:610:530] or ‘e530’ for short
Overview of the course and a bit of history
© Tefko Saracevic

Table of contents
1. Summary of sundry requirements
2. Basic definitions
3. Syllabus
4. Why? Rationale & objectives
5. What? Themes and topics
6. How? Goings on
7. A bit of history

1. Summary of sundry requirements
Described in detail in: “Before the start: what you need to have and know, and how to get it” and in the Syllabus and in eCollege tutorials (follow the links there)

Before
• Prerequisite courses: none
  – but this course is a pre- or co-requisite for many other courses
• Have a Rutgers University Computing Services (RUCS) email account (NetID)
  – full access to online resources in Rutgers University Libraries (RUL) requires using your Rutgers NetID
  – get a RUL card for other library services
  – but you can use any email address for course communication
• Know how to use RUL
  – particularly use from home & use of electronic resources, e.g. getting journal articles
  – many instructions on RUL site
• Have a DIALOG account
  – will get it from the instructor
  – will get other accounts as time goes by

Required competencies
• eCollege:
  – please take the eCollege tutorial before the course
• Email: (of course)
  – be comfortable incl. with attachments
• Word & PowerPoint (also "of course")
  – take tutorials, as necessary
  – (I am still taking them when I need to finesse something)
• Computer, internet, the web
  – be comfortable, take tutorials
  – e.g. logins, file transfer, download
• Rutgers has many computing services for students
  – including myRutgers, a personalized portal
  – explore and use them

How to get them?
• A must: MLIS bootcamp tutorials
  – created by MLIS students for "MLIS students on some of the many technical skills that they will need in order to have a successful school year." Even if you know them, review them!
• Required competencies could be gained and sharpened through MLIS and Rutgers tutorials, as well as many other online tutorials
• Please review your competencies through these tutorials!
  – these topics will NOT be covered in the course, but are assumed. FULLY!
• Links to all are in the course documents mentioned earlier

Communication
• Email
  – through eCollege email functions
    • whole class
    • one-on-one
    • by group
• eCollege Chat
  – discussion room or rooms
    • groups have own chat room
    • could be on different topics
    • could be private
  – ClassLive - a live chat room
• By phone
  – instructor will provide times when available
• In person
  – drop in to SCILS and see me or let's meet at some conference or event

Coursework
• Course Home Page
  – announcements
  – Course Checklist
• Class Lounge
  – like a blogspace; use for blog
  – introduce yourself
• Threaded discussion
  – simulates class discussion
  – depends on module & topic
    • discuss, reply, comment…
• Dropbox
  – submitting & retrieving assignments
  – graded assignments returned

Coursework (cont.)
• Journal
  • place where you can make notes & record thoughts
  • option of sharing
• Document sharing
  • uploading & downloading documents by instructor & students
  • but other documents from RUL directly
• Webliography
  • relevant sites submitted by instructor & students; could be annotated
• Calendar
  • schedule of course events
• Gradebook
  • providing grades & comments

Student groups
• Groups of three or four colleagues will be formed
  – each group will have a letter designation & a name you choose
  – in addition to group chat room and email you can work out among yourselves a method for communication, exchanges
• Why groups?
  – foster easier and multiple exchanges
  – form a small discussion assembly
  – help each other, raise questions, explain, discuss … outside of more formal channels
• Groups will work together as necessary & should cooperate as to exercises
• A group will present some of the results together

2. Basic definitions
These are really basic, and many more will be presented during the course and found in readings

prin′ci′ple [prinsəp′l] (noun) (courtesy of Encarta Dictionary)
1. basic assumption: an important underlying law or assumption required in a system of thought
2. ethical standard: a standard of moral or ethical decision-making
3. way of working: the basic way in which something works
4. source: the primary source of something
All fit this course, but which one fits best?

sear′ching, search [surch] (verb, noun, adjective)
1. penetrating or probing: observing acutely or examining thoroughly
2. examine thoroughly: to look into, over, or through something carefully in order to find somebody or something
3. examine computer file: to examine a computer file, disk, database, or network for particular information
4. discover something by examination: to discover, come to know, or find something by examination
All fit, but no. 3 fits this course particularly well

3. Syllabus
Only an outline is given here. The document is long and detailed. Basic to everything we are doing. Worth a periodic consultation, particularly as to assignments, final project, formats, bibliography, etc.

Content of syllabus
• Course description
  – as in the catalog
• Rationale of the course: Why?
  – motivation and justification for the course
    • summarized in next section
  – all course sections (modules) start with a Why? and then go on to What? and How?
• Before the start: what you need to have and know, and how to get it
  – we already went through this
• Course purpose and objectives
  – summarized in section 4
• Organization of the course
  – summarized in section 5

Content of syllabus (cont.)
• Coursework
  – summarized in section 6
  – answers FAQ, so before asking it is good to consult the syllabus
• Method of assessment
  – how you will be graded
• Academic integrity
  – Rutgers policy and statements on Student responsibility and Faculty responsibility
  – Plagiarism policy
• Bibliography
  – readings and how to obtain them

4. Why? Rationale & objectives
Now we are finally getting to the stuff that the course is all about

Why do we have this course?
• Details in syllabus & course outline, summary here
• Number & variety of information resources is HUGE
  – growing at a very high rate - called “information explosion”
• A great many people search for information
  – few do it well
  – even fewer know how well they are doing
• As professionals, librarians were always concerned with searching for information on behalf of users
  – with the advent of electronic information resources and the web, searching has changed in many ways

Why? (cont.)
• Searching has become a complex process involving interaction between people, information, & technology
• A professional understands the complex processes & interactions involved in searching and puts them effectively into practice
• You are asking:
  – How do I search effectively and efficiently a variety of information resources for users?
  – How do I evaluate what was searched and provided?

Course objectives
Integrated understanding of:
• Content: Subject, structure & vocabularies of information resources
• Systems: Models of information retrieval (IR) systems, search engines & digital libraries as used in searching
• Human-human interaction: User information seeking as the context for searching; mediation & interviewing
• Human-computer interaction: Principles for effective searching & variations in search strategies & tactics
• Results: Alternatives in presentation of results to users; evaluation of results
• Professional concerns: Ethical norms & life-long learning.

In order to search you need an understanding of:
Content: What is in the sources? How is it organized?
Systems: Where? What kinds? IR, web, dig libraries...
Human-human interaction: How? You and the user
Human-computer interaction: How? You and the computer
Results: What & how to present; evaluate
Professional conduct: ethics …

Symbolically ...
[Diagram with labels: Content, System, HHI, HCI, Results, ?, Professionally]

What will the course NOT do?
• Create professional searchers or “extreme searchers” out of you
• Make you an expert on databases, systems, information retrieval, search engines, the web

What will the course DO?
• Provide you with a practical & theoretical foundation and framework on the basis of which you can then:
  – develop into a professional searcher or technical assistant to users
  – grow & evolve with the field
  – adjust to inevitable changes in the world of searching
  – eventually, depending on your other courses & life-long learning, become an expert

About the course
• It is demanding
  – but so is searching as professional work
• It is challenging
  – but so is searching
• There is a lot of thinking
• There is a lot of work
• But there is a lot
  – that can be learned
  – that can be used in practice
    • in other courses
  – that will stay with you throughout your career
  – upon which you can build
• And the course is rewarding
  – and so is searching professionally

5. What? Themes and modules
Organization of the course as to coverage with emphasis on modules as basic units

Organization
• Semester lasts 16 weeks
• Course has 16 modules
  – one for each week of the semester
• Modules are grouped into themes
  – there are 8 themes following objectives:
    A. At the start (module 1)
    B. Content (modules 2 & 3)
    C. Systems (modules 4, 5 & 6)
    D. Human-computer interaction (modules 7, 8 & 9)
    E. Human-human interaction (modules 10 & 11)
    F. Results (modules 12 & 13)
    G. Professional concerns (modules 14 & 15)
    H. At the end (module 16)

Modules
Each module has an outline as to:
• Title of the module
• Why? the rationale for presenting this module and questions you should ask
• What? a list of topics covered in the module
• How? presentation and tasks for the module
  – elaborated in section 6

Topics covered
Theme A: AT THE START
  Module 1. Overview of the course and a bit of history
B. CONTENT
  2. Types and structures of information resources
  3. Types and structures of vocabularies
C. SYSTEMS
  4. Information retrieval
  5. Interaction in information retrieval
  6. Search engines. Digital libraries

Topics covered (cont.)
D. HUMAN-COMPUTER INTERACTION
  7. Search techniques and effectiveness
  8. Advanced searching
  9. Web search and the invisible web
E. HUMAN-HUMAN INTERACTION
  10. Information seeking. User modeling
  11. Mediation between search intermediaries and users

Topics covered (cont.)
F. RESULTS
  12. Evaluation of search sources and results
  13. Presentation to users
G. PROFESSIONAL CONCERNS
  14. Ethics. Competitive intelligence
  15. Keeping up: sources for lifetime learning
H. AT THE END
  16. Student presentations and conclusions

6. How? Goings on
Coursework: Ways and means we are going about doing the course AND schedules

Mix
• The course is a mix of
  – theory
  – experimentation
  – practice
• Why theory?
  – base for further understanding & professional development
    • knowing theory separates learning from “training”, a professional from a technician or paraprofessional
  – nothing more practical than a good theory
  – theory endures through changes in systems & software
    • theory makes learning new systems easier
  – theory helps with understanding & helps learning “stick”

Structure of coursework
• Each module has:
  1. a lecture on the module topic
  2. assignments as to readings
  3. exercises for searching
  4. tips for thought
• There is also a term project
  – a semester long task focusing on providing a search service to a selected user

Schedule
• Assignments and exercises for each module are done on a weekly basis starting Monday, due on the next Monday
• The semester long term project is due on the Monday after the last class week, with two progress reports as scheduled (1/3 and 2/3 into the semester)
• Schedule is provided on course site

Lectures
• Each module has a lecture on the topic
  – lectures are in PowerPoint
  – best viewed if downloaded & then run on own computer
    • go to Doc Sharing; Select View: Lectures; & open, save from there
  – most lectures contain some links to other sites, providing further explanation, examples, or resources
  – some lecture slides have notes with further explanatory text
    • terms/phrases that have a * (asterisk) have associated notes

Assignments
• Assignments refer to READINGS ONLY
  – associated with module topic and lecture
  – some readings are required
  – they have to be summarized and summaries turned in
  – other readings are for reading only and discussion or reference
• Full citation to readings is in the bibliography
• Readings are either at RUL, on class web site, or the web
  – sometimes you will have to search
    • (after all this is a searching class!)

Summaries (for required readings only)
• Provide a brief synthesis of main ideas, facts AND
  – possibly a critical review, e.g.
  – relate to (points added for this):
    • relevant personal, professional experiences with library & information services; examples
    • translation/implication for practice
    • other readings, topics, courses, project, exercise and/or
    • raise questions for discussion and discuss with group
• Format, style:
  • format as prescribed in syllabus
  • but style of summaries is your choice

Tips for summaries
FORMAT:
• Start with heading as prescribed
  – points deducted if not
• Use APA style
• Two to three pages maximum
• Use 12 point font
  – single space
  – 1 inch margins
• Submit on time
CONTENT:
• React to readings
• Tie in with practice
• Integrate w/ other knowledge, experience and course work
• Demonstrate thought & learning
• Include questions and criticism
• Do not merely summarize readings

Exercises
• Purpose: to obtain practical training in a variety of systems
  – the purpose is NOT to teach you a given system, but to provide searching experiences that can be generalized & later sharpened, improved
• On a weekly basis as assigned
  – using DIALOG, LexisNexis, web, search engines, digital libraries …
  – or search for answers for given questions
  – or use a variety of tactics & features
• Work cooperatively in groups
• At times independent of lecture topic
  – but has its own logic in progression

Examples of first few exercises
• Involves DIALOG*
• Take DIALOG tutorials
• LEARN & PRACTICE:
  – Contents of databases
  – Structure of databases & records - BLUE SHEETS
  – Basic search commands
  – Basic output commands
  – Logical operators, execution
  – Truncation
  – Searching in fields
  – DIALINDEX; OneSearch

Tips for thought
• Informal
  – questions, ideas to be pondered on your own
  – guidelines for further learning & exploration on your own
  – sometimes things to lighten up
• You can contribute
• Can be used in group discussion
• But there is nothing that is required, nothing to turn in

Term project purpose
• A reality exercise designed to give you in-depth experience of what you will encounter in your professional life
  – involves every aspect of searching from start to end
• Experiences to be shared among classmates, so that you can learn from each other
• It will take time and effort, thus do NOT procrastinate

Term project
• Select a specific user with an information need for an online search
  – no family or significant others*
• Interview the user
  – if necessary several times with feedback
• Construct a user model
• Select resources for searching
• Construct search strategies & conduct searching - reiterate
• Organize results for presentation
• Present results to user; evaluate
• Write a technical report

Term project deliverables
There are two:
1. A report to the user
  • suggest you follow presentation guidelines as suggested in module 13
  • does NOT have to be presented to the instructor or class
    – it is between you and your user!
2. A technical report to the instructor
  • discussed next and in the syllabus at length

Technical report (details in the syllabus)
• Selection of user: who?
• User question & model
  – what task? how much does the user know? what topics? terminology? priorities?
• Mode & results of interviews
• Summary of search tactics & approaches, dynamics
• Changes in user model, user definition of problem
• Changes in searching & you
• Evaluation of your effort & learning
  – what does or does not work?
  – what effects of decisions?
  – what would you do differently?
  – this section VERY important!

7. A bit of history
A short chronology rather than history

Antecedents
• Europe before WWII
  – strong documentation movement
    • Universal Decimal Classification, indexing of scientific literature
• In the US right after WWII concern about information explosion, particularly in science
  – Vannevar Bush’s classic article “As We May Think” in Atlantic Monthly in 1945 stirred imagination & funding
    • problem: “the massive task of making more accessible a bewildering store of knowledge.”
    • solution: use of new technology, “Memex” as idealized model
    • can you find it?

Beginnings
• NSF acts of 1950 & 1958 mandate support for scientific information
  – to this day supports research & development in this area, including digital libraries
  – sparked involvement from many fields & many funded projects
• 1951 Calvin Mooers coined term “information retrieval” (IR)
• 1950s mechanized IR systems emerged
• Societies and conferences emerged related to problems of IR and broader issues

Onto the real world
• 1960s saw computer applications for IR blossoming
• Also library automation emerged, incl. MARC
• 1971: Medline, the online version of MEDLARS (Nat. Libr. of Medicine), came out
  – this was online way before the internet & web
• Early 1970s: DIALOG and ORBIT established
  – commercial online vendors (ORBIT later merged into other vendors)
• Professional searching grew at high rate

Research
• In the 1960s Gerard Salton & his students in computer science pioneered research into advanced IR methods
  – addressed technical or system side of IR
  – great many good results over decades
  – it took decades before results were applied commercially, but today all vendors & search engines use them
  – continues to this day internationally
  – particularly under TREC (Text Retrieval Conference) (find it?)
• Research and IR still closely connected
  – source of advances

Research (cont.)
• 1970s & 80s also saw emergence of research dealing with the human (user) side of IR
  – addressed users, use of information & IR systems
  – basic notions, such as relevance
• From the 1990s to the present, areas include:
  – interaction in IR, or human-computer interaction
  – information seeking
  – human information behavior
• Human and system side of research do not mesh well
  – still & unfortunately

Net
• Internet first went live in 1969 as ARPANET, an inter-university net
  – in 1983 switched to the TCP/IP protocol, still in use today
  – i.e. present internet was born
  – in 1990 ARPANET was retired; NSFnet, started in the mid-1980s, broadened reach significantly
  – by 1995 NSF pulled out, leaving the net to broad public & commercial use
• By the 1980s it became a force
  – by the 1990s it took over the world
• In 1991 Tim Berners-Lee launched the world wide web
  – in 1993 first widely used browser developed (Mosaic, forerunner of Netscape)
  – became fastest growing & spreading technology in history
• Search engines
  – Yahoo launched in 1994 & Google in 1998
  – affected searching enormously
  – today over 3000 search engines in over 150 countries

Digital libraries
• Emerged in mid 1990s
• Involved
  – massive research programs (still going on)
  – massive investments by libraries
    • changed the library landscape
    • particularly as to access & searching
  – the two don’t communicate much
• Brought together IR & libraries
• Today vast international presence
  – many institutions in addition to libraries involved, e.g. museums, societies
• Major resource (& headache) for searchers
  – large variety of texts, images, sounds digitized all over the world

Future?

A perspective: searching is a journey of discovery

another perspective …

still another perspective