Types & structures of information resources
What is out there for searching and what’s under the hood?
© Tefko Saracevic

Definitions
• resource – Encarta Dictionary
“Source of help … somebody who or something that can be used as a source of help or information … adeptness at finding solutions to problems”
• database – Webopedia.com
“A collection of information organized in such a way that a computer program can quickly select desired pieces of data. You can think of a database as an electronic filing system.”

Definitions (cont.)
• Information databases are organized by fields, records, and files. A field is a single piece of information; a record is one complete set of fields; and a file is a collection of records. For example, a telephone book is analogous to a file. It contains a list of records, each of which consists of three fields: name, address, and telephone number.

Relations
• Terminology can be confusing & not consistent - so beware & do your own translation
– Provider: a producer of databases; there are a great many providers covering many fields
  • e.g., Dept. of Education produces ERIC – abstracts & indexes educational materials (articles, reports)
– Vendors or aggregators: organizations or companies that get databases from providers & organize them for searching; there are a number of vendors; some providers are their own vendors
  • e.g., DIALOG gets over 400 databases from a variety of providers (among them ERIC) & then organizes them for searching
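The field, record, and file definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` class, the `select` helper, and the sample entries are made up here and do not come from any database product.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One complete set of fields (one telephone book entry)."""
    name: str
    address: str
    telephone: str


# A file is a collection of records; the entries below are invented.
phone_book = [
    Record("Ada Example", "1 Main St", "555-0100"),
    Record("Ben Sample", "2 Oak Ave", "555-0101"),
]


def select(file, field, value):
    """Return the records whose named field equals the given value."""
    return [rec for rec in file if getattr(rec, field) == value]


matches = select(phone_book, "name", "Ben Sample")
print(matches[0].telephone)
```

Because every record carries the same labeled fields, a program can pick out "desired pieces of data" by field name, which is the point of the filing-system analogy.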

Example of a vendor: DIALOG
• acquires databases from information providers at a fee
• organizes content according to given structures
• describes the content
– done in Bluesheets, a most important search tool for you
• provides given searching capabilities
– you have to master them for effective searching
• creates some files of its own
– e.g., super indexes
• provides you access at a fee
– there is no such thing as a free lunch

BTW – why DIALOG?
• Why do we use DIALOG for so many exercises? Several reasons:
– oldest and largest surviving vendor
– most comprehensive set of databases
– has a well-developed instructional program
– but most importantly: serves as a good test bed to develop searching skills that are generalizable
– what you will systematically learn from using DIALOG can be translated to all searching
• & you get an insight into problems with searching

Other vendors/aggregators
• A good number of other vendors are around
• confusing?? wait, there is more…
– the landscape is constantly changing
– some available through RUL
– examples (examine!)
  • LexisNexis
  • Factiva
  • ScienceDirect
  • EBSCOhost
  • Ingenta
  … and on
– some incorporate databases from producers, others create their own databases from a myriad of sources

Types of information databases
• Many types are available:
– Bibliographic
– Numeric
– Full text
– Directory
– Image
– Sound
– Multimedia
– Real time
• Some that are in DIALOG are also available elsewhere or on their own
• Some vendors have exclusive rights to some databases
• Many you can find in RUL

Examples of databases
• Over 200 available at RUL
– examples that are relevant to library and information science
  • Library and Information Science Abstracts
  • Library Literature and Information Science
  • Information Science and Technology Abstracts
  • ERIC
  • IEEE Xplore
  • ACM Digital Library
  • but others also cover materials of interest, e.g.
    – Web of Science
    – INSPEC

a BIG problem
• In DIALOG & with some other vendors you can search a number of databases at the same time
– so-called federated searching
– or in DIALOG search Dialindex – a meta index of databases
• BUT in RUL & elsewhere there is no federated searching
– you have to search each database separately
• someday there will be federated searching, but at present do not hold your breath

as you would imagine …

Now on to structures – getting under the hood
• Each database type has its own structure
– why?
to describe various parts of content for computers to recognize
• you can recognize that a section of a document is a title, but a computer has to be told that a title is a title
• so that it can (among other things) search for terms in a title when you request it
• Parts of documents (or objects in databases) are labeled as to content or function

Labeling schemes
• Many structure schemes were developed that prescribe what to label & what to call the label – meta languages
– by providers, vendors, organizations, authorities
– in different subjects, domains
– for different types of objects
• Meta tags are used on the web
– to describe & index
– semantic web is in development, to further enable description of and searching for meaning
• MARC is a form of meta language
• To use these schemes for effective searching you have no choice but to get familiar with them

Transparency of structures
• In some databases a description of the structure is readily available
– even though it may look forbidding, complicated …
  • good example: Bluesheets in DIALOG
• In others, structure is there but has to be discovered by surmising
– even in
• But clever, appropriate use of structure in searching is key to effective searching

Example: file 438 Bluesheet
Library Literature and Information Science
Describes the content of the file

file 438 fields - each is searchable
Sample record: indicates structure

file 438: fields in Basic Index
The basic index is searched by default – examples of how to search fields in the basic index

file 438: fields in Additional Indexes
An additional index is searched by indicating the field to be searched – examples of how to search them
Neat trick: if you want to search the latest update only, add UD=9999 to the search

file 438: fields in Limit
Searches can be limited to cover documents with given attributes – examples of how to limit searches

file 438: additional uses of structure
Results can be sorted or ranked by given fields – examples of how to sort or rank results

file 438: options in displaying of results
Results can be displayed in a number of ways – examples of available formats
But watch out! In real life some formats are free, others cost $$$$!
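The ideas on the file 438 slides (a basic index searched by default, additional indexes searched by naming a field, limits, and sorting) can be illustrated with a toy model. This is a sketch only, not DIALOG's command syntax: the field labels `ti`, `de`, `au`, and `py`, the sample records, and the `search` and `limit` helpers are all assumptions made up for the example.

```python
# Toy records; field labels (ti, de, au, py) are hypothetical, not file 438's.
records = [
    {"ti": "Federated searching in libraries", "de": "searching; libraries",
     "au": "Doe, J.", "py": 2004},
    {"ti": "Metadata for digital collections", "de": "metadata; cataloging",
     "au": "Roe, R.", "py": 2006},
    {"ti": "Teaching online searching", "de": "searching; education",
     "au": "Doe, J.", "py": 2006},
]

BASIC_INDEX = ("ti", "de")  # fields searched when no field is named


def search(term, field=None):
    """Search the basic index by default, or one named field if given."""
    fields = BASIC_INDEX if field is None else (field,)
    term = term.lower()
    return [r for r in records
            if any(term in str(r[f]).lower() for f in fields)]


def limit(results, **attrs):
    """Keep only results whose attributes equal the given values."""
    return [r for r in results
            if all(r[k] == v for k, v in attrs.items())]


default_hits = search("searching")        # matches in title or descriptors
author_hits = search("doe", field="au")   # additional-index style search
limited = limit(default_hits, py=2006)    # limit by publication year
by_year = sorted(records, key=lambda r: r["py"], reverse=True)  # sort by field

print([r["ti"] for r in limited])
```

Note that `search("doe")` without a field finds nothing, because the author field is not part of the default index in this toy setup; that is the kind of behavior a Bluesheet documents for a real file.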

Economics – tail that wags the whole dog
• In class DIALOG searching is free
– & you can use it for class exercises, nothing else
• In real life DIALOG (like every other vendor) has an elaborate economic structure
– different files have different price tags for use
– time of use is calculated in DialUnits
  • a Byzantine structure of charges beyond understanding
– in different files different formats have different prices attached
  • some are really hefty!

Where to find all about structure?
• In DIALOG in Bluesheets
– consult often! and again! and again! and again!
– files have similarities and differences in structure
– Bluesheets show that
• For other vendors:
– some have descriptions similar to Bluesheets
– some have to be dug up & surmised
– in some, revelation comes from checking what is available in advanced searching or in tips for searching

Structure in search engines
• Mostly not readily apparent
– but all have capabilities to be used in searching
• Again: revelation comes from checking what is available in Advanced Search, Search Features, Search Tips, Help, & the like
• Most users do NOT take advantage of available structures in searching
– professional searchers do
  • part of their tool kit & competencies

Example: structure from Advanced Search
Records are structured at minimum by these fields

Another example: structure from Advanced Search
Records are also structured at minimum by these fields

Similarities & differences
• All vendors & search engines have a basic search by default & an advanced search
– but defaults & advanced capabilities differ & have to be confirmed for each
– once you learn, you will apply variations on the theme

Similarities & differences …
• All vendors & search engines have basic & advanced Boolean-type search capabilities
– but how it is done & the bells and whistles differ
– once you master the concepts you can say AHA! when you encounter a variation & then translate

Similarities & differences …
• All vendors & search engines rank output results
– but how it is done differs
– DIALOG uses LIFO – Last In First Out – as default, but also allows for other ways
– search engines use ranking by relevance, clustering, PageRank … criteria
  • not easy to discern
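The difference between LIFO output and a relevance-style ranking can be shown on a tiny result set. This is a minimal sketch with assumptions: the records are invented, accession numbers are assumed to grow as records are added, and counting term occurrences is only a crude stand-in for whatever criteria a real search engine uses.

```python
# Toy result set; accession numbers and texts are made up for illustration.
results = [
    {"accession": 101, "text": "searching databases and searching the web"},
    {"accession": 102, "text": "library management"},
    {"accession": 103, "text": "searching for meaning"},
]


def lifo(results):
    """Last in, first out: the most recently added record comes first."""
    return sorted(results, key=lambda r: r["accession"], reverse=True)


def by_term_frequency(results, term):
    """Crude relevance stand-in: more occurrences of the term rank higher."""
    return sorted(results,
                  key=lambda r: r["text"].lower().split().count(term.lower()),
                  reverse=True)


print([r["accession"] for r in lifo(results)])
print([r["accession"] for r in by_term_frequency(results, "searching")])
```

The same three records come back in a different order under each rule, which is why it pays to know which ordering a given system applies by default.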

Similarities & differences …
• Most users
– do not know or care about structure
– do not search beyond default capabilities
– do not look beyond one or two pages of results
– miss many potentially relevant results
– do not know what is under the hood
• Professional searchers
– know that structure is very much connected to searching
– learn about & use available structures
– understand defaults & use advanced capabilities as necessary
– know “tricks” for not missing stuff or not getting too much or too much junk
– explore in order to learn what is under the hood

In conclusion!
Searching is more art than science, but an art that needs a lot of knowledge of what is behind it