Advanced searching: a variety of tricks of the trade
Tefko Saracevic
tefkos@rutgers.edu; http://comminfo.rutgers.edu/~tefko/

Central ideas
• Searching is still much more an art than a science
• Main object of searching is to be effective
• Effectiveness is primarily considered in terms of retrieval that is relevant
• But there is no such thing as a perfect search
• This leads to various tactics to achieve certain effectiveness goals & levels

ToC
1. Definitions, approaches
2. Search tactics
3. Advanced features 1: Using fields
4. Advanced features 2: Using proximity
5. Case study

1. Definitions, approaches
Advanced searches as heuristics

Definitions
Advanced (Encarta)
• More highly developed … at a higher stage of development or progress than other similar people or things
Advanced searching
• that about sums it up
• it is searching at a higher level of complexity, without which search goals of increased effectiveness cannot be achieved

Definitions …
Heuristic (Encarta)
• problem solving by trial and error: a method of solving a problem for which no formula exists, based on informal methods or experience, and employing a form of trial and error (iteration)
• using or arrived at by a process of trial and error rather than set rules
• a rule of thumb: commonsense rules intended to increase the probability of solving some problem

Advanced searching is a HEURISTIC process
• It means that searching is a trial & error process & an iterative process
• It means that searchers modify a search in response to results or to user reaction
• It is a base for search progression toward more effective results
• And it is behind advanced search strategy and tactics

Goals of advanced searching
– achieve higher levels of effectiveness
  • getting more relevant stuff & less irrelevant stuff
– and at higher level of efficiency
  • saving on overall time, cost, effort
– center search toward answers &
resources most likely to be effective
  • also: focus unfocused searches &
  • get ideas how to proceed
– use all available system features for goals
– act as a professional (extreme) searcher

Reminder
A search strategy is
• The entire approach to a search: selection of
  – files and sources to use
  – approaches in proceeding to search
  – formats for viewing results
  – alternative actions if search yields
    • too much
    • too little
  – problem-solving heuristics
Search tactics are
• A query (command line) entered into a system in order to retrieve relevant information & variations in
  – terms, operators, fields, delimiters & attributes as allowed by a given system
  – vocabulary & syntax used in conjunction with connectors &/or limiters to search a system

Advanced searching possible at several levels
Strategic
• using different approaches to fit circumstances or context, independent of but adapted to a system used
Reminder: Search strategy (big picture):
– overall approach to searching of a question
– decisions on search resource(s), content & format
– variations in these as a search progresses
Tactical
• using system features to the hilt to achieve given objectives
  – but as said, features may & do differ from system to system
Reminder: Search tactics (action choices):
– choices & variations in search statements, query
– terms, connectors, attributes …
– using capabilities of a system to the hilt to achieve desired results

2. Search tactics
Various ways of approaching an advanced search

Some major tactics
Name & what it is mostly used for
1. Speed search (also called Briefsearch, meatball search, quick & dirty search)
  – Questions: usually simple
  – Requirement for answers: brief, not comprehensive
  – Effort: not willing to spend much.
Little preparation required
  – Extension: possibly also used as a starting point for ill-defined questions or more complex searches, to see what works, what is there, & for relevance feedback to proceed with other tactics
2. Building block search
  – Questions: usually complex & fairly well defined
  – Requirement for answers: more comprehensive
  – Effort: willing to spend quite a bit, particularly in preparation
  – Extension: excellent to proceed with relevance feedback to citation pearl growing or refinements
3. Citation pearl growing search
  – Questions: usually complex & not that well defined
  – Requirement for answers: comprehensive
  – Effort: willing to spend a lot, particularly in examination of answers & following & evaluating citation trails
  – Extension: good to proceed with building block tactics

Speed search
• Takes little planning & is fast
  – searcher gets on to the system quickly, & enters terms using default (or simple Boolean) operators
  – only a few terms are used
  – there is no or little reiteration & limited interaction between searcher & system
• Can also be used for verification purposes
• Results can be examined for relevance feedback
• Not recommended for comprehensive searches
• Widely used & most preferred by users generally

However … for a complex search
• Speed search is not a be-all and end-all
• But it could be a very effective beginning
  – to do initial exploring and getting ideas about sources, contents, type of documents, magnitude …
  – to find some relevant documents and proceed from there
  – and then to proceed with refining searches using other tactics
• You do a speed search, examine results, maybe do more & examine again, and on that basis refine succeeding searches & tactics
Use it as a classic form of feedback

Building block search
• Commonly used search tactic – start small & then build upon results
  • identification: each important concept of a search is identified; also facets, such as fields to be searched
are identified
  • elaboration: for each concept further terms are identified
  • combination: search starts with one or just a few concepts & associated terms; as it progresses additional concepts & facets are connected using appropriate Boolean operators &/or attributes
  • iteration: as a search proceeds terms may be added to concepts; new concepts created & combined; fields added or dropped
• You build heuristically & modify the query as you go along, adding & changing concepts, their elaborations, and facets/fields

Building block search - illustration
1. From a question concepts A, B, C ... are identified – terms that could be further analyzed
2. For each concept search terms are added – narrower, broader, related, synonyms, near synonyms - all these are connected with OR
3. Concepts together with their terms are connected with AND
4. Fields and limits may be added to any or all concepts or terms

Diagram: concepts (columns) are connected with AND; terms under each concept are connected with OR
Concept A   Concept B   Concept C   Facets/fields
Term A1     Term B1     Term C1     Field/limit F1
Term A2     Term B2     Term C2     Field/limit F2
…           …           …           …
Term An     Term Bn     Term Cn     Field/limit Fn

Dialog worksheet helps in planning

Connecting tactics
• Concepts in building block searches can be identified not only from a question but also from documents resulting from a speed search
  – thus concepts C, D … could be specified after a previous speed search, elaborated, & then added to a subsequent building block set of concepts
  – same with facets & fields

Narrowing tactics
• A search can start with using one of the concepts and its elaborations & then adding others
  – this way it proceeds from broad (one concept) to narrower by adding other concepts – and reviewing
  – facets and fields can be added for still more narrowing – evaluated as one receives answers
  – limits/fields can be added at any search, narrowing it further
  – used to increase precision & focus
• Same can be done in reverse, from narrow to broad, by subtracting
concepts from a comprehensive search
  – used to increase recall & focus

Narrowing schematic (+ = AND)
1st search: Concept A (Term A1, Term A2, … Term An)
2nd search: Concept A + Concept B (Term B1, Term B2, … Term Bn)
3rd search: Concept A + Concept B + Concept C (Term C1, Term C2, … Term Cn)
4th, 5th … search: + Facets/fields (Field/limit F1, Field/limit F2, … Field/limit Fn), add to any

Citation pearl growing search
What? Aims
• It means what the name implies: you start with a nugget & grow upon it
• Starts with a few records of high relevance
• Looks at references or who cites it to find more
• Aims for more recall
• Avoids subject terms, indexing & language
When to use it
• When word lists or thesauri are not available
• When there isn’t a large recall after doing some searching
• When a user has one or two good articles and wants to find more like them
• When a topic is hot with a breakthrough paper

It depends on citations over time
Backward chaining (back in time)
• Following up references in articles of interest
  – moving backward in successive leaps through reference lists
• Could be linked to cocitation – authors cited together
• Popular in social sciences, humanities
Citation tracking (forward chaining in time)
• Who has cited a given document, author, journal, institution
  – moving forward in time from the publication of the item
• Used also to indicate impact
  – higher citation rate assumed higher impact
• Popular in sciences

Citation indexes
• Tools giving citation links
  • particularly Web of Science, Scopus & Google Scholar
• Invaluable for citation pearl growing
  – Citation indexes in various subjects (law, science …) provided that for a long time even before computers
  – But it exploded with automation
• Now some search databases provide support for that search tactic
  – integrated with subject searching
    • e.g. Scopus, even
Google Scholar
  – easy to jump from subject searches to references to citation tracking to sources to authors

3. Advanced features 1
Using fields

In fact
• Any & all vendors & search engines have advanced search features – none are without them
• In principle most are the same in that they cover similar fields in records
• But in application they differ from vendor to vendor, engine to engine – sometimes greatly
  • need to be learned individually. What a bummer!
  • it cannot be assumed that what works in one, & how, works elsewhere – even though similarities are there
  • but once you know them well in a few you generalize & adapt to others

Fields & advanced features
• Everybody has fields
  – & they are critical for advanced searching – it starts with fields
• How displayed for searching differs greatly
  – now mostly in menus
    • added automatically
  – but also available as commands
• Common fields beyond subjects
  – author, source, year, institution, type of publication, country, etc.
• Some are used to search on another dimension
  – e.g. authors, sources
• Others to limit subject & other searches
  – e.g. dates, language

Advanced features for Library Literature & Information Science in Wilson Web: fields

Advanced features for Web of Science: fields

Advanced features for Scopus: fields, example

Advanced features for Google
Here is what Google says:
Detailed description in: Google Guide, particularly in Query Input, by Nancy Blachman
“I developed Google Guide because I wanted more information about Google's capabilities, features, and services than I found on Google's website.
Google Guide is neither affiliated with nor endorsed by Google.”

Advanced features for Google … fields

Use of advanced features
• Many studies show that users (when searching for themselves as end users) use them rarely, if at all
  – they do not use Boolean capabilities, searching by given fields, restricting searches by available delimiters, etc.
• But professional searchers use them a lot
Use of advanced features is one of the hallmarks of professional competencies

4. Advanced features 2
Using proximity of terms

Proximity
• Searching for
  – terms x words apart
    • one after the other or in any order
  – terms in same sentence, paragraph, field
• Improves precision
  – zeros in on specific names, expressions
• Important for searching
  – particularly for users in fields with set terminology
• Connected with phrase searching
• Simple idea but handled very differently in different databases
  – to find how it is handled you must go to Help

Phrase and string searching (similar to proximity), from Help

Proximity & phrase operators (from Help)

Stop words
• Words that databases and search engines choose to ignore
  – for searching – they will note their position but not include them in the index
  – some of them also for indexing – they will not index them to start with
• Different databases use very different lists of stop words
  – and handle them differently
• Dialog has 9 stop words:
  – AN, AND, BY, FOR, FROM, OF, THE, TO, WITH
• How about others?
  – let's see

Stop words: important to know what they do NOT search automatically [from their Help pages]

Stop words – handled very differently
WoK has some 200 stop words that are ignored while searching, even for phrases
Watch out!

Stop words – again handled differently

5.
A case study
My own question & search – reality show

Question & context
this is a real question & reason I had
Question
• Search engines offer a number of features for searching. They also retrieve a large number of answers. How much are these advanced features used? How many pages do people look at?
Context
• I am interested in studies that have actual data. To be used for update of bibliography in this course and for discussion in a lecture book on relevance in information science that I am currently writing – support for broader conclusion

Databases used
• I used first Library Literature and Information Science – available at RUL
  – did not get anywhere really so I lost patience & switched
• Then I used Scopus – not available at RUL any more, but have class access
• All results are from Scopus

First I did a speed search that led me to making building blocks
Tactics
• Selected a basic concept from the question
Search
– search engines
– advanced search
Results
• I enlarged the search concepts & terms from index terms found in a few examined documents that seemed relevant
Web; Methods; Web searches; Transaction log analysis; Online searching; Web queries; Search log analysis; Web sessions

This resulted in
• Quite a broad search and a lot of results, so I went to limit to certain fields and dates
• Selected to add to search as limitation:
Facets/fields
– social sciences only
– last two years
– and later to articles with a lot of citations – shows impact

One of the searches

Examined about six pages of results, here are three major selections
• These two did not have any results, but were useful for the class, so I included them in the bibliography
• This was toward the end but it turned out to be a mother lode, not only for having statistical results but for citations

Here is the mother lode
abstract with a number of features for further searching
clicked on
authors

Leads to further things: references, index terms, cited by, related works

Articles that cited it
start of the list, newest ones first, with a number of features to explore further

Multiple use of tactics & results
Search tactics used:
• Speed search
• Building block search
• Citation pearl search
  • references (backward chaining)
  • cited by (forward chaining)
• Relevance feedback
Results used for:
• Got a few references to include in class bibliography
• Got data to include in lectures and in the future book
• And an example to illustrate the topic for this lecture
It was what Marcia Bates calls berry-picking search

Conclusion: searching is both
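
Supplement: building block search sketched in code
The building block illustration in section 2 (terms of a concept connected with OR, concepts connected with AND, fields/limits added on top) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names or_block and build_query are hypothetical, the query syntax (parentheses, quoted phrases, upper-case OR/AND) is generic and not the syntax of Dialog, Scopus or any other system, the grouping of the case-study terms into two concepts is an assumption made for illustration, and "F1" is just the placeholder field/limit from the illustration. Real systems differ, so their Help still has to be consulted.

```python
def or_block(terms):
    """Connect the terms of one concept with OR; quote multi-word terms as phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts, limits=()):
    """Connect the OR-blocks of all concepts with AND, then AND on any fields/limits."""
    blocks = [or_block(terms) for terms in concepts if terms]
    return " AND ".join(blocks + list(limits))


# Hypothetical grouping of terms from the case study into two concepts.
concept_a = ["search engines", "Web searches", "Web queries"]
concept_b = ["advanced search", "transaction log analysis", "search log analysis"]

# Narrowing: each successive search adds one more building block.
print(build_query([concept_a]))               # 1st search: concept A only
print(build_query([concept_a, concept_b]))    # 2nd search: A AND B
# "F1" stands in for whatever field/limit syntax a given system uses.
print(build_query([concept_a, concept_b], limits=["F1"]))
```

Keeping each concept as its own list mirrors the worksheet idea: terms can be added to a concept, or whole concepts added or dropped, and the query is simply rebuilt for the next iteration.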