Lecture07 Advanced searching.ppt

Advanced searching
a variety of tricks of the trade
tefkos@rutgers.edu; http://comminfo.rutgers.edu/~tefko/
Tefko Saracevic
Central ideas
• Searching is still much more an art than a science
• Main object of searching is to be effective
• Effectiveness is primarily considered in terms of
retrieval that is relevant
• But there is no such thing as a perfect search
• This leads to various tactics to achieve certain
effectiveness goals & levels
ToC
1. Definitions, approaches
2. Search tactics
3. Advanced features 1: Using fields
4. Advanced features 2: Using proximity
5. Case study
1. Definitions, approaches
Advanced searches as heuristics
Definitions
Advanced (Encarta)
More highly developed …
at a higher stage of development or progress than other
similar people or things
Advanced searching
that about sums it up
it is searching at a higher level of complexity without which
search goals of increased effectiveness cannot be
achieved
Definitions …
Heuristic (Encarta)
problem solving by trial and error
a method of solving a problem for which no formula exists, based on
informal methods or experience, and employing a form of trial and
error (iteration)
using or arrived at by a process of trial and error rather than set rules
a rule of thumb
commonsense rules intended to increase the probability of solving
some problem
Advanced searching is a
HEURISTIC process
• It means that searching is a trial & error process &
an iterative process
• It means that searchers modify a search in response
to results or to user reaction
• It is a base for search progression toward more
effective results
• And it is behind advanced search strategy and
tactics
Goals of advanced searching
– achieve higher levels of effectiveness
• getting more relevant, missing more irrelevant stuff
– and at higher level of efficiency
• saving on overall time, cost, effort
– center search toward answers & resources most likely to
be effective
• also: focus unfocused searches &
• get ideas how to proceed
– use all available system features for goals
– act as a professional (extreme) searcher
Reminder
A search strategy is
• The entire approach to a search – selection of
– files and sources to use
– approaches in proceeding to search
– formats for viewing results
– alternative actions if search yields
• too much
• too little
– problem-solving heuristics
Search tactics are
• A query - command line entered into a system in order to retrieve relevant information & variations in
– terms, operators, fields, delimiters & attributes as allowed by a given system
– vocabulary & syntax used in conjunction with connectors &/or limiters to search a system
Advanced searching
possible at several levels
Strategic
• using different approaches to fit circumstances or context independent of but adapted to a system used
Reminder:
Search strategy (big picture):
– overall approach to searching of a question
– decisions on search resource(s), content & format
– variations in these as a search progresses
Tactical
• using system features to the hilt to achieve given objectives
– but as said, features may & do differ from system to system
Reminder:
Search tactics (action choices):
– choices & variations in search statements, query
– terms, connectors, attributes …
– using capabilities of a system to the hilt to achieve desired results
2. Search tactics
Various ways of approaching an
advanced search
Some major tactics
Name: mostly used for
1. Speed search (also called Briefsearch, meatball search, quick & dirty search)
Questions: usually simple
Requirement for answers: brief, not comprehensive
Effort: not willing to spend much. Little preparation required
Extension: possibly also used as a starting point for ill-defined questions or more complex searches to see what works, what is there, & for relevance feedback to proceed with other tactics
2. Building block search
Questions: usually complex & fairly well defined
Requirement for answers: more comprehensive
Effort: willing to spend quite a bit, particularly in preparation
Extension: excellent to proceed with relevance feedback to citation pearl growing or refinements
3. Citation pearl growing search
Questions: usually complex & not that well defined
Requirement for answers: comprehensive
Effort: willing to spend a lot, particularly in examination of answers & following & evaluating citation trails
Extension: good to proceed with building block tactics
Speed search
• Takes little planning & is fast
– searcher gets on to the system quickly, & enters terms
using default (or simple Boolean) operators
– only a few terms are used
– there is no or little reiteration & limited interaction
between searcher & system
• Can also be used for verification purposes
• Results can be examined for relevance feedback
• Not recommended for comprehensive searches
• Widely used & most preferred by users generally
However …for a complex search
• Speed search is not a be-all and end-all
• But it could be a very effective beginning
– to do initial exploring and getting ideas about sources,
contents, type of documents, magnitude …
– to find some relevant documents and proceed from there
– and then to proceed with refining searches using other
tactics
• You do a speed search, examine results, maybe do
more & examine again and on that basis refine
succeeding searches & tactics
Use it as a classic form of feedback
Building block search
• Commonly used search tactic
– start small & then build upon results
• identification: each important concept of a search is identified; also
facets, such as fields to be searched, are identified
• elaboration: for each concept further terms are identified
• combination: search starts with one or just a few concepts &
associated terms; as it progresses additional concepts & facets are
connected using appropriate Boolean operators &/or attributes
• iteration: as a search proceeds terms may be added to concepts,
new concepts created & combined; fields added or dropped
• You build heuristically & modify the query as you go
along adding, changing concepts, their elaborations,
and facets/fields
Building block search - illustration
1. From a question concepts A, B, C ... are identified – terms that
could be further analyzed
2. For each concept search terms are added – narrower, broader,
related, synonyms, near synonyms - all these are connected with OR
3. Concepts together with their terms are connected with AND
4. Fields and limits may be added to any or all concepts or terms
[Diagram: blocks connected with AND; terms inside each block connected with OR]
Concept A: Term A1, Term A2, … Term An
Concept B: Term B1, Term B2, … Term Bn
Concept C: Term C1, Term C2, … Term Cn
Facets/fields: Field/limit F1, Field/limit F2, … Field/limit Fn
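The four steps above can be sketched in a few lines of Python. This is only an illustration: the function name build_query, the sample terms, and the limit expression PUBYEAR > 2005 are assumptions made up for the example, not the syntax of any particular system. For simplicity the sketch attaches limits to the whole query, although limits may also be added to single concepts or terms.

```python
def build_query(concepts, limits=None):
    """Build a Boolean query string from building blocks.

    concepts: dict mapping a concept name to its list of search terms.
    limits:   optional list of field/limit expressions (hypothetical syntax).
    """
    blocks = []
    for terms in concepts.values():
        # Quote multi-word terms so they read as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        # Step 2: the terms of one concept are connected with OR.
        blocks.append("(" + " OR ".join(quoted) + ")")
    # Step 4: fields/limits are added as further blocks.
    blocks.extend(limits or [])
    # Step 3: concepts (and limits) are connected with AND.
    return " AND ".join(blocks)


# Invented example concepts and terms.
concepts = {
    "A": ["search engines", "web searching"],
    "B": ["advanced search", "boolean operators"],
}
query = build_query(concepts, limits=["PUBYEAR > 2005"])
print(query)
```

Adding a concept or a term means adding one entry to the dictionary, which mirrors how the query is modified heuristically as the search goes along.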
Dialog worksheet helps in planning
Connecting tactics
• Concepts in building block searches can be
identified not only from a question but also from
documents resulting from a speed search
– thus concepts C, D … could be specified after a previous
speed search, elaborated, & then added to a subsequent
building block set of concepts
– same with facets & fields
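One possible way to picture this connection is to count the index terms of documents judged relevant after a speed search and suggest the most frequent ones as candidate terms. The sketch below assumes each record carries an index_terms list; the records, the function name suggest_terms, and the relevance judgments are invented for illustration.

```python
from collections import Counter


def suggest_terms(relevant_docs, query_terms, top=3):
    """Suggest frequent index terms that are not yet part of the query."""
    used = {t.lower() for t in query_terms}
    counts = Counter(
        term.lower()
        for doc in relevant_docs
        for term in doc["index_terms"]
        if term.lower() not in used
    )
    return [term for term, _ in counts.most_common(top)]


# Invented records standing in for relevant results of a speed search.
relevant_docs = [
    {"title": "Doc 1", "index_terms": ["Search engines", "Web queries", "Transaction log analysis"]},
    {"title": "Doc 2", "index_terms": ["Web queries", "Online searching", "Search engines"]},
    {"title": "Doc 3", "index_terms": ["Web queries", "Transaction log analysis"]},
]
suggestions = suggest_terms(relevant_docs, ["search engines"], top=2)
print(suggestions)
```

The suggested terms are only candidates; the searcher still decides which of them become new concepts or elaborations.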
Narrowing tactics
• A search can start with using one of the concepts and
its elaborations & then adding others
– this way it proceeds from broad (one concept) to narrower
by adding other concepts – and reviewing
– facets and fields can be added for still more narrowing
– evaluated as one receives answers
– limits/fields can be added at any search, narrowing it further
– used to increase precision & focus
• The same can be done in reverse, from narrow to broad,
by subtracting concepts from a comprehensive search
– used to increase recall & focus
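The effect of narrowing on precision and recall can be shown on a toy collection. Everything in this sketch is assumed for illustration: the five documents, their index terms, and the set of documents taken to be relevant. Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that are retrieved.

```python
def search(docs, *concepts):
    """Return ids of documents that match every concept (AND).

    A concept is a set of terms; a document matches a concept if it
    contains at least one of its terms (OR).
    """
    return {doc_id for doc_id, words in docs.items()
            if all(words & concept for concept in concepts)}


def precision_recall(retrieved, relevant):
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented toy collection: document id -> set of index terms.
docs = {
    1: {"search engines", "advanced search"},
    2: {"search engines", "ranking"},
    3: {"search engines", "boolean operators"},
    4: {"web searching", "advertising"},
    5: {"databases", "advanced search"},
}
relevant = {1, 3, 4}  # assumed relevance judgments

concept_a = {"search engines", "web searching"}
concept_b = {"advanced search", "boolean operators"}

broad = search(docs, concept_a)              # 1st search: concept A only
narrow = search(docs, concept_a, concept_b)  # 2nd search: A AND B

print("broad ", sorted(broad), precision_recall(broad, relevant))
print("narrow", sorted(narrow), precision_recall(narrow, relevant))
```

In this toy case adding concept B raises precision and lowers recall; dropping it again does the reverse.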
Narrowing schematic
[Schematic: each search adds one more block; + = AND]
1st search: Concept A (Term A1, Term A2, … Term An)
2nd search: Concept A + Concept B (Term B1, Term B2, … Term Bn)
3rd search: Concept A + Concept B + Concept C (Term C1, Term C2, … Term Cn)
4th, 5th … search: + Facets/fields (Field/limit F1, Field/limit F2, … Field/limit Fn), add to any
Citation pearl
growing search
What? aims
• It means what the name implies: you start with a nugget & grow upon it
• Starts with a few records of high relevance
• Looks at references or who cites it to find more
• Aims for more recall
• Avoids subject terms, indexing & language
When to use it
• When word lists or thesauri are not available
• When there isn’t a large recall after doing some searching
• When a user has one or two good articles and wants to find more like them
• When a topic is hot with a breakthrough paper
It depends on citations over time
Backward chaining (back in time)
• Following up references in articles of interest
– moving backward in successive leaps through reference lists
• Could be linked to cocitation – authors cited together
• Popular in social sciences, humanities
Citation tracking (forward chaining in time)
• Who has cited a given document, author, journal, institution
– moving forward in time from the publication of the item
• Used also to indicate impact
– higher citation rate assumed higher impact
• Popular in sciences
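Both directions of chaining can be pictured as walks over a citation graph. The sketch below is a minimal illustration with invented paper ids and reference lists; real citation indexes supply this data through their own interfaces, which are not modeled here.

```python
from collections import deque

# Invented citation data: paper -> papers it cites (its reference list).
references = {
    "P2009": ["P2005", "P2003"],
    "P2007": ["P2005"],
    "P2005": ["P2001"],
    "P2003": ["P2001"],
    "P2001": [],
}


def backward_chain(paper, references):
    """Everything reachable by following reference lists back in time."""
    found, queue = set(), deque([paper])
    while queue:
        for ref in references.get(queue.popleft(), []):
            if ref not in found:
                found.add(ref)
                queue.append(ref)
    return found


def forward_chain(paper, references):
    """Everything reachable by following 'cited by' links forward in time."""
    # Invert the reference lists: paper -> papers that cite it.
    cited_by = {}
    for citing, refs in references.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(citing)
    # The same walk, now over the inverted links.
    return backward_chain(paper, cited_by)


print(sorted(backward_chain("P2009", references)))
print(sorted(forward_chain("P2005", references)))
```

Starting from one or two "pearls" and running both walks gives the pool of candidates that the searcher then evaluates for relevance.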
Citation indexes
• Tools giving citation links
• particularly Web of Science, Scopus & Google Scholar
• Invaluable for citation pearl growing
– Citation indexes in various subjects (law, science …)
provided that for a long time even before computers
– But it exploded with automation
• Now some search databases provide support for
that search tactic
– integrated with subject searching
• e.g. Scopus, even Google Scholar
– easy to jump from subject searches to references to
citation tracking to sources to authors
3. Advanced features 1
Using fields
In fact
• Any & all vendors & search engines have advanced
search features – none are without them
• In principle most are the same in that they cover
similar fields in records
• But in application they differ from vendor to vendor,
engine to engine – sometimes greatly
• they need to be learned individually. What a bummer!
• it cannot be assumed that what works in one, & how, works elsewhere –
even though similarities are there
• but once you know them well in a few you can generalize & adapt to
others
Fields & advanced features
• Everybody has fields
– & they are critical for advanced searching
– it starts with fields
• How displayed for searching differs greatly
– now mostly in menus
• added automatically
– but also available as commands
• Common fields beyond subjects
– author, source, year, institution, type of publication, country, etc
• Some are used to search on another dimension
– e.g. authors, sources
• Others to limit subject & other searches
– e.g. dates, language
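The two uses of fields, searching on another dimension and limiting, can be sketched over a small set of records. The records, field names, and the function field_search are assumptions for illustration only; actual systems expose fields through their own menus and commands.

```python
# Invented bibliographic records.
records = [
    {"id": 1, "author": "Lee", "source": "Journal A", "year": 2004, "language": "English"},
    {"id": 2, "author": "Lee", "source": "Journal B", "year": 2007, "language": "English"},
    {"id": 3, "author": "Kim", "source": "Journal A", "year": 2008, "language": "German"},
    {"id": 4, "author": "Kim", "source": "Journal B", "year": 2008, "language": "English"},
]


def field_search(records, **criteria):
    """Keep records whose fields satisfy every criterion.

    A criterion is either a plain value (exact match) or a function
    that takes the field value and returns True or False.
    """
    def matches(record):
        for field, wanted in criteria.items():
            value = record.get(field)
            if value is None:
                return False
            if callable(wanted):
                if not wanted(value):
                    return False
            elif value != wanted:
                return False
        return True

    return [r for r in records if matches(r)]


# Searching on another dimension: everything by one author.
by_author = field_search(records, author="Lee")

# Limiting: English-language items from 2006 onward.
limited = field_search(records, language="English", year=lambda y: y >= 2006)

print([r["id"] for r in by_author], [r["id"] for r in limited])
```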
Advanced features for Library Literature &
Information Science in Wilson Web
fields
Advanced features for Web of Science
fields
Advanced features for Scopus
fields
example
Advanced features for Google
Here is what Google says:
Detailed description in:
Google Guide, particularly in Query Input
by Nancy Blachman
“I developed Google Guide because I wanted more information about Google's
capabilities, features, and services than I found on Google's website. Google Guide
is neither affiliated with nor endorsed by Google.”
Advanced features for Google …
fields
Use of advanced features
• Many studies show that users (when searching for
themselves as end users) use them rarely, if at all,
– they do not use Boolean capabilities, availability of
searching by given fields, restricting of searching by
available delimiters etc.
• But professional searchers use them a lot
Use of advanced features is one of the
hallmarks of professional competencies
4. Advanced features 2
Using proximity of terms
Proximity
• Searching for
– terms x words apart
• one after the other or in any order
– terms in same sentence, paragraph, field
• Improves precision
– zeros in on specific names, expressions
• Important for searching
– particularly for users in fields with set terminology
• Connected with phrase searching
• Simple idea but handled very differently in different databases
– to find how handled must go to Help
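The idea of "terms x words apart" can be sketched as a small matching function. This is an assumed, simplified definition: distance is counted as the difference in word positions, and the function name within is made up. Real databases define and write proximity operators in their own ways, so their Help pages remain the authority.

```python
import re


def within(text, first, second, n, ordered=False):
    """True if `first` and `second` occur at most n word positions apart.

    ordered=True additionally requires `first` to come before `second`.
    """
    words = re.findall(r"\w+", text.lower())
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    for i in first_pos:
        for j in second_pos:
            gap = j - i
            if ordered and 1 <= gap <= n:
                return True
            if not ordered and 1 <= abs(gap) <= n:
                return True
    return False


text = "Advanced features of search engines are rarely used"

print(within(text, "advanced", "search", 3))                # in any order
print(within(text, "advanced", "search", 2))                # too far apart
print(within(text, "search", "advanced", 3, ordered=True))  # wrong order
```

With n=1 and ordered=True the check behaves like a simple two-word phrase match, which is why proximity and phrase searching are connected.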
Phrase and string searching
(similar to proximity) from Help
Proximity & phrase
operators (from Help)
from Help
Stop words
• Words that databases and search engines choose to ignore
– for searching – they will note their position but not include in the index
– some of them also for indexing – they will not index them to start with
• Different databases use very different lists of stop words
– and handle them differently
• Dialog has 9 stop words:
– AN, AND, BY, FOR, FROM, OF, THE, TO, WITH
• How about others?
– let's see
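A short sketch shows what a stop word list does to a query. The nine words are the Dialog list given above; the filtering function searchable_terms is an assumption made for illustration and does not reproduce how Dialog or any other system actually processes queries.

```python
# The nine Dialog stop words listed above.
DIALOG_STOP_WORDS = {"AN", "AND", "BY", "FOR", "FROM", "OF", "THE", "TO", "WITH"}


def searchable_terms(query, stop_words=DIALOG_STOP_WORDS):
    """Return the query words that are left once stop words are dropped."""
    return [w for w in query.split() if w.upper() not in stop_words]


print(searchable_terms("history of the book"))
print(searchable_terms("to be or not to be"))
```

With a short list like this one, words such as "be", "or" and "not" stay searchable; a system with a longer list might drop the second query entirely, which is why it is important to know each system's list.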
Stop words
important to know what they do NOT search automatically
[from their Help pages]
Stop words –
handled very differently
WoK has some 200 stop words that are ignored while searching even for phrases
Watch out!
Stop words –
again handled differently
5. A case study
My own question & search – reality show
Question & context
this is a real question & reason I had
Question
• Search engines offer a number of features for searching. They also retrieve a large number of answers. How much are these advanced features used? How many pages do people look at?
Context
• I am interested in studies that have actual data. To be used for update of bibliography in this course and for discussion in a lecture book on relevance in information science that I am currently writing – support for broader conclusion
Databases used
• I first used Library Literature and Information Science
– available at RUL
– did not get anywhere really so I lost patience & switched
• Then I used Scopus
– not available at RUL any more, but have class access
• All results are from Scopus
First I did a speed search that led me to
making building blocks
Tactics
• Selected a basic concept from the question
Search
search engines
advanced search
Results
• I enlarged the search concepts & terms from index terms found in a few examined documents that seemed relevant
Web
Methods
Web searches
Transaction log analysis
Online searching
Web queries
Search log analysis
Web sessions
This resulted in
• Quite a broad search and a lot of results, so I went to limit to certain fields and dates
• Selected to add to search as limitation:
Facets/fields
social sciences
only last two years
and later to articles with a lot of citations – shows impact
One of the searches
Examined about six pages of results, here are
three major selections
These two did not have any results, but were useful
for the class, so I included them in the bibliography
This was toward the end but it turned out to be a mother lode,
not only for having statistical results but for citations
Here is the mother lode abstract with
a number of features for further searching
clicked on authors
Leads to further things:
references, index terms, cited by, related works
Articles that cited it
start of the list, newest ones first with a number of features to explore further
Multiple use of tactics & results
Search tactics used:
• Speed search
• Building block search
• Citation pearl search
– references (backward chaining)
– cited by (forward chaining)
• Relevance feedback
Results used for:
• Got a few references to include in class bibliography
• Got data to include in lectures and in the future book
• And an example to illustrate the topic for this lecture
It was what Marcia Bates calls berry-picking search
Conclusion:
searching is both