Lecture01 Overview.ppt

Searching
Overview – providing a framework
A bit of history
tefkos@rutgers.edu; http://comminfo.rutgers.edu/~tefko/
Tefko Saracevic
Central ideas
• Searching is a complex, interactive process aimed at
finding & retrieving relevant information
– this, of course, raises a number of questions as to what
we mean by:
• information
• relevant
• finding
• retrieving
• interaction
• Modern searching has deep roots in historical
attempts to deal with information explosion
ToC
1. Basics - definitions
2. Complexity – elements involved
3. Searching & search interaction
4. Professional changes in searching
5. A bit of history
6. Conclusions
1. Basics
A few definitions
We all know them, but sometimes we should think
about them
Information
Generally
• “Information” has many
meanings
– depending on context
– but it is universally well
understood
• it is a primitive concept – one
does not have to explain it; other concepts, definitions are
then built upon it
– but many definitions on the Web
From “informare” (Latin):
to fashion, shape, or create, to
give form to
Context of searching:
several layers or strata
– Narrow: information as a
property of the message (text,
record, document, image …)
– Broader: as property of
cognition - affects or changes
the state of a mind
– Broadest: also connected to the
expansive social context or
horizon, such as culture, work,
task, or problem-at-hand
• We must consider inf. not
only as “message” but in its
cognitive & contextual sense
Oh well…
“Information is a difference that makes a difference.” Gregory
Bateson
"Where is the wisdom we have lost in knowledge? Where is the
knowledge we have lost in information?" T. S. Eliot
"With so much information now online, it is exceptionally easy to
simply dive in and drown." Alfred Glossbrenner
"The stone age was marked by man's clever use of crude tools; the
information age, to date, has been marked by man's crude use
of clever tools." Source Unknown
Relevant, relevance
Generally
Merriam Webster (2005)
“Relevant: having significant
and demonstrable bearing on
the matter at hand.”
“Relevance: the ability (as of an
information retrieval system)
to retrieve material that
satisfies the needs of the
user.”
SYNONYMS: pertinent, useful,
of utility, germane, material,
applicable, appropriate …
In the context of searching:
several layers or strata:
Information which is
connected with a user’s
(searcher’s, inquirer’s …)
information need AND
– cognitive state
– as related to given task, or
problem at hand
– & given affective state –
motivation, intention …
• Relevance always has a
connection: “to”
Finding, findability
Generally
• To realize, understand, or
locate something especially
by studying or observing
• To make a special effort to
gather something together
or summon something up
• To discover something or
somebody after a search
In the context of searching:
information has to be findable
Findability: (Morville, 2005)
• The quality of being locatable
or navigable.
• The degree to which a
particular object is easy to
discover or locate.
• The degree to which a system
or environment supports
navigation & retrieval.
Information retrieval (IR)
Generally
“Information retrieval (IR) is
finding material (usually
documents) of an
unstructured nature (usually
text) that satisfies an
information need from within
large collections (usually
stored on computers).”
Manning, Raghavan, & Schütze, (2008)
Other definitions on the Web
In the context of searching:
Searching of & retrieval from
– abstracting & indexing
databases & services
– specialized databases & sites
– search engines
– directories, portals
– digital libraries
– OPACs
– reference resources
– and the like …
• All can be labeled as IR
systems – that is what they do
2. Complexity
Elements involved in searching
Searching is… (repeated)
… a complex process
involving interaction &
feedback between and among
PEOPLE, INFORMATION, & TECHNOLOGY
People
Users
• Generally:
– People who access & use an
information system
• In information retrieval (IR):
– people with an information
need that may be satisfied by a
search of an IR system
• End users:
– people who use an IR system
directly to retrieve information.
Professional searchers - you
• Experts with knowledge &
competencies for
performing effective &
efficient searches in a
variety of sources, systems
& media
– searches may be done on
behalf of people, institutions,
tasks – mediated searching
– searchers must follow ethical
guidelines
Information
Content
Objects that potentially may
convey information as:
– texts, documents, images,
recordings, sites …
– often referred to as records
• So far the majority are texts &
documents
– even images & recordings are
mostly labeled (tagged) with
texts
• Many systems collect them
as to sources
– e.g. journals, areas …
Organization
Ways & means by which the
objects are organized to
facilitate access & searching
– vocabularies (free, controlled),
indexes, fields, abstracts,
summaries, classification,
clustering, links, sites …
• a great many now created
automatically e.g. terms
extracted from texts
– many types of organization
exist, more on the horizon
– essential for searching
Technology:
two components or layers
Hardware & software
• A variety of information &
communication technologies
– most importantly, includes
networks
• Software: many applications
available
– most are taken as given by users
& searchers
Systems
• Systems that handle
information objects by:
– identifying, collecting,
organizing, storing, managing,
providing access …
– & provide capabilities to
search, retrieve, navigate,
browse …
• We label them information
retrieval (IR) systems
Again, the two are different things but are closely connected.
Professional searchers need to know how to use both.
IN THIS COURSE WE DEAL ONLY WITH SYSTEMS!
3. Searching & search interactions
Kinds, components, dynamics
Interaction
with information there is no such thing as not to interact (yes, a double negative)
General
• We concentrate on behavior
of people in the use of
information embedded in
systems, services, networks,
and devices
• More broadly & recently
this also includes
cooperative activities
among dispersed people
and resources
Various interactions
• Human-information
interaction
• Human-human information
interaction
– User-searcher interaction
• Human-computer
interaction
Human-information interaction
General
• Information interaction is the
process that people use to
communicate & act
reciprocally with an
information system,
particularly in relation to its
content
• It is a dynamic process mostly
mediated by technology
– involving feedback
– reiteration; reciprocal action
– evaluation
Context of searching
• How & why people access
information is highly
dependent on the context of
their interaction
– this context is influenced by a
range of factors such as
• the time, place, and history of
interaction
• the tasks motivating the
interaction
• the technical possibilities of the
information systems.
(from Information Interaction in Context, 2008)
Components in human-information interaction:
a reiterative process with feedback
Human-information interaction:
Components defined
• Problem, task: job, situation at hand; incl. demographic & other
characteristics & affective states of user. All in context
• Inf. need: somewhat nebulous & subjective concept. Mostly
refers to cognitive state, gaps in knowledge (Dervin),
or anomalous state of knowledge (Belkin).
Instrumental - goal oriented. Uncertainty. Necessity.
• Question: verbal (written or oral) representation of the
information need and/or problem at hand.
• Query: question translated into a search statement as
allowed/prescribed by a system. Variations.
• Search: process of submission of a query and conduct of the
search as prescribed by a system. Variations.
• Response: responses or answer(s) by the system. Could be
rearranged, reformatted. Evaluation of responses
• Feedback & reiteration run through all components
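The chain from question to query to search to response, with feedback steering reiteration, can be illustrated with a small sketch. This is not from the lecture: the records, the stopword list, and the function names (to_query, search, interact) are hypothetical, and the "system" is only a toy Boolean AND matcher. Real IR systems prescribe their own query syntax and matching rules.

```python
# Toy collection; the records are invented for illustration only.
RECORDS = {
    1: "relevance in information retrieval systems",
    2: "history of library reference services",
    3: "evaluation of retrieval systems and relevance feedback",
}

# Assumed stopword list; a real system would define its own.
STOPWORDS = frozenset({"what", "is", "the", "of", "in", "a", "on"})


def to_query(question):
    """Question -> query: keep only the content words of the question."""
    words = question.lower().replace("?", "").split()
    return [w for w in words if w not in STOPWORDS]


def search(query):
    """Search: Boolean AND over the toy records; returns matching record ids."""
    return [
        rid
        for rid, text in RECORDS.items()
        if all(term in text.split() for term in query)
    ]


def interact(question, max_rounds=3):
    """Reiteration with feedback: if nothing comes back, broaden the query
    by dropping its last term and search again."""
    query = to_query(question)
    for round_no in range(1, max_rounds + 1):
        response = search(query)
        if response or len(query) <= 1:
            return query, response, round_no
        query = query[:-1]  # feedback step: an empty response broadens the query
    return query, response, round_no


if __name__ == "__main__":
    print(interact("What is relevance feedback in retrieval systems?"))
    print(interact("relevance of library history"))
```

In this sketch the first question is answered in one round, while the second needs three rounds of broadening. A human searcher would of course choose which terms to drop or replace rather than always dropping the last one.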
Role of searchers - you
• Not only to do the searching, but also (or in addition)
to assist, lead, instruct, help a user in
– defining, specifying the problem, task at hand
• particularly in terms of informational aspects & resources
– articulating the information need – diagnosis
• guide from possibly visceral to expressed to searchable
– formulating of question(s) – clarifying, defining concepts
– translating into query(ies); choosing variations
– evaluating responses; eliciting feedback to steer reiteration
– guiding toward further action, resources on their own
There is much more to searching than searching
Human-human inf. interaction
General
Communication between or
joint activity involving two
or more people with a goal
of obtaining or exchanging
information
– reciprocal action
– goal directed
• Most people still get most
information from other
people
Context of searching
User-searcher interaction
– part of mediated searching
• searchers acting on behalf of
other people or institutions
• On part of searcher involves
user modeling:
determining users’ inf.
needs & requirements, &
their characteristics as
related to effective searches
– predated by reference
interview
Human-computer interaction (HCI)
General
HCI is the study & practice of
interaction between people
(users) and computers
– relationship between humans
and computers
HCI is concerned with the design, evaluation
& implementation of interactive
computing systems for human use and
with the study of major phenomena
surrounding them.
(Association for Computing Machinery,
Special Interest Group on Computer-Human
Interaction (SIGCHI))
Context of searching
Study & practice of using
computers, particularly
interfaces, in searching for
& retrieval of information
– often concentrates on using
particular information
systems, interfaces &
algorithms
– evaluation of the
effectiveness & efficiency of
interactions – algorithms,
systems, interfaces
Reiteration in searching
( copy from Hembrook et al. 2005)
4. Professional changes
Dramatic shifts & evolving
directions
Mediated searching
• Of interest in librarianship for over a century
– reference became a major component of library practice
• With advent of information & communication
technology mediated online searching became a
major professional & research activity
– even mainstream of many information centers
– publications, conferences, inf. industry oriented to it
• Searching, meaning mediated searching, became a
big deal
– well, we have been teaching it for decades
Information industry
• Starting from the early 1960’s an information industry
developed dealing with computerized abstracting &
indexing services & databases available for searching
– earliest ones were government sponsored (e.g. Medline),
then transformed within professional organizations (e.g.
Chemical Abstracts), then private industry (e.g. Dialog)
• By 1970’s inf. industry became strong & global
• Most databases & services from inf. industry were
oriented toward professional searchers
– who then offered searching to users in their institutions,
companies, public – mediated searching
Changes due to search engines
• But, search engines have radically changed the way
people search for information
– mediated searching, including reference questions, has
declined drastically over the last decade
– users became end users – searching for themselves
– end user searching of search engines exploded globally
• Reference questions have drastically fallen off
– between 1995 & 2006 reference transactions declined 54% in ARL
libraries (source: Assoc. Research Libraries statistics)
• Mediated searching followed – done much less than
a decade ago
General changes in library use
• Libraries have added a great many digital resources
– including digital resources & databases for end user searching
• As a result today's users have changed their use of libraries
– virtual use is skyrocketing while physical use is plummeting
• users don’t vote anymore with their feet but their fingers
– electronic transactions are growing rapidly
– physically users are not in the library but library use is going up
& up & up (again see ARL statistics)
• We do not have statistics on how many searches are done
on databases available in libraries, but it must be a LOT!
Oh, well…
“Many years ago, the esteemed Barbara Quint offered an
estimate that Google answered as many reference queries in
half an hour as all the reference librarians in the world did in 7
years.”
Abram, S. (2008), Searcher, 16(8)
I have no idea of the source of the statistics, or if they are right at all, but it seems OK
“While they [users] may be absent they are not inactive.
Networked electronic resources via library portals and the
Internet have provided users with benefits that go far beyond
anything available when physical use was the only
alternative.”
Martell, C. (2008), The Journal of Academic Librarianship, 34(5)
Web & changes in inf. industry
• Web changed architecture & orientation of many
databases & changed inf. industry in a big, big way
– old databases restructured significantly e.g. Web of Science
– new databases emerged - some very large e.g. Scopus
– aggregators or publishers of journals became databases for
searching – e.g. EBSCOhost, Wilson
– they went with great gusto after end users
• and with it after a much bigger & different market
• Now libraries & inf. centers buy time-based licenses
from databases for access by their users
• e.g. RUL provides access to close to 300 databases in every field
Changes for searchers - you
• Searchers are now
also involved with
licenses, library Web
systems, & access
provision, plus:
• New orientation &
services emerged &
are still being
developed, refined
(as already mentioned in
previous lecture):
– knowledge navigation - supporting the user
in locating and retrieving relevant
information in the global information
environment
– cooperative searching – with users & projects
– source recommendation – acting as
recommenders
– source evaluation – assessing value, quality
& suitability
– impact investigation – search for evaluative
data of use in assessing outputs & impacts
of research, institutions, researchers …
– user assistance and training - incl.
information literacy
But no matter what you still have to master searching
5. A bit of history
A short chronology rather than
history
Antecedents
• Europe before WWII:
– strong documentation movement
– Universal Decimal Classification, indexing of scientific literature,
utilitarian integration of technology & technique toward social goals
• In the US right after WWII concern about information
explosion, particularly in science
– Vannevar Bush’s classic article “As we may think” in Atlantic
Monthly in 1945 stirred imagination & funding
– problem: “the massive task of making more accessible a bewildering store
of knowledge.”
– solution: use of new technology, suggested a machine named “Memex”
as idealized model
• Technological imperative became a norm for solving inf.
explosion problems – followed to this day
Beginnings
• National Science Foundation (NSF) act of 1950 (& later
amendments) mandated support for scientific &
technical information (STI) for effective use
– from the start in 1950s to this day NSF supports research &
development in this area, including digital libraries
• now through Division of Information & Intelligent Systems (IIS)
– sparked involvement of many fields; many projects were funded
• Other government agencies got involved
– e.g. National Institutes of Health in supporting mechanization of the
National Library of Medicine to Medline & now MedlinePlus
• Other governments, first in Europe, USSR, and later
globally started supporting similar activities
Information as strategic resource
• Key idea in providing support for STI activities from
the end of Second World War to this date:
– effective dissemination of information considered of
strategic value for progress in science & technology
• Spread to all other fields & human endeavors
• Bedrock of information industry
• Searching fits right in there:
– affected importance & increase of online searching as
a professional activity
– affected spread of searching to wide populace
Idea of information as strategic
resource
Affected evolution of
information age
– global economy's shift in
focus away from the
production of physical
goods (as exemplified by
the industrial age) and
towards the
manipulation of
information
And information society
– in which the creation,
distribution, diffusion,
use, integration and
manipulation of
information is a
significant economic,
political, and cultural
activity
Information science
Information retrieval
• 1951 Calvin Mooers coined term “information
retrieval” (IR) to label a burgeoning activity
– by mid 1950’s computerized IR systems emerged & later proliferated
fast in many fields even outside of science & globally
– among others, their searching became a professional activity
• Societies and conferences proliferated globally
related to problems of IR and broader issues of
information science
– e.g. very influential 1958 International Conference on
Scientific Information (with really great Proceedings)
IR research
• From the 1960’s & onwards Gerard Salton & his students
in computer science pioneered research into advanced
IR methods
– addressed technical or system side of IR
– great many good results over decades
• but it took decades before results were applied commercially
• today all vendors & search engines use them
– IR research continues to this day internationally
• particularly under TREC (Text Retrieval Conference)
• and reported by Special Interest Group on IR (SIGIR)
• Research and IR are still closely connected
– source of advances, but now also proprietary
Research (cont.)
• 1970s & 80s also saw emergence of research dealing
with the human (user) side of IR
– addressed users, use of information & IR systems
– basic notions, such as relevance
• In the 1990’s till present growing research in areas:
– interaction in IR, or human-computer interaction
– human information behavior (Wilson, 2000)
– information seeking & searching (Bates, 2002)
• Human and system side of research do not mesh well
– still & unfortunately
Onto the real world
• 1960s saw computer applications for IR blossoming
– also library automation emerged, incl. MARC (go to RUL then ERIC to
retrieve the report)
• Late 1960’s: Medline, the online version of MEDLARS
(National Library of Medicine) came out
• this was online way before the Internet & the Web
through commercial time-sharing networks, such as
Tymnet & Telenet
• Professional searching became firmly established
– grew at high rate
– most access for users was through mediated searching
– but end user searching grew slowly
Onto the real world (cont.)
• Early 1970’s: Dialog and ORBIT established – large
commercial online vendors
• Dialog after a number of changes in owners is still in business; ORBIT
later merged with other vendors & disappeared
– they provided online access to an ever growing number of
databases – became information supermarkets
– later joined by a number of other vendors more specialized
• e.g. LexisNexis, STN, EBSCOhost, CSA, etc.
• or new giants, such as Scopus (already mentioned; link is to an overview)
• Magazines, such as Information Today & Searcher
dutifully record & comment on what is going on in information
industry & the profession
the Net
• Internet first went live in 1969 as ARPANET, an interuniversity net
– in 1983 TCP/IP protocol was adopted, free & still in use
globally today – i.e. present Internet was born
– in 1986 NSFnet was created, broadening reach significantly
– in 1995 NSF pulled out & opened the net to broad public &
commercial use
• Internet infrastructure is now provided commercially
• By 1980s it became a force
– by the 1990’s it took over the world
• Internet has a colorful history (from the Internet Society)
– timeline shows rapid growth & development
WWW
• In 1991 Tim Berners-Lee invented the World Wide
Web – a hypermedia initiative for global information sharing
– in 1993 Mosaic, the first widely used Web browser, was
developed by Marc Andreessen; it led to Netscape
• it popularized the Web
• WWW became the fastest growing & spreading
technology in history
• Search engines
– Yahoo launched in 1994 & Google in 1998
– affected searching enormously
– today over 3000 search engines in over 150 countries
• but a few large ones dominate in every market e.g. Baidu in China
Digital libraries
• Emerged in mid 1990s
• Since then involved
– massive research & development programs
• e.g. the National Science Digital Library (NSDL)
– massive investments by libraries
• changed the library landscape
• particularly as to access & searching
• for most libraries digital library portions of budget skyrocketed
• Brought together IR & libraries
• Today vast international presence
– many institutions in addition to libraries involved
• e.g. museums, societies, professional organizations
Digital libraries and searching
• Major resource for searchers
– large variety of texts, images, sounds digitized all over the
world
• rich source of many (and many unusual) resources not found through
databases or the Web
• At the same time major headache for searching
– search mechanisms not well developed & integrated
– federated searching (covering multiple databases at once)
still in infancy
– e.g. RUL has close to 300 databases (see under Research resources –
Indexes & databases) yet almost all have to be searched individually
– at RUL federated searching through Searchlight can be done on 8
databases only
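The idea of federated searching, one query sent to several databases with the results merged, can be sketched as follows. This is only an illustration under stated assumptions: the database names, records, and functions (search_one, federated_search) are hypothetical, the matching is a toy Boolean AND, and it says nothing about how Searchlight or any real federated product works.

```python
# Hypothetical in-memory "databases"; names and records are invented.
DATABASES = {
    "db_a": ["digital libraries and access", "history of indexing"],
    "db_b": ["federated searching in digital libraries", "reference interviews"],
    "db_c": ["search engines and the Web"],
}


def search_one(records, query):
    """Search a single database: keep records containing every query term."""
    terms = query.lower().split()
    return [r for r in records if all(t in r.lower().split() for t in terms)]


def federated_search(query, databases=DATABASES):
    """Send the same query to every database and merge the hits,
    labeling each hit with the database it came from."""
    merged = []
    for name, records in databases.items():
        for hit in search_one(records, query):
            merged.append((name, hit))
    return merged


if __name__ == "__main__":
    for source, record in federated_search("digital libraries"):
        print(source, "|", record)
```

Searching each database individually corresponds to calling search_one once per source by hand. The sketch leaves out exactly the hard parts of real federated search, such as differing query syntaxes, vocabularies, and ranking across sources.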
6. Conclusions
A few parting thoughts on changes
New world for searching
• Everybody is a searcher now
– searching is a mass sport
• whoever has a computer or other communication device also
searches
– however few do it well
– even fewer can assess how well they are doing
• horror stories abound
• Search engines are constantly enlarging & refining
their reach, coverage, specialization (e.g. Google Scholar)
• But still the major flaw: Web is value neutral
• diamonds & rubbish, true & untrue, good & evil are all equal
Opening for searchers - YOU
(and libraries & information centers)
• New opportunities & challenges
• They are providing value added services
– and could do even more
• Connecting in different ways with users
• Their basic worth:
TRUST – that is where ethics play a major role
PROFESSIONAL COMPETENCE – that is where your
lifelong education plays a major role
• This whole course is just a beginning