Augmenting Information Seeking on the
World Wide Web Using Collaborative
Filtering Techniques
Don Turnbull
1. Introduction
The Internet has opened a channel of access to an interwoven labyrinth of information over
an almost ubiquitous platform - the World Wide Web (the Web). Graphical Web
browsers have enabled all types of users to access and share information with one
another. However, once the initial thrill of Web access is over, most users don't surf the
Web; they use it as an information source.
This paper seeks to take Information Seeking research and apply it as a framework for
understanding the World Wide Web environment and to identify opportunities for
augmenting information seeking by applying Bibliometric analysis, filtering techniques,
and collaborative technologies to Web usage data that can, in turn, leverage a Web user's
Information Seeking behavior.
1.1 Overview
This paper reviews several areas of study in order to form an extensive view of the issues
involved in understanding and improving how a World Wide Web browser user (Web
user) can discover new information on the World Wide Web.
There are seven main sections to this paper:
Section 1: Introduction
This Introduction and Overview, intended to explain and lay out the overall topics
of this paper.
Section 2: Applying Information Seeking to Electronic Environments
This section reviews the important models and studies in Information Seeking and
Bibliometrics to understand and analyze Information Systems use.
Section 3: The Internet and the World Wide Web
This section introduces the Internet and World Wide Web, some of their basic
standards and functionalities. Also included are descriptions and reviews of the
data sources and measurement methods currently available to understand Web
usage activity.
Section 4: Collaborative Filtering
This section provides a general introduction to Collaborative Filtering and
presents recent significant studies and systems for Information Filtering using
both the Internet and the World Wide Web. Also included are studies that
illustrate general Collaborative Filtering techniques and a review of current
Collaborative Filtering systems for both the Internet and the World Wide Web.
Section 5: Conclusion
This section concludes the research overview and summarizes the general ideas in
the paper.
Section 6: Suggested Research Projects
This section proposes three research projects, each designed to answer questions
about improving Information Seeking on the World Wide Web.
Section 7: Bibliography and Appendix A
A list of the works cited in this paper and explanatory information presented in
the appendices.
2. Applying Information Seeking to Electronic Environments
This section reviews the important models and studies in Information Seeking and
Bibliometrics (which can be seen as another way to model Information Seeking patterns)
to understand and analyze Information Systems use.
2.1 Information Seeking Overview
This section focuses on Information Seeking in electronic environments, namely the
World Wide Web. My goal is to explore an Information Seeking model that shows
elements that can be augmented with Collaborative Filtering techniques developed
through data collection and analysis. The Web environment, with its masses of
unstructured and inconsistently coordinated information, is more suited to being
interpreted by people than by machines. Collaborative Filtering is a quantitative way to
develop qualitative data about information on the Web, thus maximizing both people and
computer resources.
Due to the personal subjectivity and seemingly endless amount of Web information to
examine, it is more useful to focus on perceptual and cognitive recognition via browsing
the Web than on determining the precision of Web searches via Information Retrieval
techniques. However, this is not simple: Information Seeking on the Web is difficult to
measure because a user can never know he is finished. There is no definite ending point.
Information Seeking as a problem seems natural to augment with Information Retrieval
ideas, but should be additionally leveraged with other users' Information Seeking
behavior. At worst, Information Seeking and Information Retrieval can be scaffolded
over each other to gradually build to a refinement of a user's information need.
Marchionini gives us an appropriate definition of Information Seeking: "a process in
which humans purposefully engage in order to change their state of
knowledge" (Marchionini 1995).
2.1.1 Information Seeking and Information Retrieval
Many studies point out the close relation between Information Seeking and Information
Retrieval. Most notably, Saracevic et al.'s comprehensive analysis of Information
Seeking and Retrieval provides excellent starting points for ideas about observation and
collection of data that help establish a sense for context and classification of user
questions; cognitive characteristics and decision making of users; and comparisons of
different searches for the same question. The measures and methods of user effectiveness
and searching provide a rich framework for further studies. (Saracevic and Kantor 1988a;
Saracevic and Kantor 1988b; Saracevic et al. 1988)
These general differences distinguish Information Retrieval research from Information
Seeking research:
Information Retrieval:
- historically, concentrated on the system
- focuses on planning the use of information sources and systems
- implies that the information must have been already known
- relies on the concrete definition of query terms
- involves subsequent query reformulations
- centers on the examination of results and their accuracy.
Information Seeking:
- historically, concentrated on the user
- focuses on understanding the heuristic and dynamic nature of browsing through
  information resources
- implies that the information is sought to increase knowledge
- follows a more opportunistic, unplanned search strategy
- involves recognizing relevant information
- centers on an interactive approach to make browsing easy.
From a behavioral perspective, the primary difference between Information Retrieval and
Information Seeking is searching vs. browsing. The focus of each domain is in the actions
studied. As computer technology matures, Information Retrieval and Information
Seeking studies are moving closer. In 1996 Saracevic states that "interaction became
THE most important feature of information retrieval" as the access to Information
Retrieval systems has become more dynamic (Saracevic 1996). Essentially, the
interactivity provides the ability to support more browsing-like approaches for finding
information.
Therefore, to design a system for augmenting Information Seeking, a more robust
understanding of the user and his interactions is in order. Measuring successful
Information Seeking requires analyzing these subtler, interaction-based measures.
Again, this makes augmenting Information Seeking via collaboration more likely to
succeed. Instead of relying wholly on Information Retrieval metrics,
recording and comparing a user's interactions with a system can be used to enhance the
information seeker's success.
New technologies, such as the easy-to-use World Wide Web browser, will promote more
Information Seeking use (and attract new users). However, new interfaces alone will not
help us find everything we seek, although we might believe they do, since we often think
electronic information is more accurate or complete (Liebscher and Marchionini 1988). In a way,
utilizing more collaboration between users can make up for some of the shortcomings of
technical systems. Blending the different perspectives and experience levels of a pool of
users can result in a larger body of resources discovered. Fidel points out two styles of
expert searchers: the operationalists, who understand the system and use high-precision
searches, and the conceptualists, who focus on concepts and terminology and then combine
results to form more complete searches (Fidel 1984). Cooperating, these two kinds of
users can form a powerful team that enhances each other's Information Seeking.
2.1.1 Information Seeking Models
The influence of new technology on Information Seeking is also providing a new set of
alternative models that more accurately describe the Information Seeking process as a
dynamic activity. Models of Information Seeking attempt to describe the process a user
follows to satisfy an information need. The Information Seeking models in this section
focus on the behavior of Information Seeking activities.
2.1.1.1 Ellis' Model of Information Seeking
The primary model used in this research will be based on Ellis' work - initially, his model
with six categories (Ellis 1989). Since Ellis has stated that these activities are applicable
to hypertext environments (of which the World Wide Web is one), I will use examples
from Web browsing to illustrate each category:
Starting is identifying the initial materials to search through and selecting starting points
for the search. Starting, as its name implies, is usually undertaken at the beginning of the
Information Seeking process to learn about a new field. Starting could also include
locating key people in the field or obtaining a literature review of the field. It is also
common to rely on personal contacts for informal starting information. For example, in
the Web environment, the activity of starting could involve going to the Yahoo! site to
find the general category listing of links related to the field of inquiry and looking for
overviews, FAQs (Frequently Asked Question files - a commonly-used informal
document describing a particular subject), or reputable reference sites. Another
possibility is going to a bookmarked page that has proved to be useful in previously
looking for similar information or consulting a colleague's own Web page or one he
might have recommended.
Chaining is following leads from the starting source to referential connections to other
sources that contribute new sources of information. Common chaining techniques are
following references from a particular article obtained by recommendation or a literature
search to references in other articles referred to in the first article. It's also quite natural to
pursue the works of a particular author when following these chains. There are two kinds
of chaining:
1. backward chaining is following a pointer or reference from the initial source.
For example, going to an article mentioned in the initial source's bibliography.
2. forward chaining is looking for new sources that refer to the initial source. For
example, using a citation index to find other sources that reference the initial
source.
The only real constraints to chaining are time available and confidence in pursuing a line
of research further. For example, using a Web browser, backward chaining would be
following links on the starting page (be it an online document or collection of links which
we can assume are related in some way) to other sites. Forward chaining could involve
using a search engine to look for other Web pages that link to the initial Web page.[1]
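The two directions of chaining can be sketched on a small link graph. This is only an illustration under stated assumptions: the page names and the LINKS table are hypothetical, and a real forward-chaining step would depend on a search engine or citation index rather than an in-memory dictionary.

```python
# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "start.html": ["overview.html", "faq.html"],
    "overview.html": ["faq.html"],
    "faq.html": [],
    "review.html": ["start.html", "overview.html"],
    "colleague.html": ["start.html"],
}


def backward_chain(links, source):
    """Pages the source points to (following its own links or references)."""
    return sorted(links.get(source, []))


def forward_chain(links, source):
    """Pages that point to the source (what a citation index or a
    search for linking pages would return)."""
    return sorted(page for page, targets in links.items() if source in targets)


print(backward_chain(LINKS, "start.html"))  # ['faq.html', 'overview.html']
print(forward_chain(LINKS, "start.html"))   # ['colleague.html', 'review.html']
```

Backward chaining only needs the starting page itself, while forward chaining needs knowledge of every other page's links, which is why it typically relies on an external index.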
Browsing is casually looking for information in areas of interest. This activity is made
easy by the nature of documents to have tables of contents, lists of titles, topic headings,
and names of persons or organizations. Browsing is being open to serendipitous findings;
finding new connections or paths to information; and learning, which can cause
information needs to change. While on the Web, browsing is particularly unconstrained
as the most-common way to follow a link is simply clicking the mouse. With link
availability and adequate access speed, pursuing a new connection is quite simple. Only
the worry of getting lost in an ocean of links might constrain browsing through the Web.
A common example of browsing on the Web would be finding an online journal article
and following its link back to the overall journal table of contents to an entirely different
article. This might in turn lead to a page linking to all of the journal's various contributing
authors, its editorial board, or supporting organization's home pages.
Differentiating is selecting among the known sources by noting the distinctions of
characteristics and value of the information. This activity could be ranking and
organizing sources by topic, perspective, or level of detail. Differentiating is heavily
dependent on the individual's previous or initial experiences with the source or by
recommendations from colleagues or reviews. A Web-oriented example would be
organizing bookmarks into topic categories and then prioritizing them by the depth of
information they present.
Monitoring is keeping up-to-date on a topic by regularly following specific sources.
Using a small set of core sources including key personal contacts and publications,
developments can be tracked for a particular topic. A Web browser monitoring activity
could be returning to a bookmarked source to see if the page has been updated or
regularly visiting a journal's Web site when it is scheduled to publish its new Web
edition.
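As a rough illustration of how monitoring could be automated, the sketch below compares a stored fingerprint of each bookmarked page with the page's current text. The URLs and page texts are hypothetical, and the fetching step is left out on purpose; this is one possible approach under those assumptions, not a description of any existing tool.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest that changes whenever the page text changes."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Compare stored fingerprints against freshly fetched page text.

    previous: dict of URL -> fingerprint saved on the last visit
    current:  dict of URL -> page text fetched now
    A URL with no stored fingerprint is reported as changed.
    """
    return sorted(
        url for url, text in current.items()
        if previous.get(url) != fingerprint(text)
    )


# Hypothetical bookmarked pages; real use would fetch the text over HTTP.
last_visit = {
    "http://journal.example/contents": fingerprint("Issue 4"),
    "http://lab.example/news": fingerprint("No news"),
}
today = {
    "http://journal.example/contents": "Issue 5",
    "http://lab.example/news": "No news",
}

print(changed_pages(last_visit, today))  # ['http://journal.example/contents']
```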
Extracting is methodically analyzing sources to identify materials of interest. This
systematic re-evaluation of sources is used to build a historical survey or comprehensive
reference on a topic. With a Web browser, extracting might be saving the Web page as a
file or printing the Web page for use in an archive or for a segment of an overview
document.
In follow-up studies, Ellis adds two more features to his model: verifying, where the
accuracy of the information is checked, and ending, which typifies the conclusion of the
Information Seeking process, such as building final summaries and organizing notes (Ellis
1991). These changes reflect further studies, and I also believe that as Information
Seeking has become more mechanical, its processes have become easier to note. However,
despite refining the processes and the relationships between features of his model, Ellis also
agrees that the boundaries between the features are very soft (Ellis 1996). In using the
Web, verifying might involve extracting keywords from a source and searching for
corroborating information on another Web page. Admittedly, the Web's newness and
large percentage of un-branded information make verification of information difficult. I
suspect that information is often verified by checking traditional sources, not other Web
pages.
Currently (Ellis 1997), Ellis has modified his model's features somewhat, refining
starting into surveying. Surveying further stresses the activity of obtaining an overview of
the research terrain or locating key people operating in the field. Differentiating has been
refined to distinguishing, where information sources are ranked. Distinguishing also
includes noting the channel where information comes from. Ellis points out that informal
channels, such as discussions or conversations, and secondary sources, such as tables of
contents or abstracts, are normally ranked higher than full-text articles. This is
most likely due to the increased use of electronic resources and their capacity to overload
a user. For the user some kind of hierarchy of results must be formed to place order on
the Information Seeking process. With the Web, it is either easy to discover the channel
of information (a Web site owned by an organization) or quite difficult to confirm (a
resource included on a personal Web page) due to the ease of moving and presenting
information on the World Wide Web.
Another new feature of the model has also been added: filtering, which capitalizes on
personal criteria or mechanisms to increase information precision and relevancy. Typical
examples of filtering are restricting a search by time or keyword. This idea of filtering, in
more than name, points out that Information Filtering is a crucial element of study in
Information Seeking. In a Web browser session, filtering would likely involve restricting
a search for information (using a Web search engine or even on a particular Web site) by
the date published or carefully noting the URL[2] of the Web page. When combined with
distinguishing, where resources are actually ranked and sorted, we also begin to see how
Information Retrieval is alluded to in Ellis' model of Information Seeking. Figure 1
illustrates Ellis' current Information Seeking model. Note that the overall structure of the
process could be contained inside each activity, implying the fractal-like nature of the
processes.
Figure 1. Ellis' Information Seeking Model
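The filtering and distinguishing activities described above can be combined in a short sketch: first restrict a set of resources by keyword and date, then rank what remains by channel. Everything named here is an assumption made for illustration, including the Resource fields, the example URLs, and the CHANNEL_RANK ordering, which only loosely mirrors the observation that informal channels and secondary sources tend to be ranked above full-text articles.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Resource:
    url: str
    title: str
    published: date
    channel: str  # e.g. "colleague", "abstract", "full text"


# Hypothetical channel ordering; a lower number is ranked higher.
CHANNEL_RANK = {"colleague": 0, "abstract": 1, "full text": 2}


def filter_resources(resources, keyword, since):
    """Filtering: keep resources whose title contains the keyword and
    that were published on or after the given date."""
    keyword = keyword.lower()
    return [r for r in resources
            if keyword in r.title.lower() and r.published >= since]


def distinguish(resources):
    """Distinguishing: order by channel rank, then newest first.
    Unknown channels are placed after all known ones."""
    return sorted(
        resources,
        key=lambda r: (CHANNEL_RANK.get(r.channel, len(CHANNEL_RANK)),
                       -r.published.toordinal()),
    )


resources = [
    Resource("http://a.example/paper", "Collaborative filtering survey",
             date(1997, 3, 1), "full text"),
    Resource("http://b.example/abs", "Filtering news articles",
             date(1996, 11, 5), "abstract"),
    Resource("http://c.example/old", "Filtering mail",
             date(1992, 1, 1), "abstract"),
    Resource("http://d.example/tip", "A filtering tool a colleague suggested",
             date(1996, 5, 20), "colleague"),
    Resource("http://e.example/other", "Browsing hypertext",
             date(1997, 1, 1), "full text"),
]

ranked = distinguish(filter_resources(resources, "filtering", date(1996, 1, 1)))
print([r.url for r in ranked])
```

Keeping the two steps as separate functions reflects that filtering narrows the set by the seeker's criteria, while distinguishing only reorders what is left.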
2.1.1.2 Applying Ellis' Model
I propose that the Information Seeking process is fractal-like in nature. Each feature
follows the overall feature set within itself. For example, within surveying, there surely
must be chaining, browsing, differentiating, not to mention ending that formalizes the
completion of the step. Like a fractal, even the smallest change in a sub-feature (as I shall
now call them) can impact not only its parent feature, but have substantial impact on the
entire Information Seeking process. This is more than just refinement of a search: the
very features of Information Seeking can take on a different mapping as the seeker, the
sources, and technology change. Because of these variations, collaboration in all three
of these domains can substantially improve Information Seeking.
For example, collaboration among seekers is the most obvious area of improvement of
the process and the focus of this paper. Different users can share previous findings or
cooperate to minimize future work. Sources can be more easily linked and shared as more
become available digitally. Improved technology can enable more automation of
monitoring; combining and comparing results; and distribution of user profiles or
programs that can provide starting points for Information Seeking.
Ironically, as resources become more plentiful due to technology, they are also being
more loosely, if at all, classified. The resource demands of publishing information are far
less than direct expert classification and often exclude indexing. Without common
organization among electronic resources, more individual work will be needed to build
maps of a research terrain. Again, Collaborative Filtering can help in an ad hoc way by at
least establishing operational classifications of information by communities of users who
pool their resources. Their resources cannot help but become classified in some form: by
user, by implicit or explicitly agreed-upon language, or by usage ranking as resources fall
prey to limited attention.
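A usage ranking of the kind mentioned above could be as simple as counting how many users in a community have kept a given resource. The sketch below assumes hypothetical users and bookmark lists; it is an illustration of the idea, not a method taken from any of the cited systems.

```python
from collections import Counter

# Hypothetical bookmark lists pooled from a small community of users.
bookmarks = {
    "ana": ["http://a.example", "http://b.example", "http://c.example"],
    "ben": ["http://b.example", "http://c.example"],
    "cho": ["http://b.example", "http://d.example"],
}


def usage_ranking(bookmarks_by_user):
    """Rank each URL by how many distinct users have bookmarked it."""
    counts = Counter()
    for urls in bookmarks_by_user.values():
        counts.update(set(urls))  # set(): count each user at most once per URL
    # Sort by descending count, then by URL so that ties are ordered predictably.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for url, users in usage_ranking(bookmarks):
    print(users, url)
```

Resources that few users keep sink to the bottom of the list, which is one concrete reading of resources "falling prey to limited attention".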
2.1.3 Kuhlthau's Model of the Information Search Process
Kuhlthau provides an additional model which focuses on the information search process
from the user's perspective. Her six stages in the Information Search Process (ISP) Model
are:
1. initiation - beginning the process, characterized by feelings of uncertainty and
more general ideas with a need to recognize or connect new ideas to existing
knowledge.
2. selection - choosing the initial general topic with general feelings of optimism by
using selection to identify the most useful areas of inquiry.
3. exploration - investigating to extend personal understanding and reduce the
feelings of uncertainty and confusion about the topic and the process.
4. formulation - focusing the process with the information encountered
accompanied by feelings of increased confidence.
5. collection - interacting smoothly with the information system with feelings of
confidence as the topic is defined and extended by selecting and reviewing
information.[3]
6. presentation - completing the process with a feeling of confidence or failure
depending on how useful the findings are (Kuhlthau 1991).
2.1.4 Belkin's Information Seeking Process Model
Belkin provides another view of the Information Seeking process, described as
Information Seeking Strategies (ISS). This view can be perceived as a more task-oriented
overlay of either Kuhlthau's or Ellis' model. The tasks are:
- browsing - scanning or searching a resource
- learning - expanding knowledge of the goal, problem, system or available
  resources through selection.
- recognition - identifying relevant items (via system or cognitive association).
- metainformation - interacting with the items that map the boundaries of the task
  (Belkin, Marchetti, and Cool 1993).
Again, this model is not linear, unlike a typical waterfall process flow. Belkin even
stresses this non-linearity in that he suggests that the model should support "graceful
movements" among the tasks.
2.1.5 Belkin's Anomalous States of Knowledge
Belkin also provides some useful perspectives with the Anomalous State of Knowledge
(ASK) theory, "the cognitive and situational aspects that were the reason for seeking
information and approaching an IR system" (Saracevic 1996). Belkin proposes that a
search begins with a problem and a need to solve it - the gap between these is defined as
the information need. The user gradually builds a bridge of levels of information that
may change the question or the desired solution as the process continues (Belkin,
Oddy, and Brooks 1982).
In other words, this view treats information seeking as a dynamic process in which the
user's expertise grows, both in knowledge about the solution and in using the
capabilities of the particular information system itself. Taking these ideas, Belkin
advocates a systems design using a network of associations between items as a means of
filling the knowledge gap. By establishing relationships between individual pieces of
knowledge, a bridge of supporting information can be used to cross the knowledge gap.
Using a collection of associations in this manner provides a framework that can be
applied to designing Collaborative Filtering mechanisms, which work from building
associations between users.
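To make the idea of associations between users concrete, the sketch below links users by the overlap in the pages they have visited and then suggests pages that associated users have seen. The similarity measure (Jaccard overlap), the scoring rule, and the user and page names are all assumptions chosen for illustration; they are not drawn from Belkin's work or from any particular Collaborative Filtering system.

```python
def jaccard(a, b):
    """Overlap between two sets of visited pages, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(visits, user):
    """Suggest pages visited by similar users but not yet by `user`.

    Each unseen page is scored by the summed similarity of the users
    who visited it; users with no overlap contribute nothing.
    """
    seen = visits[user]
    scores = {}
    for other, pages in visits.items():
        if other == user:
            continue
        weight = jaccard(seen, pages)
        if weight == 0.0:
            continue
        for page in pages - seen:
            scores[page] = scores.get(page, 0.0) + weight
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical visit histories for four users.
visits = {
    "ana": {"ellis.html", "kuhlthau.html", "belkin.html"},
    "ben": {"ellis.html", "kuhlthau.html", "saracevic.html"},
    "cho": {"ellis.html", "fidel.html"},
    "dee": {"cooking.html"},
}

print(recommend(visits, "ana"))  # ['saracevic.html', 'fidel.html']
```

Here the page from the more closely associated user ranks first, and the user with no shared history contributes no suggestions at all.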
The full article is located at
http://www.gslis.utexas.edu/~donturn/research/augmentis.html#Heading4