Navigating the Personal Web
David Wolber, Chris Brooks
University of San Francisco
2130 Fulton Avenue
San Francisco, CA., 94117
(415) 422-6451
ABSTRACT
This paper presents a system for seamlessly navigating from one’s
own personal space to external information sources and to the
personal spaces of other users. We present techniques for peer-to-peer knowledge sharing and zero-input publishing, as well as
a context view that combines searching, browsing, associative file
management, and blog-like features.
Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]
General Terms
Algorithms, Human Factors, Standardization, Experimentation,
Languages.
Keywords
personalization, contextualization, collaboration,
peer-to-peer.
1. INTRODUCTION
The context for our research includes two emerging phenomena:
1) an explosion in the availability of general-purpose and domain-specific document collections (digital libraries), and 2) the
pervasiveness of incredibly powerful computing, storage, and
networking capabilities available to ordinary computer users. The
purpose of our research is to leverage these phenomena in order to
improve the research and creative process.
We take an inward-out approach, grounding our tools and
techniques in what we call the personal web. The term actually
has dual connotations: 1) providing a user with a personal view of
the WWW, and 2) considering the personal information space,
including all documents, bookmarks, and links, as a highly
interconnected space that extends seamlessly to the external
world.
Our primary focus is on exposing the associations that exist
between documents. Traditional software tools tend to ignore
most associations and place strict boundaries between the
personal space and the external world. File managers display only
parent-child folder relationships, and keep all other associations,
including explicit hyper-link associations, hidden. Search engines
typically only consider external documents, and do not consider hyperlink relationships. Google provides an inward-link viewing facility, but it is in a separate panel within the advanced search page. One cannot, for instance, perform a keyword search and then click on a result to see the documents that link to it.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
JCDL Conference’04, Month 1–2, 2004, City, State, Country.
Copyright 2004 ACM 1-58113-000-0/00/0004…$5.00.
In this paper we describe a system that allows seamless navigation
between the personal web and external information sources,
including other personal webs. In the original version of WebTop
[48], the “external world” consisted of web pages collected using
keyword search and inward link queries to the Google API.
Recently, we re-implemented the system so that a user can
navigate to documents in any “digital library” that provides access
through a publicly defined associative sources protocol.
Associative sources can include general-topic search engines like
Google, domain-specific ones like ResearchIndex, and sources
consisting of the personal webs of other users (see Figure 1).
We call personal-web-to-personal-web communication peer-to-peer knowledge sharing. Such sharing presents a number of
interesting social, political, and privacy issues, all of which we’ll
address. We’ll also present a technique called zero-input
publishing which addresses the challenge of motivating people to
share by reducing or eliminating the effort necessary.
2. THE PERSONAL WEB
A document is not an island unto itself, but has a rich context
including all the documents and resources associated to it in some
manner. One can define “associated” in various ways. A strict definition takes the one-degree neighborhood of a document to be the documents directly hyperlinked from it. One could also
consider a looser definition, such as one that also considers the
documents that point to a given one (its inward links). One could
of course go further and consider a wide variety of associations—
similar content, co-citation, same author, collaborative filtering,
etc.
Just as with a single document, the collection of all documents
that make up a user’s personal space is also highly connected,
both internally and in relation to the external world. We use the
term personal web to refer to this collection of documents and
associations. By definition, we say a document is in one’s
personal web if it is located in the personal space or if it is
referred to from a document in the personal space.
The personal web neighborhood can then be defined to be the
documents associated to those in the personal web. We define the
one-degree neighborhood as the documents directly associated to
those in the personal web, the two-degree neighborhood as the
documents in the one-degree neighborhood plus their associated
documents, and so on. As the following section will illustrate, we
have designed a user interface that allows for easy navigation of
this neighborhood.
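The k-degree neighborhood defined above lends itself to a breadth-first traversal. The following is a minimal sketch, assuming associations are available as an adjacency map from each document to its directly associated documents; the names are ours, not WebTop's.

```python
from collections import deque

def neighborhood(assoc, personal_web, degree):
    """Documents within `degree` association steps of the personal web.

    `assoc` maps a document to the documents directly associated with it
    (outward links, inward links, content relations, and so on).
    """
    seen = set(personal_web)
    frontier = deque((doc, 0) for doc in personal_web)
    result = set()
    while frontier:
        doc, dist = frontier.popleft()
        if dist == degree:
            continue  # do not expand past the requested degree
        for nbr in assoc.get(doc, ()):
            if nbr not in seen:
                seen.add(nbr)
                result.add(nbr)
                frontier.append((nbr, dist + 1))
    return result
```

By construction, the two-degree set contains the one-degree set plus its associated documents, matching the definition above.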
3. WEBTOP: AN ASSOCIATIVE CLIENT
Associative agents serve as virtual library assistants, peeking over the user’s shoulder as the user writes or browses, analyzing what associative information would be helpful, and then scurrying off to virtual libraries (information sources) to gather data. Also known as reconnaissance agents [29], personal information assistants [9], and attentive systems [31], such agents aim to augment the user’s associative thinking capabilities and thereby improve the creation and discovery process.
Figure 2 shows a screenshot of the current version of our
associative agent, WebTop. The user can browse web documents
or edit MS-Word documents in the right panel. As the user works,
associative links are displayed in the left panel, which we call the
context view. The ‘I’, ‘O’, and ‘C’ icons specify the type of association: ‘I’ stands for inward link (i.e., the document points to the working one), ‘O’ stands for outward link, and ‘C’ means the
document has similar content. When the user clicks on any
expander (+) in the context panel, associations at the next degree
of separation from the open document are displayed.
Based on our study of associative agents as well as the experience
we have gained building and using them, we have identified
several features that seem to be effective in helping users locate
and manage information. They include:
Zero-input interface [29]. In the traditional desktop, creation and
information retrieval are two distinct processes. When a creator is
in need of information, he or she switches from the current task,
opens a search engine, formulates an information query and then
invokes the query.
Zero-input interfaces seek to integrate creation and information
retrieval. The agent underlying the interface analyzes the user’s
working document(s) and automatically formulates and invokes
information queries. One common zero-input task uses TFIDF [3]
to identify the most characteristic words in the document, then
sends those as keywords to information source searches (this is how the ‘C’ links in Figure 2 are generated). The results of such
queries are listed on the periphery of the user’s focus. The user
periodically glances at the suggested links and interrupts the
working task only when something is of interest.
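A minimal sketch of this zero-input step, assuming documents are already tokenized and using a standard TF-IDF weighting (the paper cites [3]; the exact formula WebTop uses is not specified, so the smoothing below is our own choice):

```python
import math
from collections import Counter

def characteristic_terms(doc_tokens, corpus, k=5):
    """Top-k terms of a working document ranked by TF-IDF.

    `doc_tokens` is the tokenized working document; `corpus` is a list
    of background documents, each given as a set of terms.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus)
    def idf(term):
        df = sum(1 for doc in corpus if term in doc)
        return math.log((1 + n_docs) / (1 + df))  # smoothed IDF
    ranked = sorted(tf, key=lambda t: tf[t] * idf(t), reverse=True)
    return ranked[:k]
```

The returned terms would then be sent as keywords to the active information sources.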
Because zero-input interfaces are always formulating associative queries, impromptu information discovery is facilitated. There is no need for users to stop their current task and switch contexts and applications in order to search for related work. The challenge, however, is that many users find non-initiated changes in the user interface disruptive. Our client uses a ramping interface [38] to iteratively and cooperatively expose information to the user, and only modifies the context panel when a new URL is loaded or a node in the panel is expanded. We also provide a search box for traditional search.

Figure 2. The WebTop Associative Client
Graph/tree view of retrieved information. Search engines
typically provide results in a linear fashion. The user can select a
link to view the corresponding page, but there is no way to
expand the link to view documents related to it, and there is no
mechanism for viewing a set of documents and their relationships.
A more flexible approach, taken by WebTop, is to display
retrieved links in a file-manager-like tree view. When the user
expands a node in the tree, the system retrieves information
associated with that link and displays it at the next level in the
tree. By expanding a number of nodes, the user can view a
collection of associated documents, e.g., the citation graph of a
particular research area.
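The lazy expansion behavior can be sketched as follows; `fetch` stands in for whatever associative-source query the client issues on expansion, and all names are illustrative rather than WebTop's actual implementation:

```python
class ContextNode:
    """A node in the context tree whose children are fetched on demand."""
    def __init__(self, url, fetch):
        self.url = url
        self._fetch = fetch    # fetch(url) -> list of (assoc_type, url)
        self.children = None   # None until the user expands the node

    def expand(self):
        """Query the sources for this node's associations, once."""
        if self.children is None:
            self.children = [ContextNode(u, self._fetch)
                             for _assoc_type, u in self._fetch(self.url)]
        return self.children
```

Because children are themselves `ContextNode`s, repeated expansion walks out through successive degrees of association.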
Mixing of association types. Search engines and file managers
typically focus on one type of association. For instance, Google’s
standard search retrieves content-related links, that is, links
related to a set of keywords. In the separate advanced search page, a Google user can view the inward links of a URL. However, there are no queries or views that integrate content-related and link-related associations. Similarly, file managers focus on one association type—parent-child relationships of folders and documents—and ignore hyperlink associations and content-related associations. The early associative agents [9, 38] also focused on one association type—content-related links—and ignored explicit links and other associations.
WebTop integrates various association types, e.g., folder-child,
link, content-relation, and in general results from any query
available in the associative sources API which we have defined
(see next section). Associations from each type can be listed at
each level of the tree, allowing a user to view various multiple-degree associations, e.g., the documents that point to the content-related links of a document, or the inward links of the outward
links of a document (its siblings).
Note that when the sources of the links are personal webs, this
essentially allows the client user to navigate into the personal
space of another user. When a document from another’s personal
web is expanded, the system will display outward, inward, and
content-related links from that same source. Outward and inward links from a personal web include folders as well as other
documents, so the client user can navigate both the folder
hierarchy and the links within the personal space of the other user.
Mixing of external and internal documents. In the traditional
desktop, there are tools that work with web documents (search
engines) and tools that work with local documents (file managers
and editors). There is generally little integration between the two.
WebTop de-emphasizes the distinction between local and external
documents by integrating both into a single context tree view, and
by considering links from local to external documents. For
instance, if a local document contains a hyperlink to a web
document, the agent will display that relationship. If an external
document has similar content to that of a local document, that
association will be displayed. By considering both the user’s own
documents and documents from external sources, the associative
agent serves as both a remembrance agent [38] and a
reconnaissance agent [29].
Associative saving. Users can also save documents within the
context panel, so the agent also serves as an associative file
manager. The system provides the ability to create edge links,
which associate documents without modifying the internals of
either document. WebTop stores these links as metadata and
displays them in the context panel. One use of edge links is the ability to add links to web pages the user does not own (i.e., bookmarks).
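As an illustration, edge links could be persisted as a simple external metadata file; the JSON layout and function below are our own sketch, not WebTop's actual storage format:

```python
import json

def add_edge_link(store_path, src, dst, note=""):
    """Append an edge link (src -> dst) to an external metadata file,
    leaving both documents untouched."""
    try:
        with open(store_path) as f:
            links = json.load(f)
    except FileNotFoundError:
        links = []
    links.append({"src": src, "dst": dst, "note": note})
    with open(store_path, "w") as f:
        json.dump(links, f)
    return links
```

Because neither document is modified, the same mechanism works for files the user owns and for external web pages alike.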
Such integration of previously separated tools is beginning to
occur in commercial systems, e.g., one can now blog within
Google. In that case, browsing and blogging (annotated, published
bookmarking) are integrated, but saving to the user’s personal
space is not. WebTop integrates all of these features—when a user
links a document into the personal web, it is saved locally and, if
within the shared personal web space, made available to other
users. This feature is what we call zero-input publishing—just by
bookmarking and saving documents, the user can disseminate
knowledge.
4. ASSOCIATIVE SOURCES
In many domains, web service providers are agreeing on standard programmatic interfaces so that information consumer applications need not re-implement client code to access each particular service. For instance, Microsoft has published a WSDL interface to which securities information providers can conform [41].
Our system applies this standardization in a cross-domain fashion by considering web services that provide similar “associative” functionality but are not generally within the same topic domain. In particular, we consider a class of web service which we call associative information sources. Such services associate documents with keywords, documents with other documents, authors with documents, and in general information resources with other resources. The Google and Amazon web services are prime examples of services in this class, as are domain-specific information sources like FindLaw, the Modern Languages Association (MLA) page for literature, and CiteSeer and the ACM Digital Library for computer science.
Currently such services either provide only a web page interface that must be scraped by an agent, or they provide a web service based on their own programmatic interface (API). For instance, the Google and Amazon web services both provide a search method that accepts keywords and returns a list of links, but a different method signature is used by each. Because of this non-uniformity, client applications must talk to each associative source using a different protocol. This prevents a developer who has written a client for the Google service from reusing the code used to access the Amazon service.
More importantly, the lack of a uniform API prohibits the use of polymorphic lists of associative sources. This is important for clients that aggregate multiple sources, such as the WebTop system described above. Without polymorphism, the choice of which sources to make available in a client application must be set at development time, and the end-user of the client application is restricted to those chosen. An end-user cannot access a newly created or discovered source without the code of the client being changed.
A standardized API and registry system is clearly the solution. Initiatives for standardization of search-like protocols exist both in the web meta-search area, with START, SDLIB, and SDART [18], and in the digital library world with, for instance, OAI [35], OCI [36], and XLink [47].
Our particular goal is to define an API based on the XML/SOAP web service protocol and the accompanying Web Service Description Language (WSDL [46]) and Universal Description, Discovery, and Integration (UDDI [44]) specifications. We also plan to explore various associative methods within our API(s); thus we have not conformed to any of the existing protocols in this version of our software.
Instead, we have defined a common API for an “associative source”, and a public registry system for such sources [49]. The API is specified in a publicly available WSDL file. It contains various associative methods, including keyword and citation search (see Figure X). The methods allow the client to specify the number of links to be returned and to set restrictions (e.g., date, country) on the elements that should be considered. Results are returned in a generic list of Metadata objects, where the Metadata class is defined to contain the Dublin Core fields and a URL.
With this open system, any organization or individual can expose a digital collection as an associative information source. If there is already a web service for the collection, the owners or a third party can write a wrapper (adapter) service that conforms to the associative sources API but makes calls to the existing service.
After the source is implemented and deployed, it can be registered using a web page interface that we provide. The registry parses the WSDL files of the sources that register to determine which of the associative source API methods are implemented.
Aggregation of multiple sources. The newest version of WebTop allows a user to select the active sources from the dynamic list of all registered associative sources. For each chosen source, the user also specifies the queries that should be invoked (e.g., keyword search, inward link, outward link), including the number of results returned from each. When a new URL or document is loaded in the browser, or a node in the context tree is expanded, the user-specified sources and queries are invoked. The results are then displayed in both the order and count specified by the user.
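The associative source API and its generic Metadata result type, described earlier in this section, might be sketched as follows; the field and method names are ours, not those of the published WSDL, and the in-memory source merely stands in for a real SOAP service:

```python
from dataclasses import dataclass

@dataclass
class Metadata:
    """Generic result record: a few Dublin Core fields plus a URL."""
    title: str = ""
    creator: str = ""
    date: str = ""
    url: str = ""

class AssociativeSource:
    """Methods an associative source is assumed to expose."""
    def keyword_search(self, terms, max_results=10):
        raise NotImplementedError
    def inward_links(self, url, max_results=10):
        raise NotImplementedError

class InMemorySource(AssociativeSource):
    """Toy source backed by local data, standing in for a web service."""
    def __init__(self, docs, links):
        self.docs = docs      # url -> Metadata
        self.links = links    # list of (source_url, target_url) pairs
    def keyword_search(self, terms, max_results=10):
        hits = [m for m in self.docs.values()
                if any(t.lower() in m.title.lower() for t in terms)]
        return hits[:max_results]
    def inward_links(self, url, max_results=10):
        sources = [s for s, d in self.links if d == url]
        return [self.docs[s] for s in sources if s in self.docs][:max_results]
```

Any collection that can answer these two questions, whatever its internals, can participate as a source.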
Aggregator agents like WebTop use a web service interface to the
registry to access the list of registered sources available to the
user. The objects returned from the registry web service method
contain URLs referencing the WSDL and endpoint of the source,
the particular associative methods that the source provides, and
metadata about the source. The aggregator can list the available
sources for the user to choose from, or intelligently choose the
source(s) for the user. In either instance, the list of sources is
dynamic, allowing users to benefit from newly developed sources
as soon as they are available.
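The registry and a client's polymorphic list of sources might be sketched as follows; real sources would be SOAP endpoints described by WSDL rather than local objects, and all names here are illustrative:

```python
class Registry:
    """Minimal stand-in for the source registry: clients fetch a live
    list of sources and fan a query out to the ones the user selected."""
    def __init__(self):
        self.sources = {}                # name -> source object
    def register(self, name, source):
        self.sources[name] = source
    def available(self):
        return sorted(self.sources)
    def query(self, chosen, terms, per_source=5):
        results = []
        for name in chosen:
            source = self.sources[name]  # polymorphic dispatch
            results.extend(source.keyword_search(terms, per_source))
        return results
```

Because dispatch goes through a common interface, a newly registered source becomes usable without any change to client code.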
To bootstrap the system, we developed a number of associative
source web services, including ones that access data from Google,
Amazon, Citeseer, the Meerkat RSS feed site, and FindLaw. We
have also developed sample C# and Java web service code that
can be downloaded and used to build new associative sources. In
implementing these sources, we faced a number of interoperability problems, primarily because of the immaturity of the WSDL and XML/SOAP tooling and differences between the .NET and Java web service development tools. Through much trial and error, these problems were solved so that services developed on both the Java and .NET platforms can be called generically by our agent.
5. SHARING THE PERSONAL WEB
A key component of our project is the idea of a personal web. A
personal web consists of the collection of documents and
bookmarks on the user’s local hard disk or server space. On initial
startup, WebTop users specify the root folders to be analyzed for
the personal web, e.g., “My Documents”. The system iterates
through the file system, identifying hyperlinks between documents
and the characteristic words of each document, and building an inverted index for full-text search of the space. As a user works,
this personal web metadata is updated so that it is always
consistent with the file system. For example, when a user
bookmarks a web page or adds a link from a document to a web
page, that association information is recorded.
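A minimal sketch of this indexing pass over HTML-like files, assuming hyperlinks can be found with a simple `href` pattern (real WebTop also handles Word documents and bookmark files, which this sketch omits):

```python
import os
import re
from collections import defaultdict

def index_personal_web(root):
    """Walk `root`, recording hyperlinks per file and building an
    inverted index (term -> set of file paths) for full-text search."""
    links = defaultdict(set)
    index = defaultdict(set)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue
            for url in re.findall(r'href="([^"]+)"', text):
                links[path].add(url)
            for term in re.findall(r"[a-z]+", text.lower()):
                index[term].add(path)
    return links, index
```

In the running system this metadata is updated incrementally as the user saves, bookmarks, and links, rather than rebuilt from scratch.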
One use of the personal web information is as a remembrance
agent for the user —when a document is opened in the browser,
documents from the user’s own personal web that point to or are
content-related to the open document are displayed in the context
panel.
A more interesting use, of course, is for peer-to-peer knowledge
sharing. It should be noted that the personal web is not just a
collection of documents that can be searched. Instead, it consists
of documents and associations, including hyperlinks and folder-document relationships. Thus, each time the user categorizes a document by placing it in a folder, creates a folder, or adds a hyperlink within a document, he is adding to the richness of the information.
To expose personal webs for sharing, we have implemented a web
service which conforms to the associative source API and returns
information from a personal web. We are currently completing
implementation of a mechanism so that, on initial start-up of
WebTop, users will be asked if they want to expose their personal
spaces as information sources, and, if so, specifications as to
which folders should be shared with whom. If a user chooses
exposure, the system will automatically register the personal web
as an associative source and, each time the user logs on to the
system, deploy the web service exposing the methods to the
outside world.
Once the user specifies the shared folders, he or she will be able to share without effort—all document saving, bookmarking, and link creation will automatically create shared knowledge. We hypothesize that zero-effort publication will lead to more sharing, and that much of the information in personal spaces is hidden from others not for privacy reasons, but because publishing the information as a web page or a blog takes effort.
We realize, of course, that privacy is a complicated issue in both
the corporate and academic settings. The challenge will be to
provide a privacy specification mechanism that is flexible enough
to provide for the various needs of individuals and organizations,
but easy enough that people actually use it instead of choosing
“share all” or “share none”. The W3C P3P effort [45] should be of help here, along with efforts such as [1] and [5]. Our plan is to implement a fairly simple mechanism, make the system available, and then refine the privacy mechanism iteratively based on user feedback.
6. PERSONALIZED PAGE RANKING AND
SOURCE SELECTION
Multiple information sources exacerbate the already challenging
information overload problem of single source search engines.
Clustering of results can help [11], as can personalized page
ranking [20, 23, 39] and automated source selection.
WebTop currently requires the user to explicitly specify the
number of results to be returned from each source, and the
ordering. We are currently implementing extensions which will
also provide automated page ranking and source selection. The
algorithm we have designed combines content similarity, link
analysis, and source reputation measurements in choosing sources
and links. Prior to information retrieval, context information,
including the open documents and those near them in the personal
web, can be compared against characteristic terms from
prospective sources. In the post-processing phase, the context
information can be compared to the results returned from the
various sources. In both cases, links from the personal web and the personal webs of others can be used to compute a personalized PageRank.
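One simple way to combine the three signals is a weighted linear score; the weights and the linear form below are our own illustration, as the paper does not fix the combination at this level of detail:

```python
def personalized_score(link, weights=(0.5, 0.3, 0.2)):
    """Combine per-link evidence into one ranking score.

    `link` carries three signals, each normalized to [0, 1]:
      content  - similarity between the link and the working context
      link_pr  - personalized PageRank mass from personal-web links
      rep      - reputation of the source that returned the link
    """
    w_content, w_link, w_rep = weights
    return (w_content * link["content"]
            + w_link * link["link_pr"]
            + w_rep * link["rep"])

def rank(links, weights=(0.5, 0.3, 0.2)):
    """Order retrieved links by combined score, best first."""
    return sorted(links, key=lambda l: personalized_score(l, weights),
                  reverse=True)
```

The weights themselves could be tuned per user, for instance from click feedback.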
We also plan to incorporate source reputation into the algorithms.
The reputation of a source can be computed specifically for the
user by measuring the percentage of listed links from the source
that the user actually chooses. Collaborative filtering can also be
used to take into account the source’s reputation vis-à-vis other
users. Such automated source reputation measures have proven
helpful in blogging systems and peer-to-peer systems [19].
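The per-user reputation measure described above, the fraction of a source's suggested links that the user actually follows, can be sketched as a small running tracker (the neutral prior for unseen sources is our own assumption):

```python
from collections import defaultdict

class ReputationTracker:
    """Per-source reputation: the fraction of suggested links the user
    actually followed, with a neutral prior for unseen sources."""
    def __init__(self):
        self.shown = defaultdict(int)
        self.clicked = defaultdict(int)
    def record_shown(self, source, n=1):
        self.shown[source] += n
    def record_click(self, source):
        self.clicked[source] += 1
    def reputation(self, source):
        if self.shown[source] == 0:
            return 0.5  # neutral prior before any evidence
        return self.clicked[source] / self.shown[source]
```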
7. SCENARIOS OF USE
This section describes current and envisioned uses of the WebTop system.
7.1 Impromptu Information Discovery
A key facet of the integrated WebTop system is that information
can be discovered in an impromptu manner. For instance, when a
user opens one of his own research papers in the browser, the
system will display the inward links to that paper. If a new web
page has linked to it or cited it, that page will appear.
The first author was using WebTop and opened up his list of
publications page. An interesting inward link appeared that the
author had not seen before. He clicked on the link, and was taken
to a page written in German. Fortunately, the page had some
English links pointing to interesting work related to the first
author’s work.
7.2 Navigating an Expert’s Personal Web
Imagine an expert, say Henry Lieberman of the MIT Media Lab,
renowned computer scientist, guru in various areas including
artificial intelligence, software agents, and human-computer
interaction, and a person that has a profound effect on the people
he meets, one of those people with whom a five-minute conversation can revolutionize one’s thinking, change the course of one’s research, and trigger a thousand new ideas.
Now imagine Henry sitting in his office at MIT, or on a cross-country flight, working on his computer, reading
some papers, browsing the web, bookmarking particular texts,
writing notes, adding links between a paper from one of his
research communities (e.g., HCI) to another (e.g., Artificial
Intelligence). And imagine that you are able to look over his
shoulder and observe him, or better yet, all the work he is doing is
recorded in an easily digestible form, one that you can browse at
your leisure. Instead of asking Google what it thinks about, say,
the semantic web, you can ask Henry—you can search within his
personal collection of bookmarks and notes, you can browse his
directories, you can navigate through the links he has layered on
top of the documents in his collection.
Then to take it a step further, besides navigating the document
links within Henry’s collection, you can also navigate people
links, people links that have been created through an automated
analysis of Henry’s documents, so now you are not only picking
Henry’s brains, but sitting in a room of experts, seamlessly
floating from one expert’s brain to another. And all this is made
true because the selection, filtering, and notation work that Henry
and the others have been performing is available in the public
domain.
7.3 WebTop as Groupware
You are part of a small research group at a small law firm. Unlike
the scenario above, you only want to share your day-to-day
knowledge creation with members of the group, but it is crucial
that the members of the group do not duplicate their efforts and
that your team is able to produce a cohesive and exhaustive
survey of the topic in a limited time.
In this case, you choose the personal webs of the other group
members as the sources for your associative agent, as well as a
law citation source. Whenever you open a new document, the
context panel displays any notes or cases that the other members
have associated with the document, as well as cases that cite or are
cited by the open document.
8. ECONOMIC AND POLITICAL
MOTIVATION
Though important in today’s relatively free Internet environment,
distributed and peer-to-peer knowledge sharing may prove even
more important as the freedom of that environment is challenged.
We must not be fooled by the free and creative beginnings of the
Internet, or the “good citizen” approach of Google’s founders
[McHugh]—the economic and political forces of our society can easily render these mere historical anomalies. Instead, a Zinn-like [51]
view must be taken: this is a battle for the Internet between
corporate interests and the general populace.
Neither business nor political interests are motivated to create a
more knowledgeable society. As opposed to free thinkers, both
are better served by a placid population of consumers.
Corporations do not care what we see, as long as we view their
interstitial advertisements. The government that protects those
corporations fears anything that threatens their power.
As the portal to information is centralized through Google (a Google-Opoly [33]), the company that buys Google, or some other
monopoly that emerges, the dangers to information freedom will
grow. As Lawrence Lessig has argued[28], this danger will not
present itself in law or stated policy, but in code! Our freedom
will be dependent on the source choosing and page ranking
algorithms hidden within the centralized server.
Two dangers lurk: the infiltration of advertisement within our
computers, and thought control through the careful dispensation
and withholding of information. In terms of advertisement, one
need only consider the historical precedent of television, and how
the commercial time per hour has risen steadily.
Powerful forces would like to see the Internet go in the same
direction. Consider the following Kafkaesque scenario which the
first author recently experienced. As he was browsing,
advertisements—to Netscape, Great Beginnings, and AOL, among
other “legitimate” companies—began appearing on his computer
screen. Note that these were not the normal pop-ups that the
author, in his infinite wisdom, had removed months before using
anti-popup software. In this case, the popup did not emanate from
the pages the author was visiting. Instead, an “adware” agent,
maliciously installed on the author’s computer, was responsible.
The agent was not only popping up interstitial messages, it was
monitoring the user’s browser behavior so as to “personalize” the
ads that he received (no comment on how this personalization
manifested itself).
Being a somewhat savvy computer user, the author realized what
was occurring and opened up the “Control Panel” to remove it.
He searched his list of installed components and found one he
didn’t recognize. With some fear that he was deleting some
component of some application that was important, the author
selected the unknown application for removal.
A dialog appeared stating that the application should not be
removed, but if you really want to, click OK. Of course, the dialog
disappeared before the OK button could be chosen. After many
attempts, the author deftly reached the OK button to specify his
choice. Unfortunately, and obviously, clicking the button had no
effect—the program was not removed.
The point of this anecdote is not that some idiot was able to
infiltrate the computer in this way, but that legitimate companies
were accepting enough of such a strategy to buy into it. It is
certainly proof of the passion and desperation corporations have
in making Internet advertisement work, as well as their
expectation of what consumers will accept.
The other threat which centralized access to the web poses, is
information access control through page ranking or other software
mechanisms. Encoded in complex algorithms, such access control
would significantly hinder freedom of thought. Google has
already acknowledged reacting to pressure from the Chinese
government:
Chinese who use Google to search on terms like "falun gong" or "human rights in china" receive a standard-looking results page. But when they click on any of the results, either their browsers are redirected to a blank or government-approved page, or their computers are blocked from accessing Google for an hour or two. [33]
One is left to wonder: does Google facilitate similar access control
for our government, but in less obvious ways?
Clearly, no matter who is in charge, centralized control of
information is not in the public’s interest. Systems that aggregate
information from separate collections are less prone to such
control. Those based on a centralized registry, such as in the
WebTop system we have described, are not immune from control,
even though the centralization is at the information source level
and does not rank or suggest documents directly. Ideally, both
source identification and document discovery should probably use the peer-to-peer model [39, 42].
9. RELATED WORK
Reconnaissance Agents. The idea of an agent that assists users in
their browsing and discovers new links on their behalf has been
explored in the past. One of the most well-known systems is
Lieberman's Letizia system [30], which helped users browse web
pages by looking ahead at the links on each page and suggesting
ones that match the user's working profile. Letizia performed a
personal crawl [12] seeded from the current document, and built
the user's working profile based on information recorded during
the current session. A successor system, PowerScout [29], used
longer-term user profiling information, and also recognized the
need for multiple profiles representing a user's various
personalities and interests.
Margin Notes [38] was a just-in-time information system. It used
TFIDF to find the most characteristic words in each section of a
document, and then sent those words to both a general-purpose
search engine and a search facility for the user's local documents.
The resulting links were listed in the margins of the document,
providing just-in-time and up-to-date annotations of the
document. Since the annotations could come from the user's local
files as well as the web, the agent serves as a remembrance agent
as well as a reconnaissance agent.
Whereas Letizia helped users browse, Margin Notes provided associative links in both a browser context and a word-processing context. This latter component is important in that associative information is pushed to the user during the creative process. Margin Notes also differed from Letizia in that it used a general-purpose search engine to search for related items on the entire web, as opposed to searching only in the neighborhood of the current document.
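Margin Notes’ term-extraction step can be illustrated concretely. The sketch below is a minimal, hypothetical reconstruction (not Margin Notes’ actual code): each term in a section is scored by TF-IDF, so terms frequent in one section but rare across sections surface as that section’s most characteristic words, the words that would be sent to the search engines.

```python
import math
from collections import Counter

def top_terms(sections, k=3):
    """Return the k highest-TF-IDF terms for each section.

    `sections` is a list of token lists; each section is scored
    against the others, so a term common to every section scores low.
    """
    n = len(sections)
    # document frequency: number of sections containing each term
    df = Counter()
    for sec in sections:
        df.update(set(sec))
    result = []
    for sec in sections:
        tf = Counter(sec)
        # tf * idf, with idf = log(n / df)
        scores = {t: (tf[t] / len(sec)) * math.log(n / df[t]) for t in tf}
        result.append(sorted(scores, key=scores.get, reverse=True)[:k])
    return result
```

A term such as "web" that appears in every section gets an idf of log(1) = 0 and so never ranks as characteristic, which is exactly the behavior a just-in-time annotation agent needs.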
Watson [9] is an information management assistant similar to
Margin Notes. Its goal, similar to that of SUITOR [31], is to be
attentive to all of a user's everyday applications. Watson
researchers also explored automatically selecting the source of a
search using the terms from the working document [27].
The WebTop associative agent was motivated by all of the above
systems. Like Watson, it finds related information from multiple
information sources. It also performs a search in the neighborhood
of the working document, like Letizia, and searches the personal
space, like Margin Notes. WebTop is different from all these
systems in the types of associations it considers and the way it
relaxes conventional distinctions between local and web
documents, and between various types of associations.
Personal Knowledge Base Systems. TheBrain [43] is billed as an
associative computing system. It allows users to create “things”,
associate them, and view a graph of those things. A user can
create a thing from a URL, but the system is not integrated with a
file manager, browser, or web graphing system.
Haystack [22] offers an integrated personal platform based on
XML and RDF associations. Haystack is at the systems level, with
XML-compliant applications running on top of it. Thus,
applications like email and file managers speak the same
language, enabling the system to offer associative features not
possible in the traditional environment.
MBiblio [37] offers a personalized interface to a federation of digital libraries. All libraries in the federation follow the OAI-PMH standard. As with WebTop, use of a standard allows for cohesive meta-search. Note, however, that MBiblio is not integrated with a browser or file manager, and does not provide views of different types of associations.
Meta-search. There are a number of commercial and research metasearch systems. 37.com provides access to 37 different search engines from a single interface. Dogpile (www.dogpile.com) combines various types of information sources, including newsgroups and white pages. Inquirus [17], the system described in [50], and MetaSpider [13] use link analysis as well as content in clustering the results from various sources. Inquirus also augments queries with user information. SavvySearch [14] uses the user’s past choices to help choose the sources that should be searched on each query. The digital library system described in [34] performs both meta-search and reference linking from multiple sources.
Source Discovery. Source discovery is aided by sources exposing
characteristic terms or some other sort of profile. STARTS and SDARTS [18] define standards for exposing such a profile. Similar efforts exist in the UDDI world. DAML-S [2] is a web-service-specific language for describing the semantics of the particular methods a source provides.
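Profile-based source selection of the kind these standards enable can be sketched simply: each source exposes a set of characteristic terms, and a query is routed to the sources whose profiles it overlaps most. The function and the profile data below are hypothetical illustrations, not part of any of the standards named above.

```python
def rank_sources(profiles, query_terms):
    """Rank information sources by overlap between their exposed
    characteristic-term profiles and the query terms.

    `profiles` maps a source name to its list of characteristic terms.
    """
    q = set(query_terms)
    scored = {name: len(q & set(terms)) for name, terms in profiles.items()}
    # highest overlap first
    return sorted(scored, key=scored.get, reverse=True)
```

Real protocols expose richer profiles (term weights, language, coverage dates), but the routing decision reduces to the same kind of comparison.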
Collaboration. The idea of developing a multi-agent system that allows users to share information or collaborate has also been explored by other researchers. Chau et al. [11] describe a system
in which users are able to annotate and share the results of Web
searches. They found that user performance was reduced
compared to a single-user system when the number of other
collaborators was small, but that once a threshold number of other
collaborators and searches was reached, sharing and annotation
became a worthwhile task.
Research in Collaborative Filtering systems has also considered
the problem of allowing users to share recommendations about
web pages, movies, or books. Collaborative filtering is an
extremely active research area, producing both research projects
such as GroupLens [25] among many others, as well as
commercial products such as the Alexa Toolbar. Collaborative
filtering typically works by having a large set of users rate a
number of documents, and then relating a new user to these users,
so that the new user is 'close' to users with similar tastes.
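The "closeness" computation at the heart of user-based collaborative filtering can be sketched as follows: similarity between users is measured over co-rated items, and a new rating is predicted as a similarity-weighted average. This is a generic illustration with made-up ratings, not the algorithm of GroupLens or any other particular system mentioned above.

```python
import math

def cosine(u, v):
    """Cosine similarity over the items two users have both rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(ratings, user, item):
    """Predict `user`'s rating of `item` as the similarity-weighted
    average of the ratings given by users who have rated the item."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        sim = cosine(ratings[user], r)
        num += sim * r[item]
        den += sim
    return num / den if den else None
```

The prediction is pulled toward the ratings of the most similar users, which is the sense in which a new user ends up "close" to users with similar tastes.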
Community Formation. One of the more intriguing possibilities is
the spontaneous emergence of communities of people with shared
interests discovering each other. Flake et al. [15] have studied the identification of self-organizing communities on the web.
Using techniques from graph theory, in particular network flow
algorithms, they identify clusters of web pages that are highly
interconnected, thereby forming a community. Often these
communities are emergent; they form through a series of local
interactions, rather than through some supervising process. Flake et al.’s work differs from ours in that they are concerned with identifying communities that already exist, rather than bringing new communities into existence. We are also interested in forming communities of users, rather than of documents. Nevertheless, their work on identifying network structures that enable community formation will help us determine successful methods for aiding community formation.
There has also been work in the multi-agent area concerning the
formation of coalitions and congregations of agents [7,8,40,52]
and communities [16,32].
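Flake et al.’s notion of a community — a set of pages each of which links at least as much to other members as to non-members — can be stated directly in code. The check below uses hypothetical graph data and only verifies the property for a candidate set; the actual identification in [15] finds such sets with maximum-flow algorithms rather than this direct test.

```python
def is_community(graph, members):
    """Check the community property: every member has at least as many
    links to other members as to non-members.

    `graph` maps each page to the set of pages it links to.
    """
    members = set(members)
    for page in members:
        inside = len(graph.get(page, set()) & members)
        outside = len(graph.get(page, set()) - members)
        if inside < outside:
            return False
    return True
```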
Personal Spiders and Focused Crawlers. Focused crawlers [10]
and Personal spiders [12,13] crawl the web beginning with a set
of seed documents and a profile. The plan for WebTop is to run a
personal crawler with seed documents taken from the personal
web.
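The basic control loop of such a focused crawl is a best-first search: pages are visited in order of their match to a profile, starting from the seed documents. The sketch below substitutes in-memory link and term maps (hypothetical data) for real fetching and parsing, and is an illustration of the general technique rather than WebTop’s planned implementation.

```python
import heapq

def focused_crawl(links, terms, seeds, profile, limit=10):
    """Best-first crawl: visit pages in order of overlap between their
    term sets and the user profile, starting from the seeds.

    `links` maps a page to its out-links; `terms` maps a page to its
    terms. Both stand in for real HTTP fetching and HTML parsing.
    """
    profile = set(profile)
    # min-heap on negated overlap score, so the best match pops first
    frontier = [(-len(set(terms[s]) & profile), s) for s in seeds]
    heapq.heapify(frontier)
    visited, order = set(), []
    while frontier and len(order) < limit:
        score, page = heapq.heappop(frontier)
        if page in visited:
            continue
        visited.add(page)
        order.append(page)
        for nxt in links.get(page, []):
            if nxt not in visited:
                heapq.heappush(
                    frontier, (-len(set(terms.get(nxt, ())) & profile), nxt))
    return order
```

Because the frontier is a priority queue rather than a FIFO, on-topic neighborhoods of the seeds are explored before off-topic ones, which is what distinguishes a focused crawler from a breadth-first spider.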
10. CURRENT STATUS AND FUTURE
WORK
Only a prototype of our newest WebTop version exists. It is
incomplete, buggy, and has yet to be formally evaluated, but
informal observation of users suggests it is extremely powerful as
a research tool. Neither the prototype, the implemented associative sources, nor the registry has yet been made public. Our plan is to release a public version in March of 2004.
Once the system is available, we plan to study both explicitly
designed groups of WebTop users, as well as grassroots uses of
the system. Will communities of users evolve? Will experts share?
Will users free-ride the system, as many do with Gnutella and
other file sharing systems?
The system currently offers no help in source discovery, other
than a description provided by each source. We plan to explore
mechanisms for automated source discovery, as well as automated
congregation of sources.
As mentioned, we are implementing both automated source selection and personalized page ranking using a user model based on the personal web. We are also completing a personal spider that emanates from the documents in the personal web. Whereas the context panel allows the user and agent to
collaboratively navigate, the personal spider works on its own (as
the user sleeps!), collecting metadata about the documents near
the personal space. Our plan is to perform user tests of the quality
of n-degree neighborhoods, both for the user’s own
neighborhood, and for the neighborhoods of others (some expert
or group of experts). Will such neighborhoods provide better
search results than Google in some instances?
11. SUMMARY
The key contributions of our work are the introduction of:
• An associative tree view of documents that can be "programmed" by the end-user, i.e., the user can choose the associations shown on node expansion.
• A system that integrates browsing, searching, citation analysis, and blogging.
• A working version of an "associative source" API and registry.
• A GUI that cohesively integrates the personal space, external information sources, and the personal spaces of others.
Though the work is presented as a web application, most of the
ideas apply to the concept of a personal digital library as well.
12. ACKNOWLEDGMENTS
We would like to thank the 2003 senior project team at the
University of San Francisco for implementing the WebTop system
described in this paper.
13. REFERENCES
[1] Agrawal, R., Kiernan, J., Srikant, R., Xu, Y., An XPath-based Preference Language for P3P, Proceedings of the World Wide Web Conference, WWW2003, Budapest, Hungary, 2003.
[2] DAML-S Coalition: Ankolekar, A. Burstein, M. Hobbs, J.
Lassila, O. Martin, D. McIlraith, S., Narayanan, S., Paolucci,
M., Payne, T., Sycara, K., and Zeng, H., DAML-S: Semantic markup for Web services. In Proc. Int. Semantic Web Working Symposium (SWWS), 411-430, 2001.
[3] Baeza-Yates, R., Ribeiro-Neto, B., Modern Information
Retrieval. ACM Press, New York, 1999.
[4] Billsus, D., Pazzani, M., Learning Probabilistic Models,
Workshop Notes of "Machine Learning for User Modeling",
Sixth International Conference on User Modeling, Chia
Laguna, Sardinia, 1997.
[5] Bretzke H., Vassileva J., Motivating Cooperation in Peer to
Peer Networks, in P. Brusilovsky, A. Corbett, F. De Rosis (eds.) Proceedings of the 9th International Conference on User Modelling, UM03, Johnstown, PA, Springer LNCS, 218-227, 2003.
[6] Brin, S., Page, L., The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems, 30(1), pp. 107-117, 1998.
[7] Brooks, C., Durfee, E., Armstrong, A., An Introduction to Congregating in Multiagent Systems, Proceedings of the Fourth International Conference on Multiagent Systems, pp. 79-86, 2000.
[8] Brooks, C., Durfee, E., Congregation Formation in Multiagent Systems, Autonomous Agents and Multiagent Systems,
Special Issue on Infrastructure for Agents, Multi-Agent
Systems and Scalable Multi-Agent Systems, 7(1-2),
July/September, 2003.
[9] Budzik, J., Hammond, K., "Watson: Anticipating and
Contextualizing Information Needs," 62nd Annual Meeting
of the American Society for Information Science, Medford,
NJ, 1999.
[10] Chakrabarti, S., van den Berg, M., and Dom, B.: Focused
Crawling: A New Approach to Topic-Specific Web Resource
Discovery. In Proceedings of the 8th International World
Wide Web Conference, Toronto, Canada, May 1999.
[11] Chau, M., Zeng, D., Chen, H., Huang, M., Hendriawan, D.,
Design and Evaluation of a Multi-agent Collaborative Web
Mining System, Decision Support Systems, 988, 2002.
[12] Chen, H., Chung, Y., Ramsey, M., Yang, C., An Intelligent
personal spider (agent) for dynamic internet/intranet
searching, Decision Support Systems, 23(1), pp. 41-58,
1998.
[13] Chen, H., Fan, H., Chau, M., and Zeng, D.: MetaSpider:
Meta-searching and Categorization on the Web. Journal of
the American Society of Information Science & Technology,
52(13) (2001), 1134-1147.
[14] Dreilinger, D. Howe, A., Experiences with selecting search
engines using metasearch, ACM Transactions on Information
Systems (TOIS), v.15 n.3, p.195-222, July 1997
[15] Flake, G., Lawrence, S., Giles, C., Coetzee, F., Self-Organization of the Web and Identification of Communities, IEEE Computer, 35(3), pp. 66-71, 2002.
[16] Foner, L. N., Yenta: A Multi-Agent, Referral-Based Matchmaking System. In Proceedings of The First International Conference on Autonomous Agents, 301-307, ACM Press, 1997.
[17] Glover, E., Tsioutsiouliklis, K., Lawrence, S., Pennock, D., Flake, G., Using Web Structure for Classifying and Describing Web Pages, Proceedings of WWW02, Honolulu, HI, 2002.
[18] Green, N., Ipeirotis, P., Gravano, L., SDLIP + STARTS = SDARTS: a protocol and toolkit for metasearching,
Proceedings of the first ACM/IEEE-CS joint conference on
Digital libraries, p.207-214, January 2001, Roanoke,
Virginia, United States.
[19] Gupta, M., Judge, P., Ammar, M., Peer to peer systems: A
reputation system for peer-to-peer networks, Proceedings of
the 13th international workshop on Network and operating
systems support for digital audio and video, 2003.
[20] Haveliwala, T., Topic-sensitive PageRank. In Proceedings of
the Eleventh International World Wide Web Conference,
Honolulu, Hawaii, May 2002.
[21] Huang, Z., Chung, W., Ong, T., Chen, H., A Graph-Based
Recommender System for Digital Library, in: Proceedings of
the Second ACM/IEEE-CS Joint Conference on Digital
Libraries (JCDL'02), Portland, Oregon, July 14-18, 65-73,
(2002).
[22] Huynh, D., Karger, D., and Quan, D. Haystack: a platform
for creating, organizing and visualizing information using
RDF. Semantic Web Workshop, WWW2002 (May 2002).
[23] Jeh, G., Widom, J., Scaling personalized web search,
Proceedings of the twelfth international conference on World
Wide Web, May 20-24, 2003, Budapest, Hungary.
[24] Kleinberg, J., Authoritative Sources in a Hyperlinked
Environment. J. ACM 46(5): 604-632 (1999).
[25] Konstan, J., Miller, B., Maltz, D., Herlocker, J., Gordon, L., and Riedl, J., GroupLens: Applying Collaborative Filtering to Usenet News, Communications of the ACM, 40(3), 1997.
[26] Lawrence, S., Giles, C., "Text and Image Metasearch on the
Web", Proceedings of the International Conference on
Parallel and Distributed Processing Techniques and
Applications, pp 829-835, CSREA Press, 1999.
[29] Lieberman, H., Fry, C., Weitzman, L., Exploring the Web with Personal Reconnaissance Agents, Communications of the ACM, 44(8), August 2001.
[30] Lieberman, H., Letizia: An agent that assists Web browsing, Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, 1995.
[31] Maglio, P., Barrett, R., Campbell, C., Selker, T., SUITOR: An Attentive Information System, 2000 International Conference on Intelligent User Interfaces, New Orleans, LA, ACM Press.
[32] Marsh, S. and Masrour, Y. 1997. Agent Augmented
Community Information — The ACORN Architecture. In
Proceedings of CASCON’97, Meeting of Minds, 1997.
[33] McHugh, J., “Google vs. Evil”, Wired Magazine, 11.01,
http://www.wired.com/wired/archive/11.01/google_pr.html,
January 2003.
[34] Mischo, W., Habing, T., Cole, T., Integration of
simultaneous searching and reference linking across
bibliographic resources on the web, Proceedings of the 2003
Joint Conference on Digital Libraries, 2003.
[35] Open Archives Initiative,
http://www.openarchives.org/
[36] Open Citation Project, http://opcit.eprints.org/
[37] Reyes-Farfan, N., Sanchez, J., Personal Spaces in the
Context of OAI, proceedings of the Joint Conference on
Digital Libraries, 2003.
[38] Rhodes, B., Maes, P., Just-in-time information retrieval agents, IBM Systems Journal, 39(3-4), pp. 685-704, 2000.
[39] Shi, S., Yu, J., Yang, G., Wang, D., Distributed Page
Ranking in Peer-to-Peer Systems, Proceedings of 2003
International Conference on Parallel Processing, October 2003.
[40] Shehory, O., Kraus, S., Methods for Task Allocation via Agent Coalition Formation, Artificial Intelligence, 101, pp. 165-200, 1998.
[41] Short, S., Building XML Web Services for the Microsoft .Net
Platform, Microsoft Press, 2002.
[42] Suel, T., Mathur, C., Wu, J., Zhang, J., A Peer-to-Peer
Architecture for Scalable Web Search and Information
Retrieval, Proceedings of WWW 2003, 2003.
[43] TheBrain, www.thebrain.com.
[44] UDDI Home Page, http://www.uddi.org.
[45] W3C P3P Group, http://www.w3.org/P3P.
[46] W3C WSDL Specification,
http://www.w3.org/TR/wsdl
[47] XLink Language Specification, http://www.w3.org/TR/xlink/
[27] Leake, D., Scherle, R., Budzik, J., Hammond, K., Selecting Task-Relevant Sources for Just-in-Time Retrieval, Proceedings of the AAAI-99 Workshop on Intelligent Information Systems, AAAI Press, 1999.
[28] Lessig, L., The Future of Ideas: The Fate of the Commons in a Connected World, Random House, 2001.
[48] Wolber, D., Kepe, M., Ranitovic, R., Exposing Document Context in the Personal Web, Proceedings of the International Conference on Intelligent User Interfaces (IUI 2002), San Francisco, CA.
[49] Wolber, D., Brooks, C., Associative Agents and Sources, submitted to the World Wide Web Conference (WWW 2004).
[50] Yu, C., Meng, W., Wu, W., Liu, K., Efficient and Effective
Metasearch for Text Databases Incorporating Linkages
among Documents. In Proc. of SIGMOD 01, CA, 2001.
[51] Zinn, H., A People’s History of the United States, Harper and Row, 1980.
[52] Zlotkin, G., Rosenschein, J., Coalition, Cryptography, and
Stability: Mechanisms for Coalition Formation in Task
Oriented Domains, Proceedings of the National Conference
on Artificial Intelligence, Seattle, WA, pp 432-437, 1994.