To appear in ACM Hypertext'98, Pittsburgh, USA, June 1998
Dynamic Bookmarks for the WWW
Managing Personal Navigation Space
by Analysis of Link Structure and User Behavior
Hajime Takano and Terry Winograd
Computer Science Department, Stanford University
Gates Building, Stanford, CA 94305, USA
E-mail: htakano@db.stanford.edu, winograd@cs.stanford.edu
ABSTRACT
This paper describes a management tool to support revisiting
WWW pages, which we call "WWW Dynamic Bookmark
(WDB)." WDB watches and archives a user's navigation
behavior, analyzes the archive, and presents the results
as clues for revisiting URLs. We have integrated link
analysis and user behavior analysis to evaluate the importance
of WWW pages. WDB presents a list of the sites a user has
visited, ordered by importance, together with a landmark list
for each site, and shows relationships among sites. Our
experimental implementation shows that the importance
calculation and structure displays help users pick out useful URLs.
KEYWORDS: WWW navigation, bookmark, link analysis,
user behavior analysis
REQUIREMENTS FOR WDB
A variety of services, such as search engines and WWW or
e-mail magazines, have been developed to help users find
relevant URLs. WWW users pick up some of these URLs and
start exploring the information space from them. As a result,
it is difficult for users to remember the URLs they have visited.
WWW Dynamic Bookmark (WDB) needs to support users in finding
URLs that they have visited before but did not realize were
important enough to bookmark.
Personal Navigation Space
Through exploring the WWW, users come to understand the
structure and characteristics of the space they have explored.
We call this subset of the WWW the user's "Personal
Navigation Space (PNS)." To revisit a URL, a user selects a
site from his/her global view of the PNS and decides on a
navigation direction based on his/her memory of that site.
In fact, [1] reported that about 60% of the URLs a person
accesses have already been visited by that user.
Furthermore, [1] showed that typical navigation patterns for
finding new pages often use "hub pages" such as a home page,
a directory page, or a search result. Such hub pages are
therefore also important clues for extending the PNS.
Bookmark Problems
A bookmark function in a WWW browser could serve as a tool
for PNS management. However, bookmarks are used infrequently
during exploration. The main reason users make infrequent use
of bookmarks is the management overhead, such as: (1)
interrupting navigation to add a new URL, (2) the varying
criteria for deciding whether a URL should be stored, (3)
managing the continually growing number of URLs, (4) adding
structure to manage many URLs, and (5) removing old and
useless URLs later.
Management Strategies
To solve the problems mentioned above, WDB automatically
generates a bookmark as part of a well-structured PNS. The
PNS is built in three layers: partitioning the archive into
sites, calculating the importance of pages and sites, and
finding relations among sites.
ALGORITHMS FOR THREE-LAYER MANAGEMENT
Partitioning
The first step is to divide the navigation history into
clusters, each of which is a set of WWW pages with a thematic
organization. To determine cluster boundaries, we use URL
structure, such as the hostname and directory, because authors
of WWW sites tend to organize their documents in directories.
Partitioning is performed as follows:
(1) Gather pages on the same WWW server (site cluster).
(2) Count the number of pages in each directory.
(3) If the number of pages in a directory exceeds a threshold,
those pages form a subcluster under the site cluster.
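The three partitioning steps can be sketched in Python as follows. This is a minimal sketch, not the WDB implementation: the function name `partition`, the returned dictionary layout, and the threshold value `SUBCLUSTER_THRESHOLD = 3` are hypothetical, since the paper does not state the threshold.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical value: the paper does not give the subcluster threshold.
SUBCLUSTER_THRESHOLD = 3


def partition(urls, threshold=SUBCLUSTER_THRESHOLD):
    """Group visited URLs into site clusters and directory subclusters.

    Returns {host: {"pages": [...], "subclusters": {directory: [...]}}}.
    """
    by_host = defaultdict(list)
    for url in urls:
        # Step (1): gather pages served by the same WWW server.
        by_host[urlsplit(url).netloc.lower()].append(url)

    clusters = {}
    for host, pages in by_host.items():
        # Step (2): count pages per directory within the site.
        by_dir = defaultdict(list)
        for url in pages:
            path = urlsplit(url).path or "/"
            by_dir[posixpath.dirname(path)].append(url)
        # Step (3): directories with more pages than the threshold
        # become subclusters under the site cluster.
        subclusters = {d: ps for d, ps in by_dir.items() if len(ps) > threshold}
        clusters[host] = {"pages": pages, "subclusters": subclusters}
    return clusters
```

In this sketch, pages in a subcluster also stay in their site cluster's page list, which is one possible reading of "a subcluster under the site cluster."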
After the clusters are formed, a home page representing each
cluster is selected heuristically. If the cluster contains a
file named "/", "/welcome.html", "/index.html", or
"/home.html", that file becomes the home page of the cluster.
Otherwise, the first page the user encountered in the cluster
becomes the home page.
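The home-page heuristic can be sketched as below. The sketch assumes the file names are interpreted relative to the cluster's directory (so a subcluster for `/d` looks for `/d/`, `/d/welcome.html`, and so on) and that the names are tried in the order listed; both choices, and the name `select_home_page`, are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Candidate names from the heuristic, relative to the cluster directory;
# "" stands for the directory URL itself ("/"). Tried in this order (assumed).
HOME_PAGE_NAMES = ("", "welcome.html", "index.html", "home.html")


def select_home_page(cluster_dir, pages_in_visit_order):
    """Pick a cluster's representative home page.

    cluster_dir: the cluster's directory path, e.g. "/" or "/d"
    pages_in_visit_order: the cluster's URLs, in the order first encountered
    """
    base = cluster_dir.rstrip("/") + "/"
    # Map each path to the first URL seen with that path.
    by_path = {}
    for url in pages_in_visit_order:
        by_path.setdefault(urlsplit(url).path or "/", url)
    for name in HOME_PAGE_NAMES:
        if base + name in by_path:
            return by_path[base + name]
    # Fallback: the first page the user encountered in the cluster.
    return pages_in_visit_order[0] if pages_in_visit_order else None
```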
Page Rank
Even after partitioning, each cluster still contains many
URLs. To present the most relevant ones to the user, we
calculate a page rank by analyzing both link structure and user
behavior, because the pages a user considers important are
frequently revisited and have many relations with other visited
pages. Indeed, [2] reported that second-order connectedness is
a useful measure for identifying landmark pages. The
second-order connectedness of a page is the number of pages a
user can reach from it, or from which the user can reach it,
within two links.
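A minimal sketch of second-order connectedness as defined in the paragraph above: it counts the union of pages reachable from a page and pages that reach it within two links. The function name and the `(source, target)` edge-list input are hypothetical, and combining both directions into one count follows the wording of the definition rather than any published implementation.

```python
from collections import defaultdict


def second_order_connectedness(links, page):
    """Number of distinct pages within two links of `page`, in either direction.

    links: iterable of (source, target) pairs
    """
    forward, backward = defaultdict(set), defaultdict(set)
    for src, dst in links:
        forward[src].add(dst)
        backward[dst].add(src)

    def within_two(adjacency):
        one_hop = adjacency[page]
        two_hop = set().union(*(adjacency[p] for p in one_hop))
        return (one_hop | two_hop) - {page}

    # Pages reachable from `page` plus pages that can reach it.
    return len(within_two(forward) | within_two(backward))
```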
Our page rank equation extends the measure in [2] by taking
into account whether links have been visited. That is, instead
of simply counting existing links, our algorithm counts each
visited link by the number of times it was followed and gives
lower weight to unvisited links. The page rank $PR_i$ of page
$i$ is calculated as:
PRi  w1  ( FOCi  SOCi )  w2  ( BFOCi  BSOCi )  w3  VCi
where w1  w2  w3  1 . And each of parameters are
calculated as followings:

FOCi  VLik  log(ULi )


SOCi   FOCk (VL0 )  log(  FOCk (VL0 ) )
BFOCi  VLki  log(UBi )

BSOCi   BFOCk (VL 0 )  log(  BFOCk (VL 0 ) )

VCi is the number of visit on Page i
where VLik is the number of visit on Link from Page i to
Page k, ULi is the number of unvisited link from Page i, and
UBi is the number of unvisited Link to Page i.
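The page rank calculation can be sketched as follows, reading the conditions under the second-order sums as $VL > 0$ (visited links) and $VL = 0$ (unvisited links), with $k$ ranging over linked pages. This is a sketch under stated assumptions: the weights `(0.4, 0.4, 0.2)` are placeholders (the parameters are still being evaluated), a logarithm of zero is treated as 0 because the paper does not say how that case is handled, and the names `page_ranks`, `visits`, `link_visits`, and `all_links` are hypothetical.

```python
import math
from collections import defaultdict


def _log(x):
    # Assumption: the paper does not say how log(0) is handled; use 0 instead.
    return math.log(x) if x > 0 else 0.0


def page_ranks(visits, link_visits, all_links, w=(0.4, 0.4, 0.2)):
    """Return {page: PR_i}.

    visits:      {page: VC_i}, visits to each page
    link_visits: {(i, k): VL_ik}, times the link i -> k was followed
    all_links:   iterable of (i, k) links seen on fetched pages, visited or not
    w:           (w1, w2, w3), placeholder weights that sum to 1
    """
    w1, w2, w3 = w
    out_links, in_links = defaultdict(set), defaultdict(set)
    for i, k in set(all_links) | set(link_visits):
        out_links[i].add(k)
        in_links[k].add(i)
    pages = set(visits) | set(out_links) | set(in_links)

    def vl(i, k):
        return link_visits.get((i, k), 0)

    # FOC_i / BFOC_i: followed links count by their visits,
    # unvisited links only through the log of their number.
    foc = {i: sum(vl(i, k) for k in out_links[i])
           + _log(sum(1 for k in out_links[i] if vl(i, k) == 0))
           for i in pages}
    bfoc = {i: sum(vl(k, i) for k in in_links[i])
            + _log(sum(1 for k in in_links[i] if vl(k, i) == 0))
            for i in pages}

    # SOC_i / BSOC_i: neighbours' first-order scores, visited ones summed
    # directly and unvisited ones on a log scale.
    soc = {i: sum(foc[k] for k in out_links[i] if vl(i, k) > 0)
           + _log(sum(foc[k] for k in out_links[i] if vl(i, k) == 0))
           for i in pages}
    bsoc = {i: sum(bfoc[k] for k in in_links[i] if vl(k, i) > 0)
            + _log(sum(bfoc[k] for k in in_links[i] if vl(k, i) == 0))
            for i in pages}

    return {i: w1 * (foc[i] + soc[i]) + w2 * (bfoc[i] + bsoc[i])
            + w3 * visits.get(i, 0)
            for i in pages}
```

For example, a page visited twice whose one followed link leads to a page visited once, and which also carries two unvisited links, ranks above both the visited target and the unvisited targets.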
After a predefined expiration period, the visit count of a
visited link is reduced, and eventually the link reverts to
being an unvisited link. Therefore, over time, the page rank
of an unused page declines. Because "hub pages" such as
directory pages (e.g., Yahoo) contain very many links, their
scores would become much higher than those of other pages even
if the user followed only a few of those links; we therefore use
a logarithmic scale to reduce the weight of unvisited links.
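The expiration rule is described only qualitatively, so the sketch below assumes one concrete schedule: a link's visit count drops by one for every full expiration period (a hypothetical 30 days) since the link was last followed, and a count of 0 marks the link as unvisited again. The name `decayed_visit_count` and the schedule itself are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical value: the paper does not give the expiration period.
EXPIRATION_PERIOD = timedelta(days=30)


def decayed_visit_count(count, last_followed, now, period=EXPIRATION_PERIOD):
    """Reduce a link's visit count by one per full expiration period elapsed
    since the link was last followed; 0 means the link is unvisited again."""
    expired_periods = max(0, (now - last_followed) // period)
    return max(0, count - expired_periods)
```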
Site Rank
After page ranks have been calculated for every page, the site
rank of each cluster is calculated as the sum of the page ranks
of its pages. A site that includes many pages the user has
visited, or highly important pages, will have a high site rank.
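Site rank as described is a per-cluster sum of page ranks. A minimal sketch, assuming a `cluster_of` mapping from each page to its cluster (a hypothetical name), that returns clusters in the order the viewer's site list would use:

```python
from collections import defaultdict


def site_ranks(page_rank, cluster_of):
    """Sum page ranks per cluster; return (cluster, rank) pairs, highest first."""
    totals = defaultdict(float)
    for page, rank in page_rank.items():
        totals[cluster_of[page]] += rank
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```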
Resources
Pages outside a site cluster that link to pages inside it are
resources of that site. Such resource pages are useful for
jumping from one site to another, or for finding related pages
that are linked from them.
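Under the reading that a site's resources are outside pages linking into it, they can be collected and ordered by page rank as below. The function name `resources` and its inputs are hypothetical; pages without a computed rank are treated as rank 0 here.

```python
def resources(site, links, cluster_of, page_rank):
    """Return pages outside `site` that link into it, highest page rank first.

    links: iterable of (source, target) pairs
    cluster_of: {page: cluster}; page_rank: {page: PR_i}
    """
    found = {src for src, dst in links
             if cluster_of.get(dst) == site and cluster_of.get(src) != site}
    return sorted(found, key=lambda p: page_rank.get(p, 0.0), reverse=True)
```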
EXPERIMENTAL IMPLEMENTATION
The WDB prototype consists of three components: the Tracking
Proxy, the URL Database, and the Bookmark Viewer. The tracking
proxy is placed between the WWW browser and WWW servers. It
tracks every page access and stores a record in the URL
Database. The database archives the entire navigation history
and analyzes it using the three-layer management algorithms.
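The paper does not specify the record format, so the sketch below assumes each access record carries the requested URL and the referring page (for example, taken from the HTTP Referer header) and shows one way the visit counts $VC_i$ and $VL_{ik}$ used in the page rank could be derived from such an archive. `AccessRecord` and `count_visits` are hypothetical names, and treating the referrer as the followed link is a simplification.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessRecord:
    """One page access logged by the tracking proxy (hypothetical schema)."""
    url: str
    referrer: Optional[str]  # referring page, if the request carried one
    timestamp: float


def count_visits(records):
    """Derive VC_i (page visits) and VL_ik (link traversals) from access records."""
    page_visits = Counter(r.url for r in records)
    # Assumption: a request with a referrer means the link referrer -> url was followed.
    link_visits = Counter((r.referrer, r.url) for r in records if r.referrer)
    return page_visits, link_visits
```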
As shown in Figure 1, the bookmark viewer presents the user's
navigation history in three layers. The left area shows the
home page titles of sites, ordered by site rank. The user can
select one of these to see its landmark pages in the center
area, organized by link structure. Font color represents the
page rank of each landmark, since page rank cannot be used for
ordering without losing the structure. When the user clicks the
title of any page, that page is shown in a WWW browser. The
right area shows resource pages that have links to pages in the
site selected in the left area. This list is also ordered by
page rank. The number of items in each area can be changed with
a slider at the top of the area. When the slider is moved to its
maximum, the user can see all sites, all pages in the site, or
all pages linking to the site.
CONCLUSION
We are still evaluating the usability and efficiency of WDB, as
well as the parameters for the page rank calculation. From our
initial experiments, we are encouraged that WDB gives users a
valuable tool for navigating their Personal Navigation Space.
REFERENCES
1. Abrams, David and Baecker, Ron, How People Use WWW
Bookmarks, in Proc. CHI'97.
2. Mukherjea, Sougata and Hara, Yoshinori, Focus+Context Views
of World-Wide Web Nodes, in Proc. ACM Hypertext'97, ACM Press,
pp. 187-196.
Figure 1. Bookmark Viewer of WDB