To appear in ACM Hypertext'98, Pittsburgh, USA, June 1998

Dynamic Bookmarks for the WWW
Managing Personal Navigation Space by Analysis of Link Structure and User Behavior

Hajime Takano and Terry Winograd
Computer Science Department, Stanford University
Gates Building, Stanford, CA 94305, USA
E-mail: htakano@db.stanford.edu, winograd@cs.stanford.edu

ABSTRACT
This paper describes a tool that supports revisiting WWW pages, which we call the "WWW Dynamic Bookmark" (WDB). WDB records and archives a user's navigation behavior, analyzes the archive, and presents the results as clues for revisiting URLs. We integrate link analysis and user behavior analysis to evaluate the importance of WWW pages. WDB presents three things: the sites a user has visited, in order of importance; a list of landmark pages within each site; and the relationships among sites. Our experimental implementation shows that the importance calculation and the structure displays help users pick out useful URLs.

KEYWORDS: WWW navigation, bookmark, link analysis, user behavior analysis

REQUIREMENTS FOR WDB
A variety of services help users find relevant URLs, such as search engines and WWW or E-mail magazines. WWW users pick up some of these URLs and start exploring the information space from them. Because users explore from many such starting points, it is difficult for them to remember the URLs they have visited. WWW Dynamic Bookmark (WDB) therefore needs to help users find URLs that they visited before but did not realize were important enough to bookmark.

Personal Navigation Space
By exploring the WWW, users come to understand the structure and characteristics of the space they have explored. We call this subset of the WWW the user's "Personal Navigation Space" (PNS). To revisit a URL, a user selects a site from his/her global view of the PNS. He/she then chooses a navigation direction based on his/her memory of that site. Indeed, [1] reported that about 60% of the URLs a person accesses have already been visited by that user. [1] also showed that typical navigation patterns for finding new pages often start from "hub pages" such as a home page, a directory page, or a search result. Such hub pages are therefore also important clues for extending the PNS.

Bookmark Problems
The bookmark function of a WWW browser could serve as a tool for PNS management. In practice, however, bookmarks are used infrequently during exploration. The main reason is the management overhead they impose:
(1) interrupting navigation to add a new URL,
(2) the varying rules for deciding whether a URL should be stored,
(3) managing a continually growing number of URLs,
(4) adding structure to organize many URLs, and
(5) later removing old and useless URLs.

Management Strategies
To solve these problems, WDB automatically generates bookmarks as part of a well-structured PNS. The PNS is organized in three layers. These are produced by partitioning the archive into sites, calculating the importance of pages and sites, and finding relations among sites.

ALGORITHMS FOR THREE-LAYER MANAGEMENT
Partitioning
The first step is to divide the navigation history into clusters, each of which is a thematically organized set of WWW pages. To determine cluster boundaries, we use URL structure such as the hostname and directory, because authors of WWW sites tend to organize their documents in directories. Partitioning is performed as follows:
(1) Gather the pages on the same WWW server into a site cluster.
(2) Count the number of pages in each directory.
(3) If the number of pages in a directory exceeds a threshold, those pages form a subcluster under the site cluster.
After the clusters are built, a home page representing each cluster is selected heuristically. If the cluster contains a file named "/", "/welcome.html", "/index.html", or "/home.html", that file becomes the home page of the cluster.
If no such file exists, the first page the user encountered in the cluster becomes its home page.

Page Rank
Even after partitioning, each cluster still contains many URLs. To present the most relevant ones, we calculate a page rank from both the link structure and the user's behavior. The rationale is that pages a user considers important tend to be revisited frequently and to be linked with many other visited pages. Indeed, [2] reported that second-order connectedness is a useful measure for identifying landmark pages. The second-order connectedness of a page is the number of pages that can be reached from it, or that can reach it, by following at most two links.

Our page rank equation extends the measure of [2] by distinguishing visited links from unvisited ones. Instead of simply counting existing links, it counts each visited link by its number of visits and gives lower weight to unvisited links. The page rank PR_i of page i is:

$$PR_i = w_1 (FOC_i + SOC_i) + w_2 (BFOC_i + BSOC_i) + w_3 VC_i, \qquad w_1 + w_2 + w_3 = 1$$

where the terms are calculated as follows:

$$FOC_i = \sum_k VL_{ik} + \log(UL_i)$$
$$SOC_i = \sum_{k : VL_{ik} > 0} FOC_k + \log\Big(\sum_{k : VL_{ik} = 0} FOC_k\Big)$$
$$BFOC_i = \sum_k VL_{ki} + \log(UB_i)$$
$$BSOC_i = \sum_{k : VL_{ki} > 0} BFOC_k + \log\Big(\sum_{k : VL_{ki} = 0} BFOC_k\Big)$$

Here VC_i is the number of visits to page i, and VL_ik is the number of visits on the link from page i to page k. UL_i is the number of unvisited links from page i, and UB_i is the number of unvisited links to page i. FOC_i and SOC_i measure first- and second-order connectedness over links leaving page i. BFOC_i and BSOC_i measure the same over links entering page i.

Once a predefined expiration period has passed, the visit count of a visited link is reduced. Eventually the link reverts to being an unvisited link, so the page rank of an unused page declines over time. Hub pages such as directory pages (e.g., Yahoo) contain very many links. Counting their unvisited links linearly would give them a very high score even when the user has followed only a few of those links. We therefore count unvisited links on a logarithmic scale.

Site Rank
After the page ranks of all pages have been calculated, the site rank of a cluster is calculated as the sum of the page ranks of its pages.
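As an illustration only, the partitioning, home-page selection, page rank, and site rank steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the WDB implementation. Several choices in it are hypothetical, because the paper leaves its parameters open and does not say how log(0) is handled:
- the equal weights w1 = w2 = w3 = 1/3 and the directory threshold of 3;
- the flattened "host" / "host/dir" cluster keys;
- the function names partition, home_page, page_ranks, and site_ranks;
- treating the log of any value at or below 1 as 0.

Links are given as a map from (source, target) URL pairs to visit counts, with 0 marking a known but unvisited link. Link expiration is not modeled.

```python
import math
import posixpath
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical parameters: the paper leaves the weights and threshold open.
W1, W2, W3 = 1 / 3, 1 / 3, 1 / 3  # w1 + w2 + w3 = 1
DIR_THRESHOLD = 3                 # a directory needs MORE pages than this
HOME_FILES = ("", "welcome.html", "index.html", "home.html")  # "" = path ending in "/"


def safe_log(x):
    # Assumption: log of values <= 1 counts as 0, so log(0) never occurs.
    return math.log(x) if x > 1 else 0.0


def partition(urls, threshold=DIR_THRESHOLD):
    """Cluster URLs (in first-encounter order) by host, then split off
    non-root directories holding more than `threshold` pages as
    subclusters. Keys are flattened to "host" or "host/dir" (hypothetical)."""
    by_host = defaultdict(list)
    for url in urls:
        by_host[urlparse(url).netloc].append(url)
    clusters = {}
    for host, pages in by_host.items():
        by_dir = defaultdict(list)
        for url in pages:
            by_dir[posixpath.dirname(urlparse(url).path)].append(url)
        rest = []
        for directory, members in by_dir.items():
            if directory not in ("", "/") and len(members) > threshold:
                clusters[host + directory] = members  # subcluster
            else:
                rest.extend(members)
        if rest:
            clusters[host] = rest  # site cluster
    return clusters


def home_page(pages):
    """Return the first page with a home-page-like name, else the first page encountered."""
    for url in pages:
        if posixpath.basename(urlparse(url).path) in HOME_FILES:
            return url
    return pages[0]


def page_ranks(links, page_visits):
    """links: {(src, dst): VL}, where VL == 0 marks a known unvisited link.
    page_visits: {url: VC}. Returns {url: PR}."""
    out_links, in_links = defaultdict(dict), defaultdict(dict)
    for (src, dst), n in links.items():
        out_links[src][dst] = n
        in_links[dst][src] = n
    pages = set(page_visits) | {p for edge in links for p in edge}

    def first_order(adj, page):
        # Sum of visit counts on links + log(number of unvisited links).
        counts = adj[page].values()
        return sum(counts) + safe_log(sum(1 for n in counts if n == 0))

    def second_order(adj, first, page):
        # Neighbors via visited links count fully; via unvisited links, log-damped.
        visited = sum(first[k] for k, n in adj[page].items() if n > 0)
        unvisited = sum(first[k] for k, n in adj[page].items() if n == 0)
        return visited + safe_log(unvisited)

    foc = {p: first_order(out_links, p) for p in pages}   # FOC_i
    bfoc = {p: first_order(in_links, p) for p in pages}   # BFOC_i
    return {
        p: W1 * (foc[p] + second_order(out_links, foc, p))
        + W2 * (bfoc[p] + second_order(in_links, bfoc, p))
        + W3 * page_visits.get(p, 0)
        for p in pages
    }


def site_ranks(clusters, ranks):
    """Site rank = sum of the page ranks of the pages in each cluster."""
    return {key: sum(ranks.get(u, 0.0) for u in pages)
            for key, pages in clusters.items()}
```

For example, take these inputs:
- links {("http://a.org/", "http://a.org/x.html"): 3, ("http://a.org/", "http://b.org/"): 0, ("http://a.org/x.html", "http://a.org/"): 1};
- visit counts {"http://a.org/": 4, "http://a.org/x.html": 3}.

Under the hypothetical equal weights, the sketch gives "http://a.org/" a page rank of 4 and "http://a.org/x.html" a page rank of 11/3. The a.org cluster therefore receives a site rank of 23/3. The unvisited link to b.org contributes nothing here, because log(1) is already 0.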
A site containing many pages the user has visited, or highly important pages, therefore has a high site rank.

Resources
Pages that have links crossing the boundary of a site cluster are resources of that site. Such resource pages are useful for jumping from one site to another, or for finding related pages linked from them.

EXPERIMENTAL IMPLEMENTATION
The WDB prototype consists of three components: the Tracking Proxy, the URL Database, and the Bookmark Viewer. The tracking proxy sits between the WWW browser and WWW servers. It tracks every page access and stores a record in the URL Database. The database archives the entire navigation history and analyzes it using the three-layer management algorithms.

As shown in Figure 1, the bookmark viewer presents the user's navigation history in three layers. The left area shows the home page titles of sites, ordered by site rank. The user can select one of these and see its landmark pages in the center area, organized by link structure. Page rank cannot order these landmarks without losing the structure, so the font color represents each landmark's page rank instead. Clicking the title of any page displays it in the WWW browser. The right area shows resource pages that have links to pages in the site the user selected in the left area. This list is also ordered by page rank. A slider at the top of each area changes the number of items shown. When the user moves it to the maximum, the areas show all sites, all pages in the site, or all pages linking to the site.

Figure 1. Bookmark Viewer of WDB

CONCLUSION
We are still evaluating the usability and efficiency of WDB, as well as the parameters for the page rank calculation. From our initial experiments, we are encouraged that WDB gives users a valuable tool for navigating in a Personal Navigation Space.

REFERENCES
1. David Abrams and Ron Baecker, How People Use WWW Bookmarks, in Proc. CHI'97.
2. Sougata Mukherjea and Yoshinori Hara, Focus+Context Views of World-Wide Web Nodes, in Proc. ACM Hypertext'97, ACM Press, pp. 187-196.