On Usage Metrics for Determining Authoritative Sites

Abstract

The explosion in the number of "relevant" sites on any given topic on the Web creates a need for additional measures, such as quality or reliability, to rank Web content. Recent research has used the term "authoritative" to represent a broader notion of the goodness of a web page: an authoritative page is one that is both relevant and reliable, in the spirit of an "authority" on a subject. Conventional methods rank authoritative content on the Web based on content interpretation of sites (e.g., most search engines), structural relationships between sites (e.g., Google), or experts' analyses (e.g., Forrester's rankings). In this paper we argue for an alternative approach to ranking authoritative content, one based on the actual usage of the Web. In addition to lending itself to new automated ranking approaches, preliminary results indicate that a usage-based approach may, in fact, be superior to conventional approaches for this problem. We present a new framework for identifying usage metrics and list several usage metrics based on this framework. Using actual usage data on the surfing behavior of 30,000 users, we demonstrate that a simple aggregate usage measure can characterize actual bookings at travel sites better than other conventional methods that rank online travel.

1. Introduction

Different Web pages, even if very similar in subject matter and content, need not be equally "good". Recent studies (O'Leary 1999, Lee et al. 1998, Garcia-Molina et al. 1997, Goh et al. 1999) have addressed issues relating to the reliability of information provided on a page: for example, not all pages deemed "relevant" based on content search are "good". Implicit is the notion that automatically inferring "quality" solely from the interpreted relevance of content is questionable.
Hence, a more general notion than relevance is needed to characterize information on the Web. In this paper we adopt the term "authoritative", first proposed by Kleinberg (1999), to represent a broader notion of the goodness of a web page. Intuitively, an authoritative page is one that is both relevant and reliable, in the spirit of an "authority" on a subject. Conventional methods rank authoritative content on the Web based on content interpretation of sites (e.g., most search engines), structural relationships between sites (e.g., Google), or experts' analyses (e.g., Forrester's rankings). In this paper we argue for an alternative approach to ranking authoritative content, one based on actual usage of the Web. Usage characterizes "bestowed authority" of a certain kind: authority that is conferred upon a site by the users who choose to visit it. We present a new framework for identifying usage metrics and list several usage metrics based on this framework. Using actual usage data on the surfing behavior of 30,000 users, we demonstrate that a simple aggregate usage measure can, in fact, characterize actual bookings at travel sites better than other conventional methods that rank online travel.

2. Usage Metrics

In this section we present a framework for identifying usage metrics that indicate the authoritativeness of web content, and we use this framework to identify specific usage metrics. To use these metrics to identify authoritative content, we propose filters that combine the metrics in individualized ways, and we present an example of one such filter: a simple aggregate measure that combines the individual metrics into a single metric that can rank content on the Web. It is important to note that these metrics should be derived from user-centric data as opposed to site-centric data. Site-centric data is data collected by individual sites about users' accesses to pages within that site; this data is usually stored in log files (Sen et al.
1998) at each site. Taken individually, site-centric data presents an incomplete picture of user behavior on the Web, since it does not capture a user's activity on external sites. User-centric data, on the other hand, is collected at the user level and thus captures the entire web surfing history of each user. This data is available through market data vendors, such as MediaMetrix and Netratings, who give a randomly chosen subset of users various incentives to permit tracking software to be installed on their client machines. Note that the purpose of usage metrics is to compare different sites or pages; hence these metrics need to be computed for each page or site in the population. These metrics can therefore be viewed as attributes of pages or sites that are computed from user-centric data. The metric hits/session, for example, measures the average number of hits to a certain site (or page) per session, and can therefore be used to rank sites by this average. For simplicity of representation we use the term hits/session rather than hits/session(site). In order to develop a set of usage-based metrics that measure different aspects of the authoritativeness of a site, we first present a framework within which these different metrics can be identified. We note that user-centric data on Web access can be viewed at different levels of granularity: hits, sessions and users. A hit is a single page access, a session is a collection of hits by a single user during a specified time span[1], and a user is an individual whose browsing behavior can be viewed as a collection of sessions. For each level of granularity, we identify quantity-related and quality-related metrics in the following manner. Quantity metrics are identified by asking: how many hits/sessions/users does a given page or site receive? Quality metrics are identified by asking: how "good" are those hits/sessions/users from the page or site's point of view?
Table 2.1 presents this framework for identifying usage metrics and lists 12 usage metrics that can be identified using this approach.

Quality Metrics

| Granularity | Metric            | Definition                                                                                   |
|-------------|-------------------|----------------------------------------------------------------------------------------------|
| Hits        | Time per hit      | (Total time spent on p) / (Total no. of hits to p)                                           |
| Sessions    | Time per session  | (Total time spent on p) / (Total no. of sessions with p)                                     |
| Sessions    | Hits per session  | (Total no. of hits to p) / (Total no. of sessions with p)                                    |
| Sessions    | Entry rate        | (No. of sessions that begin with p) / (Total no. of sessions with p)                         |
| Sessions    | Exit rate         | (No. of sessions that end with p) / (Total no. of sessions with p)                           |
| Sessions    | Peak rate         | (No. of sessions in which the user spends maximum time on p) / (Total no. of sessions with p)|
| Users       | Time per user     | (Total time spent on p) / (Total no. of users of p)                                          |
| Users       | Hits per user     | (Total no. of hits to p) / (Total no. of users of p)                                         |
| Users       | Sessions per user | (Total no. of sessions with p) / (Total no. of users of p)                                   |

Quantity Metrics

| Metric             | Definition                                        |
|--------------------|---------------------------------------------------|
| Number of hits     | No. of hits to a page/site                        |
| Number of sessions | No. of sessions in which a page/site was accessed |
| Number of users    | No. of users who accessed a given page/site       |

Table 2.1. Framework for deriving usage metrics. (Each metric is derived for a specific page/site 'p'.)

[1] A common rule of thumb for aggregating hits into sessions is to observe the time associated with each hit and group consecutive hits that are within 30 minutes of each other into a session.

These 12 metrics are not claimed to be 'complete' in any sense; they are illustrative examples of usage metrics that can be derived from this framework. The purpose here is to identify usage metrics that contribute to some aspect of the authoritativeness of a site. Much prior work provides arguments supporting one or more of the metrics listed in Table 2.1. In particular, research in areas as diverse as decision theory, information systems, marketing and computer science provides different types of justification for using these metrics.
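The 30-minute sessionization rule in the footnote above and the per-session metrics in Table 2.1 can be sketched as follows. This is a minimal illustration assuming one user's hits are available as time-ordered (timestamp, URL) pairs; the data layout and function names are ours, not the vendor's.

```python
SESSION_GAP = 30 * 60  # rule-of-thumb gap (seconds) that starts a new session

def sessionize(hits):
    """Group one user's time-ordered (timestamp, url) hits into sessions:
    a gap of more than SESSION_GAP between consecutive hits starts a new session."""
    sessions = []
    for ts, url in hits:
        if sessions and ts - sessions[-1][-1][0] <= SESSION_GAP:
            sessions[-1].append((ts, url))   # continue the current session
        else:
            sessions.append([(ts, url)])     # start a new session
    return sessions

def hits_per_session(sessions, site):
    """Table 2.1 metric: (Total no. of hits to site) / (Total no. of sessions with site)."""
    with_site = [s for s in sessions if any(site in url for _, url in s)]
    if not with_site:
        return 0.0
    total_hits = sum(1 for s in with_site for _, url in s if site in url)
    return total_hits / len(with_site)
```

The other quality metrics in Table 2.1 (entry rate, exit rate, time per session, and so on) follow the same pattern over the list of sessions.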
Rather than discussing each of them in detail, we refer the reader to Tables 2.2 through 2.4 for a brief summary.

| Studies | Theories (Related) | Summary | Related Web Usage Metrics in Table 2.1 |
|---------|--------------------|---------|----------------------------------------|
| Fredrickson and Kahneman (1993), Kahneman (1994) | Peak and End Rule | Individuals heavily weight the peak moment during an episode, and also strongly attend to how the episode ended. | Peak and Exit rates |
| Tversky and Kahneman (1974), Van Bruggen et al. (1998), Wansink (1998) | Anchoring and Adjustment | Decisions are characterized by excess reliance on the starting point and insufficient adjustment for subsequently considered information. | Entry rate |

Table 2.2. Relevant theories in Decision Science and derived web usage metrics.

| Studies | Main Theories | Suggested Metrics to Measure IT Usage | Related Web Usage Metrics in Table 2.1 |
|---------|---------------|---------------------------------------|----------------------------------------|
| Bailey & Pearson (1983), Davis (1989), Igbaria, Guimaraes and Davis (1995), Igbaria and Zinatelli (1997) | The Technology Acceptance Model (TAM) holds that beliefs influence attitudes, which in turn lead to intentions, which then generate behaviors. Thus IT usage is determined by a behavioral intention to use a system. | Actual amount of time spent, frequency of use, number of software packages, number of business tasks, daily use | Time per hit/session/user and frequency-related metrics such as hits/session, sessions/user, hits/user |
| Lee (1986) | | Time spent per day | |
| Raymond (1986) | | Frequency of use | |

Table 2.3. Related theories in IT Adoption and derived web usage metrics.

Thus far in this section we have presented a framework to identify usage metrics that determine the authoritativeness of web content, identified several individual metrics, and presented a brief summary of research in different areas that can be used to motivate these metrics. To use these metrics to identify authoritative content on the Web, we need further research to develop aggregate measures that combine these metrics in different ways.
Each aggregate measure that combines different usage metrics to rank content can be thought of as a 'filter'. In this paper we do not argue for any specific filter; rather, we demonstrate that even a simplistic filter based on usage metrics outperforms current commercial rankings of content on the Web in this experiment, which we have no reason to believe is unrepresentative or atypical.

| Studies | Main Theories | Suggested Metrics | Related Web Usage Metrics in Table 2.1 |
|---------|---------------|-------------------|----------------------------------------|
| Novak & Hoffman (1997), Dreze & Kalyanam (1998), Korgaonkar & Wolin (1999) | Metrics for Web Marketing | Frequency, reach, page views/hits, depth, number of sessions, session duration, click rate/click-through rate, average time on page. | All 12 metrics in Table 2.1 measure concepts equivalent to those suggested in these studies. |
| Pitkow (1999), Cooley et al. (1997, 1999) | Web Usage Analyses | Lists several site-centric and user-centric metrics such as hits, reach, site length, duration, frequency, reading time, entry site, exit site, session length, path length, number of sessions. | |

Table 2.4. Related theories in Marketing and Web Usage Analyses.

Below we describe a simple linear combination (SLC) of the rankings from the various metrics to determine a single aggregate usage metric that can be used to rank content on the Web. Each of the m metrics can be individually used to rank sites, which results in m independent rankings, one from each metric. A simple approach to combining the different rankings is to compute a score for each site in the following manner. Assume that rankings, based on each metric, up to rank p are considered useful (below that, a site is ranked too low and can equivalently be dropped from the ranking). Across the entire set of m rankings, for each site, let c_i be the number of rankings in which the site is ranked i-th. An aggregate score for the site that weights all the rankings equally can then be computed as p*c_1 + (p-1)*c_2 + … + 1*c_p.
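The SLC scoring just described can be sketched in a few lines. This is a minimal illustration assuming each metric's ranking is given as an ordered list of site names; the function names are ours.

```python
def usage_slc_score(site, rankings, p=20):
    """Aggregate SLC score for one site: being ranked i-th (i <= p) in any
    metric's ranking contributes p - i + 1 points; ranks below p contribute
    nothing, i.e. the site is effectively dropped from that ranking."""
    score = 0
    for ranking in rankings:              # one ordered list of sites per metric
        if site in ranking:
            i = ranking.index(site) + 1   # 1-based rank in this metric
            if i <= p:
                score += p - i + 1
    return score

def usage_slc_ranking(sites, rankings, p=20):
    """Rank sites by descending UsageSLC score -- one possible filter."""
    return sorted(sites, key=lambda s: -usage_slc_score(s, rankings, p))
```

With two metric rankings `[['a', 'b', 'c'], ['b', 'a', 'c']]` and p = 3, site 'a' scores 3 + 2 = 5 while 'c' scores 1 + 1 = 2, so 'c' ranks last in the aggregate.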
Let this score be denoted the UsageSLC score for the site. A ranking of sites based on the UsageSLC score is one possible filter that can be used to construct an aggregate ranking scheme from the individual set of usage metrics. In the next section we show that even this simplistic filter outperforms several widely accepted commercial methods for ranking authoritative content.

3. Results

The results described in this paper use user-centric clickstream data provided by a leading market data vendor. The data cover the web surfing behavior of a panel of approximately 30,000 users over a period of fourteen months. We restrict our focus to identifying authoritative sites related to online travel for the following reasons. First, online travel was an early Internet success story and accounts for a significant percentage of e-commerce revenues today. Second, an authoritative travel site should presumably attract more bookings. Hence rankings of travel sites can be validated by studying how closely they characterize actual bookings at the various sites. The data vendor had pre-categorized URLs into appropriate categories, one of which was travel. We preprocessed the data so that only sessions with access to at least one travel site were retained. For each travel site in the domain, the UsageSLC score was computed with p = 20, i.e., a site ranked below 20th on any metric receives no contribution to its aggregate authority score from that metric. In Table 3.1 below we present the ranking based on the usage metric and compare it with rankings of sites derived from three popular search engines (Altavista, Google and DirectHit) based on the query 'online travel agents'.
Although results from search engines could differ significantly depending on the specific query, the results in Table 3.1 demonstrate the superiority of the UsageSLC-based ranking over search engine rankings, at least in the context of finding authoritative online travel sites.

|    | UsageSLC Ranking | Altavista      | Google         | DirectHit         |
|----|------------------|----------------|----------------|-------------------|
| 1  | travelocity      | travelocity    | Yahoo          | travelocity       |
| 2  | yahoo            | 4airlines      | Astanet        | ten-io            |
| 3  | expedia          | travelzoo      | a2btravel      | flifo             |
| 4  | aol              | smartpages     | budgettravel   | iecc              |
| 5  | mapquest         | gochicagoland  | Quik           | web-travel-secret |
| 6  | netscape         | sabrebts       | airlinesonline | bargainholidays   |
| 7  | excite           | eastmanvoyages | e-tik          | airlinesonline    |
| 8  | mapsonus         | biotactics     | chattanoogaful | travelenvoy       |
| 9  | preview          | indiatourisme  | 1cruise        | travelagentrate   |
| 10 | amtrak           | mytravelguide  | Palaces-tours  | travelinfoonline  |

Table 3.1. Comparison of the UsageSLC ranking with rankings based on popular search engines.

We also compared the usage-based rankings to popular approaches based on a combination of data and experts' opinions. Three such rankings are based on data published by Forrester Research, the Lycos Top 5%, and Go2Net's 100Hot rankings. Forrester is a leading internet research firm; Dreze & Zufryden (1997) used the Lycos Top 5% sites as a dependent variable in measuring web-site efficiency; and Go2Net's 100Hot is the first industry-by-industry site ranking service (its ranking is based on usage analysis, compiling its listings from the analysis of log files from cache servers located at strategic points throughout the Internet)[2]. The methodology used by these firms in ranking online travel sites is not disclosed, and the rankings are widely used in industry to determine the best online firms and in strategic decision making. Compared to automatic search engines, these rankings have much higher face validity since they are semi-manual and contain well-known travel sites. Hence comparison of the UsageSLC rankings to these other rankings requires the use of a proxy to measure the authoritativeness of sites.
In the case of online travel, a reasonable assumption is to use actual bookings data as a proxy for how authoritative a site is, since the business model of online travel is based on generating bookings.

[2] http://www.go2net.com/corporate/advertising/100hot/

However, user-centric data does not directly contain information on actual bookings made at travel sites; the data contain only the URLs accessed by users at various travel sites. In conjunction with the market data vendor, heuristics were developed that identified bookings based on patterns in specific URL strings. Initially these heuristics were computed for thirteen sites only, since identifying them involved actually completing a booking at each site. Table 3.2 lists the heuristics for these 13 sites. We acknowledge the limitations of using such heuristics, but data currently collected by market data vendors do not contain information on actual purchases, and hence such heuristics are necessary. Further, these heuristics are actually used by market research firms. In Table 3.2 we also compare the rankings based on actual bookings at the various sites to the rankings from Forrester, Lycos and Go2Net.
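The style of URL-pattern booking heuristic described above can be sketched as follows. The marker strings follow the pattern of Table 3.2, but the site-to-string mapping shown here is an illustrative subset, not the vendor's exact rule set.

```python
# Illustrative subset of per-site booking markers in the style of Table 3.2;
# the exact mapping used in the study is an assumption here.
BOOKING_MARKERS = {
    'travelocity': ('REVIEW.CTL?',),
    'expedia': ('CONFIRM.ASP', 'RESERVAT.HTML'),
}

def session_has_booking(session_urls, site):
    """A session is assumed to contain a booking at `site` if some URL in it
    is accessed in https secure mode and contains one of the site's markers."""
    markers = BOOKING_MARKERS.get(site, ())
    return any(
        url.startswith('https://') and any(m in url.upper() for m in markers)
        for url in session_urls
    )
```

Counting the sessions for which this predicate holds, per site, yields the bookings volume used as the validation proxy.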
(A session is assumed to contain a booking at a site if a URL in the session indicates access in the https secure mode and contains the string listed for that site.)

| Site                | Booking heuristic string           | Booking Ranking | Forrester Ranking | Lycos Top 5% Ranking | Go2Net 100Hot Ranking |
|---------------------|------------------------------------|-----------------|-------------------|----------------------|-----------------------|
| travelocity         | "REVIEW.CTL?"                      | 1  | 2  | 6  | 19  |
| expedia             | "CONFIRM.ASP" or "RESERVAT.HTML"   | 2  | 3  | 3  | 2   |
| previewtravel       | "STATE=CONFIRM"                    | 3  | 11 | 1  | 21  |
| priceline.com       | "/CONFIRMATION"                    | 4  | 20 | 11 | 17  |
| Itn.net             | "STORE?STAMP="                     | 5  | 33 | 27 | 28  |
| cooltravelassistant | "QSCR=RESV"                        | 6  | 18 | 12 | 50  |
| cheapTickets        | "AIRFARESUBMIT.CTL?"               | 7  | 22 | 16 | 98  |
| thetrip             | "RESGUEST"                         | 8  | 35 | 20 | 25  |
| flifo               | "CU.CGI?"                          | 9  | 36 | 22 | 100 |
| onetravel           | "CU.CGI?"                          | 10 | 30 | 23 | 99  |
| nwa                 | "CONFIRMATION"                     | 11 | 16 | 10 | 13  |
| americanair         | "HOLD_FLAG=Y"                      | 12 | 8  | 13 | 32  |
| Lowest fare         | "RESERVE.CTL?"                     | 13 | 28 | 15 | 63  |

Table 3.2: Booking heuristics and the rankings by Forrester, Lycos Top 5% and Go2Net/MetaCrawler 100Hot.

To test how the usage-based approach compares with the other three approaches, we performed four Wilcoxon signed-rank tests[3], separately testing whether the rankings induced by each measure are significantly different from the ranking induced by bookings volume. The results are summarized in Table 3.3. Based on these results, only the rankings based on UsageSLC are not significantly different from the rankings induced by actual bookings. Hence, based on these preliminary results, even the simple linear combination of usage metrics presented in this paper outperforms industry-standard rankings in predicting bookings at online travel sites in this experiment.

| Hypothesis                       | N  | V  | Adjusted Z | P-Value |
|----------------------------------|----|----|------------|---------|
| H0-A: UsageSLC vs. Bookings      | 13 | 62 | 1.140      | 0.273   |
| H0-B: Forrester vs. Bookings     | 13 | 80 | 2.696      | 0.007   |
| H0-C: Lycos Top 5% vs. Bookings  | 13 | 83 | 2.938      | 0.003   |
| H0-D: Go2Net 100Hot vs. Bookings | 13 | 85 | 3.114      | 0.002   |

Table 3.3: Results of the Wilcoxon signed-rank tests (α = 0.05).

To summarize, we have presented results comparing a simplistic usage-based approach to ranking sites with traditional, industry-standard approaches in the context of online travel.
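The signed-rank test applied above can be sketched in a few lines. This version uses the large-sample normal approximation for the Z statistic, without tie or continuity corrections, so its output need not match the adjusted Z values in Table 3.3 exactly.

```python
import math

def wilcoxon_signed_rank(a, b):
    """Wilcoxon signed-rank test for paired samples a and b.
    Returns (V, z): V is the sum of ranks of positive differences,
    z its normal approximation (no tie/continuity correction)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]   # drop zero differences
    n = len(diffs)
    # Rank the absolute differences, averaging ranks for ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1                # average of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    v = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (v - mean) / sd
    return v, z
```

In practice a library routine such as `scipy.stats.wilcoxon` would be used instead; the sketch above only makes the V and Z columns of Table 3.3 concrete.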
Preliminary results indicate that simple usage-based metrics outperform standard ranking methods in characterizing bookings in the online travel industry. These results suggest that usage-based metrics could be invaluable if incorporated into actual search techniques on the Internet; no current search engine provides such features today. In current research we are working on methods to build various usage-based filters to identify authoritative content, and on usage-driven search technologies.

[3] The Wilcoxon signed-rank test tests whether there is a difference between two paired measurements when they are ordinal or rank-ordered (Hollander & Wolfe 1999).

References

Bailey, J. E. and Pearson, S. W., "Development of a Tool for Measuring and Analyzing User Satisfaction," Management Science, 29(5), 530-545, 1983.
Cooley, R., Mobasher, B. and Srivastava, J., "Web Mining: Information and Pattern Discovery on the World Wide Web," Proceedings of ICTAI, Nov. 1997.
Cooley, R., Mobasher, B. and Srivastava, J., "Data Preparation for Mining World Wide Web Browsing Patterns," Knowledge and Information Systems, 1(1), Spring 1999.
Davis, F., "Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology," MIS Quarterly, 13(3), 319-340, 1989.
Dreze, X. and Zufryden, F., "Testing Web Site Design and Promotional Content," Journal of Advertising Research, 37(2), 1997.
Fredrickson, B. L. and Kahneman, D., "Duration Neglect in Retrospective Evaluation of Affective Episodes," Journal of Personality and Social Psychology, 65, 45-55, 1993.
Garcia-Molina, H., Papakonstantinou, Y., Quass, D., Rajaraman, A., Sagiv, Y., Ullman, J., Vassalos, V. and Widom, J., "The TSIMMIS Approach to Mediation: Data Models and Languages," Journal of Intelligent Information Systems, 8(2), 117-132, 1997.
Goh, C.; Madnick, S.
and Siegel, M., "Context Interchange: New Features and Formalisms for the Intelligent Integration of Information," ACM Transactions on Information Systems, July 1999.
Hoffman, D., Novak, T. and Schlosser, A., "Consumer Control in Online Environments," Working Paper, http://ecommerce.vanderbilt.edu, 2000.
Hollander, M. and Wolfe, D. A., Nonparametric Statistical Methods, Wiley-Interscience, 1999.
Igbaria, M., Guimaraes, T. and Davis, G., "Testing the Determinants of Microcomputer Usage via a Structural Equation Model," Journal of Management Information Systems, 11(4), 87-114, 1995.
Igbaria, M., Zinatelli, N., Cragg, P. and Cavaye, A. L. M., "Personal Computing Acceptance Factors in Small Firms: A Structural Equation Model," MIS Quarterly, 21(3), Sept. 1997.
Kahneman, D., "New Challenges to the Rationality Assumption," Journal of Institutional and Theoretical Economics, 150(1), 18-36, 1994.
Kleinberg, J., "Authoritative Sources in a Hyperlinked Environment," Journal of the ACM, 46(5), 604-632, 1999.
Korgaonkar, P. K. and Wolin, L. D., "A Multivariate Analysis of Web Usage," Journal of Advertising Research, 39, 1999.
Lee, D. S., "Usage Patterns and Sources of Assistance to Personal Computer Users," MIS Quarterly, 10(4), 1986.
Lee, T., Bressan, S. and Madnick, S., "Source Attribution for Querying Against Semi-Structured Documents," 1st Workshop on Web Information and Data Management, ACM Conference on Information and Knowledge Management, 1998.
Novak, T. P. and Hoffman, D. L., "New Metrics for New Media: Toward the Development of Web Measurement Standards," World Wide Web Journal, 2(1), 1997.
O'Leary, D. E., "Internet-Based Information and Retrieval Systems," Decision Support Systems, 27, 1999.
Pitkow,
James, "Summary of WWW Characterizations," Xerox Palo Alto Research Center Working Paper, 1999.
Raymond, L., "Organizational Characteristics and MIS Success in the Context of the Small Business," MIS Quarterly, 9(1), 37-52, 1986.
Sen, S., Padmanabhan, B. et al., "The Identification and Satisfaction of Consumer Analysis-Driven Information Needs of Marketers on the WWW," European Journal of Marketing, 32(7/8), 1998.
Tversky, A. and Kahneman, D., "Judgment under Uncertainty: Heuristics and Biases," Science, 185, Sep. 1974.
Van Bruggen, G. H., Smidts, A. and Wierenga, B., "Improving Decision Making by Means of a Marketing Decision Support System," Management Science, 44(5), 645-658, May 1998.
Wansink, B., "An Anchoring and Adjustment Model of Purchase Quantity Decisions," Journal of Marketing Research, 35(1), 71-82, Feb. 1998.
Dreze, X. and Kalyanam, K., "The Ecological Inference Problem in Internet Measurement: Leveraging Web Site Log Files to Uncover Population Demographics and Psychographics," 1998.