On Usage Metrics for Determining Authoritative Sites

Abstract
The explosion in the number of ‘relevant’ sites on any topic on the Web creates a need to
identify additional measures, such as quality or reliability, to rank content on the Web.
Recent research has used the term authoritative to represent a broader notion of the
goodness of a web page. An authoritative page is one that is both relevant and reliable, in
the spirit of an “authority” on a subject.
Conventional methods to rank authoritative content on the Web do so based on content
interpretation of sites (e.g. most search engines), structural relationships between sites (e.g.
Google) and experts’ analyses (e.g. Forrester’s rankings). In this paper we argue for an
alternative approach to rank authoritative content, one based on the actual usage of the
Web. In addition to lending itself to new automated approaches to rank content,
preliminary results from our approach indicate that a usage-based approach may, in fact, be
superior to conventional approaches for this problem. In this paper we present a new
framework to identify usage metrics and list several usage metrics based on this
framework. Based on actual usage data of 30,000 users’ surfing behavior, we demonstrate
that a simple aggregate usage measure can in fact characterize actual bookings at travel
sites better than conventional methods that rank online travel sites.
1. Introduction
Different Web pages, even if very similar in subject matter and content, need not be equally
“good”. Recent studies (O’Leary 1999, Lee et al. 1998, Garcia-Molina et al. 1997, Goh et al.
1999) have addressed issues relating to the reliability of information provided on a page – for
example, not all pages deemed “relevant” based on content search are “good”. Implicit is the
notion that automatically inferring “quality” solely based on the interpreted relevance of the
content is questionable.
Hence, a more general notion than relevance is needed to characterize information on the
Web. In this paper we adopt the term “authoritative”, first proposed in (Kleinberg 1999), to
represent a broader notion of the goodness of a web page. Intuitively an authoritative page is one
that is both relevant and reliable (in the spirit of an “authority” on a subject).
Conventional methods to rank authoritative content on the Web do so based on content
interpretation of sites (e.g. most search engines), structural relationships between sites (e.g.
Google) and experts’ analyses (e.g. Forrester’s rankings). In this paper we argue for an
alternative approach to rank authoritative content, one based on actual usage of the Web. Usage
characterizes “bestowed authority” of a certain kind - authority that is conferred upon a site by
users who choose to visit the site. We present a new framework to identify usage metrics and list
several usage metrics based on this framework. Based on actual usage data of 30,000 users’
surfing behavior, we demonstrate that a simple aggregate usage measure can, in fact, characterize
actual bookings at travel sites better than other conventional methods that rank online travel sites.
2. Usage Metrics
In this section we present a framework to identify usage metrics that indicate authoritativeness of
web content and use this framework to identify specific usage metrics. In terms of using these
metrics to identify authoritative content, we propose the use of filters to combine these metrics in
individualized ways and present an example of one such filter - a simple aggregate measure that
combines various individual metrics into a single metric that can rank content on the Web.
It is important to note that these metrics should be derived from user-centric data as
opposed to site-centric data. Site-centric data is data collected by individual sites about users’
accesses to pages within that site – this data is usually stored in logfiles (Sen et al. 1998) at each
site. Taken individually, site-centric data represents an incomplete picture of user behavior on
the Web since it does not capture a user’s activity on external sites. User-centric data, on the
other hand, is data collected at the user level and thus captures entire histories of web surfing
behavior for each user. This data is available through market data vendors, such as Media Metrix
and NetRatings, who offer a randomly chosen subset of users various incentives to permit
tracking software to be installed on their client machines.
Note that the purpose of usage metrics is to compare different sites or pages; hence these
metrics need to be computed for each page/site in the population. These metrics can, therefore,
be viewed as attributes of pages or sites that are computed from user-centric data. Hence the
metric hits/session, for example, measures the average number of hits of a certain site (or page)
per session and this metric can therefore be used to rank sites based on this average for each site.
For simplicity in representation we will use the term hits/session as opposed to hits/session(site).
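As an illustration, a metric such as hits/session could be computed from user-centric data along the following lines. This is a minimal sketch; the representation of sessions as lists of (site, url) hits is an assumption for illustration, not the actual format of the panel data.

```python
def hits_per_session(sessions, site):
    """Average number of hits to `site` per session that accesses it.

    `sessions` is a list of sessions, each a list of (site, url) hits --
    a hypothetical representation of user-centric click-stream data.
    """
    total_hits = sum(1 for s in sessions for st, _ in s if st == site)
    sessions_with_site = sum(1 for s in sessions if any(st == site for st, _ in s))
    return total_hits / sessions_with_site if sessions_with_site else 0.0
```

Ranking sites by this value for each site in the population then yields one of the individual rankings discussed below.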
In order to develop a set of usage-based metrics that measure different aspects of the
authoritativeness of a site, we first present a framework within which these different metrics can
be identified. First, we note that user-centric data on Web access can be viewed at different
levels of granularity – hits, sessions and users. A hit is a single page accessed, a session is a
collection of hits by a single user during a specified time span1 and a user is an individual whose
browsing behavior can be viewed as a collection of sessions. For each level of granularity, we
identify quantity and quality related metrics in the following manner. Quantity metrics are
identified by asking: how many hits/sessions/users does a given page/site receive? Quality
metrics are identified by asking: how ‘good’ are the hits/sessions/users from each page/site’s
point of view? Table 2.1 presents this framework for identifying usage metrics and lists 12 usage
metrics that can be identified using this approach.
Quality Metrics

Granularity  Metric             Definition
Hits         Time per hit       (Total time spent on p) / (Total no. of hits to p)
Sessions     Time per session   (Total time spent on p) / (Total no. of sessions with p)
             Hits per session   (Total no. of hits to p) / (Total no. of sessions with p)
             Entry rate         (No. of sessions that begin with p) / (Total no. of sessions with p)
             Exit rate          (No. of sessions that end with p) / (Total no. of sessions with p)
             Peak rate          (No. of sessions in which the user spends maximum time on p) /
                                (Total no. of sessions with p)
Users        Time per user      (Total time spent on p) / (Total no. of users of p)
             Hits per user      (Total no. of hits to p) / (Total no. of users of p)
             Sessions per user  (Total no. of sessions with p) / (Total no. of users of p)

Quantity Metrics

Metric              Definition
Number of hits      No. of hits of a page/site
Number of sessions  No. of sessions in which a page/site was accessed
Number of users     No. of users who accessed a given page/site

Table 2.1. Framework for deriving usage metrics.
(Each metric is derived for a specific page/site ‘p’)
1 A common rule of thumb to determine how to aggregate hits into sessions is to observe the time associated with each hit and group consecutive hits that are within 30 minutes of each other into a session.
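The 30-minute rule of thumb above can be sketched as follows. This is a minimal sketch; the representation of hits as (timestamp, url) pairs is an assumption for illustration.

```python
from datetime import datetime, timedelta

def sessionize(hits, gap_minutes=30):
    """Group one user's time-ordered (timestamp, url) hits into sessions,
    starting a new session whenever the gap between consecutive hits
    exceeds `gap_minutes` (30 minutes by the rule of thumb above)."""
    sessions, current = [], []
    for ts, url in hits:
        if current and ts - current[-1][0] > timedelta(minutes=gap_minutes):
            sessions.append(current)
            current = []
        current.append((ts, url))
    if current:
        sessions.append(current)
    return sessions
```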
These 12 metrics are not claimed to be ‘complete’ in any sense but are illustrative examples of
usage metrics that can be derived from this framework. The purpose here is to identify usage
metrics that contribute to some aspect of authoritativeness of a site. Much prior work provides
arguments supporting one or more of the metrics listed in Table 2.1. In particular, research in
diverse areas such as decision theory, information systems, marketing and computer science
provides different types of justification for using these metrics. Rather than discussing each of
them in detail we refer the reader to Tables 2.2 through 2.4 for a brief summary.
Studies: Fredrickson and Kahneman (1993), Kahneman (1994)
Theory: Peak-and-End Rule
Summary: Individuals heavily weight the peak moment during an episode, and also strongly attend to how the episode ended.
Related Web usage metrics in Table 2.1: Peak and Exit rates

Studies: Tversky and Kahneman (1974), Van Bruggen et al. (1998), Wansink (1998)
Theory: Anchoring and Adjustment
Summary: Decisions are characterized by excess reliance on the starting point and insufficient adjustment for subsequently considered information.
Related Web usage metrics in Table 2.1: Entry rate

Table 2.2. Relevant theories in Decision Science and derived web usage metrics
Studies: Bailey & Pearson (1983), Davis (1989), Igbaria, Guimaraes and Davis (1995), Igbaria and Zinatelli (1997)
Main theory: The Technology Acceptance Model (TAM) holds that beliefs influence attitudes, which in turn lead to intentions, which then generate behaviors. Thus IT usage is determined by a behavioral intention to use a system.
Suggested metrics to measure IT usage: Actual amount of time spent, frequency of use, number of software packages, number of business tasks, daily use.
Related Web usage metrics in Table 2.1: Time per hit/session/user and frequency-related metrics such as hits/session, sessions/user, hits/user.

Studies: Lee (1986)
Suggested metric: Time spent per day

Studies: Raymond (1986)
Suggested metric: Frequency of use

Table 2.3. Related theories in IT Adoption and derived web usage metrics
In this section, thus far, we presented a framework to identify usage metrics that determine
authoritativeness of web content, identified several individual metrics and presented a brief
summary of research in different areas that can be used to motivate these metrics. In terms of
using these metrics to identify authoritative content on the Web we need further research to
develop aggregate measures that combine these metrics in different ways. Each aggregate
measure that combines different usage metrics to rank content can be thought of as a ‘filter’. In
this paper we do not argue for any specific filter, but rather demonstrate that even a simplistic
filter based on usage metrics outperforms current commercial rankings of content on the Web, at
least in this particular experiment, which we have no reason to think is unrepresentative or atypical.
Studies: Novak & Hoffman (1997), Dreze & Kalyanam (1998), Korgaonkar & Wolin (1999)
Main theme: Metrics for Web Marketing
Suggested metrics: Frequency, reach, page views/hits, depth, number of sessions, session duration, click rate/click-through rate, average time on page.

Studies: Pitkow (1999), Cooley et al. (1997, 1999)
Main theme: Web Usage Analyses
Suggested metrics: Several site-centric and user-centric metrics such as hits, reach, site length, duration, frequency, reading time, entry site, exit site, session length, path length, number of sessions.

Related Web usage metrics in Table 2.1: All 12 metrics in Table 2.1 measure concepts equivalent to those suggested in these studies.

Table 2.4. Related theories in Marketing and Web Usage Analyses
Below we describe a simple linear combination (SLC) of the rankings from the various metrics
to determine one single aggregate usage metric that can be used to rank content on the Web.
Each of the m metrics can be individually used to rank sites. This results in m independent
rankings – one from each metric. A simple approach to combining the different rankings is to
compute a score for each site in the following manner. Assume that rankings, based on each
metric, up to rank p are considered useful (below that the site is ranked too low and can be
equivalently dropped from the ranking). In the entire set of m rankings, for each site, let ci be the
number of rankings in which the site is ranked ith (i = 1, …, p). An aggregate score for the site
that equally weights all the rankings can then be computed as p*c1 + (p-1)*c2 + … + 1*cp. Let this score be
denoted as the UsageSLC score for the site. A ranking of sites based on this UsageSLC score is one
possible filter that can be used to construct an aggregate ranking scheme from the individual set
of usage metrics. In the next section we show that even this simplistic filter outperforms several
widely accepted commercial methods for ranking authoritative content.
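The definition above can be sketched directly in code. This is a minimal sketch; each of the m metrics is assumed to be supplied as a ranked list of site names, best first, which is an assumption about the input format rather than the authors' actual pipeline.

```python
def usage_slc(rankings, p=20):
    """Compute UsageSLC scores from m individual rankings.

    A site ranked i-th (i <= p) in a ranking contributes p - i + 1 points,
    so its total is p*c1 + (p-1)*c2 + ... + 1*cp, where ci is the number
    of rankings in which the site is ranked i-th. Sites ranked below p
    contribute nothing.
    """
    scores = {}
    for ranking in rankings:
        for i, site in enumerate(ranking[:p], start=1):
            scores[site] = scores.get(site, 0) + (p - i + 1)
    # Sites sorted by descending UsageSLC score
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

For example, with two rankings ["a", "b", "c"] and ["a", "c", "b"] and p = 3, site "a" scores 3 + 3 = 6 while "b" and "c" each score 3, so "a" tops the aggregate ranking.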
3. Results
The results described in this paper use user-centric click-stream data provided by a leading
market data vendor. The data cover user-centric web surfing behavior for a panel of
approximately 30,000 users over a period of fourteen months. We restrict our focus to
identifying authoritative sites related to online travel for the following reasons. First, online
travel was an early Internet success story and accounts for a significant percentage of e-commerce
revenues today. Second, a site’s authoritativeness should presumably translate into more
bookings at the site. Hence rankings of travel sites can be validated by studying how closely
they characterize actual bookings at various sites.
The data vendor had pre-categorized URLs into appropriate categories, one of which
was travel. We preprocessed the data such that only sessions with access to at least one travel site
were retained. For each travel site in the domain the UsageSLC score was computed assuming p
= 20, i.e. any ranking of a site below 20th on any metric would not contribute to the aggregate
authority score. In Table 3.1 below we present the rankings based on the usage metric and
compare them with rankings of sites derived from three popular search engines (Altavista, Google
and DirectHit) based on the query ‘online travel agents’. Although results from search
engines could differ significantly based on the specific query, the results in Table 3.1
demonstrate the superiority of the UsageSLC metric-based ranking over search engine rankings, at
least in the context of finding authoritative online travel sites.
Rank  UsageSLC Ranking  Altavista        Google          DirectHit
 1    travelocity       travelocity      Yahoo           travelocity
 2    yahoo             4airlines        Astanet         ten-io
 3    expedia           travelzoo        a2btravel       flifo
 4    aol               smartpages       budgettravel    iecc
 5    mapquest          gochicagoland    Quik            web-travel-secret
 6    netscape          sabrebts         airlinesonline  bargainholidays
 7    excite            eastmanvoyages   e-tik           airlinesonline
 8    mapsonus          biotactics       chattanoogaful  travelenvoy
 9    preview           indiatourisme    1cruise         travelagentrate
10    amtrak            mytravelguide    Palaces-tours   travelinfoonline

Table 3.1. Comparison of UsageSLC ranking and rankings based on popular search engines.
We also compared the usage-based rankings to popular approaches based on a combination of
data and experts’ opinions. Three such rankings are based on data published by Forrester
Research, Lycos (Top 5%) and Go2Net (100Hot). Forrester is a leading Internet
research firm; Dreze & Zufryden (1997) used Lycos Top 5% sites as a dependent variable in
measuring web-site efficiency; and Go2Net’s 100Hot is the first industry-by-industry site ranking
service (its ranking is based on usage analysis, compiling its listings from the analysis of log
files from cache servers located at strategic points throughout the Internet)2. The methodology
used by these firms in ranking online travel sites is not disclosed and the rankings are widely
used in industry to determine the best online firms and in strategic decision making. Compared to
automatic search engines, these rankings have much higher face validity since they are semi-manual and contain well-known travel sites. Hence comparison of the UsageSLC rankings to
these other rankings requires the use of a proxy to measure the authoritativeness of sites. In the
case of online travel a reasonable assumption is to use actual bookings data as a proxy for how
authoritative a site is since the business model of online travel is based on generating bookings.
2 http://www.go2net.com/corporate/advertising/100hot/
However user-centric data does not directly contain information on actual bookings made
at travel sites – the data contains only URLs accessed by users at various travel sites. In
conjunction with the market data vendor, heuristics were developed that identified bookings
based on patterns in specific URL strings. Initially these heuristics were developed for thirteen
sites only, since identifying them involved actually completing a booking at each site. Table 3.2
lists the heuristics for these 13 sites. We acknowledge the limitations of using such heuristics, but
data currently collected by market data vendors contain no information on actual purchases
made, and hence such heuristics are necessary. Further, these heuristics are actually used by
market research firms. In Table 3.2 we compare the rankings based on actual bookings at various
sites to rankings from Forrester, Lycos and Go2Net.
Site                 Booking  Forrester  Lycos Top   Go2Net 100Hot  Booking heuristic
                     ranking  ranking    5% ranking  ranking        (URL string)
travelocity           1        2          6           19            “REVIEW.CTL?”
expedia               2        3          3            2            “CONFIRM.ASP” or “RESERVAT.HTML”
previewtravel         3       11          1           21            “STATE=CONFIRM”
priceline.com         4       20         11           17            “/CONFIRMATION”
Itn.net               5       33         27           28            “STORE?STAMP=”
cooltravelassistant   6       18         12           50            “QSCR=RESV”
cheapTickets          7       22         16           98            “AIRFARESUBMIT.CTL?”
thetrip               8       35         20           25            “RESGUEST”
flifo                 9       36         22          100            “CU.CGI?”
onetravel            10       30         23           99            “CU.CGI?”
nwa                  11       16         10           13            “CONFIRMATION”
americanair          12        8         13           32            “HOLD_FLAG=Y”
Lowest fare          13       28         15           63            “RESERVE.CTL?”

(A session is assumed to contain a booking if a URL in the session indicates access in the https secure mode and the URL contains the listed string.)

Table 3.2: The rankings by Forrester, Lycos Top 5% and Go2Net/MetaCrawler 100Hot
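Applied to user-centric data, the URL heuristics in Table 3.2 amount to simple substring tests on secure-mode URLs. The sketch below is hypothetical: it assumes each session is available as a list of raw URL strings, and only two of the sites' patterns from the table are shown.

```python
# URL substrings from Table 3.2 that signal a completed booking
# (only two sites shown for illustration)
BOOKING_PATTERNS = {
    "travelocity": ["REVIEW.CTL?"],
    "expedia": ["CONFIRM.ASP", "RESERVAT.HTML"],
}

def session_has_booking(site, session_urls):
    """A session is assumed to contain a booking if some URL is accessed
    in https secure mode and contains one of the site's patterns."""
    patterns = BOOKING_PATTERNS.get(site, [])
    return any(
        url.lower().startswith("https") and any(p in url.upper() for p in patterns)
        for url in session_urls
    )
```

Counting the sessions flagged by such a test, site by site, yields the bookings-based ranking used as the validation proxy below.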
To test how the usage-based approach compares with the other 3 approaches, we performed 4
Wilcoxon Signed-Rank tests3 separately to determine if the rankings induced by each measure
are significantly different from the rankings induced by bookings volume. The results are
summarized in Table 3.3. Based on these results it appears that only rankings based on
UsageSLC are not significantly different from rankings induced by actual bookings. Hence based
on preliminary results it appears that even a simple linear combination of usage metrics
presented in this paper outperforms industry standard rankings in the context of predicting
bookings at online travel sites in this specific experiment.
Hypothesis                          N    V    Adjusted Z  P-Value
H0-A: UsageSLC vs. Bookings         13   62   1.140       0.273
H0-B: Forrester vs. Bookings        13   80   2.696       0.007
H0-C: Lycos Top 5% vs. Bookings     13   83   2.938       0.003
H0-D: Go2Net 100Hot vs. Bookings    13   85   3.114       0.002

Table 3.3: The results of the Wilcoxon Signed-Rank Test (α = 0.05)
To summarize, we have presented results from comparing a simplistic usage-based approach to
ranking sites to traditional, industry-standard approaches in the context of online travel.
Preliminary results indicate that simple usage-based metrics seem to outperform standard
ranking methods in the context of characterizing bookings in the online travel industry. These
results seem to indicate that usage-based metrics could be invaluable if incorporated into actual
search techniques on the Internet; no search engine provides such features today. In
current research we are working on methods to build various usage-based filters to identify
authoritative content and on usage-driven search technologies.
3 The Wilcoxon signed-rank test assesses whether two paired measurements differ when the measurements are ordinal or rank-ordered (Hollander & Wolfe 1999).
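The signed-rank statistic used above can be computed as follows. This is a pure-Python sketch of the standard procedure with a normal approximation; the paper does not disclose its exact handling of zeros and ties, so the choices here (drop zeros, average ranks for ties) are assumptions.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Wilcoxon signed-rank test for paired samples x and y.

    Returns (W_plus, z): the sum of ranks of positive differences and a
    normal-approximation z statistic. Zero differences are dropped and
    tied absolute differences receive their average rank.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Rank |d| in ascending order, averaging ranks over ties
    ordered = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd if sd else 0.0
    return w_plus, z
```

A large |z| (e.g. beyond 1.96 at α = 0.05) indicates that the two rankings differ significantly, as in rows B-D of Table 3.3.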
References
Bailey, J. E. and Pearson, S. W., “Development of a Tool for Measuring and Analyzing Computer User Satisfaction,”
Management Science, 29 (5), pp. 530-545, 1983
Cooley, Robert, Bamshad Mobasher and Jaideep Srivastava, "Web Mining: Information and Pattern Discovery on
the World Wide Web", Proceedings of ICTAI, Nov. 1997
Cooley, Robert, Bamshad Mobasher and Jaideep Srivastava, "Data Preparation for Mining World Wide Web
Browsing Patterns", Knowledge and Information System, 1(1), Spring,1999
Davis, F., “Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology”, MIS
Quarterly, 13 (3) 319-340, 1989
Dreze, X., and F. Zufryden, "Testing Web Site Design and Promotional Content." J. of Advert. Resch. 37 (2), 1997
Fredrickson, B. L. and Kahneman, D., “Duration Neglect in Retrospective Evaluation of Affective Episodes,”
Journal of Personality and Social Psychology, 65, 45-55, 1993
Garcia-Molina, H.; Papakonstantinou, Y.; Quass, D.; Rajaraman, A.; Sagiv, Y.; Ullman, J.; Vassalos, V.; and
Widom, J., “The TSIMMIS Approach to Mediation: Data Models and Languages,” Journal of Intelligent
Information Systems (8:2), 1997, pp. 117-132.
Goh, C.; Madnick, S. and Siegel, M., “Context Interchange: New Features and Formalisms for the Intelligent
Integration of Information”, In ACM Transactions on Information Systems, July 1999.
Hoffman, Donna; Novak, Thomas and Schlosser, Ann, “Consumer Control in Online Environments”, Working
Paper, http://ecommerce.vanderbilt.edu, 2000
Hollander, Myles and Wolfe, Douglas A., “Nonparametric Statistical Methods”, Wiley-Interscience, 1999
Igbaria, Magid; Guimaraes, T.; Davis, Gordon; “Testing the Determinants of Microcomputer Usage via a Structural
Equation Model”, Journal of Management Information Systems, 11 (4), 87-114, 1995
Igbaria, Magid; Zinatelli, N.; Paul Cragg; Angele L M Cavaye, “Personal computing acceptance factors in small
firms: A structural equation model”, MIS Quarterly; 21 (3), Sept. 1997
Kahneman, D., “New Challenges to the Rationality Assumption”, Journal of Institutional and Theoretical
Economics, 150(1), P18-36, 1994
Kleinberg, J., “Authoritative Sources in a Hyperlinked Environment”, Journal of the ACM, (46:5), 1999, pp. 604-632
Korgaonkar, Pradeep K. and Lori D. Wolin, “A Multivariate analysis of Web usage,” J. of Advert. Resch, 39, 1999
Lee, D. S., “Usage Patterns and Sources of Assistance to Personal Computer Users”, MIS Quarterly, 10 (4), 1986
Lee, T.; Bressan, S., and Madnick, S., “Source Attribution for Querying Against Semi-Structured Documents,” 1st
Workshop on Web Information and Data Management, ACM Conf. on Info. and Knowledge Management, 1998
Novak, P. Thomas and Donna L. Hoffman, "New Metrics for New Media: Toward the Development of Web
Measurement Standards", World Wide Web Journal, Project 2000: www2000.ogsm.vanderbilt.edu, 2(1), 1997
O’Leary, E. Daniel, “Internet-Based Information and Retrieval Systems”, Decision Support Systems, 27, 1999.
Pitkow, E. James, “Summary of WWW Characterizations”, Xerox Palo Alto Research Center Working Paper, 1999
Raymond, L., “Organizational Characteristics and MIS Success in the Context of the Small Business”, MIS
Quarterly, 9(1), 37-52, 1986
Sen, Shahana, Padmanabhan, Balaji et al., “The Identification And Satisfaction Of Consumer Analysis-Driven
Information Needs Of Marketers on The WWW”, European Journal Of Marketing, Vol. 32 (7/8), 1998
Tversky, A. and Kahneman, D., "Judgement under Uncertainty: Heuristics and Biases", Science, Vol. 185, Sep.1974
Van Bruggen, Gerrit H.; Smidts, Ale; Wierenga, Berend, “Improving Decision Making by Means of a Marketing
Decision Support System”, Management Science, 44(5): 645-65, May 1998
Wansink, Brian, “An Anchoring and Adjustment Model of Purchase Quantity Decisions”, Journal of Marketing
Research, 35 (1), 71-82, Feb 1998
Dreze, Xavier and Kirthi Kalyanam, “The Ecological Inference Problem in Internet Measurement: Leveraging Web
Site Log Files to Uncover Population Demographics and Psychographics”, 1998