Measuring Web Site Traffic: Panel vs. Audit

I/PRO, A TopicalNet Company
444 Spear Street, Suite 200
San Francisco CA 94105
tel: 415-512-7470
fax: 415-512-7996
email: info@ipro.com
Web: www.ipro.com
1. Executive summary – “Don’t Guess, Count”
Compared with web log analysis and auditing, panel-based web traffic
measurement is a relatively inaccurate way to understand how many page views a
web site displays. Actual counting gives a much better picture, and the highest
level of counting accuracy is achieved by auditing.
The attention of human auditors following strict guidelines, combined with the
independence of a third party audit agency, provides monthly snapshots of web
site traffic that are verified for public distribution to management, investors,
and advertisers.
The amount of traffic that the panel-based approach misses is quite large.
•	Panel vs. web log auditing
	−	85% of the cases show a panel error of more than 10%
	−	30% of the cases have a panel error of more than 50%
	−	Nearly 60% of the cases show an undercount
The figure below presents these findings graphically.
[Figure: Web Site Traffic: Magnitude of Panel-Based Underreported Traffic vs. Audit - May 2001. Vertical axis: Underreported Traffic (page views), 0 to 700,000,000; horizontal axis: Error Relative to Audit, in three bands from "Greater than -50%" to "Between -10% and 0%".]
2. Background
Consider three media supported by advertising: television, magazines, and the
Web. For all three, there exists a need on the part of management, investors, and
advertisers for traffic numbers that are both accurate and verifiable. That is, they
need external audits. Problems in fulfilling this need—and how these problems
are overcome—are unique to each medium.
2.1 Television
Television is a "create once, sell many" medium. Production costs are the same
regardless of how many people ultimately watch a particular program. The
nature of television is that producers cannot directly measure (count) their
audience. While they can certainly survey for internal management purposes,
such surveys do not satisfy external investors and advertisers.
To address this need, third party panel-based measurement companies like
Nielsen survey a representative sampling of viewers and the channels to which
their television sets are tuned. They then extrapolate to estimate audience size as
a whole. This extrapolation works because the number of channels available for
viewing at any given time is relatively limited: certainly no more than hundreds.
The panel-based approach does not completely solve the accuracy and
verification problems, but in this case it is "good enough."
2.2 Magazines
The situation is quite different for magazine publishers, who face a "create many,
sell many" situation. They know how many copies they print and how many they
sell. Unlike television, they can count. There is currently no way, however, to
monitor which magazines are actually opened, much less read. Moreover,
internal counts, no matter how accurate, again do not satisfy external investors
and advertisers.
Third party companies such as the Audit Bureau of Circulations and BPA verify
circulation numbers based on audits of financial documents, mailing lists, postal
receipts, printing bills, and other indicators. In theory, survey techniques can be
used to measure readership, but compared with television, the task is much
harder because the number of magazines is much larger than the number of
television channels: sample sizes would need to be quite large. The difficulty is
compounded because there is no magazine analog to the Nielsen set top box,
which records the actual program being displayed; surveys of magazine
readership would require that the readers accurately remember what they read.
2.3 The Web
Like television, the Web is “create once, sell many.” The Web enables an
interesting combination of the two measurement approaches: supply side (like
magazines) and demand side (like television). Web publishers can use web log
analysis to provide the accuracy part of the equation (analogous to a magazine
publisher counting the number of magazines that he or she prints). Panel-based
survey companies can install measurement software on user computers to
provide the demand-side measurement (in a manner similar to panel-based
television viewing measurement).
The sampling problem is dramatically worse than for magazines, however: with
tens of millions of web sites and billions of web pages, prohibitively large
samples would be required. The difficulty is compounded by the fact that truly
representative samples are impractical to assemble, because companies,
educational institutions, and other large organizations forbid the installation
of measurement software on their users' computers. These troubles are
exemplified by the dramatically
different traffic numbers that the various web panel measurers report.
As an example, in a press release dated August 16, 2001, Gannett Online released
“Unique Visitors Per Month” and “Percentage Reach of Internet Audience”
numbers from two panel-based measurement services. Both were from
“Home/Work Panels Combined” data sets. The table below is a graphic example
of the problems with a panel-based approach.
Reporting Company       Unique Visitors/Month    % Reach
Nielsen/NetRatings      9,199,000                8.2%
Media Metrix            7,712,000                8.4%
The two panels differ by nearly 1.5 million unique visitors (roughly 19%) for
the same property and month, yet report nearly identical reach. The Web, by
contrast, enables auditing. It is the first advertising medium that supports
this level of accuracy in reporting. In short, "Don't Guess, Count."
The remainder of this paper outlines the components of web site auditing and
examines how panel-based traffic estimates differ from the actual audited traffic.
3. Components of web auditing
3.1 What is a web audit?
The purpose of an audit is to report not on what a web site serves, but on what its
visitors see. Audits provide management, investors, advertisers, and others with
a credible measure of a web site’s traffic. A web site audit is a validation of traffic
by an independent audit agency.
3.2 What are the elements of an I/PRO audit?
Web log analysis and auditing begin with a "raw" web log file recorded by a web
server. These web logs contain a record (or hit) of each file that the web server
serves to a user via a web browser. These files include the following.
•	HTML files (.htm, .html)
•	Server side code that generates HTML (.cgi, .jsp, .asp, .cfm, .php, …)
•	Framesets (.frm)
•	Images (.gif, .jpg, .jpeg, .bmp, …)
•	Multimedia files (.mpg, .mpeg, .mp3, .wav, .swf, …)
•	Stylesheets (.css)
•	Customized web site extensions
For each file, the web log may record the following information, depending on
the web server software and the web log file format selected; a sketch of
parsing such a record follows this list.
•	The file requested
•	The time and date that the file was served
•	The cookie (unique identifier) accepted by the user
•	The IP address to which the file was served
•	The success of the delivery of the file (the status code)
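The paper does not specify a log format, but these fields map naturally onto the widely used common log format. The sketch below is illustrative only; the field layout, sample line, and helper name are assumptions, not I/PRO's actual tooling.

    import re

    # One hit in common log format: IP, identity, user, timestamp,
    # request line, status code, and bytes served.
    LOG_LINE = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
        r'(?P<status>\d{3}) (?P<bytes>\S+)'
    )

    def parse_hit(line):
        """Extract the audit-relevant fields from one raw log record."""
        match = LOG_LINE.match(line)
        if match is None:
            return None  # malformed record
        return {
            "ip": match.group("ip"),
            "time": match.group("time"),
            "path": match.group("path"),
            "status": int(match.group("status")),
        }

    sample = '10.0.0.1 - - [12/May/2001:10:30:00 -0700] "GET /index.html HTTP/1.0" 200 5120'
    print(parse_hit(sample))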
A pre-audit counting stage is applied to turn Raw Hits into Qualified Hits and
then Qualified Pages. Counts for Visits, Visit Length, and Unique Visitors are
derived from Qualified Pages. The auditing stage consists of an examination of
the remaining pages to remove those that are not valid, resulting in either a
Document Requests or a Page Requests metric.
The various removals throughout the audit stages are listed below; a sketch of
this filtering pipeline follows the outline.
•	Web log file transfer
	−	Raw Hits
•	Pre-audit counting stage
	−	Qualified Hits
		>	Removal of invalid status code pages
		>	Removal of internal traffic pages
		>	Removal of spider and robot pages
	−	Qualified Pages
		>	Removal of non-HTML generating pages (images, multimedia files, style sheets, etc.)
•	Auditing stage
	−	Document Requests
		>	Removal of blank pages
		>	Removal of redirection pages
		>	Removal of administrative/test pages
		>	Removal of custom error web pages
		>	Removal of other non-viewable files
	−	Page Requests
		>	Multiple frames reduced to one
		>	Removal of WAP and PDA pages
		>	Removal of passive pages
		>	Removal of other error pages
		>	Removal of include pages
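I/PRO's actual qualification rules are proprietary and more detailed than any short listing can capture. The following is a minimal sketch of the pre-audit counting stage only; the status-code rule, internal address prefixes, robot signatures, and page-extension set are illustrative assumptions, not I/PRO's rules.

    import os

    # Illustrative assumptions -- not I/PRO's actual rules.
    PAGE_EXTENSIONS = {".htm", ".html", ".cgi", ".jsp", ".asp", ".cfm", ".php"}
    INTERNAL_PREFIXES = ("10.", "192.168.")            # assumed internal address space
    SPIDER_AGENT_HINTS = ("bot", "crawler", "spider")  # assumed robot signatures

    def qualified_hits(raw_hits):
        """Raw Hits -> Qualified Hits: drop bad status codes, internal traffic, robots."""
        for hit in raw_hits:
            if not 200 <= hit["status"] < 400:
                continue  # removal of invalid status code pages
            if hit["ip"].startswith(INTERNAL_PREFIXES):
                continue  # removal of internal traffic pages
            agent = hit.get("agent", "").lower()
            if any(hint in agent for hint in SPIDER_AGENT_HINTS):
                continue  # removal of spider and robot pages
            yield hit

    def qualified_pages(hits):
        """Qualified Hits -> Qualified Pages: keep only HTML-generating requests."""
        for hit in hits:
            path = hit["path"].split("?")[0]
            ext = os.path.splitext(path)[1].lower()
            if ext in PAGE_EXTENSIONS or ext == "":
                yield hit  # images, multimedia, style sheets, etc. are dropped

    # pages = list(qualified_pages(qualified_hits(raw_hits)))

The subsequent auditing stage (blank pages, redirects, frame reduction, and so on) depends on site-specific knowledge, which is why the paper assigns it to human auditors working to strict guidelines rather than to software alone.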
Figure 1. Web Site Traffic: Percentage of Panel-Based Cases in Error vs. Audit - May 2001. [Bar chart: vertical axis shows the percentage of panel-based cases in error, 0% to 50%; horizontal axis shows panel error relative to audit in six bands, from "Greater than -50%" to "Greater than 50%".]
4. Panel estimation vs. web log auditing
Web log analysis counts web site traffic with a level of accuracy that panel-based
measurement cannot match. I/PRO compared page view numbers for its audit
customers against those from a leading panel-based measurement company.
Three months are examined: May 2001, April 2001, and December 2000. The
selected metric is the number of pages viewed: "page views" in the
parlance of the panel-based measurer, “Page Requests” in the case of I/PRO.
Page Requests represent the most conservative counting of traffic.
4.1 Percentage of panel-based cases in error vs. log auditing
Figure 1 shows the percentage of panel cases in error (vertical axis) in each of
six error bands (the horizontal axis); a sketch of this banding follows the list.
•	Panel traffic lower than audit by more than 50% (red)
•	Panel traffic lower than audit by between 10% and 50% (green)
•	Panel traffic lower than audit by between 0% and 10% (blue)
•	Panel traffic higher than audit by between 0% and 10% (blue)
•	Panel traffic higher than audit by between 10% and 50% (green)
•	Panel traffic higher than audit by more than 50% (red)
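A minimal sketch of the band assignment, assuming the error is computed as a signed fraction of the audited count (the function name and the example figures are hypothetical):

    def error_band(panel, audit):
        """Classify a panel count into one of the six error bands relative to audit."""
        err = (panel - audit) / audit  # signed relative error
        if err < -0.50:
            return "lower by more than 50%"
        if err < -0.10:
            return "lower by between 10% and 50%"
        if err < 0:
            return "lower by between 0% and 10%"
        if err <= 0.10:
            return "higher by between 0% and 10%"
        if err <= 0.50:
            return "higher by between 10% and 50%"
        return "higher by more than 50%"

    # Hypothetical case: panel reports 4M page views, audit counts 10M.
    print(error_band(4_000_000, 10_000_000))  # lower by more than 50%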
Figure 2. Web Site Traffic: Panel-Based Page View Error vs. Audit - May 2001. [Scatter plot: vertical axis shows page views per month on a logarithmic scale, 100,000 to 10,000,000,000; horizontal axis shows representative customer web sites. A central audit line is bracketed by +/-10% and +/-50% reference lines, with panel-based counts plotted as circles.]
For May 2001, more than 30% of the cases have a panel error of more than 50%.
That figure climbs to more than 85% for a panel error of more than 10%. And
nearly 60% of the cases show an undercount. Similar results are found for April
2001 and December 2000.
4.2 Amount of panel-based error vs. log auditing
The number of cases in error shown in Figure 1 represents a miscounting of
page views relative to web log auditing. Figure 2 shows how large this
miscounting is. The central blue line shows the audited traffic (vertical axis)
for each case, with the cases laid out along the horizontal axis. The empty
circles show the panel traffic, so the vertical distance between a circle and
the blue line represents the error for that case.
The green lines just above and below the central blue line show the +/- 10%
levels. A circle between a green line and the blue line means that the panel
approach has less than a 10% error. Similarly, the red lines show the +/- 50%
levels. Circles between the red and green lines have an error between 10% and
50%, and circles above the top red line or below the bottom red line have an
error of more than 50%. Note that the vertical axis uses a logarithmic scale,
so the error is visually compressed.
Figure 3. Web Site Traffic: Magnitude of Panel-Based Underreported Traffic vs. Audit - May 2001. [Bar chart: vertical axis shows underreported traffic in page views, 0 to 700,000,000; horizontal axis shows error relative to audit in three bands: "Greater than -50%", "Between -50% and -10%", and "Between -10% and 0%".]
4.3 Magnitude of panel-based underreporting
For the undercount panel versus audit cases, it is instructive to understand how
much traffic has been missed. Figure 3 examines the lower three error bands.
•	Panel traffic lower than audit by more than 50% (red)
•	Panel traffic lower than audit by between 10% and 50% (green)
•	Panel traffic lower than audit by between 0% and 10% (none)
For the cases undercounted by more than 50%, the actual traffic is represented
by the height of the white bar on the left: just over 600 million page views per
month. The red bar shows how much traffic the panel-based approach reports for
the same month: under 200 million page views, or about one-third of the audited
figure.
For the cases undercounted by 10% to 50%, the actual traffic is represented by
the white bar on the right: over 200 million page views. The green bar shows
the panel-based count: again just under 200 million page views, or a bit over
two-thirds of the audited figure.
There were no cases with an undercount of between 0% and 10%. In total, for the
undercount cases, the panel-based approach captures only 40% of the actual
traffic; the arithmetic is sketched below.
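As a rough check, the capture rate can be recomputed from approximate totals read off Figure 3 (these readings are estimates, not the paper's exact data):

    # Approximate page-view totals read from Figure 3 (rough estimates).
    serious = {"audit": 610e6, "panel": 195e6}  # undercount greater than 50%
    milder  = {"audit": 280e6, "panel": 190e6}  # undercount between 10% and 50%

    captured = (serious["panel"] + milder["panel"]) / (serious["audit"] + milder["audit"])
    print(f"panel captures about {captured:.0%} of audited traffic")
    # ~43% with these rough readings; the paper's exact figures yield 40%.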
Figure 4. Web Site Traffic: Panel-Based Error vs. Audit - Dec 2000, Apr 2001, and May 2001. [Band chart: horizontal axis shows panel-based error relative to log analysis, -100% to 700%, where each band's width reflects the change in panel error from one month to the next; vertical axis lists representative customers. Legend: panel-based error > +/- 50%; +/- 50% > panel-based error > +/- 10%; +/- 10% > panel-based error.]
4.4 Variation across multiple months
Figure 4 shows how the panel-based error varies with time. The different cases
are presented along the vertical axis. The month-to-month variation is expressed
by the width of the band along the horizontal axis. The cases are ranked top to
bottom from largest to smallest month-to-month variation.
The color of a band indicates the extent of the error. Cases whose error goes
beyond the +/- 50% range are red; those in the +/- 10% to +/- 50% range are
green. All of the cases except one (the second from the bottom) have an error
of greater than +/- 10% in at least one of the months.
Taking the second case from the top as an example, the error in one of the
three months was about -30%, versus more than +600% in another. Nine of the
cases show overcounting in at least one month versus undercounting in another.
A sketch of this month-to-month ranking follows.
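A minimal sketch of the ranking behind Figure 4, assuming per-site monthly errors expressed as fractions of audited traffic (the site names and the second site's values are hypothetical; the first mirrors the -30%/+600% case above):

    def rank_by_variation(errors_by_site):
        """Rank sites by the spread of their monthly panel errors, largest first."""
        spreads = {site: max(errs) - min(errs) for site, errs in errors_by_site.items()}
        return sorted(spreads.items(), key=lambda item: item[1], reverse=True)

    # Monthly errors for Dec 2000, Apr 2001, May 2001 as fractions of audit.
    example = {
        "site_a": [-0.30, 2.50, 6.00],  # the -30% to +600% case in the text
        "site_b": [-0.05, 0.02, 0.08],  # a hypothetical low-variation site
    }
    for site, spread in rank_by_variation(example):
        print(site, f"spread = {spread:.0%}")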
5. Conclusion
Panel-based web traffic measurement is an inaccurate way to understand how
many page views a web site displays. Counting gives a much better picture. The
highest level of counting accuracy is achieved by auditing. The attention of
human auditors following strict guidelines, combined with the independence of a
third party audit agency, provides monthly snapshots of web site traffic that
are verified for public distribution to management, investors, and advertisers.
•	Panel vs. log auditing
	−	30% of the cases have a panel error of more than 50%
	−	85% of the cases show a panel error of more than 10%
	−	Nearly 60% of the cases show an undercount
Perhaps most telling, the amount of traffic that the panel-based approach misses
is quite large.
•	Missed traffic: panel vs. log auditing
	−	2/3 of traffic missed in serious undercount cases
	−	1/3 of traffic missed for milder undercount cases
In short, “Don’t guess, count.”