Conceptual and Operational Issues in the Measurement of Internet Use*

Jonathan Zhu
City University of Hong Kong
enjhzhu@cityu.edu.hk
CNNIC Symposium 2003

* Funded by the UGC of HKSAR (CityU1152/00H)

Background: the Diffusion of the Internet in Hong Kong, Beijing and Guangzhou
[Figure: Internet users as % of the 18-74 population in Hong Kong, Beijing and Guangzhou, 1993-2002. Source: J. H. Zhu (2003)]

Internet Penetration Rate in East Asia
[Figure: Internet penetration rate as % of adult population in Japan, Hong Kong, Singapore, Taiwan, BJ/GZ and Macau]

Wired Internet Use vs. Wireless Internet Use
[Figure: PC homes, broadband homes and wireless Web users (%) in Hong Kong, BJ/GZ and Japan]

Diffusion of Cable TV, the Internet, and Mobile Phone in Hong Kong
[Figure: Internet users, mobile users and cable TV as % of population in Hong Kong, 1990-2002]

Internet vs. Mobile Phone in Beijing and Guangzhou
[Figure: Web users and mobile phone users as % of the 18-74 population in Beijing (BJ) and Guangzhou (GZ), 1993-2002]

Issues in Measurement of Internet Use and Users

The size of “Internet users” in a society is a function of:
- Definition of study population (SP)
- Method of sample weighting (SW)
- Requirement of minimal usage (MU)

The amount of “online time” spent by Internet users is a function of:
- Definition of study population (SP)
- Method of sample weighting (SW)
- Method of data collection (DC)
- Treatment of extreme values (EV)

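To make the first list concrete, here is a minimal sketch of how the three choices (SP, MU, SW) enter a single user-share estimate. The `Respondent` record, the `user_share` function and the toy respondents are hypothetical illustrations, not the survey's actual data or code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Respondent:
    age: int
    hours_per_week: float  # self-reported Internet use; 0 for non-users
    weight: float = 1.0    # census-based sample weight; 1.0 if unweighted


def user_share(sample: List[Respondent], min_age: int, max_age: Optional[int] = None,
               min_hours: float = 0.0, weighted: bool = False) -> float:
    """% of the study population (SP) counted as users, given MU and SW choices."""
    # SP: restrict the base to the chosen age range
    base = [r for r in sample
            if r.age >= min_age and (max_age is None or r.age <= max_age)]

    # SW: use the sample weight, or count every respondent once
    def w(r: Respondent) -> float:
        return r.weight if weighted else 1.0

    total = sum(w(r) for r in base)
    # MU: a user must report some use, and at least min_hours per week
    users = sum(w(r) for r in base
                if r.hours_per_week > 0 and r.hours_per_week >= min_hours)
    return 100.0 * users / total


# Hypothetical respondents (age, hours/week, weight), for illustration only
sample = [Respondent(10, 3.0, 0.5), Respondent(25, 10.0, 1.2), Respondent(40, 0.5, 1.0),
          Respondent(60, 0.0, 1.3), Respondent(80, 0.0, 1.0)]
print(user_share(sample, 6))                 # 6+, no minimal usage, unweighted
print(user_share(sample, 18, 74))            # 18-74
print(user_share(sample, 6, min_hours=1.0))  # 1 hour/week minimum (CNNIC-style)
print(user_share(sample, 6, weighted=True))  # weighted
```

Even on five invented respondents, the four calls return four different "user sizes", which is the point of the slide.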
Criteria for Evaluation of Measurement
- Validity: how accurate or correct is the measure as compared with the “truth”?
- Reliability: how precise or stable is the measure over time and/or across space?
- Practicality: how efficient or economical is the measure in data collection and analysis?

Data
- Hong Kong Survey 2002: telephone interviews of 1,800 residents aged 6 and above in Dec. 2002 by Jonathan Zhu and his team
- AC Nielsen/Netratings 2002-03: online tracking of 1,500 Internet users from 811 households in Hong Kong in Oct. 2002 and Jan. 2003

Definitions of Study Population
- WIP-Hong Kong: 18-74
- CNNIC: 6+
- Another popular definition: 18+
- HK Census 2002:
  - 6-17: 16.4%
  - 18-74: 80.0%
  - 75+: 3.6%

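Because the three definitions nest, the user rate of a broader population is the census-share-weighted average of its age groups' rates. The sketch below uses the census shares above; the subgroup rates are invented for illustration and are not survey results. `pooled_rate` is a hypothetical helper name.

```python
# HK Census 2002 age shares (from the slide above), as fractions of the 6+ population
CENSUS_SHARES = {"6-17": 0.164, "18-74": 0.800, "75+": 0.036}


def pooled_rate(rates, groups):
    """User rate (%) of a study population made up of the given age groups."""
    total_share = sum(CENSUS_SHARES[g] for g in groups)
    return sum(CENSUS_SHARES[g] * rates[g] for g in groups) / total_share


# Hypothetical subgroup user rates (%), invented for illustration only
rates = {"6-17": 70.0, "18-74": 50.0, "75+": 5.0}
for name, groups in [("6+", ["6-17", "18-74", "75+"]),
                     ("18-74", ["18-74"]),
                     ("18+", ["18-74", "75+"])]:
    print(name, round(pooled_rate(rates, groups), 1))
```

With a high rate among 6-17 and a low rate among 75+, the invented inputs reproduce the ordering 6+ > 18-74 > 18+ seen in the Hong Kong data, which suggests why the population definition alone moves the estimate.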
Impact of Population Definitions on Internet User Size
[Figure: Internet users as % of study population: 6+ 50.1%, 18-74 48.5%, 18+ 46.4%. Data: Hong Kong 2002]

Requirements of Minimal Usage

                          Minimal usage required?
Last usage specified?     Yes                    No
Yes                       ?                      ?
No                        CNNIC (1 hour/week)    WIP

Impact of Minimal Requirements on Internet User Size
[Figure: Internet users as % of study population, WIP (no minimal usage) vs. CNNIC (1 hour/week): 6+ 50.1% vs. 45.0%; 18-74 48.5% vs. 43.9%; 18+ 46.4% vs. 41.9%. Data: Hong Kong 2002]

Age Distribution of the Sample before and after Weighting
[Figure: % of sample by age group (6-9 to 80-84), unweighted vs. weighted. Data: Hong Kong 2002]

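The slides say only that weighting is "based on population census" (and, in the conclusion, that Internet use correlates with age and sex). One common way to do this is cell weighting (post-stratification): each respondent in a cell gets weight = population share / sample share. The sketch below uses age cells only, with the census shares from the study-population slide and a hypothetical 10-person sample; it is an assumed method, not necessarily the one used in the study.

```python
from collections import Counter


def poststrat_weights(cells, population_shares):
    """Weight for each cell = population share / sample share."""
    n = len(cells)
    counts = Counter(cells)
    return {cell: population_shares[cell] / (count / n) for cell, count in counts.items()}


# HK Census 2002 age shares (from the study-population slide)
census = {"6-17": 0.164, "18-74": 0.800, "75+": 0.036}

# Hypothetical sample of 10 respondents: (age cell, is Internet user).
# Young users are over-represented relative to the census.
sample = ([("6-17", True)] * 3 + [("18-74", True)] * 3
          + [("18-74", False)] * 3 + [("75+", False)])

weights = poststrat_weights([cell for cell, _ in sample], census)
unweighted = 100 * sum(user for _, user in sample) / len(sample)
weighted = (100 * sum(weights[cell] for cell, user in sample if user)
            / sum(weights[cell] for cell, _ in sample))
print(f"unweighted {unweighted:.1f}%, weighted {weighted:.1f}%")
```

Down-weighting the over-sampled young respondents lowers the user share, the same direction as the weighted vs. unweighted gap on the next slide. A real scheme would cross age with sex.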
Impact of Weighting Methods on Internet User Size
[Figure: Internet users as % of study population, unweighted vs. weighted: 6+ 55.3% vs. 50.1%; 18-74 54.0% vs. 48.5%; 18+ 51.2% vs. 46.4%. Data: Hong Kong 2002]

Summary: Internet Users by Population, Usage Requirement & Weighting Method
[Figure: Internet users (%) under twelve combinations of usage requirement (WIP, CNNIC), weighting (UW = unweighted, W = weighted) and study population (6+, 18-74, 18+); estimates range from 41.9% to 55.3%. Data: Hong Kong 2002]

A Mathematical Model of “True” Internet Users (TIU)

$$\mathrm{TIU} = 55.3 - 1.4\,SP_{18\text{-}74} - 3.7\,SP_{18+} - 4.5\,MU - 5.4\,SW$$

(Adjusted R² = 99.6%, Standard Error = 0.3 percentage points)

The intercept, 55.3, is the “unadjusted” share of Internet users (%) for HK in 2002 (study population 6+, no minimal usage, unweighted sample). SP, MU and SW are dummy variables (1 if the condition applies, 0 otherwise). The estimate should be 1.4 percentage points lower for a study population of 18-74, or 3.7 points lower for a study population of 18+; 4.5 points lower if those who use the Internet less than 1 hour per week are excluded; and 5.4 points lower if the sample is weighted based on the population census.

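As a check on the additive model, the sketch below evaluates the TIU equation for all twelve combinations and compares it with the estimates reported on the earlier slides. It only evaluates the published coefficients; it does not refit the regression (the unweighted CNNIC estimates are not listed on the slides). The function name `tiu` is a hypothetical label.

```python
from itertools import product


def tiu(population: str, cnnic_mu: bool, weighted: bool) -> float:
    """Predicted % of Internet users from the TIU equation.

    population: "6+" (baseline), "18-74" or "18+"
    cnnic_mu:   True if users of less than 1 hour/week are excluded (MU = 1)
    weighted:   True if the sample is census-weighted (SW = 1)
    """
    sp_18_74 = 1 if population == "18-74" else 0
    sp_18_plus = 1 if population == "18+" else 0
    return 55.3 - 1.4 * sp_18_74 - 3.7 * sp_18_plus - 4.5 * cnnic_mu - 5.4 * weighted


# Estimates reported on the earlier slides (Data: Hong Kong 2002).
# Keys: (population, cnnic_mu, weighted).
observed = {
    ("6+", False, False): 55.3, ("18-74", False, False): 54.0, ("18+", False, False): 51.2,
    ("6+", False, True): 50.1, ("18-74", False, True): 48.5, ("18+", False, True): 46.4,
    ("6+", True, True): 45.0, ("18-74", True, True): 43.9, ("18+", True, True): 41.9,
}

for key in product(["6+", "18-74", "18+"], [False, True], [False, True]):
    predicted = tiu(*key)
    reported = observed.get(key)
    note = f"reported {reported:.1f}" if reported is not None else "not reported"
    print(f"{key}: predicted {predicted:.1f}, {note}")
```

For the nine reported cells the predictions fall within 0.4 points of the observed values, consistent with the stated standard error.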
Impact of Population Definitions on Online Time (at Home)
[Figure: online time in minutes per week: 6+ 424, 18-74 412. Data: Hong Kong 2002]

Impact of Weighting Methods on Online Time (at Home)
[Figure: online time in minutes per week for the 6+ and 18-74 populations, unweighted vs. weighted; bar labels 517, 424, 468 and 412. Data: Hong Kong 2002]

Impact of Extreme Values on Online Time (at Home)
[Figure: online time in minutes per week for the 6+ and 18-74 populations, raw data vs. extreme values (EV) removed; bar labels 499, 424, 473 and 412. Data: Hong Kong 2002]

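The slides do not state how extreme values were identified. As one common option, the sketch below drops values outside Tukey's fences (more than 1.5 × IQR beyond the quartiles); the rule, the `remove_extremes` helper and the data are assumptions for illustration. It needs Python 3.8+ for `statistics.quantiles`.

```python
import statistics


def remove_extremes(values, k=1.5):
    """Drop values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical self-reported online time, minutes per week
minutes = [120, 180, 240, 300, 360, 420, 480, 600, 900, 2400]
kept = remove_extremes(minutes)
raw_mean = sum(minutes) / len(minutes)
trimmed_mean = sum(kept) / len(kept)
print(f"raw mean {raw_mean:.0f}, EV-removed mean {trimmed_mean:.0f}, "
      f"dropped {sorted(set(minutes) - set(kept))}")
```

Because self-reported online time is right-skewed, a single heavy user can pull the mean up sharply, which is why the treatment of EVs matters for time estimates but not for user counts.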
Impact of Data Collection (DC) Methods on Online Time
[Figure: online time in minutes per week, phone interview vs. online tracking: 6+ 424 vs. 236; 18-74 412 vs. 239. Data: HKS 2002 & Netratings 2002-03]

Summary: Online Time by SP, SW, DC, and EV
[Figure: online time in minutes per week under combinations of weighting (W/UW), study population (6+, 18-74) and extreme-value treatment (Raw/No EV), shown for each data collection method; labeled values include 581, 468, 241 and 209. Data: Hong Kong 2002]

A Mathematical Model of “True” Online Time (TOT)

$$\mathrm{TOT} = 532 + 16\,SP_{18\text{-}74} - 22\,SW - 49\,EV - 249\,DC$$

(Adjusted R² = 93.5%, Standard Error = 34.3 min.)

The intercept, 532, is the “unadjusted” online time (min. per week) for HK users in 2002 (study population 6+, unweighted sample, extreme values retained, telephone interview data). SP, SW, EV and DC are dummy variables. The estimate should be 16 min. higher for a study population of 18-74, 22 min. lower if the user sample is weighted, 49 min. lower if extreme values are removed, and 249 min. lower if data are collected through the online tracking method.

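The sketch below evaluates the TOT equation for every combination of the other three factors and prints the implied interview/tracking ratio. It uses only the published coefficients; `tot` is a hypothetical function name.

```python
from itertools import product


def tot(sp_18_74: bool, weighted: bool, ev_removed: bool, tracking: bool) -> int:
    """Predicted online time (min./week) from the TOT equation."""
    return 532 + 16 * sp_18_74 - 22 * weighted - 49 * ev_removed - 249 * tracking


for sp, sw, ev in product([False, True], repeat=3):
    phone, online = tot(sp, sw, ev, False), tot(sp, sw, ev, True)
    print(f"SP18-74={sp!s:<5} SW={sw!s:<5} EV removed={ev!s:<5} "
          f"interview={phone} tracking={online} ratio={phone / online:.2f}")
```

Because the DC effect is a fixed 249-minute offset in this additive model, the implied ratio is not constant: it runs from about 1.8 to 2.2 depending on the other settings, i.e., roughly two-fold.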
Caution: Different Definitions of “Online” Activities

Telephone interview data include:
- Online time both at home (68%) and elsewhere (32%);
- Non-HTTP-based activities, such as using POP3 email (136 min./week) and other protocols.

Online tracking data include:
- Online time only at home;
- Only HTTP-based activities.

It is estimated that tracking data may measure only 51% of the total online time.

Estimated Distribution of Online Time by Location and Protocol of Usage

                    Online activities
Usage location      HTTP-based                Non-HTTP    Total
Home                51% (online tracking)     17%         68%
Elsewhere           24%                       8%          32%
Total               75%                       25%         100%

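Each cell of the table equals the product of its row and column totals (e.g., 68% × 75% = 51%), which is what one obtains by treating location and protocol as independent. Whether that was the author's estimation method is an assumption; the sketch below only reproduces the arithmetic.

```python
# Marginal shares of total online time (from the table above)
location = {"Home": 0.68, "Elsewhere": 0.32}
protocol = {"HTTP": 0.75, "Non-HTTP": 0.25}

# Joint shares, assuming location and protocol are independent (product of marginals)
joint = {(loc, prot): location[loc] * protocol[prot]
         for loc in location for prot in protocol}
for (loc, prot), share in joint.items():
    print(f"{loc:9} {prot:8} {share:.0%}")

# Online tracking captures only home HTTP time
tracked_share = joint[("Home", "HTTP")]
print(f"share captured by tracking: {tracked_share:.0%}")
```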
Conclusion: How Many Internet Users Are There?
- The size of “Internet users” is significantly affected by the definition of study population (SP), the requirement of minimal usage (MU) and the method of sample weighting (SW).
- SP (e.g., general population vs. adults) may produce a difference of 1-4 percentage points, and MU (e.g., no requirement vs. 1 hour per week) a difference of up to 5 points. While there is no “correct” definition of SP or MU, it is important to report the definition used and, whenever possible, to adopt multiple definitions.
- SW (weighted vs. unweighted) may contribute a further difference of about 5 points. Since Internet use is highly correlated with age and sex, weighting the sample appears both necessary and effective for accurate measurement.

Conclusion: How Much Time Do They Spend Online?
- The amount of online time is only slightly, and not significantly, affected by SP (p = 0.3) and SW (p = 0.2), probably because the base of analysis is already restricted to users.
- Online time is significantly affected by the treatment of extreme values (EV), which may inflate online time by up to 10%. It is thus necessary to control for them (i.e., by removing EVs).
- Online time is most strongly affected by the method of data collection (DC, e.g., interviews vs. online tracking), which may produce a roughly two-fold difference. Although online tracking is generally more accurate, it is far more expensive and impractical in many societies. It is thus important to keep in mind the magnitude of inflation in self-reported data.

Ultimate Criteria for Evaluation
- Validity: how accurate or correct is the measure as compared with the “truth”?
- Reliability: how precise or stable is the measure over time and/or across space?
- Practicality: how efficient or economical is the measure in data collection and analysis?

Consistency in Measurement of Internet Users over Time and across Space*
[Figure: Internet users as % of sample in HK, Beijing and Guangzhou, 2000, 2001 and 2002]
* Based on WIP definition.

Stability in Measurement of Sex Ratio among Internet Users in Hong Kong
[Figure: female and male shares of Internet users in Hong Kong, 2000-2002; the two shares are 47%/53% in 2000 and 46%/54% in 2001 and 2002]

Stability in Measurement of Online Locations in Hong Kong
[Figure: online locations in Hong Kong, 2000, 2001 and 2002: home 62%, 69%, 62%; office 36%, 29%, 36%; elsewhere, the remainder]

Consistency in Difference between Methods across Age Cohorts

Age      Telephone interview   Online tracking   Interview/Tracking
18-19    10.72                 6.53              1.64
20-24    8.49                  5.54              1.53
25-29    7.06                  4.21              1.68
30-34    5.24                  3.62              1.45
35-39    5.50                  3.51              1.57
40-44    5.02                  2.98              1.69
45-49    3.72                  2.58              1.44
50-74    3.51                  1.84              1.91
Total    6.13                  4.28              1.43

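The last column is the ratio of the first two. The sketch below recomputes it from the table's rounded cell values (units as in the table) and reports the range across cohorts. Because it starts from rounded inputs, the last digit can differ slightly from the table (e.g., 40-44 gives 1.68 rather than 1.69), presumably because the table used unrounded values.

```python
# (telephone interview, online tracking) by age cohort, from the table above
table = {
    "18-19": (10.72, 6.53), "20-24": (8.49, 5.54), "25-29": (7.06, 4.21),
    "30-34": (5.24, 3.62), "35-39": (5.50, 3.51), "40-44": (5.02, 2.98),
    "45-49": (3.72, 2.58), "50-74": (3.51, 1.84), "Total": (6.13, 4.28),
}

ratios = {age: interview / tracking for age, (interview, tracking) in table.items()}
for age, r in ratios.items():
    print(f"{age:6} {r:.2f}")

cohort = [r for age, r in ratios.items() if age != "Total"]
print(f"cohort range: {min(cohort):.2f}-{max(cohort):.2f}")
```

Every cohort shows interview figures above tracking figures, within a fairly narrow band, which is the consistency the slide title refers to.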
Final Verdicts
- Measurement of Internet users and online time based on interview data is largely reliable over time and across space.
- The interview-based measurement is generally more practical than the online tracking method.
- The interview-based measurement is generally weaker in validity than the online tracking method. However, it could be adjusted if its departure from the “truth” is known (e.g., based on comparison with online tracking data).
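One simple form of the adjustment suggested above is to divide an interview-based figure by the interview/tracking ratio for the relevant age cohort. The sketch assumes the ratios from the age-cohort table stay stable over time; `calibrate` and the input values are hypothetical. Note that, per the caution slide, the result estimates what tracking would record (home, HTTP-only time), not total online time.

```python
# Interview/tracking ratios by age cohort (last column of the age-cohort table)
RATIO = {"18-19": 1.64, "20-24": 1.53, "25-29": 1.68, "30-34": 1.45, "35-39": 1.57,
         "40-44": 1.69, "45-49": 1.44, "50-74": 1.91, "Total": 1.43}


def calibrate(self_reported: float, cohort: str = "Total") -> float:
    """Hypothetical adjustment: scale an interview-based figure to the tracking benchmark."""
    return self_reported / RATIO[cohort]


# Hypothetical new self-reported figures, in the same units as the table
print(round(calibrate(8.0, "20-24"), 2))
print(round(calibrate(6.0), 2))
```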