Conceptual and Operational Issues in the Measurement of Internet Use*

Jonathan Zhu
City University of Hong Kong
enjhzhu@cityu.edu.hk

* Funded by the UGC of HKSAR (CityU1152/00H)

CNNIC Symposium 2003


Background: the Diffusion of the Internet in Hong Kong, Beijing and Guangzhou

[Figure: line chart, % of 18-74 population (0-60), 1993-2002; series: Hong Kong, Beijing, Guangzhou. Source: J. H. Zhu (2003)]


Internet Penetration Rate in East Asia

[Figure: bar chart, % of adult population, for Japan, Hong Kong, Singapore, Taiwan, BJ/GZ and Macau; bar labels as extracted: 50%, 49%, 41%, 38%, 37%, 33%]


Wired Internet Use vs. Wireless Internet Use

[Figure: bar chart (0-80%) for Hong Kong, BJ/GZ and Japan; series: PC Home, Broadband Home, Wireless Web Users]


Diffusion of Cable TV, the Internet, and Mobile Phone in Hong Kong

[Figure: line chart, % of population (0-100%), 1990-2002; series: Internet Users, Mobile Users, Cable TV]


Internet vs. Mobile Phone in Beijing and Guangzhou

[Figure: line chart, % of 18-74 population (0-60), 1993-2002; series: BJ Web, GZ Web, BJ Mobile, GZ Mobile]


Issues in Measurement of Internet Use and Users

The size of “Internet users” in a society is a function of:
  - Definition of study population (SP)
  - Method of sample weighting (SW)
  - Requirement of minimal usage (MU)

The amount of “online time” by Internet users is a function of:
  - Definition of study population (SP)
  - Method of sample weighting (SW)
  - Method of data collection (DC)
  - Treatment of extreme values (EV)


Criteria for Evaluation of Measurement

  - Validity: how accurate or correct is the measure as compared with the “truth”?
  - Reliability: how precise or stable is the measure over time and/or across space?
  - Practicality: how efficient or economic is the measure in data collection and analysis?


Data

  - Hong Kong Survey 2002: telephone interviews of 1,800 residents aged 6 and above in Dec. 2002 by Jonathan Zhu and his team
  - AC Nielsen/Netratings 2002-03: online tracking of 1,500 Internet users from 811 households in Hong Kong in Oct. 2002 and Jan. 2003


Definitions of Study Population

  - WIP-Hong Kong: 18-74
  - CNNIC: 6+
  - Another popular definition: 18+

HK Census 2002:
  Age group    Share of population
  6-17         16.4%
  18-74        80.0%
  75+          3.6%


Impact of Population Definitions on Internet User Size

  Study population    Internet users (% of study population)
  6+                  50.1%
  18-74               48.5%
  18+                 46.4%

Data: Hong Kong 2002


Requirements of Minimal Usage

                           Minimal Usage Required?
  Last Usage Specified?    Yes                    No
  Yes                      ?                      ?
  No                       CNNIC (1 hour/week)    WIP


Impact of Minimal Requirements on Internet User Size

  Study population    WIP      CNNIC
  6+                  50.1%    45.0%
  18-74               48.5%    43.9%
  18+                 46.4%    41.9%

(% of study population) Data: Hong Kong 2002


Age Distribution of the Sample before and after Weighting

[Figure: bar chart, % of sample (0-16), by age group from 6-9 to 80-84; series: Unweighted, Weighted. Data: Hong Kong 2002]


Impact of Weighting Methods on Internet User Size

  Study population    Unweighted    Weighted
  6+                  55.3%         50.1%
  18-74               54.0%         48.5%
  18+                 51.2%         46.4%

(% of study population) Data: Hong Kong 2002


Summary: Internet Users by Population, Usage Requirement & Weighting Method

[Figure: bar chart (0-60%) of the 12 combinations of usage requirement (WIP, CNNIC), weighting (UW, W) and study population (6+, 18-74, 18+); labeled values: 55.3% and 41.9%. Data: Hong Kong 2002]


A Mathematical Model of “True” Internet Users (TIU)

  TIU = 55.3 - 1.4 SP_{18-74} - 3.7 SP_{18+} - 4.5 MU - 5.4 SW
  (Adjusted R^2 = 99.6%, Standard Error = 0.3%)

where the intercept (55.3) is the “unadjusted” Internet user rate (%) for HK in 2002, which should be 1.4% less for a study population of 18-74, or 3.7% less for a study population of 18+, or 4.5% less if those who use the Internet less than 1 hour per week are excluded, or 5.4% less if the sample is weighted based on the population census.


Impact of Population Definitions on Online Time (at Home)

  Study population    Minutes per week
  6+                  424
  18-74               412

Data: Hong Kong 2002


Impact of Weighting Methods on Online Time (at Home)

  Study population    Unweighted    Weighted
  6+                  517           424
  18-74               468           412

(Minutes per week) Data: Hong Kong 2002


Impact of Extreme Values on Online Time (at Home)

  Study population    Raw Data    EV Removed
  6+                  499         424
  18-74               473         412

(Minutes per week) Data: Hong Kong 2002


Impact of Data Collection (DC) Methods on Online Time

  Study population    Phone Interview    Online Tracking
  6+                  424                236
  18-74               412                239

(Minutes per week) Data: HKS 2002 & Netratings 2002-03


Summary: Online Time by SP, SW, DC, and EV

[Figure: bar chart (0-600) of two sets of eight combinations of weighting (W, UW), study population (6+, 18-74) and extreme-value treatment (Raw, No EV); labeled values: 581, 468, 241, 209. Data: Hong Kong 2002]


A Mathematical Model of “True” Online Time (TOT)

  TOT = 532 + 16 SP_{18-74} - 22 SW - 49 EV - 249 DC
  (Adjusted R^2 = 93.5%, Standard Error = 34.3)

where the intercept (532) is the “unadjusted” online time (min.) for HK users in 2002, which should be 16 min. more for a study population of 18-74, 22 min. less if the user sample is weighted, 49 min. less if extreme values are removed, or 249 min. less if data are collected through the online tracking method.


Caution: Different Definitions of “Online” Activities

Telephone interview data include:
  - Online time at both home (68%) and elsewhere (32%);
  - Non-HTTP based activities such as using POP3 Email (=136 min./week) and other protocols.

Online tracking data include:
  - Online time only at home;
  - Only HTTP-based activities.

It is estimated that tracking data may measure only 51% of the total online time.


Estimated Distribution of Online Time by Location and Protocol of Usage

                    Online Activities
  Usage Location    HTTP based               Non-HTTP    Total
  Home              51% (Online Tracking)    17%         68%
  Elsewhere         24%                      8%          32%
  Total             75%                      25%         100%


Conclusion: How Many Internet Users Are There?

  - The size of “Internet Users” is significantly affected by the definition of study population (SP), the requirement of minimal usage (MU) and the method of sample weighting (SW).
  - SP (e.g., general population vs. adults) may produce a difference of 1-4% and MU (e.g., no requirement vs. 1 hour per week) up to 5%. While there is no “correct” definition of SP or MU, it is important to report the definition used and to adopt, whenever possible, multiple definitions.
  - SW (weighted vs. unweighted) may contribute another 5% difference. Since Internet use is highly correlated with age and sex, it seems both necessary and effective to weight the sample to ensure the accuracy of the measurement.


Conclusion: How Much Time Do They Spend Online?

  - The amount of online time is only marginally affected by SP (p = 0.3) and SW (p = 0.2), probably because the base of analysis is already restricted to users.
  - Online time is significantly affected by the treatment of extreme values (EV), which may inflate online time by up to 10%. It is thus necessary to control for them (i.e., by removing EVs).
  - Online time is most significantly affected by the method of data collection (DC, e.g., interviews vs. online tracking), which may result in a two-fold difference. Although online tracking is generally more accurate, it is far more expensive and impractical in many societies. It is thus important to keep in mind the magnitude of inflation in self-reported data.


Ultimate Criteria for Evaluation

  - Validity: how accurate or correct is the measure as compared with the “truth”?
  - Reliability: how precise or stable is the measure over time and/or across space?
  - Practicality: how efficient or economic is the measure in data collection and analysis?


Consistency in Measurement of Internet Users over Time and across Space*

[Figure: bar chart, % of sample (0-50%), for HK, Beijing and Guangzhou in 2000, 2001 and 2002]

* Based on WIP definition.


Stability in Measurement of Sex Ratio among Internet Users in Hong Kong

[Figure: stacked bar chart (0-100%) for 2000, 2001 and 2002; legend: Female, Male; upper segment labels: 47%, 46%, 46%; lower segment labels: 53%, 54%, 54%]


Stability in Measurement of Online Locations in Hong Kong

[Figure: stacked bar chart (0-100%) for 2000, 2001 and 2002; legend: Office, Elsewhere, Home; Home: 62%, 69%, 62%; upper segment labels: 36%, 29%, 36%]


Consistency in Difference between Methods across Age Cohorts

  Age      Telephone Interview    Online Tracking    Interview/Tracking
  18-19    10.72                  6.53               1.64
  20-24    8.49                   5.54               1.53
  25-29    7.06                   4.21               1.68
  30-34    5.24                   3.62               1.45
  35-39    5.50                   3.51               1.57
  40-44    5.02                   2.98               1.69
  45-49    3.72                   2.58               1.44
  50-74    3.51                   1.84               1.91
  Total    6.13                   4.28               1.43


Final Verdicts

  - Measurement of Internet users and online time based on interview data is largely reliable over time and across space.
  - The interview-based measurement is generally more practical than the online tracking method.
  - The interview-based measurement is generally weaker in validity than the online tracking method. However, it could be adjusted if the departure from the “truth” is known (e.g., based on comparison with online tracking data).
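
As an illustration only, the two dummy-variable models reported in the slides (TIU for the share of Internet users, TOT for weekly online time) can be sketched in Python as below. The coefficients are copied from the slides; the function names, argument names and the 0/1 coding of each factor are assumptions made for this sketch and are not part of the original analysis.

```python
# Hypothetical helpers: names and argument coding are illustrative assumptions.
# Coefficients come from the TIU and TOT equations quoted in the slides.

def predict_internet_users(sp="6+", minimal_usage=False, weighted=False):
    """Predicted Internet users (% of study population) under the TIU model."""
    sp_effect = {"6+": 0.0, "18-74": -1.4, "18+": -3.7}  # study population (SP)
    estimate = 55.3 + sp_effect[sp]                      # 55.3 = intercept
    if minimal_usage:   # MU: users below 1 hour per week are excluded
        estimate -= 4.5
    if weighted:        # SW: sample weighted to the population census
        estimate -= 5.4
    return round(estimate, 1)


def predict_online_time(sp="6+", weighted=False, ev_removed=False, tracking=False):
    """Predicted online time (minutes) under the TOT model."""
    sp_effect = {"6+": 0, "18-74": 16}                   # study population (SP)
    estimate = 532 + sp_effect[sp]                       # 532 = intercept
    if weighted:        # SW: user sample is weighted
        estimate -= 22
    if ev_removed:      # EV: extreme values removed
        estimate -= 49
    if tracking:        # DC: online tracking instead of interviews
        estimate -= 249
    return estimate


if __name__ == "__main__":
    # CNNIC definition, weighted, 18+ (the slides report an observed 41.9%)
    print(predict_internet_users("18+", minimal_usage=True, weighted=True))  # 41.7
    # Interview-based, weighted, extreme values removed, 6+ population
    print(predict_online_time("6+", weighted=True, ev_removed=True))         # 461
```

Comparing such predictions with the observed figures in the slides gives a rough sense of how closely the additive model tracks each combination of definitions; it does not replace re-estimating the model from the survey data.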