Prepared for the MISRC/CRITO Symposium on the Digital Divide Carlson School of Management, University of Minnesota , August, 2004. NOT TO BE QUOTED WITHOUT PERMISSION OF AUTHORS DETERMINANTS OF INTERNET UTILIZATION IN THE US: DEMOGRAPHIC AND SPATIAL-ECONOMIC FACTORS John B. Horrigan Pew Internet and American Life Project 1100 Connecticut Ave, NW Suite 710 Washington, DC 20036 202/557-3465 (voice), jhorrigan@pewinternet.org Chandler Stolp LBJ School of Public Affairs University of Texas at Austin Austin, TX 78712 512/471-8951, stolp@mail.utexas.edu Robert H. Wilson LBJ School of Public Affairs University of Texas at Austin Austin, TX 78712 512/475-7906 (voice), 475-7909 (fax), rwilson@mail.utexas.edu Key Words: Internet utilization, Internet users, Geography and the Internet Abstract Networked information technologies have become a critical component of today’s society. The Internet segment of this industry is believed to have reached 945 million users worldwide in 2004 and has been forecast to reach nearly 1.5 billion users by 2007. Electronic commerce has exceeded expectations, with $2.4 trillion in business-to-business e-commerce and $95 billion in consumer e-commerce conducted in the United States in 2003. Given this expanded use of the Internet, a series of questions emerge concerning who uses the Internet and for what purposes. This paper utilizes a unique data set of over 25,000 survey responses generated by the Pew Internet and American Life Project in 2000. Several multivariate logit models of Internet use are estimated as a function of several demographic characteristics, such as gender, educational attainment, age, income, and workforce status, and spatial-economic factors. Although the demographic characteristics largely explain utilization, the spatial-economic factors associated with location of users play a secondary, but distinguishable, role. 
Acknowledgements: The authors wish to thank Ms Mee Young Han, formerly a graduate student at the LBJ School of Public Affairs, for her valuable assistance in developing the database and statistical analysis. In addition, the authors acknowledge the Pew Internet and American Life Project for making the data available and the Mike Hogg Urban Policy Professorship for research support of the project.

Networked information technologies have become a critical component of today’s society. The Internet segment of this industry is believed to have reached 945 million users worldwide in 2004 and has been forecast to reach nearly 1.5 billion users by 2007 (ClickZ, 2004). Electronic commerce has exceeded expectations from the heady days of the dot-com boom, with $2.4 trillion in business-to-business e-commerce and $95 billion in consumer e-commerce conducted in the United States in 2003 (Business Week, 2003).

The utilization of the Internet is of great interest to many parties. Internet service providers (ISPs) must understand opportunities for market expansion. For the remarkable range of businesses utilizing the Internet to reach customers, knowledge of demographic factors affecting utilization is critical. Recognizing the fundamental importance of Internet communications in contemporary society, social scientists and communications researchers are keenly interested in the reasons for and the consequences of utilization as well as in determining who is not using the Internet and why.

Public policy decisionmakers also have a broad interest in Internet utilization. The high levels of telephony penetration in the US are the result of a public commitment to universal service, dating from the 1930s.
Today’s policymakers are facing similar challenges concerning Internet utilization, a debate often framed as the digital divide. In addition, a wide range of Internet-based applications in the provision of public services and in the management of public infrastructure systems has proven effective, thus raising additional issues concerning citizen access. Fiscal pressures at all levels of government have forced decisionmakers to explore possible productivity-enhancing applications of advanced telecommunication systems. The feasibility of e-government, including electronic access to services and information, will be at least partially determined by success in making Internet access universal. In addition, the geography of Internet deployment and utilization may have important implications for regional development policy.

This study utilizes a unique data set of over 25,000 responses to a survey in the US, generated by the Pew Internet and American Life Project (Pew Project) in 2000, to examine several questions relating to Internet use. The paper first reviews the existing literature on demographic determinants of Internet usage, such as gender, educational attainment, age, income, and workforce status. Gross utilization rates by demographic categories are generated with the Pew data and these are contrasted with the findings in the literature. A series of logit models are then estimated to provide for more complex modeling of demographic and spatial-economic determinants of usage. Characteristics of the 2,576 counties in which respondents live are incorporated into the modeling of Internet use. The paper concludes with suggestions for further research.

1.0 Determinants of Internet Use: Findings in the Literature

The demographic characteristics of Internet users have been investigated by a number of researchers, but efforts in this field face several difficulties. The diffusion of information technology has been quite rapid (Leigh and Atkinson, 2001; Department of Commerce, 2002) and the rates of utilization are subject to rapid change over time. Furthermore, given the multiple sites (at home, at work, at school and at other locations) and multiple purposes of Internet usage (to make purchases, obtain information, entertainment, social interaction, and others), variation in the effects of demographic factors across different settings must be considered.[1] In addition, the usage at one site may affect the individual’s usage at other sites (Department of Commerce, 2002). For example, after exposure to computers and the Internet at work, an individual may decide to use the Internet at home.

Non-demographic factors of Internet access, such as availability of computers and ISPs, have also been explored. Some researchers (Novak and Hoffman, 1998; Chaudhuri, Flamm and Horrigan) have chosen to investigate computer ownership and Internet access as a joint decision. Access from any site requires a computer to be available. In terms of home usage, purchase of a computer must precede the decision to access the Internet. Others, adopting a microeconomic approach, attempt to integrate the quality and cost of Internet access into the analysis of the decision to use the Internet (Kridel, Rappoport, and Taylor 1999; Chaudhuri, Flamm, Horrigan). It is hypothesized that the decision of the consumer to acquire access to the Internet will depend on the price as well as the quality of the service (Government Accounting Office). This question raises the intriguing issue of whether areas of intense competition among service providers will benefit from lower prices and, as a consequence, higher utilization.
The Pew Project data do not include information on infrastructure availability and prices. As a result, this analysis should be considered exploratory in nature.[2]

Returning to the literature, education and income of individuals have invariably been found to affect Internet usage (Novak and Hoffman, 1999; UCLA, 2000; Leigh and Atkinson, 2001; Government Accounting Office, 2001; Department of Commerce, 2002). Internet utilization rates increase with levels of income and education. Presumably, the more highly educated place greater value on services and information provided on the Internet and possess the intellectual capital to utilize it. Higher levels of income would certainly be associated with an enhanced ability to purchase computers and access. It has been noted, however, that more rapid increases in penetration have recently occurred among lower income groups (Department of Commerce, 2002).

[1] Since the Pew Project dataset used in this study does not include individuals below the age of 18 and does not consider usage at school, this study does not explore how Internet use at school may relate to overall usage at home or at work.

[2] The models estimated in this project are effectively reduced form equations, incorporating both the direct and indirect effects of a variety of demographic factors on Internet utilization.

Another factor found to be important is the age of the individual; utilization rates decline with age, but not necessarily with aging (Department of Commerce, 2002). The results concerning gender are less clear.
Some studies have found that women were significantly less likely than men to use the Internet, but that the gender gap with respect to use per se had narrowed by 2000, although there remain gender differences in the intensity of internet use (Ono and Zavodny, 2003; Bimber, 2000). One large study found virtually no difference in utilization among men and women (Department of Commerce, 2002). These results suggest distinct diffusion patterns among men and women, but with women converging on the utilization rate of men. Researchers have been quite concerned with the impact of race/ethnicity on Internet utilization. The concept of digital divide emerged from a concern that differential rates of utilization by poverty status, race, or other factors were preventing a large segment of the population from fully participating in a modern, communications-based society. Several studies produced by U.S. governmental agencies found significant differences in utilization by race/ethnicity with higher rates among the white population (Department of Commerce, 2002 and 2000; Government Accounting Office, 2001). Novak and Hoffman find differences between whites and blacks in computer ownership and Internet usage in their study using 1997 survey data (Novak and Hoffman). Computer ownership was higher for whites, holding income constant, but statistically significant differences between whites and blacks were few. In terms of Internet usage, once education, gender and income were held constant, the race of users produced few statistically significant differences. The authors conclude that blacks were quickly achieving usage profiles similar to whites. The study also found similar utilization rates for males and females. In later research, with expanded datasets, Hoffman, Novak, and Schlosser, once again found income and education to be key factors explaining Internet use, but higher rates of use are found among whites than blacks in similar demographic segments. 
Differences are more accentuated at similar levels of education status than at similar levels of income (Hoffman, Novak, and Schlosser, 1998). The Internet utilization rate among Hispanics has been estimated to be 50 percent among individuals over 18 years of age (Spooner, 2001), but rates of utilization vary substantially by site, with higher usage at home than at work. Internet penetration among Hispanics grew rapidly between March 2000 and March 2001.

The existing research literature focuses primarily on the demographic determinants of Internet utilization. Some studies, however, have analyzed variation in utilization across space. Substantial rural-urban disparities in Internet utilization, for example, have been found (Leigh and Atkinson, 2001; Department of Commerce, 2002). It has also been demonstrated that cities on the eastern and western seaboards display higher densities of Internet infrastructure than cities in the interior of the country, the so-called “coastal effect” (Gorman and Malecki, 2000).

The investigation of the geography of utilization is warranted for several reasons. The availability of Internet services is not spatially ubiquitous and this may well lead to variation in prices of access to services and in rates of utilization. Furthermore, demographic variables themselves are not uniform over space. The distribution of levels of educational attainment, for example, varies dramatically across states and regions (Wilson, 1993). It is also well-established that many of the rapidly growing service sectors, such as finance and insurance, business services, computer and data processing, real estate, wholesale and retail trade, and hotels are intensive users of telecommunications services (Wilson, 1993). Firms in these sectors are not uniformly distributed across the U.S.
and several are relatively concentrated in larger cities such as Atlanta, Chicago, Dallas, Los Angeles and New York. This concentration of intensive users is leading to the provision of superior telecommunications infrastructure in these cities (Schmandt, Williams, Wilson, and Strover, 1990; Greenstein, Lizardo, Spiller, 1997; Moss and Townsend, 2000). As a result, one might expect Internet utilization to be higher in areas with relatively high levels of these telecommunications-intensive sectors, independent of the demographic characteristics of individuals. Therefore, the potential role of spatial characteristics in explaining Internet utilization will be examined below.

2.0 Gross Utilization Rates in the Pew Project Data

The Pew Project conducts random-digit dial telephone surveys of Americans age 18 and older, focusing on whether people use the Internet, where they gain access to the Internet, and what they do online. The question that measures Internet use is phrased as follows: “Do you ever go online to access the Internet or World Wide Web or to send and receive email?” This captures Internet use at home, work, or other places people may access the Internet. The survey methodology reflects the research standards of AAPOR, the American Association for Public Opinion Research.[3] Data used in the analysis for this paper reflect surveys conducted from March 2000 through December 2000; aggregating the surveys yields a sample size of more than 25,000 individuals in over 2,500 counties across the United States for the year 2000.[4]

To initiate the examination, gross utilization rates for various groups of respondents were calculated without controlling for intervening or inter-related variables (Table 1).
These rates must be interpreted with caution since the relatively low utilization rate for a particular group of people may not be the result of the variable defining the group, such as non-married status, but rather of some other variable associated with non-married status, such as age. Although the analysis below controls for such indirect influences, gross utilization rates will be briefly reviewed since the literature discussed above tends to adopt this type of analysis.

[3] The telephone sample is provided by Survey Sampling International, LLC (SSI) according to specifications provided by Princeton Survey Research Associates International, the firm Pew Internet contracts with to administer surveys. The sample is drawn using standard list-assisted random digit dialing (RDD) methodology. Active blocks of telephone numbers (area code + exchange + two-digit block number) that contain three or more residential directory listings are selected with probabilities in proportion to their share of listed telephone households; after selection, two more digits are added randomly to complete the number. This method guarantees coverage of every assigned phone number regardless of whether that number is directory listed, purposely unlisted, or too new to be listed. Selected numbers are called 10 times in efforts to get responses; response rates for surveys conducted by the Pew Internet Project are typically about 33%.

[4] Data from the Pew Internet Project is available online at: http://www.pewinternet.org/data.asp.

[Insert Table 1]

The Pew Project survey respondents demonstrate patterns of utilization similar to those found in the literature: for example, the higher the level of educational attainment, the higher the rate of Internet utilization.
For those with less than eight years of education, the (unconditional) probability of utilization is a little over 4 percent, but for those with graduate level educational attainment the probability is close to 80 percent. Family income also has a positive effect on utilization: the higher the family income, the greater the likelihood of Internet utilization. For individuals in families with annual household incomes of less than $10,000, the rate of utilization of the Internet is about 23 percent, but for those above $100,000 the rate is 80 percent.

The survey also reveals gender differences in utilization, with the rate among men being 56 percent compared to 49 percent for women (Table 1). The findings in the literature concerning the effect of gender on utilization, discussed above, were mixed. Some studies indicated that women have lower rates of utilization than men, while others reported essentially similar rates.

Being a parent is found to have a significant effect on utilization in the Pew Project survey. Married people show a utilization rate of 57 percent and respondents with children a rate of 62 percent. Individuals without children and unmarried individuals have substantially lower rates. Student status has also been reported in the literature to be a significant determinant, suggesting that having children at home who are exposed to the Internet at school may well have a positive effect on their parents’ use at home.

The gross utilization rates in the Pew Project survey support the argument that Internet use differs by race/ethnicity (Table 1). The survey data show 54 percent utilization for whites, 39 percent for blacks, 47 percent for Hispanics, and 57 percent for Asian/other. These pronounced differences are similar to those found in several of the research results discussed above (Department of Commerce, 2002 and 2000; Government Accounting Office, 2001; Pew, 2001).

The employment status of an individual has a substantial impact on utilization.
Those employed demonstrate a utilization rate more than 30 percentage points higher than those not employed (Table 1).

In addition to providing a measure of Internet use in general (the "USER" dataset), the Pew Project survey allows for a narrower measure of utilization that allows us to examine use at work (the "WORK" subset of the USER dataset) for those individuals who were employed (Table 2). One can hypothesize that the effect of demographic variables may differ when the Internet is being used in a job-related activity as compared to its utilization at other locations. For example, having a child might affect a parent’s utilization at home but not at work.

[Insert Table 2]

The county of residence of each respondent is provided in the Pew Project survey, permitting the examination of the spatial variation in Internet utilization. Substantial variation across regions for the two measures of utilization is found (Table 2, Part a). The Northeast and, especially, the Western regions of the country reveal higher rates than the Midwest and South with respect to USER. This finding is consistent with the reporting of a “coastal effect” where cities on the eastern and western seaboards display higher densities of Internet infrastructure than cities in the interior of the country (Gorman and Malecki, 2000). In terms of utilization at work, the Western region shows distinctively higher utilization than the other three.

The county in which a respondent resides was classified in one of four categories: counties outside a Metropolitan Statistical Area (MSA, a definition of the U.S.
Census Bureau), counties that constitute a single-county MSA, central counties of multi-county MSAs, and non-central counties of multi-county MSAs (Table 2, Part b).[5] The first category consists of sparsely populated counties with an average population of around 29,000. The latter two categories include counties within multi-county MSAs but distinguish between central counties, which normally contain the largest central city of the metropolitan area, and the surrounding suburban counties. Internet utilization in non-MSA counties is substantially lower than in other types of counties for both measures of utilization. The Single-County MSA type has the highest average level of utilization among all county types for USER. For the WORK variable, the Central County and Suburban County categories demonstrate the highest average rates of utilization.

To explore the possible effects on utilization of the interaction among variables, utilization at work is also cross-tabulated by region and county type (Table 3). Non-MSA counties in the South and the Midwest show particularly low rates of utilization. The level of utilization among county types in the Northeast varies to a lesser extent than in the other three regions. Utilization rates in the Single-County MSA category in the Midwest and West are relatively high. The utilization rate in the Central County category is the highest of all county types in the Northeast. Suburban counties have the highest utilization in the other regions, and the rate is especially high in the West. This somewhat puzzling result may be explained by the historical pattern of urban development. In the Northeast, central cities retain their historical importance as a place of work, while metropolitan areas in the West have followed a much more decentralized pattern due to their growth after the general availability of the automobile and supporting infrastructure.
These results confirm a complex pattern of Internet diffusion across the country noted by others (Moss and Townsend, 1998; Gorman and Malecki, 2000).

[5] In the Northeast, boundaries of townships often do not coincide with MSA designations. Adjustments in population of the respondent’s county were made in instances where boundaries were not coterminous.

3.0 Modeling Internet Utilization

Building upon the analysis of unconditional, univariate utilization rates, discussed in the previous section, attention now turns to examining Internet usage as a function of demographic and spatial-economic characteristics. A series of logistic regression models are estimated in which the dependent variable is a binary indicator of Internet usage (usage=1, nonusage=0) expressed as a function of a linear combination of explanatory variables. Maximum likelihood statistical techniques are employed to estimate the marginal contribution that each explanatory variable makes to the probability of Internet usage.

Venturing into the world of conditional probabilities of usage (i.e., the probability of Internet use given a vector of values for the explanatory variables) suggests a causal relationship between the explanatory variables and usage. The nexus of causality surrounding Internet use, however, is far more complex than this and remains a rich territory for future research. The models estimated in this exploratory study are, again, best understood as reduced-form expressions of more complicated structural relations, ones that nevertheless serve to illuminate the mechanisms driving Internet use. In particular, the Pew Project data do not include information on the price of Internet use incurred by respondents.
Although the effects of this important variable may be indirectly captured by spatial-economic variables, its explicit effect will not be examined in the modeling presented here.

With this introduction, attention can now turn to the results of the logit modeling exercise, first for Internet use at any site analyzing the USER dataset (Section 3.1), and then for Internet use at work with the WORK dataset (Section 3.2). The nonlinear functional form in which logit models express the relationship between a linear combination of explanatory variables and the dependent variable makes it difficult to interpret the estimated coefficients directly. Consequently, the marginal effect of each explanatory variable on Internet use is, with the exception of Figure 3 (below), expressed in terms of the partial odds ratio of utilization (i.e., the antilog of the estimated coefficients) rather than in terms of its impact on the probability of utilization.[6] The partial odds ratio represents the multiplicative change in the odds in favor of usage over non-usage attributable to a unit change in a given explanatory variable, holding all other explanatory variables constant. For binary (dummy) indicator variables, like most of the variables in the Pew dataset, the partial odds ratio is interpreted as the marginal impact on the odds in favor of usage due to the presence of the indicator (e.g., setting PARENT=1 for a respondent who is a parent or guardian of a child under 18 years of age, in contrast to PARENT=0 for a non-parent respondent), ceteris paribus.

[6] “Odds” are calculated as p/(1-p), where p is the probability of Internet use; that is, the odds are the probability of usage divided by the probability of non-usage. The partial odds ratio compares these odds after and before a unit change in an explanatory variable. As such, a partial odds ratio of 1 for a given explanatory variable suggests no impact on the choice to use the Internet. Consequently, the null-hypothetical value for testing the statistical significance of partial odds ratios is 1 rather than 0.

3.1 Determinants of Internet Use

Modeling Internet usage with the dependent variable USER (Y=USER=1 if the respondent uses the Internet, 0 if not) produced quite strong statistical results (Table 4). The share of concordant pairs, one measure of the ability of a logit model to predict the correct outcome, for the models is over 83 percent, indicating strong predictive power.[7] The large number of odds ratios that are significant at the α=0.01 level or better is another indicator of the explanatory power of the models, but one that is also partially attributable to the large sample size over which the models were estimated.

[Insert Table 4]

The models are also quite robust. The odds ratios for almost all of the variables are remarkably stable across the four specifications, a noteworthy outcome in light of the large sample size and of the complex nonlinearity of a logit specification.[8] Four specifications are presented in Table 4: Model 1 explains Internet use solely in terms of the personal characteristics of the individual; Model 2 adds basic county characteristics and population to the mix; Model 3 substitutes industry characteristics (discussed in detail, below) for the county characteristics in Model 2; Model 4 includes both county and industry characteristics, along with population interactions with both these sets of spatial-economic variables.

Two summary statistics are offered to evaluate the four models in Table 4, the Akaike Information Criterion (AIC) and the Schwarz Bayesian Information Criterion (SBIC). Both are based on the log of the likelihood function at its maximum, but include a penalty for including an excessive number of parameters.
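As a rough illustration of how the two information statistics trade off fit against model size, the following Python sketch computes both from a maximized log-likelihood. It assumes the standard textbook definitions (AIC = 2k - 2 ln L and SBIC = k ln n - 2 ln L, with k parameters and n observations); the function names and all numeric inputs are hypothetical and are not taken from Table 4.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion, 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def sbic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Schwarz Bayesian Information Criterion, k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


if __name__ == "__main__":
    # Hypothetical values for illustration only; not estimates from this paper.
    small = {"log_likelihood": -12000.0, "n_params": 20}
    large = {"log_likelihood": -11970.0, "n_params": 40}
    n_obs = 25000

    for name, model in (("small", small), ("large", large)):
        print(name, round(aic(**model), 1), round(sbic(n_obs=n_obs, **model), 1))
```

With these made-up inputs the larger model has the lower AIC while the smaller model has the lower SBIC, because with n near 25,000 the value of ln n is roughly 10, so each added parameter costs about five times as much under the SBIC as under the AIC.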
The SBIC is less forgiving of additional parameters than the AIC. Identifying the "best" model is a classical unsolved problem in statistics; nevertheless, in the exploratory spirit of this study, these information statistics provide a convenient way to evaluate these four reduced-form models. The lower the AIC or SBIC, the "better" the explanatory power of the model. Accordingly, the AIC suggests that Model 4 provides the best fit to the data, while the SBIC gives the nod to Model 2. From the AIC perspective, all four models are statistically distinguishable from one another with p-values in pairwise tests of less than 0.001 for all but a still significant contrast between Models 3 and 4 at α<0.01.[9] The pairwise contrasts are less crisp from the SBIC perspective. Models 1 and 4 and Models 3 and 4 are statistically indistinguishable at the conventional α<0.05; the remaining pairs are significantly different, two (Model 1 vs 2 and 2 vs 4) at α<0.05, one (1 vs 2) at α<0.01, and one (2 vs 3) at α<0.001.

In the sections that follow, we adopt the more expansive AIC perspective and generally focus the discussion on Model 4. We turn first, in the section that follows, to the demographic characteristics of individuals, then in Section 3.1.2 to the spatial-economic variables.

[7] If the models offered no predictive power, the concordance rates would be around 50 percent (i.e., predicting usage with these models would be no better than that achieved by flipping a coin). The models, in other words, improve upon randomly flipping a fair coin to predict Internet usage by more than 33 percentage points.

[8] The one striking exception to cross-model stability is the odds ratio for Population, which is 1.00 (rounded) in Models 2 and 3 yet jumps to 52.55 in Model 4. This is explained by the presence of population interaction terms in Model 4, especially those associated with the indicator variables for county type. Multiplying the odds for the population interaction with county type, all of which round to 0.03, by the population odds ratio of 52.55 yields odds of 1.78, a figure nearly identical to what would be obtained with Model 4 without the "population by county type" interaction terms (odds=1.75, not shown here).

[9] The difference in information statistics (either AIC or SBIC) across models is asymptotically chi-square distributed with degrees of freedom equal to the difference in the number of parameters of the two models being compared.

3.1.1 Demographic Determinants

Most of the demographic variables included in the logit models are found to affect the likelihood of Internet utilization. In particular, the education variables strongly influence utilization. Holding other explanatory variables constant, utilization increases significantly with higher levels of education across all four specifications (Table 5).[10] The effect of education can be expressed graphically, with the effects of other variables held constant (Figure 1, USER dataset). Whereas the partial odds ratio in favor of utilization by an individual with postgraduate training is over 4.0 (i.e., the odds of usage are more than four times those of the reference category), for those with less than eight years of education the partial odds ratio of Internet use is 0.16-0.17.

[Insert Figure 1]

Income also has a very substantial effect on Internet utilization, holding constant the effect of other variables.
For individuals in households with income over $100,000, the odds of using the Internet are about three times those of the reference category, whereas for individuals with household income of less than $10,000 the partial odds ratio of utilization is 0.46, that is, the odds of utilization are less than half those of the reference category (Figure 2, USER dataset).[11] In sum, these results confirm the findings in the literature concerning education and income, but with more compelling evidence in that the effects of other demographic variables are held constant in determining the unique effects of income and education.

[Insert Figure 2]

[10] The reference category for the dummy indicator variables for education and income is, in both cases, "Don't Know".

[11] See previous footnote.

The literature finds that Internet utilization rates tend to decline with the age of the user, as discussed above. In the estimation here, however, the authors hypothesize that this decline is not linear with age. We test this by incorporating a quadratic term for age within the logit specifications. This complicates the interpretation of the impact of age on Internet usage and, although the partial odds ratio associated with the quadratic term is not statistically significant, the resulting estimation shows a declining probability of usage as age increases, with the decline lessening at higher ages (Figure 3).[12] Further, the gap in the probability of Internet use between those without college education and those with at least some college education reaches its maximum at the age of 48. This may reflect the fact that people around that age group were among the first to be exposed to the computer revolution that swept colleges and universities earlier and more intensely than it did many other institutions in the 1970s and 1980s.
These results represent a distinct refinement of the impact of age on utilization reported in the literature.

[Insert Figure 3]

The effect of family structure was also examined. Marriage has a significant positive effect on the odds of utilization, whereas the presence of children (i.e., being a parent) has no distinguishable effect on the odds of utilization, holding other variables constant (Table 4).

The estimation shows significant differences in the likelihood of using the Internet according to the race/ethnicity of the respondent, confirming results frequently found in the existing literature (Table 4). The odds of utilization by blacks and Hispanics are significantly lower than for whites (all compared against the reference group "Asian and others"), with the odds for blacks almost half those for Hispanics.

Finally, the respondent’s employment status was found to have a strong effect on the odds of utilization. The partial odds ratio in favor of Internet usage for those employed is 1.7 times that for those unemployed. This result clearly establishes the importance of employment status. The determinants of utilization at work are examined separately in Section 3.2.

To summarize, the demographic characteristics of individuals have a quite substantial effect on the propensity to use the Internet. The results from this estimation largely confirm the findings in the literature, but a more sophisticated methodology has been employed. The overall predictive power of the models and the stability of the coefficients of demographic variables across several model specifications confirm the robustness of the results.
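The partial odds ratios reported in this section can be related both to logit coefficients and to probabilities. The sketch below is an illustration only, not the authors' code: the function names are hypothetical, and treating a partial odds ratio as stand-alone odds (as the parenthetical remark on postgraduate training does) ignores the reference category, so the resulting probabilities are indicative rather than estimates from the models.

```python
import math


def coefficient_to_odds_ratio(beta: float) -> float:
    """A logit coefficient beta corresponds to a partial odds ratio exp(beta)."""
    return math.exp(beta)


def odds_to_probability(odds: float) -> float:
    """Convert odds p / (1 - p) back into the probability p."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# Odds values quoted in the text for the USER models (education and income).
quoted_odds = {
    "post-graduate training": 4.0,
    "less than eight years of education": 0.16,
    "household income under $10,000": 0.46,
}

for label, odds in quoted_odds.items():
    probability = odds_to_probability(odds)
    print(f"{label}: odds {odds:.2f} -> probability {probability:.3f}")
```

Read this way, odds of 4.0 correspond to a probability of 0.8, which is the sense in which usage is "four times as large" as non-usage.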
[12] Probabilities of utilization were calculated from logit model forecasts in light of the characteristics of each individual, substituting the marginal impact of "some college" (the weighted impact of the first four education variables) then replacing it with that of "some college +" (the weighted impact of the last three education variables) for each. The two probability plots represent gentle sixth-order polynomial smoothings of the outcomes arrayed by age. Ages below 20 and above 80 were dropped due to small-sample irregularities and problems of interpretation (e.g., an 18-year-old is unlikely to have graduated from college).

3.1.2 Spatial-Economic Determinants

Having established the importance of various demographic characteristics for Internet utilization, this section examines the effects of characteristics of the community in which the respondent resides. As noted earlier in the discussion of Table 3, the availability of Internet access varies across space. To investigate this dimension of Internet utilization, communities are measured in three ways: (1) whether the county is in a metropolitan statistical area (MSA) and, if so, how it relates to the MSA; (2) the population of the county; and (3) the relative size of telecommunications-intensive sectors of the local economy.

Respondents residing in counties within an MSA have substantially higher odds of utilization than those in non-MSA counties, the reference category for the dummy variables accounting for county type (listed under the heading "County Characteristics" in Table 4). The variables accounting for the MSA status of the county allow for the consideration of the impacts that differing broad patterns of urbanization may have on Internet utilization.
The three partial odds estimates in Models 2 and 4 fall in the range of 1.25 to 1.40. All are statistically significant with p-values less than 0.0001, and Single-County MSA is statistically distinguishable from the other two county types in Model 2 (but not in Model 4). Finding greater Internet utilization in non-rural, MSA counties is consistent with the literature, discussed above, which shows that utilization in rural areas (roughly consistent with the non-MSA county definition) is lower than in non-rural areas (Leigh and Atkinson, 2001; Department of Commerce, 2000; Kolko, 1999).

Recent data from the Pew Internet Project suggest that lower Internet use in rural areas persists. Data from 2003 show that Internet penetration in rural areas is about 10 percentage points lower than in urban and suburban areas. Some of the gap is attributable to the older population of rural America and some to an interaction effect of living in a rural area with income, i.e., the effect of income on Internet use varies across geographical region. In this case, low-income rural residents are much less likely to be online than their counterparts elsewhere or, put differently, high-income individuals in rural areas are just as likely to be online as high-income people elsewhere (Bell et al., 2004). The difference is more pronounced for broadband penetration at home. Data from 2004 show that rural Americans are about a third as likely to have high-speed connections at home as urban Americans; 10% of all rural Americans have broadband connections at home compared with 29% of urban Americans (Horrigan, 2004).

County population is included in Models 2-4 to capture the network externality effects that size could conceivably have on the supply of and demand for Internet services and, consequently, on Internet utilization. Population is not a significant predictor in Models 2 and 3, but is significant when interacted with county and industry characteristics in Model 4 (see discussion in Section 3.1.3, below).
The third spatial-economic measure, referred to here as "economic structure", is defined in terms of the percentage of county employment in each of seven telecommunications-intensive sectors in the economy: Communications; Finance, Insurance, and Real Estate ("FIRE"); Professional Services; Education, Health, and Social Services; Retail Trade; Manufacturing; and Other Telecom-Intensive Sectors (Distribution Services, Personal Services and Entertainment, and Miscellaneous Business Services).[13] It is expected that the nature of the local economy may well affect the general environment for Internet usage. If adults in an area are familiar with the Internet because of its use at work, a culture of usage among adults in the area may emerge. Furthermore, higher utilization of the Internet in local businesses may generate better telecommunications infrastructure, and this supply-side factor could influence rates of utilization in the general community. Taken individually, these variables help identify sectors of the economy that may have a particularly strong impact on Internet use. As a set, they can be thought of as instruments that control for variation in the economic profile of counties across the United States, thereby providing more refined estimates of the partial odds associated with other predictors of Internet use.

The results show that the nature of the local economy does affect utilization rates (see Table 4, Models 3 and 4). In particular, the higher the share of workers in Communications, Professional Services, Retail Trade and Telecommunications-intensive Manufacturing, the greater the marginal probability of Internet use.
With partial odds ratios barely greater than 1.0, the effect of economic structure is quite modest compared to that of the demographic characteristics of respondents. The set of seven economic structure variables as a whole is nevertheless statistically significant with p<0.0001, and five of the partial odds are individually statistically significant at α=0.05.

[13] Using the input-output table for the United States, Schmandt and Wilson (1990) originally defined telecommunications-intensive sectors as industries whose purchases from the communication sector, SIC 48, are more than 1 percent of the total value of intermediate inputs. While 1 percent seems small, it is a relatively large threshold compared to input purchases across all sectors. These activities were translated here into a series of 2- to 6-digit NAICS (North American Industry Classification System) industrial codes which were, in turn, mapped to the March 2000 U.S. County Business Patterns dataset to identify the proportion of total county employment attributable to these sectors in each of the 2575 counties represented in the Pew Survey. The non-trivial problems of data suppression at this level of detail (due to having only one to three employers in a particular NAICS category in a given county) were resolved by an algorithm that involved exploiting unsuppressed information on the number of plants within employment categories, along with an iterative proportional fitting routine to impute county employment within each of the seven telecommunications-intensive sectors. Details are available from the authors.

3.1.3 Population Interaction Effects

To allow for differential effects on utilization across space, more complex forms of urbanization were introduced through the use of interaction terms. These terms were constructed for population and county type as well as for population and the economic structure variables. One can interpret the partial odds ratios of the interaction terms as modifying the partial odds associated with population in ways that depend on county type or on a feature of the economic structure. In other words, in Model 4 the effect of county population on Internet use is allowed to depend on the type of county and/or the nature of the economic structure of the county.

Without interaction terms, population is found to have no effect on the overall odds of Internet use (Table 4; the partial odds are approximately equal to 1.00 and insignificant for Models 2 and 3). In the more complex specification of Model 4, increasing population is found to have a positive effect on the partial odds of utilization, holding other variables constant. In other words, the size of the county has an important and positive effect only when other features of the county are taken into account.

The extraordinarily large partial odds associated with population in Model 4 (52.55) must be interpreted in conjunction with the partial odds of the interaction terms. From Model 4 (Table 4), all else held constant, a one-unit (one million) increase in population in a Single-County MSA, for example, increases the odds of Internet use by a factor of 1.79 (almost 80 percent) compared to that of a rural county: 1.79 = 52.55 × 0.034, the product of the respective partial odds ratios. For Central-County MSAs the partial odds factor is 1.68 (= 52.55 × 0.032), and for Suburban-County MSAs 1.47 (= 52.55 × 0.028). All three of these compound partial odds ratios are statistically significant at α=0.001.
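The compound partial odds quoted above are products of the population odds ratio and the relevant interaction odds ratio, which is equivalent to adding the underlying logit coefficients before exponentiating. A minimal check of that arithmetic follows; it uses only the figures quoted in the text, and the variable and function names are illustrative assumptions.

```python
import math

# Partial odds ratio for population (per one million residents), Model 4, as quoted.
POPULATION_ODDS = 52.55

# Partial odds ratios for the population-by-county-type interactions, as quoted.
INTERACTION_ODDS = {
    "Single-County MSA": 0.034,
    "Central-County MSA": 0.032,
    "Suburban-County MSA": 0.028,
}


def compound_odds(main_odds: float, interaction_odds: float) -> float:
    """Multiply odds ratios, i.e. exponentiate the sum of the logit coefficients."""
    return math.exp(math.log(main_odds) + math.log(interaction_odds))


for county_type, ratio in INTERACTION_ODDS.items():
    factor = compound_odds(POPULATION_ODDS, ratio)
    print(f"{county_type}: {POPULATION_ODDS} x {ratio} = {factor:.2f}")
```

The three products reproduce the factors of 1.79, 1.68 and 1.47 reported in the text.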
Results for the interaction of population with the economic structure variables in Model 4 are less remarkable from a statistical point of view. Only the population interaction with the percent of county employment in professional services is statistically significant (p=0.004). The rest of the population/economic structure interactions are singly and collectively insignificant. The interpretation of the estimated partial odds ratios here is complicated by the fact that population and the economic structure variables are all continuous rather than discrete indicators like county type. Moreover, the magnitude of the partial odds for population (=52.55) and those of its interactions with economic structure yield compound partial odds that are difficult to interpret substantively. To the extent that any credence is given to these interactions at all, it is best to think of them as capturing some nonlinear complexity relating to economic structure which is held constant in interpreting the partial impacts of the remaining variables in the model.

To summarize the findings resulting from the incorporation of interaction terms, the odds of Internet utilization for an individual with a particular set of demographic characteristics can vary by place of residence. In general, counties with a more telecommunications-intensive economy, a larger population, and a location within an MSA have higher utilization rates, holding constant demographic characteristics. Since this estimation is essentially a reduced-form specification, it cannot be determined whether these differences arise from local Internet infrastructure and the cost of the service or from the social milieu of the particular area. Nevertheless, the results strongly suggest that future research on utilization should take into account community characteristics in estimating utilization by individuals.
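Footnote 13 mentions that suppressed county employment figures were imputed with an iterative proportional fitting (IPF) routine, but the authors' algorithm is not described here. The sketch below shows only the generic textbook two-way IPF step, under loud assumptions: the seed table (standing in for plant counts), the sector and county totals, and the function name are all made up for illustration, and the seed is assumed to be strictly positive.

```python
def iterative_proportional_fit(seed, row_totals, col_totals, tol=1e-9, max_iter=1000):
    """Rescale a positive seed table until its margins match the target totals."""
    if abs(sum(row_totals) - sum(col_totals)) > 1e-9:
        raise ValueError("row and column totals must sum to the same grand total")
    table = [[float(cell) for cell in row] for row in seed]
    n_rows, n_cols = len(table), len(table[0])
    for _ in range(max_iter):
        # Step 1: scale each row so that it sums to its target total.
        for i in range(n_rows):
            factor = row_totals[i] / sum(table[i])
            table[i] = [cell * factor for cell in table[i]]
        # Step 2: scale each column so that it sums to its target total.
        for j in range(n_cols):
            column_sum = sum(table[i][j] for i in range(n_rows))
            factor = col_totals[j] / column_sum
            for i in range(n_rows):
                table[i][j] *= factor
        # Columns now match exactly; stop once the rows match as well.
        row_gap = max(abs(sum(table[i]) - row_totals[i]) for i in range(n_rows))
        if row_gap < tol:
            break
    return table


# Hypothetical example: rows are two sectors, columns are two counties.
seed_plant_counts = [[3, 1], [2, 4]]      # made-up plant counts used as the seed
sector_employment = [100.0, 200.0]        # made-up known sector totals
county_employment = [120.0, 180.0]        # made-up known county totals

fitted = iterative_proportional_fit(seed_plant_counts, sector_employment, county_employment)
for row in fitted:
    print([round(cell, 2) for cell in row])
```

The design choice in IPF is that the seed supplies the relative pattern within the table while the known margins supply the levels; whether plant counts were used in exactly this role in the paper is an assumption.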
3.2 Explaining Internet Utilization at Work

Models for exploring the determinants of Internet use at work (binary dependent variable = WORK) were estimated for those 16,680 respondents in the sample who were employed at the time of the survey. The overall explanatory power of these models (Table 5) is somewhat weaker than that of the models estimated for USER, presented above, but the results are again quite robust across the alternative specifications.[14] Model 4 is again the preferred specification according to the AIC criterion, while Model 3 is favored by the SBIC; all are pairwise distinguishable by both the AIC and SBIC criteria[15] except for Models 3 and 4 under the AIC.

The same general patterns of demographic effects on Internet utilization found in the USER models are repeated, but with some intriguing differences with respect to some demographic variables. The lower likelihood of Internet use among those with lower educational attainment, discussed earlier for the USER models, is somewhat more accentuated in the workplace models (Figure 1, WORK dataset). Only at the level of college education and post-graduate education do significantly higher likelihoods of utilization appear. In addition, the same positive monotonic trend in the partial odds ratios for income categories seen in the USER models (Figure 2) is observed in the WORK models, but with statistically significant differences only appearing once the $40,000 to $50,000 bracket is reached (as opposed to the $20,000 to $30,000 bracket for USER). In the workplace only individuals with high levels of education, and presumably higher levels of income, demonstrate markedly higher utilization rates (partial odds ratios greater than one).
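The split verdict between the AIC and the SBIC for Models 3 and 4 follows from how heavily each criterion penalizes additional parameters. The sketch below uses the standard definitions (AIC = 2k - 2 ln L, SBIC = k ln n - 2 ln L); the sample size is the 16,680 employed respondents from the text, but the log-likelihoods and parameter counts are hypothetical values chosen only to mimic the reported pattern, and the paper's pairwise significance tests are not reproduced.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def sbic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Schwarz Bayesian information criterion: k ln n - 2 ln L (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


N_OBS = 16680  # employed respondents in the WORK dataset (from the text)

# HYPOTHETICAL (log-likelihood, number of parameters) pairs, not taken from Table 5.
candidates = {
    "Model 3": (-9000.0, 30),
    "Model 4": (-8985.0, 40),
}

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
sbic_scores = {name: sbic(ll, k, N_OBS) for name, (ll, k) in candidates.items()}

print("Preferred by AIC: ", min(aic_scores, key=aic_scores.get))
print("Preferred by SBIC:", min(sbic_scores, key=sbic_scores.get))
```

With a sample this large, ln n is close to 10, so each extra parameter costs roughly five times more under the SBIC than under the AIC; this is why the SBIC can favor the more parsimonious Model 3 while the AIC favors Model 4.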
The actual process by which education levels affect workplace utilization cannot be conclusively determined from this dataset, but a hypothesis can be advanced. Employment for more highly educated individuals probably occurs in occupations requiring the processing or manipulation of information, the so-called symbolic analyst (Reich, 1992). Given that information technology facilitates the exchange of information, such occupations are likely to comprise more intensive users of the Internet (Department of Commerce, 2002).

[14] An F-test determined that a separate set of regressions for use at work (WORK) is preferred to a single set of regressions on USER with interaction terms for EMPLOY (p<0.001).
[15] P-values are all less than 0.001 with the exception of the contrast between Models 1 and 2 under the SBIC where p<0.05.

Several demographic variables that are significant in the USER models are found to have less significance for Internet use at work. The impact of age on workplace use is slightly less diminished compared to the earlier models for USER (compare Tables 4 and 5). Being married has a positive effect on Internet usage in general, but has a small negative effect in the WORK models, although a statistically insignificant one in Model 4 (Table 5).

Another quite interesting result is found in the race/ethnicity variables. The differences in the odds of utilization by race found in the estimation of USER models virtually disappear in the estimation for use at work. There is basically no statistically significant difference in the odds of use at work among whites, Hispanics, and blacks, holding constant the effects of other variables (the only exception being a lingering low partial odds for blacks in Models 1 and 2). The essential absence of an effect of race/ethnicity on workplace use of the Internet is encouraging, but perhaps not surprising when one considers the employer’s interest. If computers and Internet access at work were needed in certain jobs, it would make little sense for an employer to distinguish among workers based on race/ethnicity (or gender and marital status). Data on the occupation of the respondent would be needed to further explore this hypothesis, but unfortunately the Pew Project data do not include occupation.

The effects of economic structure and county type variables are similar in sign to those found in the USER estimations, but with lower values of the partial odds ratio. High levels of telecommunications-intensive industries in a county are associated with a higher odds ratio of use at work. Although this result is consistent with the researchers’ expectations, data on the industry in which a respondent works would provide a sounder basis for drawing conclusions about the effect of this variable on usage. In terms of county type, the positive marginal effect of MSA counties over rural counties remains, although the effects are slightly smaller and, for Model 4, statistically weaker than in the USER models.

The interaction terms again improve on the explanatory power of the models, but largely in terms of curve fitting without important interpretative value. That is to say, the marginal effects of economic structure, county type and population are not constant but vary in relationship to each other.

The findings concerning workplace utilization reinforce the importance of demographic characteristics in explaining Internet use, but with subtle differences from the USER models. The effects of education on Internet use are less pronounced in the workplace than they are more generally. By contrast, several variables that matter in the USER estimation (race/ethnicity, marriage and parental status) have little effect on the likelihood of utilization at work.
We hypothesize that Internet usage at work would be made available for only those tasks where a need exists, irrespective of these characteristics of individual workers.

4.0 Summary of Findings and Directions for Future Research

The study has reconfirmed several findings reported in the literature on the impact of demographic variables on Internet utilization. The estimation of the logit models in this study, however, represents a distinct methodological improvement over other approaches used in the existing literature. The gross utilization rates, calculated for single variables, can mask the true effects of demographic characteristics on utilization. Although the very powerful effects of education and family income reported in the literature using bivariate analysis are confirmed in this multivariate approach, the difference in gross utilization between males and females found in earlier research disappears in the more complex specifications tested in the logit models.

The research also suggests that the determinants of utilization at work differ significantly from those of utilization in other environments. Utilization at work will largely depend on whether an individual holds a position that the employer believes requires the capabilities offered by the Internet, whereas utilization outside a work environment will certainly be a question of personal choice in many circumstances. Future data collection efforts might well incorporate the occupation of the individual as well as the industry in which the individual works. The effect of education on the odds of utilization at work found in this project may actually derive from the occupation of the individual.
The logit modeling in this research indicates a modest, but statistically significant, effect of a set of spatial-economic variables on the odds of utilization. A respondent residing in an urban county (i.e., a county within an MSA) has greater odds of utilizing the Internet than one in a rural (non-MSA) county. Furthermore, everything else held constant, the larger the population of the county, the greater the odds of utilization. Individuals residing in counties with relatively high levels of telecommunications-intensive employment have higher odds of Internet utilization, holding other individual demographic characteristics constant. We explored the effects of more complex relationships among these three spatial-economic variables through the use of interaction terms. While these results were less robust than those in models without interaction terms, on the whole, community-level effects appear significant.

Although the empirical analysis presented in this paper is exploratory, promising directions for future research can be identified in terms of public policy evaluation and alternative approaches to the modeling of utilization. Many state and local governments have adopted policies, using a variety of mechanisms, to encourage the use of the Internet. The use of county-level variables, as in this study, forecloses the opportunity to investigate impacts of local policies since several local governments usually exist within a single county. However, the analysis could be used to identify counties where the odds of Internet utilization are much higher (or lower) than expected. This set of counties could be further investigated to determine whether innovative or aggressive Internet policies are being pursued by jurisdictions in the county.

Several more sophisticated modeling approaches should be considered in future research.
Rather than a focus on the users of the Internet in a single reduced-form equation, as adopted in this paper, a simultaneous equation specification could allow for the estimation of supply and demand functions. The demand function would incorporate characteristics of individuals and the price of services to estimate demand for Internet services. Similarly, a supply function would estimate the level and quantity of services offered by telecommunications companies based on various characteristics of their provision and prices. Difficult measurement problems will be faced, such as constructing variables to measure the price and quantity of services. But as new data sources incorporate information on prices, this methodological approach will become more common since it permits the calculation of price elasticities (this approach is adopted in Chaudhuri, Flamm and Horrigan, 2004).

Further investigation of community-level (or spatial-economic) effects on utilization may be pursued through multilevel modeling. The argument advanced in this paper is that utilization rates are affected by characteristics of individuals and of the community in which the individual resides. Despite the relatively crude measures of "community" used here (spatial-economic variables of county type, population and economic structure), the empirical results provide sufficient evidence to justify further investigation and the development of more refined measures of community. Multilevel modeling in spatial-economic settings can, for example, account for the nesting of geographic structures such as "suburban counties within an MSA" and do so in ways that allow for suburban county effects to "borrow statistical strength" from information about the overall MSA of which it is a component.
The essential logic of multilevel modeling can be extended more abstractly to empirical Bayes estimators in ways that may provide an improved framework for capturing the effect of the structure of telecommunications-intensive sectors on Internet use. The impact of "professional services" on a particular county, for example, could be "nested" within an overall "professional services" effect for the state, subnational region, or nation as a whole. Richly promising as multilevel and empirical Bayes modeling are in settings like the one studied here, these approaches typically require strong assumptions about the parametric structure and independence of the multiple error components that attend them. Future work is needed to identify the appropriate balance to strike in weighing statistical efficiency against strong a priori assumptions in settings like this.

A second way to incorporate geographic information into the statistical modeling is to account for spatial autocorrelation. When present, this form of autocorrelation implies that, at some level, behavior in a single community is affected by behavior in proximate communities. For example, unobserved factors operating at the metropolitan level might have a distinct effect on county-level behaviors within the metropolitan area. Evidence of such an effect would be reflected in spatial autocorrelation, where the error terms of proximate counties are correlated. Spatial autocorrelation models would provide a useful framework for examining spillover effects that may spread from the central MSA counties to suburban counties to exurban non-MSA counties bordering the MSA. While it is somewhat daunting to specify the covariance structure linking each county in the survey dataset to its neighbors, modern advances in computing power and in software for spatial statistics open a large arena for future research aimed at capturing the spatial-economic structure underlying Internet use.
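One common first diagnostic for the spatial autocorrelation described above is Moran's I, computed on model residuals with a neighbor weight matrix. The paper does not name a specific statistic, so this choice is an assumption; the four-county chain, the adjacency weights, and the residual values below are all hypothetical.

```python
def morans_i(values, weights):
    """Moran's I: (n / W) * sum_ij w_ij * z_i * z_j / sum_i z_i**2, with z = deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    deviations = [value - mean for value in values]
    total_weight = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared_deviations = sum(d * d for d in deviations)
    return (n / total_weight) * (cross_products / squared_deviations)


# Hypothetical layout: four counties in a row, each adjacent only to its immediate neighbors.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Hypothetical county-level residuals from a utilization model.
residuals = [1.0, 0.8, -0.7, -1.1]

print(f"Moran's I = {morans_i(residuals, adjacency):.3f}")
```

A value well above zero, as in this made-up example where similar residuals sit next to each other, would point toward the kind of spillover between proximate counties that the text proposes to model; inference on the statistic (for example by permutation) is omitted here.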
The academic and policy literature on Internet utilization is expanding. The vital interests of telecommunications providers, public policy makers and social scientists are driving this research agenda. It will remain a challenging area of investigation for various theoretical, methodological and empirical reasons. Nevertheless, the importance of this new technology in reshaping society requires that the investigation be pursued.

References

Bell, Peter, Pavani Reddy, and Lee Rainie. February 2004. Rural Areas and the Internet. Pew Internet & American Life Project. Online. Available at http://www.pewinternet.org/PPF/r/112/report_display.asp.

Bimber, Bruce. 2000. Measuring the Gender Gap on the Internet. Social Science Quarterly 81: 868-76.

Business Week. May 12, 2003. Special Report: The E-Biz Surprise.

Chaudhuri, Anindya, Kenneth Flamm, and John Horrigan. 2004. An Analysis of the Determinants of Internet Access (Unpublished manuscript).

ClickZ Internet Statistics and Demographics. Online. Available at http://www.clickz.com/stats/big_picture/geographics/article.php/5911_151151. Accessed July 27, 2004.

Cooper, Mark and Gene Kimmelman. February 1999. The Digital Divide Confronts the Telecommunications Act of 1996. Online. Available at http://www.consumersunion.org/pdf/telecom1-0299.pdf. Accessed February 9, 2002.

Cooper, Mark N. October 2000. Disconnected, Disadvantaged, and Disenfranchised: Explanations in the Digital Divide. Consumers Union. Online. Available at http://www.consumersunion.org/pdf/disconnect.pdf. Accessed February 9, 2002.

Devol, Ross C. July 1999. America’s High-Tech Economy. Milken Institute. Online. Available at http://www.milkeninstitute.org/mod30/ross_report.pdf. Accessed February 8, 2002.

Department of Commerce. February 2002. A Nation Online: How Americans are Expanding Their Use of the Internet. Online. Available at http://www.ntia.doc.gov/ntiahome/dn/anationonline2.pdf. Accessed March 10, 2002.

Department of Commerce. October 2000. Falling Through the Net: Toward Digital Inclusion. Online. Available at http://search.ntia.doc.gov/pdf/fttn00.pdf. Accessed January 18, 2002.

General Accounting Office. February 2001. Telecommunications: Characteristics and Choices of Internet Users. Online. Available at http://www.gao.gov/new.items/d01345.pdf. Accessed February 3, 2002.

Gorman, Sean P., and Edward J. Malecki. March 2000. The Networks of the Internet: An Analysis of Provider Networks in the USA. Telecommunications Policy 24(2): 113-134.

Greenstein, Shane, with Mercedes M. Lizardo and Pablo T. Spiller. February 1997. The Evolution of Advanced Large Scale Information Infrastructure in the United States. National Bureau of Economic Research Working Paper No. 5929.

Hoffman, Donna L., Thomas P. Novak and Ann E. Schlosser. March 2000. The Evolution of the Digital Divide: How Gaps in Internet Access May Impact Electronic Commerce. Vanderbilt University. Online. Available at http://www.ascusc.org/jcmc/vol5/issue3/hoffman.html. Accessed February 5, 2002.

Horrigan, John. February 2004. Broadband Penetration on the Upswing. Pew Internet & American Life Project. Online. Available at http://www.pewinternet.org/PPF/r/121/report_display.asp.

Kolko, Jed. July 1999. The High-Tech Rural Renaissance?: Information Technology, Firm Size and Rural Employment Growth. Online. Available at http://www.sba.gov/advo/research/rs201tot.pdf. Accessed February 7, 2002.

Kridel, D. J., P. R. Rappoport, and L. D. Taylor. 1999. An Econometric Analysis of Internet Access. In The Future of the Telecommunications Industry: Forecasting and Demand Analysis, eds. David G. Loomis and Lester D. Taylor, p. 21-42. Kluwer Academic Press.

Leigh, Andrew and Robert D. Atkinson. 2001. Clear Thinking on the Digital Divide. Washington, D.C.: Progressive Policy Institute. Online. Available at http://www.ppionline.org/ppi_ci.cfm?knlgAreaID=107&subsecid=126&contentid=3490. Accessed July 27, 2004.

Moss, Mitchell L., and Anthony M. Townsend. 2000. The Internet Backbone and the American Metropolis. The Information Society 16(1): 35-47.

Novak, Thomas P. and Donna L. Hoffman. February 1998. Bridging the Digital Divide: The Impact of Race on Computer Access and Internet Use. Project 2000, Vanderbilt University. Online. Available at http://ecommerce.vanderbilt.edu/research/papers/html/manuscripts/race/science.html. Accessed February 5, 2002.

Ono, Hiroshi and Madeline Zavodny. March 2003. Gender and the Internet. Social Science Quarterly 84(1): 111-121.

Reich, Robert B. 1992. The Work of Nations: Preparing Ourselves for 21st Century Capitalism. NY: Vintage Books.

Schmandt, Jurgen, Robert H. Wilson et al. 1990. The New Urban Infrastructure: Cities and Telecommunications. CT: Praeger.

Spooner, Tom and Lee Rainie. July 2001. Hispanics and the Internet. Pew Internet & American Life Project. Online. Available at http://www.pewinternet.org/PPF/r/38/report_display.asp. Accessed July 27, 2004.

UCLA Center for Communication Policy. November 2000. Surveying the Digital Future. Online. Available at http://sfpl4.sfpl.org/btdir/ucla-internet.pdf. Accessed March 29, 2002.

Wilson, Robert H. 1993. States and the Economy: Policymaking and Decentralization. CT: Praeger.
Table 1: Definition of Variables with Gross Utilization Rates

Variable Name  Definition                                                 Number of     Gross
                                                                          Observations  Utilization (%)

USER           Use the Internet at home, work, or both (=1)                     13,185       100.00
               or not (=0)                                                      12,026         0.00
WORK           Employed and use the Internet at work and/or home (=1)            6,617       100.00
               or not (=0)                                                      10,063         0.00

Education level (highest level achieved)
ED-Elem        Grades 1-8 or less                                                  616         4.38
ED-NoHS        High school incomplete                                            1,941        19.53
ED-HS          High school graduate                                              7,851        34.85
ED-Tech/Voc    Business/technical/vocational school                              1,078        44.53
ED-SomeCol     Some college, no 4-year degree                                    6,102        62.77
ED-ColGrad     College graduate                                                  4,895        74.38
ED-PostGrad    Post-graduate training                                            2,571        79.39
ED-Unknown     Don't know or missing                                               217        29.77

Ethnicity
WHITE          White, not Hispanic origin (=1)                                  19,101        54.01
BLACK          Black, not Hispanic origin (=1)                                   2,654        39.30
HISPA          Hispanic or Latino origin (=1)                                    1,757        46.96
ASIAN/Other    Asian or other ethnicity (=1)                                     1,398        57.44

Income (household income)
INC-LT10       Less than $10,000                                                 1,498        22.76
INC-10/20      $10,000 to < $20,000                                              2,247        27.90
INC-20/30      $20,000 to < $30,000                                              3,042        41.42
INC-30/40      $30,000 to < $40,000                                              2,923        53.27
INC-40/50      $40,000 to < $50,000                                              2,407        61.32
INC-50/75      $50,000 to < $75,000                                              3,419        71.48
INC-75/100     $75,000 to < $100,000                                             1,935        78.19
INC-100+       $100,000 or more                                                  2,019        82.81
INC-Unknown    Don't know or missing                                             5,781        41.34

Other
PARENT         Parent/guardian of child under 18 (=1)                            8,880        61.53
               or not (=0)                                                      16,391        47.11
MARITAL        Marital status, married or living as such (=1)                   13,933        56.74
               or not (=0)                                                      11,338        46.57
EMPLOY         Employment status, employed full or part time (=1)               16,680        63.35
               or not employed (=0)                                              8,591        30.49
               Used to subset the USER dataset into the WORK dataset.
GENDER         Male (=1)                                                        11,920        55.82
               Female (=0)                                                      13,351        48.92
AGE            Age between 18-98 (continuous variable)                              NA           NA

Source: Pew Internet & American Life Project Survey (March–December 2000).

Table 2: Gross Utilization Rates by Regions and County Type†

                                                                     USER Dataset           WORK Dataset
Variable           Definition                                        Total Number  % Internet  Total Number  % Internet
                                                                     of Responses  Use         of Responses  Use

a. Region
Region1            Northeast region of the U.S.                             4,713        53.8         3,184        39.3
Region2            Midwest                                                  6,119        50.3         4,074        38.3
Region3            South                                                    9,639        49.9         6,270        39.1
Region4            West                                                     4,800        57.7         3,152        42.9
Overall                                                                    25,271        52.2        16,680        39.7

b. County Type
Non-MSA County     County outside an MSA                                    5,911        41.9         3,623        30.2
                   (N=1,756, average pop=28,982)
Single-County MSA  MSA comprised of a single county                         4,153        56.0         2,713        40.2
                   (N=130, average pop=392,094)
Central County     Central county of a multi-county MSA                     9,054        55.7         6,146        42.5
                   (N=235, average pop=440,656)
Suburban County    Non-central county within a multi-county MSA             6,153        55.7         4,198        43.4
                   (N=455, average pop=152,785)
Overall            (N=2,576, average pop=106,730)                          25,271        52.2        16,680        39.7

† The USER dataset contains all usable responses to the survey. The WORK dataset is a subset of the USER dataset consisting of all respondents who are employed. The binary variable USER is equal to 1 if a respondent uses the Internet at home, at work, or both, and equal to 0 otherwise. Likewise, WORK=1 if a respondent uses the Internet at work (including "at work and at home").

Source: Pew Internet & American Life Project Survey (March–December 2000), U.S. Bureau of Census (Census, 2000).
Table 3: Percent Internet Utilization at Work by Region and Type of County

                     Region
Type of County       Northeast  Midwest  South  West  Total  Number of Observations
Non-MSA County            33.1     31.5   27.5  33.5   30.2                   3,623
Single-County MSA         31.7     40.5   37.7  42.9   40.2                   2,713
Central County            41.4     40.7   43.3  45.0   42.5                   6,146
Suburban County           40.4     42.0   45.0  50.5   43.4                   4,198
Overall                   39.3     38.3   39.1  42.9   39.7                  16,680

Source: Pew Internet & American Life Project Survey (March–December 2000), U.S. Bureau of Census (Census, 2000).

Table 4: Impact of Demographic and Spatial-Economic Factors on Internet Utilization (Y = USER)†

                      Model 1    Model 2    Model 3    Model 4
Personal Characteristics
ED-Elem               0.16***    0.16***    0.17***    0.17***
ED-NoHS               0.36***    0.36***    0.37***    0.37***
ED-HS                 0.70       0.70*      0.71       0.71
ED-Tech/Voc           1.15       1.14       1.16       1.15
ED-SomeCol            2.06***    2.00***    2.04***    2.00***
ED-ColGrad            3.25***    3.15***    3.14***    3.09***
ED-PostGrad           4.79***    4.65***    4.64***    4.57***
INC-LT10              0.45***    0.46***    0.46***    0.46***
INC-10/20             0.51***    0.52***    0.52***    0.53***
INC-20/30             0.83**     0.84**     0.85**     0.85**
INC-30/40             1.15**     1.16**     1.16**     1.17**
INC-40/50             1.41***    1.42***    1.42***    1.42***
INC-50/75             2.06***    2.05***    2.03***    2.03***
INC-75/100            2.60***    2.57***    2.56***    2.55***
INC-100+              3.17***    3.10***    3.04***    3.03***
White                 0.94       0.97       0.99       1.00
Black                 0.53***    0.53***    0.54***    0.55***
Hispanic              0.60***    0.59***    0.60***    0.59***
Male                  0.95       0.96       0.95       0.96
Age                   0.92***    0.92***    0.92***    0.92***
AgeSquared            1.00***    1.00***    1.00***    1.00***
Parent                0.99       1.00       1.00       1.00
Married               1.22***    1.24***    1.24***    1.25***
Employed              1.72***    1.72***    1.73***    1.74***
County Characteristics
SglCty-MSA            -          1.40***    1.35***    -
CtrCty-MSA            -          1.27***    1.26**     -
SubCty-MSA            -          1.25***    1.37***    -
Population
Pop                   -          1.00       1.00       52.55***
Economic Structure
%Comms                -          -          1.07**     1.06*
%FIRE                 -          -          1.01       1.02*
%ProfServs            -          -          1.03***    1.04***
%Educ/Hlth/Social     -          -          1.00       1.00
%Retail               -          -          1.03***    1.02***
%Mfg                  -          -          1.05***    1.04**
%OtherServices        -          -          1.01*      1.00
Interaction Effects
Pop*SglCty            -          -          -          0.034***
Pop*CtlCty            -          -          -          0.032***
Pop*SubCty            -          -          -          0.028***
Pop*%Comms            -          -          -          1.03
Pop*%FIRE             -          -          -          0.98
Pop*%ProfServs        -          -          -          0.97**
Pop*%Ed/Hlth/Soc      -          -          -          0.99
Pop*%Retail           -          -          -          0.99
Pop*%Mfg              -          -          -          0.99
Pop*%OtherServices    -          -          -          1.00

N = 25271
AIC                   25464.8    25420.1    25351.7    25319.4
SBIC                  25668.2    25656.8    25684.8    25685.6
% Concordant          83.0       83.1       83.2       83.3

*** p < .001   ** p < .01   * p < .05
† Figures reported are partial odds ratios; basis for significance tests is "H0: Partial Odds = 1".

Source: Pew Internet & American Life Project Survey (March–December 2000).

Figure 1: Partial Odds Ratio by Level of Education (Model 4)
[Chart: partial odds ratio (vertical axis, 0.00 to 5.00) by level of education (Elem, NoHS, HS, Tech/Voc, SomeCol, ColGrad, PostGrad), plotted for the USER dataset and the WORK dataset.]

Figure 2: Partial Odds Ratio by Household Income (Model 4)
[Chart: partial odds ratio (vertical axis, 0.00 to 5.00) by household income (LT10K, 10K/20K, 20K/30K, 30K/40K, 40K/50K, 50K/75K, 75K/100K, 100K+), plotted for the USER dataset and the WORK dataset.]
Figure 3: Probability of Internet Use by Age & Education (Y = USER)
[Chart: probability of Internet use (vertical axis, 0 to 0.9) by age (horizontal axis, 20 to 80), with separate curves labeled "Some College+" and "No College".]

Table 5: Impact of Demographic and Spatial-Economic Factors on Internet Utilization (Y = WORK)†

                      Model 1    Model 2    Model 3    Model 4
Personal Characteristics
ED-Elem               0.13***    0.13***    0.14***    0.14***
ED-NoHS               0.20***    0.20***    0.22***    0.22***
ED-HS                 0.46**     0.46**     0.49*      0.49*
ED-Tech/Voc           0.78       0.78       0.82       0.83
ED-SomeCol            1.23       1.22       1.29       1.29
ED-ColGrad            2.39**     2.35**     2.44**     2.45**
ED-PostGrad           3.76***    3.69***    3.81***    3.83***
INC-LT10              0.42***    0.43***    0.42***    0.43***
INC-10/20             0.54***    0.55***    0.56***    0.55***
INC-20/30             0.78***    0.79**     0.80**     0.80**
INC-30/40             1.02       1.03       1.03       1.03
INC-40/50             1.15*      1.16*      1.17*      1.17*
INC-50/75             1.51***    1.51***    1.50***    1.50***
INC-75/100            2.00***    1.98***    1.98***    1.99***
INC-100+              2.46***    2.42***    2.35***    2.36***
White                 1.00       1.02       1.07       1.07
Black                 0.84*      0.84*      0.87       0.86
Hispanic              0.95       0.95       0.98       0.98
Male                  0.98       0.99       0.98       0.98
Age                   0.98***    0.98***    0.98***    0.98***
AgeSquared            1.00       1.00       1.00       1.00
Parent                1.05       1.05       1.07       1.07
Married               0.90*      0.91*      0.92*      0.92
County Characteristics
SglCty-MSA            -          1.25***    1.22*      -
CtrCty-MSA            -          1.29***    1.09       -
SubCty-MSA            -          1.28***    1.32*      -
Population
Pop                   -          0.99       0.97**     29.92**
Economic Structure
%Comms                -          -          1.09***    1.11***
%FIRE                 -          -          1.01       1.02*
%ProfServs            -          -          1.03***    1.04***
%Educ/Hlth/Social     -          -          0.99**     0.99*
%Retail               -          -          1.01       1.02*
%Mfg                  -          -          1.02       1.02
%OtherServices        -          -          1.01       1.00
Interaction Effects
Pop*SglCty            -          -          -          0.110*
Pop*CtlCty            -          -          -          0.116
Pop*SubCty            -          -          -          0.088*
Pop*%Comms            -          -          -          0.98
Pop*%FIRE             -          -          -          0.97*
Pop*%ProfServs        -          -          -          0.97**
Pop*%Ed/Hlth/Soc      -          -          -          1.00
Pop*%Retail           -          -          -          0.95***
Pop*%Mfg              -          -          -          0.99
Pop*%OtherServices    -          -          -          1.00

N = 16680
AIC                   18910.0    18889.5    18809.6    18801.1
SBIC                  19095.3    19105.7    19056.7    19148.5
% Concordant          76.0       76.1       76.5       76.5

*** p < .001   ** p < .01   * p < .05
† Figures reported are partial odds ratios; basis for significance tests is "H0: Partial Odds = 1".

Source: Pew Internet & American Life Project Survey (March–December 2000).
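
A note on reading the partial odds ratios in Tables 4 and 5: in a logit model each ratio multiplies the odds of Internet use, not the probability, holding the other covariates fixed. The Python sketch below illustrates the conversion under stated assumptions. The two odds ratios are copied from Table 4 (Model 4), while the baseline probability of 0.50 is a hypothetical value chosen only for illustration and is not an estimate reported in the paper. The function names are likewise illustrative. Multiplying two ratios together is appropriate only when no interaction term links the two variables, as is the case for the education and income indicators.

```python
"""Illustrative conversion of partial odds ratios into probabilities."""


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability implied by scaling the baseline odds by odds_ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)  # odds = p / (1 - p)
    new_odds = baseline_odds * odds_ratio  # an odds ratio rescales the odds
    return new_odds / (1.0 + new_odds)  # convert the odds back to a probability


def combine_odds_ratios(*odds_ratios: float) -> float:
    """Multiply partial odds ratios (valid when no interaction term links them)."""
    combined = 1.0
    for ratio in odds_ratios:
        combined *= ratio
    return combined


# Partial odds ratios copied from Table 4, Model 4 (Y = USER).
ED_COLGRAD = 3.09
INC_100_PLUS = 3.03

# Hypothetical baseline probability for the comparison group. This is an
# assumption made for illustration, not a figure reported in the paper.
BASELINE_PROB = 0.50

if __name__ == "__main__":
    p_college = apply_odds_ratio(BASELINE_PROB, ED_COLGRAD)
    p_both = apply_odds_ratio(
        BASELINE_PROB, combine_odds_ratios(ED_COLGRAD, INC_100_PLUS)
    )
    print(f"College graduate: {p_college:.3f}")
    print(f"College graduate, income of $100,000 or more: {p_both:.3f}")
```

Because the mapping from odds to probability is nonlinear, the same odds ratio produces a different change in probability at a different baseline, which is why the tables report odds ratios and not probability differences.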