AN ECONOMETRIC ANALYSIS OF THE EFFECTS OF INTERNET USE AT WORK
ON HOURLY WAGES
Written By
Nicolae Cristea
Submitted to Professors H. Barreto and F. Howland
In Partial Completion of the Requirements for Economics 31
April 17, 2000
Abstract
This paper uses Current Population Survey data to examine whether workers who
use the Internet at work earn a higher wage rate than otherwise similar workers who do
not use the Internet at work. A multivariate regression model is used to control for
variables that might be correlated with Internet use and earnings. The estimates show that
Internet users earn on average 16 to 18 percent higher wages, depending on the worker
characteristics. This study also offers some support for the technology-based hypothesis
of increasing wage inequality in the 1980s and 1990s.
Table of Contents
I. Introduction
II. Literature Review
III. Theoretical Analysis
IV. Empirical Results
A. The Data
B. Presentation and Interpretation of Empirical Analyses
V. Conclusion
Bibliography
I. Introduction
An increase in wage inequality since the 1970s has been documented by a large
number of researchers. The increase has been called a platitude of labor economics: it
was sharp in the early 1980s, tapered off in the late 1980s, and reaccelerated in the
1990s (Bernstein and Mishel, 3)¹. Alan B. Krueger (1993), in his paper "How Computers
Have Changed the Wage Structure: Evidence From Microdata, 1984-1989," lists two
leading hypotheses that have emerged to explain this increase in wage inequality. The
first hypothesis is that increased international competition and trade have hurt the
economic position of low-skilled and less-educated workers in the United States. The
second hypothesis is that rapid, skill-biased technological change in the 1980s caused
profound changes in the relative productivity of various types of workers.
This paper focuses on the technological hypothesis for the increase in wage
inequality. CPS data sets will be used to explore the impact of technology on wages. In
particular, I will consider whether employees who use the Internet at work earn higher
wages. The analysis is similar to Krueger's (1993) work and uses the same methodology.
He uses the "computer revolution" as a prototypical example of technological change in
the 1980s. I use the "Internet revolution" as a prototypical example of technological
change in the 1990s.
This study is intended to explore the relationship between Internet use at work
and wages. The remainder of the paper is organized as follows. In section II I will discuss
the important research already done on this topic. In section III I will give some
theoretical background for the claims made in this paper. Section IV is the main part
of the paper, where I will perform the empirical analysis, followed by the conclusion in
section V.
¹ Bernstein, Jared and Mishel, Lawrence, "Has Wage Inequality Stopped Growing?", Monthly Labor
Review, December 1997
II. Literature Review
Although extensive work has been done to explain the increasing wage inequality
of recent decades, almost no attention has been paid to the emergence of the
Internet as a major medium in the 1990s. Research supporting or questioning the
technological hypothesis for wage inequality has focused on the more general aspect
of computer use.
By far the most important analysis of how computer use has influenced wages is
Alan B. Krueger's "How Computers Have Changed the Wage Structure: Evidence From
Microdata, 1984-1989." His paper focuses on whether employees who use
computers at work earn more as a result of applying their computer skills, and whether
the premium for using a computer can account for much of the change in the wage
structure. After controlling for a number of variables such as experience, race, and sex,
Krueger finds that computer use is associated with about 10-15% higher earnings,
depending on the kind of worker, year, and control variables included. In addition to
answering the main question, the author addresses issues of possible bias and analyzes
the impact of computer use on other wage differentials.
Krueger (1993) uses a semi-log form for his multivariable regression model, a
functional form that is very frequently used in models for wages. In the model that he
estimates, the natural log of observation i's wage rate, $\ln W_i$, is assumed to
depend on $C_i$ (a dummy variable that equals one if the ith individual uses a computer
at work, and zero otherwise), a vector of observed characteristics $X_i$, and an error
term $\varepsilon_i$:
$$\ln W_i = X_i\beta + C_i\alpha + \varepsilon_i$$
where $\beta$ and $\alpha$ are parameters to be estimated.
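Because the dependent variable is in logs, the coefficient on the computer dummy is measured in log points rather than percent. A short worked derivation, writing $\alpha$ for the coefficient on $C_i$ and holding $X_i$ and the error term fixed, shows the conversion; the superscript notation is introduced here only for illustration.

```latex
\begin{align*}
\ln W_i^{(C=1)} - \ln W_i^{(C=0)} &= \alpha \\
\frac{W_i^{(C=1)}}{W_i^{(C=0)}} &= e^{\alpha} \\
\text{proportional wage premium} &= e^{\alpha} - 1 \;\approx\; \alpha \quad \text{when } \alpha \text{ is small}
\end{align*}
```

For coefficients of the size discussed in this section the approximation is rough, so the exact form $e^{\alpha} - 1$ is the safer one to report.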
I have summarized Krueger's (1993) findings in the table below.
                                  October 1984          October 1989
Independent Variable              Coefficient  SE       Coefficient  SE
Intercept                          0.750       0.023     0.905       0.024
Uses computer at work (1 = yes)    0.170       0.008     0.188       0.008
Years of Education                 0.069       0.001     0.075       0.002
Experience                         0.027       0.001     0.027       0.001
Black (1 = yes)                   -0.098       0.013    -0.121       0.013
Female (1 = yes)                  -0.162       0.012    -0.172       0.012
Married (1 = yes)                  0.156       0.011     0.159       0.011
Union Member (1 = yes)             0.181       0.009     0.182       0.010
Note: Not all independent variables are included in the table
Table II.1 Krueger's (1993) OLS Regression Estimates of the Effect of Computer Use on Pay
The author concludes that within the framework of his analysis, the computer
dummy variable (Uses Computer at Work) has a sizable and statistically significant effect
on wages. I have replicated Krueger’s analysis using data from the 1997 CPS Computer
Ownership/Internet Supplement. The following table gives the estimates that I have
obtained:
                                  October 1997
Independent Variable              Coefficient  SE
Intercept                          0.877       0.023
Uses computer at work (1 = yes)    0.225       0.008
Years of Education                 0.076       0.001
Experience                         0.025       0.001
Black (1 = yes)                   -0.098       0.013
Female (1 = yes)                  -0.152       0.012
Married (1 = yes)                  0.148       0.012
Union Member (1 = yes)             0.162       0.011
Note: Not all independent variables are included in the table. Computer use at work was
not surveyed in October 1998, so no estimates are available for that year.
Table II.2 October 1997 OLS Regression Estimates of the Effect of Computer Use on Pay
My findings provide evidence that the computer use wage premium that Krueger
calculated for 1984 and 1989 was still in place in 1997 and, if anything, had increased:
the estimated coefficient rises from 0.188 in 1989 to 0.225 in 1997.
Concerning omitted variable bias, Krueger (1993) tries a number of empirical
strategies to probe whether the computer pay differential is a real consequence of
computer use or is caused by some omitted variable. First, he looks at more homogeneous
groups of workers. He considers a sample of secretaries, one of the occupational groups
defined in the CPS, and finds that for secretaries the estimated computer use wage
premium is at least as large as the return to one additional year of schooling. Second, in
addition to the CPS data set, Krueger examines data from the High School and Beyond
Survey, which allows him to consider a more comprehensive set of personal
characteristics. His conclusion is similar to the one from the CPS estimates: "Computer
use at work is an important determinant of earnings" (Krueger, 50). It is theoretically
possible that Krueger omits some variable that is correlated with computer use and is
thus responsible for the observed wage premium. However, such a variable is far from
obvious, and the author has considered nearly every available variable that could be
correlated with computer use.
A replication of Krueger's analysis, using a different data set, is "The Returns to
Computer Use Revisited: Have Pencils Changed the Wage Structure Too?" by John E.
DiNardo and Jörn-Steffen Pischke. In addition to estimating the wage differential
associated with the use of a computer at work, the authors use data on German workers to
estimate wage differentials associated with the use of a calculator, a telephone, writing
materials, or sitting on the job. They find that the wage differentials associated with these
"white-collar" tools are almost as large as those measured for computer use. DiNardo and
Pischke conclude that "the results seem to suggest that computer users possess unobserved
skills which might have little to do with computers, but which are rewarded in the labor
market, or that computers were first introduced in higher paying occupations or jobs"
(DiNardo and Pischke, 292). In interpreting their results, the authors determine that any
direct link between the effect of computer use on wages and changes in the wage
structure is weak at best.
Krueger's study deals with this problem by estimating premiums for computer
use outside the workplace. If computer users in general, and not only computer users at
work, possessed skills that are rewarded in the labor market, then the wage premium for
computer use outside the workplace should be very close to the wage premium for
computer use at work. Krueger found that this was not the case. In my study, I will
similarly address the possibility that Internet use in general, rather than Internet use
at work, is rewarded in the labor market².
Another study that challenges the validity of the technology-based hypothesis of
the growth of earnings inequality is "Computers and the Wage Structure" by Michael J.
Handel. Handel (1999) questions the validity of Krueger's (1993) findings on two grounds:
most of the growth of wage inequality took place in the early 1980s, and computer use at
work had equalizing impacts on the gender wage gap and on the wage gaps between
education groups. However, the increase in wage inequality did not stop in the early
1980s. Bernstein and Mishel (1997) found that "the sharpest increase was in the early
1980s, followed by a flattening in the second half of the 1980s and a reacceleration in
the 1990s." This roughly coincides with the development of new technologies and the
introduction of computers at work over the past two decades.
John Bound and George Johnson's "Changes in the Structure of Wages in the
1980s: An Evaluation of Alternative Explanations" reports important findings as well.
After thoroughly analyzing a number of alternative explanations, the authors conclude
that the primary cause of increasing wage inequality is technological advances.
Furthermore, Stephen Machin and John Van Reenen, in their paper "Technology and
Changes in Skill Structure: Evidence From Seven OECD Countries," find that there have
been shifts in relative labor demand that have favored skilled workers on an international
level. They conclude that changes in the wage and employment distribution are closely
tied to technical changes.
There is considerable support among researchers in the field for the
technological hypothesis for wage inequality. Krueger (1993) went a step further by
estimating the return to a more specific part of technology: computers. He found a
positive, statistically significant wage premium to computer use. Although questions
about possible omitted variable bias remain, he made a thorough effort to control for
the available variables. In a similar fashion I will look at Internet use at work.
Bound and Johnson (1992) pointed to technology as the reason behind increasing
wage inequality. In addition, Bernstein and Mishel (1997) provide evidence of
increasing wage inequality in the 1990s. The development of the Internet has
symbolized technological change in this period. My analysis will link the development
of the Internet (as part of technological development) to the increase in wage
inequality.
² See Internet use at Home and at Work in section IV.B
III. Theoretical Analysis
The development of the Internet as a major communications medium started in
the early 1990s, and its growth since then has been astounding. The Internet Software
Consortium estimates that the number of Internet hosts grew from 1,313,000 in January
1993 to 72,398,092 in January 2000. The "learning curve" type growth pattern during
this period is shown below in Figure III.1.
Figure III.1 Number of Internet Hosts 1993-2000³
Today, in the US, the Internet has penetrated every sphere of human activity. In
the workplace, the Internet is used for a variety of tasks, ranging from instant
communications and file exchange via email to actual monetary transactions. Such a
severe shock to the traditional workplace must have had significant implications for the
parties involved. More precisely, this paper investigates the effects of Internet usage in
the workplace on hourly wages.
³ Source: Internet Software Consortium (http://www.isc.org/)
In the context of the broader increase in wage inequality in the 1990s, the
Internet use at work explanation fits the technology hypothesis described in the
introduction of this paper. It can be claimed that Internet use at work increases the
productivity of the workers who use it and, hence, increases their wages. The problem
with such a claim is that a simple comparison of users and non-users may be confounded
by other variables such as education, experience, race, or gender. A multivariate
regression model that controls for these confounding variables is therefore necessary.
The linear functional form of the multivariate regression model that I will use to
estimate the coefficient on the independent variable of interest is:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon,$$
where $Y$ is the dependent variable,
$\beta_i$ is the coefficient term of the independent variable $X_i$, and
$\varepsilon$ is an error term.
Special attention needs to be paid to the origin of the error term. The error term in
the model above represents the influence of omitted variables that cannot be measured,
as well as pure measurement error. In order for the above model to be a good tool for
estimating slope coefficients, the error term needs to be generated in a very specific way.
Namely, the error term has to represent a draw from the Standard Econometric Gaussian
Error Box. For hypothesis tests based on the estimates from the model above to be valid,
three assumptions of the Standard Econometric Gaussian Error Box Model have to hold.
Firstly, the average of the "tickets" in the box has to be zero. This means that the
measurement process that generated the data has to be unbiased. Secondly, each
measurement has to be independent of every other measurement: it must be impossible to
predict the next draw from any other draw. Thirdly, each measurement has to face the
same array of possible errors. In other words, the errors need to be identically
distributed.
The third assumption is a problem for the linear functional form of the multivariate
regression model for wages, because the errors are unlikely to be identically
distributed. This heteroscedasticity causes the reported OLS SEs to be biased, and OLS
is no longer BLUE (Best Linear Unbiased Estimator). In order to deal with
heteroscedasticity, I will use a semi-log functional form for the model. Such a functional
form is very frequently used in models for wages, and it is the form that Krueger (1993)
uses in his analysis of computer usage at work and wages. Thus, the regression equation
will take the following form:
$$\ln Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \varepsilon,$$
where $Y$ is the dependent variable,
$\beta_i$ is the coefficient term of the independent variable $X_i$, and
$\varepsilon$ is an error term.
I will use the semi-log functional form for the majority of the models that I
estimate in this paper.
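The reasoning behind the semi-log form can be illustrated with simulated data. The sketch below is not the paper's data or model: it assumes hypothetical wages generated as exp(b0 + b1*educ + e) with an error e of constant spread, and all parameter values are made up for illustration. Under that assumption the spread of wages in levels grows with education, while the spread of log wages stays roughly constant.

```python
import math
import random
import statistics

def simulate_wages(n=20000, seed=0):
    """Draw hypothetical (education, wage) pairs with a multiplicative error."""
    rng = random.Random(seed)
    b0, b1, sigma = 1.0, 0.08, 0.4  # illustrative values, not estimates from the paper
    rows = []
    for _ in range(n):
        educ = rng.choice([8, 12, 16, 20])
        e = rng.gauss(0.0, sigma)
        rows.append((educ, math.exp(b0 + b1 * educ + e)))
    return rows

def spread_by_group(rows, transform):
    """SD of transform(wage) within each education level."""
    groups = {}
    for educ, wage in rows:
        groups.setdefault(educ, []).append(transform(wage))
    return {educ: statistics.pstdev(vals) for educ, vals in sorted(groups.items())}

rows = simulate_wages()
level_sd = spread_by_group(rows, lambda w: w)  # spread of wages in dollars
log_sd = spread_by_group(rows, math.log)       # spread of log wages

for educ in level_sd:
    print(educ, round(level_sd[educ], 2), round(log_sd[educ], 2))
```

Whether real wage data behave this way is an empirical question, which is why the residual spreads are checked directly in section IV.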
IV. Empirical Results
A. The Data
In my empirical analysis I will use data from the 1997 and 1998 Current
Population Survey Computer/Internet Use Supplements, which I obtained using the data
extraction tool available at http://ferret.bls.census.gov/. The sample is restricted to
respondents who reported earnings data, which represents about one fourth of the initial
data set. The sample is further restricted to respondents who were in the civilian labor
force and reported the number of hours usually worked during the week. I also restricted
the sample to workers who held a single job. This is necessary because no separate data
on Internet use are available for respondents who worked multiple jobs; in other words,
it is not clear at which of these jobs they were or were not using the Internet. The
restriction also yields a more homogeneous population, which helps to minimize possible
confounding. After these requirements were imposed, the sample size was 11983 for 1997
and 11388 for 1998.
The dependent variable in my analysis is Hourly Wage. I built this variable
using a number of earnings and labor force variables from the data set. As mentioned
above, only one fourth of the respondents were asked earnings questions. Among these,
hourly workers reported their hourly wage, and that is the value that I assigned to
Hourly Wage for those respondents. Non-hourly workers reported weekly earnings. For
these respondents, I divided the reported weekly earnings by the number of hours worked
during the week at the main job. Following Krueger's example, I set a lower limit for the
hourly wage at $3.00 (Krueger's limit was $1.50, but his analysis covers the 1980s). This
further reduced the sample to 11860 for 1997 and 11266 for 1998. It is also worth
mentioning that the hourly wage reported by hourly workers was top coded at $99.99 and
weekly earnings were top coded at $2884.61. In order to partially correct for this, I
deleted 6 entries from the 1997 sample and 5 entries from the 1998 sample for which the
Hourly Wage was greater than or equal to $99.
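The construction of Hourly Wage described above can be sketched as a small function. This is only an illustration of the stated rules: the function name, argument names, and the use of None for excluded records are assumptions of the sketch, not CPS variable names or the procedure actually run for the paper.

```python
from typing import Optional

WAGE_FLOOR = 3.00        # lower limit for the hourly wage used in the paper
TOP_CODE_CUTOFF = 99.0   # entries at or above this value were deleted

def hourly_wage(is_hourly: bool,
                reported_hourly: Optional[float],
                weekly_earnings: Optional[float],
                usual_hours: Optional[float]) -> Optional[float]:
    """Return Hourly Wage for one respondent, or None if the record is excluded.

    All argument names are hypothetical stand-ins for the CPS fields.
    """
    if is_hourly:
        # Hourly workers: use the wage they reported.
        wage = reported_hourly
    elif weekly_earnings is not None and usual_hours:
        # Non-hourly workers: weekly earnings divided by weekly hours at the main job.
        wage = weekly_earnings / usual_hours
    else:
        wage = None
    # Apply the $3.00 floor and drop values affected by top coding.
    if wage is None or wage < WAGE_FLOOR or wage >= TOP_CODE_CUTOFF:
        return None
    return wage

print(hourly_wage(True, 12.50, None, None))
print(hourly_wage(False, None, 800.0, 40.0))
```

A wage of exactly $3.00 is kept here, which is consistent with the minimum of 3 reported in the summary statistics.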
The independent variable that is the focus of my analysis is Uses Internet at
Work, a dummy variable equaling 1 if the respondent reported using the
Internet at work and 0 otherwise. Two issues concerning this variable need to be
discussed:
1) The universe for the 1998 Uses Internet at Work variable is "Internet use
outside the home". Thus, the "out of universe" values are respondents who do
not use the Internet outside their homes. For the purposes of this paper, these
individuals were coded as not using the Internet at work.
2) The universe for the 1997 Uses Internet at Work variable is "computer use at
work". Thus, those who did not report using a computer at work were also
coded as not using the Internet at work. This coding was largely valid in 1997,
as ways of accessing the Internet other than a computer were not widely
available. However, it might become a concern in the future.
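The coding rule in the two points above amounts to treating every "out of universe" respondent as a non-user. A minimal sketch, with hypothetical argument names standing in for the survey's universe flag and the reported answer:

```python
def code_internet_at_work(in_universe: bool, reported_use: bool) -> int:
    """Return 1 only for in-universe respondents who reported use, else 0.

    `in_universe` is a hypothetical flag: Internet use outside the home (1998)
    or computer use at work (1997). Out-of-universe respondents are coded 0.
    """
    return 1 if (in_universe and reported_use) else 0

print(code_internet_at_work(True, True))
print(code_internet_at_work(False, True))
```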
The variables that I will use in my analysis are given in the table below:
Nr.  Variable Name: Variable Description
1    Hourly Wage: Dollars earned per hour. Hourly wage for hourly workers; weekly earnings
     divided by number of hours worked at main job for non-hourly workers.
2    Uses Internet at Work: A dummy variable equaling 1 if the respondent uses the
     Internet at work and zero otherwise.
3    Education Less than 8th Grade: A dummy variable equaling 1 if the respondent's highest
     level of school completed was less than 8th grade and zero otherwise.
4    Education High School No Diploma: A dummy variable equaling 1 if the respondent's
     highest level of school completed was less than 12th grade, or 12th grade but no
     diploma, and zero otherwise.
5    Education Some College No Degree: A dummy variable equaling 1 if the respondent's
     highest level of school completed was some college but no degree and zero otherwise.
6    Education Associate Degree: A dummy variable equaling 1 if the respondent's highest
     level of school completed was an associate degree and zero otherwise.
7    Education Bachelor's Degree: A dummy variable equaling 1 if the respondent's highest
     level of school completed was a bachelor's degree and zero otherwise.
8    Education Master's Degree: A dummy variable equaling 1 if the respondent's highest
     level of school completed was a master's degree and zero otherwise.
9    Education Prof. School or Doctorate: A dummy variable equaling 1 if the respondent's
     highest level of school completed was professional school or a doctorate degree and
     zero otherwise.
10   Experience: Experience in years as of the year surveyed. Calculated as
     Age - Education in Years - 6.
11   Experience Squared: Experience^2.
12   Black: Dummy variable. Demographics - race of respondent. 1 = Black, 0 = Other race.
13   American Indian: Dummy variable. Demographics - race of respondent.
     1 = American Indian, 0 = Other race.
14   Asian: Dummy variable. Demographics - race of respondent. 1 = Asian, 0 = Other race.
15   Married: Dummy variable. Demographics - marital status. 1 = Married, 0 = otherwise.
16   Female: Dummy variable. Demographics - sex. 1 = Female, 0 = Male.
17   Married*Female: Dummy variable. Interaction term.
18   Part-time Worker: Dummy variable. 1 = Part-time Worker, 0 = Full-time Worker.
19   Lives in the North-East: Dummy variable. 1 = Lives in the North-East, 0 = Lives
     outside of the North-East region.
20   Lives in the South: Dummy variable. 1 = Lives in the South, 0 = Lives outside of the
     South region.
21   Lives in the West: Dummy variable. 1 = Lives in the West, 0 = Lives outside of the
     West region.
22   Private Sector Worker: Dummy variable. 1 = Works in the Private Sector of the
     Economy, 0 = Works in the Government Sector.
23   Years of Education: Constructed variable based on the highest level of schooling
     variable.
24   Uses Internet at Home: A dummy variable equaling 1 if the respondent uses the
     Internet at home and zero otherwise.
Table IV.A.1 Variable Description
The summary statistics for the variables considered are given below:
                                      October 1997 (n = 11854)       October 1998 (n = 11261)
Variable Name                         Mean    SD      Max    Min     Mean    SD      Max    Min
Hourly Wage                           13.1    8.10    92.3   3       14.16   9.23    96.15  3
Uses Internet at Work                 0.16    0.37    1      0       0.2     0.4     1      0
Education Less than 8th Grade         0.03    0.17    1      0       0.03    0.17    1      0
Education High School No Diploma      0.09    0.29    1      0       0.1     0.3     1      0
Education Some College No Degree      0.2     0.4     1      0       0.2     0.4     1      0
Education Associate Degree            0.08    0.28    1      0       0.09    0.29    1      0
Education Bachelor's Degree           0.19    0.39    1      0       0.18    0.38    1      0
Education Master's Degree             0.06    0.23    1      0       0.07    0.25    1      0
Education Prof. School or Doctorate   0.02    0.15    1      0       0.02    0.15    1      0
Experience                            19.31   12.70   77     0       19.64   12.69   77     0
Experience Squared                    534.3   599.1   5929   0       546.8   597.5   5929   0
Black                                 0.1     0.3     1      0       0.1     0.3     1      0
American Indian                       0.01    0.1     1      0       0.01    0.11    1      0
Asian                                 0.04    0.19    1      0       0.04    0.2     1      0
Married                               0.59    0.49    1      0       0.59    0.49    1      0
Female                                0.49    0.5     1      0       0.49    0.5     1      0
Married*Female                        0.28    0.45    1      0       0.28    0.45    1      0
Union Member                          0.14    0.34    1      0       0.14    0.35    1      0
Part-time Worker                      0.82    0.38    1      0       0.82    0.38    1      0
Lives in the North-East               0.21    0.41    1      0       0.61    0.49    1      0
Lives in the South                    0.3     0.46    1      0       0.22    0.42    1      0
Lives in the West                     0.24    0.43    1      0       0.23    0.42    1      0
Private Sector Worker                 0.84    0.37    1      0       0.84    0.37    1      0
Years of Education                    13.06   2.74    20     0       13.12   2.81    20     0
Uses Internet at Home                 0.17    0.38    1      0       0.33    0.47    1      0
Note: n is the same for every variable within a year.
Table IV.A.2 Summary Statistics
In 1997, 16% of the respondents were using the Internet at work. In 1998 this
number rose to 20%. The respective numbers for Internet use at home are 17% and 33%,
so the number of home Internet users was growing faster than the number of "at work"
Internet users. In both years, 10% of the respondents were African American and 49%
were female; 19% and 18%, respectively, held a bachelor's degree as their highest level
of schooling.
B. Presentation and Interpretation of Empirical Analyses
General Findings and the Validity of the Box Model
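I will initially estimate a linear multivariate regression equation relating the observed
independent variables to the dependent variable:
$$
\begin{aligned}
\text{Hourly Wage} ={}& \beta_0 + \beta_1\,\text{Uses Internet at Work} + \beta_2\,\text{Education Less than 8th Grade} \\
&+ \beta_3\,\text{Education High School No Diploma} + \beta_4\,\text{Education Some College} \\
&+ \beta_5\,\text{Education Associate Degree} + \beta_6\,\text{Education Bachelor's Degree} \\
&+ \beta_7\,\text{Education Master's Degree} + \beta_8\,\text{Education Professional School or Doctorate} \\
&+ \beta_9\,\text{Experience} + \beta_{10}\,\text{Experience Squared} + \beta_{11}\,\text{Black} \\
&+ \beta_{12}\,\text{American Indian} + \beta_{13}\,\text{Asian} + \beta_{14}\,\text{Female} \\
&+ \beta_{15}\,\text{Married} \times \text{Female} + \beta_{16}\,\text{Married} + \beta_{17}\,\text{Lives in the North East} \\
&+ \beta_{18}\,\text{Lives in the South} + \beta_{19}\,\text{Lives in the West} + \beta_{20}\,\text{Union Member} \\
&+ \beta_{21}\,\text{Private Sector Worker} + \beta_{22}\,\text{Part-time Worker} + \varepsilon,
\end{aligned}
$$
where $\beta_0$ is the intercept, $\beta_1$ to $\beta_{22}$ are the coefficient terms for
the independent variables, and $\varepsilon$ is an error term.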
JMP outputs the following parameter estimates:
                                      October 1997                 October 1998
Variable Name                         Estimate  SE     t-stat      Estimate  SE     t-stat
Intercept                              4.903    0.304   16.110      4.922    0.366   13.460
Uses Internet at Work                  3.124    0.171   18.300      2.787    0.190   14.670
Education Less than 8th Grade         -3.866    0.357  -10.820     -3.516    0.428   -8.210
Education High School No Diploma      -1.058    0.224   -4.730     -1.217    0.268   -4.530
Education Some College No Degree       1.145    0.167    6.850      1.636    0.202    8.100
Education Associate Degree             2.758    0.229   12.040      2.700    0.266   10.150
Education Bachelor's Degree            6.452    0.178   36.270      7.222    0.219   32.970
Education Master's Degree              8.444    0.280   30.180      9.856    0.315   31.260
Education Prof. School or Doctorate   11.794    0.413   28.540     14.086    0.483   29.160
Experience                             0.347    0.016   22.280      0.403    0.019   21.030
Experience Squared                    -0.005    0.000  -17.050     -0.007    0.000  -16.460
Black                                 -1.351    0.202   -6.700     -1.201    0.244   -4.920
American Indian                        0.327    0.591    0.550     -0.324    0.638   -0.510
Asian                                 -0.511    0.317   -1.610     -0.921    0.354   -2.600
Married                                2.103    0.178   11.780      1.789    0.217    8.260
Female                                -1.458    0.185   -7.900     -2.173    0.222   -9.810
Married*Female                        -2.043    0.239   -8.540     -1.621    0.288   -5.630
Union Member                           1.775    0.183    9.680      1.202    0.217    5.540
Part-time Worker                       1.271    0.170    7.500      1.243    0.205    6.060
Lives in the North-East                0.939    0.173    5.430      0.935    0.204    4.570
Lives in the South                    -0.313    0.162   -1.940     -0.134    0.193   -0.690
Lives in the West                      0.416    0.169    2.460      0.381    0.204    1.870
Private Sector Worker                  0.540    0.173    3.120      1.014    0.208    4.870
Table IV.B.1 Parameter Estimates for the linear regression model
In this case I am assuming that the error terms in the equation that I have
estimated are generated by the Standard Econometric Gaussian Error Box Model. The
error terms account for the influences of omitted variables and of measurement error
that took place during data generation. In order for the results obtained with the aid of
this model to be useful, however, the requirements of the Standard Econometric Gaussian
Error Box Model have to hold. I have already listed these requirements in the
theoretical analysis section of this paper. The problem here is that the error terms are
not identically distributed. We cannot see the error terms, but we can see the residuals.
The table below gives the SDs of the Hourly Wage residuals for the different levels of
education in 1997 and 1998.
SD's of Hourly Wage Residuals
Education                                   1997          1998
Educ. <= 8th Grade                          3.441089895   4.188114875
Educ. = No Diploma High School              5.211222387   3.858918013
Educ. = High School                         5.024704702   5.070032609
Educ. = Some College, No Degree             5.72779791    6.374499385
Educ. = Associate Degree                    6.419010175   7.17440541
Educ. = Bachelor's Degree                   8.280047537   10.03441887
Educ. = Master's Degree                     8.536087498   10.71291691
Educ. = Professional School or Doctorate    10.69772711   13.93272381
Table IV.B.2 SD's of Hourly Wage Residuals, linear functional form
As can be clearly seen, the spread of the residuals generally increases as education
increases. This is strong visual evidence that heteroscedasticity is present. To be
certain, however, I will conduct the Goldfeld-Quandt test for detecting
heteroscedasticity. I will use the continuous Years of Education variable to create "high"
and "low" dispersion groups. For 1997, I obtained a G-Q statistic of 9.08 with a P-value
extremely close to 0. Similarly, for 1998, I obtained a G-Q statistic of 15.99 with a
P-value extremely close to 0. This clearly demonstrates the presence of
heteroscedasticity in both samples.
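The logic of the Goldfeld-Quandt test can be sketched in a few lines. The version below is a simplified illustration, not the procedure run in JMP for this paper: it uses a single regressor instead of the full model, drops an assumed middle 20% of observations, and is applied to simulated data with made-up parameters. The statistic is the ratio of the residual variance in the "high" group to that in the "low" group, which is compared against an F distribution.

```python
import random

def ols_rss(xs, ys):
    """Residual sum of squares from a simple regression of y on x with intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))

def goldfeld_quandt(xs, ys, drop_fraction=0.2):
    """G-Q statistic: residual variance of the high-x group over the low-x group."""
    pairs = sorted(zip(xs, ys))               # order observations by x
    k = int(len(pairs) * (1 - drop_fraction) / 2)  # size of each tail group
    low, high = pairs[:k], pairs[-k:]
    df = k - 2                                # two estimated parameters per group
    rss_low = ols_rss(*zip(*low))
    rss_high = ols_rss(*zip(*high))
    return (rss_high / df) / (rss_low / df)

# Simulated illustration: error spread proportional to x versus constant spread.
rng = random.Random(1)
x = [rng.uniform(8, 20) for _ in range(4000)]
y_hetero = [2 + 0.9 * xi + rng.gauss(0, 0.3 * xi) for xi in x]
y_homo = [2 + 0.9 * xi + rng.gauss(0, 3.0) for xi in x]

gq_hetero = goldfeld_quandt(x, y_hetero)
gq_homo = goldfeld_quandt(x, y_homo)
print(round(gq_hetero, 2), round(gq_homo, 2))
```

A statistic well above 1 points to larger error variance in the high group; a statistic near 1 is what constant variance would produce.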
In order to ameliorate the problem of heteroscedasticity, I will turn to a different
functional form for my regression equation. A semi-log specification is a more
appropriate way to estimate the coefficients, and it is the specification that Krueger
(1993) uses in his analysis. The equation that I will estimate for 1997 and 1998 is the
following:
$$
\begin{aligned}
\ln \text{Hourly Wage} ={}& \beta_0 + \beta_1\,\text{Uses Internet at Work} + \beta_2\,\text{Education Less than 8th Grade} \\
&+ \beta_3\,\text{Education High School No Diploma} + \beta_4\,\text{Education Some College} \\
&+ \beta_5\,\text{Education Associate Degree} + \beta_6\,\text{Education Bachelor's Degree} \\
&+ \beta_7\,\text{Education Master's Degree} + \beta_8\,\text{Education Professional School or Doctorate} \\
&+ \beta_9\,\text{Experience} + \beta_{10}\,\text{Experience Squared} + \beta_{11}\,\text{Black} \\
&+ \beta_{12}\,\text{American Indian} + \beta_{13}\,\text{Asian} + \beta_{14}\,\text{Female} \\
&+ \beta_{15}\,\text{Married} \times \text{Female} + \beta_{16}\,\text{Married} + \beta_{17}\,\text{Lives in the North East} \\
&+ \beta_{18}\,\text{Lives in the South} + \beta_{19}\,\text{Lives in the West} + \beta_{20}\,\text{Union Member} \\
&+ \beta_{21}\,\text{Private Sector Worker} + \beta_{22}\,\text{Part-time Worker} + \varepsilon,
\end{aligned}
$$
where $\beta_0$ is the intercept, $\beta_1$ to $\beta_{22}$ are the coefficient terms for
the independent variables, and $\varepsilon$ is an error term.
The JMP output for this equation is the following:
                                      October 1997                 October 1998
Variable Name                         Estimate  SE     t-stat      Estimate  SE     t-stat
Intercept                              1.788    0.019   92.660      1.836    0.021   89.360
Uses Internet at Work                  0.198    0.011   18.340      0.184    0.011   17.200
Education Less than 8th Grade         -0.346    0.023  -15.270     -0.304    0.024  -12.660
Education High School No Diploma      -0.143    0.014  -10.110     -0.152    0.015  -10.080
Education Some College No Degree       0.086    0.011    8.080      0.100    0.011    8.830
Education Associate Degree             0.214    0.015   14.750      0.191    0.015   12.800
Education Bachelor's Degree            0.439    0.011   38.980      0.431    0.012   35.060
Education Master's Degree              0.535    0.018   30.190      0.556    0.018   31.400
Education Prof. School or Doctorate    0.670    0.026   25.560      0.701    0.027   25.830
Experience                             0.027    0.001   26.850      0.028    0.001   25.610
Experience Squared                     0.000    0.000  -21.140      0.000    0.000  -20.440
Black                                 -0.115    0.013   -8.960     -0.090    0.014   -6.570
American Indian                       -0.004    0.037   -0.120     -0.022    0.036   -0.610
Asian                                 -0.034    0.020   -1.710     -0.052    0.020   -2.620
Married                                0.144    0.011   12.730      0.109    0.012    8.920
Female                                -0.106    0.012   -9.050     -0.145    0.012  -11.620
Married*Female                        -0.115    0.015   -7.550     -0.072    0.016   -4.420
Union Member                           0.159    0.012   13.630      0.123    0.012   10.120
Part-time Worker                       0.177    0.011   16.450      0.185    0.012   16.100
Lives in the North-East                0.063    0.011    5.740      0.056    0.011    4.850
Lives in the South                    -0.023    0.010   -2.250     -0.016    0.011   -1.510
Lives in the West                      0.016    0.011    1.450      0.021    0.011    1.810
Private Sector Worker                  0.014    0.011    1.310      0.036    0.012    3.110
Table IV.B.3 Parameter estimates for the semi-log regression model
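Since the dependent variable is now in logs, the Uses Internet at Work estimates in Table IV.B.3 are log-point differences. The sketch below converts them to percentages in two ways, using the rounded coefficients from the table; the function names are chosen for illustration. One measure expresses the gain of users relative to non-users, the other the shortfall of non-users relative to users.

```python
import math

# Coefficients on Uses Internet at Work from the semi-log model (Table IV.B.3).
coefficients = {1997: 0.198, 1998: 0.184}

def premium_for_users(b):
    """Proportional wage gain of users relative to otherwise identical non-users."""
    return math.exp(b) - 1.0

def shortfall_for_nonusers(b):
    """Proportional wage gap of non-users relative to otherwise identical users."""
    return 1.0 - math.exp(-b)

for year, b in coefficients.items():
    print(year,
          round(100 * premium_for_users(b), 1),
          round(100 * shortfall_for_nonusers(b), 1))
```

The two measures differ because they use different bases. With these rounded coefficients the second measure comes out at roughly 17 to 18 percent, close to the 16 to 18 percent range quoted in the abstract, while the first comes out at roughly 20 to 22 percent.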
The table below gives the new SD's for the Hourly Wage residuals.
SD's of Hourly Wage Residuals
Education                                   1997   1998
Educ. <= 8th Grade                          0.35   0.40
Educ. = No Diploma High School              0.30   0.34
Educ. = High School                         0.36   0.32
Educ. = Some College, No Degree             0.37   0.42
Educ. = Associate Degree                    0.41   0.44
Educ. = Bachelor's Degree                   0.41   0.48
Educ. = Master's Degree                     0.38   0.47
Educ. = Professional School or Doctorate    0.42   0.56
Table IV.B.3 SD's of Hourly Wage Residuals, semi-log functional form
Visually, there is not enough evidence that heteroscedasticity has been completely
eliminated. However, it has clearly been reduced: the increase in the spread of the
residuals with education is not nearly as dramatic as with the linear functional form.
I will turn once again to the Goldfeld-Quandt test for determining the presence of
heteroscedasticity. For 1997, I obtained a G-Q statistic of 3.13 with a P-value close to
0. Similarly, for 1998, I obtained a G-Q statistic of 3.2 with a P-value extremely close
to 0. Heteroscedasticity is still present, although it has been greatly reduced. A more
rigorous analysis should correct for the remaining heteroscedasticity.
Hypothesis Testing
Based on the coefficient estimates for the independent variables that I have obtained
using a multivariate semi-log regression model (Table IV.B.3), I can perform the
following hypothesis tests for 1997 and 1998:
1)
NULL:
B1 = 0; holding the other independent variables constant, Uses
Internet at Work has no effect on Hourly Earnings
ALTERNATIVE:
B1 ≠ 0; holding the other independent variables constant, Uses
Internet at Work has an effect on Hourly Earnings
The t-statistic reported by JMP for 1997 is 18.34 and for 1998 is 17.20. The JMP reported
P-value for both years is less than .0001. This means that if the null hypothesis were true,
the probability of getting a result like the one in the samples above or even more extreme
is less than .0001. Thus, the regression shows a statistically significant relationship
between Uses Internet at Work and Hourly Wages.
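A rough check of these P-values can be sketched in Python. The sketch assumes that the sample is large enough for the t distribution to be approximated by the standard normal; that approximation and the function name are assumptions made here for illustration, and JMP's own calculation uses the exact t distribution.

```python
import math


def two_sided_p_value(t_stat):
    """Two-sided P-value under a standard normal approximation (assumed large sample)."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


p_1997 = two_sided_p_value(18.34)   # t-statistic reported for 1997
p_1998 = two_sided_p_value(17.20)   # t-statistic reported for 1998
p_reference = two_sided_p_value(1.96)  # familiar 5 percent benchmark, for comparison

print("1997 P-value below .0001:", p_1997 < 0.0001)
print("1998 P-value below .0001:", p_1998 < 0.0001)
print("P-value at |t| = 1.96:", round(p_reference, 3))
```

Both reported t-statistics lie far beyond the 1.96 benchmark, which is why the P-values fall below .0001.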
To assess the economic importance of the estimated coefficients, I will compare two
otherwise identical individuals who differ only in whether they use the Internet at work.
For this purpose I consider a hypothetical woman from the 1997 sample who has a college
degree, has 19.31 years of experience (the sample mean), is white, married, not a union
member, lives in the North-East, works in the private sector, and is a full-time worker.
If she uses the Internet at work, my model predicts an hourly wage of $14.80. The same
person, if she does not use the Internet at work, is predicted to make $12.40, which is
about 16% less. To emphasize the importance of the Internet wage premium, consider the
following situation. If the person who uses the Internet did not have a 4-year
bachelor's degree, but only a 2-year associate degree, she would still be making $11.88,
which is close to what the person with a 4-year degree who does not use the Internet is
making ($12.40). This suggests that the return to Internet use at work may be nearly as
important as two years of college education. In the model above I found the differential
in hourly pay between workers who use the Internet at work and those who do not to be
21.9% (exp(0.1984)-1) in 1997 and 20.1% (exp(0.1835)-1) in 1998.
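The arithmetic behind these figures can be sketched in Python. In a semi-log model the coefficient on a dummy variable is a difference in log wages, so the percentage differential is exp(coefficient) - 1. The function and variable names are chosen here for illustration only; the coefficients and predicted wages are the ones quoted above.

```python
import math


def log_points_to_percent(coefficient):
    """Convert a semi-log dummy coefficient into a percentage wage differential."""
    return 100 * (math.exp(coefficient) - 1)


premium_1997 = log_points_to_percent(0.1984)
premium_1998 = log_points_to_percent(0.1835)

# The hypothetical worker: $14.80 with Internet use at work, $12.40 without.
shortfall_percent = 100 * (1 - 12.40 / 14.80)  # non-user relative to user
markup_percent = 100 * (14.80 / 12.40 - 1)     # user relative to non-user

print("1997 differential:", round(premium_1997, 1), "percent")
print("1998 differential:", round(premium_1998, 1), "percent")
print("Non-user earns", round(shortfall_percent, 1), "percent less than the user")
print("User earns", round(markup_percent, 1), "percent more than the non-user")
```

The last two lines describe the same pair of predicted wages from different bases, which is why "about 16% less" and a premium of roughly 19% can both be read from the example.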
In addition, I should point out the negative estimates for a number of coefficients
of the independent variables. Namely, in both years, the coefficients for Education Less
than 8th Grade, Education High School No Diploma, Black, Asian, Female,
Married*Female, and Lives in the South were negative. Other things equal, individuals
with these characteristics are predicted to be disadvantaged in the labor market. Most
of these coefficient estimates are statistically significant, with JMP reporting
P-values of less than 0.001. Lives in the South is a clear exception: its t-statistics
of -2.25 (1997) and -1.51 (1998) in Table IV.B.3 correspond to P-values above 0.001, and
the 1998 estimate is not significant at the 5 percent level.
Internet Use at Home and at Work
A concern that has been pointed out in the literature review section of this paper is
that perhaps Internet users possess unobserved skills that are correlated with Internet
use and are rewarded in the labor market. This is not an issue of omitted variable bias,
but rather an issue of whether it is Internet use at work that is rewarded and not some
skill that Internet users in general have. To clarify this issue, I estimate another
model that includes two additional independent variables: Internet Use at Home, and
Internet Use at Home*Internet Use at Work. If workers are indeed rewarded for
unobserved skills that are associated with Internet use, then one would expect the
coefficient for Internet Use at Home*Internet Use at Work to be the largest, followed
by both Internet Use at Home and Internet Use at Work at roughly equal values. If it is
not Internet use at work but the unobserved skills of Internet users that are rewarded,
then it does not matter where the person uses the Internet: she will still have
positive coefficients. Below is JMP's output for this model:
                                             October 1997                 October 1998
Variable Name                         Estimate       SE   t-stat   Estimate       SE   t-stat
Intercept                                1.773    0.019   91.610      1.798    0.021   86.840
Uses Internet at Work                    0.187    0.013   13.890      0.176    0.014   12.820
Education Less than 8th Grade           -0.343    0.023  -15.160     -0.289    0.024  -12.070
Education High School No Diploma        -0.143    0.014  -10.100     -0.149    0.015   -9.930
Education Some College No Degree         0.078    0.011    7.360      0.088    0.011    7.800
Education Associate Degree               0.206    0.015   14.210      0.176    0.015   11.840
Education Bachelor's Degree              0.427    0.011   37.720      0.407    0.012   32.910
Education Master's Degree                0.514    0.018   28.810      0.522    0.018   29.400
Education Prof. School or Doctorate      0.653    0.026   24.930      0.672    0.027   24.860
Experience                               0.027    0.001   27.040      0.028    0.001   25.830
Experience Squared                       0.000    0.000  -21.070      0.000    0.000  -20.290
Black                                   -0.108    0.013   -8.470     -0.070    0.014   -5.080
American Indian                          0.004    0.037    0.110     -0.017    0.036   -0.470
Asian                                   -0.029    0.020   -1.430     -0.051    0.020   -2.590
Married                                  0.140    0.011   12.430      0.102    0.012    8.410
Female                                  -0.103    0.012   -8.820     -0.137    0.012  -11.060
Married*Female                          -0.114    0.015   -7.520     -0.077    0.016   -4.820
Union Member                             0.159    0.012   13.730      0.123    0.012   10.140
Part-time Worker                         0.183    0.011   17.060      0.198    0.011   17.230
Lives in the North-East                  0.061    0.011    5.610      0.050    0.011    4.410
Lives in the South                      -0.024    0.010   -2.360     -0.019    0.011   -1.760
Lives in the West                        0.013    0.011    1.180      0.017    0.011    1.500
Private Sector Worker                    0.013    0.011    1.220      0.036    0.012    3.100
Uses Internet at Home                    0.097    0.012    7.740      0.116    0.010   11.650
Use Int. @Home*Use Int. @Work           -0.028    0.022   -1.230     -0.001    0.020   -0.070
Table IV.B.3 Parameter estimates for the semi-log regression model, including Internet use at home
The coefficients that are important for this particular analysis are those on Uses
Internet at Work, Uses Internet at Home, and their interaction. The results show that the
coefficient is largest for Internet use at work (0.187 in 1997 and 0.176 in 1998),
followed at a considerable margin by Internet use at home (0.097 and 0.116). The
coefficient on the interaction term is negative but small and statistically
insignificant (t-statistics of -1.23 and -0.07), so workers who use the Internet both at
work and at home receive roughly the sum of the two separate premiums and no additional
reward. This provides evidence that if our estimated coefficient for Internet use
is close to the true one, then it is Internet use at work that is rewarded and not some
skill that Internet users have. This particular analysis does not say anything about the
confounding that might be in place due to any omitted variables, but rather comments on
the meaning of the coefficient already obtained, be it biased or not.
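To make the reading of the interaction model concrete, the following Python sketch adds up the coefficients that apply to each type of worker, relative to an otherwise identical worker who uses the Internet in neither place. The coefficients are copied from the table above; the dictionary layout and function name are assumptions made only for illustration.

```python
import math

# Coefficient estimates from the model that includes Internet use at home.
coefficients = {
    1997: {"work": 0.187, "home": 0.097, "interaction": -0.028},
    1998: {"work": 0.176, "home": 0.116, "interaction": -0.001},
}


def implied_premiums(c):
    """Log-wage premium by group, relative to using the Internet in neither place."""
    return {
        "work only": c["work"],
        "home only": c["home"],
        # Users in both places get both main effects plus the interaction term.
        "work and home": c["work"] + c["home"] + c["interaction"],
    }


for year, c in coefficients.items():
    for group, log_premium in implied_premiums(c).items():
        percent = 100 * (math.exp(log_premium) - 1)
        print(year, group, round(log_premium, 3), f"{percent:.1f}%")
```

The interaction coefficient on its own is not the premium for workers who use the Internet in both places; it only measures how far their premium departs from the sum of the two separate effects.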
V. Conclusion
The Internet is readily associated with technology, change, and productivity. Within
this framework, Internet use at work is expected to have an important, positive impact on
hourly wages. This paper provides empirical evidence for this claim, obtained from the
Current Population Survey Computer/Internet Use Supplements of 1997 and 1998. I
conducted hypothesis tests and examined the economic importance of the impact of
Internet use at work on wages. First, I estimated an Ordinary Least Squares (OLS) linear
multivariate regression model to find the extent of the effect of Internet use at work on
wages, after controlling for a number of variables that are generally perceived to have
an impact on wages. I had to turn to a different functional form, because the model
under the linear functional form did not satisfy the requirements of the Standard
Econometric Gaussian Error Box Model. Namely, the results were affected by
heteroscedasticity. I therefore estimated an OLS semi-log multivariate regression
model. The estimates from this model show a statistically significant relationship
between Internet use at work and wages. The estimate is also economically important: I
found the differential in hourly pay associated with Internet use at work to be around
20 percent.
Next, I addressed the issue of whether it is Internet use at work, and not some other
skill associated with Internet use in general, that is rewarded. Here, again, I
estimated an OLS semi-log multivariate regression model, this time including two new
independent variables for Internet use at home. I found Internet use at home to have a
statistically significant effect on wages as well, but the coefficient estimate is
considerably lower than the coefficient for Internet use at work. Therefore, I conclude
that, biased or not, the coefficient for Internet use at work stands for returns to
Internet use at work and not to skills related to Internet use in general.
A main concern in my analysis is the presence of heteroscedasticity. Although I
have been able to reduce its extent by turning to a different functional form, it is still
present in the model that I am using to do hypothesis testing. An improvement of my
analysis would thus be eliminating heteroscedasticity and commenting on the new
estimates.
Although I have found statistically and economically significant relationships
between Internet use at work and wages, I cannot make any strong claims about the
validity of the technology-based hypothesis of increasing wage inequality in the 1990s. I
have merely shown that Internet use at work positively influences wages. Whether the
great technological advances of the 1990s have widened the wage gap remains to be seen
in a much broader, more inclusive study.
Bibliography
Krueger, Alan B. “How Computers Have Changed the Wage Structure: Evidence from
Microdata, 1984-1989,” Quarterly Journal of Economics, February 1993, 108:33-60
DiNardo, John, and Pischke, Jörn-Steffen. “The Returns to Computer Use Revisited: Have
Pencils Changed the Wage Structure Too?” Quarterly Journal of Economics, February
1997, 112:291-303
Handel, Michael J. “Computers and the Wage Structure,” Working Paper No. 285, The
Jerome Levy Economics Institute, October 1999
Bernstein, Jared, and Mishel, Lawrence. “Has Wage Inequality Stopped Growing?” Monthly
Labor Review, December 1997, 3-16
Bound, John, and Johnson, George. “Changes in the Structure of Wages in the 1980's: An
Evaluation of Alternative Explanations,” The American Economic Review, June
1992, 82:371-392
Machin, Stephen, and Van Reenen, John. “Technology and Changes in Skill Structure:
Evidence from Seven OECD Countries,” The Quarterly Journal of Economics,
November 1998, 113:1215-1244