4. Nonlinear regression functions

Up to now:
• Population regression function was assumed to be linear
• The slope(s) of the population regression function is (are)
constant
• The effect on Y of a unit-change in the regressor Xj (j =
1, . . . , k) does not depend on the value of Xj
Now:
• Two groups of methods for detecting and modeling nonlinear
population regression functions
72
Group #1:
• Effect on Y of a change in one regressor, say X1, depends
on the value of X1 itself
• Example: reducing class size by one student per teacher (that
is, a change in STR) might have a larger effect on TEST SCORE
when class sizes are already manageably small
Group #2:
• Effect on Y of a change in one regressor, say X1, depends
on the value of another regressor, say X2
• Example: students still learning English might benefit from
having more one-on-one attention
• Effect on TEST SCORE of reducing STR is greater in districts
with higher values of PCTEL
73
Population regression functions with different slopes

[Figure: panels plotting Y against X1 — (a) a regression function with constant slope; (b) a regression function whose slope depends on the value of X1; (c) two population regression functions, one for X2 = 0 and one for X2 = 1, whose slope depends on the value of X2.]
74
4.1. A general strategy for modeling nonlinear
regression functions
Empirical example:
• Consider the student-performance dataset
• Generally, we would expect that the economic background
of the students might have an impact on TEST SCORES
(’rich’ students perform better than ’poor’ students)
• The economic background is measured by the variable AVGINC
(average per capita income in the school district in thousands
1998 US-dollars)
75
Test score vs. district income with a linear OLS regression function

[Scatterplot: test score (600 to 720) against district income (0 to 60 thousands of dollars) with the fitted linear OLS regression line.]
76
Scatterplot characteristics:
• The variable AVGINC and TEST SCORE are highly correlated
(correlation coefficient: 0.71)
• For incomes below 10000 US-$ or above 40000 US-$ the
points are below the OLS-line
• For incomes between 15000 US-$ and 30000 US-$ the points
are above the OLS-line
−→ nonlinear relationship between TEST SCORE and AVGINC
Possibly:
• Quadratic relationship between both variables:
TEST SCOREi = β0 + β1 · AVGINCi + β2 · AVGINCi² + ui (4.1)
(quadratic regression model)
77
Estimation of model (4.1):
• Eq. (4.1) is a variant of the multiple regression model
Yi = β0 + β1 · X1i + . . . + βk · Xki + ui
with k = 2 and
X1i = AVGINCi
X2i = AVGINCi²
−→ OLS estimation technique is applicable
• We can test the null hypothesis that the population regression function is linear versus the alternative that it is
quadratic by conducting the test
H0 : β2 = 0 vs. H1 : β2 ≠ 0
on the basis of the conventional t-statistic
78
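The estimation and the t-test described above can be sketched with simulated data. This is a minimal illustration, not the actual analysis: the sample below is generated to mimic the slide's coefficients, and conventional (homoskedasticity-only) standard errors are used instead of the White standard errors reported on the next slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the district data: a concave income/score relation.
n = 420
income = rng.uniform(5, 55, n)
score = 607.3 + 3.85 * income - 0.042 * income**2 + rng.normal(0, 12, n)

# The quadratic model is linear in parameters: regress Y on [1, X, X^2] by OLS.
X = np.column_stack([np.ones(n), income, income**2])
beta_hat, *_ = np.linalg.lstsq(X, score, rcond=None)

# Conventional t-statistic for H0: beta_2 = 0 (homoskedasticity-only SEs).
resid = score - X @ beta_hat
s2 = resid @ resid / (n - X.shape[1])
cov = s2 * np.linalg.inv(X.T @ X)
t_beta2 = beta_hat[2] / np.sqrt(cov[2, 2])

print(beta_hat)  # intercept, linear, and quadratic coefficients
print(t_beta2)   # a large |t| rejects linearity in favour of the quadratic
```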
OLS estimation results of the quadratic model (4.1)

Dependent Variable: TEST_SCORE
Method: Least Squares
Date: 10/04/12  Time: 18:31
Sample: 1 420
Included observations: 420
White heteroskedasticity-consistent standard errors & covariance

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C             607.3017     2.901754     209.2878      0.0000
AVGINC          3.850995   0.268094      14.36434     0.0000
AVGINC_SQ      -0.042308   0.004780      -8.850509    0.0000

R-squared            0.556173    Mean dependent var      654.1565
Adjusted R-squared   0.554045    S.D. dependent var       19.05335
S.E. of regression   12.72381    Akaike info criterion     7.931944
Sum squared resid    67510.32    Schwarz criterion         7.960803
Log likelihood      -1662.708    Hannan-Quinn criter.      7.943350
F-statistic          261.2778    Durbin-Watson stat        0.951439
Prob(F-statistic)    0.000000
79
Test scores vs. district income with a quadratic OLS regression function

[Scatterplot: test score (600 to 720) against district income (0 to 60 thousands of dollars) with the fitted quadratic OLS regression function.]
80
Obviously:
• β2 is significantly different from zero at all conventional levels
−→ quadratic model fits the data better than the linear model
Next:
• Consider the general nonlinear regression model
Yi = f (X1i, X2i, . . . , Xki) + ui
(i = 1, . . . , n)
(4.2)
where f (X1i, X2i, . . . , Xki) is a general nonlinear population
regression function
• Under the OLS assumptions on Slide 18 we have
E(Yi|X1i, X2i, . . . , Xki) = f (X1i, X2i, . . . , Xki)
81
Question:
• What is the expected effect on Y of a change in one regressor, say of a change ∆Xj in Xj (j = 1, . . . , k)?
Answer:
• The expected change in Y, ∆Y , associated with a change in
the regressor Xj , ∆Xj , holding all other regressors constant,
is the difference between the value of the population regression function before and after changing Xj , holding all other
regressors constant:
∆Y = f (X1, . . . , Xj + ∆Xj , . . . , Xk ) − f (X1, . . . , Xj , . . . , Xk )
(4.3)
82
Remarks:
• Note that the specific parametric form of f (X1, X2, . . . , Xk )
is unknown
• f (X1, X2, . . . , Xk ) contains unknown parameters that have to
be estimated from the data
• Let fˆ(X1, X2, . . . , Xk ) denote the predicted value of Y based
on the estimator fˆ of the population regression function
• Then, the predicted change in Y is
∆Ŷ = fˆ(X1, . . . , Xj + ∆Xj , . . . , Xk ) − fˆ(X1, . . . , Xj , . . . , Xk )
(4.4)
83
Example:
• Consider the quadratic OLS regression of TEST SCORE on AVGINC and AVGINC2 on Slide 79 with the estimated coefficients
β̂0 = 607.3017,
β̂1 = 3.8510,
β̂2 = −0.0423
• An increase in district income from 10 to 11 (i.e. from 10000
US-$ per capita to 11000 US-$) yields the estimated effect
∆Ŷ = (β̂0 + β̂1 ·11+ β̂2 ·112)−(β̂0 + β̂1 ·10+ β̂2 ·102) = 2.9627
• An increase in district income from 40 to 41 (i.e. from 40000
US-$ per capita to 41000 US-$) yields the estimated effect
∆Ŷ = (β̂0 + β̂1 ·41+ β̂2 ·412)−(β̂0 + β̂1 ·40+ β̂2 ·402) = 0.4247
84
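The two predicted effects can be reproduced directly from the rounded coefficients; a minimal sketch (the helper function name is illustrative):

```python
# Rounded OLS coefficients from the quadratic regression on Slide 79.
b0, b1, b2 = 607.3017, 3.8510, -0.0423

def predicted_score(income):
    """Fitted quadratic regression function (income in thousands of US-$)."""
    return b0 + b1 * income + b2 * income**2

# Predicted effect of a 1000 US-$ income increase at two starting points.
effect_low = predicted_score(11) - predicted_score(10)
effect_high = predicted_score(41) - predicted_score(40)

print(round(effect_low, 4))   # 2.9627
print(round(effect_high, 4))  # 0.4247
```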
Example: [continued]
• Obviously, a change of income of 1000 US-$ is associated
with a larger change in predicted test scores if the initial
income is low (10000 US-$) than if it is high (40000 US-$)
• The predicted changes are 2.9627 points versus 0.4247 points
Remarks:
• The estimator ∆Ŷ of the effect on Y of changing the regressor Xj depends on the estimator of the population regression
function, fˆ, which varies from one sample to the next
−→ ∆Ŷ contains sampling error
• There are several techniques for computing the standard error SE(∆Ŷ )
(see Stock & Watson, 2011, pp. 302, 303)
85
Strategy for modeling nonlinear regressions:
1. Identify a possible nonlinear relationship
(use economic theory and general knowledge)
2. Specify a nonlinear function and estimate parameters by OLS
(see next section for various nonlinear functions)
3. Check if the nonlinear model improves upon a linear model
(use t- and F -statistics)
4. Plot the estimated nonlinear regression function
5. Estimate the effect on Y of a change in the regressor Xj
86
4.2. Nonlinear functions of a single regressor
Outline:
• Description of most important nonlinear regression functions
(polynomials and logarithms)
• We restrict attention to regressions with a single regressor
• Extensions to multiple regressors are straightforward
• We treat the alternative nonlinear functions separately, although it is unproblematic to combine them in one regression
function
87
4.2.1. Polynomials
Definition 4.1: (Polynomial regression model)
We define the general polynomial regression model of degree r
as
Yi = β0 + β1 · Xi + β2 · Xi² + . . . + βr · Xiʳ + ui. (4.5)
When r = 2 or r = 3, we call Eq. (4.5) the quadratic or the
cubic regression model, respectively.
Remarks:
• We interpret Xi, Xi², . . . , Xiʳ as the r distinct regressors X1i,
X2i, . . . , Xri
• We estimate the parameters β0, β1, . . . , βr via OLS by regressing Yi against Xi, Xi², . . . , Xiʳ
88
Test of linear versus polynomial specification:
• If the ’true’ regression is linear, then the terms Xi², Xi³, . . . , Xiʳ
do not enter the population regression function (4.5)
• Hypothesis test: H0: ’Regression is linear’ vs. H1: ’Regression is polynomial of degree r’
• In probabilistic terms:
H0 : β2 = 0, β3 = 0, . . . , βr = 0 vs.
H1 : at least one βj ≠ 0 (j = 2, . . . , r)
• Use the F -testing strategy as described in Section 3.4 to test
this specification issue
89
Which polynomial degree?
• Trade-off between (1) flexibility in the shape of the regression
function and (2) the precision of estimated coefficients
• Include just enough polynomial terms to model the nonlinear
regression function adequately, but no more
Sequential testing procedure:
1. Pick a maximum value of r and perform OLS estimation
(e.g., choose r = 4 as the maximum)
2. Use the t-statistic to test H0 : βr = 0 vs. H1 : βr ≠ 0
3. If you reject H0 in Step 2 use the polynomial of degree r and
stop the procedure
90
Sequential testing procedure: [continued]
4. If you do not reject H0 in Step 2, eliminate the Xiʳ-term from
the regression and estimate a polynomial of degree r − 1. Use
the t-statistic to test H0 : βr−1 = 0 vs. H1 : βr−1 ≠ 0
5. If you reject H0 in Step 4, use the polynomial of degree r − 1
and stop the procedure
6. If you do not reject H0 in Step 4, continue this procedure
until the coefficient on the highest power in your polynomial
is statistically significant
91
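The sequential procedure of Steps 1–6 can be sketched with plain NumPy OLS on simulated data. The function names, the data-generating process, and the use of the two-sided 5% critical value 1.96 are illustrative assumptions, not part of the original material.

```python
import numpy as np

def ols_t_stats(X, y):
    """OLS fit; returns coefficients and homoskedasticity-only t-statistics."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

def select_polynomial_degree(x, y, r_max=4, t_crit=1.96):
    """Sequential testing: start at degree r_max and drop the highest
    power as long as its coefficient is statistically insignificant."""
    for r in range(r_max, 0, -1):
        X = np.column_stack([x**p for p in range(r + 1)])  # 1, x, ..., x^r
        _, t = ols_t_stats(X, y)
        if abs(t[-1]) > t_crit:  # reject H0: beta_r = 0 -> keep degree r
            return r
    return 1  # no power was significant: fall back to the linear model

# Hypothetical illustration: data generated from a quadratic function.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 500)
y = 1.0 + 2.0 * x - 0.3 * x**2 + rng.normal(0, 1, 500)
degree = select_polynomial_degree(x, y)
print(degree)  # typically selects the quadratic here
```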
4.2.2. Logarithms
Natural logarithm [ln(x)]:
• ln(x) is the most important nonlinear function in economics
• Logs convert changes in variables into percentage changes
Logs and percentages:
• Consider a variable x and a small change in x, ∆x
• The percentage change in x is given by 100 · (∆x/x)
• The following approximation holds for small ∆x:
ln(x + ∆x) − ln(x) ≈ ∆x/x
(the difference in logs approximates the percentage change divided by 100)
92
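The quality of this approximation is easy to check numerically; it is good for small relative changes and deteriorates as ∆x/x grows:

```python
import math

x = 100.0
for dx in (1.0, 5.0, 20.0):
    exact = math.log(x + dx) - math.log(x)   # difference in logs
    approx = dx / x                          # relative change
    # The gap between the two columns widens as dx/x grows.
    print(dx, round(exact, 4), round(approx, 4))
```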
Definition 4.2: (Logarithmic regression models)
We consider the following three types of regression models:
Yi = β0 + β1 · ln(Xi) + ui, (4.6)
ln(Yi) = β0 + β1 · Xi + ui, (4.7)
ln(Yi) = β0 + β1 · ln(Xi) + ui. (4.8)
We refer to the models (4.6) – (4.8) as the linear-log, the log-linear and the log-log model, respectively.
Remarks:
• The regression models (4.6) – (4.8) are conventional regression models with a single regressor
−→ OLS estimation technique applies
(provided that the OLS assumptions are satisfied)
93
Remarks: [continued]
• The three models (4.6) – (4.8) differ in their interpretation
of the coefficient β1
Interpretation of β1:
• In the linear-log model (4.6) a 1%-change in X is associated
with a change in Y of 0.01β1
• In the log-linear model (4.7) a change in X by one unit
(∆X = 1) is associated with a 100β1%-change in Y
• In the log-log model (4.8) a 1%-change in X is associated
with a β1% change in Y , that is β1 is the elasticity of Y with
respect to X
(see class for details)
94
Remarks:
• Which of the log regression models best fits the data?
• Only the log-linear and the log-log models (4.7) and (4.8)
can be compared via their R̄2 values
• The linear-log model (4.6) cannot be compared with the
other log models via the R̄2 values since the dependent variables are different (Yi vs. ln(Yi))
−→ Use economic theory and other expert knowledge of the specific data problem at hand to decide whether it makes sense
to specify Y in logarithms
95
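Since models (4.7) and (4.8) share the dependent variable ln(Yi), their adjusted R² values are directly comparable. A sketch with simulated data (the data-generating process and the helper function are hypothetical):

```python
import numpy as np

def adj_r2(X, y):
    """Adjusted R-squared from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    ssr = resid @ resid
    tss = ((y - y.mean()) ** 2).sum()
    return 1.0 - (ssr / (n - k)) / (tss / (n - 1))

rng = np.random.default_rng(2)
x = rng.uniform(1, 50, 400)
log_y = 0.5 + 0.8 * np.log(x) + rng.normal(0, 0.2, 400)  # true model: log-log

X_loglinear = np.column_stack([np.ones_like(x), x])       # ln(Y) on X
X_loglog = np.column_stack([np.ones_like(x), np.log(x)])  # ln(Y) on ln(X)

print(adj_r2(X_loglinear, log_y))
print(adj_r2(X_loglog, log_y))  # higher here, since the true model is log-log
```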
4.3. Interactions between regressors
Up to now:
• Nonlinear relationship between Y and the regressor X depends on the values of the regressor X itself
Now:
• The effect on Y of a change in one regressor, say X1, depends
on the value of another regressor, say X2
−→ Interactions between the regressors
96
4.3.1. Interactions between two dummy regressors
Definition 4.3: (Dummy variables)
We consider a potential regressor that may indicate the presence
or absence of a qualitative characteristic or an attribute (such
as male or female, catholic or non-catholic and so forth). We
quantify such attributes by constructing artificial variables of the
form
Di = 1 if the attribute is present for the ith observation, and
Di = 0 if the attribute is not present,
for i = 1, . . . , n. We call variables like Di dummy variables (or
binary or indicator variables).
97
Remarks:
• We have already made use of dummy variables on the Slides
34–36
• Dummies are essentially nominal scale (qualitative) variables
that have been quantified
• Note that a dummy variable can only assume the two values
0 and 1
• Dummy regressors as specified in Definition 4.3 can be incorporated in regression models just as easily as any other
quantitative (continuous) regressor
98
Consider the following empirical problem:
• Assume you have a data set containing the dependent variable (log) earnings, that is
Yi = ln(EARNINGSi)
(i = 1, . . . , n)
and the two dummy variables
D1i = 1 if the ith worker has a college degree, 0 otherwise
D2i = 1 if the ith worker is female, 0 otherwise
• You aim at analyzing the effects of a worker’s schooling (college degree or not) and the worker’s gender (female or male)
on the worker’s earnings
99
Empirical problem: [continued]
• Consider the intuitive regression model
Yi = β0 + β1 · D1i + β2 · D2i + ui
(4.9)
• Interpretation of parameters:
β1 is the effect on (log) earnings of having a college degree
holding gender constant
β2 is the effect on (log) earnings of being female holding
schooling constant
• The limitation of this model is that the effect on earnings of
having a college degree is the same for men and women
100
Removal of this limitation:
• Augmenting the regression model (4.9) by the interaction
term (D1i × D2i):
Yi = β0 + β1 · D1i + β2 · D2i + β3 · (D1i × D2i) + ui
(4.10)
• The interaction term (D1i × D2i) in (4.10) allows the population effect on log earnings (Yi) of having a college degree
(that is changing D1i from D1i = 0 to D1i = 1) to depend
on gender D2i
Mathematical background:
• Use Formula (4.3) on Slide 82 to compute the expected
effect on Y , ∆Y , resulting from a change in D1i from 0 to
1 given the fixed value d2 for D2i
101
Mathematical background: [continued]
• We have
E(Yi|D1i = 0, D2i = d2) = β0 + β1 × 0 + β2 × d2 + β3 × (0 × d2)
= β0 + β2 × d2
and
E(Yi|D1i = 1, D2i = d2) = β0 + β1 × 1 + β2 × d2 + β3 × (1 × d2)
= β0 + β1 + β2 × d2 + β3 × d2
• This yields the expected effect on Y :
∆Y
= E(Yi|D1i = 1, D2i = d2) − E(Yi|D1i = 0, D2i = d2)
= β0 + β1 + β2 × d2 + β3 × d2 − β0 − β2 × d2
= β1 + β3 × d2
(4.11)
102
Interpretation of (4.11):
• The expected effect of acquiring a college degree (that is a
unit change in D1i) depends on the person’s gender:
∆Y = β1 if the worker is male (d2 = 0)
∆Y = β1 + β3 if the worker is female (d2 = 1)
−→ The coefficient β3 on the interaction term (D1i × D2i) in
regression (4.10) is the difference in the effect of acquiring
a college degree for women versus men
Empirical exercise:
• Interaction between the student-teacher ratio and the percentage of English learners in a dummy regression model of
the form (4.10)
(see class)
103
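The result in Eq. (4.11) can be illustrated by simulating model (4.10) with hypothetical coefficients and recovering the gender-specific effects of a college degree from the OLS estimates:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Hypothetical coefficients for model (4.10).
b0, b1, b2, b3 = 2.0, 0.5, -0.2, -0.1

d1 = rng.integers(0, 2, n)  # college degree dummy
d2 = rng.integers(0, 2, n)  # female dummy
y = b0 + b1 * d1 + b2 * d2 + b3 * d1 * d2 + rng.normal(0, 0.3, n)

# OLS with the interaction term (D1 x D2).
X = np.column_stack([np.ones(n), d1, d2, d1 * d2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Effect of a college degree by gender, as in Eq. (4.11): beta1 + beta3 * d2.
effect_male = beta[1]
effect_female = beta[1] + beta[3]
print(effect_male, effect_female)  # close to the true values 0.5 and 0.4
```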
4.3.2. Interactions between a continuous and a
dummy regressor
Consider the following data set:
• The dependent variable is (log) earnings, that is
Yi = ln(EARNINGSi)
(i = 1, . . . , n)
• We consider the dummy regressor
Di = 1 if the ith worker has a college degree, 0 otherwise
• We consider the continuous regressor
Xi = individual’s years of work experience
104
Three alternative regression models:
• Specification with both regressors, no interaction term
Yi = β0 + β1 · Xi + β2 · Di + ui (4.12)
• Specification with both regressors plus interaction term
Yi = β0 + β1 · Xi + β2 · Di + β3 · (Xi × Di) + ui (4.13)
• Specification with continuous regressor plus interaction term
Yi = β0 + β1 · Xi + β2 · (Xi × Di) + ui (4.14)
Question:
• For each of the specifications (4.12) – (4.14), what are the
expected effects on (log) earnings (Yi) of having a college
degree (that is from changing Di from Di = 0 to Di = 1)?
105
Expected effects:
• For specification (4.12) we have
∆Y = E(Yi|Di = 1, Xi) − E(Yi|Di = 0, Xi)
= β0 + β1 · Xi + β2 − β0 − β1 · Xi
= β2 (4.15)
• By analogous calculations, we find for the specifications (4.13)
and (4.14)
∆Y = β2 + β3 · Xi (4.16)
and
∆Y = β2 · Xi (4.17)
106
Remarks:
• The effects ∆Y = E(Yi|Di = 1, Xi) − E(Yi|Di = 0, Xi) computed in (4.15) – (4.17) can be interpreted as differences in
the two population regression functions associated with the
two values of the dummy regressor Di = 1 and Di = 0
• For specification (4.12) this difference is constantly equal to
β2 producing two population regression lines with different
intercepts and the same slope
• By analogous reasoning, the specifications (4.13) and (4.14)
produce population regression lines with (a) different intercepts and different slopes and with (b) the same intercept
and different slopes
(see the figure on Slide 108)
107
Regression functions using dummy and continuous regressors

[Figure: population regression lines for specifications (4.12) – (4.14): different intercepts with the same slope; different intercepts and different slopes; the same intercept with different slopes.]
108
Interpretation of coefficients:
• How can we interpret the specific regression coefficients involved in the specifications (4.12) – (4.14)
−→ See class
Empirical exercise:
• Application to the student-teacher ratio and the percentage
of English learners (see class)
109
4.3.3. Interactions between two continuous regressors
Consider the following data set:
• The dependent variable is (log) earnings, that is
Yi = ln(EARNINGSi)
(i = 1, . . . , n)
• We consider the two continuous regressors
X1i = individual’s years of work experience
X2i = individual's years of schooling
• Regression specification:
Yi = β0 + β1 · X1i + β2 · X2i + β3 · (X1i × X2i) + ui
(4.18)
110
Expected effects on Y :
• In (4.18), a change in X1 by ∆X1 (holding X2 constant)
leads to
∆Y = (β1 + β3 · X2)∆X1
(4.19)
−→ The effect on Y of a change in X1 by ∆X1 depends on
the value of X2
• Analogously, we find that the effect on Y of a change in X2
by ∆X2 (holding X1 constant) depends on the value of X1:
∆Y = (β2 + β3 · X1)∆X2
(4.20)
111
Expected effects on Y : [continued]
• We now consider a simultaneous change in X1 by ∆X1 and,
at the same time, in X2 by ∆X2
• We then find that the expected change in Y is given by
∆Y = (β1 + β3 · X2)∆X1 + (β2 + β3 · X1)∆X2 + β3 · ∆X1 · ∆X2 (4.21)
• The first term in (4.21) is the effect from changing X1 holding X2 constant
• The second term in (4.21) is the effect from changing X2
holding X1 constant
• The final term, β3∆X1∆X2, in (4.21) is the extra effect from
changing both X1 and X2
112
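The decomposition in Eq. (4.21) is an exact algebraic identity for the interaction specification (4.18), which a few lines of arithmetic confirm (all coefficient values and change sizes below are hypothetical):

```python
# Regression function of the interaction model (4.18), hypothetical coefficients.
b0, b1, b2, b3 = 1.0, 0.4, 0.7, 0.25

def f(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

x1, x2 = 3.0, 5.0
dx1, dx2 = 0.5, -1.0

# Direct change in the regression function from changing both regressors.
direct = f(x1 + dx1, x2 + dx2) - f(x1, x2)

# Eq. (4.21): partial effect of X1, partial effect of X2, extra interaction term.
decomposed = (b1 + b3 * x2) * dx1 + (b2 + b3 * x1) * dx2 + b3 * dx1 * dx2

print(direct, decomposed)  # the two agree
```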
Interactions in multiple regression: [Summary]
• The interaction term between the regressors X1 and X2 is
their product X1 × X2
• Including the interaction term allows the effect on Y of a
change in X1 to depend on the value of X2 and, conversely,
allows the effect of a change in X2 to depend on the value
of X1
• The coefficient on X1 ×X2 is the effect of a one-unit increase
in X1 and X2, above and beyond the sum of the individual
effects of a unit increase in X1 alone and a unit increase in
X2 alone
• This is true irrespective of whether X1 and/or X2 are continuous or dummy regressors
113