SW 8 part 1

Nonlinear Regression Functions
(SW Chapter 8)
Outline
1. Nonlinear regression functions – general comments
2. Nonlinear functions of one variable
3. Nonlinear functions of two variables: interactions
The TestScore – STR relation looks linear (maybe)…
But the TestScore – Income relation looks nonlinear...
The general nonlinear population regression function
\[ Y_i = f(X_{1i}, X_{2i}, \dots, X_{ki}) + u_i, \quad i = 1, \dots, n \]
Assumptions
1. \( E(u_i \mid X_{1i}, X_{2i}, \dots, X_{ki}) = 0 \) (same as in the linear model); this implies that f is the conditional expectation of Y given the X's.
2. \( (X_{1i}, \dots, X_{ki}, Y_i) \) are i.i.d. (same).
3. Big outliers are rare (same idea; the precise mathematical condition depends on the specific f).
4. No perfect multicollinearity (same idea; the precise statement
depends on the specific f).
Nonlinear Functions of a Single Independent Variable (SW Section 8.2)
We’ll look at two complementary approaches:
1. Polynomials in X
2. Logarithmic transformations
1. Polynomials in X
Approximate the population regression function by a polynomial:
\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_r X_i^r + u_i \]
· Still linear in the parameters?
· Therefore …
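The polynomial is linear in β0, β1, …, βr. It can therefore be fit by ordinary OLS once the powers X², …, X^r are generated as extra regressors, which is what the Stata example that follows does with `generate`. The sketch below is minimal, pure Python, and uses simulated data. The sample size, income range, "true" coefficients (600, 4, −0.04) and noise level are made-up illustration values, not the California data.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def poly_regressors(x, degree):
    """Rows [1, X, X^2, ..., X^degree]; the model is linear in their coefficients."""
    return [[xi ** p for p in range(degree + 1)] for xi in x]

random.seed(0)
income = [random.uniform(5, 55) for _ in range(420)]  # hypothetical incomes
score = [600 + 4 * x - 0.04 * x ** 2 + random.gauss(0, 10) for x in income]

b = ols(poly_regressors(income, 2), score)  # estimates of (b0, b1, b2)
print([round(v, 3) for v in b])
```

Nothing in the estimator is specific to polynomials. The nonlinearity lives entirely in how the regressors are constructed.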
Example: TestScore vs. Income
\( Income_i \) = average district income in the \( i \)th district (thousands of dollars per capita)
\[ TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 (Income_i)^2 + u_i \]
\[ TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 (Income_i)^2 + \beta_3 (Income_i)^3 + u_i \]
Estimation in STATA
Create a new regressor:
generate avginc2 = avginc*avginc
reg testscr avginc avginc2, r

Regression with robust standard errors                 Number of obs =     420
                                                       F(  2,   417) =  428.52
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5562
                                                       Root MSE      =  12.724

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   3.850995   .2680941    14.36   0.000      3.32401    4.377979
     avginc2 |  -.0423085   .0047803    -8.85   0.000     -.051705   -.0329119
       _cons |   607.3017   2.901754   209.29   0.000     601.5978    613.0056
------------------------------------------------------------------------------
Interpretation
. graph twoway scatter testscr avginc || connected yhat avginc, sort msymbol(none) || connected yhat2 avginc, sort msymbol(none)
See Chapter 3 of Statistics with Stata, especially pages 79 & 118
Interpretation
(b) Compute “effects” for different values of X
\[ \widehat{TestScore} = \underset{(2.9)}{607.3} + \underset{(0.27)}{3.85}\,Income_i - \underset{(0.0048)}{0.0423}\,(Income_i)^2 \]
Predicted change in TestScore for a change in income from $5,000 per capita to $6,000 per capita:
\[ \Delta\widehat{TestScore} = (607.3 + 3.85 \times 6 - 0.0423 \times 6^2) - (607.3 + 3.85 \times 5 - 0.0423 \times 5^2) = 3.4 \]
\[ \widehat{TestScore} = 607.3 + 3.85\,Income_i - 0.0423\,(Income_i)^2 \]
Predicted “effects” for different values of X:

Change in Income ($1000 per capita)     Predicted ΔTestScore
from 5 to 6                             3.4
from 25 to 26                           1.7
from 45 to 46                           0.0
How does the effect change as income increases?
Perhaps a declining marginal benefit of an increase in school budgets?
Caution! What is the effect of a change from 65 to 66?
Don’t extrapolate outside the range of the data!
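The predicted changes in the table come from evaluating the fitted quadratic at both income levels and differencing. The short Python sketch below reproduces them with the rounded coefficients reported above. Its last iteration computes the 65-to-66 "Caution!" case. There the quadratic, pushed past the range of the data, predicts that higher income lowers test scores.

```python
def predicted_score(income):
    """Fitted quadratic from the slide (rounded coefficients); income in $1000 per capita."""
    return 607.3 + 3.85 * income - 0.0423 * income ** 2

def predicted_change(x0, x1):
    """Predicted change in TestScore when Income moves from x0 to x1."""
    return predicted_score(x1) - predicted_score(x0)

for x0 in (5, 25, 45, 65):
    print(f"from {x0} to {x0 + 1}: {predicted_change(x0, x0 + 1):+.1f}")
# from 5 to 6: +3.4 / from 25 to 26: +1.7 / from 45 to 46: +0.0 / from 65 to 66: -1.7
```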
Interpretation
Marginal effects in STATA
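For the quadratic, the slope of the fitted regression at a given income is the derivative β1 + 2β2 × Income, which falls as income rises. The minimal Python sketch below uses the avginc and avginc2 coefficients from the Stata output above. It also finds the income at which the fitted slope reaches zero. In Stata, slopes like these can be obtained with `margins, dydx(avginc)` after a regression written in factor-variable form, such as `reg testscr c.avginc##c.avginc, r`. That is one possible route, not necessarily the one these slides used.

```python
B1, B2 = 3.850995, -0.0423085  # avginc and avginc2 coefficients from the Stata output

def marginal_effect(income):
    """Slope of the fitted quadratic, dTestScore/dIncome = B1 + 2*B2*Income."""
    return B1 + 2 * B2 * income

turning_point = -B1 / (2 * B2)  # income ($1000s) at which the fitted slope is zero

for x in (5, 25, 45):
    print(f"income {x}: slope {marginal_effect(x):.2f}")  # 3.43, 1.74, 0.04
print(f"slope is zero at income {turning_point:.1f}")     # 45.5
```

The derivative at 5 (3.43) is close to the discrete 5-to-6 change of 3.4. The zero slope near 45.5 matches the 0.0 entry for 45 to 46.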
Estimation of a cubic in STATA
Execute the test command after running the regression:
. test avginc2 avginc3

 ( 1)  avginc2 = 0.0
 ( 2)  avginc3 = 0.0

       F(  2,   416) =   37.69
            Prob > F =    0.0000

Write down H0 and H1 … conclusion?
Plotting a cubic in STATA
. reg testscr avginc avginc2 avginc3, rob
. predict yhat3
(option xb assumed; fitted values)
. graph twoway scatter testscr avginc || connected yhat2 avginc, sort msymbol(none) || connected yhat3 avginc, sort msymbol(T)
Ramsey’s RESET Test:
REgression Specification Error Test
• Consider the model (1): \[ Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i \]
• General test for misspecification of functional form
• If LSA #1 holds (\( E(u_i \mid X_{1i}, \dots, X_{ki}) = 0 \)), then no nonlinear function of the X's should be significant when added to the model.
• Consider (2): \[ Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \delta_0 \hat{Y}_i^2 + \delta_1 \hat{Y}_i^3 + \dots + u_i \]
• Null hypothesis is that (1) is correctly specified
• How many powers of predicted values to include?
• Conduct F-test on powers of predicted values
• Ramsey, J.B. (1969). Tests for Specification Error in Classical Linear Least Squares Regression Analysis. Journal of the Royal Statistical Society, Series B, 31, 350–371.
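The RESET steps can be sketched in Python (stdlib only) on simulated data. Fit the model, add powers of the fitted values, then compute the F-statistic for the added terms from the two sums of squared residuals. Several choices here are illustrative assumptions: the data, the sample size, and powers 2 to 4, which match the three restrictions in the `estat ovtest` output in these slides. The F computed is the homoskedasticity-only version, which the RESET replication in these slides reproduces with a non-robust regression.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_fit(rows, y):
    """OLS via the normal equations; returns (fitted values, SSR)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * v for bj, v in zip(beta, r)) for r in rows]
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return fitted, ssr

def reset_f(rows, y, powers=(2, 3, 4)):
    """Homoskedasticity-only RESET F: add powers of the standardized fitted
    values to the regressors and test that their coefficients are all zero."""
    n = len(y)
    fitted, ssr_r = ols_fit(rows, y)
    mean = sum(fitted) / n
    sd = (sum((f - mean) ** 2 for f in fitted) / (n - 1)) ** 0.5
    z = [(f - mean) / sd for f in fitted]  # plays the role of yhz in the replication
    augmented = [r + [zi ** p for p in powers] for r, zi in zip(rows, z)]
    _, ssr_u = ols_fit(augmented, y)
    q, k_u = len(powers), len(augmented[0])
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_u))

random.seed(1)
x = [random.uniform(0, 3) for _ in range(400)]
y = [1 + 2 * xi + 3 * xi ** 2 + random.gauss(0, 1) for xi in x]  # true model is quadratic

f_linear = reset_f([[1.0, xi] for xi in x], y)              # misspecified: omits X^2
f_quadratic = reset_f([[1.0, xi, xi ** 2] for xi in x], y)  # correctly specified
print(f"RESET F, linear fit:    {f_linear:.2f}")
print(f"RESET F, quadratic fit: {f_quadratic:.2f}")
```

Standardizing the fitted values does not change F. With an intercept in the model, the fitted values are a linear combination of the regressors, so standardizing only keeps the powers numerically well scaled.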
Ramsey’s RESET Test
. reg test str avginc, r

Linear regression                                      Number of obs =     420
                                                       F(  2,   417) =  132.65
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5115
                                                       Root MSE      =  13.349

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.6487401   .3533403    -1.84   0.067     -1.34329      .04581
      avginc |   1.839112    .114733    16.03   0.000     1.613585    2.064639
       _cons |   638.7292   7.301234    87.48   0.000     624.3773     653.081
------------------------------------------------------------------------------

. estat ovtest
(can just type . ovtest)

Ramsey RESET test using powers of the fitted values of testscr
       Ho:  model has no omitted variables
                 F(3, 414) =     18.36
                  Prob > F =      0.0000
Ramsey’s RESET Test
. reg test str avginc avginc2, r

Linear regression                                      Number of obs =     420
                                                       F(  3,   416) =  286.55
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5638
                                                       Root MSE      =  12.629

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.9099512   .3545374    -2.57   0.011    -1.606859   -.2130432
      avginc |   3.881859   .2709564    14.33   0.000     3.349245    4.414474
     avginc2 |   -.044157   .0049606    -8.90   0.000     -.053908    -.034406
       _cons |   625.2308   7.087793    88.21   0.000     611.2984    639.1631
------------------------------------------------------------------------------

. estat ovtest

Ramsey RESET test using powers of the fitted values of testscr
       Ho:  model has no omitted variables
                 F(3, 413) =      2.48
                  Prob > F =      0.0605
Ramsey’s RESET Test
. reg test str avginc avginc2 avginc3, r

Linear regression                                      Number of obs =     420
                                                       F(  4,   415) =  207.23
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5663
                                                       Root MSE      =  12.608

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.9277523   .3562919    -2.60   0.010    -1.628114   -.2273905
      avginc |   5.124736   .7045403     7.27   0.000     3.739824    6.509649
     avginc2 |  -.1011073   .0287052    -3.52   0.000     -.157533   -.0446815
     avginc3 |   .0007293   .0003414     2.14   0.033     .0000582    .0014003
       _cons |   617.8974   7.926373    77.95   0.000     602.3165    633.4782
------------------------------------------------------------------------------

. estat ovtest

Ramsey RESET test using powers of the fitted values of testscr
       Ho:  model has no omitted variables
                 F(3, 412) =      1.79
                  Prob > F =      0.1490
Ramsey’s RESET Test
. reg test str el_pct meal_pct, r

Linear regression                                      Number of obs =     420
                                                       F(  3,   416) =  453.48
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7745
                                                       Root MSE      =  9.0801

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.9983092   .2700799    -3.70   0.000    -1.529201   -.4674178
      el_pct |  -.1215733   .0328317    -3.70   0.000      -.18611   -.0570366
    meal_pct |  -.5473456   .0241072   -22.70   0.000    -.5947328   -.4999583
       _cons |     700.15    5.56845   125.74   0.000     689.2042    711.0958
------------------------------------------------------------------------------

. estat ovtest

Ramsey RESET test using powers of the fitted values of testscr
       Ho:  model has no omitted variables
                 F(3, 413) =      6.29
                  Prob > F =      0.0004
Ramsey’s RESET Test
. reg test str el_pct meal_pct avginc, r

Linear regression                                      Number of obs =     420
                                                       F(  4,   415) =  467.42
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8053
                                                       Root MSE      =  8.4477

------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.5603892   .2550641    -2.20   0.029    -1.061768   -.0590105
      el_pct |  -.1943282   .0332445    -5.85   0.000    -.2596768   -.1289795
    meal_pct |  -.3963661   .0302302   -13.11   0.000    -.4557895   -.3369427
      avginc |    .674984   .0837161     8.06   0.000     .5104236    .8395444
       _cons |   675.6082   6.201865   108.94   0.000     663.4172    687.7992
------------------------------------------------------------------------------

. estat ovtest

Ramsey RESET test using powers of the fitted values of testscr
       Ho:  model has no omitted variables
                 F(3, 412) =      0.47
                  Prob > F =      0.7014
Ramsey’s RESET Test: replicated
. predict yh
(option xb assumed; fitted values)

. sum yh

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          yh |       420    654.1565    17.09817   614.9183   702.8387

. gen yhz = (yh-r(mean))/r(sd)

. sum yh*

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          yh |       420    654.1565    17.09817   614.9183   702.8387
         yhz |       420    1.22e-09           1  -2.294882   2.847214

. gen yhz2=yhz*yhz
. gen yhz3=yhz^3
. gen yhz4=yhz^4
Ramsey’s RESET Test: replicated
. reg test str el meal avginc yhz2 yhz3 yhz4

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  7,   412) =  244.48
       Model |  122595.145     7  17513.5921           Prob > F      =  0.0000
    Residual |  29514.4488   412  71.6370116           R-squared     =  0.8060
-------------+------------------------------           Adj R-squared =  0.8027
       Total |  152109.594   419  363.030056           Root MSE      =  8.4639

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         str |  -.5500585   .2336368    -2.35   0.019    -1.009327   -.0907896
      el_pct |  -.2170374   .0407058    -5.33   0.000    -.2970544   -.1370204
    meal_pct |   -.400967   .0289303   -13.86   0.000    -.4578364   -.3440976
      avginc |   .6476592   .1505253     4.30   0.000     .3517657    .9435527
        yhz2 |   .7652051    .915534     0.84   0.404    -1.034495    2.564906
        yhz3 |  -.0822669   .3243362    -0.25   0.800    -.7198272    .5552933
        yhz4 |  -.0650369   .1767693    -0.37   0.713     -.412519    .2824453
       _cons |   675.8077   5.443279   124.15   0.000     665.1076    686.5077
------------------------------------------------------------------------------

. test yhz2 yhz3 yhz4

 ( 1)  yhz2 = 0
 ( 2)  yhz3 = 0
 ( 3)  yhz4 = 0

       F(  3,   412) =    0.47
            Prob > F =    0.7014
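As a check on the replication, the same F can be computed by hand from the restricted and unrestricted sums of squared residuals:

F = [(SSR_r − SSR_u)/q] / [SSR_u/(n − k_u)]

The unrestricted SSR and its degrees of freedom come from the ANOVA table above. The restricted SSR is not printed. This sketch backs it out from the Root MSE of the `reg test str el_pct meal_pct avginc, r` run as SSR_r = RMSE² × (420 − 5), so it carries some rounding error. The robust option changes only the standard errors, not the residuals or Root MSE.

```python
n = 420
# Restricted model (str, el_pct, meal_pct, avginc, constant: k = 5).
# Its SSR is reconstructed from the reported Root MSE, so it is slightly rounded.
ssr_restricted = 8.4477 ** 2 * (n - 5)
# Unrestricted model adds yhz2, yhz3, yhz4: Residual SS and df from the ANOVA table.
ssr_unrestricted, df_unrestricted = 29514.4488, 412
q = 3  # restrictions tested: yhz2 = yhz3 = yhz4 = 0

f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_unrestricted)
print(f"F({q}, {df_unrestricted}) = {f_stat:.2f}")  # F(3, 412) = 0.47
```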
2. Logarithmic functions of Y and/or X
The 3 log specifications
Case                Population regression function
I.   linear-log     \( Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i \)
II.  log-linear     \( \ln(Y_i) = \beta_0 + \beta_1 X_i + u_i \)
III. log-log        \( \ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i \)
I. Linear-log population regression function
Linear-log case, continued
Example: TestScore vs. ln(Income)
\[ \widehat{TestScore} = \underset{(3.8)}{557.8} + \underset{(1.40)}{36.42}\,\ln(Income_i) \]
so a 1% increase in Income is associated with an increase in
TestScore of 0.36 points on the test.
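The "0.36 points" figure uses the approximation that a 1% change in X changes Y by 0.01 × β1. The short Python sketch below compares that rule with the exact linear-log prediction, β1 × ln(1 + pct/100), using the estimated slope 36.42. The exact change depends only on the percentage change, not on the starting income. The approximation is close for 1% but drifts for larger changes such as 10%.

```python
import math

B1 = 36.42  # estimated coefficient on ln(Income) in the linear-log model

def score_change(pct):
    """Exact predicted change in TestScore when Income rises by pct percent."""
    return B1 * math.log(1 + pct / 100)

for pct in (1, 10):
    print(f"{pct}% rise: exact {score_change(pct):.3f}, rule of thumb {0.01 * B1 * pct:.3f}")
```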
· Standard errors, confidence intervals, R²: all the usual tools of regression apply here.
· How does this compare to the cubic model?
Linear-log vs. Cubic models
II. Log-linear population regression function
Log-linear case, continued
III. Log-log population regression function
Log-log case, continued
Example: ln( TestScore) vs. ln( Income)
\[ \widehat{\ln(TestScore)} = \underset{(0.006)}{6.336} + \underset{(0.0021)}{0.0554}\,\ln(Income_i) \]
A 1% increase in Income is associated with an increase of 0.0554% in TestScore (Income up by a factor of 1.01, TestScore up by a factor of approximately 1.000554)
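In the log-log model, β1 is an elasticity. Multiplying Income by 1.01 multiplies the predicted TestScore (exp of the predicted log) by exactly 1.01^β1. The factor 1.000554 is the first-order approximation 1 + β1/100. The sketch below uses the estimated 0.0554 and shows the two agree to about six decimal places.

```python
B1 = 0.0554  # estimated elasticity of TestScore with respect to Income

exact_factor = 1.01 ** B1      # Income x 1.01  =>  predicted TestScore x 1.01**B1
approx_factor = 1 + B1 / 100   # first-order approximation: 1% rise gives B1 percent
print(f"exact {exact_factor:.6f}, approximate {approx_factor:.6f}")
# exact 1.000551, approximate 1.000554
```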
Example: ln(TestScore) vs. ln(Income), ctd.
The log-linear and log-log specifications:
Other nonlinear functions (and nonlinear least squares) (SW App. 8.1)
The foregoing nonlinear regression functions have flaws…
· Polynomial …
· Linear-log …
· How about a nonlinear function such as
\[ Y = \beta_0 - \alpha e^{-\beta_1 X} \]
β0, β1, and α are unknown parameters. This is called a negative exponential growth curve.
Negative exponential growth
We want to estimate the parameters of,
\[ Y_i = \beta_0 - \alpha e^{-\beta_1 X_i} + u_i \]
or
\[ Y_i = \beta_0 \left[ 1 - e^{-\beta_1 (X_i - \beta_2)} \right] + u_i \qquad (*) \]
Compare model (*) to linear-log or cubic models:
\[ Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i \]
\[ Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i \]
The linear-log and polynomial models are linear in their parameters (β0, β1, …), but model (*) is not.
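Because (*) is nonlinear in β0, β1, β2, the OLS formulas do not apply. Nonlinear least squares instead minimizes the sum of squared residuals numerically. The minimal Python sketch below uses Gauss-Newton iterations with step halving on simulated data. The "true" parameters (10, 0.5, −1), the noise level, the sample and the starting values are all hypothetical. Gauss-Newton is just one common choice of optimizer; Stata's `nl` command is the packaged route.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def curve(theta, x):
    """Negative exponential growth curve b0 * [1 - exp(-b1 * (x - b2))]."""
    b0, b1, b2 = theta
    return b0 * (1 - math.exp(-b1 * (x - b2)))

def gradient(theta, x):
    """Partial derivatives of the curve with respect to (b0, b1, b2)."""
    b0, b1, b2 = theta
    e = math.exp(-b1 * (x - b2))
    return [1 - e, b0 * (x - b2) * e, -b0 * b1 * e]

def ssr(theta, xs, ys):
    """Sum of squared residuals; infinite if the exponential overflows."""
    try:
        return sum((y - curve(theta, x)) ** 2 for x, y in zip(xs, ys))
    except OverflowError:
        return math.inf

def gauss_newton(theta, xs, ys, max_iter=200, tol=1e-12):
    """Minimize the SSR with Gauss-Newton steps, halving a step until the SSR falls."""
    current = ssr(theta, xs, ys)
    for _ in range(max_iter):
        jac = [gradient(theta, x) for x in xs]
        res = [y - curve(theta, x) for x, y in zip(xs, ys)]
        jtj = [[sum(g[i] * g[j] for g in jac) for j in range(3)] for i in range(3)]
        jtr = [sum(g[i] * r for g, r in zip(jac, res)) for i in range(3)]
        step = solve(jtj, jtr)
        scale = 1.0
        while scale > 1e-10:
            trial = [t + scale * s for t, s in zip(theta, step)]
            trial_ssr = ssr(trial, xs, ys)
            if trial_ssr < current:
                break
            scale /= 2
        else:
            break  # no step lowers the SSR: treat as converged
        gain = current - trial_ssr
        theta, current = trial, trial_ssr
        if gain < tol * (1 + current):
            break
    return theta, current

random.seed(2)
true_theta = (10.0, 0.5, -1.0)  # hypothetical parameters for the simulation
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [curve(true_theta, x) + random.gauss(0, 0.2) for x in xs]

theta_hat, ssr_hat = gauss_newton([8.0, 0.3, 0.0], xs, ys)
print([round(v, 3) for v in theta_hat], round(ssr_hat, 3))
```

The two parameterizations in (*) describe the same curve, with α = β0 e^{β1β2}.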
Nonlinear Least Squares
Negative exponential growth; RMSE = 12.675
Linear-log; RMSE = 12.618 (oh well…)