Prediction, Goodness-of-Fit, and Modeling Issues

Prepared by Vera Tabakova, East Carolina University

4.1 Least Squares Prediction

4.2 Measuring Goodness-of-Fit

4.3 Modeling Issues

4.4 Log-Linear Models
Principles of Econometrics, 3rd Edition
Slide 4-2
$$y_0 = \beta_1 + \beta_2 x_0 + e_0 \tag{4.1}$$
where $e_0$ is a random error. We assume that $E(y_0) = \beta_1 + \beta_2 x_0$ and $E(e_0) = 0$. We also assume that $\operatorname{var}(e_0) = \sigma^2$ and $\operatorname{cov}(e_0, e_i) = 0$ for $i = 1, 2, \ldots, N$.
$$\hat{y}_0 = b_1 + b_2 x_0 \tag{4.2}$$
Figure 4.1 A point prediction
$$f = y_0 - \hat{y}_0 = (\beta_1 + \beta_2 x_0 + e_0) - (b_1 + b_2 x_0) \tag{4.3}$$
$$E(f) = \beta_1 + \beta_2 x_0 + E(e_0) - [E(b_1) + E(b_2) x_0] = \beta_1 + \beta_2 x_0 + 0 - [\beta_1 + \beta_2 x_0] = 0$$
$$\operatorname{var}(f) = \sigma^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] \tag{4.4}$$
The variance of the forecast error is smaller when
i. the overall uncertainty in the model is smaller, as measured by the variance of the random errors, $\sigma^2$;
ii. the sample size N is larger;
iii. the variation in the explanatory variable is larger; and
iv. the value of $(x_0 - \bar{x})^2$ is small.
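As a rough numerical illustration of the four points above, the sketch below evaluates the forecast error variance formula for made-up inputs. The function name `forecast_variance` and every number in it are hypothetical, chosen only to show the direction of each effect.

```python
import numpy as np

def forecast_variance(sigma2, x, x0):
    """var(f) = sigma^2 * [1 + 1/N + (x0 - xbar)^2 / sum((x_i - xbar)^2)]."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    return sigma2 * (1.0 + 1.0 / n + (x0 - xbar) ** 2 / np.sum((x - xbar) ** 2))

# Hypothetical baseline: 20 x-values spread evenly over [0, 10], sigma^2 = 4.
x_base = np.linspace(0.0, 10.0, 20)
base = forecast_variance(4.0, x_base, x0=8.0)

# Change one ingredient at a time; each change should lower var(f).
smaller_sigma2 = forecast_variance(2.0, x_base, x0=8.0)                      # (i)
larger_n = forecast_variance(4.0, np.linspace(0.0, 10.0, 200), x0=8.0)       # (ii)
more_spread = forecast_variance(4.0, np.linspace(-10.0, 20.0, 20), x0=8.0)   # (iii)
x0_at_mean = forecast_variance(4.0, x_base, x0=x_base.mean())                # (iv)

for label, value in [
    ("baseline", base),
    ("(i) smaller error variance", smaller_sigma2),
    ("(ii) larger sample", larger_n),
    ("(iii) more variation in x", more_spread),
    ("(iv) x0 at the sample mean", x0_at_mean),
]:
    print(f"{label:30s} var(f) = {value:.4f}")
```

Note that even with $x_0$ at the sample mean and a very large sample, var(f) cannot fall below $\sigma^2$, because the new error $e_0$ is always part of the forecast error.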
$$\widehat{\operatorname{var}}(f) = \hat{\sigma}^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right]$$
$$\operatorname{se}(f) = \sqrt{\widehat{\operatorname{var}}(f)} \tag{4.5}$$
$$\hat{y}_0 \pm t_c \operatorname{se}(f) \tag{4.6}$$
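The steps from the point prediction to the interval $\hat{y}_0 \pm t_c \operatorname{se}(f)$ can be sketched in a few lines. This is a minimal illustration on synthetic data, not the food expenditure data set; the data-generating values (80, 10, 90) and the seed are assumptions made only so the block runs on its own.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic sample (an assumption, not the slides' data): y = 80 + 10 x + e
N = 40
x = rng.uniform(5.0, 35.0, size=N)
y = 80.0 + 10.0 * x + rng.normal(0.0, 90.0, size=N)

# Least squares estimates b1 and b2
xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)
b2 = np.sum((x - xbar) * (y - ybar)) / sxx
b1 = ybar - b2 * xbar

# Estimated error variance, using N - 2 degrees of freedom
residuals = y - (b1 + b2 * x)
sigma2_hat = np.sum(residuals ** 2) / (N - 2)

# Point prediction, estimated forecast variance, and standard error of the forecast
x0 = 20.0
y0_hat = b1 + b2 * x0
var_f = sigma2_hat * (1.0 + 1.0 / N + (x0 - xbar) ** 2 / sxx)
se_f = np.sqrt(var_f)

# 95% prediction interval; t_c is the 97.5th percentile of t with N - 2 df
t_c = stats.t.ppf(0.975, df=N - 2)
lower, upper = y0_hat - t_c * se_f, y0_hat + t_c * se_f

print(f"y0_hat = {y0_hat:.2f}, se(f) = {se_f:.2f}, t_c = {t_c:.4f}")
print(f"95% prediction interval: [{lower:.2f}, {upper:.2f}]")
```

With N = 40 the critical value has 38 degrees of freedom, which is where a value of about 2.0244 comes from.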
Figure 4.2 Point and interval prediction
$$\hat{y}_0 = b_1 + b_2 x_0 = 83.4160 + 10.2096(20) = 287.6089$$
$$\widehat{\operatorname{var}}(f) = \hat{\sigma}^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] = \hat{\sigma}^2 + \frac{\hat{\sigma}^2}{N} + (x_0 - \bar{x})^2 \frac{\hat{\sigma}^2}{\sum (x_i - \bar{x})^2} = \hat{\sigma}^2 + \frac{\hat{\sigma}^2}{N} + (x_0 - \bar{x})^2 \, \widehat{\operatorname{var}}(b_2)$$
$$\hat{y}_0 \pm t_c \operatorname{se}(f) = 287.6089 \pm 2.0244(90.6328) = [104.1323,\ 471.0854]$$
$$y_i = \beta_1 + \beta_2 x_i + e_i \tag{4.7}$$
$$y_i = E(y_i) + e_i \tag{4.8}$$
$$y_i = \hat{y}_i + \hat{e}_i \tag{4.9}$$
$$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + \hat{e}_i \tag{4.10}$$
Figure 4.3 Explained and unexplained components of yi
$$\hat{\sigma}_y^2 = \frac{\sum (y_i - \bar{y})^2}{N - 1}$$
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum \hat{e}_i^2 \tag{4.11}$$

$\sum (y_i - \bar{y})^2$ = total sum of squares = SST: a measure of total variation in y about the sample mean.

$\sum (\hat{y}_i - \bar{y})^2$ = sum of squares due to the regression = SSR: that part of total variation in y, about the sample mean, that is explained by, or due to, the regression. Also known as the “explained sum of squares.”

$\sum \hat{e}_i^2$ = sum of squares due to error = SSE: that part of total variation in y about its mean that is not explained by the regression. Also known as the unexplained sum of squares, the residual sum of squares, or the sum of squared errors.

SST = SSR + SSE
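The decomposition SST = SSR + SSE can be checked numerically. The sketch below fits a line with an intercept to synthetic data (the data-generating values are assumptions for illustration) and compares the two sides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data, for illustration only
x = rng.uniform(0.0, 30.0, size=50)
y = 5.0 + 2.0 * x + rng.normal(0.0, 10.0, size=50)

# np.polyfit with degree 1 returns (slope, intercept)
b2, b1 = np.polyfit(x, y, 1)
y_hat = b1 + b2 * x
e_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # sum of squares due to the regression
sse = np.sum(e_hat ** 2)               # sum of squares due to error

print(f"SST = {sst:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```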
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \tag{4.12}$$
The closer $R^2$ is to one, the closer the sample values $y_i$ are to the fitted regression equation $\hat{y}_i = b_1 + b_2 x_i$. If $R^2 = 1$, then all the sample data fall exactly on the fitted least squares line, so SSE = 0, and the model fits the data “perfectly.” If the sample data for y and x are uncorrelated and show no linear association, then the least squares fitted line is “horizontal,” so that SSR = 0 and $R^2 = 0$.
$$\rho_{xy} = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)} \sqrt{\operatorname{var}(y)}} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \tag{4.13}$$
$$r_{xy} = \frac{\hat{\sigma}_{xy}}{\hat{\sigma}_x \hat{\sigma}_y} \tag{4.14}$$
$$\hat{\sigma}_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) / (N - 1)$$
$$\hat{\sigma}_x = \sqrt{\sum (x_i - \bar{x})^2 / (N - 1)}$$
$$\hat{\sigma}_y = \sqrt{\sum (y_i - \bar{y})^2 / (N - 1)} \tag{4.15}$$
$$r_{xy}^2 = R^2$$
$$R^2 = r_{y\hat{y}}^2$$
$R^2$ measures the linear association, or goodness-of-fit, between the sample data and their predicted values. Consequently $R^2$ is sometimes called a measure of “goodness-of-fit.”
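Both identities for $R^2$ in the simple regression model can be verified on a small example. The sketch below uses synthetic data (an assumption, not the food expenditure sample) and `np.corrcoef` for the sample correlations.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data, for illustration only
x = rng.uniform(0.0, 30.0, size=60)
y = 5.0 + 2.0 * x + rng.normal(0.0, 15.0, size=60)

# Fit the simple regression with an intercept; polyfit returns (slope, intercept)
b2, b1 = np.polyfit(x, y, 1)
y_hat = b1 + b2 * x

# R^2 = 1 - SSE / SST
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Sample correlations between x and y, and between y and its fitted values
r_xy = np.corrcoef(x, y)[0, 1]
r_y_yhat = np.corrcoef(y, y_hat)[0, 1]

print(f"R^2          = {r2:.6f}")
print(f"r_xy^2       = {r_xy ** 2:.6f}")
print(f"r_(y,yhat)^2 = {r_y_yhat ** 2:.6f}")
```

The second identity, $R^2 = r_{y\hat{y}}^2$, is the one that carries over to models with more than one explanatory variable.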
$$SST = \sum (y_i - \bar{y})^2 = 495132.160$$
$$SSE = \sum (y_i - \hat{y}_i)^2 = \sum \hat{e}_i^2 = 304505.176$$
$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{304505.176}{495132.160} = .385$$
$$r_{xy} = \frac{\hat{\sigma}_{xy}}{\hat{\sigma}_x \hat{\sigma}_y} = \frac{478.75}{(6.848)(112.675)} = .62$$
Figure 4.4 Plot of predicted y, ŷ against y
FOOD_EXP = weekly food expenditure by a household of size 3, in dollars

INCOME = weekly household income, in $100 units

$$\widehat{FOOD\_EXP} = 83.42 + 10.21 \, INCOME \qquad R^2 = .385$$
(se)  (43.41)*  (2.09)***

* indicates significant at the 10% level
** indicates significant at the 5% level
*** indicates significant at the 1% level

4.3.1 The Effects of Scaling the Data

Changing the scale of x:
$$y = \beta_1 + \beta_2 x + e = \beta_1 + (c\beta_2)(x/c) + e = \beta_1 + \beta_2^* x^* + e$$
where $\beta_2^* = c\beta_2$ and $x^* = x/c$

Changing the scale of y:
$$y/c = (\beta_1/c) + (\beta_2/c)x + (e/c) \quad \text{or} \quad y^* = \beta_1^* + \beta_2^* x + e^*$$
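The effect of rescaling on the least squares estimates can be confirmed directly. In the sketch below the helper `ols`, the synthetic data, and the scale factor c = 100 are all assumptions made for illustration.

```python
import numpy as np

def ols(x, y):
    """Return (b1, b2) for the simple regression of y on x with an intercept."""
    xbar, ybar = x.mean(), y.mean()
    b2 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    return ybar - b2 * xbar, b2

rng = np.random.default_rng(3)

# Synthetic data: x measured in dollars, y in dollars
x = rng.uniform(100.0, 3000.0, size=40)
y = 80.0 + 0.1 * x + rng.normal(0.0, 50.0, size=40)
c = 100.0

b1, b2 = ols(x, y)
b1_xs, b2_xs = ols(x / c, y)   # x* = x / c: slope is multiplied by c
b1_ys, b2_ys = ols(x, y / c)   # y* = y / c: both coefficients are divided by c

print(f"original:  b1 = {b1:.4f}, b2 = {b2:.6f}")
print(f"x scaled:  b1 = {b1_xs:.4f}, b2 = {b2_xs:.6f}")
print(f"y scaled:  b1 = {b1_ys:.4f}, b2 = {b2_ys:.6f}")
```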
Variable transformations:

Power: if x is a variable then $x^p$ means raising the variable to the power p; examples are quadratic ($x^2$) and cubic ($x^3$) transformations.

The natural logarithm: if x is a variable then its natural logarithm is ln(x).

The reciprocal: if x is a variable then its reciprocal is 1/x.
Figure 4.5 A nonlinear relationship between food expenditure and income
The log-log model
$$\ln(y) = \beta_1 + \beta_2 \ln(x)$$
The parameter $\beta_2$ is the elasticity of y with respect to x.

The log-linear model
$$\ln(y_i) = \beta_1 + \beta_2 x_i$$
A one-unit increase in x leads to (approximately) a $100 \times \beta_2$ percent change in y.

The linear-log model
$$y = \beta_1 + \beta_2 \ln(x) \quad \text{or} \quad \frac{\Delta y}{100 (\Delta x / x)} = \frac{\beta_2}{100}$$
A 1% increase in x leads to a $\beta_2/100$ unit change in y.
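The three interpretations above are approximations, and a quick calculation shows how close they are. The slope values in this sketch are hypothetical, picked only to make the comparison concrete.

```python
import math

# Hypothetical slope values, chosen only for illustration
ELASTICITY = 0.8         # beta_2 in the log-log model
SEMI_ELASTICITY = 0.05   # beta_2 in the log-linear model
LINLOG_SLOPE = 50.0      # beta_2 in the linear-log model

# Log-log: a 1% increase in x multiplies y by 1.01 ** beta_2
loglog_pct = 100.0 * (1.01 ** ELASTICITY - 1.0)

# Log-linear: a one-unit increase in x multiplies y by exp(beta_2)
loglinear_pct = 100.0 * (math.exp(SEMI_ELASTICITY) - 1.0)

# Linear-log: a 1% increase in x adds beta_2 * ln(1.01) units to y
linlog_units = LINLOG_SLOPE * math.log(1.01)

print(f"log-log:    exact {loglog_pct:.4f}% vs rule of thumb {ELASTICITY:.4f}%")
print(f"log-linear: exact {loglinear_pct:.4f}% vs rule of thumb {100 * SEMI_ELASTICITY:.4f}%")
print(f"linear-log: exact {linlog_units:.4f} units vs rule of thumb {LINLOG_SLOPE / 100:.4f} units")
```

The rules of thumb work best for small changes; the gap in the log-linear case grows as $\beta_2$ moves away from zero.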

The reciprocal model is
$$FOOD\_EXP = \beta_1 + \beta_2 \frac{1}{INCOME} + e$$
The linear-log model is
$$FOOD\_EXP = \beta_1 + \beta_2 \ln(INCOME) + e$$
Remark: Given this array of models, which involve different transformations of the dependent and independent variables, and some of which have similar shapes, what are some guidelines for choosing a functional form?
1. Choose a shape that is consistent with what economic theory tells us about the relationship.
2. Choose a shape that is sufficiently flexible to “fit” the data.
3. Choose a shape so that assumptions SR1-SR6 are satisfied, ensuring that the least squares estimators have the desirable properties described in Chapters 2 and 3.
Figure 4.6 EViews output: residuals histogram and summary statistics for food expenditure example
The Jarque-Bera statistic is given by
$$JB = \frac{N}{6} \left( S^2 + \frac{(K - 3)^2}{4} \right)$$
where N is the sample size, S is skewness, and K is kurtosis.

In the food expenditure example
$$JB = \frac{40}{6} \left( .097^2 + \frac{(2.99 - 3)^2}{4} \right) = .063$$
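The arithmetic in the food expenditure example can be reproduced as follows. The comparison with a chi-square distribution with 2 degrees of freedom is the standard large-sample reference for this test and is not stated on the slide, so treat that part as an added assumption; the function name `jarque_bera` is also just a label for this sketch.

```python
from scipy import stats

def jarque_bera(n, skewness, kurtosis):
    """JB = (N / 6) * (S^2 + (K - 3)^2 / 4)."""
    return (n / 6.0) * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

# Values reported for the food expenditure residuals: N = 40, S = .097 in
# magnitude (only S^2 enters the statistic), K = 2.99
jb = jarque_bera(40, 0.097, 2.99)

# Assumed reference distribution: chi-square with 2 degrees of freedom
critical_5pct = stats.chi2.ppf(0.95, df=2)
p_value = stats.chi2.sf(jb, df=2)

print(f"JB = {jb:.3f}")
print(f"5% critical value = {critical_5pct:.2f}, p-value = {p_value:.3f}")
```

A JB value this far below the critical value gives no evidence against normally distributed errors.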
Figure 4.7 Scatter plot of wheat yield over time
$$YIELD_t = \beta_1 + \beta_2 TIME_t + e_t$$
$$\widehat{YIELD}_t = .638 + .0210 \, TIME_t \qquad R^2 = .649$$
(se)  (.064)  (.0022)
Figure 4.8 Predicted, actual and residual values from straight line
Figure 4.9 Bar chart of residuals from straight line
$$YIELD_t = \beta_1 + \beta_2 TIME_t^3 + e_t$$
$$TIMECUBE = TIME^3 / 1000000$$
$$\widehat{YIELD}_t = 0.874 + 9.68 \, TIMECUBE_t \qquad R^2 = 0.751$$
(se)  (.036)  (.082)
Figure 4.10 Fitted, actual and residual values from equation with cubic term
$$YIELD_t = YIELD_0 (1 + g)^t$$
$$\ln(YIELD_t) = \ln(YIELD_0) + [\ln(1 + g)] \, t = \beta_1 + \beta_2 t$$
$$\widehat{\ln(YIELD_t)} = -.3434 + .0178 \, t$$
(se)  (.0584)  (.0021)

4.4.2 A Wage Equation
ln WAGE   ln WAGE0   ln 1  r  EDUC
 1  2 EDUC
ln WAGE   .7884  .1038  EDUC
(se)
(.0849)
Principles of Econometrics, 3rd Edition
(.0063)
Slide 4-35
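In both log-linear equations the slope is $\ln(1 + \text{rate})$, so the rate can be read off approximately as $100 \times b_2$ percent or computed exactly as $100(e^{b_2} - 1)$ percent. The sketch below compares the two using the slope estimates reported above; the helper name `implied_rates` is an assumption of this example.

```python
import math

def implied_rates(b2):
    """Return (approximate, exact) percentage rates implied by a log-linear slope."""
    approximate = 100.0 * b2
    exact = 100.0 * (math.exp(b2) - 1.0)
    return approximate, exact

# Slope estimates reported for the growth model and the wage equation
growth_approx, growth_exact = implied_rates(0.0178)
wage_approx, wage_exact = implied_rates(0.1038)

print(f"wheat yield growth: approx {growth_approx:.2f}%, exact {growth_exact:.2f}%")
print(f"return to education: approx {wage_approx:.2f}%, exact {wage_exact:.2f}%")
```

The approximation is tight for the small growth slope and noticeably looser for the larger wage slope.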

4.4.3 Prediction in the Log-Linear Model

$$\hat{y}_n = \exp\left(\widehat{\ln(y)}\right) = \exp(b_1 + b_2 x)$$
$$\hat{y}_c = \widehat{E(y)} = \exp\left(b_1 + b_2 x + \hat{\sigma}^2/2\right) = \hat{y}_n e^{\hat{\sigma}^2/2}$$
$$\widehat{\ln(WAGE)} = .7884 + .1038 \times EDUC = .7884 + .1038 \times 12 = 2.0335$$
$$\hat{y}_c = \widehat{E(y)} = \hat{y}_n e^{\hat{\sigma}^2/2} = 7.6408 \times 1.1276 = 8.6161$$
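The natural and corrected predictors for the wage example can be recomputed from the figures shown. This sketch takes the predicted log wage (2.0335) and the correction factor (1.1276) as given; because those inputs are rounded, the corrected predictor agrees with the reported 8.6161 only to about three decimal places.

```python
import math

# Figures reported in the wage example (EDUC = 12)
ln_wage_hat = 2.0335
correction_factor = 1.1276   # reported value of exp(sigma2_hat / 2)

# Natural predictor: exponentiate the predicted log wage
y_natural = math.exp(ln_wage_hat)

# Corrected predictor: scale up by the correction factor
y_corrected = y_natural * correction_factor

# Error variance implied by the reported correction factor (a derived quantity)
sigma2_implied = 2.0 * math.log(correction_factor)

print(f"natural predictor   = {y_natural:.4f}")
print(f"corrected predictor = {y_corrected:.4f}")
print(f"implied sigma2_hat  = {sigma2_implied:.4f}")
```

Because the correction factor always exceeds one, the corrected predictor is always larger than the natural predictor.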

4.4.4 A Generalized $R^2$ Measure

$$R_g^2 = [\operatorname{corr}(y, \hat{y})]^2 = r_{y,\hat{y}}^2$$
$$R_g^2 = [\operatorname{corr}(y, \hat{y}_c)]^2 = .4739^2 = .2246$$
$R^2$ values tend to be small with microeconomic, cross-sectional data, because the variations in individual behavior are difficult to fully explain.
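A generalized $R^2$ of this kind can be computed for any log-linear fit by correlating the observed y with its prediction in levels. The sketch below does so on synthetic wage-like data; the data-generating values and the seed are assumptions, so the resulting number is not comparable to the .2246 reported above.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data: ln(wage) = 0.8 + 0.1 * educ + e (illustrative values only)
n = 500
educ = rng.integers(8, 21, size=n).astype(float)
ln_wage = 0.8 + 0.1 * educ + rng.normal(0.0, 0.5, size=n)
wage = np.exp(ln_wage)

# Fit the log-linear model by least squares; polyfit returns (slope, intercept)
b2, b1 = np.polyfit(educ, ln_wage, 1)
residuals = ln_wage - (b1 + b2 * educ)
sigma2_hat = np.sum(residuals ** 2) / (n - 2)

# Natural and corrected predictors of wage in levels
wage_natural = np.exp(b1 + b2 * educ)
wage_corrected = wage_natural * np.exp(sigma2_hat / 2.0)

# Generalized R^2: squared correlation between y and its predictor
r2_g = np.corrcoef(wage, wage_corrected)[0, 1] ** 2
r2_g_natural = np.corrcoef(wage, wage_natural)[0, 1] ** 2

print(f"generalized R^2 (corrected predictor) = {r2_g:.4f}")
print(f"generalized R^2 (natural predictor)   = {r2_g_natural:.4f}")
```

The two values coincide because the corrected predictor is a positive constant multiple of the natural predictor, and correlation is unaffected by such rescaling.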

4.4.5 Prediction Intervals in the Log-Linear Model

$$\left[ \exp\left( \widehat{\ln(y)} - t_c \operatorname{se}(f) \right),\ \exp\left( \widehat{\ln(y)} + t_c \operatorname{se}(f) \right) \right]$$
$$\left[ \exp(2.0335 - 1.96 \times .4905),\ \exp(2.0335 + 1.96 \times .4905) \right] = [2.9184,\ 20.0046]$$
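The interval endpoints come from exponentiating a symmetric interval for ln(y). The sketch below repeats the calculation with the rounded inputs shown (2.0335, 1.96, .4905); with inputs rounded this way the endpoints land close to, but not exactly on, the reported [2.9184, 20.0046].

```python
import math

# Rounded inputs as shown in the example
ln_y_hat = 2.0335
t_c = 1.96
se_f = 0.4905

# Exponentiate the endpoints of the interval for ln(y)
lower = math.exp(ln_y_hat - t_c * se_f)
upper = math.exp(ln_y_hat + t_c * se_f)
natural_predictor = math.exp(ln_y_hat)

print(f"interval for y: [{lower:.4f}, {upper:.4f}]")
print(f"distance below natural predictor: {natural_predictor - lower:.4f}")
print(f"distance above natural predictor: {upper - natural_predictor:.4f}")
```

The interval is symmetric in logs but not in levels: it extends much further above the natural predictor than below it.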











Keywords
coefficient of determination
correlation
data scale
forecast error
forecast standard error
functional form
goodness-of-fit
growth model
Jarque-Bera test
kurtosis
least squares predictor
linear model
linear relationship
linear-log model
log-linear model
log-log model
log-normal distribution
prediction
prediction interval
$R^2$
residual
skewness
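$$f = y_0 - \hat{y}_0 = (\beta_1 + \beta_2 x_0 + e_0) - (b_1 + b_2 x_0)$$
$$\begin{aligned}
\operatorname{var}(\hat{y}_0) &= \operatorname{var}(b_1 + b_2 x_0) = \operatorname{var}(b_1) + x_0^2 \operatorname{var}(b_2) + 2 x_0 \operatorname{cov}(b_1, b_2) \\
&= \frac{\sigma^2 \sum x_i^2}{N \sum (x_i - \bar{x})^2} + x_0^2 \frac{\sigma^2}{\sum (x_i - \bar{x})^2} + 2 x_0 \sigma^2 \frac{-\bar{x}}{\sum (x_i - \bar{x})^2}
\end{aligned}$$
$$\begin{aligned}
\operatorname{var}(\hat{y}_0) &= \left[ \frac{\sigma^2 \sum x_i^2}{N \sum (x_i - \bar{x})^2} - \frac{\sigma^2 N \bar{x}^2}{N \sum (x_i - \bar{x})^2} \right] + \left[ \frac{\sigma^2 x_0^2}{\sum (x_i - \bar{x})^2} + \frac{\sigma^2 (-2 x_0 \bar{x})}{\sum (x_i - \bar{x})^2} + \frac{\sigma^2 N \bar{x}^2}{N \sum (x_i - \bar{x})^2} \right] \\
&= \sigma^2 \left[ \frac{\sum x_i^2 - N \bar{x}^2}{N \sum (x_i - \bar{x})^2} + \frac{x_0^2 - 2 x_0 \bar{x} + \bar{x}^2}{\sum (x_i - \bar{x})^2} \right] \\
&= \sigma^2 \left[ \frac{\sum (x_i - \bar{x})^2}{N \sum (x_i - \bar{x})^2} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right] \\
&= \sigma^2 \left[ \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right]
\end{aligned}$$
Because $e_0$ is uncorrelated with $b_1$ and $b_2$, $\operatorname{var}(f) = \operatorname{var}(e_0) + \operatorname{var}(\hat{y}_0) = \sigma^2 + \operatorname{var}(\hat{y}_0)$, which gives the forecast error variance in (4.4).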
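$$\frac{f}{\sqrt{\operatorname{var}(f)}} \sim N(0, 1) \tag{4A.1}$$
$$\widehat{\operatorname{var}}(f) = \hat{\sigma}^2 \left[ 1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right]$$
$$\frac{y_0 - \hat{y}_0}{\sqrt{\widehat{\operatorname{var}}(f)}} = \frac{f}{\operatorname{se}(f)} \sim t_{(N-2)} \tag{4A.2}$$
$$P(-t_c \le t \le t_c) = 1 - \alpha$$
$$P\left[ -t_c \le \frac{y_0 - \hat{y}_0}{\operatorname{se}(f)} \le t_c \right] = 1 - \alpha$$
$$P\left[ \hat{y}_0 - t_c \operatorname{se}(f) \le y_0 \le \hat{y}_0 + t_c \operatorname{se}(f) \right] = 1 - \alpha$$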
$$(y_i - \bar{y})^2 = [(\hat{y}_i - \bar{y}) + \hat{e}_i]^2 = (\hat{y}_i - \bar{y})^2 + \hat{e}_i^2 + 2 (\hat{y}_i - \bar{y}) \hat{e}_i$$
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum \hat{e}_i^2 + 2 \sum (\hat{y}_i - \bar{y}) \hat{e}_i$$
$$\begin{aligned}
\sum (\hat{y}_i - \bar{y}) \hat{e}_i &= \sum \hat{y}_i \hat{e}_i - \bar{y} \sum \hat{e}_i = \sum (b_1 + b_2 x_i) \hat{e}_i - \bar{y} \sum \hat{e}_i \\
&= b_1 \sum \hat{e}_i + b_2 \sum x_i \hat{e}_i - \bar{y} \sum \hat{e}_i
\end{aligned}$$
$$\sum \hat{e}_i = \sum (y_i - b_1 - b_2 x_i) = \sum y_i - N b_1 - b_2 \sum x_i = 0$$
$$\sum x_i \hat{e}_i = \sum x_i (y_i - b_1 - b_2 x_i) = \sum x_i y_i - b_1 \sum x_i - b_2 \sum x_i^2 = 0$$
$$\sum (\hat{y}_i - \bar{y}) \hat{e}_i = 0$$
If the model contains an intercept it is guaranteed that SST = SSR + SSE. If, however, the model does not contain an intercept, then $\sum \hat{e}_i \neq 0$ and SST ≠ SSR + SSE.
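The role of the intercept can be seen by fitting the same data with and without one. In this sketch the synthetic data (with a true intercept of 20) and the helper `decomposition_gap` are assumptions made for illustration.

```python
import numpy as np

def decomposition_gap(y, y_hat):
    """Return SST - (SSR + SSE); zero when the decomposition holds."""
    sst = np.sum((y - y.mean()) ** 2)
    ssr = np.sum((y_hat - y.mean()) ** 2)
    sse = np.sum((y - y_hat) ** 2)
    return sst - (ssr + sse)

rng = np.random.default_rng(5)

# Synthetic data with a nonzero true intercept
x = rng.uniform(1.0, 10.0, size=50)
y = 20.0 + 3.0 * x + rng.normal(0.0, 2.0, size=50)

# Model with an intercept; polyfit returns (slope, intercept)
b2, b1 = np.polyfit(x, y, 1)
gap_with_intercept = decomposition_gap(y, b1 + b2 * x)

# Model through the origin: b = sum(x*y) / sum(x^2)
b_origin = np.sum(x * y) / np.sum(x ** 2)
gap_without_intercept = decomposition_gap(y, b_origin * x)
residual_sum_without = np.sum(y - b_origin * x)

print(f"with intercept:    SST - (SSR + SSE) = {gap_with_intercept:.6f}")
print(f"without intercept: SST - (SSR + SSE) = {gap_without_intercept:.2f}")
print(f"without intercept: sum of residuals  = {residual_sum_without:.2f}")
```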
Suppose that the variable y has a normal distribution, with mean $\mu$ and variance $\sigma^2$. If we consider $w = e^y$, then $y = \ln(w) \sim N(\mu, \sigma^2)$ and w is said to have a log-normal distribution.
$$E(w) = e^{\mu + \sigma^2/2}$$
$$\operatorname{var}(w) = e^{2\mu + \sigma^2} \left( e^{\sigma^2} - 1 \right)$$
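The log-normal mean and variance formulas can be checked by simulation. The values of $\mu$ and $\sigma$ below, the sample size, and the seed are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(6)

# Arbitrary parameters for the underlying normal variable y
mu, sigma = 0.5, 0.4

# w = exp(y) is log-normal when y is normal
w = np.exp(rng.normal(mu, sigma, size=200_000))

# Theoretical moments of the log-normal distribution
mean_theory = np.exp(mu + sigma ** 2 / 2.0)
var_theory = np.exp(2.0 * mu + sigma ** 2) * (np.exp(sigma ** 2) - 1.0)

print(f"mean: simulated {w.mean():.4f}, theoretical {mean_theory:.4f}")
print(f"var:  simulated {w.var():.4f}, theoretical {var_theory:.4f}")
```

The theoretical mean exceeds $e^{\mu}$, which is why simply exponentiating a predicted log value understates the expected value in levels.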
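Given the log-linear model $\ln(y) = \beta_1 + \beta_2 x + e$
If we assume that $e \sim N(0, \sigma^2)$
$$\begin{aligned}
E(y_i) &= E\left( e^{\beta_1 + \beta_2 x_i + e_i} \right) = E\left( e^{\beta_1 + \beta_2 x_i} e^{e_i} \right) \\
&= e^{\beta_1 + \beta_2 x_i} E\left( e^{e_i} \right) = e^{\beta_1 + \beta_2 x_i} e^{\sigma^2/2} = e^{\beta_1 + \beta_2 x_i + \sigma^2/2}
\end{aligned}$$
$$\widehat{E(y_i)} = e^{b_1 + b_2 x_i + \hat{\sigma}^2/2}$$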
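The growth and wage equations:
$$\beta_2 = \ln(1 + r) \quad \text{and} \quad r = e^{\beta_2} - 1$$
$$b_2 \sim N\left( \beta_2,\ \operatorname{var}(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2} \right)$$
$$E\left[ e^{b_2} \right] = e^{\beta_2 + \operatorname{var}(b_2)/2}$$
$$\hat{r} = e^{b_2 - \widehat{\operatorname{var}}(b_2)/2} - 1, \qquad \widehat{\operatorname{var}}(b_2) = \frac{\hat{\sigma}^2}{\sum (x_i - \bar{x})^2}$$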
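To see how much the variance correction matters in practice, the sketch below applies it to the slope estimates and standard errors reported for the wage and growth equations, taking $\widehat{\operatorname{var}}(b_2)$ as the squared standard error. The function name `rate_estimates` is an assumption of this example.

```python
import math

def rate_estimates(b2, se_b2):
    """Return (uncorrected, corrected) estimates of r = exp(beta_2) - 1."""
    var_b2 = se_b2 ** 2
    uncorrected = math.exp(b2) - 1.0
    corrected = math.exp(b2 - var_b2 / 2.0) - 1.0
    return uncorrected, corrected

# Reported estimates: wage equation (.1038, se .0063), growth model (.0178, se .0021)
wage_uncorrected, wage_corrected = rate_estimates(0.1038, 0.0063)
growth_uncorrected, growth_corrected = rate_estimates(0.0178, 0.0021)

print(f"wage:   uncorrected {wage_uncorrected:.6f}, corrected {wage_corrected:.6f}")
print(f"growth: uncorrected {growth_uncorrected:.6f}, corrected {growth_corrected:.6f}")
```

With standard errors this small the correction changes the estimated rate only in the fifth decimal place, so it matters mainly when $b_2$ is imprecisely estimated.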