2.1
Week 2: Two-Variable Regression Analysis
2.2
Purpose of Regression Analysis
1. Estimate a relationship among economic variables, such as Y = f(X).
2. Forecast or predict the value of one variable, Y, based on the value of another variable, X.
Weekly Food Expenditures
2.3
Y = dollars spent each week on food items.
X = consumer’s family weekly income.
The relationship between X and the expected value of Y, given X, might be linear:
E(Y|Xᵢ) = f(Xᵢ) = β₁ + β₂Xᵢ
Because each conditional mean E(Y|Xᵢ) is a function of Xᵢ, this equation is known as the population regression function (PRF).
2.4
Elucidating “conditional distribution”
The two-variable case is easy to draw, yet illustrates the basic concepts (as we saw in the Galton-Pearson example).
Galton (1886): although there was a tendency for tall parents to have tall children and for short parents to have short children, the average height of children born of parents of a given height tended to move, or "regress", toward the average height in the population as a whole. Pearson (1903) collected more than 1,000 records and confirmed the claim (calling this "regression to mediocrity").
Example: a population of 60 families and the relationship between weekly consumption expenditure and family income.
Table 2.1 shows the conditional distribution of consumption given family income (in each column).
Conditional mean: E(Y|X=80) = 65 (Table 2.2).
The population regression line plots the conditional mean as a function of X (Figures 2.1 & 2.2).
2.5
Elucidating "conditional distribution"
Table 2.1 (and 2.2): weekly family consumption expenditure Y (in dollars) at each level of weekly family income X (in dollars)

Income X -->   80   100   120   140   160   180   200   220   240   260
               55    65    79    80   102   110   120   135   137   150
               60    70    84    93   107   115   136   137   145   152
               65    74    90    95   110   120   140   140   155   175
               70    80    94   103   116   130   144   152   165   178
               75    85    98   108   118   135   145   157   175   180
                -    88     -   113   125   140     -   160   189   185
                -     -     -   115     -     -     -   162     -   191
Total         325   462   445   707   678   750   685  1043   966  1211
Cond. mean     65    77    89   101   113   125   137   149   161   173
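To fix ideas, here is a minimal sketch in plain Python that reproduces the conditional means E(Y|X) from Table 2.1 (the data are just the table values above):

```python
# Table 2.1: weekly consumption Y at each income level X (60 families)
table = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

for x, ys in table.items():
    # E(Y|X=x): average consumption at each income level
    print(x, sum(ys) / len(ys))    # e.g. E(Y|X=80) = 65.0
```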
2.6
[Figure: the conditional probability distribution f(Y|X=80) of food expenditure given income X = $80.]
2.7
[Figure: the conditional probability distributions f(Y|X=80) and f(Y|X=100) of food expenditure given incomes X = $80 and X = $100.]
2.8
[Figure: the population regression line (PRF) plots the conditional means E(Y|Xᵢ) against X (65, 101, 149 at X = 80, 140, 220), with the distribution of Y shown at X = 220.]
2.9
[Figure: average consumption E(Y|X) = β₁ + β₂X plotted against income X; β₁ is the intercept and β₂ = ΔE(Y|X)/ΔX is the slope.]
The econometric model: a linear relationship between average consumption and income.
PRF - population regression function
The conditional mean is a function of X:
E(Y|Xᵢ) = f(Xᵢ), which can take any functional form.
Linear PRF: E(Y|Xᵢ) = β₁ + β₂Xᵢ
The parameters β₁ and β₂ are not known.
For each value of Xᵢ, Y has a distribution as shown in Figure 2.2, with a conditional mean and variance.
2.10
PRF - population regression function: population of 60 families
2.11
[Figure 2.1: weekly consumption (40 to 200) plotted against weekly income (60 to 280) for the 60 families, with the population regression line PRF: E(Y|X) = 17 + 0.6·X.]
Linearity
2.12
Linear in the variables:
1. E(Y|Xᵢ) = β₁ + β₂Xᵢ is linear in the variables.
2. E(Y|Xᵢ) = β₁ + β₂Xᵢ² is not linear in the variables.
Linear in the parameters:
2. E(Y|Xᵢ) = β₁ + β₂Xᵢ² is linear in the parameters.
3. E(Y|Xᵢ) = β₁ + β₂Xᵢ + β₁β₂Zᵢ is not linear in the parameters.
Linearity in the parameters is what matters: only case 3 cannot be handled by the linear regression model (LRM). Case 3 is an example of the nonlinear regression model (NLRM).
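To make the distinction concrete, here is a minimal sketch (plain Python, with made-up illustrative data, not from the text) showing that case 2, which is nonlinear in the variables but linear in the parameters, can still be estimated by ordinary least squares after transforming the regressor:

```python
# Case 2: E(Y|X) = b1 + b2·X² is linear in the parameters,
# so OLS applies after the transformation Z = X².
# Illustrative made-up data, roughly Y = 1 + 1.2·X² + noise.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.4, 5.9, 11.5, 20.0, 31.2]

Z = [x ** 2 for x in X]            # regress Y on Z = X²
n = len(Y)
z_bar, y_bar = sum(Z) / n, sum(Y) / n
b2 = sum((z - z_bar) * (y - y_bar) for z, y in zip(Z, Y)) / \
     sum((z - z_bar) ** 2 for z in Z)
b1 = y_bar - b2 * z_bar
print(b1, b2)                      # estimates of the two parameters
```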
Stochastic Specification of the PRF
2.13
Given any income level Xᵢ, a family's consumption is clustered around the average of all families at that Xᵢ, that is, around its conditional expectation E(Y|Xᵢ).
The deviation of any individual Yᵢ is:
uᵢ = Yᵢ − E(Y|Xᵢ)
or
Yᵢ = E(Y|Xᵢ) + uᵢ
Yᵢ = β₁ + β₂Xᵢ + uᵢ
where uᵢ is the stochastic error or stochastic disturbance.
The Error Term
Y is a random variable composed of two parts:
I. Systematic component: E(Y) = β₁ + β₂X. This is the mean of Y.
II. Random component: u = Y − E(Y) = Y − β₁ − β₂X. This is called the random or stochastic error.
Together, E(Y) and u form the model:
Y = β₁ + β₂X + u
2.14
For example, given X = $80, the individual consumption values are:
Y₁ = 55 = β₁ + β₂(80) + u₁
Y₂ = 60 = β₁ + β₂(80) + u₂
Y₃ = 65 = β₁ + β₂(80) + u₃
Y₄ = 70 = β₁ + β₂(80) + u₄
Y₅ = 75 = β₁ + β₂(80) + u₅
Estimated average:
Ŷᵢ = 65 = β̂₁ + β̂₂(80) for i = 1, …, 5
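Subtracting the conditional mean E(Y|X=80) = 65 from each observation gives the individual deviations: u₁ = 55 − 65 = −10, u₂ = 60 − 65 = −5, u₃ = 65 − 65 = 0, u₄ = 70 − 65 = 5, u₅ = 75 − 65 = 10. As expected, the deviations around the conditional mean sum to zero.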
2.16
The reasons for the stochastic disturbance:
• Vagueness of theory
• Unavailability of data
• Direct effect vs. indirect effect (core variables vs. peripheral variables)
• Intrinsic randomness in human behaviour
• Poor proxy variables
• Principle of parsimony
• Wrong functional form
Unobservable Nature of the Error Term
2.17
• Unspecified factors or explanatory variables not in the model may be in the error term. For example, a final exam score depends not only on class attendance but also on unobserved factors such as student ability, maths background, effort, etc.
• Approximation error is in the error term if the relationship between Y and X is not exactly linear.
• Strictly unpredictable random behavior that may be unique to an observation is also in the error term.
SRF - sample regression function
2.18
In practice we do not observe the entire population. Instead we collect a sample to estimate the PRF.
The estimate may be inaccurate because of sampling fluctuations (sampling error).
To illustrate, suppose we take only one family at random from each income level in Table 2.1. We use the 10 observations to derive a sample regression function, as illustrated in Figure 2.3:
SRF: Ŷ = b₁ + b₂X, where b is an estimator for β
Ŷ = 24 + 0.51·X (note the "hat" over the Y)
SRF - sample regression function
2.19
Table 2.1 (and 2.4) - Sample 1 in green in the original

Income X -->   80   100   120   140   160   180   200   220   240   260
               55    65    79    80   102   110   120   135   137   150
               60    70    84    93   107   115   136   137   145   152
               65    74    90    95   110   120   140   140   155   175
               70    80    94   103   116   130   144   152   165   178
               75    85    98   108   118   135   145   157   175   180
                -    88     -   113   125   140     -   160   189   185
                -     -     -   115     -     -     -   162     -   191
Total         325   462   445   707   678   750   685  1043   966  1211
Cond. mean     65    77    89   101   113   125   137   149   161   173

(The green highlighting is lost in this text version; Sample 1 takes one family from each income level. In Gujarati's Table 2.4 the sampled values are 70, 65, 90, 95, 110, 115, 120, 140, 155, 150.)
2.20
[Figure: the relationship among Yᵢ, uᵢ and the true regression line. Observed points Y₁, …, Y₄ at x₁, …, x₄; the PRF E(Y|x) = β₁ + β₂x; the SRF Ŷ = β̂₁ + β̂₂x; and, for Y₂, the error u₂ measured from the PRF and the residual û₂ measured from the SRF.]
2.21
The Sample Regression Function (SRF)
[Figure: two sample regression lines, SRF1: Ŷ = β̂₁ + β̂₂x and SRF2, fitted through the same points Y₁, …, Y₄ with residuals û₁, …, û₄. Different samples will have different SRFs.]
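The sampling idea can be simulated. A minimal sketch (plain Python, reusing the Table 2.1 data from the earlier sketch): draw one family at random from each income column, fit OLS, and repeat; every draw produces a different SRF scattered around the fixed PRF.

```python
import random

# Table 2.1 again: weekly consumption Y at each income level X
table = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

def ols(X, Y):
    """Closed-form OLS intercept and slope for the two-variable model."""
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
         sum((x - xbar) ** 2 for x in X)
    return ybar - b2 * xbar, b2

X = list(table)
for s in range(3):                              # three different samples
    Y = [random.choice(table[x]) for x in X]    # one family per income level
    b1, b2 = ols(X, Y)
    print(f"sample {s + 1}: SRF  Y-hat = {b1:.2f} + {b2:.3f}·X")
# Each SRF differs, scattering around the PRF E(Y|X) = 17 + 0.6·X
```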
SRF - sample regression function
2.22
[Figure 2.3: Sample 1. Weekly consumption plotted against weekly income (60 to 280), showing the PRF E(Y|X) = 17 + 0.6·X and the SRF Ŷ = 24 + 0.51·X.]
SRF:
Ŷᵢ = β̂₁ + β̂₂Xᵢ
or
Yᵢ = β̂₁ + β̂₂Xᵢ + ûᵢ
or, in alternative notation,
Yᵢ = b₁ + b₂Xᵢ + eᵢ
PRF:
E(Y|Xᵢ) = β₁ + β₂Xᵢ
Yᵢ = β₁ + β₂Xᵢ + uᵢ
Ŷᵢ = estimator of E(Y|Xᵢ)
β̂ᵢ or bᵢ = estimator of βᵢ
ûᵢ (or eᵢ) is the residual; uᵢ is the error term or disturbance.
2.23
Least Squares Method
2.24
[Figure: two candidate lines, SRF1: Ŷ = b₁ + b₂X and SRF2: Ŷ = a₁ + a₂X, fitted to the same five points.]
Residuals from SRF1: 1, −1, −1, 1, −1.5
Σ|u| = |1| + |−1| + |−1| + |1| + |−1.5| = 5.5
Σu² = 1² + 1² + 1² + 1² + 1.5² = 6.25 (smaller)
Residuals from SRF2: 2, 0, −1/2, 1, −2
Σ|u| = |2| + |0| + |−1/2| + |1| + |−2| = 5.5
Σu² = 2² + 0² + (−1/2)² + 1² + (−2)² = 9.25
Both lines give the same sum of absolute residuals, but the sum of squared residuals separates them: least squares prefers SRF1.
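A quick arithmetic check of the two candidate lines in plain Python (the residuals are the values read off the figure above):

```python
# Residuals of the five points from each candidate line
srf1 = [1, -1, -1, 1, -1.5]
srf2 = [2, 0, -0.5, 1, -2]

for name, u in (("SRF1", srf1), ("SRF2", srf2)):
    sum_abs = sum(abs(e) for e in u)    # sum of absolute residuals
    sum_sq = sum(e ** 2 for e in u)     # sum of squared residuals
    print(name, sum_abs, sum_sq)
# SRF1: 5.5 and 6.25; SRF2: 5.5 and 9.25 -> least squares picks SRF1
```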
Ordinary Least Squares (OLS) Method
2.25
Yᵢ = β₁ + β₂Xᵢ + uᵢ
uᵢ = Yᵢ − β₁ − β₂Xᵢ
Minimize the error sum of squared deviations:
Σᵢ₌₁ⁿ uᵢ² = Σᵢ₌₁ⁿ (Yᵢ − β₁ − β₂Xᵢ)² = f(β₁, β₂)
2.26
Minimize with respect to β₁ and β₂:
f(β₁, β₂) = Σᵢ₌₁ⁿ (Yᵢ − β₁ − β₂Xᵢ)²
∂f/∂β₁ = −2 Σ (Yᵢ − β₁ − β₂Xᵢ)
∂f/∂β₂ = −2 Σ Xᵢ(Yᵢ − β₁ − β₂Xᵢ)
Set each of these two derivatives equal to zero and solve the two equations for the two unknowns β₁ and β₂.
2.27
To minimize f(·), set the two derivatives equal to zero:
∂f/∂β₁ = −2 Σ (Yᵢ − β̂₁ − β̂₂Xᵢ) = 0
∂f/∂β₂ = −2 Σ Xᵢ(Yᵢ − β̂₁ − β̂₂Xᵢ) = 0
When these two derivatives are set to zero, β₁ and β₂ become β̂₁ and β̂₂ because they no longer represent just any values of β₁ and β₂ but the special values that correspond to the minimum of f(·).
2.28
−2 Σ (Yᵢ − β̂₁ − β̂₂Xᵢ) = 0
−2 Σ Xᵢ(Yᵢ − β̂₁ − β̂₂Xᵢ) = 0
which gives
ΣYᵢ − nβ̂₁ − β̂₂ΣXᵢ = 0
ΣXᵢYᵢ − β̂₁ΣXᵢ − β̂₂ΣXᵢ² = 0
The normal equations:
nβ̂₁ + β̂₂ΣXᵢ = ΣYᵢ
β̂₁ΣXᵢ + β̂₂ΣXᵢ² = ΣXᵢYᵢ
2.29
Example: Maddala (p. 66)

Month   Sales (Y)   Advertising Exp. (X)    X²    XY      û
1           3              1                 1     3     0.8
2           4              2                 4     8     0.6
3           2              3                 9     6    −2.6
4           6              4                16    24     0.2
5           8              5                25    40     1.0
Total      23             15                55    81     0

The normal equations:
5β̂₁ + 15β̂₂ = 23
15β̂₁ + 55β̂₂ = 81
give
Ŷᵢ = 1.0 + 1.2Xᵢ
ûᵢ = Yᵢ − 1.0 − 1.2Xᵢ
In matrix form:
[ n     ΣXᵢ  ] [β̂₁]   [ ΣYᵢ   ]
[ ΣXᵢ   ΣXᵢ² ] [β̂₂] = [ ΣXᵢYᵢ ]
Solving for the two unknowns:
β̂₂ = (nΣXᵢYᵢ − ΣXᵢΣYᵢ) / (nΣXᵢ² − (ΣXᵢ)²) = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)² = Σxᵢyᵢ / Σxᵢ²
β̂₁ = Ȳ − β̂₂X̄
where xᵢ = Xᵢ − X̄ and yᵢ = Yᵢ − Ȳ.
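A minimal sketch verifying the Maddala example with the closed-form formulas above (plain Python):

```python
# Maddala (p. 66): sales Y regressed on advertising expenditure X
X = [1, 2, 3, 4, 5]
Y = [3, 4, 2, 6, 8]
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n
# b2 = Σ(Xi - xbar)(Yi - ybar) / Σ(Xi - xbar)²
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
     sum((x - xbar) ** 2 for x in X)
b1 = ybar - b2 * xbar
print(b1, b2)                               # 1.0 and 1.2

residuals = [y - (b1 + b2 * x) for x, y in zip(X, Y)]
print([round(u, 1) for u in residuals])     # [0.8, 0.6, -2.6, 0.2, 1.0]
print(round(sum(residuals), 10))            # 0.0 (residuals sum to zero)
```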
2.30
2.31
[Figure: the fitted SRF Ŷ = β̂₁ + β̂₂X and any other candidate line Y* = β̂₁* + β̂₂*X through the points Y₁, …, Y₄, with residuals û₁*, …, û₄* measured from the other line.]
Why is the SRF the best one? Because the sum of squared residuals from any other line Y* is larger.
Assumptions of Simple Regression
2.32
1. The regression model is linear in the parameters:
Y = β₁ + β₂X + u
2. X values are fixed in repeated sampling; X is nonstochastic (not random).
3. Zero mean value of the error term (disturbance uᵢ):
E(uᵢ|Xᵢ) = 0
4. Homoscedasticity, or equal variance of uᵢ: the conditional variances of uᵢ are identical, i.e.,
var(uᵢ|Xᵢ) = σ²
2.33
Homoscedasticity Case
[Figure: the probability density functions of Yᵢ at two levels of family income, X₁ = 80 and X₂ = 100, are identical.]
2.34
Heteroscedasticity Case
[Figure: the variance of Yᵢ increases as family income Xᵢ increases.]
Assumptions of Simple Regression (continued)
5. No autocorrelation between the disturbances:
cov(uᵢ, uⱼ | Xᵢ, Xⱼ) = 0 for i ≠ j
6. Zero covariance between uᵢ and Xᵢ, i.e.,
cov(uᵢ, Xᵢ) = E(uᵢXᵢ) = 0
7. The number of observations (n) must be greater than the number of parameters (k) to be estimated: n > k
2.35
Assumptions of Simple Regression (continued)
8. Variability in X values: the X values in a given sample must not all be the same; at least two must differ.
9. No specification bias or error: the regression model is correctly specified.
10. No perfect multicollinearity: there is no perfect linear relationship among the independent variables, i.e., Xₖ ≠ λXₘ.
2.36
2.37
One more assumption that is often used in practice but is not required for least squares:
(Optional) The values of Y are normally distributed about their mean for each value of X:
Y ~ N(β₁ + β₂X, σ²)
The Error Term Assumptions
2.38
1. The value of Y, for each value of X, is
Y = β₁ + β₂X + u
2. The average value of the random error u is
E(u) = 0
3. The variance of the random error u is
var(u) = σ² = var(Y)
4. The covariance between any pair of u's is
cov(uᵢ, uⱼ) = cov(Yᵢ, Yⱼ) = 0
5. u is normally distributed with mean 0 and variance σ²:
u ~ N(0, σ²)
Prediction
2.39
Estimated regression equation:
Ŷᵢ = 4 + 1.5Xᵢ
Xᵢ = years of experience
Ŷᵢ = predicted wage rate
If Xᵢ = 2 years, then Ŷᵢ = $7.00 per hour.
If Xᵢ = 3 years, then Ŷᵢ = $8.50 per hour.
2.40
Mean Prediction:
Ŷ = β̂₁ + β̂₂X
Prediction example:
Ŷ = 24.454 + 0.5090·X
For X = 100:
Ŷ = 24.454 + 0.5090(100) = 75.364 (estimated result)
"Ex-post" and "ex-ante" forecasting:
For example, suppose you have data on Y and X for 1947–1999, and the consumption function estimated over 1947–1995 is
Ŷₜ = 238.4 + 0.875Xₜ    (t = time, e.g., t = 1947, 1948, …)
2.41
Given the observed values X₉₆ = 10,419; X₉₇ = 10,625; …; X₉₉ = 11,286, the calculated predictions, or "ex-post" forecasts, are:
1996: Ŷ₉₆ = 238.4 + 0.875(10,419) = 9,355.0
1997: Ŷ₉₇ = 238.4 + 0.875(10,625) = 9,535.3
…
1999: Ŷ₉₉ = 238.4 + 0.875(11,286) = 10,113.7
The "ex-ante" forecast based on the assumed value X₂₀₀₀ = 12,000 is:
2000: Ŷ₂₀₀₀ = 238.4 + 0.875(12,000) = 10,738.4
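A minimal sketch of these forecasts in plain Python (the coefficients are the estimates above; the 2000 value of X is assumed, which is what makes that forecast ex ante):

```python
b1, b2 = 238.4, 0.875                  # estimated over 1947-1995

X_expost = {1996: 10_419, 1997: 10_625, 1999: 11_286}   # observed X
X_exante = {2000: 12_000}                               # assumed X

for year, x in {**X_expost, **X_exante}.items():
    kind = "ex-post" if year in X_expost else "ex-ante"
    print(year, kind, round(b1 + b2 * x, 1))
# approx: 1996 9355.0, 1997 9535.3, 1999 10113.7, 2000 10738.4
```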
Forecasting with the two-variable regression model
2.42
[Figure: time line marking the ex-post forecast period (1996–1999) and the ex-ante forecast period (1999–2003).]
Estimated regression function in a time-series context:
Ŷₜ = β̂₁ + β̂₂Xₜ
The forecast for period t+τ is
Ŷ_{t+τ} = β̂₁ + β̂₂X_{t+τ}
where τ is the number of periods into the future and X_{t+τ} is an observed or assumed (control) value of future X.
Forecast error:
û_{t+τ} = Y_{t+τ} − Ŷ_{t+τ}
2.43
Comparison of Forecasts
Mean squared error (MSE):
MSE = Σ(Ŷᵢ − Yᵢ)² / (n − k)
Root mean squared error (RMSE):
RMSE = √[ Σ(Ŷᵢ − Yᵢ)² / (n − k) ]
Mean absolute percentage error (MAPE):
MAPE = Σ( |Ŷᵢ − Yᵢ| / Yᵢ ) / (n − k)
n = # of forecasts, k = # of parameters estimated in the model
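A minimal sketch computing the three criteria in plain Python, following the slide's n − k denominator convention (the actual and forecast values are made up for illustration):

```python
import math

# Illustrative made-up data: actual outcomes and one set of forecasts
actual = [100.0, 110.0, 120.0, 130.0]
forecast = [98.0, 113.0, 118.0, 133.0]
k = 2                      # parameters estimated in the model (b1, b2)
n = len(actual)            # number of forecasts

errors = [f - a for f, a in zip(forecast, actual)]
mse = sum(e ** 2 for e in errors) / (n - k)
rmse = math.sqrt(mse)
mape = sum(abs(e) / a for e, a in zip(errors, actual)) / (n - k)

print(mse, rmse, mape)     # smaller values indicate better forecasts
```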