Chapter 3
Multiple Regression
Outline
3.1. Definitions
3.1.1 Multiple Regression Model
3.1.2 Population regression function
3.1.3 Sample regression function
3.2. OLS Estimator in multiple regression model
3.2.1 Ordinary least squares estimators
3.2.2 Assumptions of the multiple regression model
3.2.3 Unbiased and efficient properties
3.3. Measure of fit
3.1 Definitions
3.1.1 Multiple regression model
Review of Chapter 2:
- The error u arises because of factors, or variables, that influence Y but are not included in the regression function.
- The key assumption 3 – that all other factors affecting y are uncorrelated with x – is often unrealistic, so it is difficult to draw ceteris paribus conclusions about how x affects y.
Example: comparing two models
  consum = β1 + β2 income + β3 asset + ui    (1)
  consum = β1 + β2 income + ui               (2)
We know that the simple regression coefficient β̂2 from (2) does not usually equal the multiple regression coefficient β̂2 from (1). There are two distinct cases where β̂2 from (1) and β̂2 from (2) are identical:
1. The partial effect of asset on consum is zero in the sample, that is, β̂3 = 0.
2. income and asset are uncorrelated in the sample.
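These two cases can be checked numerically. Below is a minimal sketch in Python (assuming numpy; the consumption data are simulated purely for illustration and are not from the slides):

# Sketch: when do the simple and multiple regression slopes on income coincide?
import numpy as np

rng = np.random.default_rng(0)
n = 1000

def income_slopes(income, asset, consum):
    # multiple regression of consum on (1, income, asset)
    X = np.column_stack([np.ones(n), income, asset])
    b_multi = np.linalg.lstsq(X, consum, rcond=None)[0]
    # simple regression of consum on (1, income)
    Xs = np.column_stack([np.ones(n), income])
    b_simple = np.linalg.lstsq(Xs, consum, rcond=None)[0]
    return b_multi[1], b_simple[1]

# Case 2: income and asset (nearly) uncorrelated -> the two slopes are (nearly) equal
income = rng.normal(50, 10, n)
asset = rng.normal(100, 20, n)                    # generated independently of income
consum = 5 + 0.6 * income + 0.1 * asset + rng.normal(0, 1, n)
print(income_slopes(income, asset, consum))

# Correlated regressors -> the simple slope absorbs part of asset's effect
asset2 = 2 * income + rng.normal(0, 5, n)
consum2 = 5 + 0.6 * income + 0.1 * asset2 + rng.normal(0, 1, n)
print(income_slopes(income, asset2, consum2))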
3.1.1 Multiple regression model
• It is more amenable to ceteris paribus analysis.
• Adding more factors to the model means that more of the variation in y can be explained, giving a better model for predicting the dependent variable.
The MRM can incorporate fairly general functional form relationships:
  consum = β1 + β2 income + β3 income² + ui
The MRM is the most widely used vehicle for empirical analysis in economics and other social sciences, e.g.:
  wage = f(educ, exper)
  Q = f(K, L)
3.1.2 Population regression function
Yi = β1 + β2 X2i + ... + βk Xki + ui
• Y = one dependent variable (criterion)
• X = two or more independent variables (predictor variables)
• ui = the stochastic disturbance term
• Sample size: >= 50 (at least 10 times as many cases as independent variables)
• β1 is the intercept
• βk measures the change in Y with respect to Xk, holding other factors fixed.
3.1.3 The Sample Regression Function (SRF)
• Population regression function:
  E(Y | Xi) = f(Xi) = β1 + β2 X2i + β3 X3i
• Sample regression function:
  Ŷi = β̂1 + β̂2 X2i + β̂3 X3i
• Ŷi = estimator of E(Y | Xi)
• β̂1 = estimator of β1
• β̂2 = estimator of β2
• β̂3 = estimator of β3
• An estimator, also known as a (sample) statistic, is simply a rule or formula or method that tells how to estimate the population parameter from the information provided by the sample.
• A particular numerical value obtained by the estimator in an application is known as an estimate.
Example- Multiple regression function
• Problem 3.2: (Suppose that there are only 2
independent variables in the MRM) A labor
economist would like to examine the effects of
job training on worker productivity. In this
case, there is little need for formal economic
theory. Basic economic understanding is sufficient for
realizing that factors such as education and
experience affect worker productivity. Also,
economists are well aware that workers are
paid commensurate with their productivity.
Example- Multiple regression function
• Model: wage = f(educ, exper)
Where:
wage = hourly wage
educ: years of formal education
exper: years of workforce experience
PRF:
𝑤𝑎𝑔𝑒 = 𝛽1 + 𝛽2 𝑒𝑑𝑢𝑐 + 𝛽3 𝑒𝑥𝑝𝑒𝑟 + 𝑢
3.1.3. The Sample Regression Function (SRF)
A sample of Y values corresponding to some fixed X's: can we estimate the PRF from the sample data?
So, if we have data for Problem 3.2, we can write the SRF as:
  ŵage = β̂1 + β̂2 educ + β̂3 exper
3.2. The OLS estimator in multiple
regression model
• 3.2.1 Ordinary least squares estimators
• 3.2.2 Assumptions of MRM
• 3.2.3 Unbiased and efficient properties
3.2.1 OLS Estimators
Consider the three-variable model.
• To find the OLS estimators, let us first write the sample regression function (SRF) as follows:
  Yi = β̂1 + β̂2 X2i + β̂3 X3i + ûi        (7.4.1)
• The OLS estimators make the residual sum of squares (RSS), Σûi², as small as possible:
  Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂1 − β̂2 X2i − β̂3 X3i)² → min
3.2.1 OLS Estimators
• Setting the partial derivatives of the RSS to zero gives the normal equations:
  ΣYi    = n β̂1 + β̂2 ΣX2i + β̂3 ΣX3i
  ΣX2iYi = β̂1 ΣX2i + β̂2 ΣX2i² + β̂3 ΣX2iX3i
  ΣX3iYi = β̂1 ΣX3i + β̂2 ΣX2iX3i + β̂3 ΣX3i²
3.2.1 OLS Estimators
• If we denote deviations from the sample means:
  yi = Yi − Ȳ,   x2i = X2i − X̄2,   x3i = X3i − X̄3
then
  Σx2i²    = ΣX2i²   − n X̄2²
  Σx3i²    = ΣX3i²   − n X̄3²
  Σyi²     = ΣYi²    − n Ȳ²
  Σx2i x3i = ΣX2iX3i − n X̄2 X̄3
  Σyi x2i  = ΣYiX2i  − n Ȳ X̄2
  Σyi x3i  = ΣYiX3i  − n Ȳ X̄3
3.2.1 OLS Estimators
• We will obtain:
  β̂1 = Ȳ − β̂2 X̄2 − β̂3 X̄3
  β̂2 = [ (Σyi x2i)(Σx3i²) − (Σx2i x3i)(Σyi x3i) ] / [ (Σx2i²)(Σx3i²) − (Σx2i x3i)² ]
  β̂3 = [ (Σyi x3i)(Σx2i²) − (Σx2i x3i)(Σyi x2i) ] / [ (Σx2i²)(Σx3i²) − (Σx2i x3i)² ]
3.2.1 OLS Estimators
• Example: We have the following data (n = 10):
  Y:   20  19  18  17  16  15  15  14  14  12
  X2:   8   7   6   5   5   5   4   4   3   3
  X3:   3   4   5   5   6   6   7   7   8   9
3.2.1 OLS Estimators
• We obtain:
  ΣYi  = 160     Ȳ  = 16
  ΣX2i = 50      X̄2 = 5
  ΣX3i = 60      X̄3 = 6
  ΣYi²  = 2616    ΣX2i²  = 274    ΣX3i²  = 390
  ΣYiX2i = 835    ΣYiX3i = 920    ΣX2iX3i = 274
3.2.1 OLS Estimators
• and, in deviation form:
  Σyi²     = ΣYi²    − n Ȳ²     = 56
  Σx2i²    = ΣX2i²   − n X̄2²    = 24
  Σx3i²    = ΣX3i²   − n X̄3²    = 30
  Σyi x2i  = ΣYiX2i  − n Ȳ X̄2   = 35
  Σyi x3i  = ΣYiX3i  − n Ȳ X̄3   = −40
  Σx2i x3i = ΣX2iX3i − n X̄2 X̄3  = −26
3.2.1 OLS Estimators
• and finally:
  β̂2 = 0.2272
  β̂3 = −1.1363
  β̂1 = 21.6818
  Ŷi = 21.6818 + 0.2272 X2i − 1.1363 X3i
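These estimates can be reproduced from the data above with the deviation-form formulas of the previous slides. A minimal sketch in Python (assuming numpy):

# Reproducing the worked example with the slide's data
import numpy as np

Y  = np.array([20, 19, 18, 17, 16, 15, 15, 14, 14, 12], dtype=float)
X2 = np.array([ 8,  7,  6,  5,  5,  5,  4,  4,  3,  3], dtype=float)
X3 = np.array([ 3,  4,  5,  5,  6,  6,  7,  7,  8,  9], dtype=float)

y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()   # deviations from means

den = (x2**2).sum() * (x3**2).sum() - (x2 * x3).sum()**2
b2 = ((y * x2).sum() * (x3**2).sum() - (x2 * x3).sum() * (y * x3).sum()) / den
b3 = ((y * x3).sum() * (x2**2).sum() - (x2 * x3).sum() * (y * x2).sum()) / den
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()
print(b1, b2, b3)   # about 21.68, 0.227, -1.136, matching the slide

# Cross-check with a generic least-squares solver: prints the same three coefficients
X = np.column_stack([np.ones(len(Y)), X2, X3])
print(np.linalg.lstsq(X, Y, rcond=None)[0])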
3.2.1 OLS Estimators
• Variances and standard errors of the OLS estimators:
  Var(β̂1) = [ 1/n + ( X̄2² Σx3i² + X̄3² Σx2i² − 2 X̄2 X̄3 Σx2i x3i ) / ( Σx2i² Σx3i² − (Σx2i x3i)² ) ] σ²
  Var(β̂2) = σ² Σx3i² / [ Σx2i² Σx3i² − (Σx2i x3i)² ]
  Var(β̂3) = σ² Σx2i² / [ Σx2i² Σx3i² − (Σx2i x3i)² ]
3.2.1 OLS Estimators
• or, equivalently,
  Var(β̂2) = σ² / [ Σx2i² (1 − r23²) ],   se(β̂2) = √Var(β̂2)
  Var(β̂3) = σ² / [ Σx3i² (1 − r23²) ],   se(β̂3) = √Var(β̂3)
  where r23 is the sample coefficient of correlation between X2 and X3.
• In all these formulas σ² is the variance of the population disturbances ui, estimated by
  σ̂² = Σûi² / (n − 3)        (7.4.19)
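For the worked example, the standard errors can be computed from either form of the variance formulas, with σ̂² = Σûi²/(n − 3). A minimal sketch in Python (assuming numpy):

# Standard errors for the worked example
import numpy as np

Y  = np.array([20, 19, 18, 17, 16, 15, 15, 14, 14, 12], dtype=float)
X2 = np.array([ 8,  7,  6,  5,  5,  5,  4,  4,  3,  3], dtype=float)
X3 = np.array([ 3,  4,  5,  5,  6,  6,  7,  7,  8,  9], dtype=float)
n = len(Y)

X = np.column_stack([np.ones(n), X2, X3])
b = np.linalg.lstsq(X, Y, rcond=None)[0]
resid = Y - X @ b
sigma2_hat = (resid**2).sum() / (n - 3)          # sigma-hat^2 = RSS / (n - 3)

x2, x3 = X2 - X2.mean(), X3 - X3.mean()
den = (x2**2).sum() * (x3**2).sum() - (x2 * x3).sum()**2
var_b2 = sigma2_hat * (x3**2).sum() / den
var_b3 = sigma2_hat * (x2**2).sum() / den

# Equivalent form using the correlation between X2 and X3
r23 = (x2 * x3).sum() / np.sqrt((x2**2).sum() * (x3**2).sum())
var_b2_alt = sigma2_hat / ((x2**2).sum() * (1 - r23**2))

print(np.sqrt(var_b2), np.sqrt(var_b3))          # se(b2), se(b3)
print(np.isclose(var_b2, var_b2_alt))            # True: the two formulas agree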
Example- Stata output
• Model: wage = f(educ,exper )
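As a rough Python analogue of the Stata regression (a sketch only; it assumes pandas and statsmodels and a hypothetical file wage.csv with columns wage, educ and exper):

# Fit wage on educ and exper and print coefficients, standard errors and R-squared
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("wage.csv")                      # hypothetical dataset, one row per worker
fit = smf.ols("wage ~ educ + exper", data=df).fit()
print(fit.summary())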
3.2.2 The Three-Variable Model: Notation and Assumptions
Assumptions for the model Yi = β1 + β2 X2i + β3 X3i + ui:
1. Linear regression model, or linear in the parameters.
2. X values are fixed in repeated sampling. X is assumed to be nonstochastic.
3. Zero mean value of the disturbance ui: E(ui | X2i, X3i) = 0.
   Then we have zero covariance between ui and each X variable: cov(ui, X2i) = cov(ui, X3i) = 0.
4. Homoscedasticity or constant variance of ui: Var(ui) = σ².
5. No serial correlation between the disturbances: Cov(ui, uj) = 0, i ≠ j.
6. The number of observations n must be greater than the number of parameters to be estimated.
7. Variability in X values. The X values in a given sample must not all be the same.
8. No specification bias, or the model is correctly specified.
9. No exact collinearity between the X variables.
Assumption 3: Zero mean value of the disturbance ui: E(ui | X2i, X3i) = 0.
This assumption can fail if:
- the functional relationship between the explained and explanatory variables is misspecified in the equation;
- an important factor that is correlated with any of x1, x2, ..., xk is omitted.
When xj is correlated with u for any reason, xj is said to be an endogenous explanatory variable.
Assumption 9: No exact collinearity between the X variables.
• If an independent variable in the regression is an exact linear combination of the other independent variables, then we say the model suffers from perfect collinearity, and it cannot be estimated (see Chapter 5).
• Note that Assumption 9 allows the independent variables to be correlated; they just cannot be perfectly correlated. If we did not allow for any correlation, then multiple regression would be of very limited use for econometric analysis.
3.2.3. Unbiased and efficient properties
Gauss-Markov Theorem: β̂1, β̂2, ..., β̂k are the best linear unbiased estimators (BLUEs) of β1, β2, ..., βk.
• An estimator β̂j is an unbiased estimator of βj if E(β̂j) = βj.
• An estimator β̃j of βj is linear if and only if it can be expressed as a linear function of the data on the dependent variable:
  β̃j = Σ_{i=1}^{n} wij yi
• "Best" is defined as smallest variance.
3.2.3. Unbiased and efficient properties
• The sample regression line (surface) passes through the means (Ȳ, X̄2, ..., X̄k).
• The mean value of the estimated Yi is equal to the mean value of the actual Yi: mean(Ŷ) = Ȳ.
• The sum of the residuals is equal to zero:
  Σ_{i=1}^{n} ûi = 0
• The residuals are uncorrelated with each Xki:
  Σ_{i=1}^{n} Xki ûi = 0
• The residuals are uncorrelated with the fitted values Ŷi:
  Σ_{i=1}^{n} Ŷi ûi = 0
3.2.3. Unbiased and efficient properties
Standard errors of the OLS estimators
• An unbiased "estimator" of σ² = E(ui²) would be Σ_{i=1}^{n} ui² / n, but this is not a true estimator because we cannot observe the ui.
• The unbiased estimator of σ² is:
  σ̂² = Σûi² / (n − k) = RSS / (n − k)
• RSS/σ² follows a χ² distribution with df = number of observations − number of estimated parameters = n − k.
• The positive square root σ̂ is called the standard error of the regression (SER) (or Root MSE). The SER is an estimator of the standard deviation of the error term.
3.2.3. Unbiased and efficient properties
  Var(β̂j) = σ² / [ TSSj (1 − Rj²) ]
• where TSSj = Σ_{i=1}^{n} (xij − x̄j)² is the total sample variation in xj and Rj² is the R-squared from regressing xj on all the other independent variables (including an intercept).
• Since σ² is unknown, we replace it with its estimator σ̂². Standard error:
  se(β̂j) = σ̂ / [ TSSj (1 − Rj²) ]^(1/2)
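A minimal sketch in Python (assuming numpy) of this route to se(β̂j), using the auxiliary regression of X2 on the other regressors in the worked example:

# se(beta_j) via TSS_j and R_j^2 from the auxiliary regression
import numpy as np

Y  = np.array([20, 19, 18, 17, 16, 15, 15, 14, 14, 12], dtype=float)
X2 = np.array([ 8,  7,  6,  5,  5,  5,  4,  4,  3,  3], dtype=float)
X3 = np.array([ 3,  4,  5,  5,  6,  6,  7,  7,  8,  9], dtype=float)
n, k = len(Y), 3

X = np.column_stack([np.ones(n), X2, X3])
resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
sigma2_hat = (resid**2).sum() / (n - k)

# Auxiliary regression of X2 on the remaining regressors (here just a constant and X3)
Z = np.column_stack([np.ones(n), X3])
x2_resid = X2 - Z @ np.linalg.lstsq(Z, X2, rcond=None)[0]
TSS_2 = ((X2 - X2.mean())**2).sum()
R2_2 = 1 - (x2_resid**2).sum() / TSS_2

se_b2 = np.sqrt(sigma2_hat / (TSS_2 * (1 - R2_2)))
print(se_b2)        # matches the se(beta_2) obtained from the earlier formulas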
3.3 Measure of fit or coefficient of determination R²
• The total sum of squares (TSS):
  TSS = Σyi² = Σ(Yi − Ȳ)² = ΣYi² − n Ȳ²
• The explained sum of squares (ESS):
  ESS = Σŷi² = Σ(Ŷi − Ȳ)² = β̂2 Σyi x2i + β̂3 Σyi x3i
• The residual sum of squares (RSS):
  RSS = Σ(Yi − Ŷi)² = Σûi² = TSS − ESS
• Goodness of fit – coefficient of determination R²:
  R² = ESS/TSS = 1 − RSS/TSS
→ the fraction of the sample variation in Y that is explained by X2 and X3.
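For the worked example, the decomposition and R² can be computed directly. A minimal sketch in Python (assuming numpy):

# TSS, ESS, RSS and R-squared for the worked example
import numpy as np

Y  = np.array([20, 19, 18, 17, 16, 15, 15, 14, 14, 12], dtype=float)
X2 = np.array([ 8,  7,  6,  5,  5,  5,  4,  4,  3,  3], dtype=float)
X3 = np.array([ 3,  4,  5,  5,  6,  6,  7,  7,  8,  9], dtype=float)

X = np.column_stack([np.ones(len(Y)), X2, X3])
Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]

TSS = ((Y - Y.mean())**2).sum()
ESS = ((Y_hat - Y.mean())**2).sum()
RSS = ((Y - Y_hat)**2).sum()
print(TSS, ESS, RSS)             # TSS = ESS + RSS
print(ESS / TSS, 1 - RSS / TSS)  # the two expressions for R^2 agree (about 0.95 here)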
Example- Goodness of fit
• Determinants of college GPA:
- The variables in GPA1.dta include the college grade point average (colGPA), high school GPA (hsGPA), achievement test score (ACT), AGE, and Campus, for a sample of 141 students from a large university.
Example- Goodness of fit
• Determinants of college GPA: estimated regression of colGPA on hsGPA and ACT (output shown on the original slide).
Output interpretation
• hsGPA and ACT together explain about ?% of
the variation in college GPA for this sample of
students.
• There are many other factors including family
background, personality, quality of high school
education, and affinity for college that contribute
to a student’s college performance.
3.3. Measure of fit
• Note that R2 lies between 0 and 1.
o If it is 1, the fitted regression line explains 100 percent of
the variation in Y
o If it is 0, the model does not explain any of the variation
in Y.
• The fit of the model is said to be "better" the closer R² is to 1.
• As the number of regressors increases, R2 almost invariably
increases and never decreases.
R2 and the adjusted R2
• An alternative coefficient of determination (the adjusted R²):
  R̄² = 1 − [ RSS/(n − k) ] / [ TSS/(n − 1) ]
  R̄² = 1 − (1 − R²) (n − 1)/(n − k)
where k = the number of parameters in the model including the intercept term.
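A minimal sketch in Python of the adjustment (the R² value plugged in is the one from the worked example and is illustrative only):

# Adjusted R-squared from R-squared, n and k (k counts all parameters incl. the intercept)
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k)

print(adjusted_r2(0.9537, 10, 3))   # about 0.94: slightly below R^2, penalising extra regressors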
R2 and the adjusted R2
• It is good practice to use the adjusted R² rather than R²
because R2 tends to give an overly optimistic
picture of the fit of the regression, particularly
when the number of explanatory variables is
not very small compared with the number of
observations.
The game of maximizing adjusted R2
• Sometimes researchers play the game of maximizing
adjusted R2, that is, choosing the model that gives the
highest adjusted R². This may be dangerous.
• In regression analysis, our objective is not to obtain a high adjusted R² per se but rather to obtain dependable estimates of the true population regression coefficients and draw statistical inferences about them.
• Researchers should be more concerned about the
logical or theoretical relevance of the explanatory
variables to the dependent variable and their statistical
significance.
Comparing Coefficients of Determination R2
• It is crucial to note that in comparing two models on the basis of the coefficient of determination, whether adjusted or not:
  • the sample size n must be the same;
  • the dependent variable must be the same;
  • the explanatory variables may take any form.
Thus for the models
lnYi = β1 + β2X2i + β3X3i + ui
(7.8.6)
Yi = α1 + α2X2i + α3X3i + ui
(7.8.7)
the computed R2 terms cannot be compared
Review: Partial correlation coefficients
• Example: we have a regression model with three variables:
Y, X2 and X3.
• The coefficient of correlation r measures the degree of linear association between two variables: r12 (correlation coefficient between Y and X2), r13 (correlation coefficient between Y and X3) and r23 (correlation coefficient between X2 and X3). These are called simple correlation coefficients, or correlation coefficients of zero order.
• Does r12 in fact measure the “true” degree of (linear)
association between Y and X2 when X3 may be associated
with both of them?
→ We need a correlation coefficient that is independent of the influence of X3 on X2 and Y: the partial correlation coefficient.
Review. Partial correlation coefficients
• r12,3 =partial correlation coefficient between Y and X2,
holding X3 constant.
• r13,2 =partial correlation coefficient between Y and X3,
holding X2 constant.
• r23,1 =partial correlation coefficient between X2 and X3,
holding Y constant.
 These are called first order correlation coefficients (order=
the number of secondary subscripts).
  r12,3 = (r12 − r13 r23) / √[(1 − r13²)(1 − r23²)]
  r13,2 = (r13 − r12 r23) / √[(1 − r12²)(1 − r23²)]
  r23,1 = (r23 − r12 r13) / √[(1 − r12²)(1 − r13²)]
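A minimal sketch in Python of the first formula (the zero-order correlations used here are hypothetical, chosen to mimic the crop-yield example on the next slide):

# First-order partial correlation from the zero-order correlations
from math import sqrt

def partial_corr(r12, r13, r23):
    # correlation between variables 1 and 2, holding variable 3 fixed
    return (r12 - r13 * r23) / sqrt((1 - r13**2) * (1 - r23**2))

r12, r13, r23 = 0.0, 0.5, -0.5       # hypothetical: yield-rainfall, yield-temperature, rainfall-temperature
print(partial_corr(r12, r13, r23))   # positive, even though r12 = 0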
Example- Partial correlation coefficients
• Y= crop yield, X2= rainfall, X3= temperature.
Assume r12 = 0: there is no association between crop yield and rainfall. Assume r13 is positive and r23 is negative; then r12,3 will be positive: holding temperature constant, there is a positive association between yield and rainfall.
Since temperature X3 affects both yield Y and
rainfall, we need to remove the influence of
the nuisance variable temperature.
• In EViews: Quick -> Group Statistics -> Correlation
More on Functional Form
The Cobb–Douglas Production Function
• The Cobb–Douglas production function, in its stochastic
form, may be expressed as:
  Yi = β1 X2i^β2 X3i^β3 e^ui        (7.9.1)
where
Y = output
X2 = labor input
X3 = capital input
u = stochastic disturbance term
e = base of natural logarithm
• if we log-transform this model, we obtain:
ln Yi = ln β1 + β2 lnX2i + β3lnX3i + ui
= β0 + β2lnX2i + β3lnX3i + ui
(7.9.2)
where β0 = ln β1.
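A minimal sketch in Python (assuming numpy) of estimating (7.9.2) by OLS on the logs; the data are simulated, with elasticities chosen near the values reported below, and are not the textbook data:

# Log-log (Cobb-Douglas) estimation on simulated output, labor and capital
import numpy as np

rng = np.random.default_rng(1)
n = 200
L = rng.uniform(50, 200, n)                                 # labor input
K = rng.uniform(100, 500, n)                                # capital input
Y = 2.0 * L**0.47 * K**0.52 * np.exp(rng.normal(0, 0.05, n))

X = np.column_stack([np.ones(n), np.log(L), np.log(K)])
b = np.linalg.lstsq(X, np.log(Y), rcond=None)[0]
print(b[1], b[2])   # estimated output elasticities of labor and capital, close to 0.47 and 0.52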
EXAMPLE 7.3 Value Added, Labor Hours, and Capital Input in the Manufacturing Sector
(data and estimated regression output shown on the original slides)
More on Functional Form
The Cobb–Douglas Production Function
(estimated regression (7.9.4) shown on the original slide)
• The output elasticities of labor and capital were 0.4683 and
0.5213, respectively.
• Holding the capital input constant, a 1 percent increase in the labor input led, on average, to about a 0.47 percent increase in output.
• Similarly, holding the labor input constant, a 1 percent increase in the capital input led, on average, to about a 0.52 percent increase in output.
More on Functional Form
Polynomial Regression Models
Figure 7.1: The U-shaped marginal cost curve shows that the relationship between MC and output is nonlinear.
More on Functional Form
Polynomial Regression Models
• Geometrically, the MC curve depicted in Figure 7.1
represents a parabola. Mathematically, the parabola is
represented by the following equation:
  Y = β0 + β1X + β2X²        (7.10.1)
which is called a quadratic function.
• The general kth-degree polynomial regression may be written as
  Yi = β0 + β1Xi + β2Xi² + ··· + βkXi^k + ui        (7.10.3)
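A minimal sketch in Python (assuming numpy) of fitting a quadratic cost function; the output and cost figures are hypothetical and are not the data of Example 7.4:

# Quadratic (degree-2 polynomial) regression of total cost on output
import numpy as np

output = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)                      # hypothetical output levels
cost   = np.array([193, 226, 240, 244, 257, 260, 274, 297, 350, 420], dtype=float)   # hypothetical total cost

X = np.column_stack([np.ones(len(output)), output, output**2])
b = np.linalg.lstsq(X, cost, rcond=None)[0]
print(b)   # a positive coefficient on output**2 captures the convex (U-shaped MC) pattern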
More on Functional Form
Polynomial Regression Models
EXAMPLE 7.4 Estimating the Total Cost Function
(data and estimated regression shown on the original slide)