Study Guide for Exam #2
This Study Guide is a supplementary study tool to help you
better prepare for the exam. This should not, however, be the
only source of information that you use to study for the exam.
First and foremost, begin by reading your lecture notes, re-solving the
problems we did in class, and going over the concepts and formulas. You
should also solve as many additional problems as you can; the more
practice you get, the better.
1 List of Definitions
A binary variable is a variable that can only take two
values - 0 and 1. A binary variable is also called an
indicator variable or a dummy variable.
The error term ui is homoskedastic if the variance of
the conditional distribution of ui given Xi is constant
for i = 1, ..., n and in particular does not depend on
Xi. Otherwise, the error term is heteroskedastic.
Mathematically,

var(ui | Xi = x) = σ²u  for all i = 1, ..., n    (1)
If the regressor is correlated with a variable that has
been omitted from the analysis and that determines, in
part, the dependent variable, then the OLS estimator
will have omitted variable bias.
β̂1 → β1 + ρXu · (σu/σX)    (2)
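For a purely illustrative numerical example (these values are not from the course): if ρXu = 0.5, σu = 2, and σX = 4, then β̂1 converges to β1 + 0.5 · (2/4) = β1 + 0.25, so even in large samples OLS would overstate β1 by 0.25.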
The population regression line/function in the context
of multiple regression is given by:

Yi = β0 + β1 X1 + β2 X2 + ... + βk Xk    (3)
The coefficient β0 is the intercept; the coefficient β1 is
the slope coefficient of X1 , or the coefficient on X1 ,
and so on.
The multiple linear regression model still allows for
deviations from the population regression line due to
remaining additional factors (including chance), captured in ui :
Yi = β0 + β1 X1i + β2 X2i + ... + βk Xki + ui,  for all i = 1, ..., n    (4)
Yi is the ith observation on the dependent variable; X1i, ..., Xki
are the ith observations on each of the k regressors; and
ui is the error term. β1 is the slope coefficient on X1,
β2 is the slope coefficient on X2, and so on. The coefficient β1 is the expected difference in Y associated
with a unit difference in X1, holding constant the other
regressors, X2, ..., Xk. The intercept β0 is the expected
value of Y when all the X's equal 0.
The estimators of the coefficients β0 , β1 , ..., βk that minimize the sum of squared mistakes are called the ordinary least squares (OLS) estimators of β0 , β1 , ..., βk
and are denoted βˆ0 , βˆ1 , ..., βˆk .
The OLS regression line is the straight line constructed
using the OLS estimators: βˆ0 + βˆ1 X1i + ... + βˆk Xki .
The predicted value of Yi given X1i , ..., Xki , based on
the OLS regression line, is Ŷi = βˆ0 + βˆ1 X1i +...+ βˆk Xki .
The OLS residual for the ith observation is the difference between Yi and its OLS predicted value; that is,
the OLS residual is ûi = Yi − Ŷi .
Zero conditional mean: E(ui |X1i , ..., Xki ) = 0. This
assumption is implied if X1i , ..., Xki are randomly assigned or are as-if randomly assigned.
The regressors are said to exhibit perfect multicollinearity if one of the regressors is a perfect linear function
of the other regressors.
The dummy variable trap arises when the set of regressors includes a complete set of dummy variables (indicator variables) for all possible outcomes in addition
to estimating the intercept.
Imperfect multicollinearity means that two or more of
the regressors are highly correlated in the sense that
there is a linear function of the regressors that is highly
correlated with another regressor.
R-squared (R2 ) captures the proportion of the variation
in the dependent variable that is explained by the model
(i.e. the chosen regressors). Equivalently, the R2 is 1
minus the fraction of the variance of Yi not explained
by the regressors.
R² = ESS/TSS    (5)

R² = 1 − SSR/TSS    (6)
The adjusted R̄2 accounts for the number of regressors
and imposes a small penalty for adding regressors that
is only offset if they have actual explanatory power.
R̄² = 1 − [(n − 1)/(n − k − 1)] · (SSR/TSS) = 1 − s²û/s²Y    (7)

because

s²û = [1/(n − k − 1)] Σ_{i=1}^{n} ûi² = SSR/(n − k − 1)    (8)

and

s²Y = TSS/(n − 1)    (9)
The standard error of the regression (SER) is an estimator of the standard deviation of the regression prediction error ui . The SER measures the spread of observations around the fitted regression line, calculated
in the same units as the dependent variable.
s²û = [1/(n − k − 1)] Σ_{i=1}^{n} ûi² = SSR/(n − k − 1)    (10)

SER = sû    (11)
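The following is a minimal sketch (not part of the guide) of how these goodness-of-fit measures can be computed by hand in Python with numpy; the data and variable names are made up purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 200, 2                                    # sample size and number of regressors
    X = rng.normal(size=(n, k))
    y = 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(size=n)

    Xmat = np.column_stack([np.ones(n), X])          # design matrix with an intercept column
    beta_hat, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    u_hat = y - Xmat @ beta_hat                      # OLS residuals

    TSS = np.sum((y - y.mean()) ** 2)
    SSR = np.sum(u_hat ** 2)

    R2 = 1 - SSR / TSS                               # equation (6)
    adj_R2 = 1 - (n - 1) / (n - k - 1) * SSR / TSS   # equation (7)
    SER = np.sqrt(SSR / (n - k - 1))                 # equations (10) and (11)
    print(R2, adj_R2, SER)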
A control variable is a regressor included to hold constant factors that, if neglected, could lead to omitted
variable bias in the coefficient on the variable of interest.
The F -statistic is used to test a joint hypothesis about
regression coefficients.
In the q = 2 restriction case with H0: β1 = 0 and β2 = 0,

F = (1/2) · (t1² + t2² − 2 ρ̂_t1,t2 t1 t2) / (1 − ρ̂²_t1,t2)  ∼  Fq=2, n−k−1    (12)

where ρ̂_t1,t2 is an estimator of the correlation between the two
t-statistics.
The special homoskedasticity-only F-statistic can be expressed in terms of the improvement in the fit of the regression (e.g., as measured by the decrease in the sum of squared
residuals or the increase in R²):
F = [(R²_unrestricted − R²_restricted)/q] / [(1 − R²_unrestricted)/(n − k_unrestricted − 1)]    (13)

F = [(SSR_restricted − SSR_unrestricted)/q] / [SSR_unrestricted/(n − k_unrestricted − 1)]    (14)
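As a minimal sketch (not part of the guide), the SSR version of the homoskedasticity-only F-statistic in equation (14) can be computed in Python with numpy as follows; the data, the null hypothesis (β2 = β3 = 0), and all names are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    X1, X2, X3 = rng.normal(size=(3, n))
    y = 2.0 + 1.0 * X1 + rng.normal(size=n)               # X2 and X3 are irrelevant by construction

    def ssr(y, Xmat):
        # sum of squared OLS residuals for a given design matrix
        b, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
        return np.sum((y - Xmat @ b) ** 2)

    X_unres = np.column_stack([np.ones(n), X1, X2, X3])   # unrestricted model
    X_res = np.column_stack([np.ones(n), X1])             # restricted model under H0: β2 = β3 = 0
    q, k_unres = 2, 3                                      # number of restrictions, unrestricted regressors

    SSR_res, SSR_unres = ssr(y, X_res), ssr(y, X_unres)
    F = ((SSR_res - SSR_unres) / q) / (SSR_unres / (n - k_unres - 1))
    print(F)   # compare with the F(q, n - k - 1) critical value (about 3.0 at the 5% level)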
In large samples, p-values are computed and interpreted
analogously, except that they use the Fq,∞ distribution.
Let F^act denote the value of the F-statistic actually
computed. Because the F-statistic has a large-sample
Fq,∞ distribution under the null hypothesis, the p-value
is

p-value = Pr[Fq,∞ > F^act]    (15)

The p-value can be evaluated using a table of the Fq,∞
distribution.
A nonlinear regression function is a nonlinear function of the independent variables. The function f (X)
is linear if the slope of f (X) is the same for all values
of X, but if the slope depends on the value of X, then
f (X) is nonlinear. The nonlinear population regression models are of the form
Yi = f(X1i, X2i, ..., Xki) + ui,  i = 1, ..., n    (16)

where f(X1i, X2i, ..., Xki) is the population nonlinear
regression function, a possibly nonlinear function of
the independent variables, and ui is the error term.
The expected change in Y , ∆Y , associated with the
change in X1 , holding X2i , ..., Xki constant, is the difference between the value of the population regression
function before and after changing X1 , holding X2i , ..., Xki
constant. That is, the expected change in Y is the difference:
∆Y = f(X1 + ∆X1, X2, ..., Xk) − f(X1, X2, ..., Xk)    (17)
The estimator of this unknown population difference is
the difference between the predicted values of these two
cases. Let fˆ(X1i , X2i , ..., Xki ) be the predicted value of
Y based on the estimator fˆ of the population regression
function. Then the predicted change in Y is
∆Ŷ = f̂(X1 + ∆X1, X2, ..., Xk) − f̂(X1, X2, ..., Xk)    (18)

Let r denote the highest power of X that is included
in the regression. The polynomial regression model of
degree r is

Yi = β0 + β1 Xi + β2 Xi² + ... + βr Xi^r + ui    (19)
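A minimal sketch (not part of the guide) of fitting a quadratic (r = 2) polynomial regression by OLS in Python and then using equation (18) to predict the change in Y; the data and the evaluation points are made up for illustration.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 300
    x = rng.uniform(0, 10, size=n)
    y = 1.0 + 0.8 * x - 0.05 * x**2 + rng.normal(size=n)

    Xmat = np.column_stack([np.ones(n), x, x**2])    # regressors: 1, X, X^2
    b, *_ = np.linalg.lstsq(Xmat, y, rcond=None)

    def f_hat(x0):
        # predicted Y at X = x0 from the fitted quadratic
        return b[0] + b[1] * x0 + b[2] * x0**2

    print(f_hat(5.0) - f_hat(4.0))   # predicted change in Y when X goes from 4 to 5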
When ∆x is small, the difference between the logarithm of x + ∆x and
the logarithm of x is approximately ∆x/x, the percentage change in x divided by 100: ln(x + ∆x) − ln(x) ≈ ∆x/x.
When Y is not in logs, but X is, this is sometimes
referred to as a linear-log model.
Yi = β0 + β1 ln(Xi) + ui    (20)
When Y is in logarithms, but X is not, this is referred
to as a log-linear model.
ln(Yi) = β0 + β1 Xi + ui    (21)
When both X and Y are specified in logarithms, this
is referred to as a log-log model.
ln(Yi) = β0 + β1 ln(Xi) + ui    (22)
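A quick interpretation guide that follows from the logarithm approximation above (standard results, stated here for review): in the linear-log model (20), a 1% change in X is associated with a change in Y of about 0.01 β1; in the log-linear model (21), a one-unit change in X is associated with roughly a 100 β1 percent change in Y; and in the log-log model (22), a 1% change in X is associated with roughly a β1 percent change in Y, so β1 is an elasticity.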
We can modify the multiple regression model by introducing the product of the two binary variables as
another regressor.
Yi = β0 + β1 D1i + β2 D2i + β3 (D1i × D2i) + ui    (23)
The product D1i × D2i is called an interaction term or
an interacted regressor, and the population regression
model is called a binary variable regression model.
We can modify the multiple regression model by introducing the product of a binary variable and a continuous variable as another regressor.
Yi = β0 + β1 Xi + β2 Di + β3 (Xi × Di) + ui    (24)
The product Xi × Di is called an interaction term or
an interacted regressor, and the population regression
model above illustrates the possibility of an interaction
between a continuous variable and a binary variable.
We can modify the multiple regression model by introducing the product of the two continuous variables as
another regressor.
Yi = β0 + β1 X1i + β2 X2i + β3 (X1i × X2i) + ui    (25)
The product X1i × X2i is called an interaction term
or an interacted regressor, and the population regression model above illustrates the possibility of an interaction between two continuous variables. The interaction term allows the effect of a unit change in X1 to
depend on X2 .
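Concretely, in model (25) the expected effect of a unit change in X1, holding X2 fixed, is

∆Y/∆X1 = β1 + β3 X2

so the marginal effect of X1 varies with the level of X2 (and, symmetrically, the effect of a unit change in X2 is β2 + β3 X1).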
The chi-squared distribution with m degrees of freedom (χ²m) is the
distribution of the sum of m squared independent standard normal random variables.

The F distribution (Fm,n) is the distribution of the ratio of two independently
distributed chi-squared random variables, each divided by its
respective degrees of freedom.
If W1 ∼ χ²m, W2 ∼ χ²n, and

Pr(W1 = w1 | W2 = w2) = Pr(W1 = w1)    (26)

then

(W1/m) / (W2/n) ∼ Fm,n    (27)
When the denominator degrees of freedom is large enough,
the Fm,n distribution can be approximated by the Fm,∞
distribution. The Fm,∞ distribution is the distribution
of a chi-squared random variable, W , with m degrees
of freedom divided by m: W/m is distributed Fm,∞ .
2 List of Key Concepts and Applications

2.1 Statistical Inference in Multiple Regression
• To test the hypothesis H0: βj = βj,0 against the alternative H1: βj ≠ βj,0, we have to:
1. Compute the standard error of β̂j, SE(β̂j).
2. Compute the t-statistic:

t^act = (β̂j − βj,0) / SE(β̂j) ≈ N(0, 1)    (28)
3. Compute the p-value.
Specifically, we reject the null (H0: βj = βj,0) at the 5%
significance level whenever any of the following equivalent conditions holds:
1. p-value = 2Φ(−|t^act|) ≤ 0.05;
2. |t^act| ≥ 1.96;
3. βj,0 falls outside the 95% confidence interval
[β̂j − 1.96 SE(β̂j), β̂j + 1.96 SE(β̂j)].
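A minimal numerical sketch (not part of the guide) of these rejection rules in Python; beta_hat, se, and beta_0 are hypothetical numbers chosen only for illustration.

    from scipy.stats import norm

    beta_hat, se = 0.85, 0.30          # hypothetical OLS estimate and its standard error
    beta_0 = 0.0                       # hypothesized value under H0

    t_act = (beta_hat - beta_0) / se                    # equation (28)
    p_value = 2 * norm.cdf(-abs(t_act))                 # rule 1
    ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)   # 95% confidence interval

    # The three rules agree: p_value <= 0.05, |t_act| >= 1.96, and beta_0 outside ci.
    print(t_act, p_value, ci, p_value <= 0.05)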
2.2 Testing Joint Hypotheses

– E.g., H0: β1 = 0 and β2 = 0 vs. H1: β1 ≠ 0 and/or β2 ≠ 0.
– E.g., H0: β1 = β2 vs. H1: β1 ≠ β2.
2.3 Testing Multiple Restrictions Involving Single Coefficients
• To test the hypothesis that
H0 : βj = βj,0 , βm = βm,0 , ...
against the alternative
H1 : one or more of the q restrictions does not hold,
we have to compute the F-statistic and its p-value. In the q = 2 restriction case with H0: β1 = 0 and β2 = 0,

F = (1/2) · (t1² + t2² − 2 ρ̂_t1,t2 t1 t2) / (1 − ρ̂²_t1,t2)  ∼  Fq=2, n−k−1    (29)

where ρ̂_t1,t2 is an estimator of the correlation between the two t-statistics. Because the F-statistic has a large-sample Fq,∞
distribution under the null hypothesis, the p-value is

p-value = Pr[Fq,∞ > F^act]    (30)
If the error term is homoskedastic, the F -statistic takes the
following form
F = [(R²_unrestricted − R²_restricted)/q] / [(1 − R²_unrestricted)/(n − k_unrestricted − 1)]    (31)

F = [(SSR_restricted − SSR_unrestricted)/q] / [SSR_unrestricted/(n − k_unrestricted − 1)]    (32)
2.4 Testing Single Restrictions Involving Multiple Coefficients

H0: β1 = β2    (33)
vs.
H1: β1 ≠ β2    (34)
– Test the restriction directly.
– Transform the model and then test the restriction.
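For the second approach, one standard transformation (sketched here) is to define γ = β1 − β2 and substitute β1 = γ + β2, so that

Yi = β0 + γ X1i + β2 (X1i + X2i) + ... + ui

Then H0: β1 = β2 is equivalent to H0: γ = 0, which can be tested with an ordinary t-test on the coefficient of X1i in the regression of Y on X1 and the constructed regressor X1 + X2.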
2.5 A General Approach to Modeling Nonlinearities Using Multiple Regression
1. Identify a possible nonlinear relationship.
2. Specify a nonlinear function, and estimate its parameters by OLS.
3. Determine whether the nonlinear model improves upon
a linear model.
4. Plot the estimated nonlinear regression function.
5. Estimate the effect on Y of a change in X.
• You should be able to correctly interpret regression results
from STATA.
• You should be able to calculate, if need be, and interpret
measures of the goodness of fit of a given regression model
(e.g., SER, R2 , R̄2 ).
• You should be able to 1) detect the presence of heteroskedasticity in the data and 2) propose modifications to standard regression methods that accommodate heteroskedastic errors (a minimal sketch appears after this list).
• You should be able to propose nonlinear regression models
to improve upon linear regression models and to interpret
the estimated coefficients in the context of nonlinear regression models.
• You should be able to provide policy recommendations based
on empirical tests.
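For the heteroskedasticity bullet above, here is a minimal sketch (not part of the guide) of computing heteroskedasticity-robust (HC1) standard errors in Python with statsmodels; the data-generating process and all names are made up for illustration.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 400
    x = rng.uniform(0, 5, size=n)
    u = rng.normal(scale=1 + x, size=n)     # error variance increases with x: heteroskedastic
    y = 1.0 + 2.0 * x + u

    X = sm.add_constant(x)
    fit_default = sm.OLS(y, X).fit()                 # homoskedasticity-only standard errors
    fit_robust = sm.OLS(y, X).fit(cov_type="HC1")    # heteroskedasticity-robust standard errors
    print(fit_default.bse)
    print(fit_robust.bse)

Plotting the residuals against a regressor (or the fitted values) is a simple first check for heteroskedasticity before switching to robust standard errors.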