Applied Econometrics
Week 1-2: Ordinary Least Squares
Michal Rubaszek
Readings
R.C. Hill, W.E. Griffiths, G.C. Lim, 2012. Principles of Econometrics, John Wiley & Sons,
Inc., Chapters 1,2,3,4,5
1 Introduction to Econometrics

1.1 Definition of econometrics
Econometrics: application of mathematical and statistical techniques to economics in the study of
problems, the analysis of data, and the development and testing of theories and models.
1.2 Econometric model
The general specification of a (single-equation) econometric model is:
yt = f(xt, α, εt) for t = 1, 2, . . . , T    (1)
where:
yt - a dependent variable;
xt - a vector of K independent variables (constant included);
α - a vector of model parameters (that are to be estimated);
εt - error term (stochastic part of the model);
t - moment of observation
For linear models the notation translates into:
yt = α1 x1t + α2 x2t + . . . + αK xKt + εt    (2)
Note. In the above notation we implicitly add “for t = 1, 2, . . . , T ”
Remark. The econometric model differs from an economic model in two dimensions. First, variables
in the former are indexed by t. Second, there is a stochastic part represented by the error term in
the econometric model.
1
Example 1. Our focus is to find the marginal propensity to consume (MPC) out of disposable
income. The economic model is:
C = α + βY,
where C and Y are the levels of consumption and disposable income, respectively, and the parameter
β describes the MPC. The corresponding econometric model is:
Ct = α + βYt + εt
where parameters α and β are to be estimated on the basis of T empirical observations for Ct and
Yt .
1.3 Types of data
To estimate the parameters of an econometric model we need empirical observations, which come in several types. The first classification is based on the source of the data:
• macroeconomic data (macroeconometrics)
• microeconomic data (microeconometrics)
• financial data (financial econometrics)
• experimental data (experimental econometrics)
The second classification is based on the type of data sample:
• time series (collected over discrete intervals of time: yt , t = 1, 2, ..., T )
• cross-section data (collected across sample units in a particular time period: yi , i = 1, 2, ..., N )
• panel or longitudinal data (observations on many individual units over time: yit , i = 1, 2, ..., N
and t = 1, 2, ..., T )
2 Stages of building an econometric model
The process of constructing an econometric model consists of six stages:
1. Setting up a research hypothesis
2. Choosing a functional form and the set of explanatory variables
3. Collecting the data
4. Estimating the model
5. Verification process
6. Application
Example 2. We analyze the relationship between unemployment rate and inflation in Poland (file
example1a.wf1).
Research hypothesis: To estimate the slope of the Phillips curve in Poland
Model specification: πt = α1 + α2 ut + εt
Data: Quarterly data for inflation and the unemployment rate from the period 1995-2011 (source: CSO)
Estimation: Ordinary Least Squares (OLS)
Verification: Determination coefficient R², test whether α2 < 0
Application: Verification of the theoretical model

3 Ordinary Least Squares estimator
Let us write down the linear econometric model given by (2) in a shorter form:
yt = α′xt + εt,    (3)
where α = [α1 α2 . . . αK]′ is the vector of parameters and xt = [x1t x2t . . . xKt]′ is the vector of explanatory variables. We can observe the values of yt and xt, but we don't know the values of α. We need to estimate the parameters. There are many methods to do so, of which the most popular is Ordinary Least Squares (OLS).
Let us denote by α̂ the estimate of vector α. It could be any vector of size K, even containing
the most unreasonable values. For such a vector we can calculate:
Fitted values: ŷt = α̂′xt, t = 1, 2, . . . , T
Residuals: et = yt − ŷt, t = 1, 2, . . . , T
Sum of squared residuals: SSE(α̂) = ∑_{t=1}^{T} e²t
Notice that the value of SSE depends on our choice of α̂. This means that we can find α̂ such that SSE(α̂) is minimal, i.e.:
∀a∈ℝᴷ SSE(α̂) ≤ SSE(a)    (4)
This value is called the OLS estimate.
To find this value we need to solve the optimization problem (i.e. find the value of α̂ for which the first derivative of SSE is null). The solution is the formula for the OLS estimator:
α̂ = (∑_{t=1}^{T} xt x′t)⁻¹ (∑_{t=1}^{T} xt yt).    (5)
Remark. The OLS estimator is a general formula and is a random variable. The properties of the
estimator depend on the structure of the model (described by assumptions). OLS estimates are
numbers that we obtain by applying the general formula to the observed data. This distinction is fundamental to understanding econometric inference.
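As a quick sketch, formula (5) can be implemented directly with matrix algebra. The simulated data, seed, and "true" parameter values below are illustrative assumptions, not taken from the course files:

```python
import numpy as np

# Sketch of the OLS estimator (5): alpha_hat = (sum_t x_t x_t')^{-1} (sum_t x_t y_t),
# i.e. (X'X)^{-1} X'y in matrix form. Data are simulated for illustration only.
rng = np.random.default_rng(1)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])  # constant + one regressor
alpha_true = np.array([10.0, 5.0])                     # assumed "true" parameters
y = X @ alpha_true + rng.normal(scale=2.0, size=T)     # error term eps_t ~ N(0, 2^2)

# Solve (X'X) alpha_hat = X'y instead of inverting X'X explicitly (more stable)
alpha_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(alpha_hat)  # close to [10, 5]
```

Solving the normal equations with `np.linalg.solve` rather than forming the inverse is the standard numerically stable way to evaluate (5).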
Example 3. The application of (5) to the Phillips curve model leads to the following estimates
(example1a.wf1):
π̂t = 11.1 − 0.28ut ,
which means that, ceteris paribus, an increase in the unemployment rate by 1 percentage point leads to a decrease in the annual CPI inflation rate of 0.28 percentage points.
4 Assumptions of the linear regression model
To perform statistical inference for the OLS estimator given by (5) it is essential to make assumptions
about the underlying model. The standard set of assumptions is as follows:
A1 For each t the expected value of yt given xt is:
E(yt|xt) = E(α′xt + εt|xt) = α′xt + E(εt|xt) = α′xt
A2 For each t and xt the variance of yt is:
var(yt|xt) = σ²
A3 For each s and t, such that s ≠ t, the covariance of the error term is null:
cov(εs, εt) = 0
A4 The values of xkt for k = 1, 2, . . . , K are not random and are not linear functions of other explanatory variables
A5 (Additional assumption) The random term is normally distributed:
εt ∼ N(0, σ²)
Note. A1 indicates that the error term has a probability distribution with zero mean. A2 and A3 indicate that the error term is homoscedastic and not autocorrelated. A4 indicates that the values of the explanatory variables are known (are not stochastic) and that there is no exact collinearity.
Gauss-Markov Theorem
Under assumptions A1-A4 the OLS estimator is:
Unbiased: E(α̂) = α
Consistent: lim_{T→∞} α̂T = α
Efficient: var(α̂) is the lowest in the class of linear and unbiased estimators

5 Interval estimation
Since the OLS estimator α̂ is a random variable, it has a distribution. To illustrate this, let us consider the following example.
Example 4. Let's generate a sample of 100 observations from the true model yt = 10 + 5xt + εt, where εt ∼ N(0, 2²). The data are available in the example1b.wf file. Subsequently, let's divide the sample into five equal subsamples of 20 observations and calculate OLS estimates for each subsample. The results, which are presented below, show that the estimates vary across the subsamples and are never exactly equal to the true values of the parameters.

subsample    a1       a2
1            9.21     5.27
2            10.97    5.75
3            9.70     4.57
4            9.96     5.51
5            10.41    6.14
true value   10       5
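The subsample experiment can be replicated in a few lines. The seed and simulated series below are assumptions for illustration, so the numbers will differ from the table above, but the qualitative point is the same: the five estimates scatter around the true values without ever hitting them exactly.

```python
import numpy as np

# In the spirit of Example 4: one sample of 100 observations from
# y_t = 10 + 5 x_t + eps_t, eps_t ~ N(0, 2^2), split into five subsamples of 20.
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 10 + 5 * x + rng.normal(scale=2.0, size=100)

estimates = []
for i in range(5):
    xs, ys = x[20 * i:20 * (i + 1)], y[20 * i:20 * (i + 1)]
    Xs = np.column_stack([np.ones(20), xs])       # constant + regressor
    a1, a2 = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
    estimates.append((a1, a2))
    print(f"subsample {i + 1}: a1 = {a1:.2f}, a2 = {a2:.2f}")
```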
The interpretation of the above results is that for each subsample the OLS estimate is a single draw
from the distribution for the OLS estimator. Under the assumptions A1-A5 this distribution is:
α̂ ∼ N(α, Σ)
where:
Σ = σ²(∑_{t=1}^{T} xt x′t)⁻¹    (6)
The diagonal elements of the covariance matrix Σ, which we denote by σ²αk, stand for the variances of the individual parameters: var(α̂k) = σ²αk. The distribution of the OLS estimator for an individual parameter is:
α̂k ∼ N(αk, σ²αk).
Now we can compute a range of values in which the true parameter αk is likely to fall (a confidence interval or interval estimate):
Prob(α̂k − 1.96σαk ≤ αk ≤ α̂k + 1.96σαk) = 0.95.
In practice we cannot use the above formula for inference because we don’t know the variance of
the error term σ 2 . We need to substitute σ 2 with its unbiased estimate:
σ̂² = ∑_{t=1}^{T} e²t / (T − K)    (7)
and the covariance matrix Σ with its estimator:
Σ̂ = σ̂²(∑_{t=1}^{T} xt x′t)⁻¹.    (8)
The diagonal elements of the matrix Σ̂, which we denote by S²k, stand for the estimators of the variances of the individual parameters: S²k = σ̂²αk. The substitution of σαk with Sk changes the distribution from normal to a t-distribution with ν = T − K degrees of freedom, so that:
(α̂k − αk)/Sk ∼ tν    (9)
and the interval estimate changes into:
Prob(α̂k − t*ν Sk ≤ αk ≤ α̂k + t*ν Sk) = 0.95,
where t∗ν is the critical value of the tν distribution for the 95% interval.
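A sketch of formulas (7)-(9) on simulated data. The true model, seed, and sample size are illustrative assumptions, and the critical value 1.98 is an approximation of t*ν for ν = 98 taken from t-tables:

```python
import numpy as np

# Interval estimation per (7)-(9): residual variance, standard errors,
# and a 95% confidence interval. Simulated data; the true model is assumed.
rng = np.random.default_rng(0)
T, K = 100, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([10.0, 5.0]) + rng.normal(scale=2.0, size=T)

alpha_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ alpha_hat                             # residuals e_t
sigma2_hat = e @ e / (T - K)                      # unbiased estimate (7)
Sigma_hat = sigma2_hat * np.linalg.inv(X.T @ X)   # covariance matrix (8)
S = np.sqrt(np.diag(Sigma_hat))                   # standard errors S_k

t_crit = 1.98  # approximate critical value of t with nu = T - K = 98 d.o.f.
for k in range(K):
    lo, hi = alpha_hat[k] - t_crit * S[k], alpha_hat[k] + t_crit * S[k]
    print(f"alpha_{k + 1}: {alpha_hat[k]:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```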
Example 5. For the Phillips curve model (example1a.wf1) the values of Sk are 3.10 and 0.22, which we write down as:
π̂t = 11.1 − 0.28 ut
     (3.10)  (0.22)
Since T = 68, then ν = 66, t*ν = 1.995 and:
Prob(4.89 ≤ α1 ≤ 17.30) = 0.95
Prob(−0.72 ≤ α2 ≤ 0.10) = 0.95
6 Hypothesis tests
Tests of hypotheses about parameter values compare a conjecture we have about the population to
the information contained in a sample of data. The test of a hypothesis consists of the following
stages:
1. Setting a null H0 and alternative H1 hypotheses
2. Computing a test statistic
3. Determining a rejection region
4. Comparing the test statistic to the rejection region
For individual parameters of the linear model the set of hypotheses is:
H0: αk = c
H1: αk ≠ c    (10)
Assuming the null is true we can substitute αk in (9) with c, hence the test statistic:
tαk = (α̂k − c)/Sk ∼ tν.    (11)
The rejection region depends on a probability γ, called the significance level of the test (usually
γ = 5%). For a given γ we need to find the critical value t∗ν,γ of the t-distribution, which determines
the rejection region. Our decision is as follows:
if |tαk | ≥ t∗ν,γ we reject the null
if |tαk | < t∗ν,γ we don’t reject the null
Remark. The significance level γ describes the probability of Type I error, i.e. rejecting the null
when it is true. In practice there is also Type II error, i.e. not rejecting the null when it is false.
We cannot control for Type II error since its probability depends on the unknown value of αk .
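The decision rule can be sketched with the rounded estimates and standard errors reported in Example 5 (11.1 with S = 3.10, −0.28 with S = 0.22); small differences from the quoted t statistics are due to rounding:

```python
# t-test of H0: alpha_k = 0 against H1: alpha_k != 0, using the rounded
# estimates and standard errors from the Phillips curve example above.
estimates = {"alpha_1": (11.1, 3.10), "alpha_2": (-0.28, 0.22)}
t_crit = 1.995  # critical value for nu = 66 at the 5% significance level

for name, (a_hat, s_k) in estimates.items():
    t_stat = (a_hat - 0.0) / s_k            # test statistic (11) with c = 0
    decision = "reject H0" if abs(t_stat) >= t_crit else "do not reject H0"
    print(f"{name}: t = {t_stat:.2f} -> {decision}")
```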
The p-value
When reporting the outcome of statistical hypothesis tests it has become standard practice to report
the p-value (probability value) of the test. We can compare this p-value to the chosen significance
value γ. Our decision is as follows:
if p ≤ γ we reject the null
if p > γ we don’t reject the null
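For a two-sided test the p-value is Prob(|tν| ≥ |tαk|), i.e. twice the upper-tail probability of the t-distribution. A minimal sketch, assuming SciPy is available for the tail probability:

```python
from scipy import stats

# Two-sided p-value for a t statistic with nu degrees of freedom:
# p = Prob(|t_nu| >= |t_stat|) = 2 * upper-tail probability.
def p_value(t_stat, nu):
    return 2 * stats.t.sf(abs(t_stat), df=nu)

# t statistics from the Phillips curve example, nu = 66
print(p_value(3.57, 66))   # well below 0.05 -> reject the null
print(p_value(-1.28, 66))  # above 0.05 -> do not reject the null
```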
Example 6. For the Phillips curve example the values of the t statistics are tα1 = 3.57 and tα2 = −1.28. Given that the critical value for the 5% significance level is t* = 1.995, we can reject the null H0: α1 = 0, but cannot reject the null H0: α2 = 0. This is confirmed by the probability values: pα1 = 0.0007 < 0.05 and pα2 = 0.2051 > 0.05.
7 Tests for joint hypotheses
A null hypothesis of multiple restrictions on the parameters, which is called a joint hypothesis, can be tested with one of three tests: the F-test, the Lagrange Multiplier (LM) test and the likelihood ratio test. Here we will discuss the first two.
The general form of M linear restrictions on the parameters of a regression can be written as:
H0: Rα = r
H1: Rα ≠ r    (12)
where R is an M × K matrix and r an M × 1 vector. If the null is true then the fit of the restricted model shouldn't be significantly worse than the fit of the unrestricted model. We can test this by comparing the sum of squared residuals SSE = ∑_{t=1}^{T} e²t of both models. If the null is true, the F-test statistic:
F = [(SSER − SSEU)/M] / [SSEU/(T − K)]    (13)
has an F distribution with v1 = M and v2 = T − K degrees of freedom. Subscripts R and U denote the restricted and unrestricted models, respectively.
Remark. Since the fit of the restricted model cannot be better than the fit of the unrestricted model, the inequality SSER − SSEU ≥ 0 always holds.
In the case of the LM test, under the null the statistic:
LM = (SSER − SSEU)/σ̂² = M × F    (14)
has a χ² distribution with v = M degrees of freedom (on the correspondence between the two tests see Appendix 6A in the "Principles of Econometrics").
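A sketch of (13)-(14) on simulated data: fit the unrestricted and restricted models, compare their SSEs, and check the LM = M × F identity. The data and the particular restrictions tested are illustrative assumptions.

```python
import numpy as np

# F-test (13) and LM-test (14) for M = 2 restrictions (alpha_2 = alpha_3 = 0)
# on simulated data; the true model here is assumed for illustration.
rng = np.random.default_rng(7)
T, K, M = 150, 3, 2
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.8, -0.5]) + rng.normal(size=T)

def sse(Xm, ym):
    """Sum of squared residuals of an OLS fit of ym on Xm."""
    b = np.linalg.solve(Xm.T @ Xm, Xm.T @ ym)
    e = ym - Xm @ b
    return e @ e

sse_u = sse(X, y)                 # unrestricted: all K regressors
sse_r = sse(X[:, :1], y)          # restricted: constant only
F = ((sse_r - sse_u) / M) / (sse_u / (T - K))   # formula (13)
sigma2_hat = sse_u / (T - K)
LM = (sse_r - sse_u) / sigma2_hat               # formula (14)
print(F, LM)                      # LM equals M * F by construction
```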
Example 7. We analyze the relationship between the interest rate (it ), year-on-year inflation (πt )
and year-on-year GDP growth rate (yt ). The quarterly data, which are taken from the MPdata.wf1
file, cover the period 1974-2011 and relate to the U.S. economy. The results of OLS estimation are:
ît = 1.41 + 0.97 πt + 0.26 yt
    (0.46)  (0.07)   (0.09)
We want to verify the null that the coefficients α2 and α3, which describe the impact of inflation and output, are 1.5 and 0.5, respectively. In terms of (12) we can write these hypotheses as:
H0: [0 1 0; 0 0 1] α = [1.5; 0.5]
H1: [0 1 0; 0 0 1] α ≠ [1.5; 0.5]
The test statistics: F (2, 149) = 135.6 (p = 0.000) and LM (2) = 271.2 (p = 0.000) indicate that we
should reject the null.
Finally, it should be noticed that joint hypothesis tests are usually applied to test for the overall significance of the regression model. In this case, for a linear model with a constant:
yt = α1 + α2 x2t + . . . + αK xKt + εt
the form of the hypotheses given by (12) is:
H0: α2 = 0 ∧ α3 = 0 ∧ . . . ∧ αK = 0    (15)
H1: α2 ≠ 0 ∨ α3 ≠ 0 ∨ . . . ∨ αK ≠ 0.
Given that under the null the model shrinks to yt = α1 + εt, we get SSER = ∑_{t=1}^{T}(yt − ȳ)². As a result, the value of the F test statistic is:
F = (R²/M) / ((1 − R²)/(T − K))    (16)
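Under stated assumptions (simulated data, overall-significance test with M = K − 1 restrictions), the SSE form (13) and the R² form (16) of the F statistic give the same number, which the following sketch verifies:

```python
import numpy as np

# Check that (16) matches (13) for the overall-significance test, where the
# restricted model is y_t = alpha_1 + eps_t. Simulated data; illustration only.
rng = np.random.default_rng(3)
T, K = 120, 3
M = K - 1                                  # restrictions: alpha_2 = ... = alpha_K = 0
X = np.column_stack([np.ones(T), rng.normal(size=(T, K - 1))])
y = X @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=T)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
sse_u = e @ e                              # SSE of the unrestricted model
sse_r = np.sum((y - y.mean()) ** 2)        # SSE of the constant-only model
R2 = 1 - sse_u / sse_r                     # determination coefficient

F_sse = ((sse_r - sse_u) / M) / (sse_u / (T - K))   # formula (13)
F_r2 = (R2 / M) / ((1 - R2) / (T - K))              # formula (16)
print(F_sse, F_r2)
```

The equivalence follows because R² = 1 − SSEU/SSER when the restricted model contains only the constant.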
8 Recommended exercises from Principles of Econometrics
Exercise 1. Solve exercises for models with one explanatory variable: 2.10 (CAPM), 2.12 (House
price), 2.15 (wage vs education)
Exercise 2. Solve exercises for hypotheses testing: 3.7 (CAPM), 3.8 (House price), 3.12 (wage vs
education)
Exercise 3. Solve exercises for models with many explanatory variables: 5.13 (House price), 5.19
(wage vs education), 5.25 (production function)
Homework 1. Read the Probability Primer, chapter 1, Principles of Econometrics