ECONG020: Econometrics
Overview of Prerequisites
Martin Weidner
UCL
Matrix Algebra
You are expected to be familiar with the following concepts:
- row vector v, column vector w, matrix A
- transpose v', w', A'
- basic operations (+, −, ×)
- inverse matrix: A^{-1}
- rank of a matrix: rank(A)
- positive (semi-)definite matrix: A > 0, A ≥ 0
- trace and determinant of a matrix: tr(A), det(A)
You can study this from any linear algebra textbook. There will
also be a math-stats pre-course before the fall term starts, which
covers some matrix algebra, probability theory and basic
asymptotic theory (see the requirements below), but it may be useful
to start studying this over the summer already. (A short numpy
sketch of these matrix operations follows below.)
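The following is a minimal numpy sketch of these operations; the example matrix A and the vectors are arbitrary illustrative choices.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])               # a symmetric 2x2 example matrix
v = np.array([[1.0, 2.0]])               # row vector (1 x 2)
w = np.array([[1.0], [2.0]])             # column vector (2 x 1)

print(A.T, v.T, w.T)                     # transposes A', v', w'
print(A @ w - w, v @ A)                  # basic operations: products, differences
print(np.linalg.inv(A))                  # inverse matrix A^{-1}
print(np.linalg.matrix_rank(A))          # rank(A) = 2 (full rank)
print(np.trace(A), np.linalg.det(A))     # tr(A) = 5, det(A) = 5
print(np.all(np.linalg.eigvalsh(A) > 0)) # True: all eigenvalues > 0, so A > 0
```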
Probability Theory
- random variable, random vector, random matrix
- probability density function (pdf), cumulative distribution function (cdf)
- expected value E, variance Var
- normal distribution N(µ, σ²), properties
- multivariate normal distribution N(µ, Σ)
- conditional distribution, conditional expectation
- independent and identically distributed = iid
- convergence in probability →p
- weak law of large numbers
- convergence in distribution ⇒
- central limit theorem
- continuous mapping theorem, Slutsky's theorem, delta method
Reference: e.g. “Statistical Inference” by Casella and Berger
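A short sketch of some of these objects with numpy and scipy.stats; the particular values of µ, σ and Σ below are arbitrary and only for illustration.

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 2.0
print(stats.norm(mu, sigma).pdf(0.0))    # pdf of N(mu, sigma^2) at 0 (scale = sigma, not sigma^2)
print(stats.norm(mu, sigma).cdf(mu))     # cdf at the mean = 0.5

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
x = rng.multivariate_normal(mean=[0.0, 1.0], cov=Sigma, size=10_000)  # iid draws from N((0,1)', Sigma)
print(x.mean(axis=0))                    # sample mean vector, close to (0, 1)
print(np.cov(x, rowvar=False))           # sample covariance matrix, close to Sigma
```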
Basic Concepts in Estimation and Statistical Inference
- Estimator
- Unbiased
- Consistent
- Asymptotic Normality
- Confidence Intervals
- Hypothesis Testing
- etc.
Literature:
- e.g. Casella and Berger, Statistical Inference
- many econometrics textbooks (e.g. Greene, Econometric Analysis) have a mathematical appendix covering matrix algebra, statistical inference, etc.
Example: Inference on the Mean of a Sample
- Example to illustrate terms and concepts in probability theory and statistical inference.
- Observe data (sample): y_1, y_2, ..., y_n
- We model y_i (i = 1, ..., n) as random variables, and we think of the observations as one concrete realization of these random variables.
- We assume that y_i and y_j are independent and identically distributed for i ≠ j. The distribution of y_i is called the population.
- Question: from the sample, what can we learn about the population mean E(y_i)?
Example: Inference on the Mean of a Sample (cont.)
- For example, we may be interested in the average height of students at UCL. The distribution of student heights at UCL is the population. We could sample all students at UCL to get E(y_i) exactly. However, given limited time and resources, we decide to only take a random sample of n = 200 students and measure their heights. From this sample, what do we learn about E(y_i)?
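As a rough simulation of this example: the population values below (heights distributed N(170, 10²) in cm) are hypothetical and only serve to illustrate how much a sample of n = 200 reveals about E(y_i).

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical population: student heights ~ N(170, 10^2) in cm.
# In practice the researcher does not know the true mean 170.
n = 200
y = rng.normal(loc=170.0, scale=10.0, size=n)  # the observed random sample
beta_hat = y.mean()                            # sample mean, our estimate of E(y_i)
print(beta_hat)        # typically within about 2*10/sqrt(200) ~ 1.4 cm of 170
```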
Unbiasedness
- Notation: y_i = β + u_i, where β = E(y_i) is the parameter of interest and u_i are iid random shocks with E(u_i) = 0.
- Estimator (sample mean): β̂ = (1/n) Σ_{i=1}^n y_i (this is a random variable).
- We have: β̂ = β + (1/n) Σ_{i=1}^n u_i.
- Since we "assume" E(u_i) = 0 (you can either view this as an assumption, or as a result of the definition β = E(y_i)), we find that the estimator is unbiased, i.e. E(β̂) = β.
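Unbiasedness (E(β̂) = β) can be illustrated with a small Monte Carlo sketch; the values β = 1, σ = 2, n = 50 and the number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
beta, sigma, n, reps = 1.0, 2.0, 50, 20_000

# Draw many independent samples and compute beta_hat for each one;
# the average of beta_hat across replications should be close to beta.
y = beta + rng.normal(0.0, sigma, size=(reps, n))  # reps samples of size n
beta_hat = y.mean(axis=1)
print(beta_hat.mean())     # ~ 1.0, illustrating E(beta_hat) = beta
```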
Consistency
- Since we assume that the y_i are iid and (implicitly) that E(y_i) exists, we can conclude that β̂ is consistent, i.e. β̂ →p β as n → ∞.
- What does this mean? As samples get larger and larger (n → ∞) we have a different β̂ = β̂_n for each sample size, i.e. a sequence of random variables indexed by n. Convergence in probability of β̂_n to β is defined by

  ∀ ε > 0:  lim_{n→∞} P(|β̂_n − β| < ε) = 1,

  i.e. as n becomes large "the probability that β̂_n is arbitrarily close to β converges to one". (In this definition β could itself be a random variable; in our case it is just a number.)
- Why is this true? Weak law of large numbers (WLLN).
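The definition can be checked by simulation: estimate P(|β̂_n − β| < ε) for growing n and watch it approach one. The values β = 1, σ = 2 and ε = 0.1 below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
beta, sigma, eps, reps = 1.0, 2.0, 0.1, 1_000

# For each sample size, estimate P(|beta_hat_n - beta| < eps) from reps samples.
for n in [10, 100, 1_000, 10_000]:
    y = beta + rng.normal(0.0, sigma, size=(reps, n))
    beta_hat = y.mean(axis=1)
    print(n, np.mean(np.abs(beta_hat - beta) < eps))   # increases towards 1
```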
Weak Law of Large Numbers
Theorem (WLLN)
Let X_1, X_2, X_3, ... be a sequence of iid random variables, and assume that E|X_i| < ∞. Then we have (1/n) Σ_{i=1}^n X_i →p E(X_i).

Comment: Often it is assumed that E(X_i²) < ∞ (because then the proof is very easy: just apply the Chebyshev inequality P(|Z| ≥ ε) ≤ E(Z²)/ε² to Z = (1/n) Σ_{i=1}^n X_i − E(X_i)), but the weaker condition E|X_i| < ∞ is also sufficient. We write E|X_i| < ∞, which just means that E|X_i| exists.
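To illustrate the comment: a t-distribution with 1.5 degrees of freedom has E|X_i| < ∞ but E(X_i²) = ∞, so the Chebyshev argument is unavailable, yet the sample mean still converges (slowly) to E(X_i) = 0. The distribution choice here is just one convenient example.

```python
import numpy as np

rng = np.random.default_rng(4)

# Student-t with 1.5 degrees of freedom: finite mean (= 0), infinite variance.
# The WLLN still applies, although convergence is slower than in the finite-variance case.
for n in [100, 10_000, 1_000_000]:
    x = rng.standard_t(df=1.5, size=n)
    print(n, x.mean())     # drifts towards 0 as n grows
```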
(Asymptotic) Normality of β̂
Finite Sample Normality:
- Additional assumption: u_i ∼ N(0, σ²).
- Then: β̂ ∼ N(β, σ²/n).  (show!)

Asymptotic Normality:
- Additional assumption: E(u_i²) = σ².
- Then: Var(β̂_n) = σ²/n.  (show!)
- "Natural" rescaling of β̂: √n (β̂ − β).
- The Central Limit Theorem (CLT) implies that as n → ∞ we have

  √n (β̂ − β) ⇒ N(0, σ²),

  where "⇒" refers to convergence in distribution.
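A simulation sketch of the asymptotic normality result: the shocks below are centred exponential (not normal) with E(u_i²) = σ², and the rescaled estimator still behaves like N(0, σ²); all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
beta, sigma, n, reps = 1.0, 2.0, 500, 10_000

# Non-normal shocks: centred exponential with variance sigma^2.
u = rng.exponential(scale=sigma, size=(reps, n)) - sigma
beta_hat = beta + u.mean(axis=1)
z = np.sqrt(n) * (beta_hat - beta)       # the "naturally" rescaled estimator

print(z.std())                           # ~ sigma = 2, the std of the N(0, sigma^2) limit
print(np.mean(z <= 1.96 * sigma))        # ~ 0.975 = P(N(0, sigma^2) <= 1.96*sigma)
```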
Convergence in Distribution
- Denote by F_X the cumulative distribution function (cdf) of the random variable X, i.e. F_X(x) ≡ P(X ≤ x).
- Def: A sequence of random variables X_1, X_2, ... converges in distribution to the random variable X (we write X_n ⇒ X) if

  lim_{n→∞} F_{X_n}(x) = F_X(x)

  for all points x where F_X is continuous.
- We write X_n ⇒ X, or more often X_n ⇒ D_X, where D_X is the distribution of X, e.g. D_X = N(µ, σ²).
- Theorem: if X_n →p X, then also X_n ⇒ X.
Central Limit Theorem
Lindeberg-Lévy CLT
Let X_1, X_2, X_3, ... be a sequence of iid random variables, with E(X_i) = µ and Var(X_i) = σ² < ∞. Let X̄_n = (1/n) Σ_{i=1}^n X_i. Then:

  √n (X̄_n − µ) ⇒ N(0, σ²).

- Later in the lecture we may need other versions of the CLT that hold under weaker assumptions.
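A sketch of the CLT for Bernoulli(p) data: the simulated distribution of √n (X̄_n − µ) is compared with the N(0, σ²) cdf at a few points; the values of p, n and the number of replications are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
p, n, reps = 0.3, 1_000, 50_000
mu, sigma = p, np.sqrt(p * (1 - p))      # mean and std of a Bernoulli(p) draw

counts = rng.binomial(n, p, size=reps)   # each count is the sum of n iid Bernoulli(p) draws
z = np.sqrt(n) * (counts / n - mu)       # sqrt(n) * (X_bar_n - mu)

# Empirical cdf of z versus the N(0, sigma^2) cdf:
for c in [-1.0, 0.0, 1.0]:
    print(c, np.mean(z <= c), stats.norm(0.0, sigma).cdf(c))
```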
Asymptotic Variance of an Estimator
- We saw that √n (β̂ − β) ⇒ N(0, σ²) as n → ∞.
- Thus, for the asymptotic variance of β̂ we have AVar(√n β̂) = σ².
- More generally, whenever √n (β̂ − β) ⇒ X for some random variable/vector X, we write AVar(√n β̂) = Var(X).
- (There is a subtlety here: SOMETIMES we have AVar(√n β̂) = lim_{n→∞} Var(√n β̂) = lim_{n→∞} n Var(β̂), but this need not be true.)
Confidence Intervals
- We have a consistent estimator β̂, but for finite sample size n we want to provide a measure of how close β̂ is to the true mean β = E(y_i).
- Since √n (β̂ − β) ⇒ N(0, σ²), we know that

  lim_{n→∞} P(√n |β̂ − β| ≤ 1.96 σ) = 0.95.

- Thus, for sufficiently large n we have

  P( β ∈ [ β̂ − 1.96 σ/√n , β̂ + 1.96 σ/√n ] ) ≈ 0.95.

- We need an estimator σ̂ that satisfies σ̂ →p σ as n → ∞. Then

  lim_{n→∞} P( β ∈ [ β̂ − 1.96 σ̂/√n , β̂ + 1.96 σ̂/√n ] ) = 0.95,

  and [ β̂ − 1.96 σ̂/√n , β̂ + 1.96 σ̂/√n ] is the 95% confidence interval for β.
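The coverage of this interval can be checked by simulation; the sketch below uses the sample standard deviation with the 1/(n−1) factor (the estimator σ̂ discussed on the next slide), and all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
beta, sigma, n, reps = 1.0, 2.0, 200, 10_000

y = beta + rng.normal(0.0, sigma, size=(reps, n))
beta_hat = y.mean(axis=1)
sigma_hat = y.std(axis=1, ddof=1)                  # consistent estimator of sigma

lower = beta_hat - 1.96 * sigma_hat / np.sqrt(n)
upper = beta_hat + 1.96 * sigma_hat / np.sqrt(n)
print(np.mean((lower <= beta) & (beta <= upper)))  # coverage ~ 0.95
```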
Confidence Intervals (cont.)
- Estimator for σ:

  σ̂² = (1/(n−1)) Σ_{i=1}^n (y_i − β̂)².

- Show: E(σ̂²) = σ² (unbiased).
  (Useful trick: (1/n) Σ_{i=1}^n (y_i − β)² = (1/n) Σ_{i=1}^n (y_i − β̂ + β̂ − β)² = (1/n) Σ_{i=1}^n (y_i − β̂)² + (β̂ − β)², which you could also write in a more familiar form: Ê(u²) − (Ê u)² = Ê[(u − Ê u)²], with Ê = (1/n) Σ_i and "u = u_i".)
- Using the factor 1/(n−1) instead of 1/n is often called the "degrees of freedom correction".
- Show: σ̂ →p σ as n → ∞ (consistent).
  (Useful to know the "continuous mapping theorem" to show this; see below.)
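A small simulation contrasting the 1/(n−1) and 1/n versions of σ̂² (the parameter values are illustrative): the first is unbiased, the second is biased downwards by the factor (n−1)/n.

```python
import numpy as np

rng = np.random.default_rng(8)
beta, sigma, n, reps = 1.0, 2.0, 30, 50_000

y = beta + rng.normal(0.0, sigma, size=(reps, n))
s2_dof = y.var(axis=1, ddof=1)    # 1/(n-1) * sum (y_i - beta_hat)^2
s2_raw = y.var(axis=1, ddof=0)    # 1/n     * sum (y_i - beta_hat)^2

print(s2_dof.mean())   # ~ sigma^2 = 4          (unbiased)
print(s2_raw.mean())   # ~ (n-1)/n * sigma^2    (biased downwards)
```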
Hypothesis Testing
- Sometimes one is not interested in estimating β = E(y_i), but in testing whether β has a particular value. For example:
  - Null hypothesis H_0: β = r
  - Alternative hypothesis H_a: β ≠ r
- The so-called t-test statistic for testing H_0 reads

  t = √n (β̂ − r) / σ̂.

- Under H_0, as n → ∞ we have t ⇒ N(0, 1).
- Therefore we reject H_0 at the 5% significance level (i.e. with 95% confidence) if |t| > 1.96.
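A minimal sketch of the t-test on one simulated sample in which H_0 happens to be true (r = β = 1); all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(9)
beta, sigma, n = 1.0, 2.0, 400
y = beta + rng.normal(0.0, sigma, size=n)

r = 1.0                                  # hypothesised value, H_0: beta = r (true here)
beta_hat = y.mean()
sigma_hat = y.std(ddof=1)
t = np.sqrt(n) * (beta_hat - r) / sigma_hat

print(t, np.abs(t) > 1.96)   # since H_0 is true, |t| > 1.96 happens only ~5% of the time
```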
Continuous Mapping Theorem
Theorem (Continuous Mapping Theorem)
Let g: R^k → R^ℓ be a continuous function, and let X_1, X_2, ... be a sequence of random k-vectors.
If X_n →p X as n → ∞, then g(X_n) →p g(X).
If X_n ⇒ X as n → ∞, then g(X_n) ⇒ g(X).

A direct application is that if U_n →p U and V_n →p V, then U_n + V_n →p U + V, and analogously for the other operations (−, ×, /).
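A quick numerical illustration of the first statement with g(x) = exp(x): since X̄_n →p µ, the continuous mapping theorem gives exp(X̄_n) →p exp(µ). The choices of g, µ and the sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(10)
mu = 1.0

# X_bar_n ->p mu, hence exp(X_bar_n) ->p exp(mu) by the continuous mapping theorem.
for n in [10, 1_000, 100_000]:
    x = rng.normal(mu, 1.0, size=n)
    print(n, np.exp(x.mean()), np.exp(mu))   # the two numbers get closer as n grows
```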
Slutsky’s Theorem
Another application of the continuous mapping theorem:
Slutsky’s Theorem
Let U_1, U_2, ... and V_1, V_2, ... be sequences of random variables, vectors or matrices (of appropriate dimensions). As n → ∞ let U_n →p U, where U is a non-random constant scalar, vector or matrix, and V_n ⇒ V, for some random variable, vector or matrix V. Then
- U_n + V_n ⇒ U + V,
- U_n V_n ⇒ U V,
- if also det(U) ≠ 0 and P(det(U_n) = 0) = 0, then U_n^{-1} V_n ⇒ U^{-1} V.

Using this we can, for example, show that √n σ̂^{-1} (β̂ − β) ⇒ N(0, 1) (we used this implicitly before when deriving the confidence interval).
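A simulation sketch of this last claim: the studentized statistic √n σ̂^{-1}(β̂ − β) behaves like N(0, 1) in large samples even though σ is replaced by the estimate σ̂; the numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)
beta, sigma, n, reps = 1.0, 2.0, 200, 10_000

y = beta + rng.normal(0.0, sigma, size=(reps, n))
beta_hat = y.mean(axis=1)
sigma_hat = y.std(axis=1, ddof=1)               # sigma_hat ->p sigma
t = np.sqrt(n) * (beta_hat - beta) / sigma_hat  # Slutsky: t => N(0, 1)

print(t.mean(), t.std())                        # ~ 0 and ~ 1
print(np.mean(np.abs(t) <= 1.96))               # ~ 0.95
```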
Delta Method
Theorem (Delta Method)
Let X_1, X_2, X_3, ... be a sequence of random k-vectors (e.g. β̂) and X be a constant k-vector such that √n (X_n − X) ⇒ N(0, Σ) for some k × k matrix Σ. Let g: R^k → R^ℓ be continuously differentiable at X. Then

  √n (g(X_n) − g(X)) ⇒ N(0, G Σ G'),

where G = ∂g(x)/∂x' evaluated at x = X, i.e. G is the ℓ × k Jacobian matrix.

- For example: we know √n (β̂ − β) ⇒ N(0, σ²). Using the delta method we can show that √n (β̂² − β²) ⇒ N(0, 4β²σ²).
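A simulation check of this delta-method example: with β = 1 and σ = 2 (illustrative values), the limiting standard deviation of √n (β̂² − β²) should be √(4β²σ²) = 4.

```python
import numpy as np

rng = np.random.default_rng(12)
beta, sigma, n, reps = 1.0, 2.0, 500, 10_000

y = beta + rng.normal(0.0, sigma, size=(reps, n))
beta_hat = y.mean(axis=1)
z = np.sqrt(n) * (beta_hat**2 - beta**2)   # delta method with g(b) = b^2, so G = g'(beta) = 2*beta

print(z.std())                             # ~ sqrt(4 * beta^2 * sigma^2) = 2*|beta|*sigma = 4
```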
Econometrics
- The course assumes knowledge of matrix algebra, probability theory and basic asymptotic theory, as summarized above.
- The first half of the course covers linear models: OLS estimation, Instrumental Variables, Hypothesis Testing.
- The second half of the course covers Maximum Likelihood Estimation, the Generalized Method of Moments, and some basic Time Series Methods.
- There is no required textbook for this course, but we often follow the presentation in Wooldridge, "Econometric Analysis of Cross Section and Panel Data". Most graduate econometrics textbooks cover very similar material, and you can choose your favourite one to study from.