1 Some Basic Probability Theory and Calculus

Marginals, conditionals
Suppose $(X,Y) \sim f_{XY}(x,y)$. The marginals are $f_X(x) = \int f_{XY}(x,y)\,dy$ and $f_Y(y) = \int f_{XY}(x,y)\,dx$. The conditional is $f_{X|Y}(x \mid y) = \frac{f_{XY}(x,y)}{f_Y(y)}$.
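As a quick numerical check of the marginal formula, the sketch below integrates a toy joint density $f(x,y) = x + y$ on $[0,1]^2$ (a made-up example; scipy assumed available):

```python
# Numerical check of f_X(x) = \int f_XY(x, y) dy for the toy joint
# density f(x, y) = x + y on the unit square.
from scipy.integrate import quad

def f_xy(x, y):
    return x + y  # a valid joint density on [0, 1]^2

x = 0.3
marginal, _ = quad(lambda y: f_xy(x, y), 0, 1)  # integrate y out
print(marginal)   # ~ 0.8
print(x + 0.5)    # closed form: f_X(x) = x + 1/2
```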
Inverse function theorem (change of variables):
Let $X \sim f_X(x)$ and $Y = g(X)$ with $g$ invertible. If $X \in \mathbb{R}$, then
$$f_Y(y) = f_X\!\left(g^{-1}(y)\right) \left|\frac{d}{dy} g^{-1}(y)\right| = \frac{f_X\!\left(g^{-1}(y)\right)}{\left|g'\!\left(g^{-1}(y)\right)\right|}.$$
If $X \in \mathbb{R}^n$, then $f_Y(y) = f_X\!\left(g^{-1}(y)\right) \left|\det J_{g^{-1}}(y)\right| = f_X\!\left(g^{-1}(y)\right) \left|\det J_g\!\left(g^{-1}(y)\right)\right|^{-1}$.
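A minimal Monte Carlo check of the univariate formula, using $Y = e^X$ with $X \sim N(0,1)$ as the example (numpy/scipy assumed):

```python
# Check f_Y(y) = f_X(ln y) * |1/y| for Y = exp(X), X ~ N(0, 1),
# by comparing the formula to a histogram density estimate.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = np.exp(rng.standard_normal(200_000))       # samples of Y = exp(X)

grid = np.linspace(0.2, 4.0, 5)
formula = norm.pdf(np.log(grid)) / grid        # f_X(g^{-1}(y)) |dg^{-1}/dy|
hist, edges = np.histogram(y, bins=200, range=(0, 8), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
empirical = np.interp(grid, centers, hist)
print(np.round(formula, 3))
print(np.round(empirical, 3))                  # should roughly match
```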
Moment generating function: $X \sim f_X(x) \implies M_X(t) = E\left(\exp(t'X)\right)$.
Properties:
1. If $Y = AX + b \implies M_Y(t) = \exp(t'b)\, M_X(A't)$
2. If $X, Y$ independent $\implies M_{X+Y}(t) = M_X(t)\, M_Y(t)$
3. $M_X'(0) = E(X)$, $M_X''(0) = E(X^2)$, ..., $M_X^{(r)}(0) = E(X^r)$ (checked symbolically below)
4. If $\{X_n\}$ is a sequence of r.v. and $M_{X_n}(t) \to M_X(t) \implies X_n \to_d X$
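Property 3 can be verified symbolically; the sketch below (assuming sympy is available) differentiates the normal MGF at $t = 0$:

```python
# Differentiate the N(mu, sigma^2) MGF at t = 0 to recover moments.
import sympy as sp

t, mu, sigma = sp.symbols('t mu sigma', positive=True)
M = sp.exp(mu * t + sigma**2 * t**2 / 2)       # normal MGF

m1 = sp.diff(M, t).subs(t, 0)                  # E(X)
m2 = sp.diff(M, t, 2).subs(t, 0)               # E(X^2)
print(sp.simplify(m1))                         # mu
print(sp.simplify(m2 - m1**2))                 # variance: sigma^2
```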
Order Statistics:
Let $X_1, \ldots, X_n \sim_{i.i.d.} f_X(x)$ with cdf $F_X(x)$, and let $X_{(k)}$ be the $k$-th order statistic (from smallest to largest). Then
$$f_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\, F_X(x)^{k-1}\, \left(1 - F_X(x)\right)^{n-k} f_X(x)$$
Special cases: for the minimum, $f_{X_{(1)}}(x) = n\left(1 - F_X(x)\right)^{n-1} f_X(x)$; for the maximum, $f_{X_{(n)}}(x) = n\, F_X(x)^{n-1} f_X(x)$.
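A simulation sketch of the special case $X_{(n)}$ for uniforms, where the implied cdf is $F_X(x)^n = x^n$ (numpy assumed; constants arbitrary):

```python
# Check the maximum of n = 5 iid U[0,1] draws: its cdf is x^5.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 5, 100_000
maxima = rng.uniform(size=(reps, n)).max(axis=1)

# Compare P(X_(n) <= 0.8): empirical frequency vs. x^n
print((maxima <= 0.8).mean())   # ~ 0.328
print(0.8 ** n)                 # 0.32768
```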
Taylor Expansion: $f(x) - f(x_0) \simeq \sum_{i=1}^{k} f^{(i)}(x_0)\, \frac{(x - x_0)^i}{i!}$ with $f^{(i)}(x_0) = \frac{\partial^i f(x_0)}{\partial x^i}$. If $f$ is multivariate and we expand up to $k = 2$:
$$f(x) - f(x_0) \simeq J_f(x_0)\,(x - x_0) + \tfrac{1}{2}\,(x - x_0)^T H_f(x_0)\,(x - x_0)$$
with $J_f$ the Jacobian and $H_f$ the Hessian.
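A numeric sanity check of the univariate expansion at $k = 2$, using $f = \sin$ as an arbitrary example:

```python
# Second-order Taylor check for f(x) = sin(x) around x0 = 1:
# f(x) - f(x0) ~ f'(x0)(x - x0) + (1/2) f''(x0)(x - x0)^2.
import numpy as np

x0, x = 1.0, 1.1
approx = np.cos(x0) * (x - x0) - 0.5 * np.sin(x0) * (x - x0) ** 2
print(np.sin(x) - np.sin(x0))   # ~ 0.04974
print(approx)                    # ~ 0.04982 (error is O((x - x0)^3))
```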
Properties of the exponential function
• $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$
• If $a_n \to a \implies \lim_{n \to \infty} \left(1 + \frac{a_n}{n}\right)^n = e^a$
2 Most Important Distributions

| Distribution | Support | pdf | cdf | $E(X)$ | $V(X)$ | $M_X(t)$ |
|---|---|---|---|---|---|---|
| Bernoulli$(p)$ | $\{0,1\}$ | $p^x(1-p)^{1-x}$ | $-$ | $p$ | $p(1-p)$ | $pe^t + 1 - p$ |
| $B(n,p)$ | $\{0,1,\ldots,n\}$ | $\binom{n}{x} p^x (1-p)^{n-x}$ | $-$ | $np$ | $np(1-p)$ | $\left(pe^t + 1 - p\right)^n$ |
| Poisson$(\lambda)$ | $\{0,1,2,\ldots\}$ | $e^{-\lambda}\frac{\lambda^x}{x!}$ | $-$ | $\lambda$ | $\lambda$ | $\exp\left(\lambda(e^t - 1)\right)$ |
| Geometric$(p)$ | $\{1,2,\ldots\}$ | $p(1-p)^{k-1}$ | $1-(1-p)^k$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ | $\frac{pe^t}{1-(1-p)e^t}$ for $t < -\ln(1-p)$ |
| $\exp(\lambda)$ | $\mathbb{R}_+$ | $\frac{1}{\lambda} e^{-x/\lambda}$ | $1 - e^{-x/\lambda}$ | $\lambda$ | $\lambda^2$ | $(1 - t\lambda)^{-1}$ |
| $\Gamma(\alpha,\beta)$ | $\mathbb{R}_+$ | $\frac{1}{\Gamma(\alpha)\beta^\alpha}\, x^{\alpha-1} e^{-x/\beta}$ if $x > 0$ | $-$ | $\alpha\beta$ | $\alpha\beta^2$ | $(1 - \beta t)^{-\alpha}$ |
| $\chi^2(k)$ | $\mathbb{R}_+$ | $\frac{1}{2^{k/2}\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}$ if $x > 0$ | $-$ | $k$ | $2k$ | $(1 - 2t)^{-k/2}$ |
| $N(\mu,\sigma^2)$ | $\mathbb{R}$ | $\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right)$ | $-$ | $\mu$ | $\sigma^2$ | $\exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right)$ |
| $N(\mu,\Sigma)$ | $\mathbb{R}^n$ | $\frac{1}{(2\pi)^{n/2}\det(\Sigma)^{1/2}}\, e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)}$ | $-$ | $\mu$ | $\Sigma$ | $\exp\left(\mu' t + \frac{1}{2}t'\Sigma t\right)$ |
| $t(p)$ | $\mathbb{R}$ | $\frac{\Gamma\left(\frac{p+1}{2}\right)}{\Gamma\left(\frac{p}{2}\right)(p\pi)^{1/2}}\left(1 + \frac{x^2}{p}\right)^{-\frac{p+1}{2}}$ | $-$ | $0$ (for $p>1$) | $\frac{p}{p-2}$ (for $p>2$) | does not exist |
| $F(d_1,d_2)$ | $\mathbb{R}_+$ | $\frac{1}{x\,B\left(\frac{d_1}{2},\frac{d_2}{2}\right)}\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1+d_2}}}$ | $-$ | $\frac{d_2}{d_2-2}$ (for $d_2>2$) | $\frac{2d_2^2(d_1+d_2-2)}{d_1(d_2-2)^2(d_2-4)}$ (for $d_2>4$) | does not exist |
| $B(\alpha,\beta)$ | $[0,1]$ | $\frac{1}{B(\alpha,\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}$ | $-$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ | $-$ |
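A few table entries can be spot-checked against scipy.stats; note that scipy's gamma uses shape $a = \alpha$ and scale $= \beta$, matching the table's parameterization (a sketch, assuming scipy is available):

```python
# Spot-check table rows: E(X) = np for binomial, V(X) = alpha*beta^2
# for gamma, V(X) = p/(p-2) for the t-distribution.
from scipy import stats

print(stats.binom(n=10, p=0.3).mean(), 10 * 0.3)          # np
print(stats.gamma(a=2.0, scale=3.0).var(), 2.0 * 3.0**2)  # alpha * beta^2
print(stats.t(df=5).var(), 5 / (5 - 2))                   # p / (p - 2)
```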
2.1 Some Special Properties of Distributions

Binomial: If $X \sim B(n,p)$, $Y \sim B(m,p)$ and $X$ indep. of $Y \implies X + Y \sim B(m+n, p)$

Poisson: If $X_i \sim$ Poisson$(\lambda_i)$ independent, then $\sum_{i=1}^{n} X_i \sim$ Poisson$\left(\sum_{i=1}^{n} \lambda_i\right)$

Exponential
• If $X \sim \exp(\lambda) \implies X \sim \Gamma(1, \lambda)$
• If $X_i \sim \exp(\lambda)$ independent $\implies \sum_{i=1}^{n} X_i \sim \Gamma(n, \lambda)$
• If $X \sim \exp(\lambda)$ and $\alpha > 0 \implies \alpha X \sim \exp(\alpha\lambda)$

Geometric
• If we conduct an infinite number of i.i.d. Bernoulli trials (indexed 1, 2, 3, ...), all with success probability $p$, the number of the trial on which we observe the first success has a geometric distribution with parameter $p$.
Chi-Squared
• If $Y_i \sim \chi^2_{k_i}$ independent, then $\sum_{i=1}^{n} Y_i \sim \chi^2_{\sum_{i=1}^{n} k_i}$
• If $X_i \sim N(0,1)$ independent, then $\sum_{i=1}^{n} X_i^2 \sim \chi^2(n)$
• If $X \sim \chi^2_n \implies X \sim \Gamma\left(\frac{n}{2}, 2\right)$
Gamma
• Gamma function: $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t}\, dt$. It satisfies:
1. $\Gamma(\alpha + 1) = \alpha\, \Gamma(\alpha)$
2. $\Gamma(n) = (n-1)!$ if $n \in \mathbb{N}$
3. $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$
• If $X_i \sim \Gamma(\alpha_i, \theta)$ indep. $\implies \sum X_i \sim \Gamma\left(\sum \alpha_i, \theta\right)$
• If $X \sim \Gamma(\alpha, \theta)$ and $\phi > 0 \implies \phi X \sim \Gamma(\alpha, \phi\theta)$
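The three identities can be checked numerically with scipy.special.gamma (a sketch; scipy assumed):

```python
# Gamma-function identities: recursion, factorial, and Gamma(1/2).
import math
from scipy.special import gamma

a = 3.7
print(gamma(a + 1), a * gamma(a))       # Gamma(a+1) = a * Gamma(a)
print(gamma(5), math.factorial(4))      # Gamma(n) = (n-1)!
print(gamma(0.5), math.sqrt(math.pi))   # Gamma(1/2) = sqrt(pi)
```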
Normal Distribution
• If $X_i \sim N(\mu_i, \sigma_i^2)$ independent, then $\sum_i X_i \sim N\left(\sum_i \mu_i, \sum_i \sigma_i^2\right)$
• If $X \sim N(\mu, \sigma^2)$ and $a, b \in \mathbb{R} \implies aX + b \sim N(a\mu + b, a^2\sigma^2)$
• If $X \sim N(\mu, \Sigma) \in \mathbb{R}^n \implies$ marginals $X_i \sim N(\mu_i, \Sigma_{ii})$
• If $X \sim N(\mu, \Sigma) \implies$ any subvector $X_k$ is multivariate normal
• If $X \sim N(\mu, \Sigma) \implies AX + b \sim N(A\mu + b, A\Sigma A')$
t-student
• Defined as $t = \frac{N(0,1)}{\sqrt{\chi^2(n)/n}} \sim t(n)$, with numerator and denominator independent
• As $n \to \infty \implies t(n) \to N(0,1)$
F-distribution (Snedecor)
• Defined as $F(d_1, d_2) = \frac{\chi^2(d_1)/d_1}{\chi^2(d_2)/d_2}$, with the two chi-squares independent
Beta ($B(\alpha, \beta)$)
• Beta function: $B(\alpha, \beta) \equiv \int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\, du$
• If $X \sim \Gamma(\alpha_X, \theta)$ and $Y \sim \Gamma(\alpha_Y, \theta)$ independent $\implies \frac{X}{X+Y} \sim B(\alpha_X, \alpha_Y)$
• If $X \sim U[0,1] \implies X^2 \sim B\left(\frac{1}{2}, 1\right)$

3 Probability Limits
Almost sure convergence: $X_n \to_{a.s.} X \iff \Pr\left(\omega : X_n(\omega) \to X(\omega)\right) = 1$
Convergence in Probability: $\operatorname{plim}_{n\to\infty} X_n = X \iff$ for all $\varepsilon > 0$, $\lim_{n\to\infty} \Pr\left(|X_n - X| < \varepsilon\right) = 1$
Convergence in Distribution: $X_n \to_d X \iff \lim_{n\to\infty} F_{X_n}(x) = F_X(x)$ for every continuity point $x$ of $F_X$
Convergence in Quadratic Mean: $X_n \to_{q.m.} X \iff E(X_n - X)^2 \to 0$
Implications: almost sure convergence $\implies$ convergence in probability $\implies$ convergence in distribution; convergence in quadratic mean $\implies$ convergence in probability
Slutsky's Theorem: Let $\{X_n\}, \{Y_n\}$ be sequences of r.v.
1. If $X_n \to_p X$, $Y_n \to_p Y \implies X_n Y_n \to_p XY$
2. If $X_n \to_p X$, $Y_n \to_p Y \implies X_n + Y_n \to_p X + Y$
3. If $X_n \to_p X$ and $g(x)$ is a continuous function $\implies g(X_n) \to_p g(X)$
4. If $X_n \to_{a.s.} X$ and $g(x)$ is a continuous function $\implies g(X_n) \to_{a.s.} g(X)$
5. If $X_n \to_d X$ and $g(x)$ is a continuous function $\implies g(X_n) \to_d g(X)$
6. If $X_n \to_p c$ for a constant $c$ and $Y_n \to_d Y \implies X_n Y_n \to_d cY$
Markov's Inequality: Let $X$ be a random variable and $g(X)$ a nonnegative function. Then for all $r > 0 \implies \Pr\left(g(X) \ge r\right) \le \frac{1}{r} E\left(g(X)\right)$
Laws of Large Numbers: Let $\{X_n\}$ be an i.i.d. random sequence with $E(X) = \mu < \infty$ and $V(X) = \sigma^2 < \infty$. Then
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to_p \mu \ \text{(weak law of large numbers)} \qquad \text{and} \qquad \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to_{a.s.} \mu \ \text{(strong law of large numbers)}$$
Central Limit Theorem: Let $\{X_n\}$ be an i.i.d. random sequence with $E(X) = \mu < \infty$ and $V(X) = \Sigma$. Then
$$\sqrt{n}\left(\bar{X}_n - \mu\right) \to_d N(0, \Sigma)$$
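A small simulation of the CLT with skewed Exponential(1) draws (numpy assumed; the constants are arbitrary):

```python
# Standardized means of iid Exponential(1) draws look N(0,1) for moderate n.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 50_000
x = rng.exponential(scale=1.0, size=(reps, n))   # mu = 1, sigma^2 = 1
z = np.sqrt(n) * (x.mean(axis=1) - 1.0)          # sqrt(n)(X_bar - mu)

print(z.mean(), z.std())                         # ~ 0 and ~ 1
print((z <= 1.6449).mean())                      # ~ 0.95, the N(0,1) cdf at z_0.95
```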
Delta method:
• Univariate: Suppose $\sqrt{n}\left(\hat{\theta}_n - \theta\right) \to_d N(0, \sigma^2)$ and $g(x)$ is a differentiable function. Then
1. If $g'(\theta) \neq 0 \implies \sqrt{n}\left(g(\hat{\theta}_n) - g(\theta)\right) \to_d N\left(0, \left(g'(\theta)\right)^2 \sigma^2\right)$
2. If $g'(\theta) = 0$ and $g \in C^2 \implies n\left(g(\hat{\theta}_n) - g(\theta)\right) \to_d \sigma^2 \frac{g''(\theta)}{2}\, \chi^2_1$
• Multivariate: Let $\hat{\theta}_n \in \mathbb{R}^k$ be a sequence of r.v. and $g : \mathbb{R}^k \to \mathbb{R}^m$ a differentiable function at $x = \theta$. If $\sqrt{n}\left(\hat{\theta}_n - \theta\right) \to_d N(0, \Sigma)$ and $J_g(\theta) \neq 0$, then $\sqrt{n}\left(g(\hat{\theta}_n) - g(\theta)\right) \to_d N\left(0, J_g(\theta)\, \Sigma\, J_g(\theta)'\right)$, with $J_g(\theta)$ the Jacobian of $g$ at $\theta$
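A simulation sketch of the univariate delta method with $g(x) = x^2$ and exponential draws (all values arbitrary; numpy assumed):

```python
# Delta-method check: theta_hat = X_bar of Exponential(theta = 2) draws,
# g(x) = x^2, so the asymptotic variance is (g'(theta))^2 sigma^2 = 16 * 4.
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 2.0, 500, 20_000
xbar = rng.exponential(scale=theta, size=(reps, n)).mean(axis=1)

stat = np.sqrt(n) * (xbar**2 - theta**2)   # sqrt(n)(g(theta_hat) - g(theta))
print(stat.var())                          # ~ 64
print((2 * theta) ** 2 * theta**2)         # (g'(theta))^2 * sigma^2 = 64
```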
4 Basic Properties of Statistics
Unbiasedness: $\hat{\theta}(X)$ is unbiased $\iff E_\theta\left(\hat{\theta}(X)\right) = \theta$
Consistency: $\hat{\theta}(X)$ is consistent $\iff \hat{\theta}(X) \to_p \theta$
Consistency in MSE (mean squared error): $\hat{\theta}(X)$ is consistent in MSE $\iff MSE\left(\hat{\theta}\right) \equiv E\left(\hat{\theta}(X) - \theta\right)^2 \equiv V\left(\hat{\theta}(X)\right) + \left(E\left(\hat{\theta}(X)\right) - \theta\right)^2 \to 0$ as $n \to \infty$
Properties:
1. $\hat{\theta}(X)$ is consistent in MSE $\iff E\left(\hat{\theta}(X)\right) \to \theta$ and $V\left(\hat{\theta}(X)\right) \to 0$
2. If $\hat{\theta}(X)$ is consistent in MSE $\implies$ it is consistent
3. $\bar{X}$ is consistent in MSE for $E(X)$
Asymptotic Normality: $\hat{\theta}(X)$ is asymptotically normal $\iff \sqrt{n}\left(\hat{\theta}(X) - \theta\right) \to_d N\left(0, V_{\hat{\theta}}\right)$
Cramer-Rao Inequality: Let $W(X)$ be an unbiased estimator of $\theta$ that satisfies $\frac{d}{d\theta} E_\theta\left(W(X)\right) = \int \frac{\partial}{\partial\theta}\left(W(x)\, f(x \mid \theta)\right) dx$ and has finite variance for all $\theta$. Then
$$V_\theta\left(W(X)\right) \ge \frac{1}{n E_\theta\left[\left(\frac{\partial}{\partial\theta} \ln f(X \mid \theta)\right)^2\right]} = \frac{1}{J_n(\theta)}$$
with $J_n(\theta)^{-1}$ being the Cramer-Rao bound.
4.1 Properties of Mean and Variance
• $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $S_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ are unbiased estimators of $E(X)$ and $V(X)$ respectively
• Both are asymptotically normal
• If $X_i \sim N(\mu, \sigma^2)$ then
1. $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
2. $\frac{(n-1)}{\sigma^2} S_X^2 \sim \chi^2_{n-1}$
3. $\bar{X}$ and $S_X^2$ are independent
4. $\frac{\bar{X} - \mu}{\sqrt{S_X^2 / n}} \sim t(n-1)$
4.2 Method of Moments Estimator
Let $X_i \sim_{i.i.d.} f(x \mid \theta)$ with $\theta \in \mathbb{R}^K$.
• Calculate $E(X^r) = f_r(\theta)$ for all $r = 1, 2, \ldots, K$
• Replace $E(X^r)$ with the sample moment $\overline{X^r} = \frac{1}{n}\sum_{i=1}^{n} X_i^r$ (i.e. set $\overline{X^r} = f_r(\theta)$)
• Solve the system of $K$ equations in $K$ unknowns for $\theta$.
If $f^{-1} = \left(f_1(\theta), f_2(\theta), \ldots, f_K(\theta)\right)^{-1}$ exists and is continuous, then $\hat{\theta}_{MM}$ is consistent. If $f^{-1}$ is differentiable, then by the delta method it is asymptotically normal.
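A minimal sketch of the method of moments for $\Gamma(\alpha, \beta)$ in the table's scale parameterization, with made-up true values (numpy assumed):

```python
# MoM for Gamma(alpha, beta): E(X) = alpha*beta and V(X) = alpha*beta^2
# give beta_hat = (m2 - m1^2)/m1 and alpha_hat = m1/beta_hat.
import numpy as np

rng = np.random.default_rng(4)
alpha, beta = 3.0, 2.0
x = rng.gamma(shape=alpha, scale=beta, size=100_000)

m1, m2 = x.mean(), (x**2).mean()     # first two sample moments
beta_hat = (m2 - m1**2) / m1
alpha_hat = m1 / beta_hat
print(alpha_hat, beta_hat)           # ~ (3.0, 2.0)
```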
5 Properties of Maximum Likelihood Estimation
Invariance: If $\hat{\theta}_{MLE}$ is the MLE of $\theta$, then $\tau\left(\hat{\theta}_{MLE}\right)$ is the MLE of $\tau(\theta)$ for any function $\tau$
Regularity Conditions
1. $X_1, X_2, \ldots, X_n$ i.i.d. with $X_i \sim f(x \mid \theta)$
2. Identifiability: if $\theta \neq \theta' \implies f(x \mid \theta) \neq f(x \mid \theta')$
3. $f(x \mid \theta)$ has support that does not depend on $\theta$
4. The true parameter $\theta_0$ is interior to $\Theta$
5. $\frac{\partial^3 \ln f(x \mid \theta)}{\partial\theta^3}$ exists and is continuous
6. There exists a function $M_{\theta_0}(x)$ with $E_{\theta_0}\left(M_{\theta_0}(X)\right) < \infty$ and a constant $c_{\theta_0} > 0$ such that for all $x$ and all $\theta \in (\theta_0 - c_{\theta_0}, \theta_0 + c_{\theta_0})$ we have $\left|\frac{\partial^3 \ln f(x \mid \theta)}{\partial\theta^3}\right| \le M_{\theta_0}(x)$
Information equality: Under the regularity conditions, $E_\theta\left[\left(\frac{\partial}{\partial\theta} \ln f(X \mid \theta)\right)^2\right] = -E\left[\frac{\partial^2 \ln f(X \mid \theta)}{\partial\theta^2}\right] \equiv I_1(\theta)$, so $J_n(\theta) = n I_1(\theta)$
Asymptotic Properties of MLE
1. $\hat{\theta}_{MLE}$ is consistent for $\theta$
2. Asymptotic Normality: $\sqrt{n}\left(\hat{\theta}_{MLE} - \theta\right) \to_d N\left(0, I_1(\theta)^{-1}\right)$
3. Asymptotic Efficiency: asymptotically attains the Cramer-Rao bound
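A simulation sketch of properties 1-2 for the exponential mean parameterization, where $\hat{\lambda}_{MLE} = \bar{X}$ and $I_1(\lambda) = 1/\lambda^2$ (numpy assumed; constants arbitrary):

```python
# MLE asymptotics for Exponential(lambda): lambda_hat = X_bar and
# sqrt(n)(lambda_hat - lambda) ~ N(0, lambda^2) for large n.
import numpy as np

rng = np.random.default_rng(5)
lam, n, reps = 2.0, 400, 20_000
lam_hat = rng.exponential(scale=lam, size=(reps, n)).mean(axis=1)

z = np.sqrt(n) * (lam_hat - lam)
print(z.var())    # ~ lambda^2 = 4, the inverse Fisher information
print(z.mean())   # ~ 0 (consistency)
```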
6 Testing
Type 1 Error: reject $H_0 \mid H_0$ true
Type 2 Error: fail to reject $H_0 \mid H_0$ false
Power of a test: $\Pr(\text{reject } H_0 \mid H_0 \text{ false})$
Power function: $P(\theta) = \Pr(\text{reject } H_0 \mid \theta)$
Neyman-Pearson Lemma: Consider the test $H_0 : \theta = \theta_0$ vs. $H_1 : \theta = \theta_1$. Then there exists a UMP test, with rejection region
$$\text{reject} \iff \frac{f(X \mid \theta_1)}{f(X \mid \theta_0)} > k$$
Mean Test:
1. $H_0 : \mu = \mu_0$ vs. $H_1 : \mu > \mu_0$. Use the test statistic
$$t = \sqrt{n}\, \frac{\bar{X} - \mu_0}{S_X}$$
and reject $\iff t > K$. If $X \sim N(\mu, \sigma^2)$, reject if $t > t_{1-\alpha}(n-1)$, the $1-\alpha$ quantile of the t-distribution with $n-1$ degrees of freedom. If $X$ is not normal, reject if $t > z_{1-\alpha}$, the $1-\alpha$ quantile of $N(0,1)$.
2. $H_0 : \mu = \mu_0$ vs. $H_1 : \mu \neq \mu_0$: use the same statistic; if $X \sim N(\mu, \sigma^2)$ reject if $t \notin \left(-t_{1-\alpha/2}(n-1),\ t_{1-\alpha/2}(n-1)\right)$. If not normal, reject if $t \notin \left(-z_{1-\alpha/2},\ z_{1-\alpha/2}\right)$.
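A sketch of the one-sided mean test, cross-checked against scipy's ttest_1samp (scipy >= 1.6 assumed for the alternative= keyword; the simulated data are made up):

```python
# One-sided mean test: compute t by hand, then confirm with scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x, mu0 = rng.normal(loc=0.3, scale=1.0, size=50), 0.0

t = np.sqrt(len(x)) * (x.mean() - mu0) / x.std(ddof=1)
crit = stats.t.ppf(0.95, df=len(x) - 1)           # t_{1-alpha}(n-1)
print(t, crit, t > crit)                          # reject if t > crit

res = stats.ttest_1samp(x, popmean=mu0, alternative='greater')
print(res.statistic, res.pvalue)                  # same t statistic
```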
Variance test: Suppose $X \sim N(\mu, \sigma^2)$ and we want to test $H_0 : \sigma^2 = \sigma_0^2$ vs. $H_1 : \sigma^2 < \sigma_0^2$. Use the test statistic
$$\chi^2 = \frac{(n-1)}{\sigma_0^2}\, S_X^2 \sim \chi^2_{n-1} \ \text{under the null}$$
and reject if $\chi^2 < \chi^2_{n-1}(\alpha)$. To test $H_0 : \sigma^2 = \sigma_0^2$ vs. $H_1 : \sigma^2 \neq \sigma_0^2$, use the same statistic and reject if $\chi^2 \notin \left(\chi^2_{n-1}\left(\frac{\alpha}{2}\right),\ \chi^2_{n-1}\left(1 - \frac{\alpha}{2}\right)\right)$
Some critical values:
$z_{0.9} = 1.2816$, $z_{0.95} = 1.6449$, $z_{0.975} = 1.9600$, $z_{0.99} = 2.3263$, $z_{0.995} = 2.5758$
and for $\chi^2_1$:
$\chi^2_1(0.95) = 3.8415$, $\chi^2_1(0.975) = 5.0239$, $\chi^2_1(0.99) = 6.6349$, $\chi^2_1(0.05) = 0.0039$, $\chi^2_1(0.01) = 0.00016$
6.1 MLE Tests
Want to test $H_0 : \theta = \theta_0$ vs. $H_1 : \theta \neq \theta_0$
Wald test:
• Univariate: $\frac{\sqrt{n}\left(\hat{\theta}_{MLE} - \theta_0\right)}{\sqrt{I_1(\theta_0)^{-1}}} \sim N(0,1)$; do a test like the mean test. Or equivalently $n\left(\hat{\theta}_{MLE} - \theta_0\right)^2 I_1\left(\hat{\theta}_{MLE}\right) \simeq \chi^2_1$
• Multivariate: $n\left(\hat{\theta}_{MLE} - \theta_0\right)' I_1\left(\hat{\theta}\right)\left(\hat{\theta}_{MLE} - \theta_0\right) \sim \chi^2_p$ with $p = \#$ of parameters
LR Test
• Let $\lambda(X) = \frac{L(\theta_0)}{L\left(\hat{\theta}_{MLE}\right)} = \frac{f(X \mid \theta_0)}{\max_{\theta\in\Theta} f(X \mid \theta)}$. Then $-2\ln\lambda(X) \simeq \chi^2_1$ and the rejection region is $-2\ln\lambda(X) > \chi^2_1(1-\alpha)$
• In general, to test $H_0 : \theta \in \Theta_0$ vs. $H_1 : \theta \notin \Theta_0$, use the LR $\lambda(X) = \frac{\max_{\theta\in\Theta_0} f(X \mid \theta)}{\max_{\theta\in\Theta} f(X \mid \theta)} = \frac{L\left(\hat{\theta}_{RESTRICTED}\right)}{L\left(\hat{\theta}_{MLE}\right)}$, and then $-2\ln\lambda(X) \simeq \chi^2_p$ with $p = \#$ of restrictions
LM test (score test)
• Univariate: $\frac{1}{\sqrt{n}} \frac{s(\theta_0)}{\sqrt{I_1(\theta_0)}} \simeq N(0,1)$ and do a mean test. Equivalently $\frac{1}{n} \frac{s(\theta_0)^2}{I_1(\theta_0)} \simeq \chi^2_1$
• Multivariate: $\frac{1}{n}\, s(\theta_0)^T I_1(\theta_0)^{-1} s(\theta_0) \sim \chi^2_p$ with $p = \#$ of parameters
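A sketch computing all three statistics for $H_0 : p = p_0$ in a Bernoulli sample (made-up data; numpy/scipy assumed). The three should be close in large samples:

```python
# Wald, LR, and LM statistics for H0: p = p0 with Bernoulli(p) data;
# all three are asymptotically chi^2_1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
p0, x = 0.5, rng.binomial(1, 0.55, size=1000)
n, p_hat = len(x), x.mean()

info = lambda p: 1.0 / (p * (1 - p))                 # I_1(p) for Bernoulli
loglik = lambda p: np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

wald = n * (p_hat - p0) ** 2 * info(p_hat)
lr = 2 * (loglik(p_hat) - loglik(p0))
score = np.sum((x - p0) / (p0 * (1 - p0)))           # s(theta_0)
lm = score**2 / (n * info(p0))

crit = stats.chi2.ppf(0.95, df=1)                    # 3.8415
print(wald, lr, lm, crit)                            # reject when stat > crit
```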
MIT OpenCourseWare
http://ocw.mit.edu
14.381 Statistical Method in Economics
Fall 2013
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.