1 Some Basic Probability Theory and Calculus

Marginals, conditionals: Suppose $(X,Y) \sim f_{XY}(x,y)$. Then the marginals are $f_X(x) = \int f_{XY}(x,y)\,dy$ and $f_Y(y) = \int f_{XY}(x,y)\,dx$, and the conditionals are $f_{X|Y}(x \mid y) = \frac{f_{XY}(x,y)}{f_Y(y)}$.

Inverse function theorem (change of variables): Let $X \sim f_X(x)$ and $Y = g(X)$ with $g$ invertible. If $X \in \mathbb{R}$, then
$$f_Y(y) = f_X\left(g^{-1}(y)\right)\left|\frac{d}{dy}g^{-1}(y)\right| = \frac{f_X\left(g^{-1}(y)\right)}{\left|g'\left(g^{-1}(y)\right)\right|}.$$
If $X \in \mathbb{R}^n$, then $f_Y(y) = f_X\left(g^{-1}(y)\right)\left|\det J_{g^{-1}}(y)\right| = f_X\left(g^{-1}(y)\right)\left|\det J_g\left(g^{-1}(y)\right)\right|^{-1}$.

Moment generating function: $X \sim f_X(x) \Longrightarrow M_X(t) = E\left(\exp(t'X)\right)$. Properties:
1. If $Y = AX + b \Longrightarrow M_Y(t) = \exp(t'b)\,M_X(A't)$
2. If $X, Y$ independent $\Longrightarrow M_{X+Y}(t) = M_X(t)\,M_Y(t)$
3. $M_X'(0) = E(X)$, $M_X''(0) = E(X^2)$, ..., $M_X^{(r)}(0) = E(X^r)$
4. If $\{X_n\}$ is a sequence of r.v. and $M_{X_n}(t) \to M_X(t) \Longrightarrow X_n \to_d X$

Order statistics: Let $X_1, \ldots, X_n \sim_{i.i.d.} f_X(x)$ with cdf $F_X(x)$, and let $X_{(k)}$ be the $k$-th order statistic (from smallest to largest). Then
$$f_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\,F_X(x)^{k-1}\left(1 - F_X(x)\right)^{n-k} f_X(x).$$
Special cases: $f_{X_{(1)}}(x) = n\left(1 - F_X(x)\right)^{n-1} f_X(x)$ and $f_{X_{(n)}}(x) = n\,F_X(x)^{n-1} f_X(x)$.

Taylor expansion: $f(x) - f(x_0) \simeq \sum_{i=1}^{k} f^{(i)}(x_0)\,\frac{(x-x_0)^i}{i!}$ with $f^{(i)}(x_0) = \frac{\partial^i f(x_0)}{\partial x^i}$. If $f$ is multivariate and we expand up to $k = 2$,
$$f(x) - f(x_0) \simeq J_f(x_0)(x - x_0) + \tfrac{1}{2}(x - x_0)^T H_f(x_0)(x - x_0),$$
with $J_f$ the Jacobian and $H_f$ the Hessian.

Properties of the exponential function
• $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$
• If $a_n \to a \Longrightarrow \lim_{n\to\infty}\left(1 + \frac{a_n}{n}\right)^n = e^a$

2 Most Important Distributions

• Bernoulli$(p)$: support $\{0,1\}$; pdf $p^x(1-p)^{1-x}$; $E(X) = p$; $V(X) = p(1-p)$; $M_X(t) = pe^t + 1 - p$.
• Binomial $B(n,p)$: support $\{0,1,\ldots,n\}$; pdf $\binom{n}{x}p^x(1-p)^{n-x}$; $E(X) = np$; $V(X) = np(1-p)$; $M_X(t) = (pe^t + 1 - p)^n$.
• Poisson$(\lambda)$: support $\{0,1,2,\ldots\}$; pdf $e^{-\lambda}\frac{\lambda^x}{x!}$; $E(X) = \lambda$; $V(X) = \lambda$; $M_X(t) = \exp\left(\lambda(e^t - 1)\right)$.
• Geometric$(p)$: support $\{1,2,3,\ldots\}$; pdf $p(1-p)^{x-1}$; cdf $1 - (1-p)^x$; $E(X) = \frac{1}{p}$; $V(X) = \frac{1-p}{p^2}$; $M_X(t) = \frac{pe^t}{1-(1-p)e^t}$ for $t < -\ln(1-p)$.
• Exponential $\exp(\lambda)$: support $\mathbb{R}_+$; pdf $\frac{1}{\lambda}e^{-x/\lambda}$; cdf $1 - e^{-x/\lambda}$; $E(X) = \lambda$; $V(X) = \lambda^2$; $M_X(t) = (1 - \lambda t)^{-1}$.
• Gamma $\Gamma(\alpha,\beta)$: support $\mathbb{R}_+$; pdf $\frac{1}{\Gamma(\alpha)\beta^{\alpha}}x^{\alpha-1}e^{-x/\beta}$ if $x > 0$; $E(X) = \alpha\beta$; $V(X) = \alpha\beta^2$; $M_X(t) = (1 - \beta t)^{-\alpha}$.
• Chi-squared $\chi^2(k)$: support $\mathbb{R}_+$; pdf $\frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}e^{-x/2}$ if $x > 0$; $E(X) = k$; $V(X) = 2k$; $M_X(t) = (1 - 2t)^{-k/2}$.
• Normal $N(\mu,\sigma^2)$: support $\mathbb{R}$; pdf $\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right)$; $E(X) = \mu$; $V(X) = \sigma^2$; $M_X(t) = \exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right)$.
• Multivariate normal $N(\mu,\Sigma)$: support $\mathbb{R}^n$; pdf $\frac{1}{(2\pi)^{n/2}\det(\Sigma)^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)\right)$; $E(X) = \mu$; $V(X) = \Sigma$; $M_X(t) = \exp\left(\mu' t + \frac{1}{2}t'\Sigma t\right)$.
• Student's $t(p)$: support $\mathbb{R}$; pdf $\frac{\Gamma\left(\frac{p+1}{2}\right)}{\Gamma\left(\frac{p}{2}\right)\sqrt{p\pi}}\left(1 + \frac{x^2}{p}\right)^{-\frac{p+1}{2}}$; $E(X) = 0$; $V(X) = \frac{p}{p-2}$; the MGF does not exist.
• $F(d_1,d_2)$: support $\mathbb{R}_+$; pdf $\frac{1}{x\,B\left(\frac{d_1}{2},\frac{d_2}{2}\right)}\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1+d_2}}}$; $E(X) = \frac{d_2}{d_2-2}$; $V(X) = \frac{2d_2^2(d_1+d_2-2)}{d_1(d_2-2)^2(d_2-4)}$; the MGF does not exist.
• Beta $B(\alpha,\beta)$: support $[0,1]$; pdf $\frac{1}{B(\alpha,\beta)}x^{\alpha-1}(1-x)^{\beta-1}$; $E(X) = \frac{\alpha}{\alpha+\beta}$; $V(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$.

2.1 Some Special Properties of Distributions

Poisson: If $X_i \sim \text{Poisson}(\lambda_i)$ independent, then $\sum_{i=1}^{n} X_i \sim \text{Poisson}\left(\sum_{i=1}^{n}\lambda_i\right)$.

Binomial: If $X \sim B(n,p)$, $Y \sim B(m,p)$ and $X$ indep. of $Y \Longrightarrow X + Y \sim B(m+n, p)$.

Exponential
• If $X \sim \exp(\lambda) \Longrightarrow X \sim \Gamma(1,\lambda)$
• If $X_i \sim \exp(\lambda)$ independent $\Longrightarrow \sum_{i=1}^{n} X_i \sim \Gamma(n,\lambda)$
• If $X \sim \exp(\lambda)$ and $\alpha > 0 \Longrightarrow \alpha X \sim \exp(\alpha\lambda)$

Geometric
• If we conduct an infinite number of i.i.d. Bernoulli trials (indexed 1, 2, 3, ...), all with success probability $p$, the number of the trial on which we observe the first success has a geometric distribution with parameter $p$.

Chi-Squared
• If $Y_i \sim \chi^2_{k_i}$ independent, then $\sum_{i=1}^{n} Y_i \sim \chi^2\left(\sum_{i=1}^{n} k_i\right)$
• If $X_i \sim N(0,1)$ independent, then $\sum_{i=1}^{n} X_i^2 \sim \chi^2(n)$
• If $X \sim \chi^2_n \Longrightarrow X \sim \Gamma\left(\frac{n}{2}, 2\right)$

Gamma
• Gamma function: $\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1}e^{-t}\,dt$. It satisfies:
  1. $\Gamma(\alpha + 1) = \alpha\Gamma(\alpha)$
  2. $\Gamma(n) = (n-1)!$ if $n \in \mathbb{N}$
  3. $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$
• If $X_i \sim \Gamma(\alpha_i, \theta)$ indep. $\Longrightarrow \sum_i X_i \sim \Gamma\left(\sum_i \alpha_i, \theta\right)$
• If $X \sim \Gamma(\alpha,\theta)$ and $\phi > 0 \Longrightarrow \phi X \sim \Gamma(\alpha, \phi\theta)$
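Identities like the ones above are easy to sanity-check by simulation. The sketch below is a minimal check, assuming NumPy and SciPy are available; the replication count and parameter values are arbitrary illustration choices. It draws sums of i.i.d. exponentials and of squared standard normals and compares them with the corresponding Gamma and chi-squared distributions via a Kolmogorov-Smirnov test (a large p-value indicates agreement).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sim = 100_000          # number of Monte Carlo replications (arbitrary)
n, lam = 5, 2.0          # number of summands and exponential scale (arbitrary)

# Sum of n iid exp(lambda) draws (scale parameterization, mean = lambda)
exp_sums = rng.exponential(scale=lam, size=(n_sim, n)).sum(axis=1)
# Should be Gamma(alpha = n, beta = lam)
print(stats.kstest(exp_sums, stats.gamma(a=n, scale=lam).cdf))

# Sum of n squared N(0,1) draws should be chi-squared with n degrees of freedom
chi_sums = (rng.standard_normal(size=(n_sim, n)) ** 2).sum(axis=1)
print(stats.kstest(chi_sums, stats.chi2(df=n).cdf))
```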
Normal Distribution
• If $X_i \sim N(\mu_i, \sigma_i^2)$ independent, then $\sum_i X_i \sim N\left(\sum_i \mu_i, \sum_i \sigma_i^2\right)$
• If $X \sim N(\mu, \sigma^2)$ and $a, b \in \mathbb{R} \Longrightarrow aX + b \sim N(a\mu + b, a^2\sigma^2)$
• If $X \sim N(\mu, \Sigma) \in \mathbb{R}^n \Longrightarrow$ the marginals are $X_i \sim N(\mu_i, \Sigma_{ii})$
• If $X \sim N(\mu, \Sigma) \Longrightarrow$ any subvector $X_k$ is multivariate normal
• If $X \sim N(\mu, \Sigma) \Longrightarrow AX + b \sim N(A\mu + b, A\Sigma A')$

Student's t
• Defined as $t = \frac{N(0,1)}{\sqrt{\chi^2(n)/n}} \sim t(n)$, with independent numerator and denominator
• As $n \to \infty \Longrightarrow t(n) \to N(0,1)$

F-distribution (Snedecor)
• Defined as $F(d_1, d_2) = \frac{\chi^2(d_1)/d_1}{\chi^2(d_2)/d_2}$, with independent numerator and denominator

Beta ($B(\alpha,\beta)$)
• Beta function: $B(\alpha,\beta) \equiv \int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du$
• If $X \sim \Gamma(\alpha_X, \theta)$ and $Y \sim \Gamma(\alpha_Y, \theta)$ independent $\Longrightarrow \frac{X}{X+Y} \sim B(\alpha_X, \alpha_Y)$
• If $X \sim U[0,1] \Longrightarrow X^2 \sim B\left(\frac{1}{2}, 1\right)$

3 Probability Limits

Almost sure convergence: $X_n \to_{a.s.} X$ almost surely $\iff \Pr\left(\omega : X_n(\omega) \to X(\omega)\right) = 1$
Convergence in probability: $\operatorname{plim}_{n\to\infty} X_n = X \iff$ for all $\varepsilon > 0$, $\lim_{n\to\infty}\Pr\left(|X_n - X| < \varepsilon\right) = 1$
Convergence in distribution: $X_n \to_d X \iff \lim_{n\to\infty} F_{X_n}(x) = F_X(x)$ at every continuity point $x$ of $F_X$
Convergence in quadratic mean: $X_n \to_{q.m.} X \iff E(X_n - X)^2 \to 0$

Implications: almost sure convergence $\Longrightarrow$ convergence in probability $\Longrightarrow$ convergence in distribution; convergence in quadratic mean $\Longrightarrow$ convergence in probability.

Slutsky's Theorem: Let $\{X_n\}, \{Y_n\}$ be sequences of r.v.
1. If $X_n \to_p X$, $Y_n \to_p Y \Longrightarrow X_n Y_n \to_p XY$
2. If $X_n \to_p X$, $Y_n \to_p Y \Longrightarrow X_n + Y_n \to_p X + Y$
3. If $X_n \to_p X$ and $g(x)$ is a continuous function $\Longrightarrow g(X_n) \to_p g(X)$
4. If $X_n \to_{a.s.} X$ and $g(x)$ is a continuous function $\Longrightarrow g(X_n) \to_{a.s.} g(X)$
5. If $X_n \to_d X$ and $g(x)$ is a continuous function $\Longrightarrow g(X_n) \to_d g(X)$
6. If $X_n \to_p c$, with $c$ a constant, and $Y_n \to_d Y \Longrightarrow X_n Y_n \to_d cY$

Markov's Inequality: Let $X$ be a random variable and $g(X)$ a nonnegative function. Then for all $r > 0$, $\Pr\left(g(X) \geq r\right) \leq \frac{1}{r}E\left(g(X)\right)$.

Laws of Large Numbers: Let $\{X_n\}$ be an i.i.d. random sequence with $E(X) = \mu < \infty$ and $V(X) = \sigma^2 < \infty$. Then
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to_p \mu \ \text{(weak law of large numbers)} \qquad \text{and} \qquad \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to_{a.s.} \mu \ \text{(strong law of large numbers)}.$$

Central Limit Theorem: Let $\{X_n\}$ be an i.i.d. random sequence with $E(X) = \mu < \infty$ and $V(X) = \Sigma$. Then
$$\sqrt{n}\left(\bar{X}_n - \mu\right) \to_d N(0, \Sigma).$$

Delta method:
• Univariate: Suppose $\sqrt{n}\left(\hat{\theta}_n - \theta\right) \to_d N(0, \sigma^2)$ and $g(x)$ is a differentiable function. Then
  1. If $g'(\theta) \neq 0 \Longrightarrow \sqrt{n}\left(g(\hat{\theta}_n) - g(\theta)\right) \to_d N\left(0, (g'(\theta))^2\sigma^2\right)$
  2. If $g'(\theta) = 0$ and $g \in C^2 \Longrightarrow n\left(g(\hat{\theta}_n) - g(\theta)\right) \to_d \sigma^2\,\frac{g''(\theta)}{2}\,\chi^2_1$
• Multivariate: Let $\hat{\theta}_n \in \mathbb{R}^k$ be a sequence of r.v. and $g : \mathbb{R}^k \to \mathbb{R}^m$ a differentiable function at $x = \theta$. Then if $\sqrt{n}\left(\hat{\theta}_n - \theta\right) \to_d N(0, \Sigma)$ and $J_g(\theta) \neq 0 \Longrightarrow \sqrt{n}\left(g(\hat{\theta}_n) - g(\theta)\right) \to_d N\left(0, J_g(\theta)\,\Sigma\,J_g(\theta)'\right)$, with $J_g(\theta)$ the Jacobian of $g$ at $\theta$.

4 Basic Properties of Statistics

Unbiasedness: $\hat{\theta}(X)$ is unbiased $\iff E_\theta\left(\hat{\theta}(X)\right) = \theta$
Consistency: $\hat{\theta}(X)$ is consistent $\iff \hat{\theta}(X) \to_p \theta$
Consistency in MSE (mean squared error): $\hat{\theta}(X)$ is consistent in MSE $\iff MSE\left(\hat{\theta}\right) \equiv E\left(\hat{\theta}(X) - \theta\right)^2 \equiv V\left(\hat{\theta}(X)\right) + \left(E\left(\hat{\theta}(X)\right) - \theta\right)^2 \to 0$ as $n \to \infty$

Properties:
1. $\hat{\theta}(X)$ is consistent in MSE $\iff E\left(\hat{\theta}(X)\right) \to \theta$ and $V\left(\hat{\theta}(X)\right) \to 0$.
2. If $\hat{\theta}(X)$ is consistent in MSE $\Longrightarrow$ it is consistent.
3. $\bar{X}$ is consistent in MSE for $E(X)$.

Asymptotic Normality: $\hat{\theta}(X)$ is asymptotically normal $\iff \sqrt{n}\left(\hat{\theta}(X) - \theta\right) \to_d N\left(0, V_{\hat{\theta}}\right)$.

Cramer-Rao Inequality: Let $W(X)$ be an unbiased estimator of $\theta$ with finite variance for all $\theta$ that satisfies $\frac{d}{d\theta}E_\theta\left(W(X)\right) = \int W(x)\frac{\partial}{\partial\theta}f(x \mid \theta)\,dx$. Then
$$V_\theta\left(W(X)\right) \geq \frac{1}{n\,E_\theta\left[\left(\frac{\partial}{\partial\theta}\ln f(X \mid \theta)\right)^2\right]} = J_n(\theta)^{-1},$$
with $J_n(\theta)^{-1}$ being the Cramer-Rao bound.

4.1 Properties of Mean and Variance

• $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $S_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ are unbiased estimators of $E(X)$ and $V(X)$, respectively
• Both are asymptotically normal
• If $X_i \sim N(\mu, \sigma^2)$ then
  1. $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
  2. $\frac{(n-1)}{\sigma^2}S_X^2 \sim \chi^2_{n-1}$
  3. $\bar{X}$ and $S_X^2$ are independent
  4. $\frac{\bar{X} - \mu}{\sqrt{S_X^2/n}} \sim t(n-1)$
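A minimal simulation sketch of these facts, assuming NumPy and SciPy are available (sample size, replication count, and parameter values are arbitrary illustration choices): it checks the unbiasedness of $S_X^2$ and compares the studentized mean and the scaled sample variance with the $t(n-1)$ and $\chi^2_{n-1}$ distributions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_rep, n, mu, sigma = 50_000, 10, 2.0, 3.0   # arbitrary illustration settings

x = rng.normal(mu, sigma, size=(n_rep, n))   # n_rep independent normal samples of size n
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)                   # unbiased sample variance S^2

print(s2.mean(), sigma**2)                   # average S^2 should be close to sigma^2

t_stat = (xbar - mu) / np.sqrt(s2 / n)       # studentized mean, should be t(n-1)
print(stats.kstest(t_stat, stats.t(df=n - 1).cdf))

scaled_var = (n - 1) * s2 / sigma**2         # should be chi-squared with n-1 d.o.f.
print(stats.kstest(scaled_var, stats.chi2(df=n - 1).cdf))
```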
4.2 Method of Moments Estimator

Let $X_i \sim_{i.i.d.} f(X \mid \theta)$ with $\theta \in \mathbb{R}^K$.
• Calculate $E(X^r) = f_r(\theta)$ for all $r = 1, 2, \ldots, K$
• Replace $E(X^r)$ with the sample moment $\overline{X^r} = \frac{1}{n}\sum_{i=1}^{n} X_i^r$ (i.e. set $\overline{X^r} = f_r(\theta)$)
• Solve the system of $K$ equations in $K$ unknowns for $\theta$.

If the inverse $f^{-1}$ of the map $f(\theta) = \left(f_1(\theta), f_2(\theta), \ldots, f_K(\theta)\right)$ exists and is continuous, then $\hat{\theta}_{MM}$ is consistent. If $f^{-1}$ is differentiable, then by the delta method it is asymptotically normal.

5 Properties of Maximum Likelihood Estimation

Invariance: If $\hat{\theta}_{MLE}$ is the MLE of $\theta$, then $\tau\left(\hat{\theta}_{MLE}\right)$ is the MLE of $\tau(\theta)$ for any function $\tau$.

Regularity conditions:
1. $X_1, X_2, \ldots, X_n$ i.i.d. with $X_i \sim f(x \mid \theta)$
2. Identifiability: if $\theta \neq \theta' \Longrightarrow f(x \mid \theta) \neq f(x \mid \theta')$
3. $f(x \mid \theta)$ has support that does not depend on $\theta$
4. The true parameter $\theta_0$ is interior to $\Theta$
5. $\frac{\partial^3 f(x \mid \theta)}{\partial\theta^3}$ exists and is continuous
6. For every $\theta_0$ there exist a function $M_{\theta_0}(x)$ with $E_\theta\left(M_{\theta_0}(X)\right) < \infty$ and a constant $c_{\theta_0} > 0$ such that for all $x$ and all $\theta \in (\theta_0 - c_{\theta_0}, \theta_0 + c_{\theta_0})$ we have $\left|\frac{\partial^3 \ln f(x \mid \theta)}{\partial\theta^3}\right| \leq M_{\theta_0}(x)$

Information equality: Under the regularity conditions,
$$E_\theta\left[\left(\frac{\partial}{\partial\theta}\ln f(X \mid \theta)\right)^2\right] = -E\left[\frac{\partial^2 \ln f(X \mid \theta)}{\partial\theta^2}\right] \equiv I_1(\theta), \qquad \text{so } J_n(\theta) = nI_1(\theta).$$

Asymptotic properties of MLE:
1. $\hat{\theta}_{MLE}$ is consistent for $\theta$
2. Asymptotic normality: $\sqrt{n}\left(\hat{\theta}_{MLE} - \theta\right) \to_d N\left(0, I_1(\theta)^{-1}\right)$
3. Asymptotic efficiency: asymptotically attains the Cramer-Rao bound

6 Testing

Type 1 error: reject $H_0 \mid H_0$ true
Type 2 error: do not reject $H_0 \mid H_0$ false
Power of a test: $\Pr\left(\text{reject } H_0 \mid H_0 \text{ false}\right)$. Power function: $P(\theta) = \Pr\left(\text{reject } H_0 \mid \theta\right)$

Neyman-Pearson Lemma: Consider the test $H_0 : \theta = \theta_0$ vs. $H_1 : \theta = \theta_1$. Then there exists a UMP test, with rejection region
$$\text{reject} \iff \frac{f(X \mid \theta_1)}{f(X \mid \theta_0)} > k.$$

Mean test:
1. $H_0 : \mu = \mu_0$ vs. $H_1 : \mu > \mu_0$. Use the test statistic
$$t = \frac{\sqrt{n}\left(\bar{X} - \mu_0\right)}{S_X}$$
and reject $\iff t > K$. If $X \sim N(\mu, \sigma^2)$, reject if $t > t_{1-\alpha}(n-1)$, the $1-\alpha$ quantile of the t-distribution with $n-1$ degrees of freedom. If $X$ is not normal, reject if $t > z_{1-\alpha}$, with $z_{1-\alpha}$ the $1-\alpha$ quantile of $N(0,1)$.
2. $H_0 : \mu = \mu_0$ vs. $H_1 : \mu \neq \mu_0$: use the same statistic. If $X \sim N(\mu, \sigma^2)$, reject if $t \notin \left(-t_{1-\alpha/2}(n-1),\, t_{1-\alpha/2}(n-1)\right)$. If not normal, reject if $t \notin \left(-z_{1-\alpha/2},\, z_{1-\alpha/2}\right)$.

Variance test: Suppose $X \sim N(\mu, \sigma^2)$ and we want to test $H_0 : \sigma^2 = \sigma_0^2$ vs. $H_1 : \sigma^2 < \sigma_0^2$. Use the test statistic
$$\chi^2 = \frac{(n-1)}{\sigma_0^2}S_X^2 \sim \chi^2_{n-1} \ \text{under the null},$$
and reject if $\chi^2 < \chi^2_{n-1}(\alpha)$. If we test $H_0 : \sigma^2 = \sigma_0^2$ vs. $H_1 : \sigma^2 \neq \sigma_0^2$, use the same statistic and reject if $\chi^2 \notin \left(\chi^2_{n-1}\left(\frac{\alpha}{2}\right),\, \chi^2_{n-1}\left(1 - \frac{\alpha}{2}\right)\right)$.

Some critical values:
$z_{0.9} = 1.2816$, $z_{0.95} = 1.6449$, $z_{0.975} = 1.9600$, $z_{0.99} = 2.3263$, $z_{0.995} = 2.5758$
and for $\chi^2_1$: $\chi^2_1(0.95) = 3.8415$, $\chi^2_1(0.975) = 5.0239$, $\chi^2_1(0.99) = 6.6349$, $\chi^2_1(0.05) = 0.0039$, $\chi^2_1(0.01) = 0.00015$
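A minimal sketch of the one-sided mean test and the lower-tailed variance test above, assuming NumPy and SciPy; the data vector x and the null values mu0 and sigma0_sq are hypothetical illustration choices.

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.8, 6.0, 5.4, 4.9, 5.7, 5.2, 5.5])   # hypothetical data
n = len(x)
alpha = 0.05

# Mean test: H0: mu = mu0 vs H1: mu > mu0, reject if t > t_{1-alpha}(n-1)
mu0 = 5.0
t = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)
t_crit = stats.t.ppf(1 - alpha, df=n - 1)
print(t, t_crit, t > t_crit)

# Variance test: H0: sigma^2 = sigma0^2 vs H1: sigma^2 < sigma0^2,
# reject if chi2 < chi2_{n-1}(alpha)
sigma0_sq = 0.5
chi2 = (n - 1) * x.var(ddof=1) / sigma0_sq
chi2_crit = stats.chi2.ppf(alpha, df=n - 1)
print(chi2, chi2_crit, chi2 < chi2_crit)
```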
6.1 MLE Tests

We want to test $H_0 : \theta = \theta_0$ vs. $H_1 : \theta \neq \theta_0$.

Wald test
• Univariate: $\frac{\hat{\theta}_{MLE} - \theta_0}{\sqrt{I_1(\theta_0)^{-1}/n}} \sim N(0,1)$; do a test like the mean test. Or equivalently, $n\left(\hat{\theta}_{MLE} - \theta_0\right)^2 I_1\left(\hat{\theta}_{MLE}\right) \approx \chi^2_1$.
• Multivariate: $n\left(\hat{\theta}_{MLE} - \theta_0\right)' I_1\left(\hat{\theta}\right)\left(\hat{\theta}_{MLE} - \theta_0\right) \sim \chi^2_p$ with $p$ = number of parameters.

LR Test
• Let $\lambda(X) = \frac{L(\theta_0)}{L\left(\hat{\theta}_{MLE}\right)} = \frac{f(X \mid \theta_0)}{\max_{\theta\in\Theta} f(X \mid \theta)}$. Then $-2\ln\lambda(X) \approx \chi^2_1$ and the rejection region is $-2\ln\lambda(X) > \chi^2_1(1-\alpha)$.
• In general, if we test $H_0 : \theta \in \Theta_0$ vs. $H_1 : \theta \notin \Theta_0$, use the likelihood ratio $\lambda(X) = \frac{\max_{\theta\in\Theta_0} f(X \mid \theta)}{\max_{\theta\in\Theta} f(X \mid \theta)} = \frac{L\left(\hat{\theta}_{RESTRICTED}\right)}{L\left(\hat{\theta}_{MLE}\right)}$, and then $-2\ln\lambda(X) \approx \chi^2_p$ with $p$ = number of restrictions.

LM test (score test)
• Univariate: $\frac{1}{\sqrt{n}}\frac{s(\theta_0)}{\sqrt{I_1(\theta_0)}} \approx N(0,1)$, where $s(\theta) = \frac{\partial}{\partial\theta}\ln L(\theta)$ is the score; do a test like the mean test. Equivalently, $\frac{1}{n}s(\theta_0)^2\, I_1(\theta_0)^{-1} \approx \chi^2_1$.
• Multivariate: $\frac{1}{n}s(\theta_0)'\, I_1(\theta_0)^{-1} s(\theta_0) \sim \chi^2_p$ with $p$ = number of parameters.

MIT OpenCourseWare, http://ocw.mit.edu
14.381 Statistical Method in Economics, Fall 2013
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms