LECTURE 23

• Readings: Section 9.1 (not responsible for t-based confidence intervals, pp. 471-473)

[Figure: Bayesian inference block diagram — a random Θ ∼ pΘ(θ) is fed through the model pX|Θ(x | θ) to produce the observation X; an estimator maps X to Θ̂. Example: Θ ∈ {0, 1}, X = Θ + W, W ∼ fW(w); example prior density fΘ(θ) = 1/6 for 4 ≤ θ ≤ 10.]

[Figure: classical inference block diagram — an unknown constant θ selects the model pX(x; θ), which produces the observation X; an estimator maps X to Θ̂.]

• Outline
– Classical statistics
– Maximum likelihood (ML) estimation
– Estimating a sample mean
– Confidence intervals (CIs)
– CIs using an estimated variance

Classical statistics

• Model, with unknown parameter(s): X ∼ pX(x; θ)
– also for vectors X and θ: pX1,...,Xn(x1, . . . , xn; θ1, . . . , θm)
• These are NOT conditional probabilities; θ is NOT random
– mathematically: many models, one for each possible value of θ
• Problem types:
– Hypothesis testing: H0 : θ = 1/2 versus H1 : θ = 3/4
– Composite hypotheses: H0 : θ = 1/2 versus H1 : θ ≠ 1/2
– Estimation: design an estimator Θ̂, to keep the estimation error Θ̂ − θ small

Maximum Likelihood Estimation

• Model, with unknown parameter(s): X ∼ pX(x; θ)
• Pick the θ that "makes the data most likely":
θ̂ML = arg maxθ pX(x; θ)
• Compare to Bayesian MAP estimation:
θ̂MAP = arg maxθ pΘ|X(θ | x) = arg maxθ pX|Θ(x | θ) pΘ(θ) / pX(x)
• Example: X1, . . . , Xn i.i.d. exponential(θ):
maxθ ∏(i=1 to n) θ e^(−θxi), or equivalently (taking logs) maxθ [n log θ − θ(x1 + · · · + xn)]
θ̂ML = n/(x1 + · · · + xn), i.e., Θ̂n = n/(X1 + · · · + Xn)
(a numerical sketch of this example appears at the end of these notes)

Desirable properties of estimators (should hold FOR ALL θ!)

• Unbiased: E[Θ̂n] = θ
– exponential example, with n = 1: E[1/X1] = ∞ ≠ θ (biased)
• Consistent: Θ̂n → θ (in probability)
– exponential example: (X1 + · · · + Xn)/n → E[X] = 1/θ
– can use this to show that Θ̂n = n/(X1 + · · · + Xn) → 1/E[X] = θ
• "Small" mean squared error (MSE):
E[(Θ̂ − θ)²] = var(Θ̂ − θ) + (E[Θ̂ − θ])² = var(Θ̂) + (bias)²

Estimate a mean

• X1, . . . , Xn: i.i.d., mean θ, variance σ²
Xi = θ + Wi, with Wi: i.i.d., mean 0, variance σ²
• Θ̂n = sample mean = Mn = (X1 + · · · + Xn)/n
• Properties, for all θ:
– E[Θ̂n] = θ (unbiased)
– WLLN: Θ̂n → θ (consistent)
– MSE: E[(Θ̂n − θ)²] = σ²/n
• The sample mean often turns out to also be the ML estimate, e.g., if the Xi are i.i.d. N(θ, σ²)

Confidence intervals (CIs)

• An estimate Θ̂n by itself may not be informative enough
• A 1 − α confidence interval is a (random) interval [Θ̂n−, Θ̂n+] such that
P(Θ̂n− ≤ θ ≤ Θ̂n+) ≥ 1 − α, for all θ
– often α = 0.05, 0.025, or 0.01
– the interpretation is subtle: θ is an unknown constant, and it is the interval that is random
• CI in estimation of the mean, with Θ̂n = (X1 + · · · + Xn)/n:
– from the normal tables: Φ(1.96) = 1 − 0.05/2
P( |Θ̂n − θ| / (σ/√n) ≤ 1.96 ) ≈ 0.95 (by the CLT)
P( Θ̂n − 1.96σ/√n ≤ θ ≤ Θ̂n + 1.96σ/√n ) ≈ 0.95
• More generally, let z be such that Φ(z) = 1 − α/2:
P( Θ̂n − zσ/√n ≤ θ ≤ Θ̂n + zσ/√n ) ≈ 1 − α
(a simulation sketch of interval coverage appears at the end of these notes)

The case of unknown σ

• Option 1: use an upper bound on σ
– if Xi is Bernoulli: σ ≤ 1/2
• Option 2: use an ad hoc estimate of σ
– if Xi is Bernoulli(θ): σ̂ = √(Θ̂(1 − Θ̂))
• Option 3: use a generic estimate of the variance
– start from σ² = E[(Xi − θ)²]:
σ̂n² = (1/n) Σ(i=1 to n) (Xi − θ)² → σ² (but we do not know θ)
Ŝn² = (1/(n − 1)) Σ(i=1 to n) (Xi − Θ̂n)² → σ² (unbiased: E[Ŝn²] = σ²)
(sketches of all three options appear at the end of these notes)
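• Numerical sketch of the exponential ML example: a minimal illustration in Python/NumPy of the estimator Θ̂n = n/(X1 + · · · + Xn); the values θ = 2.0 and n = 10,000 are arbitrary choices, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative choices: true rate theta and sample size n.
theta = 2.0
n = 10_000

# X_1, ..., X_n i.i.d. exponential(theta); NumPy parametrizes the
# exponential by its mean 1/theta rather than by the rate theta.
x = rng.exponential(scale=1.0 / theta, size=n)

# ML estimator from the lecture: Theta_hat_n = n / (X_1 + ... + X_n).
theta_ml = n / x.sum()

# Sanity check: brute-force maximization of the log-likelihood
# n*log(theta) - theta*sum(x_i) over a grid of candidate rates.
grid = np.linspace(0.1, 5.0, 2000)
loglik = n * np.log(grid) - grid * x.sum()

print(theta_ml)                 # close to 2.0 for large n (consistency)
print(grid[np.argmax(loglik)])  # agrees with theta_ml up to grid spacing
```

The grid search is only a check: for this model the arg max is available in closed form, as derived above.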
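• Simulation sketch of CI coverage, assuming normal data with known σ (the values θ = 5, σ = 2, n = 100 are made up for illustration). It shows the subtle interpretation: θ is fixed, the interval is random, and roughly a 0.95 fraction of the random intervals cover θ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up illustrative values: true mean, known std deviation, sample size.
theta, sigma, n = 5.0, 2.0, 100
z = 1.96  # from the normal tables: Phi(1.96) = 1 - 0.05/2

half_width = z * sigma / np.sqrt(n)

# Repeat the experiment many times and count how often the random
# interval [Theta_hat_n - z*sigma/sqrt(n), Theta_hat_n + z*sigma/sqrt(n)]
# covers the fixed, non-random theta.
trials = 10_000
covered = 0
for _ in range(trials):
    theta_hat = rng.normal(loc=theta, scale=sigma, size=n).mean()
    if theta_hat - half_width <= theta <= theta_hat + half_width:
        covered += 1

print(covered / trials)  # approximately 0.95
```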
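• Sketch of the three options for unknown σ, on Bernoulli(θ) data (θ = 0.3 and n = 1000 are arbitrary choices). Option 1 gives the widest, most conservative interval; for Bernoulli data, options 2 and 3 are nearly identical, since Ŝn² = (n/(n−1)) Θ̂(1 − Θ̂).

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary illustrative choices: success probability and sample size.
theta, n = 0.3, 1000
z = 1.96  # 95% interval: Phi(1.96) = 1 - 0.05/2

x = rng.binomial(1, theta, size=n).astype(float)
theta_hat = x.mean()  # sample mean = estimate of theta

# Option 1: conservative upper bound sigma <= 1/2 for Bernoulli data.
sigma_1 = 0.5
# Option 2: ad hoc plug-in estimate sigma_hat = sqrt(theta_hat*(1 - theta_hat)).
sigma_2 = np.sqrt(theta_hat * (1 - theta_hat))
# Option 3: generic unbiased variance estimate S_n^2; ddof=1 gives
# the (n - 1) divisor from the lecture.
sigma_3 = np.sqrt(x.var(ddof=1))

for sigma in (sigma_1, sigma_2, sigma_3):
    hw = z * sigma / np.sqrt(n)
    print(theta_hat - hw, theta_hat + hw)
```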