LECTURE 21
Bayesian statistical inference

• Readings: Sections 8.1-8.2

"It is the mark of truly educated people to be deeply moved by statistics." (Oscar Wilde)

Types of Inference models/approaches

• Model building versus inferring unknown variables. E.g., assume X = aS + W
  – Model building: know "signal" S, observe X, infer a
  – Estimation in the presence of noise: know a, observe X, estimate S
• Hypothesis testing: unknown takes one of few possible values; aim at small probability of incorrect decision
• Estimation: aim at a small estimation error

[Block diagram: Reality, described by pX(x; θ) or pX|Θ(x | θ), generates data X; an Estimator maps X to an estimate Θ̂.]

Sample Applications

• Polling
  – Design of experiments/sampling methodologies
  – Lancet study on Iraq death toll
• Medical/pharmaceutical trials
• Data mining (e.g., customer arrivals, Poisson)
  – Netflix competition
• Finance
• Design & interpretation of experiments
  – polling, medical/pharmaceutical trials

Matrix Completion

• Partially observed matrix: goal is to predict the unobserved entries

[Figure: a ratings matrix (rows: objects) with entries in {1, ..., 5} and "?" marking unobserved entries.]
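The model-building vs. estimation distinction can be made concrete with a short simulation of the slide's model X = aS + W (a minimal sketch; the value a = 2 and the noise levels are illustrative choices, not from the lecture):

```python
import random

random.seed(0)

# Illustrative instance of the slide's model X = a*S + W
# (a_true = 2.0 and the noise scales are hypothetical choices).
a_true = 2.0
S = [random.gauss(0.0, 1.0) for _ in range(10_000)]   # known "signal"
X = [a_true * s + random.gauss(0.0, 0.5) for s in S]  # observed data

# Model building: know S, observe X, infer a by least squares,
# i.e. the a minimizing sum((x - a*s)^2).
a_hat = sum(s * x for s, x in zip(S, X)) / sum(s * s for s in S)

# Estimation: know a, observe X, estimate the signal (here s_hat = x / a).
s_hat = [x / a_true for x in X]

print(round(a_hat, 2))  # close to the true value 2.0
```

With many samples the least-squares estimate concentrates tightly around the true a; the same data support either inference task depending on which quantity is treated as unknown.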
Graph of S&P 500 index removed due to copyright restrictions.

• Classical statistics:
  – θ: unknown parameter (not a r.v.)
    ◦ E.g., θ = mass of electron
• Bayesian: use priors & Bayes rule

• Signal processing
  – Tracking, detection, speaker identification, ...

Bayesian inference: Use Bayes rule

• Hypothesis testing
  – discrete data: pΘ|X(θ | x) = pΘ(θ) pX|Θ(x | θ) / pX(x)
  – continuous data: pΘ|X(θ | x) = pΘ(θ) fX|Θ(x | θ) / fX(x)
  – Example: Θ ∈ {0, 1}, X = Θ + W, W ∼ fW(w)
• Estimation; continuous data:
  fΘ|X(θ | x) = fΘ(θ) fX|Θ(x | θ) / fX(x)

• What is the Bayesian approach?
  – Want to find fΘ|X(θ | x)
  – Assume a prior on Θ (e.g., uniform)

Estimation with discrete data

• fΘ|X(θ | x) = fΘ(θ) pX|Θ(x | θ) / pX(x), where pX(x) = ∫ fΘ(θ′) pX|Θ(x | θ′) dθ′
• Example:
  – Coin with unknown parameter θ
  – Observe X heads in n tosses
• Trajectory example: Zt = Θ0 + tΘ1 + t²Θ2, Xt = Zt + Wt, t = 1, 2, ..., n
  – Bayes rule gives fΘ0,Θ1,Θ2|X1,...,Xn(θ0, θ1, θ2 | x1, ..., xn)

Example: object at unknown location

• Prior: fΘ(θ) = 1/6 for 4 ≤ θ ≤ 10 (uniform)
• Model: pX|Θ(x | θ) = P(sensor i "senses" the object | Θ = θ), determined by the distance from the object to sensor i
• Measurement: Y = X + W, W ∼ pW(w)

Output of Bayesian Inference

• Posterior distribution:
  – pmf pΘ|X(· | x) or pdf fΘ|X(· | x)
• If interested in a single answer:
  – Maximum a posteriori probability (MAP):
    ◦ pΘ|X(θ∗ | x) = maxθ pΘ|X(θ | x)
    ◦ fΘ|X(θ∗ | x) = maxθ fΘ|X(θ | x)
    ◦ minimizes probability of error; often used in hypothesis testing
  – Conditional expectation: E[Θ | X = x] = ∫ θ fΘ|X(θ | x) dθ
• Single answers can be misleading!

Least Mean Squares Estimation

• Estimation in the absence of information: find estimate c to minimize E[(Θ − c)²]
  – Optimal estimate: c = E[Θ]
  – Optimal mean squared error: E[(Θ − E[Θ])²] = Var(Θ)

LMS Estimation of Θ based on X

• Two r.v.'s Θ, X; we observe that X = x
  – new universe: condition on X = x
• E[(Θ − c)² | X = x] is minimized by c = E[Θ | X = x]

LMS Estimation w. several measurements

• Unknown r.v. Θ; observe values of r.v.'s X1, ..., Xn
• Best estimator: E[Θ | X1, ..., Xn]
• Can be hard to compute/implement
  – involves multi-dimensional integrals, etc.
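The coin example can be carried to completion: with a uniform prior on Θ and k heads in n tosses, the posterior fΘ|X(θ | k) is proportional to θ^k (1 − θ)^(n−k), a Beta(k + 1, n − k + 1) density. A minimal sketch comparing the MAP estimate (posterior mode) with the LMS estimate (posterior mean); the values n = 10, k = 7 are illustrative, not from the lecture:

```python
# Coin with unknown bias Theta, uniform prior on [0, 1]; observe k heads
# in n tosses (n = 10, k = 7 are hypothetical numbers for illustration).
n, k = 10, 7

# Closed forms for the Beta(k + 1, n - k + 1) posterior:
map_est = k / n              # posterior mode: the MAP estimate
lms_est = (k + 1) / (n + 2)  # posterior mean: E[Theta | X = k], the LMS estimate

# Grid search over the (unnormalized) posterior confirms the MAP value.
grid = [i / 1000 for i in range(1, 1000)]
post = [t ** k * (1 - t) ** (n - k) for t in grid]
map_grid = grid[post.index(max(post))]

print(map_est, map_grid)  # both 0.7
print(round(lms_est, 3))
```

Note that the two single-number summaries disagree (0.7 vs. ≈0.667), which is one reason single answers can be misleading: they compress the whole posterior into one point.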
• Optimality of E[Θ | X]: for any estimator g(·),
  ◦ E[(Θ − E[Θ | X = x])² | X = x] ≤ E[(Θ − g(x))² | X = x]
  ◦ E[(Θ − E[Θ | X])² | X] ≤ E[(Θ − g(X))² | X]
  ◦ E[(Θ − E[Θ | X])²] ≤ E[(Θ − g(X))²]
• E[Θ | X] minimizes E[(Θ − g(X))²] over all estimators g(·)

MIT OpenCourseWare
http://ocw.mit.edu

6.041SC Probabilistic Systems Analysis and Applied Probability
Fall 2013

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
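The LMS claims for the no-observation case can be sanity-checked by simulation. The sketch below uses a Uniform(4, 10) prior (mirroring the fΘ(θ) = 1/6 prior of the sensor example), for which E[Θ] = 7 and Var(Θ) = (10 − 4)²/12 = 3:

```python
import random

random.seed(1)

# Theta ~ Uniform(4, 10): E[Theta] = 7, Var(Theta) = (10 - 4)^2 / 12 = 3.
samples = [random.uniform(4, 10) for _ in range(200_000)]
mean = sum(samples) / len(samples)

def mse(c):
    """Empirical E[(Theta - c)^2] when the constant c is used as the estimate."""
    return sum((t - c) ** 2 for t in samples) / len(samples)

# c = E[Theta] beats nearby constants, and its MSE is (approximately) Var(Theta):
# algebraically, mse(c) = mse(mean) + (c - mean)^2.
assert mse(mean) < mse(mean - 0.5)
assert mse(mean) < mse(mean + 0.5)
print(round(mean, 1), round(mse(mean), 1))  # approximately 7.0 and 3.0
```

Shifting the estimate away from the sample mean by d increases the empirical MSE by exactly d², which is the finite-sample analogue of E[(Θ − c)²] = Var(Θ) + (c − E[Θ])².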