A Short Introduction to Bayesian Inference
Fredrik Lingvall
February 14, 2008


Outline

1 Statistical Inference
2 Probabilities and rules for manipulating them
3 Marginalization
4 Parameter Estimation
    Example: DC Level in Gaussian Noise
    Example: Deconvolution
    Example: Ultrasonic Array Imaging
5 Model Selection
    Example: DC Level in Gaussian Noise con't
    Example: NMR Data


The Scientific Method

Testable hypotheses (a theory, a model) lead to predictions; predictions are compared against observations (data); and statistical (plausible) inference, through hypothesis testing and parameter estimation, feeds the results back into the hypotheses.

Statistical inference is a tool for:
- assessing the plausibility of one or more competing models;
- estimating model parameters and their uncertainties.


Statistical Inference

Dictionary entry for "inference": a conclusion, a logical consequence drawn from certain premises.

Statistical inference is the process of inferring the truth of our theories of nature on the basis of incomplete information. The available information is always incomplete ⇒ our knowledge is probabilistic.

Normal (deductive) logic is two-valued: true or false, the endpoints 0 and 1. Bayesian (extended) logic uses the whole range from 0 to 1.
The basic desiderata of Bayesian probability theory:

Representation of degrees of plausibility by real numbers.

Qualitative correspondence with common sense:
1 New information supporting the truth of a proposition must increase the number representing its plausibility (continuously and monotonically).
2 The deductive limit must be obtained when appropriate.

Consistency:
1 If a conclusion can be reasoned out in many ways, then all of them must lead to the same result.
2 All information relevant to the question must be taken into account by the theory.
3 Equivalent states of knowledge must be represented by the same probability assignments.


p(A|B)

A real number measuring the plausibility of a proposition (hypothesis) A, given the information represented by the proposition B.

The sum rule:

  p(A|B) + p(Ā|B) = 1

The product rule:

  p(A, B|C) = p(A|C) p(B|A, C) = p(B|C) p(A|B, C)

  ⇒ p(A|B, C) = p(A|C) p(B|A, C) / p(B|C)

Bayes' rule!


The usual form of Bayes' rule:

  p(Hi|D, I) = p(Hi|I) p(D|Hi, I) / p(D|I)

where Hi is the hypothesis of interest and

  p(D|I) = Σ_i p(Hi|I) p(D|Hi, I)

is a normalization factor. We are often only interested in

  p(Hi|D, I) ∝ p(Hi|I) p(D|Hi, I)

where p(D|Hi, I) ≜ L(Hi) is the likelihood function.
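As a quick numerical illustration of the usual form of Bayes' rule, here is a sketch with three discrete hypotheses; the prior and likelihood values are made-up assumptions, not numbers from the slides:

```python
# Bayes' rule in its usual discrete form:
#   p(H_i | D, I) = p(H_i | I) * p(D | H_i, I) / p(D | I),
#   p(D | I)      = sum_i p(H_i | I) * p(D | H_i, I).
# The priors and likelihoods below are illustrative assumptions.

prior = [0.5, 0.3, 0.2]        # p(H_i | I) for three hypotheses
likelihood = [0.1, 0.4, 0.7]   # p(D | H_i, I) = L(H_i)

evidence = sum(p * l for p, l in zip(prior, likelihood))          # p(D | I)
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]

print(posterior)   # sums to 1; H_3 is now the most plausible
```

Note how the data reverse the prior ordering: the a priori least plausible hypothesis ends up with the largest posterior because its likelihood dominates.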
Continuous Parameters

Probability density function (PDF):

  p(H0|D, I) = lim_{δh→0} p(h ≤ H0 < h + δh | D, I) / δh

With W the proposition that H0 lies in the interval [a, b]:

  p(W|D, I) = ∫_a^b p(H0|D, I) dH0

In Bayesian inference a PDF is a measure of the state of knowledge of the hypotheses.


The Learning Rule

  p(A|D, I) ∝ p(A|I) p(D|A, I)

New data D2: the old posterior becomes the new prior, p(A|D, I) ⇒ p(A|I′), and

  p(A|D2, I′) ∝ p(A|I′) p(D2|A, I′)


  p(A|D, I) ∝ p(A|I) p(D|A, I)
  (posterior)  (prior) (likelihood)

[Figure: prior, likelihood, and posterior densities plotted against the parameter A.]


Marginalization

The extended sum rule:

  p(A + B|C) = p(A|C) + p(B|C) − p(A, B|C)

With A and B mutually exclusive (only one can be true):

  p(A + B|C) = p(A|C) + p(B|C)

Let (for simplicity) Ai be a discrete parameter; then

  p(A1 + A2 + ···|I) = p(A1|I) + p(A2|I) + ··· = 1


Using the product rule gives:

  p(ω, A1 + A2 + ···|D, I) = p(A1 + A2 + ···|I) p(ω|A1 + A2 + ···, D, I) = 1 × p(ω|D, I)

With the Ai mutually exclusive:

  p(ω, A1 + A2 + ···|D, I) = p(ω, A1|D, I) + p(ω, A2|D, I) + ··· = Σ_i p(ω, Ai|D, I)

so that

  p(ω|D, I) = Σ_i p(ω, Ai|D, I)
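The discrete marginalization rule can be exercised directly: summing a joint distribution over the mutually exclusive Ai recovers the marginal for ω. The joint probability table below is a made-up illustration:

```python
# Discrete marginalization: p(w | D, I) = sum_i p(w, A_i | D, I).
# The joint probabilities below are illustrative assumptions;
# rows are two values of w, columns the mutually exclusive A_i.

joint = [
    [0.10, 0.05, 0.05],   # p(w1, A_i | D, I) for i = 1, 2, 3
    [0.20, 0.25, 0.35],   # p(w2, A_i | D, I) for i = 1, 2, 3
]

marginal = [sum(row) for row in joint]   # p(w | D, I): A_i summed out

print(marginal)   # approximately [0.2, 0.8]; still sums to 1
```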
For a continuous (nuisance) parameter A:

  p(ω|D, I) = ∫ p(ω, A|D, I) dA

If the priors are independent, then:

  p(ω, A|D, I) = p(ω, A|I) p(D|ω, A, I) / p(D|I)
              = p(ω|I) p(A|I) p(D|ω, A, I) / p(D|I)

  p(ω|D, I) ∝ p(ω|I) ∫ p(A|I) p(D|ω, A, I) dA


Parameter Estimation

p(A|D, I) is our current (complete) state of knowledge. Parameter estimation: choose one particular (point) estimate, Â, from the posterior. Common choices:
- Conditional mean (CM)
- Maximum a posteriori (MAP, or posterior mode)
- Median


Example: DC Level in Gaussian Noise

  y = [y1, y2, …, yN]^T = [1, 1, …, 1]^T a + [e1, e2, …, eN]^T = 1a + e

Objective: estimate the constant a from N noisy observations. Assume a zero-mean Gaussian error,

  e = y − 1a ∼ N(0, Ce),

and that Ce (or σe²) is known.
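The three common point estimates listed under Parameter Estimation (CM, MAP, median) can all be read off a discretized posterior. The skewed density used here is a made-up example, chosen so the three summaries visibly differ:

```python
import numpy as np

# Point estimates from a discretized posterior. The skewed density
# p(A|D,I) proportional to A*exp(-A) on [0, 10] is a made-up example,
# not a posterior from the slides.

a = np.linspace(0.0, 10.0, 2001)
da = a[1] - a[0]
post = a * np.exp(-a)          # unnormalized posterior on the grid
post /= post.sum() * da        # normalize numerically

cm = (a * post).sum() * da                # conditional mean
amap = a[np.argmax(post)]                 # posterior mode (MAP)
cdf = np.cumsum(post) * da
median = a[np.searchsorted(cdf, 0.5)]     # posterior median

print(cm, amap, median)        # three different summaries of one PDF
```

For this density the mode sits near 1, the median near 1.68, and the mean near 2, so the choice of point estimate matters whenever the posterior is asymmetric.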
The Maximum Likelihood (ML) method:

  p(y|a, I) = (2π)^{−N/2} |Ce|^{−1/2} exp(−½ (y − 1a)^T Ce^{−1} (y − 1a))

  â_ml = arg max_a p(y|a, I) = (1^T Ce^{−1} 1)^{−1} 1^T Ce^{−1} y

If Ce = σe² I, then

  â_ml = (1^T 1)^{−1} 1^T y = (1/N) Σ_{n=1}^N yn = ȳ


The Maximum A Posteriori (MAP) Estimator:

With a Gaussian prior a ∼ N(ma, σa²),

  â_map = arg max_a p(a|y, I)
        = arg max_a (2πσa²)^{−1/2} exp(−(a − ma)²/(2σa²)) × (2π)^{−N/2} |Ce|^{−1/2} exp(−½ (y − 1a)^T Ce^{−1} (y − 1a))
        = ma + σa²/(σa² + σe²/N) (ȳ − ma)

[Figure: prior, likelihood, and posterior for the DC level a with N = 50 and N = 500; the likelihood and posterior sharpen as N grows.]


Example: Deconvolution

Estimate the input signal u(t), smeared by h(t), from noisy observations:

  y(t) = h(t) ∗ u(t) + e(t)   i.e.   y = Hu + e

  p(y|u, I) = (2π)^{−N/2} |Ce|^{−1/2} exp(−½ (y − Hu)^T Ce^{−1} (y − Hu))

  û_ml = arg max_u p(y|u, I) = (H^T Ce^{−1} H)^{−1} H^T Ce^{−1} y
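The DC-level estimators are easy to verify numerically: with white noise the ML estimate is the sample mean, and the Gaussian-prior MAP estimate shrinks that mean toward the prior mean. The true level, noise level, and prior settings below are made-up assumptions:

```python
import numpy as np

# Numerical check of the DC-level estimators: a_ml is the sample mean,
# and the MAP estimate with prior a ~ N(m_a, sigma_a^2) shrinks it
# toward m_a by the weight sigma_a^2 / (sigma_a^2 + sigma_e^2 / N).
# All settings (a_true, sigma_e, m_a, sigma_a, N) are made up.

rng = np.random.default_rng(0)
N, a_true, sigma_e = 50, 3.0, 2.0
m_a, sigma_a = 0.0, 1.5

y = a_true + sigma_e * rng.standard_normal(N)     # y = 1a + e

a_ml = y.mean()                                   # (1^T 1)^{-1} 1^T y = ybar
w = sigma_a**2 / (sigma_a**2 + sigma_e**2 / N)    # shrinkage weight in (0, 1)
a_map = m_a + w * (a_ml - m_a)

print(a_ml, a_map)   # a_map lies between m_a and a_ml
```

As N grows the weight w tends to 1 and the MAP estimate approaches the ML estimate, mirroring the N = 50 versus N = 500 figure.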
Gaussian prior for u:

  p(u|I) = (2π)^{−L/2} |Cu|^{−1/2} exp(−½ u^T Cu^{−1} u)

The MAP estimate:

  û_map = arg max_u p(u|I) × (2π)^{−N/2} |Ce|^{−1/2} exp(−½ (y − Hu)^T Ce^{−1} (y − Hu))
        = (H^T Ce^{−1} H + Cu^{−1})^{−1} H^T Ce^{−1} y
        = Cu H^T (H Cu H^T + Ce)^{−1} y
        = the Wiener filter.


Example: Ultrasonic Array Imaging

Traditional imaging: steer and focus the array at a focal point. Model-based imaging:

  y = [y1; y2; …; yL] = [P1; P2; …; PL] o + e = Po + e   (B-scan)

[Figure: simulated setup with a 16-element phased array and point targets at z ≈ 45–55 mm, x ∈ [−20, 20] mm.]

1) Gaussian prior, MAP estimate:

  ô = Co P^T (P Co P^T + Ce)^{−1} y

2) Exponential prior (positivity constraints):

  p(o|y, I) ∝ (2π)^{−N/2} |Ce|^{−1/2} exp(−½ (y − Po)^T Ce^{−1} (y − Po)) × Π_{n=1}^N λo exp(−λo on)

[Figure: the Gaussian and exponential prior densities as functions of scattering strength.]
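Taking the negative logarithm of this exponential-prior posterior turns MAP estimation into a nonnegativity-constrained least-squares problem. A minimal projected-gradient sketch follows; the model dimensions, data, and choice of solver are our own illustrative assumptions, not the author's implementation:

```python
import numpy as np

# Sketch of the exponential-prior MAP estimate: minimize
#   0.5 * (y - P o)^T Ce^{-1} (y - P o) + lam * sum(o)   s.t.  o >= 0
# via projected gradient descent. P, y, Ce, and lam are small made-up
# stand-ins for a real array-imaging model.

rng = np.random.default_rng(3)
M, K = 20, 10
P = rng.standard_normal((M, K))
o_true = np.maximum(rng.standard_normal(K), 0.0)   # nonnegative scene
y = P @ o_true + 0.01 * rng.standard_normal(M)
lam = 0.1                                          # prior rate lambda_o
Ce_inv = np.eye(M)                                 # Ce = I for simplicity

def objective(o):
    r = y - P @ o
    return 0.5 * r @ Ce_inv @ r + lam * o.sum()

o = np.zeros(K)
step = 1.0 / np.linalg.norm(P.T @ P, 2)            # 1 / Lipschitz constant
for _ in range(2000):
    grad = P.T @ Ce_inv @ (P @ o - y) + lam        # gradient of the objective
    o = np.maximum(o - step * grad, 0.0)           # project onto o >= 0

print(objective(o) <= objective(np.zeros(K)))      # True: descent achieved
```

With the step size set to the reciprocal of the largest eigenvalue of P^T P, each projected-gradient step is guaranteed not to increase the objective, and the iterate always satisfies the positivity constraint.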
MAP estimate:

  ô = arg min_o ½ (y − Po)^T Ce^{−1} (y − Po) + λo 1^T o   subject to on ≥ 0 ∀n

[Figure: measured array data y(t, x), and images reconstructed with traditional (delay-and-sum) processing, the Gaussian prior, and the exponential prior.]


What (point) estimate should we use here?

[Figure: a posterior density over the range −40 to 40.]


Model Selection

Say we have 2 models (i = 1, 2):

  y = Mi(θ) + e

Which of the two models describes our data y best?
  p(Mi|y, I) = p(Mi|I) p(y|Mi, I) / p(y|I)

We are not interested in the parameters θ; use marginalization to remove them:

  p(Mi|y, I) = ∫ p(Mi, θ|y, I) dθ

Then apply Bayes' rule and the product rule:

  p(Mi|y, I) = p(Mi|I) ∫ p(θ|I) p(y|θ, Mi, I) dθ / p(y|I)


Example: DC Level in Gaussian Noise con't

Model M1 (no free parameters): y = e

  p(M1|y, I) = p(M1|I) p(y|M1, I) / p(y|I)

Model M2 (one free parameter): y = 1a + e

  p(M2|y, I) = p(M2|I) ∫ p(a|I) p(y|a, M2, I) da / p(y|I)


Assume for simplicity a uniform prior for the parameter a:

  p(a|I) = 1/(amax − amin),   amin ≤ a ≤ amax

[Figure: the uniform prior for a, of height 1/(amax − amin) between amin and amax.]


  p(y|a, M2, I) = L(a)
    = (2π)^{−N/2} σe^{−N} exp(−(1/(2σe²)) (y − 1a)^T (y − 1a))
    = (2π)^{−N/2} σe^{−N} exp(−(1/(2σe²)) (y^T y − 2Nȳa + Na²))

  L(â_ml) = L(ȳ) = (2π)^{−N/2} σe^{−N} exp(−(1/(2σe²)) (y^T y − Nȳ²))
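The uniform-prior marginal likelihood of M2 can be checked by brute-force numerical integration against the Gaussian-integral approximation ∫ L(a) da ≈ L(ȳ) √(2π) σe/√N used in the derivation. The data, σe, and prior range below are made-up assumptions:

```python
import numpy as np

# Numerical check of the marginal likelihood of M2 under the uniform
# prior: p(y|M2,I) = 1/(amax - amin) * integral of L(a) da, compared
# with L(ybar) * sqrt(2*pi) * sigma_e / (sqrt(N) * (amax - amin)).
# The data, sigma_e, and prior range are made-up assumptions.

rng = np.random.default_rng(2)
N, sigma_e, a_true = 10, 1.0, 2.0
amin, amax = -40.0, 40.0
y = a_true + sigma_e * rng.standard_normal(N)
ybar = y.mean()

def lik(a):
    """L(a) evaluated on a grid of a values (vectorized over a)."""
    ss = ((y[None, :] - a[:, None]) ** 2).sum(axis=1)
    return np.exp(-0.5 * ss / sigma_e**2) / (2 * np.pi * sigma_e**2) ** (N / 2)

a = np.linspace(amin, amax, 160001)
evidence = lik(a).sum() * (a[1] - a[0]) / (amax - amin)   # Riemann sum
approx = (lik(np.array([ybar]))[0] * np.sqrt(2 * np.pi) * sigma_e
          / (np.sqrt(N) * (amax - amin)))

print(evidence / approx)   # close to 1
```

The agreement is essentially exact here because L(a) really is a Gaussian in a and the prior range easily covers its width σe/√N.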
  p(M2|y, I) = (p(M2|I)/p(y|I)) × (1/(amax − amin)) ∫_{amin}^{amax} p(y|a, M2, I) da
             ≈ (p(M2|I)/p(y|I)) × (2π)^{−N/2} σe^{−N} exp(−(1/(2σe²)) (y^T y − Nȳ²)) × √(2π) σe/√N × 1/(amax − amin)
             = (p(M2|I)/p(y|I)) × Ωa × L(â_ml)

where L(â_ml) = Lmax and Ωa = √(2π) σe / (√N (amax − amin)) is the Occam factor.


The posterior odds:

  p(M2|y, I) / p(M1|y, I) = (p(M2|I)/p(M1|I)) × (L(â_ml)/L(M1)) × (Ωa/1)

Occam's Razor:¹ simpler explanations are to be preferred unless there is sufficient evidence in favor of more complicated explanations.

¹ William of Ockham (also Occam), 1288–1347, was an English Franciscan friar (≈ monk) and scholastic philosopher.


[Figure: likelihood and uniform prior for the DC level with N = 2, 10, and 20 observations; the likelihood sharpens relative to the prior width 1/(amax − amin) as N grows.]

  N       Odds    Odds [dB]
   2      0.18       -15.00
  10      2.02         6.13
  20   7423.34        77.41


Example: Nuclear Magnetic Resonance (NMR) Data²

Model the data as sinusoid(s) with decay. Marginalize over the phase, amplitude, decay, and noise (variance) parameters.

[Figure: absorption spectrum and power spectral density of the NMR data.]

The absorption spectrum (described in the text, see page 117) gives a clear indication of the three frequencies and hints at three others (A).

² From Bretthorst: Bayesian Spectrum Analysis and Parameter Estimation
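The Occam factor itself is a one-line computation; the noise level and prior range below are made-up assumptions:

```python
import math

# Occam factor for model M2:
#   Omega_a = sqrt(2*pi) * sigma_e / (sqrt(N) * (amax - amin)).
# It penalizes the extra free parameter more strongly as N grows or as
# the prior range widens; sigma_e and the range here are made up.

def occam_factor(N, sigma_e=1.0, amin=-40.0, amax=40.0):
    return math.sqrt(2 * math.pi) * sigma_e / (math.sqrt(N) * (amax - amin))

for N in (2, 10, 20):
    print(N, occam_factor(N))   # shrinks as N grows
```

Although Ωa shrinks with N, the odds in the table above still grow with N when M2 is the better model, because the maximum-likelihood ratio L(â_ml)/L(M1) grows much faster than Ωa decays.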
Using the full width at half maximum of the absorption spectrum to determine the accuracy estimate, and converting to physical units, determines the frequencies to within ±15 Hz. The probability analysis (B) used a seven-frequency model with decay. The estimated accuracy is approximately ±0.001 Hz.


Further important topics:
- Assigning probabilities
- Experimental design

Recommended reading:
- P. Gregory: Bayesian Logical Data Analysis for the Physical Sciences
- E.T. Jaynes: Probability Theory: The Logic of Science
- G.L. Bretthorst: Bayesian Spectrum Analysis and Parameter Estimation
- D.S. Sivia: Data Analysis: A Bayesian Tutorial