A Short Introduction to Bayesian Inference

Fredrik Lingvall
February 14, 2008
Outline

1 Statistical Inference
2 Probabilities and rules for manipulating them
3 Marginalization
4 Parameter Estimation
  Example: DC Level in Gaussian Noise
  Example: Deconvolution
  Example: Ultrasonic Array Imaging
5 Model Selection
  Example: DC Level in Gaussian Noise cont'd
  Example: NMR Data
The Scientific Method

Testable hypotheses (theory, model) lead to predictions; predictions are compared with observations (data); hypothesis testing and parameter estimation close the loop. This is statistical (plausible) inference.

Statistical inference is a tool for:
Assessing the plausibility of one or more competing models
Estimating model parameters and their uncertainties
Statistical Inference

Inference: a conclusion, a logical consequence drawn from certain premises.

Statistical inference is the process of inferring the truth of our theories of nature on the basis of incomplete information.

The available information is always incomplete ⇒ our knowledge is probabilistic.

Normal (deductive) logic is two-valued: true or false, {0, 1}. Bayesian (extended) logic uses the whole range from 0 to 1.
The basic desiderata of Bayesian probability theory:

Representation of degrees of plausibility with real numbers.

Quantitative correspondence with common sense:
1 New information supporting the truth of a proposition must increase the number representing its plausibility (continuously and monotonically).
2 The deductive limit must be obtained when appropriate.

Consistency:
1 If a conclusion can be reasoned in many ways, then all must lead to the same result.
2 All information relevant to the question must be taken into account by the theory.
3 Equivalent states of knowledge must be represented by the same probability assignments.
p(A|B) is a real-number measure of the plausibility of the proposition/hypothesis A, given the information represented by the proposition B.

The sum rule:

p(A|B) + p(Ā|B) = 1

The product rule:

p(A, B|C) = p(A|C) p(B|A, C) = p(B|C) p(A|B, C)

⇒ p(A|B, C) = p(A|C) p(B|A, C) / p(B|C)  (Bayes' rule!)
The usual form of Bayes' rule:

p(H_i|D, I) = p(H_i|I) p(D|H_i, I) / p(D|I)

where H_i is the hypothesis of interest and

p(D|I) = Σ_i p(H_i|I) p(D|H_i, I)

is a normalization factor. We are often only interested in:

p(H_i|D, I) ∝ p(H_i|I) p(D|H_i, I)

where p(D|H_i, I) ≜ L(H_i) is the likelihood function.
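As a minimal numerical sketch of the rule above (not from the slides; the three hypotheses, their priors, and their likelihoods are made-up illustrations, and NumPy is assumed), the discrete posterior is just elementwise multiplication followed by normalization:

```python
import numpy as np

# Made-up priors p(H_i|I) and likelihoods p(D|H_i, I) for three hypotheses.
prior = np.array([0.5, 0.3, 0.2])
likelihood = np.array([0.1, 0.4, 0.7])

evidence = np.sum(prior * likelihood)        # p(D|I) = sum_i p(H_i|I) p(D|H_i, I)
posterior = prior * likelihood / evidence    # p(H_i|D, I)
print(posterior, posterior.sum())            # normalized: sums to 1
```

Note how the data can overturn the prior ordering: the hypothesis with the smallest prior ends up with the largest posterior here.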
Continuous Parameters

Probability Density Function (PDF):

p(H_0|D, I) = lim_{δh→0} p(h ≤ H_0 < h + δh|D, I) / δh

With W the proposition that H_0 is in the interval [a, b]:

p(W|D, I) = ∫_a^b p(H_0|D, I) dH_0

In Bayesian inference a PDF is a measure of the state of knowledge of the hypotheses.
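The interval probability can be sketched numerically (assuming NumPy; the Gaussian posterior and the interval [a, b] below are made-up illustrations), approximating the integral by a Riemann sum on a grid:

```python
import numpy as np

# Made-up Gaussian posterior p(H0|D, I) with mean 1 and standard deviation 2.
h = np.linspace(-10.0, 10.0, 20001)
dh = h[1] - h[0]
pdf = np.exp(-(h - 1.0)**2 / 8.0) / np.sqrt(8.0 * np.pi)

a, b = -1.0, 3.0                      # the interval [a, b] = mean +/- 1 std dev
mask = (h >= a) & (h <= b)
p_W = pdf[mask].sum() * dh            # p(W|D, I) ~ integral of the PDF over [a, b]
print(p_W)                            # ~ 0.683, the 1-sigma probability
```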
The Learning Rule

p(A|D, I) ∝ p(A|I) p(D|A, I)

When new data D_2 arrive, the old posterior becomes the new prior, p(A|D, I) ⇒ p(A|I′), and

p(A|D_2, I′) ∝ p(A|I′) p(D_2|A, I′)
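The learning rule can be checked numerically (assuming NumPy; the coin-flip model and the data are made-up illustrations): updating one observation at a time, with each posterior serving as the next prior, gives the same answer as conditioning on all data at once:

```python
import numpy as np

# Grid over a coin's head-probability theta, with a flat prior.
theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]
data = [1, 0, 1, 1, 0, 1]  # made-up flips: 1 = heads, 0 = tails

p = np.ones_like(theta)
p /= p.sum() * dtheta
for d in data:
    like = theta if d == 1 else 1.0 - theta   # p(D|theta, I)
    p = p * like                              # posterior ~ prior * likelihood ...
    p /= p.sum() * dtheta                     # ... and becomes the next prior

batch = theta**4 * (1.0 - theta)**2           # all six flips at once (4 heads, 2 tails)
batch /= batch.sum() * dtheta
print(np.max(np.abs(p - batch)))              # sequential == batch updating
```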
p(A|D, I) ∝ p(A|I) p(D|A, I)
(Posterior ∝ Prior × Likelihood)

[Figure: a prior, likelihood, and posterior plotted against A over [−4, 4].]
Marginalization

The extended sum rule:

p(A + B|C) = p(A|C) + p(B|C) − p(A, B|C)

If A and B are mutually exclusive (only one can be true):

p(A + B|C) = p(A|C) + p(B|C)

Let (for simplicity) A_i be a discrete parameter; then

p(A_1 + A_2 + ⋯|I) = p(A_1|I) + p(A_2|I) + ⋯ = 1
Using the product rule gives:

p(ω, A_1 + A_2 + ⋯|D, I) = p(A_1 + A_2 + ⋯|I) p(ω|A_1 + A_2 + ⋯, D, I) = 1 × p(ω|D, I)

With the A_i mutually exclusive:

p(ω, A_1 + A_2 + ⋯|D, I) = p(ω, A_1|D, I) + p(ω, A_2|D, I) + ⋯ = Σ_i p(ω, A_i|D, I)

Hence

p(ω|D, I) = Σ_i p(ω, A_i|D, I)
For a continuous parameter A:

p(ω|D, I) = ∫ p(ω, A|D, I) dA

If the priors are independent, then

p(ω, A|D, I) = p(ω, A|I) p(D|ω, A, I) / p(D|I)
             = p(ω|I) p(A|I) p(D|ω, A, I) / p(D|I)

so that

p(ω|D, I) ∝ p(ω|I) ∫ p(A|I) p(D|ω, A, I) dA
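Marginalization over a continuous nuisance parameter can be sketched on a grid (assuming NumPy; the Gaussian priors, the single observation, and the model y = ω + A + noise are all made-up illustrations):

```python
import numpy as np

# Grids for the parameter of interest (omega) and the nuisance parameter (A).
omega = np.linspace(-5.0, 5.0, 401)
A = np.linspace(-5.0, 5.0, 401)
dom, dA = omega[1] - omega[0], A[1] - A[0]
W, Ag = np.meshgrid(omega, A, indexing="ij")

y = 2.0  # one made-up observation of omega + A with unit-variance noise
joint = np.exp(-W**2 / 2) * np.exp(-Ag**2 / 2) * np.exp(-(y - W - Ag)**2 / 2)
joint /= joint.sum() * dom * dA               # p(omega, A|D, I)

marg = joint.sum(axis=1) * dA                 # p(omega|D, I): integrate A out
print(omega[np.argmax(marg)])                 # posterior mode of omega (~ 2/3 here)
```

The uncertainty about A is carried into p(ω|D, I) rather than being fixed at a single value, which is the whole point of marginalizing.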
Parameter Estimation

p(A|D, I) is our current (complete) state of knowledge. Parameter estimation: choose one particular (point) estimate, Â, from the posterior.

Common choices:
Conditional mean (CM)
Maximum a posteriori (MAP, or posterior mode)
Median
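All three point estimates can be read off a gridded posterior; a sketch assuming NumPy, with a made-up skewed density for which mean, mode, and median all differ:

```python
import numpy as np

# A made-up skewed posterior p(A|D, I) proportional to A * exp(-A) on a grid.
a = np.linspace(0.0, 10.0, 10001)
da = a[1] - a[0]
post = a * np.exp(-a)
post /= post.sum() * da

cm = (a * post).sum() * da            # conditional mean (CM)
a_map = a[np.argmax(post)]            # MAP = posterior mode
cdf = np.cumsum(post) * da
median = a[np.searchsorted(cdf, 0.5)]

print(cm, a_map, median)              # ~ 2.0, 1.0, and 1.68 for this density
```

For a symmetric unimodal posterior the three coincide; for skewed or multimodal posteriors the choice matters.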
Example: DC Level in Gaussian Noise

[y_1, y_2, …, y_N]ᵀ = [1, 1, …, 1]ᵀ a + [e_1, e_2, …, e_N]ᵀ, i.e., y = 1a + e

Objective: estimate the constant a from N noisy observations.

Assume a zero-mean Gaussian error:

e = y − 1a ∼ N(0, C_e)

and that C_e (or σ_e²) is known.
The Maximum Likelihood (ML) method:

p(y|a, I) = 1/((2π)^(N/2) |C_e|^(1/2)) exp(−½ (y − 1a)ᵀ C_e⁻¹ (y − 1a))

â_ml = arg max_a p(y|a, I) = (1ᵀ C_e⁻¹ 1)⁻¹ 1ᵀ C_e⁻¹ y

If C_e = σ_e² I then

â_ml = (1ᵀ 1)⁻¹ 1ᵀ y = (1/N) Σ_{n=1}^N y_n = ȳ
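A quick numerical check of this result (assuming NumPy; the data are simulated with made-up constants): with white noise the general weighted estimate reduces to the sample mean ȳ:

```python
import numpy as np

# Simulated DC level in white Gaussian noise.
rng = np.random.default_rng(0)
N, a_true, sigma_e = 100, 3.0, 1.0
y = a_true + sigma_e * rng.standard_normal(N)

one = np.ones(N)
Ce_inv = np.eye(N) / sigma_e**2
a_ml = (one @ Ce_inv @ y) / (one @ Ce_inv @ one)   # (1' Ce^-1 1)^-1 1' Ce^-1 y
print(a_ml, y.mean())                              # identical: a_ml = ybar
```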
The Maximum A Posteriori (MAP) Estimator:

â_map = arg max_a p(a|y, I)
      = arg max_a [ 1/√(2πσ_a²) exp(−(a − m_a)²/(2σ_a²)) × 1/((2π)^(N/2) |C_e|^(1/2)) exp(−½ (y − 1a)ᵀ C_e⁻¹ (y − 1a)) ]
      = m_a + σ_a²/(σ_a² + σ_e²/N) (ȳ − m_a)
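The closed form above shrinks ȳ toward the prior mean m_a. A sketch assuming NumPy (data, prior mean, and variances are made-up), with a brute-force grid maximization of the posterior as a check:

```python
import numpy as np

# Simulated data; prior N(ma, sigma_a^2) on a, noise N(0, sigma_e^2).
rng = np.random.default_rng(1)
N, a_true, sigma_e = 50, 3.0, 2.0
ma, sigma_a = 0.0, 1.0
y = a_true + sigma_e * rng.standard_normal(N)
ybar = y.mean()

# Closed-form MAP: shrink ybar toward the prior mean ma.
a_map = ma + sigma_a**2 / (sigma_a**2 + sigma_e**2 / N) * (ybar - ma)

# Brute-force check: maximize log prior + log likelihood on a fine grid
# (terms not depending on a are dropped).
a = np.linspace(-10.0, 10.0, 200001)
log_post = -(a - ma)**2 / (2 * sigma_a**2) - N * (a - ybar)**2 / (2 * sigma_e**2)
print(a_map, a[np.argmax(log_post)])  # the two agree
```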
[Figure: prior, likelihood, and posterior for N = 50 (left) and N = 500 (right), plotted against a over [−20, 20]; with more data the likelihood narrows and dominates the prior.]
Example: Deconvolution

Estimate the input signal u(t), smeared with h(t), from noisy observations:

y(t) = h(t) ∗ u(t) + e(t), i.e., y = Hu + e

p(y|u, I) = 1/((2π)^(N/2) |C_e|^(1/2)) exp(−½ (y − Hu)ᵀ C_e⁻¹ (y − Hu))

û_ml = arg max_u p(y|u, I) = (Hᵀ C_e⁻¹ H)⁻¹ Hᵀ C_e⁻¹ y
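An ML deconvolution sketch (assuming NumPy; the kernel, the sparse signal, and the noise level are made-up, and C_e = σ² I so the covariance cancels from the formula):

```python
import numpy as np

# Made-up smearing kernel and a sparse input signal.
rng = np.random.default_rng(2)
h = np.array([1.0, 0.5, 0.25])
u = np.zeros(40)
u[[10, 25]] = 1.0

# Convolution matrix: y = H u implements the full convolution h * u.
N, L = len(u) + len(h) - 1, len(u)
H = np.zeros((N, L))
for k in range(L):
    H[k:k + len(h), k] = h

y = H @ u + 0.01 * rng.standard_normal(N)
u_ml = np.linalg.solve(H.T @ H, H.T @ y)   # (H' H)^-1 H' y  (Ce = sigma^2 I)
print(np.max(np.abs(u_ml - u)))            # small reconstruction error
```

For a badly conditioned H (a smoother kernel), the unregularized ML solution amplifies noise, which motivates the prior on u introduced next.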
Gaussian prior for u:

p(u|I) = 1/((2π)^(L/2) |C_u|^(1/2)) exp(−½ uᵀ C_u⁻¹ u)
The MAP estimate:

û_map = arg max_u p(u|I) × 1/((2π)^(N/2) |C_e|^(1/2)) exp(−½ (y − Hu)ᵀ C_e⁻¹ (y − Hu))
      = (Hᵀ C_e⁻¹ H + C_u⁻¹)⁻¹ Hᵀ C_e⁻¹ y
      = C_u Hᵀ (H C_u Hᵀ + C_e)⁻¹ y
      = the Wiener filter.
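The two expressions for û_map are equal by a matrix-inversion identity; a numerical check with random covariances (assuming NumPy; all matrices and sizes are made-up):

```python
import numpy as np

# Random forward model and symmetric positive-definite covariances.
rng = np.random.default_rng(3)
N, L = 30, 20
H = rng.standard_normal((N, L))
A = rng.standard_normal((N, N)); Ce = A @ A.T + np.eye(N)
B = rng.standard_normal((L, L)); Cu = B @ B.T + np.eye(L)
y = rng.standard_normal(N)

Ce_inv, Cu_inv = np.linalg.inv(Ce), np.linalg.inv(Cu)
u1 = np.linalg.solve(H.T @ Ce_inv @ H + Cu_inv, H.T @ Ce_inv @ y)  # first form
u2 = Cu @ H.T @ np.linalg.solve(H @ Cu @ H.T + Ce, y)              # second form
print(np.max(np.abs(u1 - u2)))  # agree up to floating-point error
```

The second form inverts an N×N matrix instead of an L×L one, so the cheaper of the two depends on whether there are more data points or more unknowns.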
Example: Ultrasonic Array Imaging

Traditional imaging: steer and focus the array at a focal point.

Model-based imaging:

y = [y_1; y_2; …; y_L] = [P_1; P_2; …; P_L] o + e = Po + e   (B-scan)
16 Element Phased Array

[Figure: a 16-element phased array imaging point targets located at z ≈ 45–55 mm, x ∈ [−20, 20] mm.]
1) Gaussian prior, MAP estimate:

ô = C_o Pᵀ (P C_o Pᵀ + C_e)⁻¹ y

2) Exponential prior (positivity constraints):

p(o|y, I) ∝ 1/((2π)^(N/2) |C_e|^(1/2)) exp(−½ (y − Po)ᵀ C_e⁻¹ (y − Po)) × Π_{n=1}^N λ_o exp(−λ_o o_n)

MAP estimate:

ô = arg min_o ½ (y − Po)ᵀ C_e⁻¹ (y − Po) + λ_o 1ᵀ o, subject to o_n ≥ 0 ∀n

[Figure: the Gaussian and the one-sided exponential prior densities plotted against scattering strength.]
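The positivity-constrained MAP problem above is a quadratic program; the slides do not fix a solver, so projected gradient descent is used here as one simple (assumed) choice. A sketch assuming NumPy, with a made-up forward model P, a made-up sparse truth, and made-up constants:

```python
import numpy as np

# Made-up forward model, sparse nonnegative truth, and noise.
rng = np.random.default_rng(4)
N, M, lam, sigma = 60, 30, 0.1, 0.05
P = rng.standard_normal((N, M))
o_true = np.zeros(M)
o_true[[5, 20]] = 1.0
y = P @ o_true + sigma * rng.standard_normal(N)

# Minimize 1/2 (y-Po)' Ce^-1 (y-Po) + lam * 1'o  s.t. o >= 0  (Ce = sigma^2 I).
Ce_inv = np.eye(N) / sigma**2
step = 1.0 / np.linalg.norm(P.T @ Ce_inv @ P, 2)   # 1 / largest eigenvalue
o = np.zeros(M)
for _ in range(2000):
    grad = P.T @ Ce_inv @ (P @ o - y) + lam        # gradient of the objective
    o = np.maximum(o - step * grad, 0.0)           # step, then project onto o >= 0
print(np.max(np.abs(o - o_true)))                  # close to the sparse truth
```

The projection step is what enforces o_n ≥ 0; the λ_o 1ᵀo term from the exponential prior pushes small components toward exactly zero.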
[Figure: measured array data y(t, x) and three reconstructions over x ∈ [−20, 20] mm, z ∈ [45, 55] mm: traditional (focused) imaging, the Gaussian-prior MAP estimate, and the exponential-prior MAP estimate.]
What (point) estimate should we use here?

[Figure: a posterior density plotted over [−40, 40] for which the choice of point estimate is not obvious.]
Model Selection

Say we have two models (i = 1, 2):

y = M_i(θ) + e

Which of the two models describes our data, y, best?

p(M_i|y, I) = p(M_i|I) p(y|M_i, I) / p(y|I)
We are not interested in the parameters, so use marginalization to remove them:

p(M_i|y, I) = ∫ p(M_i, θ|y, I) dθ

Then apply Bayes' rule and the product rule:

p(M_i|y, I) = p(M_i|I) ∫ p(θ|I) p(y|θ, M_i, I) dθ / p(y|I)
Example: DC Level in Gaussian Noise cont'd

Model M_1 (no free parameters): y = e

p(M_1|y, I) = p(M_1|I) p(y|M_1, I) / p(y|I)

Model M_2 (one free parameter): y = 1a + e

p(M_2|y, I) = p(M_2|I) ∫ p(a|I) p(y|a, M_2, I) da / p(y|I)
Assume for simplicity a uniform prior for the parameter a:

p(a|I) = 1/(a_max − a_min) for a_min ≤ a ≤ a_max, and 0 otherwise.

[Figure: the uniform prior density, height 1/(a_max − a_min) between a_min and a_max.]
p(y|a, M_2, I) = L(a)
  = 1/((2πσ_e²)^(N/2)) exp(−(yᵀy − 2Nȳa + Na²)/(2σ_e²))

L(â_ml) = L(ȳ) = 1/((2πσ_e²)^(N/2)) exp(−(yᵀy − Nȳ²)/(2σ_e²))
p(M_2|y, I) = p(M_2|I)/p(y|I) × 1/(a_max − a_min) ∫_{a_min}^{a_max} p(y|a, M_2, I) da
  ≈ p(M_2|I)/p(y|I) × 1/((2πσ_e²)^(N/2)) exp(−(yᵀy − Nȳ²)/(2σ_e²)) × √(2π) σ_e/(√N (a_max − a_min))
  = p(M_2|I)/p(y|I) × L(â_ml) × Ω_a

where L_max = L(â_ml) and Ω_a = √(2π) σ_e/(√N (a_max − a_min)) is the Occam factor.
The posterior odds:

p(M_2|y, I)/p(M_1|y, I) = p(M_2|I)/p(M_1|I) × L(â_ml)/L(M_1) × Ω_a

Occam's Razor¹: simpler explanations are to be preferred unless there is sufficient evidence in favor of more complicated explanations.

¹ William of Ockham (also Occam), 1288–1347, was an English Franciscan friar (≈ monk) and scholastic philosopher.
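The posterior odds for the DC-level example can be sketched numerically (assuming NumPy and equal model priors; the data and the prior range [a_min, a_max] are made-up). The analytic Occam-factor expression is checked against direct numerical integration over the uniform prior:

```python
import numpy as np

# Simulated data; M1: y = e, M2: y = 1a + e, uniform prior on [a_min, a_max].
rng = np.random.default_rng(5)
N, a_true, sigma_e = 20, 3.0, 2.0
a_min, a_max = -40.0, 40.0
y = a_true + sigma_e * rng.standard_normal(N)
ybar = y.mean()

# Analytic: odds = L(a_ml)/L(M1) * Omega_a.
log_ratio = N * ybar**2 / (2 * sigma_e**2)          # log L(a_ml) - log L(M1)
occam = np.sqrt(2 * np.pi) * sigma_e / (np.sqrt(N) * (a_max - a_min))
odds = np.exp(log_ratio) * occam

# Numerical: average L(a)/L(M1) over the uniform prior on a.
a = np.linspace(a_min, a_max, 400001)
da = a[1] - a[0]
log_like = (2 * N * ybar * a - N * a**2) / (2 * sigma_e**2)  # log L(a) - log L(M1)
odds_num = np.exp(log_like).sum() * da / (a_max - a_min)
print(odds, odds_num)  # the Occam-factor approximation matches the integral
```

Note how the wide prior range penalizes M_2 through Ω_a: the data must support the extra parameter strongly enough to overcome it.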
[Figure: the uniform prior 1/(a_max − a_min) and the likelihood L(a) for growing data sizes; as N increases the likelihood narrows relative to the prior range.]

N     Odds      Odds [dB]
2     0.18      −15.00
10    2.02      6.13
20    7423.34   77.41
Example: Nuclear Magnetic Resonance (NMR) Data²

Model as sinusoid(s) with decay. Marginalize over the phase, amplitude, decay, and noise (variance) parameters.

[Figure: absorption spectrum vs. Bayesian power spectral density. The absorption spectrum gives a clear indication of the three frequencies and hints at three others (A); using the full width at half maximum of the absorption spectrum as the accuracy estimate and converting to physical units, it determines the frequencies to within ±15 Hz. The probability analysis (B) used a seven-frequency model with decay; its estimated accuracy is approximately ±0.001 Hz.]

² From Bretthorst: Bayesian Spectrum Analysis and Parameter Estimation.
Further important topics:
Assigning probabilities
Experimental design

Recommended reading:
P. Gregory: Bayesian Logical Data Analysis for the Physical Sciences
E.T. Jaynes: Probability Theory: The Logic of Science
G.L. Bretthorst: Bayesian Spectrum Analysis and Parameter Estimation
D.S. Sivia: Data Analysis: A Bayesian Tutorial