Statistical modelling with WinBUGS

Robert Piché
Tampere University of Technology
Literature
• WinBUGS free download: www.mrc-bsu.cam.ac.uk/bugs/
• Read the first 15 pages of www.stat.uiowa.edu/~gwoodwor/BBIText/AppendixBWinbugs.pdf
• For a full intro to Bayesian statistics, take my course: math.tut.fi/~piche/bayes
• My course is based on Antti Penttinen’s course: users.jyu.fi/~penttine/bayes09/
in this lesson we look at 6 basic statistical models
• inferring a proportion
• comparing proportions
• inferring a mean
• comparing means
• linear regression
• logistic regression
slide 3 of 20
inferring a proportion
• What proportion of the population of Ireland supports the Lisbon agreement? (let’s call this: θ)
• When we ask n Irish adults, s of them say that they are in favour.
• What can we conclude about θ?
1. construct an observation model p(s | θ)
2. construct a prior model p(θ)
3. compute posterior p(θ | s) with WinBUGS
slide 4 of 20
inferring a proportion (2)
Observation model
• Let yi = 1 if person i is in favour (“success”), otherwise yi = 0
• assume: given θ, the probability that case i is a success is θ, i.e.

  p(yi | θ) = θ^yi (1 − θ)^(1−yi)   (yi ∈ {0, 1}),

  where 0^0 is taken to be equal to 1
• assume: given θ, the observations are statistically independent, so

  p(y1:n | θ) = ∏i=1..n θ^yi (1 − θ)^(1−yi) = θ^s (1 − θ)^(n−s),

  where s = ∑i=1..n yi is the number of successes
• Note that s is the only feature of the observations that appears in the likelihood, so the inference problem could equally well be stated with s treated as the observation. Thus, given θ, s is a random variable with Binomial distribution:

  s | θ ∼ Binomial(θ, n)

slide 5 of 20
inferring a proportion (3)
• The observation model (“likelihood”) tells us how we could generate random s, given θ: s | θ ∼ Binomial(θ, n)
• Inference is the inverse problem: given s, what is θ?
Bayesian inference
• The unknown proportion θ is treated like a random variable
• Its probability density function (pdf) lives in [0, 1]
• The prior pdf (“before observation”) and posterior pdf are related by Bayes’ law:

  p(θ | s) = p(s | θ) p(θ) / ∫ p(s | θ) p(θ) dθ

[figure: prior pdf p(θ) and posterior pdf p(θ | y) on [0, 1]]
Example: Opinion survey. An opinion survey is conducted to determine the proportion θ of the population that is in favour of a certain policy. After some discussion with various experts, you determine that the prior belief has E(θ) > 0.5, but with a lot of uncertainty. You decide on a prior distribution with E(θ) = 0.6 and V(θ) = 0.3², which corresponds to a beta distribution with α = 1 and β = 2/3, that is, p(θ) ∝ (1 − θ)^(−1/3), θ ∈ [0, 1]. The results of the survey are that, out of n = 1000 respondents, s = 650 were in favour of the policy. The survey results give you a posterior distribution θ | y1:n ∼ Beta(651, 350.667).
slide 6 of 20
A family C of prior distributions is said to be conjugate to a likelihood distribution if, for every prior chosen from C, the posterior also belongs to C. In practice, conjugate families are parametrised, and by choosing appropriate parameter values you can usually obtain a distribution that is an acceptable model of your prior state of knowledge.
The conjugate families and inference solutions that will be presented in this section are summarised below.

yi | θ ∼         θ ∼               θ | y1:n ∼                  ỹ | y1:n ∼
Normal(θ, v)     Normal(m0, w0)    Normal(mn, wn)              Normal(mn, v + wn)
Binomial(1, θ)   Beta(α, β)        Beta(α + s, β + n − s)      Binomial(1, (α + s)/(α + β + n))
Poisson(θ)       Gamma(α, β)       Gamma(α + s, β + n)         NegBin(α + s, β + n)
Exp(θ)           Gamma(α, β)       Gamma(α + n, β + s)
Normal(m, θ)     InvGam(α, β)      InvGam(α + n/2, β + (n/2) s0²)

where s = ∑i=1..n yi, ȳ = s/n, 1/wn = 1/w0 + 1/(v/n), mn = (m0/w0 + ȳ/(v/n)) wn, s0² = (1/n) ∑i=1..n (yi − m)²
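For the Beta row, the conjugate update is a one-line calculation (a sketch, combining the likelihood θ^s (1 − θ)^(n−s) from slide 5 with the Beta(α, β) prior pdf):

  p(θ | s) ∝ p(s | θ) p(θ) ∝ θ^s (1 − θ)^(n−s) · θ^(α−1) (1 − θ)^(β−1) = θ^(α+s−1) (1 − θ)^(β+n−s−1),

which is proportional to the Beta(α + s, β + n − s) pdf, as stated in the table.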
inferring a proportion (4)
The prior, p(θ)
• It is our state of belief about θ before we make the observation
• We can use any pdf that lives in [0, 1], e.g. Unif(0, 1)
• It’s convenient to use the Beta distribution, with 2 parameters α > 0, β > 0, whose pdf is

  p(θ) = [Γ(α+β)/(Γ(α)Γ(β))] θ^(α−1) (1 − θ)^(β−1)

• Beta(1, 1) = Unif(0, 1)
• small α and β give a “vague” prior
• plot the Beta pdf with Matlab’s disttool
The properties of the distributions are summarised below.

x ∼             p(x)                                        x ∈           E(x)       mode(x)          V(x)
Normal(µ, σ²)   (2πσ²)^(−1/2) exp(−(x − µ)²/(2σ²))          R             µ          µ                σ²
Binomial(n, p)  C(n, x) p^x (1 − p)^(n−x)                   {0, 1, …, n}  np         ⌊(n + 1)p⌋       np(1 − p)
Beta(α, β)      [Γ(α+β)/(Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1)   [0, 1]        α/(α+β)    (α−1)/(α+β−2)    αβ/((α+β)²(α+β+1))
Poisson(λ)      λ^x e^(−λ)/x!                               {0, 1, …}     λ          ⌊λ⌋              λ
Gamma(α, β)     [β^α/Γ(α)] x^(α−1) e^(−βx)                  (0, ∞)        α/β        (α−1)/β          α/β²
Exp(λ)          λ e^(−λx)                                   (0, ∞)        1/λ        0                1/λ²
NegBin(α, β)    C(x+α−1, x) (β/(β+1))^α (1/(β+1))^x         {0, 1, …}     α/β                         α(β+1)/β²
InvGam(α, β)    [β^α/Γ(α)] x^(−(α+1)) e^(−β/x)              (0, ∞)        β/(α−1)    β/(α+1)          β²/((α−1)²(α−2))

µ ∈ R, σ > 0, α > 0, β > 0, λ > 0, n ∈ {1, 2, …}, p ∈ [0, 1]
slide 7 of 20
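One convenient way to choose α and β (my illustration, using the table’s E(x) and V(x) formulas for the Beta distribution) is to match a desired prior mean m and variance v:

  α = m (m(1 − m)/v − 1),   β = (1 − m) (m(1 − m)/v − 1).

For m = 0.6 and v = 0.3² this gives α = 1 and β = 2/3, the Beta(1, 0.667) prior used in the opinion-survey example on slide 6.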
inferring a proportion (5)
WinBUGS model

model {
  s ~ dbin(theta,n)
  theta ~ dbeta(1,0.667)
  ypred ~ dbin(theta,1)
}

[figure: DAG with nodes n, theta, s, ypred; the rectangle (n) in the DAG denotes a constant]
The data are entered as

# data
list(s=650,n=1000)
# initialisation
list(theta=0.1)

• WinBUGS can automatically generate initial values using gen inits — fine if you have informative prior information; if you have fairly ‘vague’ priors, it is better to provide reasonable values
• the initial values list can be placed after the model description or in a separate file
The results after 2000 simulation steps are

node   mean    sd       2.5%    median  97.5%
theta  0.6497  0.01548  0.6185  0.6499  0.6802
ypred  0.6335

The 95% credibility interval is [0.6185, 0.6802]. The mean of ypred estimates the probability that the 1001st survey respondent will be in favour, P(ỹ = 1 | y1:n) = E(θ | y1:n) = 0.6499.
• try s=65, n=100
• try s=6500, n=10000
• try dbeta(1,1)
slide 8 of 20
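As a sanity check (my arithmetic, using the conjugate table above): the exact posterior is Beta(1 + 650, 0.667 + 350) = Beta(651, 350.667), whose mean is 651/1001.667 ≈ 0.6499 and whose standard deviation is

  √(651 · 350.667 / (1001.667² · 1002.667)) ≈ 0.0151,

matching the simulated theta row. The normal approximation 0.6499 ± 1.96 · 0.0151 = (0.620, 0.679) agrees (to three decimals) with the 95% credibility interval computed using the inverse beta cdf.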
comparing proportions
• In a study of larynx cancer patients, s1 of the n1 patients who were treated with radiation therapy were cured, compared to s2 of the n2 patients who were treated with surgery.
• What can we say about θ1 (success rate of radiation) vs θ2 (surgery)?
1. construct an observation model p(s | θ)
2. construct a prior model p(θ)
3. compute posterior p(θ | s) with WinBUGS
4. compute posterior probability that θ1 ≥ θ2
slide 9 of 20
comparing proportions (2)
Observation model
• assume: given θ = (θ1, θ2), the probability that a radiation therapy patient is cured is θ1 and the probability that a surgery patient is cured is θ2
• assume: given θ, the observations are independent, i.e. p(s1, s2 | θ1, θ2) = p(s1 | θ1) p(s2 | θ2)

  s1 | θ1 ∼ Binomial(θ1, n1)
  s2 | θ2 ∼ Binomial(θ2, n2)

The prior, p(θ1, θ2)
• independent: p(θ1, θ2) = p(θ1) p(θ2)
• vague: θ1 ∼ Beta(0.5, 0.5), θ2 ∼ Beta(0.5, 0.5)
• plot these with disttool
slide 10 of 20
comparing proportions (3)
WinBUGS

model {
  s_1 ~ dbin(theta_1,n_1)
  s_2 ~ dbin(theta_2,n_2)
  theta_1 ~ dbeta(0.5,0.5)
  theta_2 ~ dbeta(0.5,0.5)
  diff <- theta_1 - theta_2
  P <- step(diff)
}

# data
list(s_1=15,n_1=18,s_2=21,n_2=23)
# initialisation
list(theta_1=0.8,theta_2=0.8)

• variables that are deterministic functions of stochastic variables are specified with <-
• step(diff) = 1 if diff ≥ 0, = 0 otherwise, so the posterior mean of P estimates the probability that θ1 ≥ θ2
• comments are indicated with #
results

node     mean      sd       MC error  2.5%     median   97.5%   start  sample
theta_1  0.8124    0.08634  0.001894  0.6214   0.8238   0.9449  1      2000
theta_2  0.8955    0.0619   0.00152   0.7501   0.9061   0.9804  1      2000
diff     -0.08315  0.1064   0.002292  -0.3003  -0.0787  0.124   1      2000
P        0.2075

[figure: posterior histogram of diff, 2000 samples]
slide 11 of 20
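Because the Beta(0.5, 0.5) priors are conjugate, the marginal posteriors also have exact forms against which the simulation can be checked (my note, using the conjugate table from earlier): θ1 | s1 ∼ Beta(0.5 + 15, 0.5 + 3) = Beta(15.5, 3.5) and θ2 | s2 ∼ Beta(0.5 + 21, 0.5 + 2) = Beta(21.5, 2.5), with means 15.5/19 ≈ 0.816 and 21.5/24 ≈ 0.896, in agreement with the simulated means above.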
inferring a mean
In 1798, Henry Cavendish performed experiments to measure the specific density of the Earth (µ). He repeated the experiment n times, obtaining results y1, y2, ..., yn. What can we conclude about µ?
http://www.jstor.org/pss/106988
Observation model
Assume that the observation noise is zero-mean gaussian with precision τ, and that the noises are independent given τ and µ:

  yi = µ + ei,   ei | µ, τ ∼ Normal(0, τ)

  p(y1, ..., yn | µ, τ) = ∏i=1..n p(yi | µ, τ)

(precision = 1/variance)
slide 12 of 20
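Written out explicitly (my sketch; Normal(0, τ) on these slides is parametrised by the precision τ, matching WinBUGS’s dnorm), each factor of the likelihood is

  p(yi | µ, τ) = √(τ/(2π)) exp(−τ(yi − µ)²/2).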
inferring a mean (2)
The prior, p(µ, τ)
• Assume independence: p(µ, τ) = p(µ) p(τ)
• We can use any pdfs that live in [0, ∞)
• It’s convenient to use Gamma distributions; small α and β give a “vague” prior
• µ ∼ Gamma(10, 2)
[figure: prior pdf p(µ), with reference densities marked: granite ≈ 2.5, lead ore ≈ 7.5]
• τ ∼ Gamma(2.5, 0.1)
[figure: prior pdf p(τ) on (0, 40)]
• note: Matlab’s Gamma parameters are A = α, B = 1/β (e.g. plot the µ prior in disttool with A = 10, B = 0.5)
slide 13 of 20
inferring a mean (3)
WinBUGS

model {
  mu ~ dgamma(a_mu,b_mu)
  tau ~ dgamma(a_tau,b_tau)
  for (i in 1:n) {
    y[i] ~ dnorm(mu,tau)
  }
}

# data
list(y=c(5.36,5.29,5.58,5.65,5.57,5.53,5.62,5.29,5.44,5.34,
5.79,5.10,5.27,5.39,5.42,5.47,5.63,5.34,5.46,5.30,
5.78,5.68,5.85),n=23,a_mu=10,b_mu=2,a_tau=2.5,b_tau=0.1)
# initialisation
list(mu=5,tau=25)

results

node  mean   sd       MC error  2.5%   median  97.5%
mu    5.485  0.04192  4.311E-4  5.402  5.485   5.568

[figure: posterior histogram of mu, 10000 samples]
• try y1 = 15.36, i.e. an “outlier”
• try a “robust” distribution: y[i] ~ dt(mu,tau,4) (see the sketch below)
slide 14 of 20
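A minimal sketch of the suggested robust variant — same priors and data, only the likelihood line changes to a Student-t distribution with 4 degrees of freedom, whose heavier tails make the posterior for mu less sensitive to a single outlying observation:

model {
  mu ~ dgamma(a_mu,b_mu)
  tau ~ dgamma(a_tau,b_tau)
  for (i in 1:n) {
    # t likelihood: dt(mean, precision, degrees of freedom)
    y[i] ~ dt(mu,tau,4)
  }
}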
comparing means
Cuckoo eggs found in m dunnock nests have diameters x1, x2, ..., xm (mm). Cuckoo eggs found in n sedge warbler nests have diameters y1, y2, ..., yn (mm). Do cuckoos lay bigger eggs in the nests of dunnocks than in the nests of sedge warblers?
Observation model

  p(x1, ..., xm, y1, ..., yn | µx, τx, µy, τy) = ∏i=1..m p(xi | µx, τx) ∏j=1..n p(yj | µy, τy)

  xi | µx, τx ∼ Normal(µx, τx),   yi | µy, τy ∼ Normal(µy, τy)

Prior

  p(µx, τx, µy, τy) = p(µx) p(τx) p(µy) p(τy)
  µx ∼ Gamma(0.22, 0.01),   τx ∼ Gamma(0.1, 0.1)
  µy ∼ Gamma(0.22, 0.01),   τy ∼ Gamma(0.1, 0.1)

• plot these with disttool
slide 15 of 20
comparing means (2)
WinBUGS

model {
  for (i in 1:m) { x[i] ~ dnorm(mu_x,tau_x) }
  for (i in 1:n) { y[i] ~ dnorm(mu_y,tau_y) }
  mu_x ~ dgamma(0.22,0.01)
  mu_y ~ dgamma(0.22,0.01)
  tau_x ~ dgamma(0.1,0.1)
  tau_y ~ dgamma(0.1,0.1)
  diff <- mu_x - mu_y
  P <- step(diff)
}

# data
list(x=c(22,23.9,20.9,23.8,25,24,21.7,23.8,22.8,23.1),m=10,
y=c(23.2,22,22.2,21.2,21.6,21.9,22,22.9,22.8),n=9)
# init
list(mu_x=22,mu_y=22,tau_x=1,tau_y=1)

results

node  mean    sd      MC error  2.5%     median  97.5%
diff  0.8847  0.5222  0.006927  -0.1572  0.882   1.925
P     0.9562  0.2046  0.003929  0.0      1.0

[figure: posterior histogram of diff, 4500 samples]
• do the sizes of cuckoo eggs in dunnock nests have greater variance than those in sedge warbler nests? (see the sketch below)
slide 16 of 20
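One way to address the variance question within the same model (a sketch; var_x, var_y, and P_var are node names I have added): since WinBUGS parametrises the normal by precision, the variances are the reciprocals 1/tau, and a step() node gives the posterior probability of the comparison, just as P does for the means.

model {
  for (i in 1:m) { x[i] ~ dnorm(mu_x,tau_x) }
  for (i in 1:n) { y[i] ~ dnorm(mu_y,tau_y) }
  mu_x ~ dgamma(0.22,0.01)
  mu_y ~ dgamma(0.22,0.01)
  tau_x ~ dgamma(0.1,0.1)
  tau_y ~ dgamma(0.1,0.1)
  # variances are the reciprocals of the precisions
  var_x <- 1/tau_x
  var_y <- 1/tau_y
  # posterior probability that the dunnock-nest diameters vary more
  P_var <- step(var_x - var_y)
}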
linear regression
In 1857, Scottish physicist James D. Forbes published a study relating data on the boiling temperature of water x1, x2, ..., xn (deg F) and the atmospheric pressure y1, y2, ..., yn (inches of Hg). If water boils at 190 deg F, what is the atmospheric pressure?
Observation model

  p(y1, ..., yn | µ1, ..., µn, τ) = ∏i p(yi | µi, τ)
  yi | µi, τ ∼ Normal(µi, τ)
  ln(µi) = α + β (xi − x̄)

(physics predicts a straight line fit to ln(y) as a function of x; centring on x̄ reduces the posterior correlation between α and β, which helps the sampler)
Prior

  p(α, β, τ) = p(α) p(β) p(τ)
  α ∼ Normal(0, 10⁻⁶),   β ∼ Normal(0, 10⁻⁶),   τ ∼ Gamma(0.001, 0.001)

slide 17 of 20
linear regression (2)
WinBUGS

model {
  x_bar <- mean(x[])
  for (i in 1:n) {
    log(mu[i]) <- alpha + beta*(x[i] - x_bar)
    y[i] ~ dnorm(mu[i],tau)
  }
  alpha ~ dnorm(0.0,1.0E-6)
  beta ~ dnorm(0.0,1.0E-6)
  tau ~ dgamma(0.001,0.001)
  y190 <- exp(alpha + beta*(190 - x_bar))
}

# data
list(x=c(210.8, 210.2, 208.4, 202.5, 200.6, 200.1, 199.5, 197, 196.4, 196.3, 195.6, 193.4, 193.6, 191.4,
191.1, 190.6, 189.5, 188.8, 188.5, 185.7, 186, 185.6, 184.1, 184.6, 184.1, 183.2, 182.4, 181.9, 181.9,
181.15, 180.6), y=c(29.211, 28.559, 27.972, 24.697, 23.726, 23.369, 23.03, 21.892, 21.928, 21.654,
21.605, 20.48, 20.212, 19.758, 19.49, 19.386, 18.869, 18.356, 18.507, 17.267, 17.221, 17.062, 16.959, 16.881,
16.817, 16.385, 16.235, 16.106, 15.928, 15.919, 25.376), n=31)
# init
list(alpha=0,beta=0,tau=1)

results

node  mean   sd      MC error  2.5%   median  97.5%  start  sample
y190  19.42  0.3159  0.02019   18.78  19.42   20.12  4001   1000

[figure: model fit: mu, plotted over the data with Inference > Compare; node = mu, other = y, axis = x]
• try y[i] ~ dt(mu[i],tau,4)
slide 18 of 20
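The y190 node above gives the posterior of the mean pressure at 190 deg F. For a predictive distribution of a new measurement at that temperature, observation noise can be added with one extra stochastic node inside the model (a sketch; y190_pred is a name I have introduced, in the spirit of ypred on slide 8):

  # predictive pressure for a new boiling-point measurement at 190 deg F
  y190_pred ~ dnorm(y190, tau)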
logistic regression
• n1 lab mice are injected with a substance at log concentration x1, and y1 of them die.
• The experiment is repeated 4 times with different concentrations, yielding further data (n2, x2, y2), (n3, x3, y3), (n4, x4, y4).
• What dosage corresponds to a 50% chance of mortality?

xi      ni  yi
-0.863  5   0
-0.296  5   1
-0.053  5   3
0.727   5   5

Observation model

  yi | θi ∼ Binomial(θi, ni)
  logit(θi) = α + β xi,   where logit(θ) = log(θ/(1 − θ))

Prior

  p(α, β) = p(α) p(β)
  α ∼ Normal(0, 0.001),   β ∼ Normal(0, 0.001)

(note: Matlab’s parametrization of the Normal differs from the WinBUGS usage)
• plot the prior with disttool
• In this kind of study, a parameter of interest is xLD50 = −α/β, the dosage corresponding to a 50% mortality rate, that is, logit(0.5) = α + β xLD50
[figure: observed mortality fractions y/n against x, with the fitted curve crossing the 1/2 level at −α/β]
slide 19 of 20
logistic regression (2)
WinBUGS

model {
  for (i in 1:nx) {
    logit(theta[i]) <- alpha + beta*x[i]
    y[i] ~ dbin(theta[i],n[i])
  }
  alpha ~ dnorm(0.0,0.001)
  beta ~ dnorm(0.0,0.001)
  LD50 <- (logit(0.50)-alpha)/beta
}

# data
list(y=c(0,1,3,5), n=c(5,5,5,5),
x=c(-0.863,-0.296,-0.053,0.727), nx=4)
# init
list(alpha=0,beta=1)

The results after 5000 simulation steps are

node   mean     sd       2.5%     median   97.5%
alpha  1.274    1.067    -0.6333  1.193    3.607
beta   11.38    5.56     3.463    10.44    24.78
LD50   -0.1052  0.09512  -0.2738  -0.1109  0.1196

   Dbar     Dhat     pD     DIC
y  125.140  122.306  2.834  127.974

[figures: posterior histograms of alpha and beta (5000 samples); model fit y/n against x]
• what log-concentration corresponds to a mortality probability of 1%? (see the sketch below)
slide 20 of 20
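The 1% question can be answered by adding one deterministic node to the model above (a sketch; LD01 is a name I have introduced). Monitoring it gives its posterior distribution directly, just like LD50:

  # log-concentration at which the mortality probability is 1%:
  # solve alpha + beta*x = logit(0.01) for x
  LD01 <- (logit(0.01) - alpha)/beta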