Statistical modelling with WinBUGS
Robert Piché, Tampere University of Technology

Literature
• WinBUGS free download: www.mrc-bsu.cam.ac.uk/bugs/
• Read the first 15 pages of www.stat.uiowa.edu/~gwoodwor/BBIText/AppendixBWinbugs.pdf
• For a full intro to Bayesian statistics, take my course math.tut.fi/~piche/bayes
• My course is based on Antti Penttinen's course users.jyu.fi/~penttine/bayes09/

In this lesson we look at 6 basic statistical models (slide 3 of 20):
• inferring a proportion
• comparing proportions
• inferring a mean
• comparing means
• linear regression
• logistic regression

inferring a proportion (slide 4 of 20)
• What proportion of the population of Ireland supports the Lisbon agreement? (let's call this θ)
• When we ask n Irish adults, s of them say that they are in favour.
• What can we conclude about θ?
  1. construct an observation model p(s | θ)
  2. construct a prior model p(θ)
  3. compute the posterior p(θ | s) with WinBUGS
inferring a proportion (2) (slide 5 of 20)
Observation model
• Assume: given θ, the probability that person i is in favour ("success") is θ. Success is represented as y_i = 1 and failure as y_i = 0. The problem of inferring θ from a set of such observations was first considered by Bayes and Laplace. The likelihood pmf for a single observation is
    p(y_i | θ) = θ^(y_i) (1 − θ)^(1−y_i)   (y_i ∈ {0, 1}),
  where 0^0 is taken to be equal to 1.
• Assume: given θ, the observations are mutually independent, so the likelihood pmf of a sequence y_1, ..., y_n is
    p(y_1:n | θ) = ∏_{i=1}^n θ^(y_i) (1 − θ)^(1−y_i) = θ^s (1 − θ)^(n−s),
  where s = ∑_{i=1}^n y_i is the number of successes.
• Note that s is the only feature of the observations that appears in the likelihood, so the inference problem could equally well be stated with s treated as the observation, in which case the likelihood distribution is
    s | θ ∼ Binomial(θ, n).
inferring a proportion (3) (slide 6 of 20)
• The observation model ("likelihood") tells us how we could generate a random s, given θ: s | θ ∼ Binomial(θ, n)
• Inference is the inverse problem: given s, what is θ?
Bayesian inference
• The unknown proportion θ is treated like a random variable
• The prior pdf ("before observation") and posterior pdf are related by Bayes' law:
    p(θ | s) = p(s | θ) p(θ) / ∫ p(s | θ) p(θ) dθ
[figure: prior pdf p(θ) and posterior pdf p(θ | y) on the interval [0, 1]]

inferring a proportion (4) (slide 7 of 20)
The prior, p(θ)
• It is our state of belief about θ before we make the observation
• We can use any pdf that lives in [0, 1], e.g. Unif(0, 1)
• It's convenient to use the Beta distribution, with 2 parameters α > 0, β > 0:
    p(θ) = Γ(α+β)/(Γ(α)Γ(β)) θ^(α−1) (1 − θ)^(β−1),  θ ∈ [0, 1]
• Beta(1, 1) = Unif(0, 1)
• small α and β give a "vague" prior
• plot the Beta pdf with Matlab's disttool (a Python alternative is sketched after the tables below)

A family C of prior distributions is said to be conjugate to a likelihood distribution if, for every prior chosen from C, the posterior also belongs to C. In practice, conjugate families are parametrised, and by choosing appropriate parameter values you can usually obtain a distribution that is an acceptable model of your prior state of knowledge. The conjugate families and inference solutions used in these lessons are summarised below.

y_i | θ ∼        θ ∼               θ | y_1:n ∼                     ỹ | y_1:n ∼
Normal(θ, v)     Normal(m_0, w_0)  Normal(m_n, w_n)                Normal(m_n, v + w_n)
Binomial(1, θ)   Beta(α, β)        Beta(α+s, β+n−s)                Binomial(1, (α+s)/(α+β+n))
Poisson(θ)       Gamma(α, β)       Gamma(α+s, β+n)                 NegBin(α+s, β+n)
Exp(θ)           Gamma(α, β)       Gamma(α+n, β+s)
Normal(m, θ)     InvGam(α, β)      InvGam(α + n/2, β + (n/2)s_0²)

where s = ∑_{i=1}^n y_i, ȳ = s/n, 1/w_n = 1/w_0 + 1/(v/n), m_n = (m_0/w_0 + ȳ/(v/n)) w_n, and s_0² = (1/n) ∑_{i=1}^n (y_i − m)².

The properties of the distributions are summarised below.

x ∼             p(x)                                    x ∈            E(x)      mode(x)         V(x)
Normal(µ, σ²)   (2πσ²)^(−1/2) exp(−(x−µ)²/(2σ²))        R              µ         µ               σ²
Binomial(n, p)  C(n, x) p^x (1−p)^(n−x)                 {0, 1, ..., n} np        ⌊(n+1)p⌋        np(1−p)
Beta(α, β)      Γ(α+β)/(Γ(α)Γ(β)) x^(α−1) (1−x)^(β−1)   [0, 1]         α/(α+β)   (α−1)/(α+β−2)   αβ/((α+β)²(α+β+1))
Poisson(λ)      λ^x e^(−λ)/x!                           {0, 1, ...}    λ         ⌊λ⌋             λ
Gamma(α, β)     β^α/Γ(α) x^(α−1) e^(−βx)                (0, ∞)         α/β       (α−1)/β         α/β²
Exp(λ)          λ e^(−λx)                               (0, ∞)         1/λ       0               1/λ²
NegBin(α, β)    C(x+α−1, α−1) (β/(β+1))^α (1/(β+1))^x   {0, 1, ...}    α/β                       α(β+1)/β²
InvGam(α, β)    β^α/Γ(α) x^(−(α+1)) e^(−β/x)            (0, ∞)         β/(α−1)   β/(α+1)         β²/((α−1)²(α−2))

(µ ∈ R, σ > 0, α > 0, β > 0, λ > 0, n ∈ {1, 2, ...}, p ∈ [0, 1])
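If Matlab's disttool is not at hand, here is a minimal Python sketch (my addition, not part of the original slides) that plots a few Beta pdfs; the (α, β) pairs are illustrative:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

theta = np.linspace(0.005, 0.995, 400)
# Beta(1,1) is Unif(0,1); small alpha and beta give a "vague" prior
# with mass pushed towards 0 and 1.
for a, b in [(1, 1), (0.5, 0.5), (2, 5), (1, 0.667)]:
    plt.plot(theta, beta.pdf(theta, a, b), label="Beta(%g, %g)" % (a, b))
plt.xlabel("theta")
plt.ylabel("p(theta)")
plt.legend()
plt.show()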
Example: Opinion survey. An opinion survey is conducted to determine the proportion θ of the population that is in favour of a certain policy. After some discussion with various experts, you determine that the prior belief has E(θ) > 0.5, but with a lot of uncertainty. The results of the survey are that, out of n = 1000 respondents, s = 650 were in favour of the policy. What do you conclude?

You decide on a prior distribution with E(θ) = 0.6 and V(θ) = 0.3², which corresponds to a beta distribution with α = 1 and β = 2/3, that is,
    p(θ) ∝ (1 − θ)^(−1/3),  θ ∈ [0, 1].
The survey results give you the posterior distribution
    θ | y_1:n ∼ Beta(651, 350.667),
whose mean is 0.6499 and whose standard deviation is 0.0151. The 95% credibility interval found using the normal approximation is
    0.6499 ± 1.96 · 0.0151 = (0.620, 0.679),
which agrees (to three decimals) with the 95% credibility interval computed using the inverse beta cdf. The probability that the 1001st respondent will be in favour is P(ỹ = 1 | y_1:n) = E(θ | y_1:n) = 0.6499.

inferring a proportion (5) (slide 8 of 20)
In the following WinBUGS model, we base the inference on the number of successes observed, and use the likelihood s | θ ∼ Binomial(θ, n).
[figure: DAG with nodes theta, s and ypred; the rectangle in the DAG denotes a constant]

model {
  s ~ dbin(theta,n)
  theta ~ dbeta(1,0.667)
  ypred ~ dbin(theta,1)
}

# data
list(s=650,n=1000)
# initialisation
list(theta=0.1)

• the initial values list can be placed after the model description or in a separate file
• WinBUGS can automatically generate initial values using gen inits; this is fine if we have informative prior information, but with fairly "vague" priors it is better to provide reasonable initial values yourself

The results after 2000 simulation steps are

node   mean    sd       2.5%    median  97.5%
theta  0.6497  0.01548  0.6185  0.6499  0.6802
ypred  0.6335

• the 95% credibility interval is [0.6185, 0.6802]
• try s=65, n=100
• try s=6500, n=10000
• try dbeta(1,1)
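Because the Beta prior is conjugate to the binomial likelihood, this posterior can be checked outside WinBUGS. A scipy sketch (my addition) reproduces the posterior mean and sd, the exact interval from the inverse beta cdf, the normal approximation, and the predictive probability:

from scipy.stats import beta

a0, b0 = 1.0, 0.667              # prior Beta(alpha, beta) with E = 0.6, V = 0.3^2
s, n = 650, 1000                 # survey data
post = beta(a0 + s, b0 + n - s)  # conjugate update: Beta(651, 350.667)

print("posterior mean:", post.mean())               # ~ 0.6499
print("posterior sd:  ", post.std())                # ~ 0.0151
print("exact 95% CI:  ", post.ppf([0.025, 0.975]))  # ~ (0.620, 0.679)
m, sd = post.mean(), post.std()
print("normal approx: ", (m - 1.96*sd, m + 1.96*sd))
# predictive probability that the 1001st respondent is in favour:
print("P(ypred = 1):  ", post.mean())               # E(theta | y) = 0.6499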
comparing proportions (slide 9 of 20)
• In a study of larynx cancer patients, s_1 of the n_1 patients who were treated with radiation therapy were cured, compared to s_2 of the n_2 patients who were treated with surgery.
• What can we say about θ_1 (success rate of radiation) vs θ_2 (surgery)?
  1. construct an observation model p(s | θ)
  2. construct a prior model p(θ)
  3. compute the posterior p(θ | s) with WinBUGS
  4. compute the posterior probability that θ_1 ≥ θ_2

comparing proportions (2) (slide 10 of 20)
Observation model
• assume: given θ = (θ_1, θ_2), the probability that a radiation therapy patient is cured is θ_1 and the probability that a surgery patient is cured is θ_2
• assume: given θ, the observations are independent, i.e.
    p(s_1, s_2 | θ_1, θ_2) = p(s_1 | θ_1) p(s_2 | θ_2)
    s_1 | θ_1 ∼ Binomial(θ_1, n_1),  s_2 | θ_2 ∼ Binomial(θ_2, n_2)
The prior, p(θ_1, θ_2): independent and vague
    p(θ_1, θ_2) = p(θ_1) p(θ_2),  θ_1 ∼ Beta(0.5, 0.5),  θ_2 ∼ Beta(0.5, 0.5)
• plot these with disttool

comparing proportions (3) (slide 11 of 20)
WinBUGS
model {
  s_1 ~ dbin(theta_1,n_1)
  s_2 ~ dbin(theta_2,n_2)
  theta_1 ~ dbeta(0.5,0.5)
  theta_2 ~ dbeta(0.5,0.5)
  diff <- theta_1 - theta_2
  P <- step(diff)
}
# data
list(s_1=15,n_1=18,s_2=21,n_2=23)
# initialisation
list(theta_1=0.8,theta_2=0.8)

• variables that are deterministic functions of stochastic variables are specified with <-
• step(diff) = 1 if diff ≥ 0, = 0 otherwise
• comments are indicated with #

results
node     mean      sd       MC error  2.5%     median   97.5%   start  sample
theta_1  0.8124    0.08634  0.001894  0.6214   0.8238   0.9449  1      2000
theta_2  0.8955    0.0619   0.00152   0.7501   0.9061   0.9804  1      2000
diff     -0.08315  0.1064   0.002292  -0.3003  -0.0787  0.124   1      2000
P        0.2075

[figure: diff sample: 2000 — posterior histogram of diff]
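Because the Beta(0.5, 0.5) priors are conjugate, the posteriors here are exactly θ_1 ∼ Beta(15.5, 3.5) and θ_2 ∼ Beta(21.5, 2.5), so the WinBUGS output can be cross-checked by plain Monte Carlo, without MCMC. A numpy sketch (my addition):

import numpy as np

rng = np.random.default_rng(1)
N = 100_000
# conjugate posteriors: theta | s ~ Beta(0.5 + s, 0.5 + n - s)
theta_1 = rng.beta(0.5 + 15, 0.5 + 18 - 15, N)  # radiation: s_1=15, n_1=18
theta_2 = rng.beta(0.5 + 21, 0.5 + 23 - 21, N)  # surgery:   s_2=21, n_2=23
diff = theta_1 - theta_2
print("E(diff) =", diff.mean())               # ~ -0.08, cf. -0.08315 above
print("P(diff >= 0) =", (diff >= 0).mean())   # ~ 0.21,  cf. node P = 0.2075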
inferring a mean (slide 12 of 20)
In 1798, Henry Cavendish performed experiments to measure the specific density of the Earth (µ). He repeated the experiment n times, obtaining results y_1, y_2, ..., y_n. What can we conclude about µ? (http://www.jstor.org/pss/106988)

Observation model
Assume that the observation noise is zero-mean Gaussian with precision τ, and that the noises are independent given τ and µ:
    y_i = µ + e_i,  e_i | µ, τ ∼ Normal(0, τ)
    p(y_1, ..., y_n | µ, τ) = ∏_{i=1}^n p(y_i | µ, τ)
(precision = 1/variance; WinBUGS parametrises the normal distribution by its precision)

inferring a mean (2) (slide 13 of 20)
The prior, p(µ, τ)
• Assume independence: p(µ, τ) = p(µ) p(τ)
• We can use any pdfs that live in [0, ∞)
• It's convenient to use Gamma distributions:
    µ ∼ Gamma(10, 2)
    τ ∼ Gamma(2.5, 0.1)
• small α and β give a "vague" prior
• note: Matlab's Gamma parameters are A = α, B = 1/β
[figure: prior pdf p(µ), with the densities of granite and lead ore marked for reference, and prior pdf p(τ)]
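To see what these priors actually assert, a scipy sketch (my addition) prints their means and standard deviations. Like Matlab, scipy parametrises the Gamma by shape A = α and scale B = 1/β:

from scipy.stats import gamma

mu_prior = gamma(a=10, scale=1/2)      # mu  ~ Gamma(alpha=10,  beta=2)
tau_prior = gamma(a=2.5, scale=1/0.1)  # tau ~ Gamma(alpha=2.5, beta=0.1)

print("E(mu) =", mu_prior.mean(), " sd(mu) =", mu_prior.std())      # 5.0, ~1.6
print("E(tau) =", tau_prior.mean(), " sd(tau) =", tau_prior.std())  # 25.0, ~15.8
# E(tau) = 25 corresponds to a measurement sd of about 1/sqrt(25) = 0.2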
inferring a mean (3) (slide 14 of 20)
WinBUGS
model {
  mu ~ dgamma(a_mu,b_mu)
  tau ~ dgamma(a_tau,b_tau)
  for (i in 1:n) {
    y[i] ~ dnorm(mu,tau)
  }
}
# data
list(y=c(5.36,5.29,5.58,5.65,5.57,5.53,5.62,5.29,5.44,5.34,
5.79,5.10,5.27,5.39,5.42,5.47,5.63,5.34,5.46,5.30,
5.78,5.68,5.85),n=23,a_mu=10,b_mu=2,a_tau=2.5,b_tau=0.1)
# initialisation
list(mu=5,tau=25)

results
node  mean   sd       MC error  2.5%   median  97.5%
mu    5.485  0.04192  4.311E-4  5.402  5.485   5.568

[figure: mu sample: 10000 — posterior histogram of mu]

• try y_1 = 15.36, i.e. an "outlier"
• try a "robust" distribution: y[i] ~ dt(mu,tau,4)

comparing means (slide 15 of 20)
Cuckoo eggs found in m dunnock nests have diameters x_1, x_2, ..., x_m (mm). Cuckoo eggs found in n sedge warbler nests have diameters y_1, y_2, ..., y_n (mm). Do cuckoos lay bigger eggs in the nests of dunnocks than in the nests of sedge warblers?

Observation model
    p(x_1, ..., x_m, y_1, ..., y_n | µ_x, τ_x, µ_y, τ_y) = ∏_{i=1}^m p(x_i | µ_x, τ_x) ∏_{i=1}^n p(y_i | µ_y, τ_y)
    x_i | µ_x, τ_x ∼ Normal(µ_x, τ_x),  y_i | µ_y, τ_y ∼ Normal(µ_y, τ_y)
Prior
    p(µ_x, τ_x, µ_y, τ_y) = p(µ_x) p(τ_x) p(µ_y) p(τ_y)
    µ_x ∼ Gamma(0.22, 0.01),  τ_x ∼ Gamma(0.1, 0.1)
    µ_y ∼ Gamma(0.22, 0.01),  τ_y ∼ Gamma(0.1, 0.1)
• plot these with disttool

comparing means (2) (slide 16 of 20)
WinBUGS
model {
  for (i in 1:m) { x[i] ~ dnorm(mu_x,tau_x) }
  for (i in 1:n) { y[i] ~ dnorm(mu_y,tau_y) }
  mu_x ~ dgamma(0.22,0.01)
  mu_y ~ dgamma(0.22,0.01)
  tau_x ~ dgamma(0.1,0.1)
  tau_y ~ dgamma(0.1,0.1)
  diff <- mu_x - mu_y
  P <- step(diff)
}
# data
list(x=c(22,23.9,20.9,23.8,25,24,21.7,23.8,22.8,23.1),m=10,
y=c(23.2,22,22.2,21.2,21.6,21.9,22,22.9,22.8),n=9)
# initialisation
list(mu_x=22,mu_y=22,tau_x=1,tau_y=1)

results
node  mean    sd      MC error  2.5%     median  97.5%
diff  0.8847  0.5222  0.006927  -0.1572  0.882   1.925
P     0.9562  0.2046  0.003929  0.0      1.0

[figure: diff sample: 4500 — posterior histogram of diff]

• a follow-up question: do the sizes of cuckoo eggs in dunnock nests have greater variance than those in sedge warbler nests? (see the sketch below)
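As a quick sanity check of the posterior for diff (and a first look at the variance question), the sample statistics of the two datasets can be computed directly. A numpy sketch (my addition):

import numpy as np

x = np.array([22, 23.9, 20.9, 23.8, 25, 24, 21.7, 23.8, 22.8, 23.1])  # dunnock
y = np.array([23.2, 22, 22.2, 21.2, 21.6, 21.9, 22, 22.9, 22.8])      # sedge warbler

print("mean(x) - mean(y) =", x.mean() - y.mean())  # 0.9, cf. E(diff) = 0.8847
print("var(x) =", x.var(ddof=1), " var(y) =", y.var(ddof=1))
# the dunnock sample variance (~1.57) is larger than the sedge warbler one (~0.42)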
linear regression (slide 17 of 20)
In 1875, Scottish physicist James D. Forbes published a study relating data on the boiling temperature of water x_1, x_2, ..., x_n (deg F) and the atmospheric pressure y_1, y_2, ..., y_n (inches of Hg). If water boils at 190 deg F, what is the atmospheric pressure? Physics predicts a straight-line fit to ln(y) as a function of x.

Observation model
    p(y_1, ..., y_n | µ_1, ..., µ_n, τ) = ∏_i p(y_i | µ_i, τ)
    y_i | µ_i, τ ∼ Normal(µ_i, τ),  ln(µ_i) = α + β (x_i − x̄)
Prior
    p(α, β, τ) = p(α) p(β) p(τ)
    α ∼ Normal(0, 10^−6),  β ∼ Normal(0, 10^−6)  (precision parametrisation)
    τ ∼ Gamma(0.001, 0.001)

linear regression (2) (slide 18 of 20)
WinBUGS
model {
  x_bar <- mean(x[])
  for (i in 1:n) {
    log(mu[i]) <- alpha + beta*(x[i] - x_bar)
    y[i] ~ dnorm(mu[i],tau)
  }
  alpha ~ dnorm(0.0,1.0E-6)
  beta ~ dnorm(0.0,1.0E-6)
  tau ~ dgamma(0.001,0.001)
  y190 <- exp(alpha + beta*(190 - x_bar))
}
# data
list(x=c(210.8, 210.2, 208.4, 202.5, 200.6, 200.1, 199.5, 197, 196.4, 196.3,
195.6, 193.4, 193.6, 191.4, 191.1, 190.6, 189.5, 188.8, 188.5, 185.7,
186, 185.6, 184.1, 184.6, 184.1, 183.2, 182.4, 181.9, 181.9, 181.15, 180.6),
y=c(29.211, 28.559, 27.972, 24.697, 23.726, 23.369, 23.03, 21.892, 21.928, 21.654,
21.605, 20.48, 20.212, 19.758, 19.49, 19.386, 18.869, 18.356, 18.507, 17.267,
17.221, 17.062, 16.959, 16.881, 16.817, 16.385, 16.235, 16.106, 15.928, 15.919,
25.376), n=31)
# initialisation
list(alpha=0,beta=0,tau=1)

results
node  mean   sd      MC error  2.5%   median  97.5%  start  sample
y190  19.42  0.3159  0.02019   18.78  19.42   20.12  4001   1000

[figure: model fit — posterior mean mu against x, plotted over the data]
• to produce the model-fit plot, use Inference > Compare with node = mu, other = y, axis = x
• try the robust alternative y[i] ~ dt(mu[i],tau,4) (the last pressure value, 25.376, is an apparent outlier)
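Since physics predicts a straight line for ln(y) as a function of x, ordinary least squares gives a quick non-Bayesian cross-check of y190. A numpy sketch (my addition; unlike the dt alternative, least squares does not down-weight the outlying last pressure value):

import numpy as np

x = np.array([210.8, 210.2, 208.4, 202.5, 200.6, 200.1, 199.5, 197, 196.4, 196.3,
              195.6, 193.4, 193.6, 191.4, 191.1, 190.6, 189.5, 188.8, 188.5, 185.7,
              186, 185.6, 184.1, 184.6, 184.1, 183.2, 182.4, 181.9, 181.9, 181.15,
              180.6])
y = np.array([29.211, 28.559, 27.972, 24.697, 23.726, 23.369, 23.03, 21.892, 21.928,
              21.654, 21.605, 20.48, 20.212, 19.758, 19.49, 19.386, 18.869, 18.356,
              18.507, 17.267, 17.221, 17.062, 16.959, 16.881, 16.817, 16.385, 16.235,
              16.106, 15.928, 15.919, 25.376])

x_bar = x.mean()
beta_hat, alpha_hat = np.polyfit(x - x_bar, np.log(y), 1)  # slope, intercept
print("y190 =", np.exp(alpha_hat + beta_hat * (190 - x_bar)))
# compare with the WinBUGS posterior median of 19.42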
logistic regression (slide 19 of 20)
n_1 lab mice are injected with a substance at log concentration x_1, and y_1 of them die. The experiment is repeated with different concentrations, yielding further data (n_2, x_2, y_2), (n_3, x_3, y_3), (n_4, x_4, y_4). What dosage corresponds to a 50% chance of mortality?

Observation model
Letting θ_i be the mortality rate for dosage x_i (the log of the concentration X_i in g/ml), the number of deaths can be modelled as
    y_i | θ_i ∼ Binomial(θ_i, n_i).
The relation between mortality rate and dosage is modelled as
    logit(θ_i) = log(θ_i / (1 − θ_i)) = α + β x_i

x_i     n_i  y_i
-0.863  5    0
-0.296  5    1
-0.053  5    3
0.727   5    5

In this kind of study, a parameter of interest is x_LD50 = −α/β, the dosage corresponding to a 50% mortality rate, that is, logit(0.5) = α + β x_LD50.

Prior
    p(α, β) = p(α) p(β)
    α ∼ Normal(0, 0.001),  β ∼ Normal(0, 0.001)  (precision parametrisation)
• plot the prior with disttool (note: Matlab's parametrisation of the Normal differs from WinBUGS usage)

logistic regression (2) (slide 20 of 20)
WinBUGS
model {
  for (i in 1:nx) {
    logit(theta[i]) <- alpha + beta*x[i]
    y[i] ~ dbin(theta[i],n[i])
  }
  alpha ~ dnorm(0.0,0.001)
  beta ~ dnorm(0.0,0.001)
  LD50 <- (logit(0.50)-alpha)/beta
}
# data
list(y=c(0,1,3,5), n=c(5,5,5,5), x=c(-0.863,-0.296,-0.053,0.727), nx=4)
# initialisation
list(alpha=0,beta=1)

The results after 5000 simulation steps are

node   mean     sd       2.5%     median   97.5%
alpha  1.274    1.067    -0.6333  1.193    3.607
beta   11.38    5.56     3.463    10.44    24.78
LD50   -0.1052  0.09512  -0.2738  -0.1109  0.1196

[figure: mortality fraction y/n against x, with the fitted curve]
• what log-concentration corresponds to a mortality probability of 1%? (see the sketch below)
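The LD50 node answers the 50% question; for the 1% question the same algebra gives x = (logit(0.01) − α)/β. A Python sketch (my addition) evaluates both at the posterior means — a rough plug-in point estimate only; the proper WinBUGS approach is to add a node such as LD01 <- (logit(0.01)-alpha)/beta to the model and monitor its posterior:

import numpy as np

def dose_for_mortality(p, alpha, beta):
    """Log concentration x solving logit(p) = alpha + beta*x."""
    return (np.log(p / (1 - p)) - alpha) / beta

alpha_hat, beta_hat = 1.274, 11.38   # posterior means from the table above
print("LD50:", dose_for_mortality(0.50, alpha_hat, beta_hat))  # ~ -0.112, cf. posterior median -0.1109
print("LD01:", dose_for_mortality(0.01, alpha_hat, beta_hat))  # ~ -0.52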