Bayesian Inference for Binomial Probabilities

Parameter of interest: π = the constant probability of success on each of n independent and identical dichotomous (Bernoulli) trials.

Prior Distribution

The prior distribution of the parameter of interest is either the flat prior, π ~ Uniform(0, 1) = Beta(1, 1),

    g(π) = 1 for 0 < π < 1;  g(π) = 0 elsewhere,

or, more generally, a member of the Beta family, π ~ Beta(a, b),

    g(π) = [Γ(a + b) / (Γ(a) Γ(b))] π^(a−1) (1 − π)^(b−1) for 0 < π < 1;  g(π) = 0 elsewhere,

with summaries

    E(π) = a/(a + b)
    Var(π) = ab / [(a + b)² (a + b + 1)]
    Mode(π) = (a − 1)/(a + b − 2)
    Median(π) = G⁻¹(0.5),

where G is the CDF of the Beta distribution with PDF g, i.e., G(π) = ∫₀^π g(u) du.

687313075    2/5/2016

Likelihood Function

Y | π ~ Binomial(n, π), with likelihood function

    f(y | π) = C(n, y) π^y (1 − π)^(n−y),  y = 0, 1, …, n;  f(y | π) = 0 elsewhere,    (1)

where C(n, y) is the binomial coefficient, and

    E(Y | π) = nπ
    Var(Y | π) = nπ(1 − π)
    E(Y/n | π) = π
    Var(Y/n | π) = π(1 − π)/n

Figure 1. This is actually the sampling distribution of the binomial random variable Y, i.e., the binomial probability mass function (PMF), Equation (1), as a function of Y for particular fixed values of the binomial parameters n and π. The likelihood function, on the other hand, is Equation (1) viewed as a function of π for fixed y and n.

Posterior

π | Y ~ Beta(a + y, b + n − y)

Theorem. The Beta family of distributions is the conjugate family of priors to the binomial likelihood, i.e., if (i) the prior distribution is Beta(a, b), and (ii) the likelihood function is Binomial(n, π), then the posterior distribution is Beta(a + y, b + n − y).

Proof. Posterior = constant × prior × likelihood:

    g(π | y) = k g(π) f(y | π)
             = k π^(a−1) (1 − π)^(b−1) · C(n, y) π^y (1 − π)^(n−y)
             = k′ π^(a+y−1) (1 − π)^(b+n−y−1),

which is the kernel of the Beta(a + y, b + n − y) distribution. ∎

Note. The Beta family of priors is the conjugate family of priors for the binomial likelihood. "Conjugate" means that the posterior distribution belongs to the same family as the prior distribution. Because we recognize the posterior kernel as a Beta kernel, the constant of integration, i.e., the denominator of Bayes' Law, can be found easily without integration.
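As a quick numerical companion to the theorem (not part of Bolstad's text; the helper names are mine), here is a minimal Python sketch of the conjugate update, using the magazine-survey data from Exercise 8.1 later in these notes (y = 29 successes in n = 150 trials):

```python
import math

def beta_binomial_update(a, b, y, n):
    """Posterior Beta parameters after observing y successes in n trials."""
    return a + y, b + n - y

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution: a / (a + b)."""
    return a / (a + b)

def beta_var(a, b):
    """Variance of a Beta(a, b) distribution: ab / [(a+b)^2 (a+b+1)]."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

# Uniform prior Beta(1, 1), data y = 29 successes in n = 150 trials
a_post, b_post = beta_binomial_update(1, 1, 29, 150)
print(a_post, b_post)                                 # → 30 122, i.e., Beta(30, 122)
print(round(beta_mean(a_post, b_post), 3))            # posterior mean 30/152 ≈ 0.197
print(round(math.sqrt(beta_var(a_post, b_post)), 4))  # posterior SD ≈ 0.0322
```

No integration is performed anywhere, which is the practical payoff of conjugacy: updating the prior is just parameter arithmetic.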
Summary of the conjugate update:

                  prior                           likelihood                posterior
    Distribution  π ~ Beta(a, b)                  Y | π ~ Binomial(n, π)    π | Y = y ~ Beta(a + y, b + n − y) = Beta(a′, b′)
    Mean          E(π) = a/(a + b)                E(Y | π) = nπ             E(π | Y = y) = (a + y)/(a + b + n)
    Variance      Var(π) = ab/[(a+b)²(a+b+1)]     Var(Y | π) = nπ(1 − π)    Var(π | Y = y) = a′b′/[(a′+b′)²(a′+b′+1)]

Equivalent sample size. Writing μ = E(π), the variance of a Beta(a, b) random variable can be written as

    Var(π) = μ(1 − μ)/(a + b + 1) = μ(1 − μ)/n_eq,

which has the same form as the variance π(1 − π)/n of a sample proportion from a sample of size n. The quantity n_eq = a + b + 1 is therefore called the equivalent sample size.

Methods for assigning a prior for π

1. Choose a vague, uninformative conjugate prior for π:
   Uniform: π ~ Beta(1, 1)
   Jeffreys' prior for the binomial: π ~ Beta(0.5, 0.5)

2. Choose a conjugate prior for π matching prior belief about the location and scale of the distribution of π.

Beliefs held (taken as given) prior to seeing the evidence (data):
    Location(π) = mean(π) = μ₀, a specific value in accord with prior beliefs.
    Scale(π) = SD(π) = σ₀, a specific value in accord with prior beliefs.

Given this notation, our prior beliefs about the location and scale of π are

    E(π) = μ₀    (2)
    Var(π) = σ₀²    (3)

This yields two equations in two unknowns:

    a/(a + b) = μ₀    (4)
    ab/[(a + b)²(a + b + 1)] = σ₀²    (5)

From (4) it's easy to prove

    b/(a + b) = 1 − μ₀.    (6)

Substituting (4) and (6) into (5), we get

    μ₀(1 − μ₀)/(a + b + 1) = σ₀²,    (7)

which suggests the form of the variance of a binomial proportion. Therefore, we call the denominator of the left-hand side of (7) the equivalent sample size, denote it by n_eq, and denote the equivalent sample size of the prior by n₀ to match μ₀ and σ₀:

    n_eq = n₀ = a + b + 1.    (8)

With this notation, equation (7) becomes

    μ₀(1 − μ₀)/n₀ = σ₀².    (9)

Solving (9) for the equivalent sample size, we get

    n₀ = μ₀(1 − μ₀)/σ₀².    (10)

From (4):  a = μ₀(n₀ − 1).    (11)
From (6):  b = (1 − μ₀)(n₀ − 1).    (12)

Summary. Based on (i) our prior beliefs, (2) and (3), about the location and scale of the distribution g of the unknown population proportion π (equivalently, the unknown probability of success), and (ii) the assumption that the prior g is a member of the Beta family, we get equations (4) and (5), which, after defining the prior equivalent sample size n₀, yield formulas (11) and (12) for the prior Beta parameters a and b.
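Formulas (10)–(12) are easy to automate. The following Python sketch (the function name is mine, not Bolstad's) converts a prior mean μ₀ and prior standard deviation σ₀ into the matching Beta(a, b) parameters:

```python
def matching_beta_prior(mu0, sigma0):
    """Beta(a, b) prior matching a prior mean mu0 and prior SD sigma0.

    Uses n0 = mu0*(1 - mu0)/sigma0**2    (equation 10)
         a  = mu0*(n0 - 1)               (equation 11)
         b  = (1 - mu0)*(n0 - 1)         (equation 12)
    """
    n0 = mu0 * (1 - mu0) / sigma0 ** 2   # prior equivalent sample size
    a = mu0 * (n0 - 1)
    b = (1 - mu0) * (n0 - 1)
    return a, b, n0

# Sanity check: a Uniform(0, 1) prior has mean 1/2 and variance 1/12,
# so the matching parameters should come out to a = b = 1 and n0 = 3.
a, b, n0 = matching_beta_prior(0.5, (1 / 12) ** 0.5)
print(round(a, 6), round(b, 6), round(n0, 6))  # → 1.0 1.0 3.0
```

The sanity check recovers the uniform prior exactly, confirming that (10)–(12) invert the Beta mean and variance formulas (4) and (5).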
Example (Bolstad 2007, Exercise 8.1)

In order to determine how effective a magazine is at reaching its target audience, a market research company selects a random sample of people from the target audience and interviews them. Out of the 150 people in the sample, 29 had seen the latest issue.

Underlying random variable of interest (from the survey): whether or not the j-th randomly sampled person had seen the latest issue (categorical, dichotomous). These are iid Bernoulli(π) random variables, the indicators that a person has seen the latest issue:

    I_j = 1 if the j-th person sampled has seen the latest issue;  I_j = 0 otherwise.    (13)

Parameter of interest: π = the proportion of the target audience (population) that has seen the latest issue,

    π = P(I_j = 1) = P(j-th person sampled saw the latest issue).    (14)

Statistic: Y = the number of people in a sample of size n who have seen the latest issue, the sample sum of the indicators,

    Y = Σ_{j=1}^{n} I_j.    (15)

Sample: Observe y = 29, n = 150.

a) What is the distribution of Y, the number who have seen the latest issue?

   Likelihood: Y | π ~ Binomial(150, π)

b) Using a uniform prior for π, what is the posterior distribution of π?

   Prior: π ~ Uniform(0, 1) = Beta(1, 1), with prior mean E(π) = 1/(1 + 1) = 1/2 = 0.500
   Posterior: π | Y = 29 ~ Beta(1 + 29, 1 + 150 − 29) = Beta(30, 122)

c) Posterior Mean = E(π | Y = 29) = 30/(30 + 122) = 30/152 = 0.197

   [Figure: prior Beta(1, 1) and posterior Beta(30, 122) PDFs plotted against π on (0, 1).]

d) The change in mean (from prior to posterior) is negative, i.e., the change in belief about the expectation of π is negative.

e) This makes sense because the sample proportion, 29/150 = 0.193, is less than the prior mean, 0.500.

f) (Notice that the posterior mean, the Bayes estimator of π, is very close to the unbiased frequentist estimator, the sample proportion 0.193.)

g) (Notice that it is slightly greater. Why? Because the prior mean was greater than the sample proportion. The bias in the Bayes estimator comes from the incorporation of prior beliefs.)
h) Prior Var(π) = (1)(1)/[(1 + 1)²(1 + 1 + 1)] = 1/12 = 0.0833 = 0.289²

i) Prior SD(π) = 0.289

j) Posterior Var(π | Y = 29) = (30)(122)/[(30 + 122)²(30 + 122 + 1)] = 0.001035

k) Posterior SD(π | Y = 29) = 0.0322

l) The variance of π has decreased because of the added information.

m) (The standard deviation declined by nearly a factor of 10, from 0.289 to 0.0322, because the sample carries a lot of information, n = 150, compared to the equivalent prior sample size of n₀ = 1 + 1 + 1 = 3.)

n) Equivalent prior sample size: n₀ = 1 + 1 + 1 = 3

o) Equivalent posterior sample size: n₁ = 30 + 122 + 1 = 153

p) The difference is the sample size, n = 150. (Makes sense.)

q) Exact 95% Bayesian credible interval: use Minitab, JMP, SAS, etc., to compute the endpoints, the 2.5th and 97.5th percentiles of Beta(30, 122):

   0.025 quantile of Beta(30, 122): Q₀.₀₂₅ = 0.1382
   0.975 quantile of Beta(30, 122): Q₀.₉₇₅ = 0.2640

   Exact 95% credible interval for π: [0.138, 0.264]

r) Approximate 95% Bayesian credible interval: use the normal approximation with

   μ_{π|Y=29} = 0.197,  σ²_{π|Y=29} = 0.001035,  σ_{π|Y=29} = 0.0322.

   The approximate 95% limits are

       μ_{π|y} ± z₀.₉₇₅ σ_{π|y} = 0.197 ± 1.96(0.0322) = 0.197 ± 0.0631,    (16)

   so the approximate 95% credible interval for π is [0.134, 0.260].

Example (Bolstad 2007, Exercise 8.3)

Sophie, the editor of the school newspaper, is going to conduct a survey of the students to determine the level of support for the current president of the student association. She needs to determine her prior distribution for π, the proportion of students who support the president. She decides her prior mean is 0.5 and her prior standard deviation is 0.15.

Underlying random variable of interest (from the survey): whether or not the j-th randomly sampled student supports the current president of the student association (categorical, dichotomous).

Parameter of interest: π = the proportion of students who support the president.

(a) Determine the beta(a, b) prior that matches her prior belief.
Given the prior mean of 0.5 and the prior standard deviation of 0.15, i.e., μ₀ = 0.5 and σ₀ = 0.15, we substitute into equations (9)–(12):

    μ₀(1 − μ₀)/n₀ = σ₀²
    0.5(1 − 0.5)/n₀ = 0.15²
    n₀ = 0.5(1 − 0.5)/0.15² = 11.11

so that

    a + b + 1 = 11.11
    a + b = 10.11
    a/(a + b) = μ₀ = 0.5
    a = 0.5(10.11) = 5.055
    b = 5.055

(b) What is the equivalent sample size of her prior?

    n₀ = a + b + 1 = 11.11

(c) Out of the 68 students that she polls, y = 21 support the current president. Determine the posterior distribution.

Statistic: Y = the number of students in a sample of size n who support the president.

Sample: Observe y = 21, n = 68.

    g(π | y = 21) = beta(a + y, b + n − y)
                  = beta(5.055 + 21, 5.055 + 68 − 21)
                  = beta(26.06, 52.06)

(d) NOT IN BOLSTAD. We can also find the posterior mean, variance, and standard deviation, and we can graph the prior and the posterior. This is especially easy with the use of BetaBinomialBetaPDFandCDF.JMP, but, of course, we should all know how to do it by hand.

                         prior                     posterior
    a                    a = 5.06                  a′ = 26.06
    b                    b = 5.06                  b′ = 52.06
    n_eq                 n₀ = 11.1                 n₁ = 79.11
    mean                 E(π) = μ_π = 0.500        E(π | y) = μ_{π|y} = 0.334
    variance             Var(π) = σ²_π = 0.0225    Var(π | y) = σ²_{π|y} = 0.00281
    standard deviation   σ_π = 0.15                σ_{π|y} = 0.0530
    exact lower 95%      Q₀.₀₂₅ = 0.2135           Q₀.₀₂₅ = 0.234
    exact upper 95%      Q₀.₉₇₅ = 0.7865           Q₀.₉₇₅ = 0.441
    approx lower 95%     —                         0.230
    approx upper 95%     —                         0.437

    [Figure: prior beta(5.06, 5.06) and posterior beta(26.06, 52.06) PDFs plotted against π on (0, 1).]

Example (Bolstad 2007, Exercise 8.5)

In a research program on human health risk from recreational contact with water contaminated with pathogenic microbial material, the National Institute of Water and Atmosphere (NIWA) instituted a study to determine the quality of New Zealand stream water at a variety of catchment types. There were 116 one-liter water samples from sites identified as having a heavy environmental impact from birds (seagulls) and waterfowl. Out of these samples, 17 contained Giardia cysts.

(a) What is the distribution of Y, the number of samples containing Giardia cysts?
Likelihood: Y | π ~ Binomial(116, π), so

    f(17 | π) = C(116, 17) π^17 (1 − π)^99, a function of π.

Figure 2. The likelihood as it is used: as a function of the parameter of interest, in this case π, for a fixed value of the sufficient statistic, in this case y = 17. The sample size n is also fixed. Not being the parameter of interest, i.e., not being the parameter about which we want to make inferences, n is called an auxiliary parameter, or a nuisance parameter.

(b) Let π be the true probability that a one-liter water sample from this type of site contains Giardia cysts. Use a beta(1, 4) prior for π. Find the posterior distribution of π given y.

    Prior: π ~ Beta(1, 4)
    Posterior: π | Y = 17 ~ Beta(1 + 17, 4 + 116 − 17) = Beta(18, 103)

(c) Summarize the posterior distribution by its first two moments.

    Posterior Mean = E(π | Y = 17) = 18/(18 + 103) = 18/121 = 0.1488
    Posterior Variance = Var(π | Y = 17) = (18)(103)/[(18 + 103)²(18 + 103 + 1)] = 0.001038 = (0.03222)²

(d) Find the normal approximation to the posterior distribution g(π | y).

    π | Y = 17 ≈ Normal(μ = 0.1488, σ² = 0.03222²)

(e) Compute a 95% credible interval for π using the normal approximation found in (d).

    π_{π|y, 0.025} = μ̂_{π|y} − z₀.₉₇₅ σ̂_{π|y} = 0.1488 − 1.96(0.03222) = 0.1488 − 0.0632 = 0.0857
    π_{π|y, 0.975} = μ̂_{π|y} + z₀.₉₇₅ σ̂_{π|y} = 0.1488 + 1.96(0.03222) = 0.1488 + 0.0632 = 0.2120

    Approximate 95% credible interval for π: [0.086, 0.212]

Example (Bolstad 2007, Exercise 8.7)

The same study found that 12 out of 174 samples contained Giardia cysts in an environment having a high impact from sheep.

(a) What is the distribution of Y, the number of samples containing Giardia cysts?

    Likelihood: Y | π ~ Binomial(174, π)

(b) Let π be the true probability that a one-liter water sample from this type of site contains Giardia cysts. Use a beta(1, 4) prior for π. Find the posterior distribution of π given y.

    π | Y = 12 ~ Beta(1 + 12, 4 + 174 − 12) = Beta(13, 166)

(c)

(d) Summarize the posterior distribution by its first two moments.
    E(π | Y = 12) = μ_{π|Y=12} = a′/(a′ + b′) = 13/(13 + 166) = 0.07263

    Var(π | Y = 12) = σ²_{π|Y=12} = a′b′/[(a′ + b′)²(a′ + b′ + 1)]
                    = (13)(166)/[(13 + 166)²(13 + 166 + 1)]
                    = 0.0003742 = (0.01934)²

(e) Find the normal approximation to the posterior distribution g(π | y).

    π | Y = 12 ≈ Normal(μ = 0.07263, σ² = 0.01934²)

(f) Compute a 95% credible interval for π using the normal approximation found in (e).

    π_{π|y, 0.025} = μ̂_{π|y} − z₀.₉₇₅ σ̂_{π|y} = 0.07263 − 1.96(0.01934) = 0.07263 − 0.03791 = 0.03472
    π_{π|y, 0.975} = μ̂_{π|y} + z₀.₉₇₅ σ̂_{π|y} = 0.07263 + 1.96(0.01934) = 0.07263 + 0.03791 = 0.11054

    Approximate 95% credible interval for π: [0.035, 0.111]

Bayesian Credible Intervals

Exact: Use Minitab, JMP, SAS, etc., to compute the endpoints, the 2.5th and 97.5th percentiles of the posterior Beta(a′, b′) distribution.

Normal approximation: Use the normal approximation with mean μ_{π|y} and variance σ²_{π|y}.

Use BetaPDFandCDF.JMP or BetaBinomialBetaPDFandCDF.JMP (BetaPDFandCDF.JSL). The Minitab project that calculated and graphed this distribution is on line.
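For readers without Minitab or JMP at hand, the exact Beta quantiles can also be computed in plain Python. This is a sketch, not production code (the function names are mine): it evaluates the Beta CDF via the standard continued-fraction expansion of the regularized incomplete beta function, inverts it by bisection, and compares the exact interval with the normal approximation for the Exercise 8.1 posterior, Beta(30, 122):

```python
import math

def log_beta(a, b):
    """log of the complete beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def _betacf(a, b, x, max_iter=200, eps=3e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < 1e-30:
        d = 1e-30
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            if abs(d) < 1e-30:
                d = 1e-30
            c = 1.0 + aa / c
            if abs(c) < 1e-30:
                c = 1e-30
            d = 1.0 / d
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b) = Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(a * math.log(x) + b * math.log(1.0 - x) - log_beta(a, b))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def beta_quantile(p, a, b, tol=1e-10):
    """Invert the Beta CDF by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Exercise 8.1 posterior: Beta(30, 122)
a, b = 30.0, 122.0
lo, hi = beta_quantile(0.025, a, b), beta_quantile(0.975, a, b)
print(round(lo, 4), round(hi, 4))  # exact interval, about [0.138, 0.264]

mean = a / (a + b)
sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
print(round(mean - 1.96 * sd, 4), round(mean + 1.96 * sd, 4))  # about [0.134, 0.260]
```

The two intervals agree to about two decimal places; the exact interval sits slightly to the right of the normal approximation because the Beta(30, 122) posterior is mildly right-skewed. In practice one would reach for scipy.stats.beta.ppf instead, but the hand-rolled version makes the computation transparent.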