Beta Distribution

Bayesian Inference for Binomial Probabilities

Parameter of interest
π = constant probability of success on each of n independent and identical dichotomous
(Bernoulli) trials.
Prior Distribution of the parameter of interest
π ~ Uniform(0, 1) = Beta(1, 1)

g(π) = 1, 0 ≤ π ≤ 1; 0, elsewhere

or

π ~ Beta(α, β)

g(π) = k π^(α−1) (1 − π)^(β−1), 0 ≤ π ≤ 1; 0, elsewhere

E(π) = α/(α + β)

Var(π) = αβ/[(α + β)²(α + β + 1)]

Mode(π) = (α − 1)/(α + β − 2), for α, β > 1

Median(π) = G⁻¹(0.5)
687313075
1
2/5/2016
where G is the CDF of the Beta distribution with PDF g, i.e., where

G(π) = ∫₀^π g(u) du
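These summaries can be computed and cross-checked numerically with scipy.stats.beta; a minimal sketch, using illustrative shape parameters (not values from the notes):

```python
# Minimal sketch: check the Beta(alpha, beta) summary formulas numerically.
# The shape parameters here are illustrative only.
from scipy.stats import beta

a, b = 2.0, 5.0

mean = a / (a + b)                          # E(pi) = alpha/(alpha + beta)
var = a * b / ((a + b)**2 * (a + b + 1))    # Var(pi)
mode = (a - 1) / (a + b - 2)                # valid only for alpha, beta > 1
median = beta.ppf(0.5, a, b)                # G^{-1}(0.5); no simple closed form

# scipy's moments agree with the closed forms
assert abs(mean - beta.mean(a, b)) < 1e-12
assert abs(var - beta.var(a, b)) < 1e-12
```

Note that the median comes from the inverse CDF (`ppf`), exactly as Median(π) = G⁻¹(0.5) above.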
Likelihood Function
Y | π ~ Binomial(n, π), with likelihood function

f(y | π) = (n choose y) π^y (1 − π)^(n−y), y = 0, 1, …, n; 0, elsewhere    (1)

E(Y | π) = nπ
Var(Y | π) = nπ(1 − π)
E(Y/n | π) = π
Var(Y/n | π) = π(1 − π)/n
Figure 1
This is actually the sampling distribution of the binomial random variable Y,
i.e., the binomial probability mass function (PMF), Equation (1), as a
function of Y, for particular fixed values of the binomial parameters n and
π. The likelihood function, on the other hand, is Equation (1) viewed as a
function of π for fixed Y and n.
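The two readings of Equation (1) described in the caption can be made concrete with scipy.stats.binom; the values of n, π, and y below are illustrative only:

```python
# Sketch: Equation (1) read two ways (illustrative n, pi, y values).
from scipy.stats import binom

n, p = 10, 0.3

# (i) Sampling distribution: PMF as a function of y, with n and pi fixed.
pmf = [binom.pmf(y, n, p) for y in range(n + 1)]
total = sum(pmf)          # a PMF sums to 1 over its support y = 0, ..., n

# (ii) Likelihood: the same formula as a function of pi, with y and n fixed.
y_obs = 4
lik = [binom.pmf(y_obs, n, q) for q in (0.1, 0.3, 0.4, 0.7)]
# The likelihood is largest near the sample proportion y/n = 0.4,
# and it need not sum or integrate to 1 in pi.
```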
Posterior
| Y~ Beta ( + y,  + n – y)
Theorem
The Beta family of distributions is the conjugate family of priors to the binomial
likelihood, i.e.,
if
(i) the prior distribution is Beta(α, β), and
(ii) the likelihood function is Binomial(n, π),
then
the posterior distribution is Beta(α + y, β + n − y).
Proof:
posterior = constant × prior × likelihood

g(π | y) = k g(π) f(y | π)
         = k π^(α−1) (1 − π)^(β−1) × π^y (1 − π)^(n−y)
         = k π^(α+y−1) (1 − π)^(β+n−y−1)
         = k π^(α′−1) (1 − π)^(β′−1),   α′ = α + y, β′ = β + n − y
Note. The Beta family of priors is the conjugate family of priors for the binomial
likelihood.
• Conjugate means that the posterior distribution family is the same as the prior
distribution family.
• The constant of integration, i.e., the denominator of Bayes' Law, can be found
easily without integration.
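That shortcut can be checked numerically: renormalizing prior × likelihood on a fine grid should reproduce the closed-form Beta posterior. A sketch with illustrative values of α, β, n, and y:

```python
# Numerical check of the conjugacy theorem: prior x likelihood, renormalized,
# matches the Beta(alpha + y, beta + n - y) density. Values are illustrative.
import numpy as np
from scipy.stats import beta, binom
from scipy.integrate import trapezoid

a, b, n, y = 2.0, 3.0, 20, 7
grid = np.linspace(0.001, 0.999, 999)

unnorm = beta.pdf(grid, a, b) * binom.pmf(y, n, grid)  # g(pi) * f(y | pi)
post_numeric = unnorm / trapezoid(unnorm, grid)        # normalize numerically
post_closed = beta.pdf(grid, a + y, b + n - y)         # Beta(9, 16)

max_err = np.max(np.abs(post_numeric - post_closed))
```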
                    prior (π)                    likelihood (Y | π)        posterior (π | Y = y)

Distribution        π ~ Beta(α, β)               Y | π ~ Binomial(n, π)    π | Y ~ Beta(α + y, β + n − y)
                                                                           = Beta(α′, β′)

Mean                E(π) = α/(α + β)             E(Y | π) = nπ             E(π | Y = y) = α′/(α′ + β′)

Variance            Var(π) =                     Var(Y | π) = nπ(1 − π)    Var(π | Y = y) =
                    αβ/[(α + β)²(α + β + 1)]                               α′β′/[(α′ + β′)²(α′ + β′ + 1)]

Equivalent          n_eq = α + β + 1             n                         n_eq = α′ + β′ + 1
sample size                                                                = (α + β + 1) + n
Methods for assigning a prior for π
Choose a vague, uninformative conjugate prior for π
• Uniform: π ~ Beta(1, 1)
• Jeffreys' prior for the binomial: π ~ Beta(0.5, 0.5)
Choose a conjugate prior for π matching prior belief about the location
and scale of the distribution of π
Beliefs held (taken as given) prior to evidence (data):
• Location(π) = mean(π) = π₀, a specific value in accord with prior beliefs.
• Scale(π) = SD(π) = σ₀, a specific value according to prior beliefs.
Given this notation, our prior beliefs about the location and scale of π are

E(π) = π₀    (2)

Var(π) = σ₀²    (3)
This yields two equations in two unknowns:

α/(α + β) = π₀    (4)

αβ/[(α + β)²(α + β + 1)] = σ₀²    (5)

From (4) it's easy to prove

β/(α + β) = 1 − π₀    (6)

Substituting (4) and (6) into (5), we get

π₀(1 − π₀)/(α + β + 1) = σ₀²    (7)

which suggests the form of the variance of a binomial random variable. Therefore, we
call the denominator of the left-hand side of (7) the equivalent sample size, denote it
by n_eq, and denote the equivalent sample size of the prior by n₀ to match π₀ and σ₀:

n_eq = n₀ = α + β + 1    (8)

With this notation, equation (7) becomes

π₀(1 − π₀)/n₀ = σ₀²    (9)

Solving (9) for the equivalent sample size, we get

n₀ = π₀(1 − π₀)/σ₀²    (10)

From (4): α = π₀(n₀ − 1)    (11)

From (6): β = (1 − π₀)(n₀ − 1)    (12)
From (6)
Summary
In summary, based on
(i) our prior beliefs, (2) and (3), about the location and scale of the distribution g(π)
of the unknown population proportion π (or the unknown probability of success),
and
(ii) the assumption that the prior g(π) is a member of the Beta family,
we get equations (4) and (5), which, after defining the prior equivalent sample size n₀,
yield formulas (11) and (12) for the prior Beta parameters α and β.
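This elicitation recipe is easy to code; a minimal sketch of equations (10)-(12), with a helper name of our own choosing:

```python
# Sketch of equations (10)-(12): Beta parameters from a prior mean and SD.
# The function name is ours, introduced for illustration.
def beta_params_from_mean_sd(pi0, sigma0):
    n0 = pi0 * (1.0 - pi0) / sigma0**2   # equivalent prior sample size, Eq. (10)
    a = pi0 * (n0 - 1.0)                 # Eq. (11)
    b = (1.0 - pi0) * (n0 - 1.0)         # Eq. (12)
    return a, b, n0

# Prior mean 0.5 and prior SD 0.15 (as in Exercise 8.3 below)
a, b, n0 = beta_params_from_mean_sd(0.5, 0.15)

# The implied Beta(a, b) mean and variance round-trip to the elicited values.
mean_check = a / (a + b)
var_check = a * b / ((a + b)**2 * (a + b + 1))
```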
Example (Bolstad 2007, Exercise 8.1)
In order to determine how effective a magazine is at reaching its target audience, a
market research company selects a random sample of people from the target audience and
interviews them. Out of the 150 people in the sample, 29 had seen the latest issue.
Underlying random variable of interest (from survey): whether or not the j-th
randomly sampled person had seen the latest issue (categorical). This is an iid
Bernoulli random variable, the indicator that a person has seen the latest issue:

I_j = 1, if the j-th person sampled has seen the latest issue; 0, otherwise    (13)

Parameter of interest: π = the proportion of the target audience (population) that
has seen the latest issue.

π = P(I_j = 1) = P(j-th person sampled saw the latest issue)    (14)

Statistic: Y = the number of people in a sample of size n who have seen the latest
issue, the sample sum of the indicators:

Y = Σ_{j=1}^{n} I_j    (15)
Sample: Observe y = 29, n = 150.
a) What is the distribution of Y, the number who have seen the latest issue?
Likelihood: Y | π ~ Binomial(150, π)
b) Using a uniform prior for π, what is the posterior distribution of π?
Prior(π) = Uniform = Beta(1, 1)
Posterior(π | Y = 29) = Beta(1 + 29, 1 + 150 − 29) = Beta(30, 122)
Prior Mean = E(π) = 1/(1 + 1) = 1/2 = 0.500
c) Posterior Mean = E(π | Y = 29) = 30/(30 + 122) = 30/152 = 0.197
[Figure: PDF of π, plotted for π from 0 to 1]
d) The change in mean (from prior to posterior) is negative, i.e., the change in belief
about the expectation of π is negative.
e) This makes sense because the sample proportion was 29/150 = 0.193 < prior mean =
0.500.
f) (Notice that the posterior mean, the Bayes estimator of π, is very close to the
unbiased frequentist estimator, the sample proportion 0.193.)
g) (Notice that it is slightly greater. Why? Because the prior mean was greater than the
sample proportion. The bias in the Bayes estimator comes from the incorporation of
prior beliefs.)
h) Prior Var(π) = (1)(1)/[(1 + 1)²(1 + 1 + 1)] = 1/12 = 0.0833 = 0.289²
i) Prior SD(π) = 0.289
j) Posterior Var(π | Y = 29) = (30)(122)/[(30 + 122)²(30 + 122 + 1)] = 0.001035
k) Posterior SD(π | Y = 29) = 0.0322
l) The variance of π has decreased because of the added information.
m) (The SD declined by nearly a factor of 10, because of a lot of information, n = 150,
compared to the equivalent prior sample size of n₀ = 1 + 1 + 1 = 3.)
n) Equivalent prior sample size: n₀ = 1 + 1 + 1 = 3.
o) Equivalent posterior sample size: n₁ = 30 + 122 + 1 = 153.
p) The difference is the sample size, n = 150. (Makes sense.)
q) Exact 95% Bayesian credible interval: use Minitab, JMP, SAS, etc., to compute the
endpoints: the 2.5 percentile and the 97.5 percentile of Beta(30, 122).
0.025 quantile of Beta(30, 122) is Q₀.₀₂₅ ≈ 0.1382
0.975 quantile of Beta(30, 122) is Q₀.₉₇₅ ≈ 0.2640
Exact 95% credible interval for π is [0.138, 0.264].
r) Approximate 95% Bayesian credible interval: use the normal approximation with
μ(π | Y = 29) = 0.197, σ²(π | Y = 29) = 0.001035, σ(π | Y = 29) = 0.0322

π̂ | y ± z₀.₉₇₅ σ(π̂ | y) = 0.197 ± 1.96 × 0.0322 = 0.197 ± 0.0632 → [0.134, 0.260]    (16)

Approximate 95% credible interval for π is [0.134, 0.260].
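The computations above can be reproduced in a few lines; a sketch using scipy.stats, where `beta.ppf` stands in for the Minitab/JMP/SAS quantile calls:

```python
# Sketch reproducing the Exercise 8.1 numbers with scipy.stats.
from scipy.stats import beta

y, n = 29, 150
a1, b1 = 1 + y, 1 + n - y      # posterior Beta(30, 122) from a Beta(1, 1) prior

post_mean = a1 / (a1 + b1)                              # ~0.197
post_sd = beta.std(a1, b1)                              # ~0.0322
lo_exact, hi_exact = beta.ppf([0.025, 0.975], a1, b1)   # exact 95% interval
lo_norm = post_mean - 1.96 * post_sd                    # normal approximation
hi_norm = post_mean + 1.96 * post_sd
```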
Example (Bolstad 2007, Exercise 8.3)
Sophie, the editor of the school newspaper, is going to conduct a survey of the students
to determine the level of support for the current president of the student association. She
needs to determine her prior distribution for π, the proportion of students who support the
president. She decides the prior mean is 0.5 and the prior standard deviation is 0.15.
Underlying random variable of interest (from survey): whether or not the i-th
randomly sampled student supports the current president of the student
association (categorical, dichotomous).
Parameter of interest: π = the proportion of students who support the president.
(a) Determine the beta(a, b) prior that matches her prior belief.
Given the prior mean of 0.5 and the prior standard deviation of 0.15, we substitute
into equations (4) and (5) and get

π₀ = 0.5, σ₀ = 0.15

n₀ = π₀(1 − π₀)/σ₀² = 0.5(1 − 0.5)/0.15² = 11.11

n₀ = a + b + 1 = 11.11
a + b = 10.11

π₀ = a/(a + b)
0.5 = a/10.11
a = 5.055
b = 5.055
(b) What is the equivalent sample size of her prior?
n0  a  b  1  11.11
(c) Out of the 68 students that she polls, y = 21 support the current president.
Determine the posterior distribution.
Statistic: Y = the number of students in a sample of size n who support the
president.
Sample: Observe Y = 21, n = 68.
g  | y  21  beta  a  y, b  n  y 
 beta  5.055  21, 5.055  68  21
 beta  26.06, 52.06 
(d) NOT IN BOLSTAD. We can also find the posterior mean, variance, and standard
deviation, and we can graph the prior and the posterior. This is especially easy
with the use of BetaBinomialBetaPDFandCDF.JMP, but, of course, we should all
know how to do it by hand.
                     prior                        posterior
a                    a = 5.06                     a′ = 26.06
b                    b = 5.06                     b′ = 52.06
n_eq                 n₀ = 11.1                    n₁ = 79.11
mean                 E(π) = μ_π = 0.500           E(π | y) = μ_{π|y} = 0.334
variance             Var(π) = σ²_π = 0.0225       Var(π | y) = σ²_{π|y} = 0.00281
standard deviation   0.15                         0.0530
exact lower 95%      Q₀.₀₂₅ = 0.2135              Q₀.₀₂₅ = 0.234
exact upper 95%      Q₀.₉₇₅ = 0.7865              Q₀.₉₇₅ = 0.441
approx lower 95%     —                            0.230
approx upper 95%     —                            0.437

[Figure: PDF of π, prior and posterior, plotted for π from 0 to 1]
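The tabulated posterior summaries can be reproduced with scipy.stats; a brief sketch:

```python
# Sketch reproducing the Exercise 8.3 posterior summaries.
from scipy.stats import beta

a0 = b0 = 5.055                 # elicited prior from part (a)
y, n = 21, 68
a1, b1 = a0 + y, b0 + n - y     # posterior beta(26.055, 52.055)

post_mean = a1 / (a1 + b1)                  # ~0.334
post_sd = beta.std(a1, b1)                  # ~0.053
lo, hi = beta.ppf([0.025, 0.975], a1, b1)   # exact 95% interval, ~[0.234, 0.441]
```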
Example (Bolstad 2007, Exercise 8.5)
In a research program on human health risk from recreational contact with water
contaminated with pathogenic microbial material, the National Institute of Water and
Atmosphere (NIWA) instituted a study to determine the quality of New Zealand stream
water at a variety of catchment types. There were 116 one-liter water samples from sites
identified as having a heavy environmental impact from birds (seagulls) and waterfowl.
Out of these samples, 17 contained Giardia cysts.
(a) What is the distribution of Y, the number of samples containing Giardia cysts?
Likelihood: Y | π ~ Binomial(116, π), so
P(Y = 17 | π) = (116 choose 17) π¹⁷ (1 − π)⁹⁹, a function of π
Figure 2
The likelihood as it is used, as a function of the parameter of interest, in
this case π, for fixed value of the sufficient statistic, in this case Y = 17.
The sample size n is also fixed. Not being the parameter of interest, i.e.,
not being the parameter about which we want to make inferences, n is
called an auxiliary parameter, or a nuisance parameter.
(b) Let π be the true probability that a one-liter water sample from this type of site
contains Giardia cysts. Use a beta(1, 4) prior for π. Find the posterior distribution
of π given y.
Prior(π) = Beta(1, 4)
Posterior(π | Y = 17) = Beta(1 + 17, 4 + 116 − 17) = Beta(18, 103)
(c) Summarize the posterior distribution by its first two moments.
Posterior Mean = E(π | Y = 17) = 18/(18 + 103) = 18/121 = 0.1488
Posterior Variance = Var(π | Y = 17) = (18)(103)/[(18 + 103)²(18 + 103 + 1)]
= 0.001038 = (0.03222)²
(d) Find the normal approximation to the posterior distribution g(π | y).
Normal(μ = 0.1488, σ² = 0.03222²)
(e) Compute a 95% credible interval for π using the normal approximation found in
(d).
(π | y)₀.₀₂₅ = μ̂(π | y) + z₀.₀₂₅ σ̂(π | y) = 0.1488 − 1.96 × 0.03222 = 0.1488 − 0.0632 = 0.0856
(π | y)₀.₉₇₅ = μ̂(π | y) + z₀.₉₇₅ σ̂(π | y) = 0.1488 + 1.96 × 0.03222 = 0.1488 + 0.0632 = 0.2120
Example (Bolstad 2007, Exercise 8.7)
The same study found that 12 out of 174 samples contained Giardia cysts in an
environment having a high impact from sheep.
(a) What is the distribution of Y, the number of samples containing Giardia cysts?
Likelihood: Y | π ~ Binomial(n = 174, π)
(b) Let π be the true probability that a one-liter water sample from this type of site
contains Giardia cysts. Use a beta(1, 4) prior for π. Find the posterior distribution
of π given y.
π | Y = 12 ~ Beta(1 + 12, 4 + 174 − 12) = Beta(13, 166)
(c)
(d) Summarize the posterior distribution by its first two moments.

E(π | Y = 12) = μ_{π|Y=12} = a′/(a′ + b′) = 13/(13 + 166) = 0.07263

Var(π | Y = 12) = σ²_{π|Y=12} = a′b′/[(a′ + b′)²(a′ + b′ + 1)]
= (13)(166)/[(13 + 166)²(13 + 166 + 1)]
= 0.0003742 = (0.01934)²
(e) Find the normal approximation to the posterior distribution g(π | y).
π | Y = 12 ≈ Normal(0.07263, (0.01934)²)
(f) Compute a 95% credible interval for π using the normal approximation found in
(e).
(π | y)₀.₀₂₅ = μ̂(π | y) + z₀.₀₂₅ σ̂(π | y) = 0.07263 − 1.96 × 0.01934 = 0.07263 − 0.03791 = 0.0347
(π | y)₀.₉₇₅ = μ̂(π | y) + z₀.₉₇₅ σ̂(π | y) = 0.07263 + 1.96 × 0.01934 = 0.07263 + 0.03791 = 0.1105
Bayesian Credible Intervals
Exact
Use Minitab, JMP, SAS, etc., to compute the endpoints: the 2.5 percentile and the 97.5
percentile of Beta(α′, β′).
Normal Approximation
Use the normal approximation with mean μ(π | y) and variance σ²(π | y).
Use BetaPDFandCDF.JMP or BetaBinomialBetaPDFandCDF.JMP
BetaPDFandCDF.JSL
The Minitab project that calculated and graphed this distribution is on line.
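Absent Minitab or JMP, the same endpoints can be computed with a small helper of our own (a sketch, not the scripts named above):

```python
# A small stand-in for the JMP/Minitab scripts above: exact and
# normal-approximate central credible intervals for pi ~ Beta(a, b).
from scipy.stats import beta, norm

def credible_interval(a, b, level=0.95, exact=True):
    tail = (1.0 - level) / 2.0
    if exact:
        # Exact: quantiles of the posterior Beta distribution
        return beta.ppf(tail, a, b), beta.ppf(1.0 - tail, a, b)
    # Approximate: posterior mean +/- z * posterior SD
    z = norm.ppf(1.0 - tail)
    m, s = beta.mean(a, b), beta.std(a, b)
    return m - z * s, m + z * s

lo, hi = credible_interval(30, 122)                    # exact, Exercise 8.1
lo_n, hi_n = credible_interval(30, 122, exact=False)   # normal approximation
```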