Bayesian Statistics
Bayes’ update
3/7/2023

Bayesian inference does not produce a single value. Its result is a probability distribution for the unknown parameter(s) given the data. This distribution is called the posterior, and it comes from Bayes’ rule:

$$P(\theta \mid \text{data}) = \frac{P(\text{data} \mid \theta)\, P(\theta)}{P(\text{data})}$$

Now, let’s try to estimate the probability of getting heads when tossing a coin. Suppose that after tossing a coin 10 times we have observed the following result:

{H, T, T, H, T, H, H, H, T, H}

The relative frequency $f(H) = 6/10 = 0.6$ is the maximum likelihood estimate (MLE) of $P(H)$. This is the frequentist approach. We could also simply assume that the coin is fair, $P(H) = P(T) = 0.5$.

Another way is to consider the likelihood of the data under different candidate values of $P(H)$. If we evaluate the likelihood $P(\text{data} \mid P(H))$ at 100 values of $P(H)$ spaced evenly from 0 to 1, we can trace out its shape. In our case the likelihood of each candidate value is the binomial probability mass function evaluated at 6 successes in 10 tosses, that is, $P(X = 6)$ with $X \sim B(10, P(H))$. We can demonstrate it with the following code:

    # 100 candidate values of P(H), evenly spaced on [0, 1]
    rangeP <- seq(0, 1, length.out = 100)
    # Likelihood of observing 6 heads in 10 tosses for each candidate value
    plot(rangeP, dbinom(x = 6, prob = rangeP, size = 10),
         type = "l", xlab = "P(Head)", ylab = "Density", col = "red")

[Figure: likelihood of 6 heads in 10 tosses as a function of P(Head); x-axis "P(Head)", y-axis "Density".]

Updating prior distribution

Now let’s assume that our prior is a Normal distribution with mean 0.5 and standard deviation 0.5.
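A quick numeric look at this prior may help before plotting it. The sketch below assumes the same 100-point grid `rangeP` used for the likelihood and evaluates the Normal(0.5, 0.5) density on [0, 1]. The names `priorOnGrid`, `flatnessRatio`, and `massInside` are illustrative and not part of the original code. Two things are worth noticing: `dnorm` returns density values rather than probabilities, and a Normal prior with this spread places part of its mass outside [0, 1], so restricting it to the grid amounts to using a truncated prior.

```r
# Grid of candidate values for P(H), as in the text
rangeP <- seq(0, 1, length.out = 100)

# Prior density from the text, evaluated at every grid point
priorOnGrid <- dnorm(rangeP, mean = 0.5, sd = 0.5)

# Smallest and largest density values on [0, 1]
range(priorOnGrid)

# Ratio of the lowest to the highest density: values near 1 mean a flat prior
flatnessRatio <- min(priorOnGrid) / max(priorOnGrid)

# Share of the Normal(0.5, 0.5) probability mass that lies inside [0, 1]
massInside <- pnorm(1, mean = 0.5, sd = 0.5) - pnorm(0, mean = 0.5, sd = 0.5)
```

Because the posterior is rescaled to sum to 1 later on, the mass lying outside [0, 1] does not affect the final result.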
    # Prior density evaluated on the same grid of candidate values
    plot(rangeP, dnorm(rangeP, mean = .5, sd = .5), type = "l",
         xlab = "Value", ylab = "Probability")

[Figure: prior; x-axis "Value", y-axis "Probability".]

Now let’s plot the numerator of Bayes’ rule (Prior * Likelihood):

    likelihood <- dbinom(x = 6, prob = rangeP, size = 10)
    prior <- dnorm(x = rangeP, mean = .5, sd = .5)
    # Unnormalized posterior: pointwise product of likelihood and prior
    plot(rangeP, likelihood * prior, type = "l", col = "green", ylab = "Probability")

[Figure: prior times likelihood; x-axis "rangeP", y-axis "Probability".]

The three graphs in one figure, stacked vertically:

    # Three rows, one column
    par(mfrow = c(3,1))
    plot(rangeP, dnorm(x = rangeP, mean = .5, sd = .5), type = "l",
         xlab = "Value", ylab = "Probability", main = "Prior")
    plot(rangeP, dbinom(x = 6, prob = rangeP, size = 10), type = "l",
         xlab = "Value", ylab = "Density", main = "Likelihood", col = "red")
    plot(rangeP, likelihood * prior, type = "l", col = "green",
         main = "Posterior", xlab = "Value", ylab = "Probability")

[Figure: three stacked panels titled "Prior", "Likelihood", and "Posterior"; x-axis "Value" in each.]

Making our posterior sum up to 1:

    Posterior <- likelihood * prior
    # Divide by the total so the grid values sum to 1
    standardizedPosterior <- Posterior / sum(Posterior)
    plot(rangeP, standardizedPosterior, col = "blue", type = "l")

[Figure: standardized posterior; x-axis "rangeP", y-axis "standardizedPosterior".]

All four curves in a 2 x 2 layout:

    par(mfrow = c(2,2))
    plot(rangeP, dnorm(x = rangeP, mean = .5, sd = .5), type = "l",
         xlab = "Value", ylab = "Probability", main = "Prior")
    plot(rangeP, dbinom(x = 6, prob = rangeP, size = 10), type = "l",
         ylab = "Density", main = "Likelihood", col = "red", xlab = "Value")
    plot(rangeP, likelihood * prior, type = "l", col = "green",
         xlab = "Value", main = "Posterior", ylab = "Probability")
    plot(rangeP, standardizedPosterior, col = "blue", type = "l",
         main = "Standardized posterior", xlab = "Value", ylab = "Probability")

[Figure: four panels titled "Prior", "Likelihood", "Posterior", and "Standardized posterior"; x-axis "Value" in each.]
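The grid computation above can also be read as an update that is applied one batch of data at a time. The sketch below is a minimal illustration under stated assumptions, not part of the original analysis: `updateOnGrid` is a hypothetical helper that multiplies a prior by the binomial likelihood and rescales the result to sum to 1. It splits the observed sequence into its first five tosses {H, T, T, H, T} (2 heads) and its last five {H, H, H, T, H} (4 heads), uses the posterior from the first batch as the prior for the second, and compares the result with updating on all 10 tosses at once. It also computes two point summaries of the standardized posterior.

```r
# Grid of candidate values for P(H) and the Normal(0.5, 0.5) prior, as in the text
rangeP <- seq(0, 1, length.out = 100)
prior <- dnorm(rangeP, mean = 0.5, sd = 0.5)

# Hypothetical helper: prior times binomial likelihood, rescaled to sum to 1
updateOnGrid <- function(prior, heads, tosses, grid) {
  unnormalized <- dbinom(x = heads, size = tosses, prob = grid) * prior
  unnormalized / sum(unnormalized)
}

# Route 1: all 10 tosses at once (6 heads)
posteriorAll <- updateOnGrid(prior, heads = 6, tosses = 10, grid = rangeP)

# Route 2: first five tosses (2 heads), then last five tosses (4 heads),
# with the first posterior serving as the prior for the second update
posteriorFirst <- updateOnGrid(prior, heads = 2, tosses = 5, grid = rangeP)
posteriorBoth <- updateOnGrid(posteriorFirst, heads = 4, tosses = 5, grid = rangeP)

# Check whether the two routes agree up to floating-point error
sameResult <- isTRUE(all.equal(posteriorAll, posteriorBoth))

# Point summaries of the standardized posterior on the grid
posteriorMean <- sum(rangeP * posteriorAll)
posteriorMode <- rangeP[which.max(posteriorAll)]
```

The two routes should agree because the binomial coefficients are constants that cancel when the posterior is rescaled, leaving the same product of prior and likelihood in both cases.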