Introduction to Bayesian Statistics
Harry R. Erwin, PhD
School of Computing and Technology
University of Sunderland

Resources
• Albert, Jim (2007) Bayesian Computation with R, Springer.
• Ntzoufras, Ioannis (2009) Bayesian Modeling Using WinBUGS, Wiley.
• Kéry, Marc (2010) Introduction to WinBUGS for Ecologists, Academic Press.

Topics
• Probability
• Bayes’ Theorem
• Bayesian Statistics

Basic Definitions
• Suppose Ω is a sample space—the set of outcomes, ω, of an experiment—for example the possible results of flipping a coin or rolling a die.
• Suppose F is the collection of possible events (subsets of Ω) involving outcomes in Ω, including:
– An empty event, Ø, with no outcomes belonging to it.
– Simple events consisting of single outcomes.
– Complex events consisting of multiple alternative outcomes (e.g., rolling an even number on a six-sided die).
• An event in F that combines the outcomes in two other events, A and B, is called the union of A and B and is written A∪B.
• An event in F made up of the outcomes present in both A and B is known as the intersection of A and B and is written A∩B.

Probability Measures
• A probability measure, P, is a function that assigns to each event in F a real number between 0 and 1, called the probability of the event, that satisfies the following requirements:
– P(Ω) = 1
– If A and B are disjoint events in F—that is, A∩B = Ø—then P(A∪B) = P(A)+P(B). (This rule must also hold for any countable number of pairwise disjoint events.)

Conditional Probability
• The “conditional probability of B given A”, written P(B|A), describes the probability of an outcome being in B given that it is known to be in A.
• P(B|A) = P(A∩B)/P(A).
• For example, let A be even rolls of a fair six-sided die, and B be rolls that are a multiple of three.
• A={2,4,6}, B={3,6}, A∩B={6}, and P(B|A) = P(A∩B)/P(A) = (1/6)/(1/2) = 1/3.

Probability Models
• The triple <Ω, F, P> is called a probability model.
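The die example above can be checked with a short script. This is an illustrative sketch, not part of the slides; the function names and the use of exact fractions are choices made here.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(b, a):
    """Conditional probability P(B|A) = P(A ∩ B) / P(A)."""
    return prob(a & b) / prob(a)

a = {2, 4, 6}   # A: even rolls
b = {3, 6}      # B: rolls that are a multiple of three

assert prob(a) == Fraction(1, 2)            # P(A) = 1/2
assert prob(a & b) == Fraction(1, 6)        # P(A ∩ B) = 1/6
assert cond_prob(b, a) == Fraction(1, 3)    # P(B|A) = 1/3, as on the slide
```

Using Fraction rather than floating point keeps the arithmetic exact, so the results match the slide's fractions precisely.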
• Some theorems can be easily proven:
1. Let Ø be an event in which there are no outcomes. Then P(Ø) = 0.
2. Define ¬A to be the event consisting of all the outcomes not in A. Then P(¬A) = 1 – P(A).
3. If A∩B = Ø, then P(A∪B) = P(A)+P(B).

Bayes’ Theorem
• Bayes’ theorem is a provable consequence of these axioms (Wikipedia):
P(A|B) = P(B|A)P(A)/P(B)
• That is, the probability of A given B is the probability of B given A, multiplied by the probability of A and divided by the probability of B.
• Also, P(A|B) ∝ P(B|A)P(A)

What Are Bayesian Statistics?
• Bayesian statistics are the working out of the implications of Bayes’ Theorem.
• They allow you to deduce the posterior (afterwards) probability distribution of an event if you know the prior (before) probability distribution of the event and have some additional information.
• It’s a theorem, so it is always true.

Why is Bayes’ Theorem Useful?
• If the prior probability distribution is ‘vague’ or ‘noninformative’, you can incrementally add information to produce a posterior distribution that reflects just that information. That posterior distribution is very similar to the distribution you would come up with using classical statistics.
• If you start with real information in your prior, that is also taken into account, which is even more useful.

Density Functions
• You often have a ‘density’ function that is a good model of how events are distributed. A few typical density functions include:
1. The binomial distribution (one parameter, the probability of a ‘heads’)
2. The Poisson distribution (one parameter, the mean number of occurrences in a unit time interval)
3. The exponential distribution (one parameter, the rate)
4. The normal distribution (two parameters, the mean and the variance)
5. The uniform distribution (two parameters, the beginning and the end)

Likelihood
• Suppose you have an event, ω, drawn from a process described by a density. The probability of the event is then the value of the density at that event.
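Bayes’ theorem can be sanity-checked on the die events from the previous slides (A = even rolls, B = multiples of three); a minimal sketch, with the probabilities taken directly from that example:

```python
from fractions import Fraction

# Known quantities for A = {2,4,6}, B = {3,6} on a fair die.
p_b_given_a = Fraction(1, 3)   # P(B|A), computed earlier
p_a = Fraction(1, 2)           # P(A)
p_b = Fraction(1, 3)           # P(B)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b

# Cross-check against the definition: P(A|B) = P(A ∩ B)/P(B) = (1/6)/(1/3)
assert p_a_given_b == Fraction(1, 6) / Fraction(1, 3) == Fraction(1, 2)
```

Both routes give P(A|B) = 1/2, as the theorem guarantees.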
• This is the ‘likelihood’ of that event.
• If you have multiple independent samples, the corresponding likelihood is the product of the density values for the individual events.

Maximum Likelihood
• Suppose you have a probability density function, f(x,θ), where θ is a parameter, such as the mean, that you want to estimate. If you have n data samples, x1, …, xn, the most likely value of θ is the one that maximizes the value of the likelihood.
• Mathematically, the likelihood is Π f(xn, θ)—the joint distribution of the sample, i.e. the product of all the f(xn, θ) values.
• You can often use calculus to find the maximizing θ.

Bayes and Maximum Likelihood
• Suppose you have a prior distribution, f(θ), and some data, described by a likelihood function, li(data|θ). The posterior distribution, f(θ|data), can be calculated by applying Bayes’ Theorem:
– f(θ|data) ∝ li(data|θ)f(θ)

Worked Example
• 51 smokers in 83 cases of lung cancer
• 23 smokers in 70 disease-free controls
• P(smoker|case) = 51/83
• P(smoker|control) = 23/70
• P(case|smoker) = P(smoker|case)P(case) / (P(smoker|case)P(case) + P(smoker|control)P(control))
• Relative risk = RR = P(case|smoker)/P(case|nonsmoker) = 1.87
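The worked example can be reproduced numerically. The slides do not state the prior P(case), so the sketch below assumes equal priors P(case) = P(control) = 0.5; that assumption is mine, not the slides’.

```python
# Smoking/lung-cancer worked example from the slides.
p_smoker_case = 51 / 83      # P(smoker | case)
p_smoker_control = 23 / 70   # P(smoker | control)
p_case = p_control = 0.5     # assumed equal priors (not given on the slide)

# Bayes' theorem for P(case | smoker)
p_case_smoker = (p_smoker_case * p_case) / (
    p_smoker_case * p_case + p_smoker_control * p_control)

# The same calculation for nonsmokers (32 of 83 cases, 47 of 70 controls)
p_ns_case = 32 / 83          # P(nonsmoker | case)
p_ns_control = 47 / 70       # P(nonsmoker | control)
p_case_ns = (p_ns_case * p_case) / (
    p_ns_case * p_case + p_ns_control * p_control)

rr = p_case_smoker / p_case_ns
print(round(p_case_smoker, 3))  # ≈ 0.652
print(round(rr, 2))             # ≈ 1.79 under the equal-prior assumption
```

Under these assumed priors the formula yields RR ≈ 1.79; the slide’s figure of 1.87 happens to equal the ratio P(smoker|case)/P(smoker|control) = (51/83)/(23/70), so a different prior or definition may have been used there.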