Introduction to probability theory and graphical models

Translational Neuroimaging Seminar on Bayesian Inference, Spring 2013
Jakob Heinzle
Translational Neuromodeling Unit (TNU)
Institute for Biomedical Engineering (IBT)
University and ETH Zürich

Literature and References
• Bishop (Chapters 1.2, 1.3, 8.1, 8.2)
• MacKay (Chapter 2)
• Barber (Chapters 1, 2, 3, 4)
• Many images in this lecture are taken from the above references.

Bayesian Inference - Introduction to probability theory

Probability distribution
A probability $P(x = \text{true})$ is defined on a sample space (domain) and specifies, for every possible event in the sample space, how certain that event is to occur.
Sample space: $\mathrm{dom}(X) = \{0, 1\}$
$p(x = 1) = 0.4$, $p(x = 0) = 0.6$
$\sum_{x \in \mathrm{dom}(x)} p(x) = 1$
Probabilities sum to one.
Bishop, Fig. 1.11

Probability theory: Basic rules
• Sum rule*: $p(X) = \sum_{Y} p(X, Y)$
  $p(X)$ is also called the marginal distribution.
• Product rule: $p(X, Y) = p(Y|X)\, p(X)$
* According to Bishop

Conditional and marginal probability
$p(x|y) \equiv p(x, y) / p(y)$
$p(x) \equiv \sum_{y} p(x, y)$
[Figure: Bishop, Fig. 1.11]

Independent variables
$p(x, y) = p(x)\, p(y)$
$p(x|y) = p(x)$
Question for later: What does this mean for Bayes?

Probability theory: Bayes' theorem
$p(Y|X) = \frac{p(X|Y)\, p(Y)}{p(X)}$
is derived from the product rule
$p(Y|X)\, p(X) = p(X, Y) = p(X|Y)\, p(Y)$

Rephrasing and naming of Bayes' rule
D: data, θ: parameters, H: hypothesis we put into the model.
MacKay

Example: Bishop Fig. 1.9
Box (B): blue (b) or red (r)
Fruit (F): apple (a) or orange (o)
$p(B = r) = 0.4$, $p(B = b) = 0.6$.
What is the probability of having a red box if one has drawn an orange?
Bishop, Fig. 1.9

Probability density
$p(x \in (a, b)) = \int_{a}^{b} p(x)\, dx$
$\int_{-\infty}^{+\infty} p(x)\, dx = 1$

PDF and CDF
[Figure: Bishop, Fig. 1.12]

Cumulative distribution
$P(z) = \int_{-\infty}^{z} p(x)\, dx$
Short example: How to use the cumulative distribution to transform a uniform distribution!

Marginal densities
$p(x) = \int p(x, y)\, dy$
Integration replaces summation.

Two views on probability
• Probability can ...
  – ... describe the frequency of outcomes in random experiments → classical (frequentist) interpretation.
  – ... describe the degree of belief about a particular event → Bayesian viewpoint or subjective interpretation of probability.
MacKay, Chapter 2

Expectation of a function
$\mathbb{E}[f] = \sum_{x} p(x)\, f(x)$
or
$\mathbb{E}[f] = \int p(x)\, f(x)\, dx$

Graphical models
1. They provide a simple way to visualize the structure of a probabilistic model and can be used to design and motivate new models.
2. Insights into the properties of the model, including conditional independence properties, can be obtained by inspection of the graph.
3. Complex computations, required to perform inference and learning in sophisticated models, can be expressed in terms of graphical manipulations, in which underlying mathematical expressions are carried along implicitly.
Bishop, Chap.
8

Graphical models overview
Directed Graph | Undirected Graph
Names: nodes (vertices), edges (links), paths, cycles, loops, neighbours
For a summary of definitions see Barber, Chapter 2.
[Figure: Barber, Introduction]

Graphical models
$p(a, b, c) = p(c|a, b)\, p(a, b) = p(c|a, b)\, p(b|a)\, p(a)$
Bishop, Fig. 8.1

Graphical models: parents and children
Node a is a parent of node b; node b is a child of node a.
Bishop, Fig. 8.1

Belief networks = Bayesian belief networks = Bayesian networks
In general: every joint probability distribution can be expressed as a directed acyclic graph (DAG).
Important: No directed cycles!
Bishop, Fig. 8.2

Conditional independence
A variable a is conditionally independent of b given c if
$p(a, b|c) = p(a|b, c)\, p(b|c) = p(a|c)\, p(b|c)$
In Bayesian networks, conditional independence can be tested by following a few simple rules.

Conditional independence – tail-to-tail path
Is a independent of b?
• c unobserved: No!
• c observed: Yes!
Bishop, Chapter 8.2

Conditional independence – head-to-tail path
Is a independent of b?
• c unobserved: No!
• c observed: Yes!
Bishop, Chapter 8.2

Conditional independence – head-to-head path
Is a independent of b?
• c unobserved: Yes!
• c observed: No!
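The head-to-head case is the least intuitive of the three, so a numerical check may help. The following is a minimal sketch, not part of the lecture: it builds the joint $p(a, b, c) = p(a)\, p(b)\, p(c|a, b)$ for binary variables and tests independence of a and b with and without conditioning on c. All probability tables (`p_a`, `p_b`, `p_c1`) are hypothetical values chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical tables for a head-to-head graph a -> c <- b (binary variables).
p_a = {0: Fraction(7, 10), 1: Fraction(3, 10)}
p_b = {0: Fraction(4, 5), 1: Fraction(1, 5)}
# p(c=1 | a, b); illustrative values only.
p_c1 = {(0, 0): Fraction(1, 10), (0, 1): Fraction(1, 2),
        (1, 0): Fraction(1, 2), (1, 1): Fraction(9, 10)}

PAIRS = list(product((0, 1), repeat=2))


def p_c_given_ab(c, a, b):
    """Return p(c | a, b) for binary c."""
    return p_c1[(a, b)] if c == 1 else 1 - p_c1[(a, b)]


# Joint from the factorization p(a, b, c) = p(a) p(b) p(c | a, b).
joint = {(a, b, c): p_a[a] * p_b[b] * p_c_given_ab(c, a, b)
         for (a, b) in PAIRS for c in (0, 1)}


def is_independent(table):
    """Check whether table[(a, b)] equals the product of its marginals."""
    pa = {a: sum(table[(a, b)] for b in (0, 1)) for a in (0, 1)}
    pb = {b: sum(table[(a, b)] for a in (0, 1)) for b in (0, 1)}
    return all(table[(a, b)] == pa[a] * pb[b] for (a, b) in PAIRS)


def p_ab():
    """Marginal p(a, b): sum the joint over c (c unobserved)."""
    return {(a, b): joint[(a, b, 0)] + joint[(a, b, 1)] for (a, b) in PAIRS}


def p_ab_given_c(c):
    """Conditional p(a, b | c): renormalize the slice of the joint at c."""
    evidence = sum(joint[(a, b, c)] for (a, b) in PAIRS)
    return {(a, b): joint[(a, b, c)] / evidence for (a, b) in PAIRS}


print(is_independent(p_ab()))           # True: c unobserved
print(is_independent(p_ab_given_c(1)))  # False: c observed
```

Exact fractions are used instead of floats so that the equality test is not affected by rounding. With these particular tables, observing c couples a and b, which is the "explaining away" effect.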
Bishop, Chapter 8.2

Conditional independence – notation
[Figure: Bishop, Chapter 8.2]

Conditional independence – three basic structures
[Figure: Bishop, Chapter 8.2.2]

More conventions in graphical notations
Regression model
• Short form [figure]
• Parameters explicit [figure]
Bishop, Chapter 8

More conventions in graphical notations
Complete model used for prediction
Trained on data $t_n$
Bishop, Chapter 8

Summary – things to remember
• Probabilities and how to compute with them → product rule, Bayes' rule, sum rule
• Probability densities → PDF, CDF
• Conditional and marginal distributions
• Basic concepts of graphical models → directed vs. undirected, nodes and edges, parents and children
• Conditional independence in graphs and how to check it
Bishop, Chapter 8.2.2
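The box-and-fruit question posed earlier (probability of the red box given that an orange was drawn) can be worked through with the sum rule, product rule, and Bayes' theorem. The sketch below is one possible way to do this. It assumes the box contents shown in Bishop's Fig. 1.9 (red box: 2 apples and 6 oranges; blue box: 3 apples and 1 orange), which are not stated in the slide text. The names `prior`, `contents`, `likelihood`, and `posterior` are illustrative only.

```python
from fractions import Fraction

# Prior over boxes as given in the lecture: p(B=r) = 0.4, p(B=b) = 0.6.
prior = {"r": Fraction(2, 5), "b": Fraction(3, 5)}

# ASSUMPTION: fruit counts per box, taken from Bishop's Fig. 1.9.
contents = {"r": {"a": 2, "o": 6}, "b": {"a": 3, "o": 1}}


def likelihood(fruit, box):
    """p(F=fruit | B=box): fraction of that fruit in the given box."""
    counts = contents[box]
    return Fraction(counts[fruit], sum(counts.values()))


def posterior(fruit):
    """p(B | F=fruit) for every box, via Bayes' theorem."""
    # Product rule: p(F, B) = p(F | B) p(B)
    joint = {box: likelihood(fruit, box) * prior[box] for box in prior}
    # Sum rule: p(F) = sum over B of p(F, B)
    evidence = sum(joint.values())
    # Bayes' theorem: p(B | F) = p(F | B) p(B) / p(F)
    return {box: value / evidence for box, value in joint.items()}


post = posterior("o")
print(post["r"])  # 2/3
```

Under these assumed counts, drawing an orange raises the probability of the red box from the prior 0.4 to a posterior of 2/3, because oranges are much more common in the red box.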