Bayesian Networks – Principles and Application to Modelling Water, Governance and Human Development Indicators in Developing Countries

Jorge López Puga (jpuga@ual.es)
Área de Metodología de las Ciencias del Comportamiento, Universidad de Almería
www.ual.es/personal/jpuga
Water4Dev – February 2012

The Content of the Sections

1. What is probability?
2. The Bayes theorem
   • Deduction of the theorem
   • The balls problem
3. Introduction to Bayesian networks
   • Historical background
   • Qualitative and quantitative dimensions
   • Advantages and disadvantages of Bayes nets
   • Software

What is Probability?

Etymology
► A measure of the authority of a witness in a legal case (Europe)

Interpretations of probability
► Objective probability
   • Aprioristic or classical
   • Frequentist or empirical
► Subjective probability
   • Degree of belief

Objective Probability

Classical (Laplace, 1812/1814)
► A priori (aprioristic)
► p(A) = N_A / N, where N_A is the number of cases favourable to A and N the number of possible cases
► Equiprobability
► Full knowledge about the sample space

Frequentist
► Random experiment
► Well-defined sample space
► Posterior (empirical) probability
► Randomness
► p(A) = fr_A / N, where fr_A is the observed frequency of A in N repetitions

Subjective Probability

It is simply an individual's degree of belief, which is updated on the basis of experience.

Probability axioms
► p(S) = 1, where S is the sure event (the whole sample space)
► p(A) ≥ 0 for any event A
► If two events are mutually exclusive (A ∩ B = Ø), then p(A ∪ B) = p(A) + p(B)

Cards Game

Let me show you the idea of probability with a card game: classical vs. frequentist vs. subjective.

Which is the probability of getting an ace?

As you probably know, a French deck has four suits (spades, hearts, diamonds and clubs), each containing the cards Ace, 2–10, J, Q and K.

Given that there are 52 cards and 4 aces in a French deck…
► Aprioristic: we could say p(Ace) = 4/52 ≈ 0.077
► Frequentist: if we repeated the experience a finite (but large) number of times
► Bayesian: if I subjectively assess that probability

Why is a Bayesian interpretation of probability useful? Let's play.
► p(Ace) = 4/52 ≈ 0.077
► p(Ace) = 3/51 ≈ 0.059
► p(Ace) = 2/50 = 0.04
► p(Ace) = 1/49 ≈ 0.02
Probability estimations depend on our state of knowledge (Dixon, 1964).
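The contrast among the three readings of probability can also be checked numerically. The short Python sketch below is not part of the original slides; the simulated deck and the number of trials are illustrative choices. It compares the classical value 4/52 with a frequentist estimate obtained from repeated simulated draws, and lists the updated values 4/52, 3/51, 2/50 and 1/49 that describe our changing state of knowledge as aces leave the deck.

```python
import random

# Classical (a priori) probability: 4 aces out of 52 cards
classical = 4 / 52

# Frequentist estimate: repeat the random experiment many times
deck = ["ace"] * 4 + ["other"] * 48
n_trials = 100_000
hits = sum(random.choice(deck) == "ace" for _ in range(n_trials))
frequentist = hits / n_trials

# Subjective / Bayesian reading: the estimate depends on our state of
# knowledge. If we know k aces have already left the deck, we update:
updated = [(4 - k) / (52 - k) for k in range(4)]  # 4/52, 3/51, 2/50, 1/49

print(f"classical   p(Ace) = {classical:.3f}")
print(f"frequentist p(Ace) ~ {frequentist:.3f}  (from {n_trials} simulated draws)")
print("updated beliefs as aces are removed:", [round(p, 3) for p in updated])
```

With enough simulated draws the frequentist estimate settles close to the classical 0.077, while the updated values show how the same question receives a different answer under a different state of knowledge.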
The Bayes Theorem – Getting Evidence and Updating Probabilities

Joint and Conditional Probability

Joint probability (distributions of variables)
► It represents the likelihood of two events occurring at the same time
► It is the same as the intersection of the events
► Notation: p(A ∩ B), p(A, B), p(AB)

Estimation
► Independent events
► Dependent events

Independent events
► p(A ∩ B) = p(A) × p(B), or equivalently p(B ∩ A) = p(B) × p(A)
► Example: which is the probability of obtaining two tails (T) after tossing two coins?
   • p(TT) = p(T) × p(T) = 0.5 × 0.5 = 0.25

Dependent events
► Conditional probability, written with the symbol “|”
► p(A ∩ B) = p(A|B) × p(B), or equivalently p(B ∩ A) = p(B|A) × p(A)
► Example: which is the probability of suffering from bronchitis (B) and being a smoker (S) at the same time?
   • p(B) = 0.25, p(S|B) = 0.6
   • p(S ∩ B) = p(S|B) × p(B) = 0.6 × 0.25 = 0.15

The Bayes Theorem

It is a generalisation of conditional probability applied to the joint probability:

   p(A|B) = p(B|A) × p(A) / p(B)

You can deduce it because:
► p(A ∩ B) = p(A|B) × p(B)
► p(B ∩ A) = p(B|A) × p(A)
► Since p(A ∩ B) = p(B ∩ A), it follows that p(A|B) × p(B) = p(B|A) × p(A)
► Dividing by p(B): p(A|B) = p(B|A) × p(A) / p(B)

Example: which is the probability of a person suffering from bronchitis (B) given that s/he smokes (S)?
   • p(B) = 0.25, p(S|B) = 0.6, p(S) = 0.40
   • p(B|S) = p(S|B) × p(B) / p(S) = (0.6 × 0.25) / 0.40 = 0.375

The Total Probability Theorem

If we use a mutually exclusive set of events Ω = {A1, A2, A3, …, An} whose probabilities sum to unity, then the probability of an arbitrary event B equals:

   p(B) = Σ_i p(B|A_i) × p(A_i)

which means:

   p(B) = p(B|A_1) × p(A_1) + … + p(B|A_n) × p(A_n)

If Ω = {A1, A2, A3, …, An} is a mutually exclusive set of events whose probabilities sum to unity, then the Bayes theorem becomes:

   p(A_k|B) = p(B|A_k) × p(A_k) / Σ_i p(B|A_i) × p(A_i)

Let's use a typical example to see how it works.

The Balls Problem

Situation: we have got three boxes (B1, B2, B3) with the following proportions of balls:

   Colour    Box 1   Box 2   Box 3
   White      30%     40%     10%
   Yellow     60%     30%     70%
   Red        10%     30%     20%

Experiment: extracting a ball, looking at its colour and determining from which box it was extracted.

Let's consider that the probability of selecting each box is the same: p(B_i) = 1/3.

Imagine someone gives you a white ball; which is the probability that the ball was extracted from box 2?

   p(B2|W) = p(W|B2) × p(B2) / p(W)

By definition we know that p(W|B1) = 0.3, p(W|B2) = 0.4 and p(W|B3) = 0.1, but we do not know p(W).

However, we can use the total probability theorem to obtain the value of p(W):

   p(W) = p(W|B1) × p(B1) + p(W|B2) × p(B2) + p(W|B3) × p(B3)
   p(W) = 0.3 × 1/3 + 0.4 × 1/3 + 0.1 × 1/3 = (30 + 40 + 10) / 300 ≈ 0.267

Therefore:

   p(B2|W) = (0.4 × 1/3) / 0.267 = 0.5

The following table shows the changes in beliefs after observing the white ball:

   Box     p(W|B_i)   Prior p(B_i)   p(W|B_i) × p(B_i)   Posterior p(B_i|W)
   1         0.3        0.333            0.100               0.375
   2         0.4        0.333            0.133               0.500
   3         0.1        0.333            0.033               0.125
   Total     0.8        1                0.267               1

Imagine we were then given a red ball; what would be the updated probability for each box? The previous posteriors become the new priors:

   Box     p(R|B_i)   Prior p(B_i)   p(R|B_i) × p(B_i)   Posterior p(B_i|R)
   1         0.1        0.375            0.038               0.176
   2         0.3        0.500            0.150               0.706
   3         0.2        0.125            0.025               0.118
   Total     0.6        1                0.212               1

Finally, what would be the probability for each box if we were told that a yellow ball was extracted?

   Box     p(Y|B_i)   Prior p(B_i)   p(Y|B_i) × p(B_i)   Posterior p(B_i|Y)
   1         0.6        0.176            0.106               0.265
   2         0.3        0.706            0.212               0.529
   3         0.7        0.118            0.082               0.206
   Total     1.6        1                0.400               1

But is there another way to solve this problem?
► Yes, there is
► Using a Bayesian network
► Let's use the Balls network (the same sequential updating is also reproduced in the code sketch below)
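Before turning to a Bayesian network, the three updating tables above can be reproduced with a few lines of code. The Python sketch below is not part of the original slides and the variable names are mine; it applies the Bayes theorem with the total probability theorem in the denominator, feeding each posterior back in as the next prior.

```python
# Likelihoods p(colour | box) taken from the three boxes in the slides
likelihood = {
    "white":  {"B1": 0.3, "B2": 0.4, "B3": 0.1},
    "yellow": {"B1": 0.6, "B2": 0.3, "B3": 0.7},
    "red":    {"B1": 0.1, "B2": 0.3, "B3": 0.2},
}

def update(prior, colour):
    """One Bayesian update: posterior(box) is proportional to p(colour | box) * prior(box)."""
    joint = {box: likelihood[colour][box] * p for box, p in prior.items()}
    evidence = sum(joint.values())          # total probability p(colour)
    return {box: j / evidence for box, j in joint.items()}

belief = {"B1": 1/3, "B2": 1/3, "B3": 1/3}  # equal prior for each box
for colour in ["white", "red", "yellow"]:   # same sequence as in the tables
    belief = update(belief, colour)
    print(colour, {box: round(p, 3) for box, p in belief.items()})
```

Running it prints 0.375 / 0.500 / 0.125 after the white ball, 0.176 / 0.706 / 0.118 after the red ball, and 0.265 / 0.529 / 0.206 after the yellow ball, matching the tables above.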
Bayesian Networks – A Brief Introduction

Brief Historical Background

► Late 70's – early 80's
► Artificial intelligence
► Machine learning and reasoning
► Expert system = knowledge base + inference engine
► Diagnostic decision trees, classification trees, flowcharts or algorithms

[Figure: fragment of a diagnostic decision tree (heart rate, femoral pulses, superior axis or additional cyanosis, weak left arm pulse, complete heart block, tachyarrhythmia); adapted from Cowell et al., 1999]

Rule-based expert systems or production systems
► If…then rules
   • IF headache & temperature THEN influenza
   • IF influenza THEN sneezing
   • IF influenza THEN weakness
► Certainty factors
   • IF headache & fever THEN influenza (certainty 0.7)
   • IF influenza THEN sneezing (certainty 0.9)
   • IF influenza THEN weakness (certainty 0.6)
(Example adapted from Cowell et al., 1999)

What is a Bayesian Network?

There are several names for it, among others: Bayes net, belief network, causal network, influence diagram, probabilistic expert system.

► “a set of related uncertainties” (Edwards, 1998)
► For Xiang (2002), it is a triplet (V, G, P) where:
   • V is a set of variables
   • G is a directed acyclic graph (DAG)
   • P is a set of probability distributions
► To make things practical, we could say it has a qualitative dimension and a quantitative dimension.

Qualitative Structure

► Graph: a set of vertices (V) and a set of links (L)
► Directed acyclic graph (DAG)
► The meaning of a connection: A → B
► The principle of conditional independence
► Three types of basic connections and how evidence propagates through them
► Serial connection (causal-chain model): A → B → C
► Diverging connection (common-cause model): A ← B → C
► Converging connection (common-effect model): A → B ← C

A Classical Example

Mr. Holmes is working in his office when he receives a phone call from his neighbour Dr. Watson, who tells him that Holmes' burglar alarm has gone off. Convinced that a burglar has broken into his house, Holmes rushes to his car and heads for home. On his way, he listens to the radio, and in the news it is reported that there has been a small earthquake in the area. Knowing that earthquakes have a tendency to turn burglar alarms on, he returns to his work. (A small numeric sketch of this story as a Bayes net is given at the end of these slides.)

Quantitative Structure

► Probability as a degree of belief (Cox, 1946; Dixon, 1970)
► The Bayes theorem
► Each variable (node) in the model is a conditional probability function of other variables
► Conditional probability tables (CPTs)

Pros and Cons of Bayes Nets

Pros
► Combine a qualitative and a quantitative dimension
► Handle missing data
► Non-parametric models
► Capture interactions and non-linearity
► Inference – scenario analysis
► Local computations
► Easy interpretation

Cons
► Hybrid nets
► Time series
► Software

Software

► Netica (Norsys Software Corp.) – www.norsys.com
► Hugin (Hugin Expert A/S) – www.hugin.com
► Ergo (Noetic Systems Inc.) – www.noeticsystems.com
► Elvira (academic development) – http://www.ia.uned.es/~elvira
► Tetrad (CMU, NASA, ONR) – http://www.phil.cmu.edu/projects/tetrad/
► R
► MATLAB

Thank you very much for your attention!
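As a closing illustration of the qualitative and quantitative dimensions, the sketch below encodes Mr. Holmes' story as a small Bayes net (Burglary → Alarm ← Earthquake, Alarm → Watson's call, Earthquake → radio news) and answers queries by brute-force enumeration of the joint distribution. This sketch is not part of the original slides, and the CPT numbers are illustrative assumptions rather than values taken from the deck.

```python
from itertools import product

# Illustrative CPTs (assumed numbers, not from the slides)
P_B = {True: 0.01, False: 0.99}                   # p(burglary)
P_E = {True: 0.001, False: 0.999}                 # p(earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,   # p(alarm | burglary, earthquake)
       (False, True): 0.29, (False, False): 0.001}
P_W = {True: 0.90, False: 0.05}                   # p(Watson calls | alarm)
P_R = {True: 0.99, False: 0.001}                  # p(radio news | earthquake)

def joint(b, e, a, w, r):
    """Joint probability of one full assignment, factorised along the DAG."""
    return (P_B[b] * P_E[e]
            * (P_A[(b, e)] if a else 1 - P_A[(b, e)])
            * (P_W[a] if w else 1 - P_W[a])
            * (P_R[e] if r else 1 - P_R[e]))

def p_burglary(**evidence):
    """p(Burglary = True | evidence) by enumerating all assignments."""
    num = den = 0.0
    for b, e, a, w, r in product([True, False], repeat=5):
        state = {"b": b, "e": e, "a": a, "w": w, "r": r}
        if any(state[k] != v for k, v in evidence.items()):
            continue
        p = joint(b, e, a, w, r)
        den += p
        if b:
            num += p
    return num / den

print("p(Burglary | Watson calls)             =", round(p_burglary(w=True), 4))
print("p(Burglary | Watson calls, radio news) =", round(p_burglary(w=True, r=True), 4))
```

With these numbers the second query comes out well below the first: the radio report "explains away" the alarm, which is exactly Holmes' reasoning in the classical example and the typical behaviour of a converging connection once evidence arrives.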