Bayesian Networks – Principles and Application to Modelling Water, Governance and Human Development Indicators in Developing Countries
Jorge López Puga (jpuga@ual.es)
Área de Metodología de las Ciencias del Comportamiento
Universidad de Almería
www.ual.es/personal/jpuga
February 2012
Contents
1. What is probability?
2. The Bayes Theorem
Deduction of the theorem
The Balls problem
3. Introduction to Bayesian Networks
Historical background
Qualitative and quantitative dimensions
Advantages and disadvantages of Bayes nets
Software
Water4Dev – Feb/2012
What is Probability?
Etymology
►A measure of the authority of a witness in a legal case (Europe)
Interpretations of Probability
►Objective probability
• Aprioristic or classical
• Frequentist or empirical
►Subjective probability
• Belief
Objective Probability
Classical (Laplace, 1812-1814)
►A priori (aprioristic)
►Equiprobability
►Full knowledge about the sample space

p(A) = N_A / N
Frequentist
►Random experiment
►Well-defined sample space
►Posterior probability
►Randomness

p(A) = fr_A / N
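The two objective formulas above can be contrasted with a short simulation; a minimal sketch (the die example and the sample size are my own illustration, not from the slides):

```python
import random

# Classical vs frequentist probability, sketched with a fair die:
# the classical value is N_A / N; the frequentist estimate is the
# relative frequency fr_A / N over repeated random experiments.
random.seed(42)

p_classical = 1 / 6  # one favourable case out of six equiprobable ones

n = 100_000
hits = sum(1 for _ in range(n) if random.randint(1, 6) == 6)
p_frequentist = hits / n

print(p_classical, p_frequentist)  # the estimate approaches 1/6
```

As the number of repetitions grows, the relative frequency converges toward the classical value.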
Subjective Probability
It is simply an individual's degree of belief, which is updated based on experience
Probability Axioms
►p(S) = 1, where S is the whole sample space (the sure event)
►p(A) ≥ 0 for any event A
►If two events are mutually exclusive (A ∩ B = Ø), then p(A ∪ B) = p(A) + p(B)
Cards Game
Let me show you the idea of probability with a card game
Classical vs. Frequentist vs. Subjective
What is the probability of getting an ace?
As you probably know, a French deck has four suits (Spades, Hearts, Diamonds, Clubs), each with thirteen ranks: Ace, 2–10, J, Q, K.
What is the probability of getting an ace?
Given that there are 52 cards and 4 aces in a French deck…
►We could say… p(Ace) = 4/52 ≈ 0.077 (aprioristic)
►If we repeated the experiment a finite number of times: frequentist
►If I subjectively assess that probability: Bayesian
What is the probability of getting an ace?
Why is a Bayesian interpretation of probability useful? – Let's play
►We could say…
p(Ace) = 4/52 ≈ 0.077
p(Ace) = 3/51 ≈ 0.059
p(Ace) = 2/50 = 0.04
p(Ace) = 1/49 ≈ 0.02
Probability estimates depend on our state of knowledge (Dixon, 1964)
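The updating sequence above can be sketched in a few lines; a minimal illustration assuming each drawn card turns out to be an ace:

```python
# Probability of drawing an ace as aces leave the deck: each time an
# ace is drawn, both the number of aces and the deck size shrink by one.
def p_ace(aces_left, cards_left):
    return aces_left / cards_left

for k in range(4):
    aces, cards = 4 - k, 52 - k
    print(f"p(Ace) = {aces}/{cards} = {p_ace(aces, cards):.3f}")
```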
The Bayes Theorem
Getting Evidence and Updating Probabilities
Joint and Conditional Probability
Joint probability (distributions of variables)
►It represents the likelihood of two events occurring at the same time
►It is the same as the intersection of events
►Notation
• p(A ∩ B), p(A,B), p(AB)
Estimation
►Independent events
►Dependent events
Independent events
►p(AB) = p(A) × p(B), or equivalently p(BA) = p(B) × p(A)
Example: what is the probability of obtaining two tails (T) after tossing two coins?
p(TT) = p(T) × p(T) = 0.5 × 0.5 = 0.25
Dependent events
►Conditional probability and the symbol “|”
►p(AB) = p(A|B) × p(B) or p(BA) = p(B|A) × p(A)
Example: what is the probability of suffering from bronchitis (B) and being a smoker (S) at the same time?
• p(B) = 0.25
• p(S|B) = 0.6
p(SB) = p(S|B) × p(B) = 0.6 × 0.25 = 0.15
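The two estimation rules can be written out directly; a minimal sketch using the coin and bronchitis numbers above:

```python
# Joint probability, sketched with the two examples from the slide.

# Independent events: two coin tosses, p(TT) = p(T) * p(T)
p_tail = 0.5
p_two_tails = p_tail * p_tail  # 0.25

# Dependent events: bronchitis (B) and smoking (S),
# p(S and B) = p(S|B) * p(B)
p_b = 0.25         # p(B)
p_s_given_b = 0.6  # p(S|B)
p_s_and_b = p_s_given_b * p_b  # 0.15

print(p_two_tails, p_s_and_b)
```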
The Bayes Theorem
It is a generalization of conditional probability applied to the joint probability
It is:
p(A|B) = p(B|A) × p(A) / p(B)
You can deduce it because:
p(AB) = p(A|B) × p(B) and p(AB) = p(B|A) × p(A)
so p(A|B) × p(B) = p(B|A) × p(A)
and therefore p(A|B) = p(B|A) × p(A) / p(B)
Example: what is the probability of a person suffering from bronchitis (B) given that s/he smokes (S)?
• p(B) = 0.25
• p(S|B) = 0.6
• p(S) = 0.40
p(B|S) = p(S|B) × p(B) / p(S) = (0.6 × 0.25) / 0.40 = 0.375
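The computation above, as a small helper function (the name `bayes` is my own):

```python
# The Bayes theorem as a small helper: p(H|E) = p(E|H) * p(H) / p(E).
def bayes(p_e_given_h, p_h, p_e):
    return p_e_given_h * p_h / p_e

# Bronchitis given smoking, with the numbers from the slide.
p_b_given_s = bayes(p_e_given_h=0.6, p_h=0.25, p_e=0.40)
print(round(p_b_given_s, 3))  # 0.375
```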
The Total Probability Theorem
If we use a system based on a mutually exclusive set of events Ω = {A1, A2, A3, …, An} whose probabilities sum to unity, then the probability of an arbitrary event B equals:
p(B) = Σi p(B|Ai) × p(Ai)
which means:
p(B) = p(B|A1) × p(A1) + … + p(B|An) × p(An)
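A minimal sketch of the theorem as a function (the white-ball numbers below anticipate the balls problem used later in the slides):

```python
# Total probability theorem: p(B) = sum over i of p(B|A_i) * p(A_i).
def total_probability(p_b_given_a, p_a):
    assert abs(sum(p_a) - 1.0) < 1e-9  # the A_i probabilities must sum to unity
    return sum(pb * pa for pb, pa in zip(p_b_given_a, p_a))

# Example: p(White) over three equally likely boxes.
p_w = total_probability([0.3, 0.4, 0.1], [1/3, 1/3, 1/3])
print(round(p_w, 3))  # 0.267
```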
If Ω = {A1, A2, A3, …, An} is a mutually exclusive set of events whose probabilities sum to unity, then the Bayes Theorem becomes:
p(Ak|B) = p(B|Ak) × p(Ak) / Σi [p(B|Ai) × p(Ai)]
Let's use a typical example to see how it works
The Balls problem
Situation: we have three boxes (B1, B2, B3) with the following proportions of balls:
Box 1: 30% white, 60% yellow, 10% red
Box 2: 40% white, 30% yellow, 30% red
Box 3: 10% white, 70% yellow, 20% red
Experiment: extract a ball, look at its colour, and determine from which box it was extracted
Let's consider that the probability of selecting each box is the same: p(Bi) = 1/3
Imagine someone gives you a white ball; what is the probability that the ball was extracted from Box 2?
p(B2|W) = ?
p(B2|W) = p(W|B2) × p(B2) / p(W)
By definition we know that:
p(W|B1) = 0.3
p(W|B2) = 0.4
p(W|B3) = 0.1
But we do not know p(W)
►But we can use the total probability theorem to discover the value of p(W):
p(W) = p(W|B1) × p(B1) + p(W|B2) × p(B2) + p(W|B3) × p(B3)
p(W) = 0.3 × 1/3 + 0.4 × 1/3 + 0.1 × 1/3 = (30 + 40 + 10)/300 ≈ 0.267
p(B2|W) = (0.4 × 1/3) / 0.267 = 0.5
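The whole calculation in code; a minimal sketch of the white-ball case:

```python
# Posterior probability that a white ball came from Box 2, combining
# the total probability theorem with the Bayes theorem.
p_box = [1/3, 1/3, 1/3]          # p(B_i): each box equally likely
p_w_given_box = [0.3, 0.4, 0.1]  # p(W|B_i)

p_w = sum(pw * pb for pw, pb in zip(p_w_given_box, p_box))
p_b2_given_w = p_w_given_box[1] * p_box[1] / p_w
print(round(p_w, 3), round(p_b2_given_w, 3))  # 0.267 0.5
```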
►The following table shows the changes in beliefs after observing a white ball:

Box   p(W|B_i)   Prior p(B_i)   p(W|B_i) × p(B_i)   Posterior p(B_i|W)
1       0.3          1/3             0.100               0.375
2       0.4          1/3             0.133               0.500
3       0.1          1/3             0.033               0.125
Total   0.8           1              0.267                 1
►Imagine we were given a red ball; what would be the updated probability for each box?

Box   p(R|B_i)   Prior p(B_i)   p(R|B_i) × p(B_i)   Posterior p(B_i|R)
1       0.1         0.375            0.038               0.176
2       0.3         0.500            0.150               0.706
3       0.2         0.125            0.025               0.118
Total   0.6           1              0.212                 1
►Finally, what would be the probability for each box if we were told that a yellow ball was extracted?

Box   p(Y|B_i)   Prior p(B_i)   p(Y|B_i) × p(B_i)   Posterior p(B_i|Y)
1       0.6         0.176            0.106               0.265
2       0.3         0.706            0.212               0.529
3       0.7         0.118            0.082               0.206
Total   1.6           1              0.400                 1
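The three tables above can be reproduced with a short sequential-updating loop; a minimal sketch (the variable names are mine):

```python
# Sequential Bayesian updating for the balls problem: the posterior
# after each ball becomes the prior for the next one.
likelihood = {                  # p(colour | box), for boxes 1..3
    "white": [0.3, 0.4, 0.1],
    "yellow": [0.6, 0.3, 0.7],
    "red": [0.1, 0.3, 0.2],
}

def update(prior, colour):
    joint = [l * p for l, p in zip(likelihood[colour], prior)]
    evidence = sum(joint)       # total probability of the observation
    return [j / evidence for j in joint]

belief = [1/3, 1/3, 1/3]        # equal prior for each box
for colour in ["white", "red", "yellow"]:
    belief = update(belief, colour)
    print(colour, [round(b, 3) for b in belief])
```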
But, is there another way to solve this
problem?
►Yes, there is
►Using a Bayesian Network
►Let’s use the Balls network
Bayesian Networks
A brief Introduction
Brief Historical Background
Late 1970s – early 1980s
Artificial intelligence
Machine learning and reasoning
►Expert system = Knowledge Base + Inference
Engine
Diagnostic decision tree, classification tree,
flowchart or algorithm
[Figure: diagnostic decision tree for heart disease, branching on heart rate (<70/min, 70-200/min, >200/min), femoral pulses vs. other pulses, weak left arm pulse, and superior axis or additional cyanosis, leading to diagnoses such as complete heart block and tachyarrhythmia; adapted from Cowell et al., 1999]
Rule-based expert systems or production
systems
►If…then
• IF headache & temperature THEN influenza
• IF influenza THEN sneezing
• IF influenza THEN weakness
►Certainty factor
• IF headache & fever THEN influenza (certainty 0.7)
• IF influenza THEN sneezing (certainty 0.9)
• IF influenza THEN weakness (certainty 0.6)
(Example adapted from Cowell et al., 1999)
What is a Bayesian Network?
There are several names for it, among others:
Bayes net, belief network, causal network,
influence diagram, probabilistic expert system
“a set of related uncertainties” (Edwards,
1998)
For Xiang (2002), it is a triad (V, G, P) where:
►V is a set of variables
►G is a directed acyclic graph (DAG)
►P is a set of probability distributions
To make things practical we could say it has:
►A qualitative dimension
►A quantitative dimension
Qualitative Structure
Graph: a set of vertices (V) and a set of links (L)
Directed Acyclic Graph (DAG)
The meaning of a connection: A → B
The Principle of Conditional Independence
Three types of basic connections | Evidence propagation
Serial connection (causal-chain model): A → B → C
Diverging connection (common-cause model): A ← B → C
Converging connection (common-effect model): A → B ← C
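The serial-connection claim (A and C become independent once B is known) can be checked by brute force; a minimal sketch with made-up binary CPTs:

```python
# Serial connection A -> B -> C with binary variables: once B is
# observed, evidence about A no longer changes C. The CPT numbers
# below are invented for illustration.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # p(b|a)
p_c_given_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}  # p(c|b)

def joint(a, b, c):
    # Chain-rule factorisation implied by the DAG A -> B -> C.
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

def p_c_given_ab(a, b, c):
    return joint(a, b, c) / sum(joint(a, b, cc) for cc in (0, 1))

# p(C|A, B) equals p(C|B): knowing A adds nothing once B is known.
for b in (0, 1):
    for c in (0, 1):
        assert abs(p_c_given_ab(0, b, c) - p_c_given_ab(1, b, c)) < 1e-12
print("C is conditionally independent of A given B")
```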
A Classical Example
Mr. Holmes is working in his office when he
receives a phone call from his neighbour Dr.
Watson, who tells him that Holmes’ burglar
alarm has gone off. Convinced that a burglar
has broken into his house, Holmes rushes to
his car and heads for home. On his way, he
listens to the radio, and in the news it is
reported that there has been a small
earthquake in the area. Knowing that
earthquakes have a tendency to turn burglar
alarms on, he returns to his work.
Quantitative Structure
Probability as a belief (Cox, 1946; Dixon,
1970)
Bayes Theorem
Each variable (node) in the model is a conditional probability function of other variables
Conditional Probability Tables (CPT)
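A CPT-based sketch of the Holmes story from the previous slide; all the numbers are made up for illustration:

```python
# A tiny two-parent Bayesian network for the Holmes story: Burglary
# and Earthquake are both possible causes of the Alarm. The prior and
# CPT values below are invented for illustration.
p_burglary = 0.01
p_earthquake = 0.02
# CPT: p(Alarm = True | Burglary, Earthquake)
p_alarm = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}

def p_joint(b, e, a):
    pb = p_burglary if b else 1 - p_burglary
    pe = p_earthquake if e else 1 - p_earthquake
    pa = p_alarm[(b, e)] if a else 1 - p_alarm[(b, e)]
    return pb * pe * pa

def posterior_burglary(alarm=True, earthquake=None):
    # Sum the joint distribution over the unobserved variables.
    es = (True, False) if earthquake is None else (earthquake,)
    num = sum(p_joint(True, e, alarm) for e in es)
    den = sum(p_joint(b, e, alarm) for b in (True, False) for e in es)
    return num / den

# Watson's call raises belief in a burglary; the radio news about the
# earthquake then "explains it away" and lowers that belief again.
print(posterior_burglary(alarm=True))
print(posterior_burglary(alarm=True, earthquake=True))
```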
Pros and cons of Bayes nets
Advantages:
►Qualitative and quantitative dimensions
►Missing data
►Non-parametric models
►Interaction – non-linearity
►Inference – scenarios
►Local computations
►Easy interpretation
Disadvantages:
►Hybrid nets
►Time series
►Software
Software
Netica (Norsys Software Corp.) – www.norsys.com
Hugin (Hugin Expert A/S) – www.hugin.com
Ergo (Noetic Systems Inc.) – www.noeticsystems.com
Elvira (academic development) – http://www.ia.uned.es/~elvira
Tetrad (CMU, NASA, ONR) – http://www.phil.cmu.edu/projects/tetrad/
R
MATLAB
Thank you very much
for your attention!