Week 7 – Probability & Bayes Nets

CS 188 Sp07 Discussion Note, Week 7 – Discrete Probability and Bayesian Networks
by Nuttapong Chentanez
Probability
Sample Space – the set of all possible outcomes of some experiment.

Random Variables – a function that assigns a value to each outcome in a sample space. E.g., if the sample space S is the set of all students in this class, one could define a random variable A measuring age: if p is a person, A(p) is his/her age.

Event – a set of outcomes that share a property you are interested in. E.g., for sample space S, J may be the set of juniors; randomly picking a person may or may not result in the event that he/she is a junior. P(J) denotes the probability that the event J occurs. Events can be combined by union, intersection, and complement to define new events. Particular conditions on random variables, such as A = 6'1" or A < 7', can also be considered events.
Conditional Probability – P(X|Y) = P(X ∩ Y) / P(Y) is the probability that event X occurs given that event Y occurs.

Joint Distribution – P(A = a, B = b) denotes the probability that A = a and B = b.

Marginal Distribution – P(A = a) = Σ_b P(A = a, B = b). This summation is called "marginalization".

Conditional Distribution – P(A = a | B = b) gives the conditional probability of A = a given that B = b.

Important Rules:
Chain Rule: P(X, Y) = P(X|Y) P(Y)
Bayes's Rule: P(X|Y) = P(Y|X) P(X) / P(Y)

Axioms of probability:
1. 0 <= P(a) <= 1, for any proposition a
2. P(true) = 1, P(false) = 0
3. P(a ∨ b) = P(a) + P(b) – P(a ∧ b)
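To make the definitions above concrete, here is a minimal Python sketch that computes a marginal, a conditional, and a Bayes's-rule posterior from a small joint distribution. The variables A, B and all of the numbers are made up for illustration; they are not taken from the exercises below.

# A made-up joint distribution P(A, B) over two Boolean variables,
# stored as a dictionary mapping (a, b) -> probability.
joint = {
    (True, True): 0.20, (True, False): 0.30,
    (False, True): 0.10, (False, False): 0.40,
}

def marginal_A(a):
    # Marginalization: P(A = a) = sum over b of P(A = a, B = b)
    return sum(p for (ai, b), p in joint.items() if ai == a)

def marginal_B(b):
    return sum(p for (a, bi), p in joint.items() if bi == b)

def conditional(a, b):
    # Conditional probability: P(A = a | B = b) = P(A = a, B = b) / P(B = b)
    return joint[(a, b)] / marginal_B(b)

def bayes(a, b):
    # Bayes's rule: P(A = a | B = b) = P(B = b | A = a) P(A = a) / P(B = b)
    p_b_given_a = joint[(a, b)] / marginal_A(a)
    return p_b_given_a * marginal_A(a) / marginal_B(b)

print(marginal_A(True))         # 0.5
print(conditional(True, True))  # 0.2 / 0.3 = 0.666...
print(bayes(True, True))        # same value, computed via Bayes's rule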
Independence
X and Y are independent if P(X, Y) = P(X) P(Y), or equivalently P(X) = P(X|Y).

Conditional Independence
X and Y are conditionally independent given Z if P(X, Y|Z) = P(X|Z) P(Y|Z).
Exercise: Show that the above is equivalent to P(X|Y, Z) = P(X|Z).

Chain Rule (general form)
P(x1, x2, …, xn) = P(xn | xn-1, …, x1) P(xn-1, …, x1) = ∏_{i=1}^{n} P(xi | xi-1, …, x1)
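As a quick sanity check of these definitions, the following Python sketch builds a joint distribution for a made-up three-variable model (Z influences both X and Y, with invented probabilities) and tests independence and conditional independence numerically. The model and its parameters are assumptions for illustration only.

from itertools import product

# Made-up model: Z influences both X and Y (all Boolean).
# By construction X and Y are conditionally independent given Z,
# but, as the checks show, they are not independent unconditionally.
p_z = {True: 0.5, False: 0.5}
p_x_given_z = {True: 0.8, False: 0.2}   # P(X = true | Z = z)
p_y_given_z = {True: 0.8, False: 0.2}   # P(Y = true | Z = z)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

# Chain rule / factorization: P(x, y, z) = P(z) P(x | z) P(y | z)
joint = {
    (x, y, z): p_z[z] * bern(p_x_given_z[z], x) * bern(p_y_given_z[z], y)
    for x, y, z in product([True, False], repeat=3)
}

def prob(event):
    # P(event) = sum of the joint entries where the event holds
    return sum(p for (x, y, z), p in joint.items() if event(x, y, z))

p_x, p_y = prob(lambda x, y, z: x), prob(lambda x, y, z: y)
p_xy = prob(lambda x, y, z: x and y)
print(abs(p_xy - p_x * p_y) < 1e-9)            # False: X and Y are not independent

for zv in (True, False):
    p_zv = prob(lambda x, y, z: z == zv)
    p_x_z = prob(lambda x, y, z: x and z == zv) / p_zv
    p_y_z = prob(lambda x, y, z: y and z == zv) / p_zv
    p_xy_z = prob(lambda x, y, z: x and y and z == zv) / p_zv
    print(abs(p_xy_z - p_x_z * p_y_z) < 1e-9)  # True: independent given Z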
Bayes's Nets
- A set of random variables as nodes (discrete or continuous).
- A set of directed links or arrows connecting pairs of nodes.
- Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)).
- The graph has no directed cycles: it is a directed acyclic graph (DAG).
The topology implies certain conditional independencies:
P(Xi | Xi-1, …, X1) = P(Xi | Parents(Xi)),
given that Parents(Xi) ⊆ {Xi-1, …, X1}, using the partial ordering implied by the DAG. This is very useful in AI.

Combined with the chain rule, we can write the full joint probability as:
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))

Given a set of random variables, the correct order for constructing the Bayes's net is to add "root causes" first, then the variables they influence, and so on. This is because the parents of a node are its "direct influencers". Directed edges need not represent causality, but constructing the graph with causality in mind tends to give a graph with fewer edges. For example, adding the nodes in the order M, J, A, B, E (MaryCalls, JohnCalls, Alarm, Burglary, Earthquake in the burglary-alarm example) produces more edges; it is also more difficult to collect the data.
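As a sketch of how this factorization might be used in code, here is a tiny hypothetical network A -> B -> C with made-up CPTs; the joint probability of a full assignment is just the product of one CPT entry per node. None of the numbers come from this note.

# Hypothetical network A -> B -> C with invented CPTs.
# parents[X] lists Parents(X); cpt[X] maps (value of X, values of its parents) -> probability.
parents = {"A": (), "B": ("A",), "C": ("B",)}
cpt = {
    "A": {(True,): 0.3, (False,): 0.7},
    "B": {(True, True): 0.9, (False, True): 0.1,
          (True, False): 0.2, (False, False): 0.8},
    "C": {(True, True): 0.5, (False, True): 0.5,
          (True, False): 0.1, (False, False): 0.9},
}

def joint_probability(assignment):
    # P(X1, ..., Xn) = product over i of P(Xi | Parents(Xi))
    p = 1.0
    for var, pars in parents.items():
        key = (assignment[var],) + tuple(assignment[q] for q in pars)
        p *= cpt[var][key]
    return p

# One full assignment, e.g. P(a, ~b, c) = P(a) P(~b | a) P(c | ~b) = 0.3 * 0.1 * 0.1
print(joint_probability({"A": True, "B": False, "C": True}))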
Exercise:
1. Show that P(a | a ∧ b) = 1.
2. Consider the problem of dealing 5-card poker hands from a standard deck of 52 cards, assuming that the dealer is fair.
   a. How many atomic events are there in the joint probability distribution (how many 5-card hands are there)?
   b. What is the probability of each atomic event?
   c. What is the probability of being dealt a royal straight flush? Four of a kind?
3. From this table:

                Toothache            ~Toothache
                Catch    ~Catch      Catch    ~Catch
   Cavity       0.108    0.012       0.072    0.008
   ~Cavity      0.016    0.064       0.144    0.576

   Compute:
   a. P(toothache)
   b. P(Cavity)
   c. P(Toothache | cavity)
   d. P(Cavity | toothache ∨ catch)
4. After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease, and the test is 99% accurate (for instance, the probability of testing positive when you have the disease is 0.99). The good news is that this is a rare disease, striking only 1 in 100,000 people of your age. Why is it good news that the disease is rare? What is the chance that you actually have the disease?
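This exercise is left for discussion; purely as an illustration of the same kind of Bayes's-rule calculation, here is a sketch with different, made-up numbers (a test that is 95% sensitive and 95% specific for a disease with prevalence 1 in 1,000); these are not the numbers from the exercise.

# Made-up numbers, different from the exercise above.
prior = 1 / 1000          # P(disease)
sensitivity = 0.95        # P(positive | disease)
false_positive = 0.05     # P(positive | no disease)

# Total probability of a positive test.
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes's rule: P(disease | positive) = P(positive | disease) P(disease) / P(positive)
posterior = sensitivity * prior / p_positive
print(posterior)          # about 0.0187: still small, because the disease is rare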
Independence in BN: Are two nodes conditionally independent given certain evidence?

Causal Chain: X → Y → Z
   Are X and Z always independent?
   Are X and Z independent given Y?

Common Cause: X ← Y → Z
   Example: X = Newsgroup busy, Y = Project due, Z = Lab full
   Are X and Z independent given Y?

Common Effect: X → Y ← Z
   Example: X = CS188 project due, Z = CS184 project due, Y = Lack of sleep
   Are X and Z independent?
   Are X and Z independent given Y?

General Case: the Bayes Ball algorithm.
Example (figure): a Bayes net with nodes L, R, B, D, T, T'.
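The Bayes Ball algorithm itself is covered in lecture; as a rough sketch of the general case, the following Python code implements an equivalent structural test for d-separation (the "ancestral moral graph" criterion) rather than the Bayes Ball traversal, and checks it on the three canonical triples above. The networks and node names are just the toy structures from this section.

from collections import deque

def ancestors(dag, nodes):
    # All nodes in `nodes` plus their ancestors; dag maps node -> list of parents.
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag[n])
    return seen

def d_separated(dag, x, z, given):
    # X and Z are d-separated by `given` iff they are disconnected in the
    # moral graph of the ancestral subgraph, after deleting the given nodes.
    keep = ancestors(dag, {x, z} | set(given))
    adj = {n: set() for n in keep}
    for child in keep:
        pars = [p for p in dag[child] if p in keep]
        for p in pars:                           # undirected parent-child edges
            adj[child].add(p); adj[p].add(child)
        for i in range(len(pars)):               # "marry" co-parents
            for j in range(i + 1, len(pars)):
                adj[pars[i]].add(pars[j]); adj[pars[j]].add(pars[i])
    frontier, seen = deque([x]), {x}
    while frontier:                              # reachability avoiding given nodes
        n = frontier.popleft()
        if n == z:
            return False                         # connected -> not d-separated
        for m in adj[n]:
            if m not in seen and m not in given:
                seen.add(m); frontier.append(m)
    return True

# The three canonical triples, as node -> list of parents:
chain         = {"X": [], "Y": ["X"], "Z": ["Y"]}    # X -> Y -> Z
common_cause  = {"Y": [], "X": ["Y"], "Z": ["Y"]}    # X <- Y -> Z
common_effect = {"X": [], "Z": [], "Y": ["X", "Z"]}  # X -> Y <- Z

print(d_separated(chain, "X", "Z", []))             # False: dependent in general
print(d_separated(chain, "X", "Z", ["Y"]))          # True:  independent given Y
print(d_separated(common_cause, "X", "Z", []))      # False
print(d_separated(common_cause, "X", "Z", ["Y"]))   # True
print(d_separated(common_effect, "X", "Z", []))     # True:  independent a priori
print(d_separated(common_effect, "X", "Z", ["Y"]))  # False: observing Y couples X and Z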
Exercise:
1. The Surprise Candy Company makes candy in two flavors: 70% are strawberry and 30% are anchovy. Each new piece of candy starts out with a round shape; as it moves along the production line, a machine randomly selects a certain percentage to be trimmed into a square; then each piece is wrapped in a wrapper whose color is chosen randomly to be red or brown. 80% of the strawberry candies are round and 80% have a red wrapper, while 90% of the anchovy candies are square and 90% have a brown wrapper. All candies are sold individually in sealed, identical, black boxes. Now you, the customer, have just bought a Surprise candy at the store but have not yet opened the box.
Consider these three Bayes nets:
   a. Which network(s) can correctly represent P(Flavor, Wrapper, Shape)?
   b. Which network is the best representation for this problem?
   c. True/False: Network (i) asserts that P(Wrapper | Shape) = P(Wrapper).
   d. What is the probability that your candy has a red wrapper?
      (i) 0.8   (ii) 0.56   (iii) 0.59
   e. In the box is a round candy with a red wrapper. What is the probability that its flavor is strawberry?
      (i) 0.7   (ii) between 0.7 and 0.99   (iii) > 0.99
2. A simple Bayes net with Boolean variables I = Intelligent, H = Honest, P = Popular, L = LotsOfCampaignFunds, E = Elected.
   a. Which of the following are asserted by the network (ignoring the CPTs)?
   b. Calculate P(i, h, ~l, p, ~e).
   c. Calculate the probability that someone is intelligent given that they are honest, have few campaign funds, and are elected.
   d. True/False: If there are two candidates in the race, then making two copies of the network will correctly represent the joint distribution over the two sets of variables.
3. Is X2 ⊥ X3 | {X1, X6}? How about X1 ⊥ X6 | {X2, X3}?