CS 188 Sp07 Discussion Note Week 7: Discrete Probability and Bayesian Network
by Nuttapong Chentanez

Probability

Sample Space: the set of all possible outcomes of some experiment.

Random Variable: a function that assigns a value to each outcome in a sample space. E.g., if the sample space S is the set of all students in this class, one could define a random variable A measuring age. If p is a person, A(p) is his/her age.

Event: a set of outcomes that share a property you are interested in. E.g., for sample space S, J may be the set of juniors. Randomly picking a person may or may not result in the event that he/she is a junior. P(J) denotes the probability that the event J occurs. Events can be combined by union, intersection, and complement to define new events. Particular conditions on random variables, such as A = 20 or A < 25, can also be considered events.

Conditional Probability: $P(X \mid Y) = P(X \wedge Y) / P(Y)$ is the probability that event X occurs given that event Y occurs.

Joint Distribution: P(A = a, B = b) denotes the probability that A = a and B = b.

Marginal Distribution: $P(A = a) = \sum_b P(A = a, B = b)$. This summation is called "marginalization".

Conditional Distribution: P(A = a | B = b) gives the conditional probability of A = a given B = b.

Important Rules:
Chain Rule: P(X, Y) = P(X|Y) P(Y)
Bayes's Rule: P(X|Y) = P(Y|X) P(X) / P(Y), very useful in AI

Axioms of probability
1. 0 <= P(a) <= 1, for any proposition a
2. P(true) = 1, P(false) = 0
3. $P(a \vee b) = P(a) + P(b) - P(a \wedge b)$

Exercise:
1. Show $P(a \mid a \wedge b) = 1$.
2. Consider the problem of dealing 5-card poker hands from a standard deck of 52 cards, assuming that the dealer is fair.
   a. How many atomic events are there in the joint probability distribution (how many 5-card hands are there)?
   b. What is the probability of each atomic event?
   c. What is the probability of being dealt a royal straight flush? Four of a kind?
3. From this table:

               Toothache             ~Toothache
               Catch     ~Catch      Catch     ~Catch
   Cavity      0.108     0.012       0.072     0.008
   ~Cavity     0.016     0.064       0.144     0.576

   Compute
   a. P(toothache)
   b. P(Cavity)
   c. P(Toothache | cavity)
   d. $P(Cavity \mid toothache \vee catch)$
4. After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease and the test is 99% accurate (for instance, the probability of testing positive when you have the disease is 0.99). The good news is that this is a rare disease, striking only 1 in 100,000 people of your age. Why is it good news that the disease is rare? What is the chance that you actually have the disease?

Independence
X and Y are independent if P(X,Y) = P(X)P(Y), equivalently P(X) = P(X|Y).

Conditional independence
X and Y are conditionally independent given Z if P(X,Y|Z) = P(X|Z) P(Y|Z).
Exercise: Show that the above is equivalent to P(X|Y,Z) = P(X|Z).

Chain rule
$$P(x_1, x_2, \ldots, x_n) = P(x_n \mid x_{n-1}, \ldots, x_1)\, P(x_{n-1}, \ldots, x_1) = \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1)$$

Bayes's Nets
A set of random variables as nodes (discrete or continuous).
A set of directed links or arrows connecting pairs of nodes.
Each node $X_i$ has a conditional probability distribution $P(X_i \mid \mathit{Parents}(X_i))$.
The graph has no directed cycles, i.e., it is a directed acyclic graph (DAG).

The topology implies certain conditional independencies:
$$P(X_i \mid X_{i-1}, \ldots, X_1) = P(X_i \mid \mathit{Parents}(X_i))$$
given that $\mathit{Parents}(X_i) \subseteq \{X_{i-1}, \ldots, X_1\}$.

Combined with the chain rule, we can write the full joint probability as
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathit{Parents}(X_i))$$
using the partial ordering implied by the DAG.

Given a set of random variables, the correct order for constructing a Bayes's net is to add "root causes" first, then the variables they influence, and so on. This is because the parents of a node are the "direct influencers" of the node. Directed edges need not represent causality, but constructing the graph with causality in mind tends to give a graph with fewer edges. For example, adding nodes in the order M, J, A, B, E gives a network with more edges, and it is also more difficult to collect data for it.

Independence in BN: Are two nodes conditionally independent given certain evidence?

Causal Chain
Are X and Z always independent?
Are X and Z independent given Y?
[Figure: causal chain X → Y → Z]

Common Cause
X = Newsgroup busy, Y = Project due, Z = Lab full
[Figure: common cause, Y → X and Y → Z]
Are X and Z independent given Y?

Common Effect
X = CS188 project due, Z = CS184 project due, Y = Lack of sleep
[Figure: common effect, X → Y ← Z]
Are X and Z independent?
Are X and Z independent given Y?

General Case: Bayes Ball Algorithm:

Example
[Figure: example Bayes net over nodes L, R, B, D, T, T']

Exercise:
1. The Surprise Candy Company makes candy in two flavors: 70% are strawberry and 30% are anchovy. Each new piece of candy starts out with a round shape; as it moves along the production line, a machine randomly selects a certain percentage to be trimmed into a square; then, each piece is wrapped in a wrapper whose color is chosen randomly to be red or brown. 80% of the strawberry candies are round and 80% have a red wrapper, while 90% of the anchovy candies are square and 90% have a brown wrapper. All candies are sold individually in sealed, identical, black boxes. Now you, the customer, have just bought a Surprise candy at the store but have not yet opened the box. Consider these three Bayes nets:
   a. Which network(s) can correctly represent P(Flavor, Wrapper, Shape)?
   b. Which network is the best representation for this problem?
   c. True/False: Network (i) asserts that P(Wrapper | Shape) = P(Wrapper).
   d. What is the probability that your candy has a red wrapper? (i) 0.8 (ii) 0.56 (iii) 0.59
   e. In the box is a round candy with a red wrapper. The probability that its flavor is strawberry is: (i) 0.7 (ii) between 0.7 and 0.99 (iii) > 0.99
2. A simple Bayes net with Boolean variables I = Intelligent, H = Honest, P = Popular, L = LotsOfCampaignFunds, E = Elected.
   a. Which of the following are asserted by the network (ignoring the CPTs)?
   b. Calculate P(i, h, ~l, p, ~e).
   c. Calculate the probability that someone is intelligent given that they are honest, have few campaign funds, and are elected.
   d. True/False: If there are two candidates in the race, then making two copies of the network will correctly represent the joint distribution over the two sets of variables.
3. Is $X_2 \perp X_3 \mid \{X_1, X_6\}$? How about $X_1 \perp X_6 \mid \{X_2, X_3\}$?
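
The marginalization, conditioning, and Bayes's rule steps from the probability section can be checked numerically. The following is a minimal sketch, assuming the toothache/catch/cavity joint table from the probability exercises is stored as a Python dictionary. The helper names `prob`, `cond_prob`, and `bayes_posterior` are illustrative and not part of the original note. For the disease-test problem the sketch further assumes that "99% accurate" also means a 1% false-positive rate, which the problem statement only suggests.

```python
# Full joint distribution over (Cavity, Toothache, Catch), copied from the table.
# Keys are (cavity, toothache, catch) truth values.
JOINT = {
    (True,  True,  True):  0.108,
    (True,  True,  False): 0.012,
    (True,  False, True):  0.072,
    (True,  False, False): 0.008,
    (False, True,  True):  0.016,
    (False, True,  False): 0.064,
    (False, False, True):  0.144,
    (False, False, False): 0.576,
}


def prob(event):
    """Marginalization: sum the joint entries for which `event` holds."""
    return sum(p for outcome, p in JOINT.items() if event(*outcome))


def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    both = prob(lambda c, t, k: event(c, t, k) and given(c, t, k))
    return both / prob(given)


def bayes_posterior(prior, true_pos_rate, false_pos_rate):
    """Bayes's rule for P(disease | positive test)."""
    evidence = true_pos_rate * prior + false_pos_rate * (1.0 - prior)
    return true_pos_rate * prior / evidence


if __name__ == "__main__":
    print("P(toothache)         =", round(prob(lambda c, t, k: t), 4))
    print("P(cavity)            =", round(prob(lambda c, t, k: c), 4))
    print("P(toothache|cavity)  =",
          round(cond_prob(lambda c, t, k: t, lambda c, t, k: c), 4))
    # Item d is read here as conditioning on "toothache or catch".
    print("P(cavity|tooth or catch) =",
          round(cond_prob(lambda c, t, k: c, lambda c, t, k: t or k), 4))
    # Assumption: false-positive rate is 0.01 (not stated explicitly in the note).
    print("P(disease|positive)  =", bayes_posterior(1 / 100_000, 0.99, 0.01))
```

Passing events as predicates keeps marginalization and conditioning as the same sum over table rows, which mirrors how the definitions above are written.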
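
The factorization of the full joint into one conditional distribution per node can be sketched on the candy exercise. This is a minimal sketch under an assumed structure, Flavor → Shape and Flavor → Wrapper, chosen only because the problem states the shape and wrapper percentages per flavor. The three candidate networks from the exercise are not reproduced here, and the names `joint` and `query` are illustrative.

```python
from itertools import product

# CPT numbers are taken from the candy exercise.
# Assumed structure (not confirmed by the note): Flavor -> Shape, Flavor -> Wrapper.
P_FLAVOR = {"strawberry": 0.7, "anchovy": 0.3}
P_SHAPE = {
    "strawberry": {"round": 0.8, "square": 0.2},
    "anchovy":    {"round": 0.1, "square": 0.9},
}
P_WRAPPER = {
    "strawberry": {"red": 0.8, "brown": 0.2},
    "anchovy":    {"red": 0.1, "brown": 0.9},
}


def joint(flavor, shape, wrapper):
    """P(F, S, W) = P(F) * P(S | F) * P(W | F), one factor per node."""
    return P_FLAVOR[flavor] * P_SHAPE[flavor][shape] * P_WRAPPER[flavor][wrapper]


def query(target, evidence):
    """P(target | evidence) by enumerating every assignment of the joint.

    `target` and `evidence` are dicts mapping variable names
    ("flavor", "shape", "wrapper") to values.
    """
    numerator = 0.0
    denominator = 0.0
    for f, s, w in product(P_FLAVOR, ("round", "square"), ("red", "brown")):
        world = {"flavor": f, "shape": s, "wrapper": w}
        if all(world[var] == val for var, val in evidence.items()):
            p = joint(f, s, w)
            denominator += p
            if all(world[var] == val for var, val in target.items()):
                numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print(round(query({"wrapper": "red"}, {}), 4))
    print(round(query({"flavor": "strawberry"},
                      {"shape": "round", "wrapper": "red"}), 4))
```

Enumeration visits every joint assignment, so it only suits small networks like this one; it is used here because it follows the product formula directly.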