CS 188 Sp07 Discussion Note Week 7 - Discrete Probability and Bayesian Networks
by Nuttapong Chentanez

Probability

Sample Space - the set of all possible outcomes of some experiment.

Random Variable - a function that assigns a value to each outcome in a sample space.
e.g. If the sample space S is the set of all students in this class, one could define a random variable A measuring age. If p is a person, A(p) is his/her age.

Event - a set of outcomes that share a property you are interested in.
e.g. For sample space S, J may be the set of juniors. Randomly picking a person may or may not result in the event that he/she is a junior. P(J) denotes the probability that the event J occurs.
Events can be combined by union, intersection, and complement to define new events. Particular conditions on random variables can also be considered events, e.g. for a random variable H measuring height, H = 6'1" or H < 7'.

Conditional Probability - P(X|Y) = P(X ∧ Y) / P(Y) is the probability that event X occurs given that event Y occurs.
Joint Distribution - P(A = a, B = b) denotes the probability that A = a and B = b.
Marginal Distribution - P(A = a) = Σ_b P(A = a, B = b). This summation is called "marginalization".
Conditional Distribution - P(A = a | B = b) gives the conditional probability of A = a given B = b.

Important Rules:
Chain Rule: P(X, Y) = P(X|Y) P(Y)
Bayes's Rule: P(X|Y) = P(Y|X) P(X) / P(Y) (very useful in AI)

Axioms of probability
1. 0 <= P(a) <= 1, for any proposition a
2. P(true) = 1, P(false) = 0
3. P(a ∨ b) = P(a) + P(b) - P(a ∧ b)

Exercise:
1. Show P(a | a ∧ b) = 1
2. Consider the problem of dealing 5-card poker hands from a standard deck of 52 cards, assuming that the dealer is fair.
   a. How many atomic events are there in the joint probability distribution (how many 5-card hands are there)?
   b. What is the probability of each atomic event?
   c. What is the probability of being dealt a royal straight flush? Four of a kind?
3. From this table:

                 Toothache            ~Toothache
              Catch    ~Catch      Catch    ~Catch
   Cavity     0.108    0.012       0.072    0.008
   ~Cavity    0.016    0.064       0.144    0.576

   Compute
   a. P(toothache)
   b. P(Cavity)
   c. P(Toothache | cavity)
   d. P(Cavity | toothache ∨ catch)
4. After your yearly checkup, the doctor has bad news and good news. The bad news is that you tested positive for a serious disease and the test is 99% accurate (i.e. the probability of testing positive when you have the disease is 0.99). The good news is that this is a rare disease, striking only 1 in 100,000 people of your age. Why is it good news that the disease is rare? What is the chance that you actually have the disease?

Independence
X and Y are independent if P(X,Y) = P(X) P(Y), or equivalently P(X) = P(X|Y).

Conditional Independence
X and Y are conditionally independent given Z if P(X,Y|Z) = P(X|Z) P(Y|Z).
Exercise: Show that the above is equivalent to P(X|Y,Z) = P(X|Z)

Chain Rule
P(x_1, x_2, …, x_n) = P(x_n | x_{n-1}, …, x_1) P(x_{n-1}, …, x_1) = ∏_{i=1}^{n} P(x_i | x_{i-1}, …, x_1)

Bayes's Nets
• A set of random variables as nodes (discrete or continuous)
• A set of directed links or arrows connecting pairs of nodes
• Each node X_i has a conditional probability distribution P(X_i | Parents(X_i))
• The graph has no directed cycles, i.e. it is a directed acyclic graph (DAG)

The topology implies certain conditional independencies:
P(X_i | X_{i-1}, …, X_1) = P(X_i | Parents(X_i)), given that Parents(X_i) ⊆ {X_{i-1}, …, X_1}
Combined with the chain rule, we can write the full joint probability as:
P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | Parents(X_i)), using an ordering of the variables consistent with the partial ordering implied by the DAG.

Given a set of random variables, the correct order for constructing a Bayes's net is to add the "root causes" first, then the variables they influence, and so on. This is because the parents of a node are the "direct influencers" of the node. Directed edges need not represent causality, but constructing the graph with causality in mind tends to give a graph with fewer edges. For example, adding the nodes in the order M, J, A, B, E produces a network with more edges, and it is also more difficult to collect data for its conditional probabilities.

Independence in BN: Are two nodes conditionally independent given certain evidence?

Causal Chain: X → Y → Z
Are X and Z always independent? No, e.g.
X = Low air pressure, Y = Rain, Z = Traffic. They could be independent, by crafting the CPTs so that P(Z|X) = P(Z).
Are X and Z independent given Y? Yes:
P(Z|X,Y) = P(X,Y,Z) / P(X,Y) = P(X) P(Y|X) P(Z|Y) / [P(X) P(Y|X)] = P(Z|Y)

Common Cause: X ← Y → Z
X = Newsgroup busy, Y = Project due, Z = Lab full
Are X and Z independent given Y? Yes:
P(Z|X,Y) = P(X,Y,Z) / P(X,Y) = P(Y) P(X|Y) P(Z|Y) / [P(Y) P(X|Y)] = P(Z|Y)

Common Effect: X → Y ← Z
X = CS188 project due, Z = CS184 project due, Y = Lack of sleep
Are X and Z independent? Yes.
Are X and Z independent given Y? No. If you lack sleep and the CS188 project is not due, it is more likely that the CS184 project is due :P

General Case: Bayes Ball Algorithm
• Correct algorithm:
  • Shade in evidence
  • Start at source node
  • Try to reach target by search
• States: pair of (node X, previous state S)
• Successor function:
  • X unobserved:
    • To any child
    • To any parent if coming from a child
  • X observed:
    • From parent to parent
• If you can't reach a node, it's conditionally independent of the start node given the evidence
[Figure: allowed moves from previous state S through node X]

Example
[Figure: network with nodes L, R, D, B, T, T']
Answers: Yes, Yes, No, No (L -> R -> T -> T' -> T -> B), Yes

Exercise:
1. The Surprise Candy Company makes candy in two flavors: 70% are strawberry and 30% are anchovy. Each new piece of candy starts out with a round shape; as it moves along the production line, a machine randomly selects a certain percentage to be trimmed into a square; then, each piece is wrapped in a wrapper whose color is chosen randomly to be red or brown. 80% of the strawberry candies are round and 80% have a red wrapper, while 90% of the anchovy candies are square and 90% have a brown wrapper. All candies are sold individually in sealed, identical, black boxes. Now you, the customer, have just bought a Surprise candy at the store but have not yet opened the box. Consider these three Bayes nets:
   a. Which network(s) can correctly represent P(Flavor, Wrapper, Shape)?
   b. Which network is the best representation for this problem?
   c. True/False: Network (i) asserts that P(Wrapper | Shape) = P(Wrapper).
   d. What is the probability that your candy has a red wrapper?
      (i) 0.8 (ii) 0.56 (iii) 0.59
   e. In the box is a round candy with a red wrapper. The probability that its flavor is strawberry is:
      (i) 0.7 (ii) Between 0.7 and 0.99 (iii) > 0.99
2. A simple Bayes net with Boolean variables I = Intelligent, H = Honest, P = Popular, L = LotsOfCampaignFunds, E = Elected.
   a. Which of the following are asserted by the network (ignoring the CPTs)?
   b. Calculate P(i, h, ~l, p, ~e).
   c. Calculate the probability that someone is intelligent given that they are honest, have few campaign funds, and are elected.
   d. True/False: If there are two candidates in the race, then making two copies of the network will correctly represent the joint distribution over the two sets of variables.
3. Is X2 ⊥ X3 | {X1, X6}? How about X1 ⊥ X6 | {X2, X3}?
   Answers: No, Yes
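
Several of the numerical exercises in this note (marginalizing and conditioning on the Toothache/Catch/Cavity table, the rare-disease test, and the candy questions) can be checked with a few lines of Python. The sketch below is only an illustration, not part of the original note: the names JOINT, prob, cond_prob, posterior and the candy tables are hypothetical. It also makes two assumptions that should be read as such: the disease test's false-positive rate is taken to be 0.01 (the note only states the 0.99 true-positive rate), and the candy's Shape and Wrapper are treated as conditionally independent given Flavor.

```python
# Full joint distribution from the table; keys are (cavity, toothache, catch).
JOINT = {
    (True,  True,  True):  0.108,
    (True,  True,  False): 0.012,
    (True,  False, True):  0.072,
    (True,  False, False): 0.008,
    (False, True,  True):  0.016,
    (False, True,  False): 0.064,
    (False, False, True):  0.144,
    (False, False, False): 0.576,
}


def prob(event):
    """P(event): sum the joint over all atomic outcomes where event holds."""
    return sum(p for outcome, p in JOINT.items() if event(*outcome))


def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda *o: event(*o) and given(*o)) / prob(given)


def posterior(prior, true_positive_rate, false_positive_rate):
    """Bayes's rule for P(disease | positive test)."""
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1.0 - prior))
    return true_positive_rate * prior / evidence


# Candy exercise, assuming the structure Shape <- Flavor -> Wrapper.
P_FLAVOR = {"strawberry": 0.7, "anchovy": 0.3}
P_ROUND = {"strawberry": 0.8, "anchovy": 0.1}  # P(round | flavor)
P_RED = {"strawberry": 0.8, "anchovy": 0.1}    # P(red | flavor)


def p_red():
    """P(red) by marginalizing over Flavor."""
    return sum(P_FLAVOR[f] * P_RED[f] for f in P_FLAVOR)


def p_strawberry_given_round_red():
    """P(strawberry | round, red) by normalizing the joint weights."""
    weights = {f: P_FLAVOR[f] * P_ROUND[f] * P_RED[f] for f in P_FLAVOR}
    return weights["strawberry"] / sum(weights.values())


if __name__ == "__main__":
    print("P(toothache) =", prob(lambda cavity, toothache, catch: toothache))
    print("P(cavity) =", prob(lambda cavity, toothache, catch: cavity))
    print("P(toothache | cavity) =",
          cond_prob(lambda c, t, k: t, lambda c, t, k: c))
    print("P(cavity | toothache or catch) =",
          cond_prob(lambda c, t, k: c, lambda c, t, k: t or k))
    # The 0.01 false-positive rate is an assumption, not given in the note.
    print("P(disease | positive) =", posterior(1 / 100000, 0.99, 0.01))
    print("P(red) =", p_red())
    print("P(strawberry | round, red) =", p_strawberry_given_round_red())
```

Under the assumed 0.01 false-positive rate, the disease posterior comes out at roughly 0.1%, which is why the rarity of the disease is good news: the tiny prior outweighs the fairly accurate test.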