Bayes’ product and sum rules

Bayes’ theorem holds the key to the study of information in an uncertain world: information alters probability beliefs, and Bayes’ product and sum rules describe consistent belief updating.[1] Bayes’ theorem says

    Pr (X, Y) = Pr (Y | X) Pr (X) = Pr (X | Y) Pr (Y)

    Pr (X) = Σ_Y Pr (X, Y)

    Pr (Y) = Σ_X Pr (X, Y)

where X and Y are vectors of random variables of arbitrary length. In other words, joint probabilities are products of conditional and marginal probabilities – built from a product rule. Marginal probabilities are sums of joint probabilities, integrating out all random variables except the target random variables – built from a sum rule. Finally, to complete the circle, conditional probabilities follow from the product rule as the ratio of the joint probability to the marginal probability of the random variables on which we’re conditioning.[2] Hence, identification or assignment of the joint probability distribution is a complete representation of the uncertain setting.

Product rule

Suppose, for clarity, Y = [y_1, y_2, y_3, ..., y_n] and X = x. Joint probabilities are built from chains of products:

    Pr (x, y_1, y_2, y_3, ..., y_n)
      = Pr (y_n | x, y_1, y_2, y_3, ..., y_{n-1}) · · · Pr (y_3 | x, y_1, y_2) Pr (y_2 | x, y_1) Pr (y_1 | x) Pr (x)
      = Pr (y_n | x, y_1, y_2, y_3, ..., y_{n-1}) · · · Pr (y_3 | x, y_1, y_2) Pr (y_2 | x, y_1) Pr (x, y_1)
      = Pr (y_n | x, y_1, y_2, y_3, ..., y_{n-1}) · · · Pr (y_3 | x, y_1, y_2) Pr (x, y_1, y_2)
      = Pr (y_n | x, y_1, y_2, y_3, ..., y_{n-1}) · · · Pr (x, y_1, y_2, y_3)
      = Pr (y_n | x, y_1, y_2, y_3, ..., y_{n-1}) Pr (x, y_1, y_2, y_3, ..., y_{n-1})

Sum rule

Continue with the setup above. Integrating out the other random variables yields marginal probabilities. For example,

    Pr (x) = Σ_{y_1, y_2, y_3, ..., y_n} Pr (x, y_1, y_2, y_3, ..., y_n)

where the summation is over all values of y_1, y_2, y_3, ..., y_{n-1}, and y_n.

[1] Consistency is the key to science.
There are no truths, but rather only consistent reasoning in our search for the richest and deepest understanding of the world.
[2] In the case of continuous random variables, probability masses Pr (·) are replaced by densities, say f (·), and summation is replaced by integration.

Stochastic independence

Random variables are stochastically independent when their distributions are unrelated, that is, when conditioning on one leaves the distribution of the other unchanged. This is succinctly summarized as

    Pr (X | Y) = Pr (X)   or   Pr (Y | X) = Pr (Y)

Now, when X and Y are independent the product rule simplifies as

    Pr (X, Y) = Pr (X | Y) Pr (Y) = Pr (X) Pr (Y)

or

    Pr (X, Y) = Pr (Y | X) Pr (X) = Pr (Y) Pr (X)

Stochastic conditional independence

Random variables are stochastically conditionally independent when their conditional distributions are unrelated.[3] This is briefly summarized (for n = 2) as

    Pr (y_2 | x, y_1) = Pr (y_2 | x)   or   Pr (y_1 | x, y_2) = Pr (y_1 | x)

Now, when (y_1 | x) and (y_2 | x) are independent the product rule simplifies as

    Pr (x, y_1, y_2) = Pr (y_2 | x, y_1) Pr (y_1 | x) Pr (x) = Pr (y_2 | x) Pr (y_1 | x) Pr (x)

or

    Pr (x, y_1, y_2) = Pr (y_1 | x, y_2) Pr (y_2 | x) Pr (x) = Pr (y_1 | x) Pr (y_2 | x) Pr (x)

[3] This case is illustrated in Ralph’s Scale.
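The product rule, sum rule, and conditional independence above can all be checked numerically on a small discrete example. The following sketch (in Python; the particular probability values are illustrative assumptions, not taken from the text) builds a joint distribution over x, y_1, y_2 in which y_1 and y_2 are conditionally independent given x, then verifies the sum rule marginal and the simplified product rule.

```python
import itertools

# Illustrative (assumed) distributions: x, y1, y2 each take values 0 or 1.
# The joint is constructed so y1 and y2 are conditionally independent given x:
#   Pr(x, y1, y2) = Pr(x) Pr(y1 | x) Pr(y2 | x)
p_x = {0: 0.4, 1: 0.6}
p_y1_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_y1_given_x[x][y1]
p_y2_given_x = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}  # p_y2_given_x[x][y2]

# Product rule: joint probabilities as chains of products
joint = {(x, y1, y2): p_x[x] * p_y1_given_x[x][y1] * p_y2_given_x[x][y2]
         for x, y1, y2 in itertools.product([0, 1], repeat=3)}

# Sum rule: Pr(x) = sum over y1, y2 of Pr(x, y1, y2)
marg_x = {x: sum(joint[(x, y1, y2)] for y1 in (0, 1) for y2 in (0, 1))
          for x in (0, 1)}

def pr_y2_given_x_y1(x, y1, y2):
    """Conditional probability as the ratio of joint to marginal: Pr(y2 | x, y1)."""
    pr_x_y1 = sum(joint[(x, y1, v)] for v in (0, 1))
    return joint[(x, y1, y2)] / pr_x_y1

# Conditional independence: Pr(y2 | x, y1) = Pr(y2 | x) for every y1
for x, y1, y2 in itertools.product([0, 1], repeat=3):
    assert abs(pr_y2_given_x_y1(x, y1, y2) - p_y2_given_x[x][y2]) < 1e-12

print(marg_x)  # the sum rule recovers Pr(x)
```

Note that marginalizing the joint recovers Pr(x) exactly, and the conditional Pr(y_2 | x, y_1) does not depend on y_1 – precisely the simplification the conditional independence section describes.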