Bayes' product and sum rules

Bayes' theorem holds the key to the study of information in an uncertain
world, as information alters probability beliefs and Bayes' product and sum
rules describe consistent belief updating.1 Bayes' theorem says
Pr (X, Y ) = Pr (Y | X) Pr (X) = Pr (X | Y ) Pr (Y )

Pr (X) = Σ_Y Pr (X, Y )

Pr (Y ) = Σ_X Pr (X, Y )
where X and Y are vectors of random variables of arbitrary length. In other
words, joint probabilities are products of conditional and marginal probabilities – built from a product rule. Marginal probabilities are sums of joint
probabilities, obtained by integrating out all random variables except the target
random variables – built from a sum rule. And finally, to complete the circle,
conditional probabilities follow from the product rule as the ratio of the joint
probability to the marginal probability of the random variables on which we're
conditioning.2 Hence, identification or assignment of the joint probability distribution is
a complete representation of the uncertain setting.
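As a concrete illustration, the product and sum rules can be checked numerically on a small joint distribution over two binary random variables; the probability table below is invented for illustration only.

```python
# Hypothetical joint distribution Pr(X = x, Y = y) over two binary variables.
joint = {
    (0, 0): 0.3, (0, 1): 0.2,
    (1, 0): 0.1, (1, 1): 0.4,
}

# Sum rule: marginals are sums of joint probabilities.
pr_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
pr_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

# Product rule: Pr(x, y) = Pr(y | x) Pr(x), with Pr(y | x) = Pr(x, y) / Pr(x).
for (x, y), p in joint.items():
    pr_y_given_x = p / pr_x[x]
    assert abs(pr_y_given_x * pr_x[x] - p) < 1e-12

print(pr_x)  # {0: 0.5, 1: 0.5}
print(pr_y)
```

The joint table is a complete representation: both marginals and every conditional probability are recovered from it by the two rules alone.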
Product rule Suppose for clarity, Y = [y1, y2, y3, . . . , yn] and X = x. Joint
probabilities are built from products of chains.

Pr (x, y1, y2, y3, . . . , yn)
= Pr (yn | x, y1, y2, y3, . . . , yn−1) · · · Pr (y3 | x, y1, y2) Pr (y2 | x, y1) Pr (y1 | x) Pr (x)
= Pr (yn | x, y1, y2, y3, . . . , yn−1) · · · Pr (y3 | x, y1, y2) Pr (y2 | x, y1) Pr (x, y1)
= Pr (yn | x, y1, y2, y3, . . . , yn−1) · · · Pr (y3 | x, y1, y2) Pr (x, y1, y2)
= Pr (yn | x, y1, y2, y3, . . . , yn−1) · · · Pr (x, y1, y2, y3)
= Pr (yn | x, y1, y2, y3, . . . , yn−1) Pr (x, y1, y2, y3, . . . , yn−1)
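The chain factorization can be verified numerically for n = 2, i.e. Pr (x, y1, y2) = Pr (y2 | x, y1) Pr (y1 | x) Pr (x); the joint probabilities below are invented for illustration.

```python
from itertools import product

# Invented joint distribution over three binary variables (x, y1, y2).
probs = [0.05, 0.15, 0.10, 0.20, 0.08, 0.12, 0.14, 0.16]
joint = dict(zip(product((0, 1), repeat=3), probs))  # keys are (x, y1, y2)

def marg(keep):
    """Sum rule: sum out every variable except those at index positions in keep."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

pr_x = marg([0])        # Pr(x)
pr_x_y1 = marg([0, 1])  # Pr(x, y1)

# Chain rule: Pr(x, y1, y2) = Pr(y2 | x, y1) Pr(y1 | x) Pr(x).
for (x, y1, y2), p in joint.items():
    pr_y1_given_x = pr_x_y1[(x, y1)] / pr_x[(x,)]  # Pr(y1 | x)
    pr_y2_given_xy1 = p / pr_x_y1[(x, y1)]         # Pr(y2 | x, y1)
    assert abs(pr_y2_given_xy1 * pr_y1_given_x * pr_x[(x,)] - p) < 1e-12
```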
Sum rule Continue with the setup above. Integrating out other random
variables yields marginal probabilities. For example,

Pr (x) = Σ_{y1, y2, y3, . . . , yn} Pr (x, y1, y2, y3, . . . , yn)

where the summation is over all values of y1, y2, y3, . . . , yn−1, and yn.
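A minimal sketch of the sum rule: recover Pr (x) by summing the joint over every combination of y1 and y2. The uniform joint below is invented for illustration.

```python
from itertools import product

# Invented uniform joint distribution over three binary variables.
joint = {(x, y1, y2): 1 / 8 for x, y1, y2 in product((0, 1), repeat=3)}

# Sum rule: marginalize out y1 and y2 to obtain Pr(x).
pr_x = {}
for (x, y1, y2), p in joint.items():
    pr_x[x] = pr_x.get(x, 0.0) + p

print(pr_x)  # {0: 0.5, 1: 0.5}
```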
1 Consistency is the key to science. There are no truths, but rather only consistent reasoning
in our search for the richest and deepest understanding of the world.
2 In the case of continuous random variables, probability masses Pr (·) are replaced by
densities, say f (·), and summation is replaced by integration.
Stochastic independence Random variables are stochastically independent
when their distributions are unrelated. This is succinctly summarized as
Pr (X | Y ) = Pr (X)
or
Pr (Y | X) = Pr (Y )
Now, when X and Y are independent the product rule simplifies as
Pr (X, Y ) = Pr (X | Y ) Pr (Y )
= Pr (X) Pr (Y )
or
Pr (X, Y ) = Pr (Y | X) Pr (X)
= Pr (Y ) Pr (X)
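One way to see this numerically: construct a joint as the product of marginals (the marginals below are invented) and confirm that Pr (X | Y ) = Pr (X) for every outcome.

```python
# Invented marginal distributions for two independent binary variables.
pr_x = {0: 0.3, 1: 0.7}
pr_y = {0: 0.6, 1: 0.4}

# Under independence the product rule gives Pr(x, y) = Pr(x) Pr(y).
joint = {(x, y): pr_x[x] * pr_y[y] for x in pr_x for y in pr_y}

# Then Pr(X | Y) = Pr(X, Y) / Pr(Y) equals Pr(X) for every value of Y.
for y, py in pr_y.items():
    for x, px in pr_x.items():
        assert abs(joint[(x, y)] / py - px) < 1e-12
```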
Stochastic conditional independence Random variables are stochastically
conditionally independent when their conditional distributions are unrelated.3
This is briefly summarized (for n = 2) as
Pr (y2 | x, y1 ) = Pr (y2 | x)
or
Pr (y1 | x, y2 ) = Pr (y1 | x)
Now, when (y1 | x) and (y2 | x) are independent the product rule simplifies as
Pr (x, y1 , y2 ) = Pr (y2 | x, y1 ) Pr (y1 | x) Pr (x)
= Pr (y2 | x) Pr (y1 | x) Pr (x)
or
Pr (x, y1 , y2 ) = Pr (y1 | x, y2 ) Pr (y2 | x) Pr (x)
= Pr (y1 | x) Pr (y2 | x) Pr (x)
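This simplification can likewise be checked numerically. The sketch below builds a joint in which y1 and y2 share the same invented conditional distribution given x, then verifies Pr (y2 | x, y1) = Pr (y2 | x).

```python
# Invented marginal for x and conditional distribution Pr(y | x),
# used for both y1 and y2 to enforce conditional independence.
pr_x = {0: 0.5, 1: 0.5}
pr_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}

# Simplified product rule: Pr(x, y1, y2) = Pr(y1 | x) Pr(y2 | x) Pr(x).
joint = {
    (x, y1, y2): pr_x[x] * pr_y_given_x[x][y1] * pr_y_given_x[x][y2]
    for x in pr_x for y1 in (0, 1) for y2 in (0, 1)
}

# Check Pr(y2 | x, y1) = Pr(y2 | x) for every outcome.
for x in pr_x:
    for y1 in (0, 1):
        pr_x_y1 = sum(joint[(x, y1, y2)] for y2 in (0, 1))  # Pr(x, y1)
        for y2 in (0, 1):
            lhs = joint[(x, y1, y2)] / pr_x_y1  # Pr(y2 | x, y1)
            rhs = pr_y_given_x[x][y2]           # Pr(y2 | x)
            assert abs(lhs - rhs) < 1e-12
```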
3 This case is illustrated in Ralph's Scale.