MATH3067 Part 1: Information Theory
Week 1, Lecture 2: Random Variables
The University of Sydney, NSW 2006, Australia
29 July 2012

Independent events

Let (X, p) be a probability space, and A, B ⊆ X be events.

Intuitively, to say that the events A and B are independent means that the probability of A occurring is not affected by whether or not B occurs. That is, p(A | B) should equal p(A). If p(B) ≠ 0 then p(A | B) = p(A ∩ B)/p(B), so the above condition is the same as p(A ∩ B) = p(A)p(B). We take this as the definition.

Definition: The events A and B are said to be independent if p(AB) = p(A)p(B).

Examples of independent events

Consider the experiment of tossing a fair coin twice. Let A be the event that the first toss comes down heads, and B the event that the second toss comes down heads. The four simple events HH, HT, TH, TT all have probability 0.25. Then A = {HH, HT} and B = {HH, TH} both have probability 0.5, and AB = A ∩ B = {HH} has probability 0.25. Since 0.25 = 0.5 × 0.5, the events A and B are independent.

Consider drawing a card at random from a well shuffled pack. There are 52 equiprobable outcomes. Let K be the event that a king is drawn, and S the event that a spade is drawn. Then p(K) = 1/13 and p(S) = 1/4, and p(KS) = 1/52 = (1/13) × (1/4) = p(K)p(S). So these events are independent.

Combined experiments

Let X1, X2, ..., XN be sample spaces for N experiments. The process of performing all of these experiments together can be regarded as a single joint experiment whose outcomes are N-tuples (x1, x2, ..., xN), where xi ∈ Xi for each i. We allow the possibility that the outcomes of the separate experiments are not independent of one another. For example, X1 could be the temperature, X2 the humidity, X3 the wind speed and X4 the brightness, all measured at midday.

Joint distributions

The sample space for the joint experiment is the Cartesian product

    X1 × X2 × ... × XN = { (x1, x2, ..., xN) | xi ∈ Xi for all i }.

Definition: A probability distribution on X1 × X2 × ... × XN is called a joint probability distribution for X1, X2, ..., XN. (N.B. a joint distribution satisfies ∑x1,x2,...,xN p(x1, x2, ..., xN) = 1.)

In the case N = 2 the values of a joint distribution can be conveniently depicted by using a rectangular table. Here is an example of a joint distribution for X and Y, where X = {r, s} and Y = {a, b, c}:

             a      b      c    | p(x)
    r       0.1    0.2    0.1   | 0.4
    s       0.2    0.1    0.3   | 0.6
    p(y)    0.3    0.3    0.4

Here p(r, a) = 0.1, p(r, b) = 0.2, p(r, c) = 0.1, and so on.

Marginal distributions of a joint distribution

Observe that the row sums give a probability distribution on X and the column sums give a probability distribution on Y. These are called the marginal distributions of the joint distribution (since their values are written in the margins of the joint table). The marginal distributions are defined by p(x) = ∑y∈Y p(x, y) and p(y) = ∑x∈X p(x, y).
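As a small illustration, here is a minimal Python sketch (not from the lecture; the dictionary joint and the variable names are ours) that encodes the table above and recovers the marginals as row and column sums.

    # Sketch: marginals p(x) = sum_y p(x, y) and p(y) = sum_x p(x, y)
    # for the example joint table above.
    joint = {
        ('r', 'a'): 0.1, ('r', 'b'): 0.2, ('r', 'c'): 0.1,
        ('s', 'a'): 0.2, ('s', 'b'): 0.1, ('s', 'c'): 0.3,
    }

    # Marginal of X: the row sums of the table.
    p_X = {x: sum(p for (x1, y1), p in joint.items() if x1 == x) for x in ('r', 's')}
    # Marginal of Y: the column sums of the table.
    p_Y = {y: sum(p for (x1, y1), p in joint.items() if y1 == y) for y in ('a', 'b', 'c')}

    print({x: round(q, 10) for x, q in p_X.items()})   # {'r': 0.4, 's': 0.6}
    print({y: round(q, 10) for y, q in p_Y.items()})   # {'a': 0.3, 'b': 0.3, 'c': 0.4}
    assert abs(sum(joint.values()) - 1) < 1e-12        # a joint distribution sums to 1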
Independent experiments

Consider two experiments X and Y that have a joint distribution p : X × Y → [0, 1]. If x0 ∈ X then we have defined p(x0) = ∑y∈Y p(x0, y). Really, this is the probability of the compound event A = {x0} × Y = { (x0, y) | y ∈ Y }; that is, p(x0) = p(A). Similarly, if y0 ∈ Y then p(y0) = p(B), where B is the compound event B = X × {y0} = { (x, y0) | x ∈ X }. Observe that AB = {(x0, y0)}. The events A and B are independent if p(AB) = p(A)p(B); that is, x0 and y0 are independent if p(x0, y0) = p(x0)p(y0).

Definition: The experiments X and Y are said to be independent if p(x, y) = p(x)p(y) for all x ∈ X and y ∈ Y.

Compound events for independent experiments

Let X, Y be independent experiments and A ⊆ X, B ⊆ Y. It ought to be true that p(AB) = p(A)p(B). First of all we need to understand what this really means. We have A ⊆ X, so p(A) refers to the marginal distribution: p(A) = ∑x∈A p(x) = ∑x∈A ∑y∈Y p(x, y). In terms of the joint distribution, this is the probability of the compound event A × Y = { (x, y) | x ∈ A, y ∈ Y }. Similarly, p(B) = p(X × B), where X × B = { (x, y) | x ∈ X, y ∈ B }. Now AB means (A × Y) ∩ (X × B) = A × B. Since p(x, y) = p(x)p(y) (by independence), we get

    p(AB) = ∑x∈A,y∈B p(x, y) = ∑x∈A,y∈B p(x)p(y) = (∑x∈A p(x))(∑y∈B p(y)) = p(A)p(B).

Random variables

We now adopt a less formal but more convenient language. Let p be a probability distribution on X = {x1, x2, ..., xn}. We now think of the xi as being the different possible values of a variable, which we denote by X, and p(xi) is the probability that the random variable X takes the value xi. "Performing the experiment" amounts to randomly choosing a value for X. More often than not we will write p(X = x) rather than just p(x) for the probability that the random variable X takes the value x.

Functions of a random variable

Imagine a random variable X whose values are the possible outcomes of some game of chance. Imagine also that the player of the game wins or loses money depending on the outcome. So there is a "payoff function", which is a real-valued function defined on the sample space; that is, a function f : X → R. For an outcome x, the player wins if f(x) > 0 and loses if f(x) < 0.

Suppose someone plays the game N times, and for each x ∈ X let N(x) be the number of times the outcome is x. The player's overall payoff is ∑x∈X N(x)f(x), so the average payoff per game is ∑x∈X (N(x)/N) f(x) = ∑x∈X p(x)f(x), where p(x) = N(x)/N is the empirical probability of x.

Expectation

In general, if p is a probability distribution for X, and f is a real-valued payoff function on X, then ∑x∈X p(x)f(x) is the expected average payoff per game (since p(x) would be the relative frequency of x in an ideal world).

Definition: Let (X, p) be a probability space and f a real-valued function on X. The expectation of f is the quantity

    E(f) = ∑x∈X p(x)f(x).

If X has n elements and p(x) = 1/n for all x, then E(f) = ∑x∈X f(x)/n is just the average value of f on the set X. In general the expectation is a weighted average: more likely outcomes have higher weighting than less likely ones.

The Bolonian Lottery

In (nonexistent) Bolonia it costs $1 to enter the lottery. Entrants choose an 8-digit number, and if your number wins you get $1000000. Mary always chooses 11121985 (her birth date). What can she expect to win or lose on average?

The payoff is

    f(x) = −1         if x is not the winning number,
    f(x) = 1000000    if x is the winning number.

The 10^8 outcomes all have probability 10^−8, and f(x) = 10^6 for one value of x and f(x) = −1 for all the others. So

    E(f) = 10^−8 × 10^6 + (10^8 − 1) × 10^−8 × (−1) = 0.01 − 0.99999999 = −0.98999999.

In the long run, Mary loses almost a dollar every time.
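Here is a minimal Python sketch of this calculation (ours, not part of the lecture). The function name expectation and the grouping of the 10^8 tickets into the two outcomes 'win' and 'lose' are illustrative choices; exact rational arithmetic avoids any rounding.

    from fractions import Fraction

    def expectation(p, f, outcomes):
        # E(f) = sum over x of p(x) * f(x)
        return sum(p[x] * f[x] for x in outcomes)

    N = 10**8                                    # number of possible 8-digit tickets
    # Group the N equiprobable numbers into two outcomes: Mary's number wins, or it does not.
    p = {'win': Fraction(1, N), 'lose': Fraction(N - 1, N)}
    f = {'win': 10**6, 'lose': -1}

    print(float(expectation(p, f, ['win', 'lose'])))   # -0.98999999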
Real-valued random variables

Sometimes the possible values of the random variable X are themselves real numbers. In this case the payoff function could be simply f(x) = x, and it makes sense to talk about the expected value of X. The expectation of X is

    E(X) = ∑x∈X p(x) × x.

It is important to realize that this only makes sense when the values x are real numbers. Similarly, for real-valued random variables we can apply arithmetic operations directly to the variables themselves. Thus if X and Y are real-valued random variables, then so are X + Y and XY (for example).

Expectation formulas for real-valued random variables

Proposition: Let X and Y be real-valued random variables. Then
1. E(X + Y) = E(X) + E(Y);
2. E(λX) = λE(X) for all λ ∈ R;
3. if X and Y are independent then E(XY) = E(X)E(Y);
4. if X ≥ Y then E(X) ≥ E(Y);
5. if X is constant, always taking the value λ, then E(X) = λ;
6. |E(X)| ≤ E(|X|).

Proofs of 1, 2 and 3.

For the 1st formula:

    E(X + Y) = ∑x,y p(x, y)(x + y)
             = ∑x (∑y p(x, y)) x + ∑y (∑x p(x, y)) y
             = ∑x p(x) x + ∑y p(y) y
             = E(X) + E(Y),

where the third line uses the marginal distributions p(x) = ∑y p(x, y) and p(y) = ∑x p(x, y).

2nd formula: E(λX) = ∑x p(x)(λx) = λ ∑x p(x) x = λE(X).

3rd formula: If X and Y are independent then p(x, y) = p(x)p(y), so

    E(XY) = ∑x,y p(x, y) xy = ∑x,y p(x)p(y) xy = (∑x p(x) x)(∑y p(y) y) = E(X)E(Y).

Proofs of 4, 5 and 6.

Suppose first that X ≥ 0; that is, for all x, if p(x) ≠ 0 then x ≥ 0. Then E(X) = ∑x p(x) x ≥ 0 (since all nonzero terms are > 0). Now if X ≥ Y then X − Y ≥ 0, so E(X − Y) ≥ 0. By formulas 1 and 2, E(X − Y) = E(X) − E(Y), so E(X) − E(Y) ≥ 0, and so E(X) ≥ E(Y).

To say X = λ (constant) means p(λ) = 1 and p(x) = 0 for all other x. This gives E(X) = p(λ)λ + ∑x≠λ p(x) x = λ + 0 = λ.

For the final formula, note that −|x| ≤ x ≤ |x| holds for all x, and so the random variable X satisfies −|X| ≤ X ≤ |X|. The 4th formula then gives −E(|X|) ≤ E(X) ≤ E(|X|), which implies that |E(X)| ≤ E(|X|).
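As an illustrative check of formulas 1 and 3 (ours, not from the lecture), take X and Y to be two independent fair dice, so that the joint distribution is p(x, y) = p(x)p(y) = 1/36 for every pair; exact arithmetic with Fraction confirms both identities.

    from fractions import Fraction

    faces = range(1, 7)
    p = Fraction(1, 36)                          # p(x, y) = p(x)p(y) for independent fair dice

    E_X    = sum(p * x for x in faces for y in faces)         # = 7/2
    E_Y    = sum(p * y for x in faces for y in faces)         # = 7/2
    E_sum  = sum(p * (x + y) for x in faces for y in faces)   # = 7
    E_prod = sum(p * (x * y) for x in faces for y in faces)   # = 49/4

    assert E_sum == E_X + E_Y       # formula 1: E(X + Y) = E(X) + E(Y)
    assert E_prod == E_X * E_Y      # formula 3: E(XY) = E(X)E(Y) for independent X, Y
    print(E_X, E_sum, E_prod)       # 7/2 7 49/4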