# Random Variables

MATH3067 Part 1: Information Theory
Week 1 Lecture 2
University of Sydney, NSW 2006, Australia
29 July 2012

## Independent events

Let (X, p) be a probability space, and A, B ⊆ X be events.
Intuitively, to say that the events A and B are independent means that the probability of A occurring is not affected by whether or not B occurs.
That is, p(A | B) should equal p(A).
If p(B) ≠ 0 then p(A | B) = p(A ∩ B)/p(B); so the above condition is the same as p(A ∩ B) = p(A)p(B). We take this as the definition.

Definition: The events A and B are said to be independent if p(AB) = p(A)p(B), where AB is shorthand for A ∩ B.
## Examples of independent events

Consider the experiment of tossing a fair coin twice.
Let A be the event that the first toss comes down heads, and B the event that the second toss comes down heads.
The four simple events HH, HT, TH, TT all have probability 0.25.
A = {HH, HT} and B = {HH, TH}. Both have probability 0.5.
And AB = A ∩ B = {HH} has probability 0.25.
Since 0.25 = 0.5 × 0.5, the events A and B are independent.

Consider drawing a card at random from a well shuffled pack.
There are 52 equiprobable outcomes.
Let K be the event that a king is drawn, and S the event that a spade is drawn.
Then p(K) = 1/13 and p(S) = 1/4. And
p(KS) = 1/52 = (1/13) × (1/4) = p(K)p(S).
So these events are independent.

## Combined experiments

Let X1, X2, ..., XN be sample spaces for N experiments.
The process of performing all of these experiments together can be regarded as a single joint experiment whose outcomes are N-tuples (x1, x2, ..., xN), where xi ∈ Xi for each i.
We allow the possibility that the outcomes of the separate experiments are not independent of one another.
For example, X1 could be the temperature, X2 the humidity, X3 the wind speed, X4 the brightness, all measured at midday.
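The coin-tossing example above can be checked mechanically. The following is a minimal sketch in Python; the names `space`, `prob`, `A` and `B` are chosen here for illustration, with A and B as in the text:

```python
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT,
# each outcome with probability 0.25.
space = {"".join(t): 0.25 for t in product("HT", repeat=2)}

def prob(event):
    """Probability of an event, i.e. a set of outcomes."""
    return sum(space[o] for o in event)

A = {o for o in space if o[0] == "H"}  # first toss is heads
B = {o for o in space if o[1] == "H"}  # second toss is heads

print(prob(A), prob(B), prob(A & B))   # 0.5 0.5 0.25
print(prob(A & B) == prob(A) * prob(B))  # True: A and B are independent
```

The same pattern (enumerate outcomes, sum probabilities over events) verifies the king-and-spade example with a 52-element sample space.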
## Joint distributions

The sample space for the joint experiment is the Cartesian product
X1 × X2 × ··· × XN = { (x1, x2, ..., xN) | xi ∈ Xi for all i }.

Definition: A probability distribution on X1 × X2 × ··· × XN is called a joint probability distribution for X1, X2, ..., XN.
(N.B. a joint distribution satisfies ∑x1,x2,...,xN p(x1, x2, ..., xN) = 1.)

In the case N = 2 the values of a joint distribution can be conveniently depicted by using a rectangular table.
Here is an example of a joint distribution for X and Y, where X = {r, s} and Y = {a, b, c}:

|      | a   | b   | c   | p(x) |
|------|-----|-----|-----|------|
| r    | 0.1 | 0.2 | 0.1 | 0.4  |
| s    | 0.2 | 0.1 | 0.3 | 0.6  |
| p(y) | 0.3 | 0.3 | 0.4 |      |

Here p(r, a) = 0.1, p(r, b) = 0.2, p(r, c) = 0.1, and so on.

## Marginal distributions of a joint distribution

Observe that the row sums give a probability distribution on X and the column sums give a probability distribution on Y.
These are called the marginal distributions of the joint distribution (since their values are written in the margins of the joint table).
The marginal distributions are defined by p(x) = ∑y∈Y p(x, y) and p(y) = ∑x∈X p(x, y).
## Independent experiments

Consider two experiments X and Y that have a joint distribution p : X × Y → [0, 1].
If x0 ∈ X then we have defined p(x0) = ∑y∈Y p(x0, y).
Really, this is the probability of the compound event A = {x0} × Y = { (x0, y) | y ∈ Y }; i.e. p(x0) = p(A).
Similarly, if y0 ∈ Y then p(y0) = p(B), where B is the compound event B = X × {y0} = { (x, y0) | x ∈ X }.
Observe that AB = A ∩ B = {(x0, y0)}.
The events A and B are independent if p(AB) = p(A)p(B).
That is, x0 and y0 are independent if p(x0, y0) = p(x0)p(y0).

Definition: The experiments X and Y are said to be independent if p(x, y) = p(x)p(y) for all x ∈ X and y ∈ Y.

## Compound events for independent experiments

Let X, Y be independent experiments and A ⊆ X, B ⊆ Y.
It ought to be true that p(AB) = p(A)p(B).
First of all we need to understand what this really means.
We have A ⊆ X; so p(A) refers to the marginal distribution.
That is, p(A) = ∑x∈A p(x) = ∑x∈A ∑y∈Y p(x, y).
In terms of the joint distribution, this is the probability of the compound event A × Y = { (x, y) | x ∈ A, y ∈ Y }.
And p(B) = p(X × B), where X × B = { (x, y) | x ∈ X, y ∈ B }.
Now AB means (A × Y) ∩ (X × B) = A × B.
Since p(x, y) = p(x)p(y) (by independence), we get

p(AB) = ∑x∈A,y∈B p(x, y) = ∑x∈A,y∈B p(x)p(y) = (∑x∈A p(x))(∑y∈B p(y)) = p(A)p(B).
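The computation p(AB) = p(A)p(B) for independent experiments can be checked numerically. A sketch with hypothetical distributions (the outcome labels and probabilities below are illustrative, not from the lecture):

```python
from fractions import Fraction as F

# Two experiments with assumed marginal distributions; under independence
# the joint distribution is the product p(x, y) = p(x)p(y).
p_X = {"x1": F(1, 4), "x2": F(3, 4)}
p_Y = {"y1": F(1, 3), "y2": F(1, 6), "y3": F(1, 2)}
joint = {(x, y): p_X[x] * p_Y[y] for x in p_X for y in p_Y}

# Compound events A x Y and X x B; their intersection is A x B.
A, B = {"x2"}, {"y1", "y3"}
p_AB = sum(joint[(x, y)] for x in A for y in B)
p_A = sum(p_X[x] for x in A)
p_B = sum(p_Y[y] for y in B)
print(p_AB == p_A * p_B)   # True: p(AB) = p(A)p(B)
```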
## Random variables

We now adopt a less formal but more convenient language.
Let p be a probability distribution on X = {x1, x2, ..., xn}.
We now think of the xi as being the different possible values of a variable, which we denote by X.
And p(xi) is the probability that the random variable X takes the value xi.
"Performing the experiment" amounts to randomly choosing a value for X.
More often than not we will write p(X = x) rather than just p(x) for the probability that the random variable X takes the value x.

## Functions of a random variable
Imagine a random variable X whose values are the possible
outcomes of some game of chance.
Imagine also that the player of the game wins or loses money
depending on the outcome.
So there is a “payoff function”, which is a real-valued function
defined on the sample space – that is, a function f : X → R.
For an outcome x, the player wins if f (x) > 0, loses if f (x) < 0.
Suppose someone plays the game N times, and for each x ∈ X
let N(x) be the number of times the outcome is x.
The player’s overall payoff is ∑x ∈X N(x)f (x).
The average payoff per game is ∑x∈X (N(x)/N) f(x) = ∑x∈X p(x)f(x),
where p(x) = N(x)/N is the empirical probability of x.
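The claim that the empirical average payoff approaches ∑x p(x)f(x) can be seen in simulation. A sketch with a hypothetical game (the outcomes, probabilities and payoffs below are invented for illustration):

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# A hypothetical game: outcomes, their probabilities, and a payoff function f.
outcomes = ["a", "b", "c"]
p = {"a": 0.5, "b": 0.3, "c": 0.2}
f = {"a": 1.0, "b": -2.0, "c": 4.0}

# Play N times, counting how often each outcome occurs.
N = 100_000
counts = {x: 0 for x in outcomes}
for _ in range(N):
    x = random.choices(outcomes, weights=[p[o] for o in outcomes])[0]
    counts[x] += 1

# Empirical average payoff per game vs the weighted sum from the text.
empirical = sum(counts[x] / N * f[x] for x in outcomes)
expected = sum(p[x] * f[x] for x in outcomes)   # 0.5 - 0.6 + 0.8 = 0.7
print(abs(empirical - expected) < 0.05)
```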
## Expectation

In general, if p is a probability distribution for X, and f is a real-valued payoff function on X, then ∑x∈X p(x)f(x) is the expected average payoff per game (since p(x) would be the relative frequency of x in an ideal world).

Definition: Let (X, p) be a probability space and f a real-valued function on X. The expectation of f is the quantity

E(f) = ∑x∈X p(x)f(x).

If X has n elements and p(x) = 1/n for all x, then

E(f) = ∑x∈X f(x)/n

is just the average value of f on the set X.
In general the expectation is a weighted average: more likely outcomes have higher weighting than less likely ones.

## The Bolonian Lottery
In (nonexistent) Bolonia it costs $1 to enter the lottery.
Entrants choose an 8 digit number.
If your number wins you get $1000000.
Mary always chooses 11121985 (her birth date).
What can she expect to win or lose on average?
The payoff is

f(x) = −1 if x is not the winning number,
f(x) = 1000000 if x is the winning number.

The 10^8 outcomes all have probability 10^−8, and f(x) = 10^6 for one value of x and f(x) = −1 for all the others.
So

E(f) = 10^−8 × 10^6 + (10^8 − 1)(10^−8 × (−1)) = −0.98999999.

In the long run, Mary loses almost a dollar every time.
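The lottery expectation is easy to reproduce exactly. A sketch in Python using exact rational arithmetic:

```python
from fractions import Fraction

# Expected payoff of the Bolonian lottery: 10^8 equiprobable numbers,
# payoff 10^6 for the winning number and -1 for each of the others.
p = Fraction(1, 10**8)
E = p * 10**6 + (10**8 - 1) * (p * -1)
print(E)          # -98999999/100000000
print(float(E))   # -0.98999999
```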
## Real-valued random variables

Sometimes the possible values of the random variable X are themselves real numbers.
In this case the payoff function could be simply f(x) = x, and it makes sense to talk about the expected value of X.
The expectation of X is E(X) = ∑x∈X p(x) × x.
It is important to realize that this only makes sense when x ∈ R.
Similarly, for real-valued random variables we can apply arithmetic operations directly to the variables themselves.
Thus if X and Y are real-valued random variables, then so are X + Y and XY (for example).

## Expectation formulas for real-valued random variables

Proposition: Let X and Y be real-valued random variables. Then
1. E(X + Y) = E(X) + E(Y),
2. E(λX) = λE(X) for all λ ∈ R,
3. If X and Y are independent then E(XY) = E(X)E(Y),
4. If X ≥ Y then E(X) ≥ E(Y),
5. If X is constant, always taking the value λ, then E(X) = λ,
6. |E(X)| ≤ E(|X|).
## Proofs of 1, 2 and 3

For the 1st formula:

E(X + Y) = ∑x,y p(x, y)(x + y)
         = ∑x (∑y p(x, y)) x + ∑y (∑x p(x, y)) y
         = ∑x p(x)x + ∑y p(y)y = E(X) + E(Y).

2nd formula: E(λX) = ∑x p(x)(λx) = λ ∑x p(x)x = λE(X).

3rd formula: If X and Y are independent,

E(XY) = ∑x,y p(x, y)xy = ∑x,y p(x)p(y)xy = (∑x p(x)x)(∑y p(y)y) = E(X)E(Y).

## Proofs of 4, 5 and 6

Suppose X ≥ 0; that is, for all x, if p(x) ≠ 0 then x ≥ 0.
Then E(X) = ∑x p(x)x ≥ 0 (since all nonzero terms are > 0).
If X ≥ Y then X − Y ≥ 0, so E(X − Y) ≥ 0.
By formulas 1 and 2, E(X − Y) = E(X) − E(Y) ≥ 0, and so E(X) ≥ E(Y).

To say X = λ (constant) means p(λ) = 1 and p(x) = 0 for all other x. This gives

E(X) = p(λ)λ + ∑x≠λ p(x)x = λ + 0 = λ.

For the final formula, note that −|x| ≤ x ≤ |x| holds for all x, and so the random variable X satisfies −|X| ≤ X ≤ |X|.
The 4th formula then gives −E(|X|) ≤ E(X) ≤ E(|X|).
This implies that |E(X)| ≤ E(|X|).
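Formulas 1 and 3 can be sanity-checked numerically. A sketch with two small hypothetical distributions (values and probabilities are illustrative assumptions), representing each real-valued random variable as a {value: probability} dict:

```python
from fractions import Fraction as F
from itertools import product

# Distributions for two independent real-valued random variables.
p_X = {-1: F(1, 2), 2: F(1, 2)}
p_Y = {0: F(1, 3), 3: F(2, 3)}

def E(dist):
    """Expectation of a random variable given as {value: probability}."""
    return sum(prob * x for x, prob in dist.items())

# Joint distribution under independence: p(x, y) = p(x)p(y).
joint = {(x, y): p_X[x] * p_Y[y] for x, y in product(p_X, p_Y)}

E_sum = sum(prob * (x + y) for (x, y), prob in joint.items())
E_prod = sum(prob * (x * y) for (x, y), prob in joint.items())

print(E_sum == E(p_X) + E(p_Y))    # True: formula 1
print(E_prod == E(p_X) * E(p_Y))   # True: formula 3 (needs independence)
```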