1.3 Conditional probability and independent events.
1.3.1 Conditional probability.
Conditional probabilities are a way to adjust the probability of something happening according to how
much information one has.
Example 1. Suppose someone rolls a die as in Example 1.1, but he doesn't show us what number comes up.
However, he does tell us that it is four or larger. Does this affect the probability that the number is even?
Here is one possible way of looking at this. We roll the die a large number of times and we ignore any rolls
which produce a number less than 4. Of those which produce a number 4 or larger, we count the number
which are also even. The fraction
(# which are even and 4 or larger) / (# which are 4 or larger)

is approximately the probability that the number is even after we are told that it is four or larger. Note that

(# which are even and 4 or larger) / (# which are 4 or larger)
    = [(# which are even and 4 or larger) / (total # of rolls)] / [(# which are 4 or larger) / (total # of rolls)].

The ratio on the right approaches

Pr{value is even and 4 or larger} / Pr{value is 4 or larger} = Pr{ 4, 6 } / Pr{ 4, 5, 6 } = (2/6) / (3/6) = 2/3

as the number of rolls → ∞. The ratio Pr{value is even and 4 or larger} / Pr{value is 4 or larger} is called the
conditional probability that the value is even given that the value is 4 or larger, and is denoted by
Pr{ value is even | value is 4 or larger }, i.e.

Pr{ value is even | value is 4 or larger } = Pr{value is even and 4 or larger} / Pr{value is 4 or larger}.
More generally, if A and B are two events, then we are interested in the probability that the outcome is in A
given that the outcome is in B, which we shall denote by Pr{A | B}. Intuitively, we may interpret this as
meaning the following. We perform the experiment a large number of times with each one independent of
the others. We ignore any times the outcome is not in B, and of the times the outcome is in B, we count the
number of times the outcome is also in A. Then
(# times outcome is in A ∩ B) / (# times outcome is in B)  ≈  Pr{ A | B },

as the number of repetitions → ∞.
Note that if we divide the top and bottom of the above fraction by the total number of times we do the
experiment, then we get
[(# times outcome is in A ∩ B) / (# repetitions)] / [(# times outcome is in B) / (# repetitions)]  ≈  Pr{ A | B }.

The fraction in the top of the left side of the above approaches Pr{A ∩ B} and the fraction in the bottom
approaches Pr{B} as the number of times we do the experiment → ∞. So we have

(1)    Pr{A | B} = Pr{A ∩ B} / Pr{B},
provided Pr{B} is not 0. Most texts take this as the definition of Pr{A | B}.
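To connect formula (1) with the frequency interpretation above, here is a short Python sketch (the variable names and the choice of 100,000 rolls are just for illustration) that estimates Pr{ value is even | value is 4 or larger } by ignoring rolls below 4; the estimate should come out close to the exact value 2/3.

    import random

    random.seed(0)              # fixed seed so the run is reproducible
    trials = 100_000
    count_B = 0                 # rolls that are 4 or larger (the event B)
    count_A_and_B = 0           # rolls that are even and 4 or larger (A and B)

    for _ in range(trials):
        roll = random.randint(1, 6)
        if roll >= 4:
            count_B += 1
            if roll % 2 == 0:
                count_A_and_B += 1

    print("estimate of Pr{even | 4 or larger}:", count_A_and_B / count_B)
    print("exact value from formula (1):      ", (2/6) / (3/6))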
Problem 2.1. You own stock in Megabyte Computer Corporation. You estimate that there is an 80%
chance of Megabyte making a profit if it makes a certain technological breakthrough, but only a 30%
chance of making a profit if it doesn't make the breakthrough. Furthermore, you estimate that there is
a 40% chance of its making the breakthrough. Suppose that before you can find out whether it made the
breakthrough, you go on a 6 month vacation to Tahiti. Then one day you receive the following
message from your stockbroker: "Megabyte made a profit." What is the probability that Megabyte
made the breakthrough?
Solution. Let
E = the event that they make a profit
F = the event that they make a breakthrough
We want to find Pr{F | E} = Pr{F ∩ E} / Pr{E}. We are given Pr{E | F} = 0.8, Pr{E | Fc} = 0.3 and
Pr{F} = 0.4. Here Fc, the complement of F, is the set of outcomes in the sample space that are not in F.
We can proceed as follows.
Pr{Fc} = 1 - Pr{F} = 1 - 0.4 = 0.6
Pr{E ∩ F} = Pr{E | F} Pr{F} = (0.8)(0.4) = 0.32
Pr{E ∩ Fc} = Pr{E | Fc} Pr{Fc} = (0.3)(0.6) = 0.18
Pr{E} = Pr{E ∩ F} + Pr{E ∩ Fc} = 0.32 + 0.18 = 0.5
Pr{F | E} = Pr{F ∩ E} / Pr{E} = 0.32 / 0.5 = 0.64
[Figure: probability tree. The first branching is F = make a breakthrough (Pr{F} = 0.4) versus Fc = don't make a breakthrough (Pr{Fc} = 0.6); the second branching is E = make a profit versus Ec = don't make a profit, giving Pr{E ∩ F} = 0.32, Pr{E ∩ Fc} = 0.18 and Pr{E} = 0.5.]
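The arithmetic in the solution (and in the tree above) can be checked with a short computation; this is only a sketch that restates the steps above in Python, with the given probabilities entered as ordinary numbers.

    # Given data from Problem 2.1
    P_F = 0.4                      # Pr{F}: breakthrough
    P_E_given_F = 0.8              # Pr{E | F}
    P_E_given_Fc = 0.3             # Pr{E | Fc}

    P_Fc = 1 - P_F                         # 0.6
    P_E_and_F = P_E_given_F * P_F          # 0.32
    P_E_and_Fc = P_E_given_Fc * P_Fc       # 0.18
    P_E = P_E_and_F + P_E_and_Fc           # 0.5
    P_F_given_E = P_E_and_F / P_E          # 0.64

    print(P_Fc, P_E_and_F, P_E_and_Fc, P_E, P_F_given_E)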
1.3.2 Independent events.
Earlier we used the term independent in an informal way to indicate repetitions of an experiment which are
in some way unrelated. In most books the notion of independence is defined in terms of probability instead
of vice versa. This is done as follows.
We say that an event A is independent of another event B if the probability that the outcome is in A is the
same as the probability that the outcome is in A given that the outcome lies in B. In symbols

(2)    Pr{A | B} = Pr{A}.
Thus the knowledge that B has occurred doesn't give one any information regarding the probability that A
has occurred.
Example 2. In Example 1 getting an even number is not independent of getting 4 or larger since
Pr{even | 4 or larger} = 2/3, while Pr{even} = 1/2. On the other hand, if C = {3, 4, 5, 6} is the event of getting 3 or
larger, then it is not hard to show that Pr{even | C} = 1/2, so that getting an even number is independent of
getting 3 or larger.
Since Pr{A | B} = Pr{A ∩ B}/Pr{B}, it follows that A is independent of B if

(3)    Pr{A ∩ B} = Pr{A} Pr{B}.
Thus, A is independent of B if the probability of both A and B occurring is the product of their probabilities.
One consequence of this is that if A is independent of B, then B is independent of A, i.e. the definition is
symmetric in the two sets A and B.
The formula (3) makes sense even if Pr{B} is 0, while the original formula (2) does not. Most texts use (3)
as the basic definition of independent events.
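The claims in Example 2 are easy to check by enumeration. The following Python sketch writes A for the event of getting an even number, B for getting 4 or larger and C for getting 3 or larger (these names are chosen just for the sketch) and tests condition (3) directly.

    from fractions import Fraction

    def pr(event):
        # each of the six faces has probability 1/6
        return Fraction(len(event), 6)

    A = {2, 4, 6}        # A: the value is even
    B = {4, 5, 6}        # B: the value is 4 or larger
    C = {3, 4, 5, 6}     # C: the value is 3 or larger

    print(pr(A & B) == pr(A) * pr(B))   # False: A and B are not independent
    print(pr(A & C) == pr(A) * pr(C))   # True:  A and C are independent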
Problem 1. a. You roll a die twice as in Example 2 of section 1.2.1. Consider the event where
the sum of the numbers on the two rolls is 7. Show that this is independent of rolling a 1 on the
first roll.
b. Let B be the event of rolling a 1 on the first roll or second roll or both. Show that the event
where the sum of the numbers on the two rolls is 7 is not independent of B.
1.3.3 Modeling repetitions.
We can use conditional probabilities to model situations where we repeat the same observation more than
once and we want to describe how the first observation affects the second.
Example 3. Consider Example 1 in section 1.2.1 where an office copier on any particular day is either in
good condition (g or 1), poor condition (p or 2) or broken (b or 3) and we observe the copier today and
tomorrow. As in that example, we use the random variables
X1 = condition of the copier today
X2 = condition of the copier tomorrow
to help describe the outcomes and events. The probabilities for the nine possible outcomes for the two days
were as follows.
p = [ p11  p12  p13 ]
    [ p21  p22  p23 ]
    [ p31  p32  p33 ]

  = [ Pr{X1 = g, X2 = g}  Pr{X1 = g, X2 = p}  Pr{X1 = g, X2 = b} ]
    [ Pr{X1 = p, X2 = g}  Pr{X1 = p, X2 = p}  Pr{X1 = p, X2 = b} ]
    [ Pr{X1 = b, X2 = g}  Pr{X1 = b, X2 = p}  Pr{X1 = b, X2 = b} ]

  = [ 0.4    0.03   0.07 ]
    [ 0      0.04   0.06 ]
    [ 0.128  0.032  0.24 ]
We also computed the probabilities of the events where the copier was in a certain condition today or
tomorrow. These were
[ Pr{X1 = g} ]   [ 0.5 ]
[ Pr{X1 = p} ] = [ 0.1 ]
[ Pr{X1 = b} ]   [ 0.4 ]

and

[ Pr{X2 = g} ]   [ 0.528 ]
[ Pr{X2 = p} ] = [ 0.102 ]
[ Pr{X2 = b} ]   [ 0.37  ]
In order to describe how the condition of the copier today affects the condition of the copier tomorrow, we
compute the conditional probabilities Pr{ X2 = j | X1 = i} where i and j can each be g, p or b. For example,
consider the probability the copier is in poor condition tomorrow given that it is in good condition today.
This is
Pr{ X2 = p | X1 = g} = Pr{X2 = p and X1 = g} / Pr{X1 = g} = 0.03 / 0.5 = 0.06
If we didn't know the condition of the copier today, then the probability that the copier would be in poor
condition tomorrow would be 0.102. So knowing that the copier is in good condition today decreased the
probability that it would be in poor condition tomorrow. In particular, the events that the copier is in good
condition today and that it is in poor condition tomorrow are not independent.
Altogether there are nine conditional probabilities of the form Pr{ X2 = j | X1 = i} where i and j can each be
g, p or b. These are
(4)
q = [ Pr{X2 = g | X1 = g}  Pr{X2 = p | X1 = g}  Pr{X2 = b | X1 = g} ]
    [ Pr{X2 = g | X1 = p}  Pr{X2 = p | X1 = p}  Pr{X2 = b | X1 = p} ]
    [ Pr{X2 = g | X1 = b}  Pr{X2 = p | X1 = b}  Pr{X2 = b | X1 = b} ]

  = [ 0.8   0.06  0.14 ]
    [ 0     0.4   0.6  ]
    [ 0.32  0.08  0.6  ]
If we compare each row of q with the row vector
( Pr{X2 = g} , Pr{X2 = p} , Pr{X2 = b} ) = ( 0.528 , 0.102 , 0.37 )
we see that in each case, knowledge of the condition of the copier today changes the probability of the
condition of the copier tomorrow. In particular, for each i and j the events {X1 = i} and {X2 = j} are not
independent.
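The matrix (4) is just the matrix p with each row divided by its row sum. The following Python sketch (using plain lists rather than any particular matrix library) reproduces the marginal probabilities and the matrix q above.

    # joint probabilities p[i][j] = Pr{X1 = i-th condition, X2 = j-th condition},
    # with the conditions ordered g, p, b as in the text
    p = [[0.4,   0.03,  0.07],
         [0.0,   0.04,  0.06],
         [0.128, 0.032, 0.24]]

    row_sums = [sum(row) for row in p]                              # Pr{X1 = i}
    col_sums = [sum(p[i][j] for i in range(3)) for j in range(3)]   # Pr{X2 = j}
    print([round(x, 3) for x in row_sums])   # [0.5, 0.1, 0.4]
    print([round(x, 3) for x in col_sums])   # [0.528, 0.102, 0.37]

    # conditional probabilities q[i][j] = Pr{X2 = j | X1 = i} = p[i][j] / Pr{X1 = i}
    q = [[p[i][j] / row_sums[i] for j in range(3)] for i in range(3)]
    for row in q:
        print([round(x, 3) for x in row])
    # [0.8, 0.06, 0.14]
    # [0.0, 0.4, 0.6]
    # [0.32, 0.08, 0.6]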
1.3.4 Independent random variables.
Example 4. Consider Example 2 in section 1.2.1 where we roll a die twice. Let's show that rolling a 3
on the second roll is independent of rolling a 5 on the first roll, i.e. the fact that we know that a 5 turned up
on the first roll doesn't change the probability that a 3 will come up on the second roll. Let {X1 = 5} denote
the event that the result of the first roll is a 5 and {X2 = 3} be the event that the result of the second roll is a
3 and {X1 = 5, X2 = 3} be the event that the first roll is a 5 and the second roll is a 3. One has
{X2 = 3} = { (1,3), (2,3), (3,3), (4,3), (5,3), (6,3) }
{X1 = 5} = { (5,1), (5,2), (5,3), (5,4), (5,5), (5,6) }
{X1 = 5, X2 = 3} = { (5,3) }.
Therefore, Pr{X2 = 3} = 1/6, Pr{X1 = 5} = 1/6, Pr{X1 = 5, X2 = 3} = 1/36 = Pr{X1 = 5} Pr{X2 = 3}.
This shows rolling a 3 on the second roll is independent of rolling a 5 on the first roll.
The same argument shows that the probability of rolling any given number on the second roll is independent
of rolling any other given number on the first roll, i.e.
(5)    Pr{X1 = i, X2 = j} = Pr{X1 = i} Pr{X2 = j}.
This leads us to the notion of independent random variables. Let X and Y be two random variables. X and
Y are independent if knowledge of the values of one of them doesn't influence the probability that the other
assumes various values, i.e.
(6)    Pr{X ∈ A, Y ∈ B} = Pr{X ∈ A} Pr{Y ∈ B}

for any two sets A and B. If X and Y are discrete random variables with probability mass functions fX and gY
respectively, then this is equivalent to

(7)    Pr{X = x, Y = y} = Pr{X = x} Pr{Y = y} = fX(x) gY(y)
for any x and y.
Example 5. Consider Example 2 of section 1.2.1 where we roll a die twice and X1 is the outcome of the
first roll and X2 is the outcome of the second roll. Then X1 and X2 are independent.
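A brute-force check of Example 5, which simply enumerates the 36 equally likely outcomes, confirms that condition (7) holds for every pair of values; the following is only a sketch of such a check.

    from fractions import Fraction
    from itertools import product

    outcomes = list(product(range(1, 7), range(1, 7)))   # all 36 pairs (first roll, second roll)
    P = Fraction(1, 36)                                  # each pair is equally likely

    def pr(event):
        return sum(P for o in outcomes if event(o))

    independent = all(
        pr(lambda o: o == (i, j)) == pr(lambda o: o[0] == i) * pr(lambda o: o[1] == j)
        for i in range(1, 7) for j in range(1, 7)
    )
    print(independent)   # True: (7) holds for every pair of values i, j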
Problem 2. If X1 and X2 are any two random variables, are X1 and T = X1 + X2 necessarily dependent?
Several random variables. More generally, if we have a collection of random variables, X1, ..., Xn then
they are independent if knowledge of the values of some of the variables doesn't change the probability that
the others assume various values, i.e.
(8)    Pr{X1 ∈ A1, ..., Xn ∈ An} = Pr{X1 ∈ A1} ... Pr{Xn ∈ An}
for any A1, ..., An. If X1, ..., Xn are all discrete with probability mass functions fi(x) , then this is equivalent
to
(9)    Pr{ X1 = x1, ..., Xn = xn } = Pr{X1 = x1} ... Pr{Xn = xn} = f1(x1) ... fn(xn)

for any x1, ..., xn. Another way of viewing formula (9) is

(10)    f(x1, ..., xn) = f1(x1) ... fn(xn)

where f(x1, ..., xn) is the joint pmf of X1, ..., Xn and f1(x1), ..., fn(xn) are the individual pmf's of X1, ..., Xn.
Problem 3. Show that the random variables X1, X2 and X3 in Example 4 in section 1.2.2 are
independent.
1.3.5 Conditional probability mass functions
Suppose X and Y are two random variables and x1, x2, ..., xn and y1, y2, ..., ym, respectively, are the values X
and Y assume. In section 1.2.4 we described these random variables by means of the joint pmf
f(xi, yj) = Pr{X = xi, Y = yj}, which assigns to each pair (xi, yj) the probability that X assumes the value
xi and Y assumes the value yj. Often a more convenient way of describing two random variables is by means
of a conditional probability mass function. One of these is the conditional probability mass function of Y
given X. This is
(11)    qij = f(yj | xi) = Pr{Y = yj | X = xi} = Pr{X = xi, Y = yj} / Pr{X = xi} = f(xi, yj) / fX(xi)
where f(xi, yj) is the joint pmf of X and Y and fX(xi) is the individual pmf of X. The conditional pmf of Y
given X naturally forms a matrix; let's call it q, where qij is given by (11).
Suppose, as in section 1.2.4, we represent the joint pmf by means of the matrix p where
pij = Pr{X = xi, Y = yj}. Then the individual pmf of X is given by the row sums of p. So the matrix q for the
conditional pmf of Y given X is obtained by dividing each row of p by its row sum.
Example 6. Consider Example 1 in section 1.2.1, where an office copier on any particular day is either in good
condition (g or 1), poor condition (p or 2) or broken (b or 3), we observe the copier today and tomorrow, and
X1 = condition of the copier today and X2 = condition of the copier tomorrow. Then the matrix q in (4)
above is the conditional pmf of X2 given X1.
If we return to the general situation of two random variables X and Y, then we can recover the joint pmf of X
and Y from the conditional probability distribution of Y given X and the individual pmf of X, namely
f(xi, yj) = f(yj | xi) fX(xi)
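As a sketch of this last relation with the copier numbers, the following Python lines rebuild the joint matrix p by multiplying each row of q by the corresponding entry of the marginal pmf of X1.

    # conditional pmf q[i][j] = Pr{X2 = j | X1 = i} and marginal fX[i] = Pr{X1 = i}
    q  = [[0.8,  0.06, 0.14],
          [0.0,  0.4,  0.6 ],
          [0.32, 0.08, 0.6 ]]
    fX = [0.5, 0.1, 0.4]

    # joint pmf p[i][j] = Pr{X1 = i, X2 = j} = q[i][j] * fX[i]
    p = [[q[i][j] * fX[i] for j in range(3)] for i in range(3)]
    for row in p:
        print([round(x, 3) for x in row])
    # [0.4, 0.03, 0.07]
    # [0.0, 0.04, 0.06]
    # [0.128, 0.032, 0.24]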
1.3.6 The algebra of conditional probability
There are a number of helpful formulas involving conditional probability. The following additivity
formula for conditional probability is a direct extension of the basic additivity formula for probability.
Proposition 1. (a) Suppose E1, E2, ... is a sequence of disjoint events (i.e. subsets of the sample space S)
and F and G are also events. Then

Pr{ En | F} =
(12)
 Pr{En | F}
n
n=1
(b) If, in addition, En = S then
n

(13)
Pr{ G | F} =
 Pr{G  En | F}
n=1
Proof. Pr{ ∪n En | F } = Pr{ (∪n En) ∩ F } / Pr{F} = Pr{ ∪n (En ∩ F) } / Pr{F} = ∑n Pr{En ∩ F} / Pr{F} =
∑n Pr{En | F}. The third equality applied the basic additivity formula to the disjoint sequence
E1 ∩ F, E2 ∩ F, ... . This proves (a). To prove (b), note that Pr{ G | F } = Pr{ G ∩ S | F } =
Pr{ G ∩ (∪n En) | F } = Pr{ ∪n (G ∩ En) | F } = ∑n Pr{ G ∩ En | F }. The last equality used (a). //
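As a numerical sanity check of (13), here is a sketch using the two-dice experiment; the events F (first roll is odd), G (the sum is 7) and E1, ..., E6 (the first roll equals n) are chosen only for illustration, and E1, ..., E6 partition the sample space.

    from fractions import Fraction
    from itertools import product

    outcomes = list(product(range(1, 7), range(1, 7)))   # all 36 equally likely pairs

    def pr(event):
        return Fraction(sum(1 for o in outcomes if event(o)), 36)

    def pr_given(event, cond):
        return pr(lambda o: event(o) and cond(o)) / pr(cond)

    F = lambda o: o[0] % 2 == 1                           # F: first roll is odd
    G = lambda o: o[0] + o[1] == 7                        # G: the sum is 7
    E = [lambda o, n=n: o[0] == n for n in range(1, 7)]   # E1, ..., E6: first roll equals n

    lhs = pr_given(G, F)
    rhs = sum(pr_given(lambda o, En=En: G(o) and En(o), F) for En in E)
    print(lhs, rhs, lhs == rhs)                           # 1/6 1/6 True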
The next conditional intersection formula is a generalization of the following basic intersection formula
Pr{E ∩ F} = Pr{E | F} Pr{F}.
Proposition 2. If E, F and G are events, then
(14)    Pr{E ∩ F | G} = Pr{E | F ∩ G} Pr{F | G}

Proof. Pr{E ∩ F | G} = Pr{E ∩ F ∩ G} / Pr{G} =
[ Pr{E ∩ F ∩ G} / Pr{F ∩ G} ] [ Pr{F ∩ G} / Pr{G} ] = Pr{E | F ∩ G} Pr{F | G}. //
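Formula (14) can also be checked numerically in the same two-dice setting; the events E, F and G below are again chosen only for illustration.

    from fractions import Fraction
    from itertools import product

    outcomes = list(product(range(1, 7), range(1, 7)))   # all 36 equally likely pairs

    def pr(event):
        return Fraction(sum(1 for o in outcomes if event(o)), 36)

    def pr_given(event, cond):
        return pr(lambda o: event(o) and cond(o)) / pr(cond)

    E = lambda o: o[1] == 6                # E: second roll is 6
    F = lambda o: o[0] + o[1] == 7         # F: the sum is 7
    G = lambda o: o[0] % 2 == 1            # G: first roll is odd

    lhs = pr_given(lambda o: E(o) and F(o), G)                     # Pr{E ∩ F | G}
    rhs = pr_given(E, lambda o: F(o) and G(o)) * pr_given(F, G)    # Pr{E | F ∩ G} Pr{F | G}
    print(lhs, rhs, lhs == rhs)            # 1/18 1/18 True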