Chapter 1
Introduction
Motivation
In every field of human life there are processes that cannot be described exactly (by an algorithm). For example: How fast does a web page respond? When does the bus come? How many cars are in the parking lot at 8:55 am?
By observing these processes, or by experiments, we can detect patterns of behavior, such as: "usually, during the first week of the semester the campus network is slow", or "by 8:50 am the parking lot at the Design building is usually full".
Our goal is to analyze these patterns further:
1.1 Basic Probability

Real World                                                   | Mathematical World
observation, experiment with unknown outcome                 | random experiment
list of all possible outcomes                                | sample space Ω (read: Omega)
individual outcome                                           | elementary event A, A ∈ Ω (read: A is an element of Ω)
a collection of individual outcomes                          | event A, A ⊂ Ω (read: A is a subset of Ω)
assignment of the likelihood or chance of a possible outcome | probability of an event A, P(A)

1.1.1 Examples for sample spaces
1. I attempt to sign on to AOL from my home - to do so successfully the local phone number must be
working and AOL’s network must be working.
Ω = { (phone up, network up), (phone up, network down), (phone down, network up), (phone down,
network down) }
2. Online I attempt to access a web page and record the time required to receive and display it (in
seconds).
Ω = (0, ∞) seconds
3. On a network there are two possible routes a message can take to a destination - in order for a message to get to the recipient, one of the routes and the recipient's computer must be up.
Ω1 in tabular form:

route 1 | route 2 | recipient's computer
up      | up      | up
up      | up      | down
up      | down    | up
up      | down    | down
down    | up      | up
down    | up      | down
down    | down    | up
down    | down    | down
or, alternatively: Ω2 = { successful transmission, no transmission }
Summary 1.1.1
• Sample spaces can be finite, countably infinite, or uncountably infinite.
• There is no such thing as THE sample space for a problem. The complexity of Ω can vary; many sample spaces are possible for a given problem.
1.1.2 Examples for events
With the same examples as before, we can define events in the following way:
1. A = fail to log on
B = AOL network down
then A is a subset of Ω and can be written as a set of elementary events:
A = { (phone up, network down), (phone down, network up), (phone down, network down)}
Similarly:
B = {(phone up, network down), (phone down, network down)}
2. C = at least 10 s are required, C = [10, ∞).
3. D = message gets through
D with first sample space: D = {(U, U, U ), (U, D, U ), (D, U, U )}
Once we begin to talk about events in terms of sets, we need to know the standard notation and basic rules
for computation:
1.2 Basic Notation of Sets
For the definitions throughout this section assume that A and B are two events.
Definition 1.2.1 (Union)
A ∪ B is the event consisting of all outcomes in A or in B or in
both
read: A or B
Example 1.2.1
2. Time required to retrieve and display a particular web page. Let A, B, C and D be events: A = [100, 200), B = [150, ∞), C = [200, ∞) and D = [50, 75].
Then A ∪ B = [100, ∞), A ∪ C = [100, ∞), and A ∪ D = [50, 75] ∪ [100, 200).
Definition 1.2.2 (Intersection)
A ∩ B is the event consisting of all outcomes simultaneously in
A and in B.
read: A and B
Example 1.2.2
2. Let A, B, C and D be defined as above. Then

A ∩ B = [100, 200) ∩ [150, ∞) = [150, 200)
A ∩ D = [100, 200) ∩ [50, 75] = ∅

3. Let A be the event "fail to log on" and B = "network down". Then

A ∩ B = {(phone up, network down), (phone down, network down)} = B

B is a subset of A.
Definition 1.2.3 (Empty Set)
∅ is the set with no outcomes.
Definition 1.2.4 (Complement)
Ā is the event consisting of all outcomes not in A.
read: not A
Example 1.2.3
3. Message example:
Let D be the event that a message gets through, D = {(U,U,U), (U,D,U), (D,U,U)}.
D̄ = { (U,U,D), (U,D,D), (D,U,D), (D,D,U), (D,D,D) }.
Definition 1.2.5 (Disjoint Sets)
Two events A and B are called mutually exclusive or disjoint if their intersection is empty:

A ∩ B = ∅
1.3 Kolmogorov's Axioms
Example:
3. From my experience with the network provider, I can decide that the chance that my next message
gets through is 90 %.
Write: P (D) = 0.9
To be able to work with probabilities properly - to compute with them - one must lay down a set of postulates:

Kolmogorov's Axioms: A system of probabilities (a probability model) is an assignment of numbers P(A) to events A ⊂ Ω in such a manner that

(i) 0 ≤ P(A) ≤ 1 for all A
    (the probability of any event A is a real number between 0 and 1)
(ii) P(Ω) = 1
    (the probability of the entire sample space is 1)
(iii) if A1, A2, ... are (possibly infinitely many) disjoint events (i.e. Ai ∩ Aj = ∅ for all i ≠ j), then

    P(A1 ∪ A2 ∪ ...) = P(A1) + P(A2) + ... = Σi P(Ai)

    (the probability of a disjoint union of events is equal to the sum of the individual probabilities)
These are the basic rules of operation of a probability model:
• every valid model must obey them,
• any system that obeys them is a valid model.
Whether or not a particular model is realistic or appropriate for a specific application is another question.
Example 1.3.1
Draw a single card from a standard deck of playing cards
Ω = { red, black }
Two different, equally valid probability models are:

Model 1:  P(Ω) = 1,  P(red) = 0.5,  P(black) = 0.5
Model 2:  P(Ω) = 1,  P(red) = 0.3,  P(black) = 0.7

Mathematically, both schemes are equally valid.
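Either model can be checked mechanically against the axioms. A minimal Python sketch (the function name `is_valid_model` is ours, purely for illustration); for a finite sample space it is enough to check that each elementary probability lies in [0, 1] and that they sum to 1:

```python
# Check a finite probability model against Kolmogorov's axioms.
# A model is a dict mapping each elementary outcome to its probability.

def is_valid_model(model):
    """Axioms (i) and (ii); axiom (iii) then holds automatically
    for disjoint unions of elementary events in a finite space."""
    if not all(0 <= p <= 1 for p in model.values()):   # axiom (i)
        return False
    return abs(sum(model.values()) - 1) < 1e-12        # axiom (ii)

model_1 = {"red": 0.5, "black": 0.5}
model_2 = {"red": 0.3, "black": 0.7}
bad     = {"red": 0.8, "black": 0.4}   # sums to 1.2 -> not a model

print(is_valid_model(model_1), is_valid_model(model_2), is_valid_model(bad))
```

Both card models pass; the third assignment fails because its probabilities sum to more than 1.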
Beginning from the axioms of probability one can prove a number of useful theorems about how a probability
model must operate.
We start with the probability of Ω and derive others from that.
Theorem 1.3.1
Let A be an event in Ω, then
P (Ā) = 1 − P (A) for all A ⊂ Ω.
For the proof we need to consider three main facts and piece them together appropriately:
1. We know that P (Ω) = 1 because of axiom (ii)
2. Ω can be written as Ω = A ∪ Ā because of the definition of an event’s complement.
3. A and Ā are disjoint and therefore the probability of their union equals the sum of the individual
probabilities (axiom iii).
All together:

1 = P(Ω) = P(A ∪ Ā) = P(A) + P(Ā),

where the three equalities follow from facts 1, 2 and 3, respectively. This yields the statement. □
Example 1.3.2
3. If I believe that the probability that a message gets through is 0.9, I also must believe that it fails with
probability 0.1
Corollary 1.3.2
The probability of the empty set P (∅) is zero.
For a proof of the above statement we exploit that the empty set is the complement of Ω. Then we can apply Theorem 1.3.1:

P(∅) = P(Ω̄) = 1 − P(Ω) = 1 − 1 = 0.    (Thm 1.3.1)

□
Theorem 1.3.3 (Addition Rule of Probability)
Let A and B be two events of Ω, then:
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
To see why this makes sense, think of probability as the area in the Venn diagram: By simply adding P (A)
and P (B), P (A ∩ B) gets counted twice and must be subtracted off to get P (A ∪ B).
Example 1.3.3
1. AOL dial-up:
If I judge:

P(phone up) = 0.9
P(network up) = 0.6
P(phone up, network up) = 0.55

then

P(phone up or network up) = 0.9 + 0.6 − 0.55 = 0.95

In tabular form:

             network up | network down | total
phone up        0.55    |     0.35     | 0.90
phone down      0.05    |     0.05     | 0.10
total           0.60    |     0.40     | 1
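The addition rule for these numbers can be checked in a few lines of Python (the variable names are ours):

```python
# Check the addition rule P(A or B) = P(A) + P(B) - P(A and B)
# with the AOL dial-up numbers from the example above.

p_phone = 0.90      # P(phone up)
p_net   = 0.60      # P(network up)
p_both  = 0.55      # P(phone up and network up)

p_either  = p_phone + p_net - p_both    # addition rule (Thm 1.3.3)
p_neither = 1 - p_either                # complement rule (Thm 1.3.1)

print(round(p_either, 2), round(p_neither, 2))   # 0.95 0.05
```

Note that `p_neither` is exactly the (phone down, network down) cell of the table.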
Example 1.3.4 computer access
Out of a group of 100 students, 30 have a laptop, 50 have a computer (desktop or laptop) of their own and
90 have access to a computer lab.
One student is picked at random. What is the probability that he/she
(a) has a computer but not a laptop?
(b) has access to a lab and a computer of his/her own?
(c) does have a laptop or a computer?
To get an overview of the situation, we can first define events and draw a Venn diagram. Define
L = student has laptop
C = student has computer of his/her own
A = student has access to lab
Since L is a subset of C ( we write this as L ⊂ C), L is inside C in the diagram.
(Venn diagram: L drawn inside C, and A overlapping, all inside Ω.)
From the Venn diagram we see that the students who have a computer but no laptop are inside C but not inside L, i.e. in (a) we are looking for C ∩ L̄. Since 30 of the 50 students in C have a laptop, there are 20 remaining students who have a computer but no laptop. This corresponds to a probability of 20%: P(C ∩ L̄) = 0.2.
In (b) we are looking for the intersection of C and A. We cannot compute this value exactly, but we can
give an upper and a lower limit:
P(A ∩ C) ≤ min{P(A), P(C)} = 0.5

and since

P(A ∪ C) = P(A) + P(C) − P(A ∩ C) = 0.9 + 0.5 − P(A ∩ C) = 1.4 − P(A ∩ C),

which cannot be greater than 1, we know that P(A ∩ C) must be at least 0.4. In short:

0.4 ≤ P(A ∩ C) ≤ 0.5,

i.e. between 40 and 50% of all students have both access to a lab and a computer of their own.
The number of students who have a laptop or a computer is just the number of students who have a computer,
since laptops are a subgroup of computers. Therefore
P (C ∪ L) = P (C) = 0.5.
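The two bounds derived above follow a general pattern: the intersection can be no larger than the smaller event, and P(A ∪ C) ≤ 1 forces a lower limit. A small Python sketch with the example's numbers:

```python
# Bounds on P(A ∩ C) when only the marginals are known
# (computer-access example: A = lab access, C = own computer).

p_A = 0.9
p_C = 0.5

upper = min(p_A, p_C)               # intersection fits inside the smaller event
lower = max(0.0, p_A + p_C - 1.0)   # since P(A ∪ C) = P(A) + P(C) - P(A ∩ C) <= 1

print(round(lower, 2), round(upper, 2))   # 0.4 0.5
```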
Example 1.3.5
A box contains 4 chips, 1 of them is defective.
A person draws one chip at random.
What is a suitable probability that the person draws the defective chip?
Common sense tells us, that since one out of the four chips is defective, the person has a chance of 25% to
draw the defective chip.
Just for training, we will write this down in terms of probability theory:
One possible sample space Ω is: Ω = {g1, g2, g3, d} (i.e. we distinguish the good chips, which may seem a bit artificial; it will become obvious later on why that is a good idea anyway).
The event to draw the defective chip is then A = {d}.
We can write the probability to draw the defective chip by comparing the sizes of A and Ω:

P(A) = |A|/|Ω| = |{d}|/|{g1, g2, g3, d}| = 1/4 = 0.25.
Be careful, though! The above method to compute probabilities is only valid in a special case:
Theorem 1.3.4
If all elementary events in a sample space are equally likely (i.e. P({ω}) = const for all ω ∈ Ω), the probability of an event A is given by:

P(A) = |A|/|Ω|,

where |A| denotes the number of elements in A.
Example 1.3.6 continued
The person now draws two chips. What is the probability that the defective chip is among them?
We need to set up a new sample space containing all possibilities for drawing two chips:
Ω = {{g1, g2}, {g1, g3}, {g1, d}, {g2, g3}, {g2, d}, {g3, d}}

E = "defective chip is among the two chips drawn" = {{g1, d}, {g2, d}, {g3, d}}.

Then

P(E) = |E|/|Ω| = 3/6 = 0.5.
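The enumeration above can be reproduced by machine; a short Python sketch using `itertools.combinations`:

```python
from itertools import combinations

# Enumerate all unordered two-chip draws from {g1, g2, g3, d}
# and count those containing the defective chip d.

chips = ["g1", "g2", "g3", "d"]
omega = list(combinations(chips, 2))            # the sample space
event = [draw for draw in omega if "d" in draw]

p = len(event) / len(omega)
print(len(omega), len(event), p)                # 6 3 0.5
```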
Finding P (E) involves counting the number of outcomes in E. Counting by hand is sometimes not feasible
if Ω is large.
Therefore, we need some standard counting methods.
1.4 Counting Methods
Each week, millions of people buy tickets for the federal lottery. Each ticket has six different numbers printed on it, each in the range 1 to 49. How many different lottery tickets are there, and what are the chances of winning? Probably you would agree that the chances of winning the lottery are very low - some people think of them as being equal to the probability of being hit by lightning. In the following section we are going to look at methods to find out exactly how low the chances of winning the lottery are - we will do that by counting all the different possibilities for the result of a lottery drawing.
1.4.1 Two Basic Counting Principles
Summation Principle If a complex action can be performed using k alternative methods and each
method can be performed in n1 , n2 , ..., nk different ways, respectively, the complex action can be performed
in n1 + n2 + ... + nk different ways.
Multiplication Principle If a complex action can be broken down into a series of k components and these components can be performed in n1, n2, . . . , nk ways, respectively, then the complex action can be performed in n1 · n2 · . . . · nk different ways.
Example 1.4.1
Toss a coin first, then toss a die: results in 2 · 6 = 12 possible outcomes of the experiment.
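The multiplication principle can be checked by brute-force enumeration; a minimal Python sketch for this coin-and-die experiment:

```python
from itertools import product

# All (coin, die) pairs for Example 1.4.1: a coin toss followed by a die roll.
coin = ["H", "T"]
die  = [1, 2, 3, 4, 5, 6]

outcomes = list(product(coin, die))   # Cartesian product of the two steps
print(len(outcomes))                  # 2 * 6 = 12
```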
(Tree diagram: the coin toss branches into H and T, and each branch splits further into the die outcomes 1, . . . , 6, giving 12 leaves.)

1.4.2 Ordered Samples with Replacement
Just to make sure we know what we are talking about, here are the definitions that explain this section's title:
Definition 1.4.1 (ordered sample)
If r objects are selected from a set of n objects, and if the order of selection is noted, the selected set of r
objects is called an ordered sample.
Definition 1.4.2 (Sampling w/wo replacement)
Sampling with replacement occurs when an object is selected and then replaced before the next object is selected.
Sampling without replacement occurs when an object is not replaced after it has been selected.
Situation:
Imagine a box with n balls in it numbered from 1 to n.
We are interested in the number of ways to sequentially select k balls from the box
when the same ball can be drawn repeatedly (with replacement).
This is our first application of the multiplication principle: instead of looking at the complex action, we break it down into the k single draws. For each draw, we have n different possibilities to draw a ball. The complex action can therefore be done in

n · n · . . . · n  (k times)  = n^k

different ways.
The sample space Ω can be written as:

Ω = {(x1, x2, . . . , xk) | xi ∈ {1, . . . , n}} = {x1 x2 . . . xk | xi ∈ {1, . . . , n}}

We already know that |Ω| = n^k.
Example 1.4.2
(a) How many valid five-digit octal numbers (with leading zeros allowed) exist?
In a valid octal number each digit must be between 0 and 7. We therefore have 8 choices for each digit, yielding 8^5 different five-digit octal numbers.
(b) What is the probability that a randomly chosen five-digit number is a valid octal number?
One possible sample space for this experiment would be

Ω = {x1 x2 . . . x5 | xi ∈ {0, . . . , 9}},

yielding |Ω| = 10^5. Since all numbers in Ω are equally likely, we can apply Thm 1.3.4 and get for the sought probability:

P("randomly chosen five-digit number is a valid octal number") = 8^5/10^5 ≈ 0.328.
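The n^k count and the resulting probability are easy to verify in Python:

```python
# Ordered samples with replacement: k digits with n choices each
# give n**k sequences (Example 1.4.2).

n_octal   = 8 ** 5      # five-digit octal numbers (digits 0-7)
n_decimal = 10 ** 5     # all five-digit strings over the digits 0-9

p_octal = n_octal / n_decimal
print(n_octal, p_octal)   # 32768 0.32768
```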
Example 1.4.3 Pick 3
Pick 3 is a game played daily at the State Lottery. The rules are as follows: choose three digits between 0 and 9 and order them. To win, the numbers must be drawn in the exact order you've chosen.
Clearly, the number of different ways to choose numbers in this way is 10 · 10 · 10 = 1000.
The probability to win is therefore 1/1000.
1.4.3 Ordered Samples without Replacement
Situation:
Same box as before.
We are interested in the number of ways to sequentially draw k balls from the box
when each ball can be drawn only once (without replacement).
Again, we break up the complex action into k single draws and apply the multiplication principle:

Draw:          1st | 2nd   | 3rd   | ... | kth
# of choices:  n   | n − 1 | n − 2 | ... | n − k + 1

total choices: n · (n − 1) · (n − 2) · . . . · (n − k + 1) = n!/(n − k)!

The fraction n!/(n − k)! is important enough to get a name of its own:
Definition 1.4.3 (Permutation number)
P (n, k) := n!/(n − k)! is the number of permutations of n distinct objects taken k at a time.
Example 1.4.4
(a) I only remember that a friend's (4-digit) telephone number consists of the digits 3, 4, 8 and 9. How many different numbers does that describe?
That is the situation where we take 4 objects out of a set of 4 objects and order them - that is P(4, 4).

P(4, 4) = 4!/(4 − 4)! = 4!/0! = 24/1 = 24.
(b) In a survey, you are asked to choose from seven items on a pizza your favorite three and rank them. How many different results will the survey have at most? - P(7, 3).

P(7, 3) = 7!/(7 − 3)! = 7 · 6 · 5 = 210.

Variation: How many different sets of "top 3" items are there? (i.e. now we disregard the order of the favorite three items.)
Think: The value P(7, 3) is the result of a two-step action. First, we choose 3 items out of 7. Secondly, we order them. Therefore (multiplication principle!):

P(7, 3) = X · P(3, 3),

where P(7, 3) is the number of ways to choose 3 from 7 and order them, X is the number of ways to choose 3 out of 7 items, and P(3, 3) is the number of ways to choose 3 out of 3 and order them. So:

X = P(7, 3)/P(3, 3) = 7!/(4! · 3!) = (7 · 6 · 5)/(3 · 2 · 1) = 35.
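Python's standard library computes these permutation numbers directly; a quick check of the example's values:

```python
from math import perm

# P(n, k) = n!/(n - k)!: ordered samples without replacement.
print(perm(4, 4))    # 24 orderings of the four digits 3, 4, 8, 9
print(perm(7, 3))    # 210 ranked top-3 lists from 7 pizza items

# Dropping the order divides out the P(3, 3) = 6 orderings of each set:
top3_sets = perm(7, 3) // perm(3, 3)
print(top3_sets)     # 35
```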
This example leads us directly to the next section:
1.4.4 Unordered Samples without Replacement
Same box as before.
We are interested in the number of ways to choose k balls (at once) out of a box
with n balls.
As we've seen in the last example, this can be done in

P(n, k)/P(k, k) = n!/((n − k)! k!)

different ways.
Again, this number is interesting enough to get a name:
Definition 1.4.4 (Binomial Coefficient)
For two integers n, k with k ≤ n the binomial coefficient is defined as

(n choose k) := n!/((n − k)! k!)

Read: "out of n choose k" or "k out of n".
Example 1.4.5 Powerball (without the Powerball)
Pick five (different) numbers out of 49 - the lottery will also draw five numbers.
You’ve won, if at least three of the numbers are right.
(a) What is the probability to have five matching numbers?
Ω, the sample space, is the set of all possible five-number sets:

Ω = {{x1, x2, x3, x4, x5} | xi ∈ {1, . . . , 49}}

|Ω| = (49 choose 5) = 49!/(5! 44!) = 1906884.

The odds to win a matching five are 1 : 1906884 - about the same as the chance to die from being struck by lightning.
(b) What is the probability to have exactly three matching numbers?
Answering this question is a bit tricky. But since the order of the five numbers you've chosen doesn't matter, we can assume that we picked the three right numbers first and then picked two wrong numbers.
Do you see it? That's again a complex action that we can split up into two simpler actions.
First we need to figure out how many ways there are to choose 3 numbers out of the 5 right numbers. Obviously, this can be done in (5 choose 3) = 10 ways.
Secondly, the number of ways to choose the remaining 2 numbers out of the 49 − 5 = 44 wrong numbers is (44 choose 2) = 946.
In total, we have 10 · 946 = 9460 possible ways to choose exactly three right numbers, which gives a probability of 9460/1906884 ≈ 0.005.
Note: the probability to have exactly three right numbers can therefore be written as

P("3 matching numbers") = (5 choose 3) · (49 − 5 choose 5 − 3) / (49 choose 5).

We will come across these probabilities quite a few times from now on.
(c) What is the probability to win? (i.e. to have at least three matching numbers)
In order to win, we need to have exactly 3, 4 or 5 matching numbers. We already know the probabilities for exactly 3 or 5 matching numbers. What remains is the probability for exactly 4 matching numbers. If we use the above formula and substitute the 3 by a 4, we get

P("4 matching numbers") = (5 choose 4) · (49 − 5 choose 5 − 4) / (49 choose 5) = (5 · 44)/1906884 ≈ 0.000115.

In total, the probability to win is:

P("win") = P("3 matches") + P("4 matches") + P("5 matches") = (9460 + 220 + 1)/1906884 = 9681/1906884 ≈ 0.0051.
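The hypergeometric counts in this example are easy to verify with Python's `math.comb`:

```python
from math import comb

# Hypergeometric win probabilities for the "pick 5 out of 49" lottery:
# P(exactly m matches) = C(5, m) * C(44, 5 - m) / C(49, 5).

n_tickets = comb(49, 5)    # 1906884 possible five-number sets

def p_matches(m):
    return comb(5, m) * comb(44, 5 - m) / n_tickets

p_win = p_matches(3) + p_matches(4) + p_matches(5)
print(n_tickets)                                        # 1906884
print(round(p_matches(3), 6), round(p_matches(4), 6))   # the two cases above
print(round(p_win, 6))
```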
Please note: In the previous examples we've used parentheses ( ) to indicate that the order of the elements inside matters. These constructs are called tuples. If the order of the elements does not matter, we use { } - the usual symbol for sets.
1.5 Conditional Probabilities
Example 1.5.1
A box contains 4 computer chips, two of which are defective.
Obviously, the probability to draw a defective chip in one random draw is 2/4 = 0.5.
We analyze one chip now and find out that it is a good one. If we draw from the remaining chips, what is the probability to draw a defective chip?
Now, the probability to draw a defective chip has changed to 2/3.
Conclusion: The probability of an event A may change if, before we start the experiment for A, we know the outcome of another event B.
We need to add another term to our mathematical description of probabilities:
Real World                                                   | Mathematical World
assessment of "chance" given additional, partial information | conditional probability of one event A given another event B; write: P(A|B)
Definition 1.5.1 (conditional probability)
The conditional probability of event A given event B is defined as:

P(A|B) := P(A ∩ B)/P(B),   if P(B) ≠ 0.
Example 1.5.2
A lot of unmarked Pentium III chips in a box is composed as follows:

          Good | Defective | total
400 MHz    480 |     20    |  500
500 MHz    490 |     10    |  500
total      970 |     30    | 1000
Drawing a chip at random has the following probabilities:

P(D) = 0.03,  P(G) = 0.97            (check: these two must sum to 1)
P(400 MHz) = 0.5,  P(500 MHz) = 0.5  (check: these two must sum to 1, too)
P(D and 400 MHz) = 20/1000 = 0.02
P(D and 500 MHz) = 10/1000 = 0.01
Suppose now that I have the partial information that the selected chip is a 400 MHz chip. What is now the probability that it is defective?
Using the above formula, we get

P(chip is D | chip is 400 MHz) = P(chip is D and chip is 400 MHz)/P(chip is 400 MHz) = 0.02/0.5 = 0.04,

i.e. knowing the speed of the chip influences our probability assignment to whether the chip is defective or not.
Note: Rewriting the above definition of conditional probability gives:

P(A ∩ B) = P(B) · P(A|B),     (1.1)

i.e. knowing two out of the three probabilities gives us the third for free.
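The chip computation can be reproduced directly from the table's counts; a small Python sketch (the dictionary layout is ours, for illustration):

```python
# Conditional probability from the chip table (counts out of 1000).
counts = {
    ("good", 400): 480, ("defective", 400): 20,
    ("good", 500): 490, ("defective", 500): 10,
}
total = sum(counts.values())

p_400         = (counts[("good", 400)] + counts[("defective", 400)]) / total
p_def_and_400 = counts[("defective", 400)] / total

# definition: P(D | 400 MHz) = P(D and 400 MHz) / P(400 MHz)
p_def_given_400 = p_def_and_400 / p_400
print(p_def_given_400)    # 0.04
```

Note that conditioning on "400 MHz" simply restricts the table to its first row: 20 defective out of 500.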
We have seen that the occurrence of an event B may change the probability for an event A. If an event B
does not have any influence on the probability of A we say, that the events A and B are independent:
1.6 Independence of Events
Definition 1.6.1 (Independence of Events)
Two events A and B are called independent, if
P (A ∩ B) = P (A) · P (B)
(Alternate definition: P (A|B) = P (A))
Independence is the mathematical counterpart of the everyday notion of “unrelatedness” of two events.
Example 1.6.1 Safety System at a nuclear reactor
Suppose there are two physically separate safety systems A and B in a nuclear reactor. An "incident" can occur only when both of them fail in the event of a problem. Suppose the probabilities for the systems to fail during a problem are:

P(A fails) = 10^−4,  P(B fails) = 10^−8

The probability of an incident is then

P(incident) = P(A and B fail at the same time) = P(A fails and B fails)

Using that A and B are independent of each other, we can compute the probability that both systems fail as the product of the probabilities of the individual failures:

P(A fails and B fails) = P(A fails) · P(B fails)     (A, B independent)

Therefore the probability of an incident is:

P(incident) = P(A fails) · P(B fails) = 10^−4 · 10^−8 = 10^−12.
Comments: The safety system at a nuclear reactor is an example of a "parallel system". A parallel system consists of k components c1, . . . , ck, arranged as drawn in diagram 1.1.

Figure 1.1: Parallel system with k components.
The system works as long as there is at least one unbroken path between 1 and 2 (= at least one of the components still works).
Under the assumption that all components work independently of each other, it is fairly easy to compute the probability that a parallel system will fail:

P(system fails) = P(all components fail)
                = P(c1 fails ∩ c2 fails ∩ . . . ∩ ck fails)
                = P(c1 fails) · P(c2 fails) · . . . · P(ck fails)     (components are independent)
A similar kind of calculation can be done for a "series system". A series system, again, consists of k supposedly independent components c1, . . . , ck arranged as shown in diagram 1.2.

Figure 1.2: Series system with k components.
This time, the system only works if all of its components are working. Therefore, we can compute the probability that a series system works as:

P(system works) = P(all components work)
                = P(c1 works ∩ c2 works ∩ . . . ∩ ck works)
                = P(c1 works) · P(c2 works) · . . . · P(ck works)     (components are independent)
Please note that based on the above probabilities it is easy to compute the probability that a parallel system works and that a series system fails, respectively, as:

P(parallel system works) = 1 − P(parallel system fails)     (Thm 1.3.1)
P(series system fails)   = 1 − P(series system works)       (Thm 1.3.1)

The probability that a system works is sometimes called the system's reliability. Note that a parallel system is typically very reliable, while a series system usually is very unreliable.
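For k independent components that each fail with the same probability p, the two reliabilities collapse to simple formulas; a Python sketch (function names and the k = 3 illustration are ours):

```python
# Reliability of parallel and series systems of k independent
# components, each failing with probability p.

def parallel_works(p, k):
    return 1 - p ** k          # fails only if ALL components fail

def series_works(p, k):
    return (1 - p) ** k        # works only if ALL components work

# reactor example: the two safety systems act "in parallel"
p_incident = 1e-4 * 1e-8
print(p_incident)              # 1e-12

# illustration with k = 3 components, each failing with probability 0.1
print(parallel_works(0.1, 3))  # ≈ 0.999
print(series_works(0.1, 3))    # ≈ 0.729
```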
Warning: independence and disjointness are two very different concepts!
Disjointness: If A and B are disjoint, their intersection is empty and therefore has probability 0:

P(A ∩ B) = P(∅) = 0.

Independence: If A and B are independent events, the probability of their intersection can be computed as the product of their individual probabilities:

P(A ∩ B) = P(A) · P(B)

If neither P(A) nor P(B) is 0, the probability of the intersection will not be 0 either!
The concept of independence between events can be extended to more than two events:
Definition 1.6.2 (Mutual Independence)
A list of events A1, . . . , An is called mutually independent if for any subset {i1, . . . , ik} ⊂ {1, . . . , n} of indices we have:

P(Ai1 ∩ Ai2 ∩ . . . ∩ Aik) = P(Ai1) · P(Ai2) · . . . · P(Aik).

Note: for three or more events, pairwise independence does not imply mutual independence.
1.7 Bayes' Rule
Example 1.7.1 Treasure Hunt
Suppose that there are three closed boxes. The first box contains two gold coins, the second box contains
one gold coin and one silver coin, and the third box contains two silver coins. Suppose that you select one
of the boxes randomly and then select one of the coins from this box.
What is the probability that the coin you selected is golden?
For a problem like this, that consists of a step-wise procedure, it is often useful to draw a tree (a flow chart)
of the choices we can make in each step.
The diagram below shows the tree for the 2 steps of choosing a box first and choosing one of two coins in
that box.
(Tree diagram: the root branches to the boxes B1, B2, B3, each with probability 1/3; from B1 a gold coin is drawn with probability 1; from B2 gold or silver, each with probability 1/2; from B3 a silver coin is drawn with probability 1.)

The branches are labeled with the probabilities with which each step is done: choosing one box at random means that all boxes are equally likely to be chosen, P(Bi) = 1/3 for i = 1, 2, 3. In the first box are two gold coins: a gold coin is therefore chosen from this box with probability 1. The second box has one gold and one silver coin; a gold coin is therefore chosen with probability 0.5.
How do we piece this information together?
There are two possible paths in the tree that end in a gold coin. Each path corresponds to one event:

E1 = choose Box 1 and pick one of the two gold coins
E2 = choose Box 2 and pick the gold coin

We need the probabilities of these two events.
Think: use equation (1.1) to get P(Ei)!

P(E1) = P(choose Box 1 and pick one of the two gold coins)
      = P(choose Box 1) · P(pick one of the two gold coins | B1)
      = 1/3 · 1 = 1/3

and

P(E2) = P(choose Box 2 and pick the gold coin)
      = P(choose Box 2) · P(pick the gold coin | B2)
      = 1/3 · 1/2 = 1/6.

The probability to choose a gold coin is the sum of P(E1) and P(E2) (since those are the only ways to get a gold coin, as we've seen in the tree diagram):

P(gold coin) = 1/3 + 1/6 = 0.5.
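The path-by-path computation can be written out in a few lines; a Python sketch using exact fractions:

```python
from fractions import Fraction

# P(gold) = sum over boxes of P(Bi) * P(gold | Bi)  -- one term per tree path.
p_box = Fraction(1, 3)                                  # each box equally likely
p_gold_given_box = [Fraction(1), Fraction(1, 2), Fraction(0)]   # B1, B2, B3

p_gold = sum(p_box * pg for pg in p_gold_given_box)
print(p_gold)        # 1/2
```

The sum over the boxes is exactly the theorem of total probability introduced below.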
There are several things to learn from this example:
1. Instead of trying to tackle the whole problem, we've divided it into several smaller, more manageable pieces (divide-and-conquer principle).
2. We identified the smaller parts by looking at the description of the problem with the help of a tree. And if you compare the probabilities on the branches of the tree with the probabilities we used to compute the smaller pieces E1 and E2, you'll see that they correspond closely: the probability of E1 is computed as the product of all probabilities on the edges from the root to the leaf for E1.
Definition 1.7.1 (cover)
A set of k events B1, . . . , Bk is called a cover of the sample space Ω, if
(i) the events are pairwise disjoint, i.e. Bi ∩ Bj = ∅ for all i ≠ j, and
(ii) the union of the events contains Ω: B1 ∪ B2 ∪ . . . ∪ Bk = Ω.
What is a cover, then? You can think of a cover as several non-overlapping pieces which in total contain every possible case of the sample space, e.g. like the pieces of a jigsaw puzzle. Compare with diagram 1.3.
Figure 1.3: B1, B2, . . . , Bk are a cover of Ω.

The boxes from the last example, B1, B2, and B3, are a cover of the sample space. (This is a formal way of doing "Divide and Conquer".)
Theorem 1.7.2 (Total Probability)
If the set B1, . . . , Bk is a cover of the sample space Ω, we can compute the probability of an event A by (cf. fig. 1.4):

P(A) = Σ_{i=1}^{k} P(Bi) · P(A|Bi).

Note: Instead of writing P(Bi) · P(A|Bi) we could have written P(A ∩ Bi) - this is the definition of conditional probability, cf. def. 1.5.1.
Figure 1.4: The probability of event A is put together as sum of the probabilities of the smaller pieces (theorem
of total probability).
The challenge in using this theorem is to identify what set of events to use as a cover, i.e. into which parts to dissect the problem. Very often, the cover B1, B2, . . . , Bk has only two elements and looks like E, Ē.
Tree Diagram:
(The root branches to B1, . . . , Bk; from each Bi an edge labeled P(A|Bi) leads to a leaf A ∩ Bi.)

The probability of each node in the tree can be calculated by multiplying all probabilities from the root to that node (1st rule of tree diagrams). Summing up the probabilities in all the leaves gives P(A) (2nd rule).
Homework: Powerball - with the Powerball
Redo the above analysis under the assumption that, besides the five numbers chosen from 1 to 49, you choose an additional number, again between 1 and 49, as the Powerball. The Powerball may be a number you've already chosen or a new one.
You've won if the Powerball is the right number or, if the Powerball is wrong, if at least three of the other five numbers match.
• Show that the events "Powerball is right" and "Powerball is wrong" are a cover of the sample space (for that, you need to define a sample space).
• Draw a tree diagram for all possible ways to win, given that the Powerball is right or wrong.
• What is the probability to win?
Extra Problem (tricky): Seven Lamps
A system of seven lamps is given as drawn in the diagram (lamps numbered 1 to 7).
Each lamp fails (independently) with probability p = 0.1.
The system works as long as no two lamps next to each other fail.
What is the probability that the system works?
Example 1.7.2 Forensic Analysis
At a crime site the police found traces of DNA (evidence DNA), which could be identified as belonging to the perpetrator. The search is now done by looking for a DNA match.
The probability for "a man from the street" to have the same DNA as the DNA from the crime site (a random match) is approximately 1 : 1,000,000.
To determine whether someone is a DNA match, a test is used. The test is not totally reliable: if a person is a true DNA match, the test will be positive with probability 1. If the person is not a DNA match, the test will still be positive with a probability of 1 : 100,000.
Assume that the police found a man with a positive test result. What is the probability that he actually is a DNA match?
First, we have to translate the above text into probability statements. The probability of a random match is

P(match) = 1 : 1,000,000 = 10^−6.

Now, the probabilities for a positive test result:

P(test pos | match) = 1
P(test pos | no match) = 1 : 100,000 = 10^−5
The probability asked for in the question is, again, a conditional probability. We already know that the man has a positive test result; we look for the probability that he is a match. This translates to P(match | test pos.).
First, we use the definition of conditional probability to rewrite this probability:

P(match | test pos.) = P(match ∩ test pos.) / P(test pos.)

This doesn't seem to help a lot, since we still don't know a single one of those probabilities. But we apply the same trick once again to the numerator:

P(match ∩ test pos.) = P(test pos. | match) · P(match)
Now, we know both of these probabilities and get
P ( match ∩ test pos. ) = 1 · 10−6 .
The denominator is a bit more tricky. But remember the theorem of total probabilities - we just need a proper
cover to compute this probability.
The way this particular problem is posed, we find a suitable cover in the events match and no match. Using
the theorem of total probability gives us:
P( test pos. ) = P( match ) · P( test pos. | match ) + P( no match ) · P( test pos. | no match )
We have the numbers for all of these probabilities! Plugging them in gives:
P( test pos. ) = 10^-6 · 1 + (1 − 10^-6) · 10^-5 ≈ 1.1 · 10^-5.
In total, this gives the man with the positive test result a probability of being a true match of slightly less
than 10%:
P( match | test pos. ) = 10^-6 / (1.1 · 10^-5) = 1/11 ≈ 0.09.
Is that result plausible? If you look at the probability of a false positive test result and compare it with the
overall probability of a true DNA match, you can see that the test is ten times more likely to give a positive
result than there are true matches. This means that if 10 million people are tested, we would expect 10 people
to have a true DNA match. On the other hand, the test will yield an additional 100 false positive results, which
gives us a total of 110 people with positive test results.
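The whole computation above fits in a few lines of code; here is a minimal Python sketch using the numbers from the text (the variable names are of course only illustrative):

```python
# Numbers from the DNA example.
p_match = 1e-6            # P(match): probability of a random DNA match
p_pos_given_match = 1.0   # P(test pos. | match)
p_pos_given_no = 1e-5     # P(test pos. | no match)

# Theorem of total probability with the cover {match, no match}:
p_pos = p_match * p_pos_given_match + (1 - p_match) * p_pos_given_no

# Bayes' rule:
p_match_given_pos = p_match * p_pos_given_match / p_pos
print(round(p_match_given_pos, 4))  # ≈ 0.0909, i.e. roughly 1/11
```

Varying `p_pos_given_no` in this sketch shows how strongly the answer depends on the false-positive rate of the test.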
This, by the way, is not a property limited to DNA tests - it is a property of every test for a condition that
is very rare in the tested population, e.g. tuberculosis tests, HIV tests or - in Europe - tests for mad
cow disease.
Theorem 1.7.3 (Bayes' Rule)
If B1, B2, ..., Bk is a cover of the sample space Ω, then

P( Bj | A ) = P( Bj ∩ A ) / P( A ) = P( A | Bj ) · P( Bj ) / ( Σ_{i=1}^{k} P( A | Bi ) · P( Bi ) )

for all j and all events A ⊂ Ω with A ≠ ∅.
Example 1.7.3
A given lot of chips contains 2% defective chips. Each chip is tested before delivery.
However, the tester is not wholly reliable:
P( "tester says chip is good" | "chip is good" ) = 0.95
P( "tester says chip is defective" | "chip is defective" ) = 0.94
If the test device says the chip is defective, what is the probability that the chip actually is defective?
Abbreviate Cd := "chip is defective" and Td := "tester says it's defective". We look for
P( Cd | Td ) and apply Bayes' rule with the cover Cd, C̄d. Note that
P( Td | C̄d ) = 1 − P( T̄d | C̄d ) = 1 − 0.95 = 0.05, so:

P( Cd | Td ) = P( Td | Cd ) · P( Cd ) / ( P( Td | Cd ) · P( Cd ) + P( Td | C̄d ) · P( C̄d ) )
             = 0.94 · 0.02 / ( 0.94 · 0.02 + 0.05 · 0.98 )
             ≈ 0.28.
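The same Bayes'-rule computation can be checked numerically; a short sketch (names chosen for illustration):

```python
# Example 1.7.3: P(chip defective | tester says defective), cover {Cd, not Cd}.
p_def = 0.02                # P(Cd): chip is defective
p_td_given_def = 0.94       # P(Td | Cd)
p_td_given_good = 1 - 0.95  # P(Td | not Cd) = 1 - P(tester says good | chip good)

# Denominator via the theorem of total probability, then Bayes' rule:
p_td = p_td_given_def * p_def + p_td_given_good * (1 - p_def)
p_def_given_td = p_td_given_def * p_def / p_td
print(round(p_def_given_td, 2))  # → 0.28
```

Even with a tester this good, almost three out of four "defective" verdicts concern chips that are actually fine, for the same rare-event reason as in the DNA example.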
1.8 Bernoulli Experiments
A random experiment with only two outcomes is called a Bernoulli experiment.
Possible pairs of outcomes are, e.g., 0 and 1,
"success" and "failure",
"hit" and "miss",
"good" and "defective".
The probabilities of "success" and "failure" are called p and q, respectively.
(Then p + q = 1.)
A compound experiment consisting of a sequence of n independent repetitions is called a sequence of
Bernoulli experiments.
Example 1.8.1
Transmit binary digits through a communication channel with success = “digit received correctly”.
Toss a coin repeatedly, success = “head”.
Sample spaces
Let Ωi be the sample space of an experiment consisting of i Bernoulli experiments:
Ω1 = {0, 1}
Ω2 = {(0, 0), (0, 1), (1, 0), (1, 1)} = {00, 01, 10, 11}   (all two-digit binary numbers)
...
Ωn = {n-digit binary numbers} = {n-tuples of 0s and 1s}
Probability assignment
For Ω1 probabilities are already assigned:
P(0) = q    P(1) = p
For Ω2:
P(00) = q^2    P(01) = qp    P(10) = pq    P(11) = p^2
Generally, for Ωn:
P(s) = p^k q^(n-k)
if the sequence s has exactly k 1s and n − k 0s.
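The general formula P(s) = p^k q^(n-k) translates directly into code; a small sketch (the function name is just illustrative):

```python
# Probability of one specific outcome s of n Bernoulli experiments,
# where s is a string of 0s and 1s and p is the success probability.
def seq_prob(s, p):
    q = 1 - p                      # failure probability
    k = s.count("1")               # number of successes in s
    return p**k * q**(len(s) - k)  # P(s) = p^k * q^(n-k)

print(seq_prob("0110", 0.5))  # → 0.0625; for p = 0.5 every 4-digit sequence is equally likely
```

Note that the probability depends only on the number of 1s in s, not on where they occur - this observation is what makes counting sequences worthwhile in the next example.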
Example 1.8.2 Very simple dartboard
We will assume that only those darts count that actually hit the dartboard.
If a player throws a dart and hits the board at random, the probability of hitting
the red zone is directly proportional to the red area. Since, of the nine
squares in total, 8 are gray and only one is red, the probabilities are:
P( gray ) = 8/9
P( red ) = 1/9
A player now throws three darts, one after the other.
What are the possible sequences of red and gray hits, and what are their probabilities?
We have, again, a step-wise setup of the problem; we can therefore draw a tree:
(tree diagram: each of the three throws branches into r = red and g = gray)

sequence    probability
rrr         1/9^3
rrg         8/9^3
rgr         8/9^3
rgg         8^2/9^3
grr         8/9^3
grg         8^2/9^3
ggr         8^2/9^3
ggg         8^3/9^3
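Instead of drawing the tree by hand, the eight sequences and their probabilities can be enumerated programmatically; a minimal sketch:

```python
from itertools import product

# All sequences of three throws ('r' = red, 'g' = gray) and their probabilities.
p_red, p_gray = 1/9, 8/9
probs = {}
for seq in product("rg", repeat=3):
    k = seq.count("r")                        # number of red hits
    probs["".join(seq)] = p_red**k * p_gray**(3 - k)

print(probs["rrr"])          # 1/9^3
print(probs["rgg"])          # 8^2/9^3
print(sum(probs.values()))   # the eight probabilities sum to 1
```

Grouping the entries of `probs` by the number of r's already hints at the binomial counts that appear once we move to random variables.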
Most of the time, however, we are not interested in the exact sequence in which the darts hit, but
in the overall result: how many times a player hits the red area.
This leads us to the notion of a random variable.