(A) + P (B)

Throwing a pair of dice. There are 36 ways for the dice to fall,
shown in the body of the diagram; all are equally likely.
Example. A pair of dice are thrown. What is the chance of
getting a total of 4 spots?
Solution. Look at the figure. There are 3 ways to get a total
of four spots: (1, 3), (2, 2), and (3, 1).
The chance is 3 in 36. That is the answer.
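The count can be checked by listing all 36 outcomes. The following is a minimal sketch; the variable names are illustrative and not part of the original slides.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for an ordered pair of dice.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes whose spots add up to 4.
total_four = [pair for pair in outcomes if sum(pair) == 4]

chance = Fraction(len(total_four), len(outcomes))
print(total_four)  # [(1, 3), (2, 2), (3, 1)]
print(chance)      # 1/12, i.e. 3 in 36
```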
Sample Space
• Probability theory is used as a model for situations for
which the outcomes occur randomly. Generically, such
situations are called “experiments,” and the set of all
possible outcomes is the sample space corresponding to
an experiment. The sample space is denoted by Ω, and
a generic element of Ω is denoted by ω. The following
are some examples.
Example
• A driver passes through a sequence of three
intersections with traffic lights. At each light, the
driver either stops, s, or continues, c. The sample
space is the set of all possible outcomes:
Ω = {ccc, ccs, css, csc, sss, ssc, scc, scs}
where csc, for example, denotes the outcome that
the driver continues through the first light, stops
at the second light, and continues through the third
light.
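A sample space like this one can be generated mechanically. The sketch below builds every sequence of three choices from "c" and "s"; the name `sample_space` is an illustrative choice.

```python
from itertools import product

# Each of the three lights yields 'c' (continue) or 's' (stop).
sample_space = {"".join(seq) for seq in product("cs", repeat=3)}

print(sorted(sample_space))
# ['ccc', 'ccs', 'csc', 'css', 'scc', 'scs', 'ssc', 'sss']
print(len(sample_space))  # 8
```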
Example

The number of jobs in a print queue of a computer
may be modeled as random. Here the sample
space can be taken as
Ω = {0, 1, 2, 3, …}
that is, all the nonnegative integers. In practice,
there is probably an upper limit, N, on how large the
print queue can be, so instead the sample space
might be defined as
Ω = {0, 1, 2, …, N}
Example

Earthquakes exhibit very erratic behavior,
which is sometimes modeled as random.
For example, the length of time between
successive earthquakes in a particular region
that are greater in magnitude than a given
threshold may be regarded as an
experiment. Here Ω is the set of all
nonnegative real numbers:
Ω = {t | t ≥ 0}
We are often interested in particular subsets of Ω, which in
probability language are called events. In the first example the
event that the driver stops at the first light is the subset of Ω
denoted by
A = { sss, ssc, scc, scs }
(Events, or subsets, are usually denoted by italic uppercase
letters.) In the second example, the event that there are fewer
than five jobs in the print queue can be denoted by
A = { 0, 1, 2, 3, 4 }
The algebra of set theory carries over directly into
probability theory. The union of two events, A and B, is
the event C that either A occurs or B occurs or both
occur: C = A ∪ B. For example, if A is the event that the
driver stops at the first light (listed above) and if B is the
event that he or she stops at the third light,
B = { sss, scs, ccs, css }
then C is the event that the driver stops at the first light
or stops at the third light and consists of the outcomes
that are in A or B or in both:
C = { sss, ssc, scc, scs, ccs, css }
The intersection of two events, C = A ∩ B, is the event that both A and B
occur. If A and B are listed above, then C is the event that the driver stops
at the first light and stops at the third light and thus consists of those
outcomes that are common to both A and B:
C = { sss, scs }
The complement of an event, A^c, is the event that A does not occur and
thus consists of all those elements in the sample space that are not in A.
The complement of the event that the driver stops at the first light is the
event that he or she continues at the first light:
A^c = { ccc, ccs, css, csc }
You may recall the empty set is usually denoted by ∅. The empty set is
the set with no elements; it is the event with no outcomes. An event of
probability 0 may or may not be empty. If A is the event that the driver
stops at the first light and C is the event of continuing through all three
lights, C = { ccc }, then A and C have no outcomes in common, and we can
write
A ∩ C = ∅
In such cases, A and C are said to be disjoint.
Venn diagrams, such as those below, are often a useful
tool for visualizing set operations.
Figure. Venn diagrams of A ∪ B and A ∩ B.
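The traffic-light events can also be written as Python sets, where `|`, `&`, and `-` play the roles of union, intersection, and set difference. This is a sketch with illustrative variable names.

```python
from itertools import product

omega = {"".join(seq) for seq in product("cs", repeat=3)}

A = {w for w in omega if w[0] == "s"}  # stops at the first light
B = {w for w in omega if w[2] == "s"}  # stops at the third light
C = {"ccc"}                            # continues through all three lights

print(sorted(A | B))      # union: ['ccs', 'css', 'scc', 'scs', 'ssc', 'sss']
print(sorted(A & B))      # intersection: ['scs', 'sss']
print(sorted(omega - A))  # complement of A: ['ccc', 'ccs', 'csc', 'css']
print(A.isdisjoint(C))    # True: A and C have no outcomes in common
```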
The following are some laws of set theory.
Commutative Laws:
A ∪ B = B ∪ A
A ∩ B = B ∩ A
Associative Laws:
(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∩ B) ∩ C = A ∩ (B ∩ C)
Distributive Laws:
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
Of these, the distributive laws are the least intuitive, and you may find
it instructive to illustrate them with Venn diagrams.
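Besides Venn diagrams, the laws can be spot-checked by brute force. The sketch below tries every triple of subsets of a small three-element universe. The universe and the helper names `subsets` and `laws_hold` are assumptions made for illustration, and a check on one small universe is a sanity test, not a proof.

```python
from itertools import combinations, product

universe = {1, 2, 3}

def subsets(s):
    """Return every subset of s as a frozenset."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def laws_hold(A, B, C):
    """Check the commutative, associative, and distributive laws."""
    return all([
        A | B == B | A,
        A & B == B & A,
        (A | B) | C == A | (B | C),
        (A & B) & C == A & (B & C),
        (A | B) & C == (A & C) | (B & C),
        (A & B) | C == (A | C) & (B | C),
    ])

all_ok = all(laws_hold(A, B, C)
             for A, B, C in product(subsets(universe), repeat=3))
print(all_ok)  # True
```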
Probability Measures
Our course is not about the foundations of probability, so we will give this
important topic only slight mention for now. The difficulty is that
we will repeatedly be making statements that have probabilities as their
“backbones.” We will speak of “the probability of disease given underlying
conditions or parameters,” or of “the probability of data as discrepant
with a null hypothesis as those that were observed, or more so.” It is
difficult, if not impossible, to ignore the issue of what is
meant by these statements. In particular, there will always be the question
of to whom or to what any results or statements apply. Fortunately, in
many instances, although different notions of probability cannot possibly give
exactly identical answers, the “answers” are nearly the same if each is
applied with suitable care. Before moving on to much-too-brief definitions,
we note that the operational justifications of the probability statements
we will make often depend upon a frequentist definition, even though
the intuitive approaches we keep in the backs of our minds are rather
subjective.
Frequentistic approach.
For repeated events, probability can be estimated by the “long
run” relative frequency of an event out of a set of many trials. If
an event occurs m times in n trials, then the relative frequency
m/n provides an “unbiased” estimate of the probability of the
event. In the limit, as the number of trials n increases without
bound, the relative frequency converges to the “true” probability
of the event (“Law of Large Numbers”). This interpretation
involving repeated trials is known as the “frequentist” approach to
probability.
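The long-run behavior of m/n can be seen in a small simulation. This is a sketch under stated assumptions: the tosses are simulated with Python's pseudo-random generator, the "true" probability is set to 0.5, and the function name `relative_frequency` and its `seed` parameter are illustrative.

```python
import random

def relative_frequency(n_trials, p=0.5, seed=0):
    """Estimate P(heads) as m/n from n simulated tosses."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    m = sum(rng.random() < p for _ in range(n_trials))
    return m / n_trials

# The estimate m/n should settle near p as n grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency(n))
```

With small n the estimate can be far from 0.5; with large n it is typically close, which is the pattern the Law of Large Numbers describes.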
Non-frequentist subjective approach.
The frequentist approach has a number of disadvantages. First, it
cannot be used to provide probability statements for events that occur
once or only rarely (for example, change in a particular pattern of
weather). Second, the frequentist estimates are based entirely on the
sample and so cannot take into account any prior belief (common sense or
otherwise) about the probability. Think of flipping a coin 25 times and
asking yourself based on the results whether the coin is “fair”. The
subjective probability of an event A can be defined as the price you
would pay for a fair bet on the event divided by the amount you would
win if the event happens. Fair means that neither you nor the
bookmaker would be expected to make any profit. To make a fair bet,
prior information must be taken into account. Well-meaning people
faced with the same data can have very different opinions.
• Laplace’s law of succession.
• The Reverend Thomas Bayes.
A probability measure on Ω is a function P from subsets of Ω to the real
numbers that satisfies the following axioms:
1. P(Ω) = 1
2. If A ⊂ Ω, then P(A) ≥ 0.
3. If A1 and A2 are disjoint, then
P(A1 ∪ A2) = P(A1) + P(A2)
More generally, if A1, A2, …, An, … are mutually disjoint, then
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$$
The first two axioms are rather obvious. Since Ω consists of all possible
outcomes, P(Ω) = 1. The second axiom simply states that a probability
is nonnegative. The third axiom states that if A and B are disjoint – that
is, have no outcomes in common – then P(A ∪ B) = P(A) + P(B), and
also that this property extends to limits. For example, the probability that
the print queue contains either one or three jobs is equal to the
probability that it contains one plus the probability that it contains three.
The following properties of probability measures are consequences of
the axioms.
Property A.
P(A^c) = 1 – P(A). This property follows since A and A^c are disjoint with
A ∪ A^c = Ω and thus, by the first and third axioms, P(A) + P(A^c) = 1. In
words, this property says that the probability that an event does not
occur equals one minus the probability that it does occur.
Property B.
P(∅) = 0. This property follows from Property A since ∅ = Ω^c. In words,
this says that the probability that there is no outcome at all is zero.
Property C.
If A ⊂ B, then P(A) ≤ P(B). This property follows since B can be
expressed as the union of two disjoint sets:
B = A ∪ (B ∩ A^c)
Then, from the third axiom,
P(B) = P(A) + P(B ∩ A^c)
and thus
P(A) = P(B) – P(B ∩ A^c) ≤ P(B)
This property states that if B occurs whenever A occurs, then P(A) ≤ P(B).
For example, if whenever it rains (A) it is cloudy (B), then the probability
that it rains is less than or equal to the probability that it is cloudy.
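Properties A, B, and C can be checked numerically on a finite sample space. The sketch below reuses the three-light sample space and assumes, purely for illustration, that all eight outcomes are equally likely; the slides do not assign probabilities to the lights. The helper `P` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

omega = frozenset("".join(seq) for seq in product("cs", repeat=3))
# Assumption for illustration only: every outcome has probability 1/8.
prob = {w: Fraction(1, 8) for w in omega}

def P(event):
    """Probability of an event: the sum of its outcomes' probabilities."""
    return sum((prob[w] for w in event), Fraction(0))

A = {"sss"}                            # stops at all three lights
B = {w for w in omega if w[0] == "s"}  # stops at the first light; A is a subset of B

print(P(omega - B) == 1 - P(B))  # Property A: True
print(P(set()))                  # Property B: 0
print(A <= B, P(A) <= P(B))      # Property C: True True
```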
Property D.
(Addition Law) P(A ∪ B) = P(A) + P(B) – P(A ∩ B). To see this, we
decompose A ∪ B into three disjoint subsets, as shown in the following
figure.
C = A ∩ B^c
D = A ∩ B
E = A^c ∩ B
We then have, from the third axiom,
P(A ∪ B) = P(C) + P(D) + P(E)
Also, A = C ∪ D, and C and D are disjoint; so P(A) = P(C) + P(D).
Similarly, P(B) = P(D) + P(E). Putting these results together, we see that
P(A) + P(B) = P(C) + P(E) + 2 P(D)
= P(A ∪ B) + P(D)
or
P(A ∪ B) = P(A) + P(B) – P(D)
This property is easy to see from the Venn diagram. If P(A) and P(B)
are added together, P(A ∩ B) is counted twice.
EXAMPLE.
Suppose that a fair coin is thrown twice. Let A denote the event of
heads on the first toss and B the event of heads on the second toss.
The sample space is
Ω = { hh, ht, th, tt }
We assume that each elementary outcome in Ω is equally likely and
has probability ¼. C = A ∪ B is the event that heads comes up on
the first toss or on the second toss. Clearly, P(C) ≠ P(A) + P(B) = 1.
Rather, since A ∩ B is the event that heads comes up on the first
toss and on the second toss,
P(C) = P(A) + P(B) – P(A ∩ B) = .5 + .5 – .25 = .75
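The same arithmetic can be reproduced by summing outcome probabilities directly. This is a minimal sketch; the helper `P` and the variable names are illustrative.

```python
from fractions import Fraction

omega = ["hh", "ht", "th", "tt"]
prob = {w: Fraction(1, 4) for w in omega}  # each outcome equally likely

def P(event):
    """Probability of an event: the sum of its outcomes' probabilities."""
    return sum((prob[w] for w in event), Fraction(0))

A = {w for w in omega if w[0] == "h"}  # heads on the first toss
B = {w for w in omega if w[1] == "h"}  # heads on the second toss

print(P(A | B))                # 3/4, computed directly from the union
print(P(A) + P(B) - P(A & B))  # 3/4, via the addition law
print(P(A) + P(B))             # 1, which double-counts the outcome hh
```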
Computing Probabilities: Counting Methods
Probabilities are especially easy to compute for finite sample spaces.
Suppose that Ω = { ω1, ω2, …, ωN } and that P(ωi) = pi. To find the
probability of an event A, we simply add the probabilities of the ωi that
constitute A.
EXAMPLE. Suppose that a fair coin is thrown twice and the sequence
of heads and tails is recorded. The sample space is
Ω = { hh, ht, th, tt }
As in the previous example, we assume that each outcome in Ω has
probability .25. Let A denote the event that at least one head is thrown.
Then A = { hh, ht, th }, and P(A) = .75.
This is a simple example of a fairly common situation. The elements of
Ω all have equal probability; so if there are N elements in Ω, each of
them has probability 1/N. If A can occur in any of n mutually exclusive
ways, then P(A) = n/N, or
$$P(A) = \frac{\text{number of ways } A \text{ can occur}}{\text{total number of outcomes}}$$
Note that this formula holds only if all the outcomes are equally likely.
In the example above, if only the number of heads were recorded, then Ω
would be { 0, 1, 2 }. These outcomes are not equally likely, and P(A) is
not 2/3.
The preceding example is a very simple case. To compute
probabilities for more complex situations, we must develop systematic
ways of counting outcomes.
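To see why the head counts 0, 1, and 2 are not equally likely, one can tally them over the four equally likely toss sequences. This is a sketch; `heads_dist` and `p_at_least_one` are illustrative names.

```python
from collections import Counter
from fractions import Fraction

omega = ["hh", "ht", "th", "tt"]  # equally likely, probability 1/4 each

# Probability of each possible head count, found by counting outcomes.
counts = Counter(w.count("h") for w in omega)
heads_dist = {k: Fraction(v, len(omega)) for k, v in sorted(counts.items())}
print(heads_dist)  # head counts 0, 1, 2 get probabilities 1/4, 1/2, 1/4

# "At least one head" corresponds to head counts 1 and 2.
p_at_least_one = heads_dist[1] + heads_dist[2]
print(p_at_least_one)  # 3/4, not 2/3
```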
The Multiplication Principle
The following is a statement of the very useful multiplication principle.
MULTIPLICATION PRINCIPLE. If one experiment has m outcomes and
another experiment has n outcomes, then there are mn possible outcomes
for the two experiments.
EXAMPLE. A DNA molecule is a sequence of four types of nucleotides,
denoted by A, G, C, and T. The molecule can be millions of units long and
can thus encode an enormous amount of information. For example, for a
molecule 1 million units long, there are 4^(10^6) different possible sequences.
This is a staggeringly large number, with about 600,000 digits. An amino
acid is coded for by a sequence of three nucleotides; there are 4^3 = 64
different codes, but there are only 20 amino acids since some of them can
be coded for in several ways. A protein molecule is composed of as many
as hundreds of amino acid units and thus there are an incredibly large
number of possible proteins. For example, there are 20^100 different
sequences of 100 amino acids.
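The sizes in this example follow from repeated use of the multiplication principle and can be checked with a few lines of arithmetic. In this sketch the digit count of the DNA figure is obtained from logarithms rather than by building the huge integer; the variable names are illustrative.

```python
import math

# Three positions, four nucleotides each: 4 * 4 * 4 codes.
codons = 4 ** 3
print(codons)  # 64

# Number of decimal digits of 4**(10**6), via logarithms rather than
# by constructing the (very large) integer itself.
dna_digits = math.floor(10**6 * math.log10(4)) + 1
print(dna_digits)  # 602060

# Number of decimal digits of 20**100, the count of 100-unit sequences.
protein_digits = len(str(20 ** 100))
print(protein_digits)  # 131
```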
EXAMPLE. (Birthday Problem) Suppose that a room contains n
people. What is the probability that at least two of them have a
common birthday?
This is a famous problem with a counterintuitive answer. Assume that
every day of the year is equally likely to be a birthday, disregard leap
years, and denote by A the event that there are at least two people
with a common birthday. As is sometimes the case, it is easier to find
P(A^c) than to find P(A). This is because A can happen in many ways,
whereas A^c is much simpler. There are 365^n possible outcomes, and
A^c can happen in 365 × 364 × ⋯ × (365 – n + 1) ways. Thus,
$$P(A^c) = \frac{365 \times 364 \times \cdots \times (365 - n + 1)}{365^n}$$
and P(A) = 1 – P(A^c).
The following table exhibits P(A) for various values of n:
n     P(A)
--    ----
4     .016
16    .284
23    .507
32    .753
40    .891
56    .988
From the table, we see that if there are only 23 people, the probability of at
least one match exceeds .5.
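The table values can be recomputed from the complement formula. The following sketch multiplies the factors (365 – k)/365 one at a time to get the chance of no match, then subtracts from 1; the function name `p_shared_birthday` and its `days` parameter are illustrative.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), equally likely days."""
    p_no_match = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_no_match *= (days - k) / days
    return 1.0 - p_no_match

for n in (4, 16, 23, 32, 40, 56):
    print(n, round(p_shared_birthday(n), 3))
```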
The Paradox of the Chevalier de Méré
In the seventeenth century, French gamblers used to bet on the event
that with 4 rolls of a die, at least one ace would turn up; an ace is
a die showing a single spot.
In another game, they bet on the event that with 24 rolls of a pair of
dice, at least one double-ace would turn up: a double-ace is a pair of
dice which each show a single spot.
The Chevalier de Méré, a French nobleman of the period, thought the
two events were equally likely. He reasoned this way about the first
game:
• In one roll of a die, I have 1/6 of a chance to get an ace.
• So in 4 rolls, I have 4 x 1/6 = 2/3 of a chance to get at least
one ace.
His reasoning for the second game was similar:
• In one roll of a pair of dice, I have 1/36 of a chance to get a double-ace.
• So in 24 rolls, I must have 24 x 1/36 = 2/3 of a chance to get at least
one double-ace.
By this argument, both chances were the same, namely 2/3. But experience
showed the first event to be a bit more likely than the second. This
contradiction became known as the Paradox of the Chevalier de Méré.
De Méré asked the philosopher Blaise Pascal about the problem, and Pascal
solved it with the help of his friend, Pierre de Fermat. Fermat was a judge
and a member of parliament, who is remembered today for the mathematical
research he did after hours. Fermat saw that de Méré was adding chances
for events that were not mutually exclusive. In fact, pushing de Méré’s
argument a little further shows the chance of getting an ace in 6 rolls of a
die to be 6/6, or 100%. Something had to be wrong.
The question is how to calculate the chances correctly.
Pascal and Fermat solved this problem, with a typically
indirect piece of mathematical reasoning – the kind that
always leaves non-mathematicians feeling a bit cheated.
Of course, a direct attack could easily bog down: with 4 rolls
of a die, there are 6^4 = 1,296 outcomes to worry about; with
24 rolls of a pair of dice, there are 36^24 ≈ 2.2 × 10^37
outcomes.
The conversation between Pascal and Fermat is lost to
history, but here is a reconstruction.
Pascal. Let’s look at the first game first.
Fermat. Bon. The chance of winning is hard to compute, so
let’s work out the chance of the opposite event – losing.
Then chance of winning = 100% – chance of losing.
Pascal. D’accord. The gambler loses when none of the four
rolls shows an ace. But how do you work out the chances?
Fermat. It does look complicated. Let’s start with one roll.
What’s the chance that the first roll doesn’t show an ace?
Pascal. It has to show something from 2 through 6, so the
chance is 5/6.
Fermat. C’est ça. Now, what’s the chance that the first two
rolls don’t show aces?
Pascal. We can use the multiplication rule. The chance that
the first roll doesn’t give an ace and the second doesn’t give
an ace equals 5/6 × 5/6 = (5/6)^2. After all, the rolls are
independent, n’est-ce pas?
Fermat. What about 3 rolls?
Pascal. It looks like 5/6 × 5/6 × 5/6 = (5/6)^3.
Fermat. Oui. Now what about 4 rolls?
Pascal. Must be (5/6)^4.
Fermat. Yes, and that’s about 0.482, or 48.2%.
Pascal. So there is a 48.2% chance of losing. Now
chance of winning = 100% - chance of losing
= 100% - 48.2% = 51.8%.
Fermat. That settles the first game. The chance of winning is
a little over 50%. Now what about the second?
Pascal. Well, in one roll of a pair of dice, there is 1 chance in
36 of getting a double-ace, and 35 chances in 36 of not getting
a double-ace. By the multiplication rule, in 24 rolls of a pair of
dice the chance of getting no double-aces must be (35/36)^24.
Fermat. Eh bien, that’s about 50.9%. So we have the
chance of losing. Now
chance of winning = 100% - chance of losing
= 100% - 50.9% = 49.1%.
Pascal. Yes, and that’s a bit less than 50%. Voilà. That’s
why you win the second game a bit less frequently than the
first. But you have to roll a lot of dice to see the difference.
This example illustrates one strategy for working out
chances: if the chance of an event is hard to find, try to find
the chance of the opposite event; then subtract from 100%.
This is useful when the chance of the opposite event is
easier to compute.
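The two chances worked out in the dialogue can be recomputed with the same complement strategy. This is a sketch; the function name `chance_at_least_one` is illustrative, and it assumes independent rolls, as the dialogue does.

```python
def chance_at_least_one(p_single, rolls):
    """Chance of at least one success in independent rolls,
    computed as 1 minus the chance of no success at all."""
    return 1 - (1 - p_single) ** rolls

first_game = chance_at_least_one(1 / 6, 4)     # at least one ace in 4 rolls
second_game = chance_at_least_one(1 / 36, 24)  # at least one double-ace in 24 rolls

print(round(first_game, 3))   # 0.518
print(round(second_game, 3))  # 0.491

# De Méré's additive shortcut gives the same wrong answer for both games.
print(round(4 * (1 / 6), 3), round(24 * (1 / 36), 3))  # 0.667 0.667
```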