Chapter 04.pptx

Chapter 4: Probability
Slide set to accompany "Statistics Using Technology" by Kathryn Kozak (Slides by David H Straayer) is
licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at
http://www.tacomacc.edu/home/dstraayer/published/Statistics/Book/StatisticsUsingTechnology112314b.pdf.
Section 4.1: Empirical Probability
• Experiment: an activity that has specific results
that can occur, but it is unknown which results
will occur.
• Outcomes: the results of an experiment
• Event: a set of certain outcomes of an
experiment that you want to have happen
• Sample Space: collection of all possible outcomes
of the experiment. Usually denoted as SS.
• Event space: the set of outcomes that make up
an event. The symbol is usually a capital letter.
Trials for Die Experiment
n     | Number of 6s | Relative Frequency
10    |            2 | 0.2
50    |            6 | 0.12
100   |           18 | 0.18
500   |           81 | 0.162
1000  |          163 | 0.163
Experimental Probabilities
• P(A) = (number of times A occurs) / (number of times the experiment was repeated)
• On the prior slide, 163/1000 = 0.163
Law of large numbers
• As n increases, the relative frequency tends toward the actual probability value.
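A minimal Python simulation sketch (not part of the original slides): roll a virtual die n times and watch the relative frequency of 6s settle near one-sixth (about 0.167) as n grows, illustrating the empirical probability formula and the law of large numbers.

import random

random.seed(1)  # fixed seed so the run is reproducible
for n in [10, 50, 100, 500, 1000, 100000]:
    rolls = [random.randint(1, 6) for _ in range(n)]
    sixes = rolls.count(6)
    # relative frequency = (number of times a 6 occurs) / (number of trials)
    print(n, sixes, round(sixes / n, 3))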
Some words
Note: probability, relative frequency,
percentage, and proportion are all different
words for the same concept. Also, probabilities
can be given as percentages, decimals, or
fractions.
Section 4.2: Theoretical Probability
• It is not always feasible to conduct an experiment
over and over again, so it would be better to be
able to find the probabilities without conducting
the experiment. These probabilities are called
Theoretical Probabilities.
• To compute theoretical probabilities, there is one assumption you need to make: all of the outcomes in the sample space must be equally likely. This means that every outcome of the experiment has the same chance of happening.
Theoretical Probabilities
If the outcomes of an experiment are equally
likely, then the probability of event A happening
is:
# π‘œπ‘“ π‘œπ‘’π‘‘π‘π‘œπ‘šπ‘’π‘  𝑖𝑛 𝑒𝑣𝑒𝑛𝑑 π‘ π‘π‘Žπ‘π‘’
𝑃 𝐴 =
# π‘œπ‘“ π‘œπ‘’π‘‘π‘π‘œπ‘šπ‘’π‘  𝑖𝑛 π‘ π‘Žπ‘šπ‘π‘™π‘’ π‘ π‘π‘Žπ‘π‘’
Flip a pair of coins
• What is the sample space?
• What is the probability of getting exactly one head?
• What is the probability of getting at least one head?
• What is the probability of getting a head and a tail?
• What is the probability of getting a head or a tail?
• What is the probability of getting a foot?
• What is the probability of each outcome? What is the sum of these probabilities?
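A minimal Python sketch (not part of the original slides) that enumerates the pair-of-coins sample space and answers the counting questions above with the theoretical formula P(A) = (# outcomes in A) / (# outcomes in the sample space).

from itertools import product

sample_space = list(product("HT", repeat=2))   # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

def prob(event):
    """Theoretical probability of an event (a subset of the sample space)."""
    return len(event) / len(sample_space)

exactly_one_head = [o for o in sample_space if o.count("H") == 1]
at_least_one_head = [o for o in sample_space if "H" in o]
head_and_tail = exactly_one_head            # same outcomes: one head and one tail
head_or_tail = sample_space                 # every outcome has a head or a tail
a_foot = []                                 # impossible event

print(prob(exactly_one_head))    # 0.5
print(prob(at_least_one_head))   # 0.75
print(prob(head_and_tail))       # 0.5
print(prob(head_or_tail))        # 1.0  (the certain event)
print(prob(a_foot))              # 0.0  (the impossible event)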
Probability Properties
1. 0 ≤ P(event) ≤ 1
2. If P(event) = 1, then it will happen and is called the certain event
3. If P(event) = 0, then it cannot happen and is called the impossible event
4. Σ P(outcome) = 1: the probabilities of all of the outcomes in the sample space add up to 1
Pull a card from a 52-card deck
• What is the sample space?
• What is the probability of getting a Spade?
• What is the probability of getting a Jack?
• What is the probability of getting an Ace?
• What is the probability of not getting an Ace?
• What is the probability of getting a Spade and an Ace?
• What is the probability of getting a Spade or an Ace?
• What is the probability of getting a Jack and an Ace?
• What is the probability of getting a Jack or an Ace?
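A counting sketch in Python (not from the original slides) for several of the card questions: build the 52-outcome sample space and divide.

from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Spades", "Hearts", "Diamonds", "Clubs"]
deck = list(product(ranks, suits))          # the 52 equally likely outcomes

def prob(event):
    return len(event) / len(deck)

spades = [c for c in deck if c[1] == "Spades"]
jacks = [c for c in deck if c[0] == "J"]
aces = [c for c in deck if c[0] == "A"]
spade_and_ace = [c for c in deck if c[0] == "A" and c[1] == "Spades"]
spade_or_ace = [c for c in deck if c[0] == "A" or c[1] == "Spades"]

print(prob(spades))         # 13/52 = 0.25
print(prob(jacks))          # 4/52  ≈ 0.077
print(prob(aces))           # 4/52  ≈ 0.077
print(1 - prob(aces))       # P(not an Ace) by the complement rule ≈ 0.923
print(prob(spade_and_ace))  # 1/52  ≈ 0.019
print(prob(spade_or_ace))   # 16/52 ≈ 0.308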
Complementary events
• If A is an event, the complementary event can be notated Aᶜ, not A, ~A, or something similar.
• P(A) + P(not A) = 1
• P(not A) = 1 – P(A)
• This is more computationally useful than it
may seem at first.
• Sometimes it’s a lot easier to calculate
P(not A) than P(A) (see: shared birthdays)
Shared Birthdays
• What is the probability that two students in
this class share a birthday?
• There are a lot of ways this can happen!
• But there is only one way of it not happening,
that is if everybody has a different birthday.
• That is a bit beyond where we are now, but
turns out to be a lot easier to calculate.
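A short Python sketch (not from the original slides), assuming 365 equally likely birthdays and ignoring leap years: compute P(everybody has a different birthday) by multiplying, then use the complement rule.

def p_shared_birthday(n_students):
    # probability that all n_students have different birthdays
    p_all_different = 1.0
    for i in range(n_students):
        p_all_different *= (365 - i) / 365
    # complement rule: P(at least one shared birthday) = 1 - P(all different)
    return 1 - p_all_different

for n in (10, 23, 30, 50):
    print(n, round(p_shared_birthday(n), 3))
# a class of 23 students already gives a probability just over 0.5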
Two critical distinctions
• Mutual exclusivity
• Independence
• For each distinction, remember the definition and a canonical example of each possibility.
Mutual Exclusivity
• Two events are mutually exclusive if they can’t
happen at the same time.
• Canonical examples: (rolling a pair of dice)
– Exclusive: Rolling a pair and rolling a 7 – there is
no roll of two dice that totals 7 and has the same
number on each die.
– Not Exclusive: Rolling a pair and rolling an 8 – the roll of 4, 4 is a pair and also totals 8.
Independence
• Two events are independent if the fact that one
happens does not alter the probability of the second
happening.
• Canonical examples: (Drawing cards from a 52-card deck. Event A is “the first card drawn is a Queen”, and Event B is “the second card drawn is a Queen”.)
– Independent: draw a card, note whether it is a Queen or
not, put it back in the deck, re-shuffle, and draw a second
card.
– Not independent: draw two cards out of the deck. The
probability of the second card being a Queen changes
depending on whether the first card was a Queen.
Addition Rules
• If two events A and B are mutually exclusive, then P(A or B) = P(A) + P(B) and P(A and B) = 0
• If two events A and B are not mutually
exclusive, then
P(A or B) = P(A) + P(B) - P(A and B)
Two Dice Sample Space
(image: the 36 equally likely outcomes of rolling two dice)
52 cards
(image: the 52 cards of a standard deck)
Roll a pair of dice
a) What is the sample space?
b) What is the probability of getting a sum of 5?
c) What is the probability of getting the first die a 2?
d) What is the probability of getting a sum of 7?
e) What is the probability of getting a sum of 5 and the first die a 2?
f) What is the probability of getting a sum of 5 or the first die a 2?
g) What is the probability of getting a sum of 5 and a sum of 7?
h) What is the probability of getting a sum of 5 or a sum of 7?
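A Python sketch (not part of the original slides) that lists all 36 equally likely outcomes and answers these questions by counting, including the two addition-rule cases.

from itertools import product

sample_space = list(product(range(1, 7), repeat=2))   # 36 ordered pairs (die1, die2)

def prob(event):
    return len(event) / len(sample_space)

sum_5 = [d for d in sample_space if sum(d) == 5]
first_is_2 = [d for d in sample_space if d[0] == 2]
sum_7 = [d for d in sample_space if sum(d) == 7]
sum_5_and_first_2 = [d for d in sample_space if sum(d) == 5 and d[0] == 2]

print(prob(sum_5))              # 4/36
print(prob(first_is_2))         # 6/36
print(prob(sum_7))              # 6/36
print(prob(sum_5_and_first_2))  # 1/36
# Not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B)
print(prob(sum_5) + prob(first_is_2) - prob(sum_5_and_first_2))   # 9/36
# Mutually exclusive: P(sum of 5 and sum of 7) = 0, so
print(prob(sum_5) + prob(sum_7))                                  # 10/36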
Odds
Section 4.3: Conditional Probability
• Probabilities calculated after information is
given. This is where you want to find the
probability of event A happening after you
know that event B has happened. If you know
that B has happened, then you don’t need to
consider the rest of the sample space. You
only need the outcomes that make up event
B. Event B becomes the new sample space,
which is called the restricted sample space, R.
Restricted sample space
• If you always write a restricted sample space
when doing conditional probabilities and use
this as your sample space, you will have no
trouble with conditional probabilities. The
notation for conditional probabilities is
P(A, given B) = P(A|B). The event following
the vertical line is always the restricted sample
space.
New Information
• One way of looking at this conditional
probability issue is “How does this new
information cause me to revise my estimate of
likelihood (probability)?”
• There is a whole branch of Statistics known as
Bayesian Analysis that deals with this.
• And some genuinely weird history.
Suppose you roll two dice. What is the probability
of getting a sum of 5, given that the first die is a 2?
Solution:
Since you know that the first die is a 2, then this is
your restricted sample space, so
R = {(2,1), (2,2), (2,3), (2,4), (2,5), (2,6)}
Out of this restricted sample space, the way to get a
sum of 5 is {(2,3)}. Thus
P(sum of 5 | the first die is a 2) = 1/6
Probability of a 5?
• When we considered all 36 possible outcomes, 4 of them – (1,4), (2,3), (3,2), (4,1) – had a total of 5. Without any knowledge of the first die, the probability of getting a sum of 5 is 4/36 = 1/9, about 11.1%.
• But when we know the first die was a 2, the probability changes to 1/6, roughly 16.7%.
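A small Python sketch (not from the original slides) of the restricted-sample-space idea: keep only the outcomes where the given event happened, then count the event of interest inside that restricted space.

from itertools import product

sample_space = list(product(range(1, 7), repeat=2))

restricted = [d for d in sample_space if d[0] == 2]          # B: first die is a 2
sum_5_in_r = [d for d in restricted if sum(d) == 5]          # A within R: only (2,3)

print(len(restricted))                      # 6 outcomes in R
print(len(sum_5_in_r) / len(restricted))    # P(sum of 5 | first die is a 2) = 1/6
# Compare with the unconditional probability of a sum of 5:
print(sum(1 for d in sample_space if sum(d) == 5) / len(sample_space))   # 4/36 ≈ 0.111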
Suppose you roll two dice. What is the probability
of getting a sum of 7, given the first die is a 4?
Solution:
Since you know that the first die is a 4, this is your
restricted sample space, so
R = {(4,1), (4,2), (4,3), (4,4), (4,5), (4,6)}
Out of this restricted sample space, the way to get a
sum of 7 is {(4,3)}.
Thus P(sum of 7 | first die is a 4) = 1/6
Probability of a 7?
• When we looked at all 36 outcomes, 6 of them summed to 7, so we got the same probability of getting a 7: 6/36 = 1/6
• This means that knowing that the first die is a
4 did not change the probability that the sum
is a 7. This added knowledge did not help you
in any way. It is as if that information was not
given at all.
Dependent and Independent Events
• In the second case, the events sum of 7 and
first die is a 4 are called independent events.
• In the first case, the events sum of 5 and first
die is a 2 are called dependent events.
• Events A and B are considered independent
events if the fact that one event happens does
not change the probability of the other event
happening.
A and B are independent if
• P(A|B) = P(A)
or
• P(B|A) = P(B)
a) Suppose you roll two dice. Are the events “sum of 7”
and “first die is a 3” independent?
b) Suppose you roll two dice. Are the events “sum of 6”
and “first die is a 4” independent?
c) Suppose you pick a card from a deck. Are the events
“Jack” and “Spade” independent?
d) Suppose you pick a card from a deck. Are the events
“Heart” and “Red” card independent?
e) Suppose you have two children via separate births.
Are the events “the first is a boy” and “the second is a
girl” independent?
f) Suppose you flip a coin 50 times and get a head every
time, what is the probability of getting a head on the
next flip?
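A Python sketch (not from the original slides) that checks parts (a) and (b) above numerically by comparing P(A|B) with P(A).

from itertools import product

sample_space = list(product(range(1, 7), repeat=2))

def prob(pred):
    return sum(1 for d in sample_space if pred(d)) / len(sample_space)

def cond_prob(pred_a, pred_b):
    restricted = [d for d in sample_space if pred_b(d)]
    return sum(1 for d in restricted if pred_a(d)) / len(restricted)

# (a) "sum of 7" and "first die is a 3"
print(prob(lambda d: sum(d) == 7),
      cond_prob(lambda d: sum(d) == 7, lambda d: d[0] == 3))
# both 1/6 -> independent

# (b) "sum of 6" and "first die is a 4"
print(prob(lambda d: sum(d) == 6),
      cond_prob(lambda d: sum(d) == 6, lambda d: d[0] == 4))
# 5/36 ≈ 0.139 versus 1/6 ≈ 0.167 -> not independent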
Multiplication Rule
• If two events are dependent, then:
P(A and B) = P(A)*P(B|A)
• If two events are independent, then:
P(A and B) = P(A)*P(B)
• Solving for the conditional probability: P(B|A) = P(A and B) / P(A)
• It is often easier to find a conditional probability
by using the restricted sample space and
counting unless the sample space is large.
Multiplication rule examples
a) Suppose you pick three cards from a deck,
what is the probability that they are all
Queens if the cards are not replaced after
they are picked?
b) Suppose you pick three cards from a deck,
what is the probability that they are all
Queens if the cards are replaced after they
are picked and before the next card is
picked?
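A quick Python sketch (not from the original slides) of both answers, using the dependent and independent forms of the multiplication rule.

# (a) Without replacement the draws are dependent, so use P(B|A)-style factors.
p_without = (4 / 52) * (3 / 51) * (2 / 50)
# (b) With replacement each draw is independent, so multiply identical factors.
p_with = (4 / 52) ** 3

print(round(p_without, 6))   # ≈ 0.000181
print(round(p_with, 6))      # ≈ 0.000455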
Two-Way Table: Leprosy Cases
Number of cases by WHO Region and World Bank Income Group:

WHO Region            | High Income | Upper Middle Income | Lower Middle Income | Low Income | Row Total
Americas              |         174 |               36028 |                 615 |          0 |     36817
Eastern Mediterranean |          54 |                   6 |                1883 |        604 |      2547
Europe                |          10 |                   0 |                   0 |          0 |        10
Western Pacific       |          26 |                 216 |                3689 |       1155 |      5086
Africa                |           0 |                  39 |                1986 |      15928 |     17953
South-East Asia       |           0 |                   0 |              149896 |      10236 |    160132
Column Total          |         264 |               36289 |              158069 |      27923 |    222545
Problems from this table
a) Find the probability that a person with leprosy is from the Americas.
b) Find the probability that a person with leprosy is from a high-income country.
c) Find the probability that a person with leprosy is from the Americas and a high-income country.
d) Find the probability that a person with leprosy is from a high-income country, given they are from the Americas.
e) Find the probability that a person with leprosy is from a low-income country.
f) Find the probability that a person with leprosy is from Africa.
g) Find the probability that a person with leprosy is from Africa and a low-income country.
h) Find the probability that a person with leprosy is from Africa, given they are from a low-income country.
i) Are the events that a person with leprosy is from “Africa” and “low-income country” independent events? Why or why not?
j) Are the events that a person with leprosy is from “Americas” and “high-income country” independent events? Why or why not?
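A Python sketch (not from the original slides) for a few of these, dividing the relevant cell, row, and column totals by the grand total of 222,545 cases.

total = 222545
americas_row = 36817
africa_row = 17953
high_income_col = 264
low_income_col = 27923
americas_and_high = 174    # cell: Americas, High Income
africa_and_low = 15928     # cell: Africa, Low Income

print(americas_row / total)               # (a) ≈ 0.165
print(high_income_col / total)            # (b) ≈ 0.0012
print(americas_and_high / total)          # (c) ≈ 0.0008
print(americas_and_high / americas_row)   # (d) P(high income | Americas) ≈ 0.0047
print(africa_and_low / low_income_col)    # (h) P(Africa | low income) ≈ 0.570
# (j): P(high income | Americas) ≈ 0.0047 is not equal to P(high income) ≈ 0.0012,
# so the events are not independent.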
Bayes Theorem
Supplement
The Theorem
• Begin with:
P(A & B) = P(A)*P(B|A) – this is the “and” rule
• Goal: calculate P(H|E)
this is the new probability of a Hypothesis H,
given new evidence E.
• Begin by re-naming:
P(E & H) = P(E)*P(H|E)
Proof, continued
• But note “&” is symmetric:
P(E & H) = P(H & E)
• P(E)*P(H|E) = P(H)*P(E|H)
• Solve for our “target” P(H|E)
• P(H|E) = P(H)*P(E|H) / P(E)
• Hmm… but what is P(E)?
• P(E) = P(E & H) + P(E & ~H)
• P(H|E) = P(H)*P(E|H) / (P(E & H) + P(E & ~H))
• P(H|E) = P(H)*P(E|H) / (P(H)*P(E|H) + P(~H)*P(E|~H))
• But P(~H) = 1 – P(H) (complement rule)
• So P(H|E) = P(H)*P(E|H) / (P(H)*P(E|H) + (1 – P(H))*P(E|~H))
• P(H|E) can be calculated from three terms
The three terms of Bayes
• P(H) is called the Prior Probability of H, or just the
Prior. Sometimes called the base rate.
This is the best estimate of the probability of the
hypothesis before considering the new evidence
E.
• P(E|H) is the probability of getting the evidence
we got under the assumption that H is true.
• P(E | ~H) is the probability of getting the
evidence we got under the assumption that H is
false. Note that we can use the complement rule
to say P(E | ~H) = 1 – P(~E | ~H).
Bayes as a tool to avoid Fallacies
• Confirmation bias: only focusing on P(E|H)
(“See, the evidence is consistent with my
theory!”) and ignoring P(E | ~H). (Ignoring
that there could be other explanations for the
evidence, even if the theory is not correct.)
• Base rate neglect: Forgetting to factor in the base rate. (“The test is 99% accurate, you have a positive result, hence there is a 99% chance you have the disease.”)
Apply to medical diagnosis
• Sensitivity: Assume the test has a 99% true positive rate, which means that if the patient has the disease, the test will be positive 99% of the time. This is P(E | H). Equivalently, the false negative rate is 1% (P(~E | H) = 1%).
• Specificity: Assume the test has a 98% true negative rate, which means that if the patient does not have the disease, the test will be negative 98% of the time. This is P(~E | ~H). Equivalently, the false positive rate is 2% (P(E | ~H) = 2%).
• Base Rate: Assume that 1/1000 of people in the
population have the disease. This is the prior: P(H)
Bayes “Do I have it?”
• P(H|E) = P(H)*P(E|H) / (P(H)*P(E|H) + (1 – P(H))*P(E|~H))
• Or, using the complement rule on the specificity,
• P(H|E) = P(H)*P(E|H) / (P(H)*P(E|H) + (1 – P(H))*(1 – P(~E|~H)))
• = 0.001*0.99 / (0.001*0.99 + (1 – 0.001)*(1 – 0.98))
• ≈ 4.7%
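A direct Python translation (not from the original slides) of this calculation.

p_h = 0.001                  # prior / base rate: P(H), 1 in 1000 has the disease
p_e_given_h = 0.99           # sensitivity: P(E | H)
p_e_given_not_h = 1 - 0.98   # false positive rate: P(E | ~H) = 1 - specificity

posterior = (p_h * p_e_given_h) / (
    p_h * p_e_given_h + (1 - p_h) * p_e_given_not_h
)
print(round(posterior, 3))   # ≈ 0.047, about 4.7%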
“Fake population” table approach
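A sketch of the same 4.7% answer using a hypothetical “fake population” of 100,000 people (the population size is an illustrative assumption): turn each rate into a whole-number count and compare counts instead of multiplying probabilities.

population = 100_000
sick = population * 0.001                  # 100 people have the disease
healthy = population - sick                # 99,900 do not

true_positives = sick * 0.99               # 99 sick people test positive
false_positives = healthy * (1 - 0.98)     # 1,998 healthy people test positive

# fraction of positive tests that belong to sick people
print(true_positives / (true_positives + false_positives))   # ≈ 0.047, the same 4.7%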
History and Bayes
• For a long time, there was a great controversy between “Bayesians” and “Frequentists” among statisticians.
• Lately, a consensus seems to be developing
that both approaches are just different ways
of viewing the world, and are in fact fully
compatible.
Summary of logic to probability rules
• NOT: P(not A) = 1 – P(A)
• OR: P(A or B) = P(A) + P(B) (if mutually exclusive, i.e., P(A and B) = 0)
      P(A or B) = P(A) + P(B) – P(A and B) (in general)
• AND: P(A and B) = P(A) * P(B) (if independent)
       P(A and B) = P(A) * P(B | A) (in general, when the probability of B may depend on whether A occurred)