Probability Theory

It is part of probability that many improbable things will happen.
-Aristotle
Likelihood, Chance, and Probability
• Up to now, we have focused on data. Maybe a bit on experimental
design but mostly data.
• We are about to take a major pivot away from data and start getting
into the idea of chance and calculating probabilities.
• Consider the sample you gather from a population. What is the
probability that your estimate, based on the collected sample, reflects
the population parameter?
• This question is of utmost importance and, hence, justifies delving
into the field of probability theory.
• In this module, we will be introducing some more terminology and
techniques for calculating theoretical probabilities.
Probability Crash Course Part 1
Probability Crash Course Part 2
Randomness
• Probability is used to quantify uncertainty.
• Another word that leads into the discussion of probability is randomness, the
idea of variability in outcomes (think of variance in data when thinking
about variability in outcomes).
• A probability (or random) experiment (or trial) is any process with a result
determined by chance.
• Every possible result from this probability experiment is called an outcome.
• The sample space, denoted 𝑆, is the set of all outcomes.
• An event is a subset of outcomes from the sample space.
• Identifying Outcomes in a Sample Space or Event 1
• Identifying Outcomes in a Sample Space or Event 2
• A compound event is an event that is defined by combining two or more
events.
• Using a Pattern to List All Outcomes in a Sample Space or Event
• When probability experiments have several stages, a tree diagram
can be used to organize the several stages and their outcomes in a
systematic way.
• Suppose you are watching a soccer game and your friend asks if Team
A will win. Without doing any experiment or anything, you answer
that Team A has about a 75% chance of winning. You arrived at this
value just by your knowledge of soccer, Team A, and their opponent.
In essence, this is an educated guess. We call this subjective
probability.
• Clearly, there is nothing scientific about subjective probability.
However, sometimes, this is all you have. If an expert weighs in with a
subjective probability, sometimes you just have to go with that until
something better comes along.
• Once you have performed an experiment and have actual numbers,
you obtain an empirical probability. This is also called an
experimental probability.
• In experimental probability, if 𝐸 is an event, then 𝑃(𝐸), read “the
probability that 𝐸 occurs,” is calculated as
𝑃(𝐸) = 𝑓/𝑛
where 𝑓 is the frequency of event 𝐸 and 𝑛 is the total number of times
the experiment is performed.
• Suppose the experiment is flipping a fair coin. Suppose you flip the
coin 20 times, that is 20 trials or 𝑛 = 20. There are two outcomes:
heads or tails. Intuitively, you know that the probability of heads and
tails is both 50% (a coin flip). However, try for yourself. Flip a coin 20
times, do you get 10 heads and 10 tails? That is what you would
expect if there is a 50% chance of each, right?
• Almost surely, you will not arrive at that outcome experimentally. For
instance, you could end up with 15 heads and 5 tails. Does that mean
there is something wrong with the coin? Maybe. However, this is
possible even with a fair coin.
• Ok, so the experimental probability of a coin flip in our experiment is
75% for heads and 25% for tails.
• The “true” probability of a coin flip is 50% heads and 50% tails. We call this “true”
probability the theoretical probability or classical probability.
• In theoretical probability, if all outcomes are equally likely to occur, 𝑃(𝐸), read “the
probability that 𝐸 occurs,” is given by
𝑃(𝐸) = 𝑛(𝐸)/𝑛(𝑆)
where 𝑛(𝐸) is the number of outcomes in the event and 𝑛(𝑆) is the number of outcomes
in the sample space.
• Using the definition of theoretical probability, the event “heads” would be the only
outcome so 𝑛(𝐸) = 1 and the sample space would contain “heads” and “tails” so
𝑛(𝑆) = 2. Hence, the probability of heads is 50%. Perfect!
• Calculation Using Theoretical Probability 1
• Calculation Using Theoretical Probability 2
• Calculation Using Theoretical Probability 3
• Calculation Using Theoretical Probability 4
• Calculation Using Theoretical Probability 5 (drawing from a deck of 52 cards)
• Summary of the Introduction to Probability 1
• Summary of the Introduction to Probability 2
• Why the discrepancy between the experimental and theoretical
probability? The Law of Large Numbers (LLN) provides the answer!
The LLN states that as the number of trials increases, the
experimental probability gets closer and closer to the theoretical probability.
• This is a powerful result!!
• This means our experiments (or our sample sizes) must be large
enough to give a meaningful estimate of the theoretical probability
(or of the population).
• We will not delve too far into the weeds about the LLN. There are two
versions: Weak Law of Large Numbers and Strong Law of Large
Numbers.
• As you progress on your statistics journey, expect to hear more about
the LLN.
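The LLN can be seen directly by simulation. Below is a minimal Python sketch (the function name and the fixed seed are illustrative, not from the slides) that estimates the empirical probability of heads for an increasing number of flips:

```python
import random

def empirical_heads_prob(n_flips, seed=0):
    """Flip a fair coin n_flips times and return the fraction of heads."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# As the number of trials grows, the empirical probability
# approaches the theoretical value of 0.5.
for n in (20, 200, 20_000):
    print(n, empirical_heads_prob(n))
```

With only 20 flips the estimate can stray far from 0.5, but with 20,000 flips it lands very close to it, which is exactly what the LLN promises.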
Properties of Probability
• For any event, 𝐸, 0 ≤ 𝑃(𝐸) ≤ 1. The closer to 1, the more likely the
event will happen and the closer to 0, the more likely the event will
not happen.
• For any sample space, 𝑆, 𝑃(𝑆) = 1.
• For the empty set, ∅, 𝑃(∅) = 0.
• Note: When the probability of something happening is 0, that means
it cannot happen. When the probability of something happening is 1,
that means it must happen.
• The union of the events 𝐸 and 𝐹, denoted 𝐸 ∪ 𝐹 and read 𝐸 union 𝐹,
is the set of all outcomes that are included in event 𝐸 or event 𝐹 or
both.
• The intersection of the events 𝐸 and 𝐹, denoted 𝐸 ∩ 𝐹 and read 𝐸
intersect 𝐹, is the set of all outcomes that are included in both 𝐸 and
𝐹.
• The complement of an event 𝐸, denoted 𝐸ᶜ, is the set of all
outcomes in the sample space that are not in 𝐸.
• If you combine the outcomes of events 𝐸 and 𝐸ᶜ you get 𝑆, the set of
all outcomes. Mathematically, this is
𝐸 ∪ 𝐸ᶜ = 𝑆.
• Since 𝑃(𝑆) = 1, we get the following:
𝑃(𝑆) = 𝑃(𝐸 ∪ 𝐸ᶜ) = 𝑃(𝐸) + 𝑃(𝐸ᶜ) = 1.
• This result is called the complement rule for probability. Written another way, this is
𝑃(𝐸) = 1 − 𝑃(𝐸ᶜ).
• Calculation Using the Complement Rule 1
• Calculation Using the Complement Rule 2
• Calculation Using the Complement Rule 3
• Calculation Using the Complement Rule 4
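As a quick illustration of the complement rule, here is a minimal Python sketch (the helper name and the two-coin example are illustrative, not from the slides):

```python
def complement_prob(p_complement):
    """Complement rule: P(E) = 1 - P(E^c)."""
    if not 0 <= p_complement <= 1:
        raise ValueError("probabilities must lie in [0, 1]")
    return 1 - p_complement

# P(at least one head in two fair coin flips) = 1 - P(no heads),
# where P(no heads) = 1/4 by the theoretical definition.
p_no_heads = 0.25
print(complement_prob(p_no_heads))  # 0.75
```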
• This result takes advantage of the addition rule for probability. The rule is as follows: for
two events, 𝐸 and 𝐹, the probability that 𝐸 or 𝐹 occurs is given by the following formula:
𝑃(𝐸 or 𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 and 𝐹)
• Using set notation, this becomes:
𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹)
• Calculation Using the Addition Rule 1
• Calculation Using the Addition Rule 2
• Calculation Using the Addition Rule 3
• Calculation of Probability Using the Addition Rule
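The addition rule can be sketched in a few lines of Python (the helper name is illustrative; the numbers follow the standard 52-card deck mentioned above):

```python
def prob_union(p_e, p_f, p_both):
    """Addition rule: P(E or F) = P(E) + P(F) - P(E and F)."""
    return p_e + p_f - p_both

# Drawing one card from a standard 52-card deck:
# P(heart) = 13/52, P(king) = 4/52, P(king of hearts) = 1/52
print(prob_union(13/52, 4/52, 1/52))  # 16/52, about 0.3077
```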
Inclusion-Exclusion Principle
• The “−𝑃(𝐸 and 𝐹)” or “−𝑃(𝐸 ∩ 𝐹)” part of the formula is to ensure there
is no double counting of outcomes that events 𝐸 and 𝐹 have in common.
This counting technique is known as the inclusion-exclusion principle from
combinatorics.
• This can actually be extended to 3 events (in fact, this can be extended to 𝑛
events): 𝐸, 𝐹, and 𝐺. The addition rule here would be as follows:
𝑃(𝐸 ∪ 𝐹 ∪ 𝐺) = 𝑃(𝐸) + 𝑃(𝐹) + 𝑃(𝐺) − 𝑃(𝐸 ∩ 𝐹) − 𝑃(𝐹 ∩ 𝐺) − 𝑃(𝐸 ∩ 𝐺) + 𝑃(𝐸 ∩ 𝐹 ∩ 𝐺)
• Note that we subtract the intersections of each pairing of the 3 events but
then add back the triple intersection, since subtracting the pairwise
intersections removes it one time too many.
• Here is a Venn diagram illustrating the inclusion-exclusion principle in
action for three events: 𝐴, 𝐵, and 𝐶.
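The three-event identity can be verified by direct counting on a small sample space. This Python sketch (the die-roll events are made up for illustration) compares both sides:

```python
# Verify inclusion-exclusion for three events by direct counting.
# Sample space: one roll of a fair six-sided die (each outcome has prob 1/6).
S = {1, 2, 3, 4, 5, 6}
E, F, G = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

def P(event):
    """Theoretical probability: n(E) / n(S) for equally likely outcomes."""
    return len(event) / len(S)

lhs = P(E | F | G)
rhs = (P(E) + P(F) + P(G)
       - P(E & F) - P(F & G) - P(E & G)
       + P(E & F & G))
print(lhs, rhs)  # both equal 5/6
```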
Mutually Exclusive Events
• If events 𝐸 and 𝐹 have no outcomes in common, then events 𝐸 and 𝐹 are
said to be mutually exclusive. In this case, the addition rule for probability
of mutually exclusive events is
𝑃 𝐸 π‘œπ‘Ÿ 𝐹 = 𝑃 𝐸 + 𝑃 𝐹
• Using set notation, this becomes:
𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹)
• Mutual exclusivity is also called disjointness.
• Since events 𝐸 and 𝐸ᶜ are mutually exclusive by definition, this is how we
arrived at the result 𝑃(𝐸) + 𝑃(𝐸ᶜ) = 1.
• Calculation of Probability of Mutually Exclusive Events
• The addition rules for probability are applicable to single trials with
multiple outcomes. For instance, flipping a coin is a single trial with 2
outcomes. Rolling a pair of dice is a single trial with 36 outcomes (we will
learn how to calculate the number of outcomes later, so do not linger over
where the 36 comes from).
• Summary of Addition Rules for Probability 1
• Summary of Addition Rules for Probability 2
• If you have followed the proliferation of sports gambling, you may
have heard the term odds used to describe the likelihood of an
event occurring. There are two types of odds:
• The odds in favor of an event 𝐸 occurring are given by
𝑃(𝐸)/𝑃(not 𝐸) = 𝑃(𝐸)/𝑃(𝐸ᶜ)
• The odds against an event 𝐸 occurring are given by
𝑃(not 𝐸)/𝑃(𝐸) = 𝑃(𝐸ᶜ)/𝑃(𝐸)
• Calculation Using Complement Rule and Calculating Odds
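The two odds formulas translate directly into code. This Python sketch (helper names illustrative) reuses the 75% soccer estimate from earlier:

```python
def odds_in_favor(p):
    """Odds in favor of E: P(E) / P(E^c)."""
    return p / (1 - p)

def odds_against(p):
    """Odds against E: P(E^c) / P(E)."""
    return (1 - p) / p

# If Team A has a 75% chance of winning (the subjective estimate above):
print(odds_in_favor(0.75))  # 3.0, i.e. 3-to-1 in favor
print(odds_against(0.75))   # about 0.333, i.e. 1-to-3 against
```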
• What can be used for multistage experiments though? A multistage
experiment is where multiple events are possible over multiple trials.
For instance, if I flip a coin twice, this experiment has two events over
the two trials.
• For these instances, we use multiplication instead of addition. The key
difference here is the use of the words or and and. When considering
a single trial, or is the interesting word because only a single outcome
can occur. However, when multiple events can occur, and becomes
the interesting word since we want to know what can happen across
multiple trials.
• Suppose we are playing the lottery and the winner is determined by
drawing balls with the numbers 0–100 written on them. If the lottery
requires one to select 6 numbers in the correct order, this experiment
will have 6 events: the first draw, the second draw, etc.
• If the ball drawn is put back in the pool to be potentially drawn again,
this experiment is said to be done with replacement. That means the
first and subsequent draws will be independent.
• If two events are independent, this means one event occurring does
not affect the probability of the other event occurring. If the balls are
drawn with replacement, each draw is done with the same pool.
• Otherwise, if the experiment was done without replacement, each
draw would be done with a pool with one less ball altering the
probability. Hence, each subsequent draw would be dependent on
the previous draw.
• If 𝐸 and 𝐹 are dependent events, calculating the probability that both occur is a
bit different. That is because the first trial influences the probability of the second
and subsequent trials. Hence, we need the concept of conditional probability.
• Conditional probability, denoted 𝑃(𝐹|𝐸) and read “the probability of 𝐹 given 𝐸,”
is the probability of event 𝐹 occurring given that event 𝐸 occurs first.
• The multiplication rule for probability of dependent events is as follows: for two
dependent events, 𝐸 and 𝐹, the probability that 𝐸 and 𝐹 occur is given by the
following formula
𝑃(𝐹|𝐸) = 𝑃(𝐹 ∩ 𝐸)/𝑃(𝐸)
or
𝑃(𝐸 ∩ 𝐹) = 𝑃(𝐸) ∙ 𝑃(𝐹|𝐸) = 𝑃(𝐹) ∙ 𝑃(𝐸|𝐹).
• Calculation Using Conditional Probability 1
• Calculation Using Conditional Probability 2
• Calculation of Probability for Dependent Events 1
• Calculation of Probability for Dependent Events 2
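The conditional probability formula can be sketched in Python as follows (the two-ace example is illustrative, not from the slides):

```python
def conditional_prob(p_f_and_e, p_e):
    """Conditional probability: P(F | E) = P(F and E) / P(E)."""
    if p_e == 0:
        raise ValueError("P(E) must be positive to condition on E")
    return p_f_and_e / p_e

# Drawing two cards without replacement from a 52-card deck:
# P(first is an ace) = 4/52, P(both are aces) = (4/52) * (3/51)
p_both_aces = (4/52) * (3/51)
print(conditional_prob(p_both_aces, 4/52))  # 3/51, about 0.0588
```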
Independence
• Statistical independence is one of, if not the, most important concept
in all of statistical analysis.
• Events are said to be independent if knowledge of one event does not
provide information about another. Put another way, the occurrence
of one event does not influence the occurrence of the other event(s).
• For instance, when randomly sampling, you want each draw to be
independent of the next draw. That way, how a sample is drawn will
have no effect on the results.
• Events 𝐸 and 𝐹 are independent if and only if 𝑃(𝐸|𝐹) = 𝑃(𝐸) and
𝑃(𝐹|𝐸) = 𝑃(𝐹).
• The following two formulas are called the multiplication rule for
independent events. These are sometimes called the product rule.
• If two events, 𝐸 and 𝐹, are independent, then
𝑃(𝐸 ∩ 𝐹) = 𝑃(𝐸) ⋅ 𝑃(𝐹).
• If 𝑛 events, 𝐸1, 𝐸2, ⋯, 𝐸𝑛, are independent, then
𝑃(𝐸1 ∩ 𝐸2 ∩ ⋯ ∩ 𝐸𝑛) = 𝑃(𝐸1) ⋅ 𝑃(𝐸2) ⋅ ⋯ ⋅ 𝑃(𝐸𝑛).
• Example Determining if Two Events are Independent
• Example Using a Tree Diagram and Product Rule
• Example Using the Product Rule
• Example Using Sampling with Replacement and Product Rule
• Real Life Application of the Product Rule
• Example Using the Complement Rule and the Product Rule
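The product rule for 𝑛 independent events is a one-line reduction. A minimal Python sketch (helper name illustrative):

```python
from functools import reduce

def prob_all_independent(probs):
    """Product rule: P(E1 and ... and En) = P(E1) * ... * P(En)
    for independent events."""
    return reduce(lambda a, b: a * b, probs, 1.0)

# Probability of three heads in three independent fair coin flips:
print(prob_all_independent([0.5, 0.5, 0.5]))  # 0.125
```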
Joint, Marginal, and Conditional Probability
Mutual Exclusivity and Independence ARE
NOT THE SAME THING!!!
Complement of Union and Intersection
• Using set operations, we can calculate the following probabilities and
mathematical manipulations thereof:
𝑃(𝐸ᶜ ∩ 𝐹ᶜ) = 𝑃((𝐸 ∪ 𝐹)ᶜ) = 1 − 𝑃(𝐸 ∪ 𝐹)
𝑃(𝐸ᶜ ∪ 𝐹ᶜ) = 𝑃((𝐸 ∩ 𝐹)ᶜ) = 1 − 𝑃(𝐸 ∩ 𝐹)
Counting (for probability)
• When it comes to multistage experiments or experiments with several trials, it is
paramount that you keep count of the number of outcomes at each stage or trial of the
experiment.
• One method for counting all outcomes in a multistage experiment is using the
Fundamental Counting Principle which states that for a multistage experiment with 𝑛
stages (or trials) where the first stage has 𝑘1 outcomes, the second stage has 𝑘2
outcomes, the third stage has 𝑘3 outcomes, and so forth, the total number of possible
outcomes for the sequence of stages that make up the multistage experiment is
𝑘1 𝑘2 𝑘3 ⋯ 𝑘𝑛.
• Calculation Using the Fundamental Counting Principle 1
• Calculation Using the Fundamental Counting Principle 2
• Calculation Using the Fundamental Counting Principle 3
• Calculation Using the Fundamental Counting Principle 4
• Calculation Using the Fundamental Counting Principle (without replacement)
• Calculation of Probability Using the Fundamental Counting Principle
• Summary of Multiplication Rules for Probability 1
• Summary of Multiplication Rules for Probability 2
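The Fundamental Counting Principle is just a product of the per-stage outcome counts, as this Python sketch shows (the dice example reuses the 36 outcomes mentioned earlier):

```python
from math import prod

def total_outcomes(stage_counts):
    """Fundamental Counting Principle: multiply the number of
    outcomes at each stage of the experiment."""
    return prod(stage_counts)

# Rolling a pair of dice: two stages with 6 outcomes each.
print(total_outcomes([6, 6]))  # 36
# Flipping a coin twice: two stages with 2 outcomes each.
print(total_outcomes([2, 2]))  # 4
```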
• A factorial is the product of all positive integers less than or equal to a given
positive integer, 𝑛, given by
𝑛! = 𝑛(𝑛 − 1)(𝑛 − 2) ⋯ (2)(1)
where 𝑛 is a positive integer.
• Note that, by definition, 0! = 1.
• A combination is a selection of objects from a group without regard to their
arrangement (or order). For instance, if I hold a raffle and the first 10 people win
the same prize, the order is irrelevant; I just want to know who the 10 people are.
Hence, when order is not important, the following formula is used to calculate the
number of combinations:
𝐶(𝑛, 𝑟) = 𝑛!/(𝑟! (𝑛 − 𝑟)!).
• Calculating Probability Using Combinations 1
• Calculating Probability Using Combinations 2
• Calculating Probability Using Combinations and the Fundamental Counting
Principle 1
• Calculating Probability Using Combinations and the Fundamental Counting
Principle 2
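The combination formula can be implemented directly from the factorial definition and cross-checked against Python's built-in math.comb (the raffle numbers are illustrative):

```python
from math import comb, factorial

def combinations(n, r):
    """C(n, r) = n! / (r! (n - r)!) -- order does not matter."""
    return factorial(n) // (factorial(r) * factorial(n - r))

# Choosing 10 raffle winners (all receiving the same prize) from 50 entrants:
print(combinations(50, 10))
# The standard library agrees with the factorial definition:
assert combinations(50, 10) == comb(50, 10)
```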
• A permutation is a selection of objects from a group where the
arrangement (or order) is specific. For instance, for the raffle example, the
first person drawn gets a better prize than the second person, the order
matters! I need to know the 10 people in the correct order to dole out the
prizes correctly. Hence, when order is important, the following formula is
used to calculate the number of permutations:
𝑃(𝑛, π‘Ÿ) =
𝑛!
.
𝑛−π‘Ÿ !
• In both formulas, π‘Ÿ objects are selected from a group of 𝑛 distinct objects,
so π‘Ÿ and 𝑛 are both positive integers with π‘Ÿ ≤ 𝑛.
• For our raffle example, π‘Ÿ = 10, the number of people drawn for a prize and
𝑛 is the number of people who entered the raffle.
• Example of a Permutation 1
• Example of a Permutation 2
• Example of a Permutation 3
• Calculating Probability Using Permutations
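Likewise, the permutation formula follows straight from its factorial definition and can be cross-checked against math.perm (raffle numbers illustrative):

```python
from math import factorial, perm

def permutations(n, r):
    """P(n, r) = n! / (n - r)! -- order matters."""
    return factorial(n) // factorial(n - r)

# Drawing 10 ranked winners (different prizes, so order matters)
# from 50 raffle entrants:
print(permutations(50, 10))
assert permutations(50, 10) == perm(50, 10)
```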
• Consider groups of objects where some of the objects are identical. For
instance, a bucket of 100 coins. You are bound to have multiple pennies,
nickels, dimes, and/or quarters.
• Special (or distinguishable) permutations involve objects that are
identical. The number of distinguishable permutations of 𝑛 objects, of
which 𝑘1 are all alike, 𝑘2 are all alike, and so forth, is given by
𝑛!/(𝑘1! 𝑘2! ⋯ 𝑘𝑝!)
where 𝑘1 + 𝑘2 + ⋯ + 𝑘𝑝 = 𝑛.
• Example of Special Permutations
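The distinguishable-permutations formula in code, using the classic letter-arrangement example for "MISSISSIPPI" (an illustration, not from the slides):

```python
from math import factorial, prod

def distinguishable_permutations(counts):
    """n! / (k1! k2! ... kp!) where counts = [k1, ..., kp]
    and n = sum(counts)."""
    n = sum(counts)
    return factorial(n) // prod(factorial(k) for k in counts)

# Arrangements of the letters in "MISSISSIPPI": 1 M, 4 I's, 4 S's, 2 P's.
print(distinguishable_permutations([1, 4, 4, 2]))  # 34650
```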
• Summary of Combinations and Permutations 1
• Summary of Combinations and Permutations 2
• Summary of Calculating Probabilities Using Counting Techniques,
Combinations, and Permutations
Bayes’ Rule/Law/Theorem
• Bayes’ Rule is a clever way of obtaining a conditional probability given
new information. Essentially, an event occurs which provides
additional information that can be used to revise a previous
probability.
• This previous probability is called a prior probability and the revised
probability is called the posterior probability.
Bayes’ Theorem
• Let 𝐸 be an event and 𝐹1, 𝐹2, ⋯, 𝐹𝑁 be 𝑁 mutually exclusive and
collectively exhaustive events. Then Bayes’ Theorem states,
𝑃(𝐹𝑖|𝐸) = 𝑃(𝐹𝑖 ∩ 𝐸)/𝑃(𝐸)
= 𝑃(𝐹𝑖 ∩ 𝐸)/(𝑃(𝐸 ∩ 𝐹1) + 𝑃(𝐸 ∩ 𝐹2) + ⋯ + 𝑃(𝐸 ∩ 𝐹𝑁))
= 𝑃(𝐹𝑖) ∙ 𝑃(𝐸|𝐹𝑖)/(𝑃(𝐹1) ∙ 𝑃(𝐸|𝐹1) + 𝑃(𝐹2) ∙ 𝑃(𝐸|𝐹2) + ⋯ + 𝑃(𝐹𝑁) ∙ 𝑃(𝐸|𝐹𝑁))
= 𝑃(𝐹𝑖) ∙ 𝑃(𝐸|𝐹𝑖) / ∑_{𝑗=1}^{𝑁} 𝑃(𝐹𝑗) ∙ 𝑃(𝐸|𝐹𝑗)
• Example of Bayes' Theorem 1
• Example of Bayes' Theorem 2
• Summary of Bayes' Theorem
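Bayes' Theorem in its partition form translates directly into code. In this Python sketch the 1%-prevalence testing scenario is a hypothetical illustration, not from the slides:

```python
def bayes(priors, likelihoods, i):
    """Bayes' Theorem:
    P(F_i | E) = P(F_i) P(E | F_i) / sum_j P(F_j) P(E | F_j),
    where the F_j are mutually exclusive and collectively exhaustive."""
    numerator = priors[i] * likelihoods[i]
    denominator = sum(p * l for p, l in zip(priors, likelihoods))
    return numerator / denominator

# Hypothetical example: a test for a condition affecting 1% of a population.
# F1 = has the condition, F2 = does not; E = positive test result.
priors = [0.01, 0.99]        # P(F1), P(F2)
likelihoods = [0.95, 0.05]   # P(E | F1), P(E | F2)
print(bayes(priors, likelihoods, 0))  # posterior P(F1 | E), about 0.161
```

Note how the posterior (about 16%) is far below the 95% test sensitivity: the small prior probability of the condition dominates, which is exactly the kind of revision Bayes' Rule formalizes.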
Bayes’ Theorem Applications
References
Wiley, C.W., Denley, K., & Atchley, E. (2021). Beginning Statistics
(Third Edition). Hawkes Learning.
Hawkes, J.S. (2019). Discovering Statistics and Data (Third Edition).
Hawkes Learning.
Anderson, D.R., Sweeney, D.J., Williams, T.A., Camm, J.D., Cochran,
J.J., Fry, M.J., & Ohlmann, J.W. (2020). Essentials of Statistics for
Business and Economics (Ninth Edition). Cengage.
https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle