Probability Theory

"It is part of probability that many improbable things will happen." - Aristotle

Likelihood, Chance, and Probability
• Up to now, we have focused on data. Maybe a bit on experimental design, but mostly data.
• We are about to take a major pivot away from data and start getting into the idea of chance and calculating probabilities.
• Consider the sample you gather from a population. What is the probability that your estimate, based on the collected sample, reflects the population parameter?
• This question is of utmost importance and, hence, justifies delving into the field of probability theory.
• In this module, we will introduce some more terminology and techniques for calculating theoretical probabilities.
Probability Crash Course Part 1
Probability Crash Course Part 2

Randomness
• Probability is used to quantify uncertainty.
• A word that leads naturally to the discussion of probability is randomness, the idea of variability in outcomes (think of variance in data when thinking about variability in outcomes).
• A probability (or random) experiment (or trial) is any process with a result determined by chance.
• Every possible result of a probability experiment is called an outcome.
• The sample space, denoted S, is the set of all outcomes.
• An event is a subset of outcomes from the sample space.
• Identifying Outcomes in a Sample Space or Event 1
• Identifying Outcomes in a Sample Space or Event 2
• A compound event is an event that is defined by combining two or more events.
• Using a Pattern to List All Outcomes in a Sample Space or Event
• When probability experiments have several stages, a tree diagram can be used to organize the stages and their outcomes in a systematic way.
• Suppose you are watching a soccer game and your friend asks if Team A will win. Without doing any experiment, you answer that Team A has about a 75% chance of winning. You arrived at this value just from your knowledge of soccer, Team A, and their opponent. In essence, this is an educated guess. We call this subjective probability.
• Clearly, there is nothing scientific about subjective probability. However, sometimes this is all you have. If an expert weighs in with a subjective probability, sometimes you just have to go with that until something better comes along.
• Once you have performed an experiment and have actual numbers, you obtain an empirical probability. This is also called an experimental probability.
• In experimental probability, if E is an event, then P(E), read "the probability that E occurs," is calculated as P(E) = f / n, where f is the frequency of event E and n is the total number of times the experiment is performed.
• Suppose the experiment is flipping a fair coin, and suppose you flip the coin 20 times, that is, 20 trials or n = 20. There are two outcomes: heads or tails. Intuitively, you know that the probability of heads and the probability of tails are each 50% (a coin flip). However, try it for yourself. Flip a coin 20 times; do you get 10 heads and 10 tails? That is what you would expect if there is a 50% chance of each, right?
• Almost surely, you will not arrive at that outcome experimentally. For instance, you could end up with 15 heads and 5 tails. Does that mean there is something wrong with the coin? Maybe. However, this is possible even with a fair coin.
• Ok, so the experimental probability of a coin flip in our experiment is 75% for heads and 25% for tails (a short simulation sketch follows this list).
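To make the empirical/theoretical distinction concrete, here is a minimal Python sketch (an illustrative addition, not part of the original slides; the helper name and the optional seed parameter are my own). It flips a simulated fair coin 20 times and reports the empirical probability of heads; rerunning it shows how far the empirical value can wander from the theoretical 50% when n is small.

```python
import random

def empirical_prob_heads(n_flips, seed=None):
    """Flip a simulated fair coin n_flips times and return P(heads) = f / n."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))  # f = frequency of heads
    return heads / n_flips

# A handful of 20-flip experiments: each run gives a different empirical probability,
# even though the theoretical probability is always 0.50.
for trial in range(5):
    print(f"Experiment {trial + 1}: P(heads) = {empirical_prob_heads(20):.2f}")
```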
• The "true" probability of a coin flip is 50% heads and 50% tails.
• We call this "true" probability the theoretical probability or classical probability.
• In theoretical probability, if all outcomes are equally likely to occur, P(E), read "the probability that E occurs," is given by P(E) = n(E) / n(S), where n(E) is the number of outcomes in the event and n(S) is the number of outcomes in the sample space.
• Using the definition of theoretical probability, the event "heads" would contain only one outcome, so n(E) = 1, and the sample space would contain "heads" and "tails," so n(S) = 2. Hence, the probability of heads is 50%. Perfect!
• Calculation Using Theoretical Probability 1
• Calculation Using Theoretical Probability 2
• Calculation Using Theoretical Probability 3
• Calculation Using Theoretical Probability 4
• Calculation Using Theoretical Probability 5 (drawing from a deck of 52 cards)
• Summary of the Introduction to Probability 1
• Summary of the Introduction to Probability 2
• Why the discrepancy between the experimental and theoretical probability? The Law of Large Numbers (LLN) provides the answer! The LLN states that as the number of trials increases, the experimental probability gets closer and closer to the theoretical probability.
• This is a powerful result!!
• This means our experiments (or our sample sizes) must be large enough to give a meaningful estimate of the theoretical probability (or of the population).
• We will not delve too far into the weeds about the LLN. There are two versions: the Weak Law of Large Numbers and the Strong Law of Large Numbers.
• As you progress on your statistics journey, expect to hear more about the LLN.

Properties of Probability
• For any event E, 0 ≤ P(E) ≤ 1. The closer to 1, the more likely the event is to happen; the closer to 0, the more likely the event is not to happen.
• For any sample space S, P(S) = 1.
• For the empty set ∅, P(∅) = 0.
• Note: When the probability of something happening is 0, that means it cannot happen. When the probability of something happening is 1, that means it must happen.
• The union of the events E and F, denoted E ∪ F and read "E union F," is the set of all outcomes that are included in event E or event F or both.
• The intersection of the events E and F, denoted E ∩ F and read "E intersect F," is the set of all outcomes that are included in both E and F.
• The complement of an event E, denoted Eᶜ, is the set of all outcomes in the sample space that are not in E.
• If you combine the outcomes of events E and Eᶜ, you get S, the set of all outcomes. Mathematically, this is E ∪ Eᶜ = S.
• Since P(S) = 1, we get the following: P(S) = P(E ∪ Eᶜ) = P(E) + P(Eᶜ) = 1.
• This result is called the complement rule for probability. Written another way, this is P(E) = 1 − P(Eᶜ) (a short enumeration sketch appears at the end of this list).
• Calculation Using the Complement Rule 1
• Calculation Using the Complement Rule 2
• Calculation Using the Complement Rule 3
• Calculation Using the Complement Rule 4
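As an illustrative sketch (not from the slides; the three-flip experiment is my own choice of example), the snippet below enumerates the sample space of three coin flips and computes the theoretical probability of "at least one head" two ways: directly as n(E) / n(S), and via the complement rule as 1 − P(no heads).

```python
from itertools import product

# Sample space S for three coin flips: all 2^3 = 8 equally likely outcomes.
S = list(product("HT", repeat=3))

# Event E: at least one head.
E = [outcome for outcome in S if "H" in outcome]

# Theoretical probability: P(E) = n(E) / n(S).
p_direct = len(E) / len(S)

# Complement rule: P(E) = 1 - P(E^c), where E^c is "no heads at all".
E_complement = [outcome for outcome in S if "H" not in outcome]
p_via_complement = 1 - len(E_complement) / len(S)

print(p_direct, p_via_complement)  # both 0.875
```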
• This result takes advantage of the addition rule for probability. The rule is as follows: for two events, E and F, the probability that E or F occurs is given by the formula P(E or F) = P(E) + P(F) − P(E and F).
• Using set notation, this becomes: P(E ∪ F) = P(E) + P(F) − P(E ∩ F) (see the verification sketch at the end of this section).
• Calculation Using the Addition Rule 1
• Calculation Using the Addition Rule 2
• Calculation Using the Addition Rule 3
• Calculation of Probability Using the Addition Rule

Inclusion-Exclusion Principle
• The "−P(E and F)" or "−P(E ∩ F)" part of the formula ensures there is no double counting of the outcomes that events E and F have in common. This counting technique is known as the inclusion-exclusion principle from combinatorics.
• This can actually be extended to 3 events (in fact, to n events): E, F, and G. The addition rule here would be as follows: P(E ∪ F ∪ G) = P(E) + P(F) + P(G) − P(E ∩ F) − P(F ∩ G) − P(E ∩ G) + P(E ∩ F ∩ G).
• Note that we subtract the intersection of each pairing of the 3 events but then add back the triple overlap, since subtracting the pairwise intersections overcorrects.
• Here is a Venn diagram illustrating the inclusion-exclusion principle in action for three events: A, B, and C. (Venn diagram: Inclusion-Exclusion Principle)
• If events E and F have no outcomes in common, then events E and F are said to be mutually exclusive. In this case, the addition rule for the probability of mutually exclusive events is P(E or F) = P(E) + P(F).
• Using set notation, this becomes: P(E ∪ F) = P(E) + P(F).
• Mutual exclusivity is also called disjointness.
• Since events E and Eᶜ are mutually exclusive by definition, this is how we arrived at the result P(E) + P(Eᶜ) = 1.
• Calculation of Probability of Mutually Exclusive Events
• The addition rules for probability apply to a single trial with multiple outcomes. For instance, flipping a coin once is a single trial with 2 outcomes. Rolling a pair of dice is a single trial with 36 outcomes (we will learn how to calculate the number of outcomes later, so do not linger over where the 36 comes from).
• Summary of Addition Rules for Probability 1
• Summary of Addition Rules for Probability 2
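To illustrate, here is a small Python sketch (an illustrative addition, not from the slides; the card-deck events and the prob() helper are my own). It checks the addition rule on a single draw from a standard 52-card deck: P(heart or face card) counted directly versus P(heart) + P(face card) − P(heart and face card).

```python
from fractions import Fraction
from itertools import product

# Build a 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def prob(event):
    """Theoretical probability: n(E) / n(S) for equally likely draws."""
    return Fraction(len(event), len(deck))

hearts = [card for card in deck if card[1] == "hearts"]
faces = [card for card in deck if card[0] in ("J", "Q", "K")]
hearts_or_faces = [card for card in deck if card in hearts or card in faces]
hearts_and_faces = [card for card in deck if card in hearts and card in faces]

# Addition rule: P(E ∪ F) = P(E) + P(F) − P(E ∩ F)
lhs = prob(hearts_or_faces)                              # 22/52 = 11/26
rhs = prob(hearts) + prob(faces) - prob(hearts_and_faces)
print(lhs, rhs, lhs == rhs)
```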
• If you have followed the proliferation of sports gambling, you may have heard the term odds being used to describe the odds of an event occurring. There are two types of odds:
• The odds in favor of an event E occurring are given by P(E) / P(not E) = P(E) / P(Eᶜ).
• The odds against an event E occurring are given by P(not E) / P(E) = P(Eᶜ) / P(E).
• Calculation Using Complement Rule and Calculating Odds
• What can be used for multistage experiments, though? A multistage experiment is one where multiple events are possible over multiple trials. For instance, if I flip a coin twice, this experiment has two events over the two trials.
• For these instances, we use multiplication instead of addition. The key difference here is the use of the words or and and. When considering a single trial, or is the interesting word because only a single outcome can occur. However, when multiple events can occur, and becomes the interesting word, since we want to know what can happen across multiple trials.
• Suppose we are playing the lottery and the winner is determined by drawing balls with the numbers 0-100 written on them. If the lottery requires one to select 6 numbers in the correct order, this experiment will have 6 events: the first draw, the second draw, and so on.
• If the ball drawn is put back in the pool to be potentially drawn again, this experiment is said to be done with replacement. That means the first and subsequent draws will be independent.
• If two events are independent, this means one event occurring does not affect the probability of the other event occurring. If the balls are drawn with replacement, each draw is done with the same pool.
• Otherwise, if the experiment were done without replacement, each draw would be done with a pool containing one less ball, altering the probability. Hence, each subsequent draw would be dependent on the previous draw.
• If E and F are dependent events, calculating the probability that both occur is a bit different. That is because the first trial influences the probability of the second and subsequent trials. Hence, we need the concept of conditional probability.
• Conditional probability, denoted P(F|E) and read "the probability of F given E," is the probability of event F occurring given that event E occurs first.
• The multiplication rule for the probability of dependent events is as follows: for two dependent events, E and F, P(F|E) = P(F ∩ E) / P(E), or equivalently, P(E ∩ F) = P(E) · P(F|E) = P(F) · P(E|F).
• Calculation Using Conditional Probability 1
• Calculation Using Conditional Probability 2
• Calculation of Probability for Dependent Events 1
• Calculation of Probability for Dependent Events 2

Independence
• Statistical independence is one of, if not the, most important concepts in all of statistical analysis.
• Events are said to be independent if knowledge of one event does not provide information about another. Put another way, the occurrence of one event does not influence the occurrence of the other event(s).
• For instance, when randomly sampling, you want each draw to be independent of the next draw. That way, how a sample is drawn will have no effect on the results.
• Events E and F are independent if and only if P(E|F) = P(E) and P(F|E) = P(F) (a small sketch illustrating this check appears at the end of this section).
• The following two formulas are called the multiplication rule for independent events. These are sometimes called the product rule.
• If two events, E and F, are independent, then P(E ∩ F) = P(E) · P(F).
• If n events, E₁, E₂, ⋯, Eₙ, are independent, then P(E₁ ∩ E₂ ∩ ⋯ ∩ Eₙ) = P(E₁) · P(E₂) · ⋯ · P(Eₙ).
• Example Determining if Two Events are Independent
• Example Using a Tree Diagram and Product Rule
• Example Using the Product Rule
• Example Using Sampling with Replacement and Product Rule
• Real Life Application of the Product Rule
• Example Using the Complement Rule and the Product Rule

Joint, Marginal, and Conditional Probability

Mutual Exclusivity and Independence ARE NOT THE SAME THING!!!

Complement of Union and Intersection
• Using set operations, we can calculate the following probabilities and mathematical manipulations thereof:
  P(Eᶜ ∩ Fᶜ) = P((E ∪ F)ᶜ) = 1 − P(E ∪ F)
  P(Eᶜ ∪ Fᶜ) = P((E ∩ F)ᶜ) = 1 − P(E ∩ F)
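As a sketch of these ideas (an illustrative addition, not part of the slides; the two-dice events are my own choice), the snippet below enumerates the 36 outcomes of rolling two fair dice, checks that "first die is even" and "sum is 7" are independent via P(E ∩ F) = P(E) · P(F), and verifies the conditional probability formula P(F|E) = P(F ∩ E) / P(E).

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    return Fraction(len(event), len(S))

E = [(a, b) for a, b in S if a % 2 == 0]      # first die is even
F = [(a, b) for a, b in S if a + b == 7]      # sum is 7
E_and_F = [outcome for outcome in E if outcome in F]

# Independence check: is P(E ∩ F) equal to P(E) * P(F)?
print(prob(E_and_F) == prob(E) * prob(F))     # True -> E and F are independent

# Conditional probability: P(F|E) = P(F ∩ E) / P(E), which here equals P(F).
p_F_given_E = prob(E_and_F) / prob(E)
print(p_F_given_E, prob(F))                   # 1/6 and 1/6
```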
Counting (for probability)
• When it comes to multistage experiments, or experiments with several trials, it is paramount that you keep count of the number of outcomes at each stage or trial of the experiment.
• One method for counting all outcomes in a multistage experiment is the Fundamental Counting Principle, which states that for a multistage experiment with k stages (or trials), where the first stage has n₁ outcomes, the second stage has n₂ outcomes, the third stage has n₃ outcomes, and so forth, the total number of possible outcomes for the sequence of stages that make up the multistage experiment is n₁n₂n₃⋯nₖ (a short counting sketch at the end of this section illustrates this and the formulas below).
• Calculation Using the Fundamental Counting Principle 1
• Calculation Using the Fundamental Counting Principle 2
• Calculation Using the Fundamental Counting Principle 3
• Calculation Using the Fundamental Counting Principle 4
• Calculation Using the Fundamental Counting Principle (without replacement)
• Calculation of Probability Using the Fundamental Counting Principle
• Summary of Multiplication Rules for Probability 1
• Summary of Multiplication Rules for Probability 2
• A factorial is the product of all positive integers less than or equal to a given positive integer n, given by n! = n(n − 1)(n − 2)⋯(2)(1), where n is a positive integer.
• Note that, by definition, 0! = 1.
• A combination is a selection of objects from a group without regard to their arrangement (or order). For instance, if I hold a raffle and the first 10 people drawn win the same prize, the order is irrelevant; I just want to know who the 10 people are. Hence, when order is not important, the following formula is used to calculate the number of combinations: C(n, r) = n! / (r!(n − r)!).
• Calculating Probability Using Combinations 1
• Calculating Probability Using Combinations 2
• Calculating Probability Using Combinations and the Fundamental Counting Principle 1
• Calculating Probability Using Combinations and the Fundamental Counting Principle 2
• A permutation is a selection of objects from a group where the arrangement (or order) is specific. For instance, in the raffle example, if the first person drawn gets a better prize than the second person, the order matters! I need to know the 10 people in the correct order to dole out the prizes correctly. Hence, when order is important, the following formula is used to calculate the number of permutations: P(n, r) = n! / (n − r)!.
• In both formulas, r objects are selected from a group of n distinct objects, so n and r are both positive integers with r ≤ n.
• For our raffle example, r = 10 is the number of people drawn for a prize, and n is the number of people who entered the raffle.
• Example of a Permutation 1
• Example of a Permutation 2
• Example of a Permutation 3
• Calculating Probability Using Permutations
• Consider groups of objects where some of the objects are identical; for instance, a bucket of 100 coins. You are bound to have multiple pennies, nickels, dimes, and/or quarters.
• Special (or distinguishable) permutations involve objects that are identical. The number of distinguishable permutations of n objects, of which k₁ are all alike, k₂ are all alike, and so forth, is given by n! / (k₁! k₂! ⋯ kₚ!), where k₁ + k₂ + ⋯ + kₚ = n.
• Example of Special Permutations
• Summary of Combinations and Permutations 1
• Summary of Combinations and Permutations 2
• Summary of Calculating Probabilities Using Counting Techniques, Combinations, and Permutations
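Here is a brief Python sketch echoing the counting formulas above (an illustrative addition, not from the slides; the meal counts, the raffle size n = 50, and the word "STATISTICS" are made-up examples). It uses the standard library's math.factorial, math.comb, and math.perm.

```python
import math

# Fundamental Counting Principle: a 3-stage experiment, e.g. a meal built from
# 4 appetizers, 5 entrees, and 3 desserts -> 4 * 5 * 3 possible meals.
print(4 * 5 * 3)                      # 60

# Factorial, combinations (order ignored), and permutations (order matters)
# for drawing r = 10 raffle winners from a hypothetical n = 50 entrants.
n, r = 50, 10
print(math.factorial(5))              # 5! = 120
print(math.comb(n, r))                # C(50, 10) = 50! / (10! * 40!)
print(math.perm(n, r))                # P(50, 10) = 50! / 40!

# Distinguishable permutations: arrangements of the letters in "STATISTICS"
# (10 letters: S appears 3 times, T 3 times, I 2 times, A once, C once).
word = "STATISTICS"
count = math.factorial(len(word))
for letter in set(word):
    count //= math.factorial(word.count(letter))
print(count)                          # 10! / (3! 3! 2! 1! 1!) = 50400
```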
Bayes' Rule/Law/Theorem
• Bayes' Rule is a clever way of obtaining a conditional probability given new information. Essentially, an event occurs which provides additional information that can be used to revise a previous probability.
• This previous probability is called the prior probability, and the revised probability is called the posterior probability.

Bayes' Theorem
• Let E be an event and F₁, F₂, ⋯, Fₙ be n mutually exclusive and collectively exhaustive events. Then Bayes' Theorem states
  P(Fᵢ | E) = P(Fᵢ ∩ E) / P(E)
            = P(Fᵢ ∩ E) / [P(E ∩ F₁) + P(E ∩ F₂) + ⋯ + P(E ∩ Fₙ)]
            = P(Fᵢ) · P(E | Fᵢ) / [P(F₁) · P(E | F₁) + P(F₂) · P(E | F₂) + ⋯ + P(Fₙ) · P(E | Fₙ)],
  that is, the denominator is the sum of P(Fₖ) · P(E | Fₖ) over k = 1, …, n (a short numerical sketch follows this section).
• Example of Bayes' Theorem 1
• Example of Bayes' Theorem 2
• Summary of Bayes' Theorem
Bayes' Theorem
Bayes' Theorem Applications
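To close with a numerical sketch of Bayes' Theorem (an illustrative addition, not from the slides; the screening numbers and the bayes_posterior helper are hypothetical), suppose a test is positive 99% of the time when a condition is present and 5% of the time when it is absent, and the condition has a 1% prior prevalence. Bayes' Theorem gives the posterior probability of having the condition given a positive test.

```python
def bayes_posterior(priors, likelihoods):
    """Bayes' Theorem: P(F_i | E) = P(F_i) P(E|F_i) / sum_k P(F_k) P(E|F_k).

    priors      -- list of P(F_k) for mutually exclusive, exhaustive events F_k
    likelihoods -- list of P(E | F_k) for the observed event E
    """
    total = sum(p * l for p, l in zip(priors, likelihoods))  # P(E), the denominator
    return [p * l / total for p, l in zip(priors, likelihoods)]

# Hypothetical screening example: F1 = "has condition" (prior 0.01),
# F2 = "does not have condition" (prior 0.99); E = "test is positive",
# with P(E|F1) = 0.99 and P(E|F2) = 0.05.
posteriors = bayes_posterior(priors=[0.01, 0.99], likelihoods=[0.99, 0.05])
print(f"P(condition | positive test) = {posteriors[0]:.3f}")  # about 0.167
```

Even with a fairly accurate test, the low prior keeps the posterior modest, which is exactly the prior-to-posterior revision the theorem formalizes.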