Chapter 3 – Probability

Defn: An experiment is a test or series of tests in which purposeful changes are made to the input variables of a process or system so that we may observe and identify reasons for changes in the output response.

Defn: A random experiment is one whose outcome cannot be predicted with certainty.

Example: To determine optimum conditions for a plating bath, the effects of sulfone concentration and bath temperature on the reflectivity of the plated metal are studied. Two levels of sulfone concentration (in grams/liter) and five levels of temperature (degrees F) were used, with three replications. In this case, there are two experimental factors – concentration and temperature – with two levels of concentration and five levels of temperature. The (random) response variable is reflectivity.

Defn: A set is a collection of elements.

Defn: A sample space is the set of all possible outcomes of a random experiment.

Example: If our random experiment is to flip a fair coin twice, then the sample space is S = {HH, HT, TH, TT}.

Example: If our random experiment is to select a random sample of size 3 from a class of 23 students, then the sample space is the set of all possible collections of 3 students from the class. There are 1771 elements in the sample space, corresponding to the number of ways that I could select three people from the class.

If the random experiment has a finite number of possible outcomes, then the sample space will be finite, as in the above two examples. The sample space is said to be discrete if the number of possible outcomes of the random experiment is either finite or countably infinite.

Example: I want to count the number of traffic accidents occurring on freeways in Duval County in the course of a year. I cannot say ahead of time what the maximum count might be. Hence my sample space is countably infinite: S = {0, 1, 2, 3, ...}.

The sample space is said to be continuous if the number of possible outcomes of the random experiment is uncountably infinite.

Example: The random experiment is to measure the lifetime of a randomly selected AAA battery coming off an assembly line. The measurement is in hours, and the set of possible values is all points on an interval of real numbers starting at 0 hours as the left-hand endpoint. There is an uncountably infinite number of possible values for the lifetime (of course, the set of possible recorded values depends on the resolution of the measuring instrument).

Defn: Given a set S, another set A is called a subset of S, denoted A ⊆ S, if every element of A is also an element of S.

Defn: An event is a subset of a sample space.

Defn: Given a set S and two subsets A and B, we say that A = B if A ⊆ B and B ⊆ A; i.e., if every element of A is also an element of B and every element of B is also an element of A.

Defn: Given a set S and two sets A ⊆ S and B ⊆ S, we define the union of A and B, denoted by A ∪ B, to be the set of all elements of S that are either elements of A or elements of B or elements of both A and B.

Note: If S is the sample space of a random experiment, then A and B are events, and A ∪ B is the event that either A or B (or both A and B) happens when we perform the experiment.

Defn: Given a set S and two sets A ⊆ S and B ⊆ S, we define the intersection of A and B, denoted by A ∩ B, to be the set of all elements of S that are elements of both A and B.

Note: If S is the sample space of a random experiment, then A and B are events, and A ∩ B is the event that both A and B happen when we perform the experiment.
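These set operations can be checked directly with Python's built-in set type. The short sketch below is purely illustrative (it uses the two-coin-flip sample space from the example above; the events A and B are my own choices, not from the text):

# A minimal sketch of the set operations above, using Python's built-in sets.
S = {"HH", "HT", "TH", "TT"}           # sample space: flip a fair coin twice
A = {"HH", "HT", "TH"}                 # event: at least one head occurs
B = {"HH", "HT"}                       # event: the first flip is a head

print(A | B)           # union A ∪ B -> {'HH', 'HT', 'TH'}
print(A & B)           # intersection A ∩ B -> {'HH', 'HT'}
print(S - A)           # complement of A relative to S -> {'TT'}
print(B <= A, A <= S)  # subset checks B ⊆ A and A ⊆ S -> True True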
Defn: The empty set, or null set, ∅, is the set that contains no elements.

Note: The null set is a subset of every set.

Defn: Two sets A and B are said to be mutually exclusive if A ∩ B = ∅.

Defn: The complement of a set A ⊆ S is A̅ = {x ∈ S : x ∉ A}.

Example: Let the random experiment be to flip a fair coin three times. Then the sample space is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. Let
A = {exactly 2 heads occur when we flip a fair coin 3 times} = {HHT, HTH, THH},
B = {at least one tail occurs when we flip a fair coin 3 times} = {HHT, HTH, HTT, THH, THT, TTH, TTT},
C = {the same result occurs for every flip of the coin} = {HHH, TTT}.
Then A ∪ B = B, A ∪ C = {HHH, HHT, HTH, THH, TTT}, A ∩ B = A, A ∩ C = ∅, B ∪ C = S, and B ∩ C = {TTT}. We see that events A and C are mutually exclusive – they cannot both happen at the same time. We also see that A is a subset of B, and that B̅ = {HHH} ⊆ C.

A useful way to visualize relationships among events is through the use of Venn diagrams (see pp. 48–49).

Counting

In many situations, the size of the sample space, or of the events of interest, is so large that special tools may be needed to find a) the total number of possible outcomes of the random experiment, and b) the number of ways that a particular event can happen.

Theorem 3.1: If sets A1, A2, ..., Ak contain, respectively, n1, n2, ..., nk elements, then the number of ways that we can create another set by choosing one element from each of the k sets is (n1)(n2)⋯(nk).

Example: To determine optimum conditions for a plating bath, the effects of sulfone concentration and bath temperature on the reflectivity of the plated metal are studied. Two levels of sulfone concentration (in grams/liter) and five levels of temperature (degrees F) were used, with three replications. In this case, there are two experimental factors – concentration and temperature – with two levels of concentration and five levels of temperature. The (random) response variable is reflectivity. The number of different possible combinations of sulfone concentration and temperature is (2)(5) = 10.

Example: Let the random experiment be to flip a fair coin 10 times. We want to find the size of the sample space. There are two possible outcomes of each flip of the coin. Therefore, the total number of possible outcomes (size of the sample space) is 2¹⁰ = 1024.

Example: p. 51.

Defn: For any positive integer n, we define the factorial function by n! = (1)(2)(3)⋯(n − 1)(n). We also define 0! = 1.

Defn: If we have a set of n objects, and we select r (where r ≤ n) of them in a particular order, the particular arrangement of the objects is called a permutation of n objects taken r at a time.

Theorem 3.2: The number of permutations of n distinct objects taken r at a time is
nPr = (n)(n − 1)⋯(n − r + 1) = n!/(n − r)!.

Example: p. 52.

We are more often concerned with random sampling, i.e., with selecting a subset from a population without regard to ordering of the subset.

Theorem 3.3: The number of ways in which r objects can be selected from a set of n distinct objects is
C(n, r) = (n)(n − 1)(n − 2)⋯(n − r + 1)/r! = n!/[r!(n − r)!].

Example: I want to estimate the average height of students in the class by using a simple random sample of size n = 3. I have a population of size N = 23. The number of possible samples that I could select is
C(23, 3) = 23!/(3! 20!) = 1771.
If I do simple random sampling, then each of these possible samples has an equal chance, 1/1771 = 0.000564653, of being selected.
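Python's standard math module provides these counting functions directly; the short check below simply reproduces the counts used in the examples above (an illustrative sketch, not part of the text):

# Checking the counting results above with Python's standard library.
import math

print(2 * 5)                 # Theorem 3.1: 2 concentration levels x 5 temperatures = 10
print(2 ** 10)               # size of the sample space for 10 coin flips = 1024
print(math.perm(23, 3))      # ordered selections of 3 from 23: (23)(22)(21) = 10626
print(math.comb(23, 3))      # unordered selections: 23!/(3! 20!) = 1771
print(1 / math.comb(23, 3))  # probability of one particular sample ≈ 0.000564653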
Probability

The concept of probability is fundamental to all of statistics, since statistical inference involves drawing conclusions from incomplete information, implying that there is some degree of uncertainty about the conclusions. Mathematically, a number called the probability of an event should be a measure of our assessment of the likelihood of the occurrence of that event when the random experiment is performed. There are two primary interpretations of probability:

1) Subjective approach: Probability values are assigned based on educated guesses about the relative likelihoods of the different possible outcomes of our random experiment. This approach involves advanced concepts and principles, such as entropy.

2) Relative frequency approach: In this approach to assigning probabilities to events, we look at the long-run proportion of occurrences of particular outcomes when the random experiment is performed many times. This long-run proportion tells us the approximate probability of occurrence of each outcome.

We will consider only the relative frequency approach, since the other approach involves concepts (such as entropy and information) that are beyond the scope of this course.

Example: If we flip a coin once, what is the likelihood that the outcome is a head? Why? For a single coin flip, we cannot say with certainty what the outcome will be. However, if we flip a coin 1,000,000 times, we are fairly sure that approximately one-half of the outcomes will be heads. This approach is based on the Law of Large Numbers, which says, in particular, that the relative frequency of occurrence of a particular outcome of a random experiment approaches a specific limiting number between 0 and 1 if we perform the experiment a very large number of times.

The earliest work on probability theory as a mathematical discipline considered situations in which all possible outcomes of a random experiment were equally likely to occur. This classical probability concept may be stated as follows: if there are m equally likely possibilities, of which one must occur and s are regarded as favorable, or as a "success," then the probability of a "success" is given by s/m. In many situations encountered in statistics (such as simple random sampling), the concept of equally likely outcomes applies.

Example: The random experiment is to flip a fair coin twice. The sample space of the experiment is S = {HH, HT, TH, TT}. Let A = {at least one head occurs} = {HH, HT, TH}. Since the coin is fair, each of the four possible outcomes of the experiment has an equal chance of occurring. Therefore, the probability that event A happens is P(A) = 3/4 = 0.75.

Example: A second example is my random sampling example. Since there are 1771 possible samples that could be selected, the probability that I will select a particular sample of 3 people from the group of 23 students is 1/1771 = 0.000564653.

Example: p. 56.

Sometimes, we cannot use the equally likely outcomes concept, but must find probabilities by some other means. In any case, whenever we assign probabilities to events, there are certain conditions that must be satisfied. These conditions are sometimes considered to define, in a concrete way, the concept of probability.

Basic Laws of Probability (Kolmogorov's Axioms): For a random experiment with sample space S, we have
1) P(S) = 1,
2) for any A ⊆ S, P(A) ≥ 0, and
3) if E1, E2, E3, ..., En ⊆ S are mutually exclusive (i.e., no two of the events can happen at the same time), then P(E1 ∪ E2 ∪ E3 ∪ ... ∪ En) = P(E1) + P(E2) + P(E3) + ⋯ + P(En).
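The relative frequency interpretation is easy to illustrate with a small simulation. The sketch below (the trial count and variable names are my own) repeats the two-coin-flip experiment many times and tracks how often the event A = {at least one head occurs} happens; by the Law of Large Numbers the proportion should settle near the classical value 3/4:

# A minimal simulation sketch of the relative-frequency interpretation.
import random

trials = 1_000_000
count_A = 0
for _ in range(trials):
    flips = [random.choice("HT") for _ in range(2)]  # one run of the experiment
    if "H" in flips:                                 # did event A occur?
        count_A += 1

print(count_A / trials)  # relative frequency of A; close to 3/4 = 0.75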
These axioms must be satisfied by any assignment of probabilities to events. The axioms by themselves, however, do not tell us what the numerical values of those probabilities are. We must also use other information that is specific to the particular random experiment being performed.

Theorem 3.5: If the sample space, S, of a random experiment is finite, and if A is any event, then P(A) equals the sum of the probabilities of the individual outcomes comprising A. The validity of this theorem follows directly from Kolmogorov's axioms.

Example: From handout.

Theorem 3.6 (Generalized Addition Rule): If S is the sample space of a random experiment, and if A and B are any two events (not necessarily mutually exclusive), then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
It is easy to verify this result using a Venn diagram.

Note: If A and B are mutually exclusive, then the above equation reduces to Kolmogorov's third axiom.

Example: From handout.

Example: p. 62.

Theorem 3.7 (Complement Rule): If S is the sample space of a random experiment, and if A is any event, then P(A̅) = 1 − P(A).

Example: from handout.

Example: p. 63.

Conditional Probability and Independence

Sometimes (often) a random experiment is performed in several successive stages, and at each stage some information is gained about the ultimate outcome of the experiment. Before beginning the experiment, we may find the sample space, S, of all possible outcomes. After performing the first stage of the experiment, it may be that the information gained allows us to conclude that some of the elements of S are no longer possible as ultimate outcomes of the experiment. We may then want to recalculate the probabilities of events of interest.

Defn: If A and B are two events such that P(B) > 0, then the conditional probability that A occurs, given that B occurs, is given by
P(A | B) = P(A ∩ B) / P(B).

Note: The definition also says that P(A ∩ B) = P(A | B)P(B), provided that P(B) > 0.

Example: Suppose that our random experiment consists of rolling a pair of fair dice, one green and one red. Let A = {the sum of the numbers showing on the top faces of the dice is 7}; let D = {neither number showing on the top faces is greater than 4}. What is P(A)? What is P(A | D)?

Example: p. 68.

Defn: Two events A and B are independent if the occurrence or non-occurrence of one does not change the probability that the other occurs. If the events are not independent, then they are called dependent.

Note: If two events (each with positive probability) are mutually exclusive, they cannot be independent. If they are independent, they cannot be mutually exclusive. However, if they are not mutually exclusive, then they may or may not be independent.

Theorem 3.9 (Multiplication Rule): If the events A and B are independent, then P(A ∩ B) = P(A)P(B). Alternatively, the events are independent if P(A | B) = P(A) and P(B | A) = P(B).

Example: Suppose that our random experiment is to flip a fair coin twice. The sample space of the experiment is S = {HH, HT, TH, TT}. Let A = {first flip results in a head} = {HH, HT}, and let B = {second flip results in a head} = {HH, TH}. Thus, the event A ∩ B = {HH}. Our intuition tells us that the result of the second flip of the coin should be unrelated to the result of the first flip. Since the coin is fair, we may assume that all outcomes of the random experiment are equally likely to occur, so that
P(A) = 2/4 = 0.5,
P(B) = 2/4 = 0.5, and
P(A ∩ B) = 1/4 = 0.25.
Thus, P(A ∩ B) = P(A)P(B). The multiplication rule confirms our intuition that the two events are independent.

Example: from handout.

Example: p. 69, at bottom.

Example: p. 70, at bottom.
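The dice question posed above can be answered by brute-force enumeration, since all 36 outcomes of the two dice are equally likely. The sketch below (variable names are mine) counts outcomes to obtain P(A) and P(A | D):

# Enumerating the two-dice sample space to compute P(A) and P(A | D).
from itertools import product

S = list(product(range(1, 7), repeat=2))           # all 36 (green, red) outcomes
A = [(g, r) for (g, r) in S if g + r == 7]         # sum of the two faces is 7
D = [(g, r) for (g, r) in S if g <= 4 and r <= 4]  # neither face is greater than 4
A_and_D = [o for o in A if o in D]                 # A ∩ D = {(3, 4), (4, 3)}

print(len(A) / len(S))        # P(A) = 6/36 ≈ 0.1667
print(len(A_and_D) / len(D))  # P(A | D) = P(A ∩ D)/P(D) = 2/16 = 0.125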
Important Note: Random sampling guarantees that members of the sample will be independent of each other.

Bayes' Theorem

In many situations, certain conditional probabilities are relatively simple to calculate, but the conditional probabilities that are of interest to the researcher are more difficult to calculate. Sometimes we may make use of a result first discovered by Thomas Bayes, an 18th-century English clergyman who also dabbled in mathematics and probability.

Theorem 3.10 (The Law of Total Probability): Assume that B1, B2, ..., Bn are a collection of events satisfying the following conditions:
i) B1 ∪ B2 ∪ B3 ∪ ... ∪ Bn = S, and
ii) Bi ∩ Bj = ∅ for all i ≠ j.
In other words, the collection of events partitions the sample space. Let A be any other event. Then
P(A) = P(B1)P(A | B1) + P(B2)P(A | B2) + ⋯ + P(Bn)P(A | Bn).

Theorem 3.11 (Bayes' Theorem): If B1, B2, ..., Bn are a partition of the sample space S, and if A is any other event, then
P(Br | A) = P(Br)P(A | Br) / [P(B1)P(A | B1) + P(B2)P(A | B2) + ⋯ + P(Bn)P(A | Bn)],
for r = 1, 2, 3, ..., n.

Note that the order of the conditioning is reversed between the left-hand side of the equation and the right-hand side. In the situations in which Bayes' Theorem is useful, the conditional probability that is of interest to the researcher is P(Br | A), while the conditional probabilities P(A | Bi) are relatively simple to calculate, or are easily known.

Example: from handout.

Example: p. 74, at bottom.
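As a concrete illustration of how the two theorems work together, the sketch below uses a hypothetical setup with invented numbers: three machines produce 50%, 30%, and 20% of a factory's items, so the events B1, B2, B3 that an item came from each machine partition the sample space, and A is the event that an item is defective:

# A minimal sketch of the Law of Total Probability and Bayes' Theorem.
# Hypothetical numbers, chosen only for illustration.
P_B = [0.50, 0.30, 0.20]          # P(B1), P(B2), P(B3): which machine made the item
P_A_given_B = [0.01, 0.02, 0.05]  # P(A | Bi): defect rate for each machine

# Law of Total Probability: P(A) = sum of P(Bi) * P(A | Bi)
P_A = sum(pb * pa for pb, pa in zip(P_B, P_A_given_B))
print(P_A)                        # 0.021

# Bayes' Theorem: P(Br | A) = P(Br) * P(A | Br) / P(A)
for r, (pb, pa) in enumerate(zip(P_B, P_A_given_B), start=1):
    print(f"P(B{r} | A) = {pb * pa / P_A:.4f}")  # 0.2381, 0.2857, 0.4762 (sum to 1)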