Chapter 6 Probability

6.1 Relative frequency and its properties

In Chapter 3, we introduced frequency distributions for both categorical data and numerical data. Let us recall them first.

The frequency for a particular category is the number of times the category appears in the data set.

The relative frequency for a particular category is the proportion of the time that the category appears in the data set.

Relative frequency = Frequency / Total number of observations.

Example 6.1 A veterinarian kept track of the types of the 40 animals treated in her clinic during a typical day. The following relative frequency distribution summarizes her data.

Type of animal       Cat     Dog     Horse   Others  Total
Frequency            15      12      4       9       40
Relative frequency   0.375   0.300   0.100   0.225   ?

It is easy to see that relative frequency has the following properties.

Properties of relative frequency
1. 0 ≤ relative frequency ≤ 1.
2. The combined relative frequency of two different categories is the sum of their relative frequencies.
3. The relative frequencies of all categories total 1.

6.2 Probability and its basic properties

A probability is a number between 0 and 1 that reflects the likelihood of occurrence of some outcome. A probability of 1 corresponds to an outcome that occurs 100% of the time, that is, a certain outcome. A probability of 0 corresponds to an outcome that occurs 0% of the time, that is, an impossible outcome.

The probability of an outcome, denoted by P(outcome), is interpreted as the long-run relative frequency of the outcome when the experiment is performed repeatedly under identical conditions.

Example 6.2 Suppose that we track tosses of a fair coin.
The results for the first twenty tosses might be as follows (H = head, T = tail):

Toss number   Outcome   Relative frequency of H      Toss number   Outcome   Relative frequency of H
1             H         1                            11            T         .545
2             H         1                            12            T         .5
3             T         .667                         13            H         .538
4             H         .75                          14            T         .5
5             T         .6                           15            T         .467
6             T         .5                           16            T         .438
7             H         .571                         17            H         .471
8             H         .625                         18            T         .444
9             H         .667                         19            H         .474
10            T         .6                           20            H         .5

[Figure 6.1 The fluctuation of relative frequency. Horizontal axis: number of tosses (1 to 20); vertical axis: relative frequency of H.]

As the number of tosses increases, the relative frequency does not continue to fluctuate wildly, but instead stabilizes and approaches some fixed number, called the limiting value. This limiting value is the true probability.

Basic properties of probability
1) 0 ≤ P(any outcome) ≤ 1, since probability is interpreted as the long-run relative frequency and a relative frequency cannot be less than 0 or greater than 1.
2) (Addition rule) If two outcomes A1, A2 cannot occur simultaneously, then P(A1 or A2) = P(A1) + P(A2). More generally, if no two of the outcomes A1, A2, ..., Ak can occur simultaneously, then P(A1 or A2 or ... or Ak) = P(A1) + P(A2) + ... + P(Ak).
3) (Complement rule) The probability that an outcome A will not occur is equal to 1 minus the probability that the outcome will occur, that is, P(not A) = 1 - P(A).

Three ways to find probability

a) Determining probabilities analytically
Let S = {all possible outcomes} and let E = {some outcomes} be an event. In many cases, probabilities can be determined analytically by employing mathematical rules and probability properties. For example, if all outcomes are equally likely to occur, then
P(E) = (the number of outcomes in E) / (the total number of outcomes in S).

Example 6.3
1) E: get a head in a toss of a fair coin. Here S = {head, tail}. Since it is a fair coin, "head" and "tail" are equally likely to occur. Thus, P(E) = 1/2.
2) Suppose that a multiple-choice problem has 4 choices, exactly one of which is correct, and a student answered the problem by guessing.
E: the student got the correct answer to the problem. Since the student answered the problem by guessing, the 4 choices are equally likely to be chosen. Thus, P(E) = 1/4.
3) E: get an ace from a well-shuffled deck of cards. Since it is a well-shuffled deck of cards, each card has the same chance to be chosen. Thus, P(E) = 4/52 = 1/13.

b) Estimating probabilities empirically
When an analytical approach is impossible or impractical, we can estimate probabilities empirically, that is, by observed long-run proportions. For example, the probability of getting a head when tossing an unfair coin can be estimated by the observed proportion of heads in 200 tosses. Intuitively, the more times we toss, the more accurate this estimate will be. If the proportion is based on 20 tosses rather than 200, we will be more hesitant to use it as an estimate of the corresponding probability.

c) Estimating probabilities by simulation
When we are unable to determine probabilities analytically and it is impractical to estimate them empirically, we can estimate probabilities by simulation.
1) Design a method that uses a random mechanism (such as a random number generator, the selection of a ball from a box, or the toss of a coin) to represent an observation. Be sure that the important characteristics of the actual process are preserved.
2) Generate an observation using the method in step 1, and determine whether the outcome of interest has occurred.
3) Repeat step 2 a large number of times.
4) Estimate the probability by the relative frequency of the outcome of interest, that is,
(the number of observations for which the outcome of interest occurred) / (the total number of observations generated).

Example 6.4 Suppose that couples who wanted children were to continue having children until a boy is born. If we assume that each newborn child is equally likely to be a boy or a girl, would this behavior change the proportion of boys in the population?
We can simulate sibling groups by tossing a fair coin, with “head” representing a boy and “tail” representing a girl.

Sibling group   Coin tosses   Children
1               TH            girl, boy
2               H             boy
3               H             boy
4               TTTTTH        girl, girl, girl, girl, girl, boy

Continuing the simulation to obtain a large number of observations suggests that the long-run proportion of boys in the population would still be .5.

We can also simulate sibling groups by drawing a ball at random, with replacement, from a box that contains one red ball and one blue ball, with the blue ball representing a boy and the red ball representing a girl.

Generally, we use statistical software packages to carry out simulations.

Independence

Independent outcomes: Two outcomes are said to be independent if the probability that one outcome occurs is not affected by knowledge of whether the other has occurred. More than two outcomes are said to be independent if knowledge that some of the outcomes have occurred does not change the probabilities that any of the other outcomes occur.

Dependent outcomes: If the occurrence of one outcome changes the probability that the other outcome occurs, the outcomes are dependent.

Example 6.5
(1) Toss a coin twice. Let
A1: get a head on the first toss,
A2: get a head on the second toss.
Then A1 and A2 are ?.
(2) Choose two cards sequentially, without replacing the first, from a well-shuffled deck of cards. Let
B1: get an ace on the first draw,
B2: get an ace on the second draw.
Then B1 and B2 are ? since P(B2 if B1 occurs) = ? is different from P(B2 if B1 does not occur) = ?.

The multiplication rule for independent outcomes
If two outcomes, A1 and A2, are independent, the probability that both outcomes occur is the product of the individual outcome probabilities, that is,
P(A1 and A2) = P(A1)P(A2).
More generally, if k outcomes, A1, ..., Ak, are mutually independent, then
P(A1 and A2 and ... and Ak) = P(A1)P(A2)...P(Ak).

Example 6.6 A student guesses answers to two multiple-choice problems (each with 4 choices).
Find
(1) P(the student gets two correct answers),
(2) P(the student gets two wrong answers),
(3) P(the student gets exactly one correct answer),
(4) P(the student gets at least one correct answer).

Let
A1: the student gets the correct answer to the first problem,
A1^c: the student gets the wrong answer to the first problem,
A2: the student gets the correct answer to the second problem,
A2^c: the student gets the wrong answer to the second problem.

Then
(1) P(the student gets two correct answers) = P(A1 and A2) = P(A1)P(A2) = (1/4)(1/4) = 1/16.
(2) P(the student gets two wrong answers) = P(A1^c and A2^c) = P(A1^c)P(A2^c) = (3/4)(3/4) = 9/16.
(3) P(the student gets exactly one correct answer) = P((A1 and A2^c) or (A1^c and A2)) = P(A1 and A2^c) + P(A1^c and A2) = P(A1)P(A2^c) + P(A1^c)P(A2) = (1/4)(3/4) + (3/4)(1/4) = 6/16 = 3/8.
(4) P(the student gets at least one correct answer) = 1 - P(the student gets two wrong answers) = 1 - (9/16) = 7/16.

Example 6.7 At the opening of a new hit movie, the age and sex of 400 movie-goers were recorded and are given below.

Age \ Sex     Female   Male
0 - < 13      14       10
13 - < 20     72       110
20 - < 30     86       62
Over 30       22       24

1) Estimate the probability of the event A that a movie-goer is female.
2) Estimate the probability of the event B that a movie-goer is younger than 20.
3) Are the two events A and B independent?
4) If two movie-goers are selected independently of each other, estimate the probability that both are female.

Solution
1) P(A) = (?+?+?+?)/? = 0.485.
2) P(B) = (?+?+?+?)/? = 0.515.
3) Since P(A and B) = P(a movie-goer is female and is younger than 20) = (?+?)/400 = 0.215 ≠ 0.2498 ≈ 0.485 × 0.515 = P(A)P(B), A and B are not independent.
4) Since the two movie-goers are selected independently, P(both are female) = ? × ? ≈ 0.2352.

Making decisions based on probability

We can use probability information to make a decision.

Example 6.8
(1) When we go to a casino, we may decide to play a game or not based on P(we win the game).
(2) We may decide to invest in a stock or not based on P(we earn money from the investment).
(3) We may assess the risk of a nuclear power plant by P(reactor core melts).
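
As a supplement to the four simulation steps in part c) of Section 6.2, the following is a minimal sketch of how Example 6.4 and part (4) of Example 6.6 might be simulated with software. It uses Python's standard random module as the random mechanism. The function names, the number of repetitions, and the fixed seed are assumptions made for illustration only; they do not come from the text.

```python
import random


def simulate_boys_until_first_boy(num_families, seed=None):
    """Example 6.4: each family keeps having children until a boy is born.

    Returns the proportion of boys among all simulated children.
    """
    rng = random.Random(seed)  # seed is only for reproducibility (assumption)
    boys = 0
    children = 0
    for _ in range(num_families):
        # Steps 1 and 2: one fair "coin toss" per birth, stop at the first boy.
        while True:
            children += 1
            if rng.random() < 0.5:  # "head" represents a boy
                boys += 1
                break
    # Step 4: relative frequency of the outcome of interest.
    return boys / children


def simulate_two_guesses(num_trials, num_choices=4, seed=None):
    """Example 6.6 (4): estimate P(at least one correct answer in two guesses)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_trials):
        # Choice 0 plays the role of the correct answer on each problem.
        first_correct = rng.randrange(num_choices) == 0
        second_correct = rng.randrange(num_choices) == 0
        if first_correct or second_correct:
            hits += 1
    return hits / num_trials


if __name__ == "__main__":
    # Step 3: repeat a large number of times.
    print("Proportion of boys:", simulate_boys_until_first_boy(200_000, seed=1))
    print("P(at least one correct):", simulate_two_guesses(200_000, seed=1))
```

With a large number of repetitions, the first estimate should be close to .5 and the second close to 7/16 = 0.4375, in line with the results of Examples 6.4 and 6.6. The estimates vary slightly from run to run unless a seed is fixed.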