Chapter 6 Section 1 Day 2 Notes.notebook April 28, 2017

Honors Statistics

Daily Agenda

A Skip 4, 12, 16

Write a program to generate random numbers.
I've decided to give them free will.

Toss 4 times

Suppose you toss a fair coin 4 times. Let X = the number of heads you get.

First, list the sample space:

HHHH  THHH
HHHT  THHT
HHTH  THTH
HHTT  THTT
HTHH  TTHH
HTHT  TTHT
HTTH  TTTH
HTTT  TTTT

(a) Find the probability distribution of X.

(b) Make a histogram of the probability distribution. Describe what you see.

[Histogram: vertical axis labeled "frequency" (0.1 to 0.5), horizontal axis labeled "Number of heads"]

(c) Find P(X ≤ 3) and interpret the result.

P(X ≤ 3) = 1/16 + 4/16 + 6/16 + 4/16 = 15/16 = 0.9375

The probability that 4 tosses of a fair coin result in 3 or fewer heads is 0.9375.

Kids and toys

In an experiment on the behavior of young children, each subject is placed in an area with five toys. Past experiments have shown that the probability distribution of the number X of toys played with by a randomly selected subject is as follows:

Number of toys x:  0     1     2     3     4     5
Probability:       0.03  0.16  0.30  0.23  0.17  0.11

(a) Write the event “plays with at most two toys” in terms of X. What is the probability of this event?

The event is X ≤ 2.
P(X ≤ 2) = 0.03 + 0.16 + 0.30 = 0.49

(b) Describe the event X > 3 in words. What is its probability? What is the probability that X ≥ 3?

The event X > 3 means that the young child plays with more than 3 toys.
P(X > 3) = 0.17 + 0.11 = 0.28
P(X ≥ 3) = 0.28 + 0.23 = 0.51

Kids and toys

Refer to Exercise 4. Calculate the mean of the random variable X and interpret this result in context.

µ_X = 0(0.03) + 1(0.16) + 2(0.30) + 3(0.23) + 4(0.17) + 5(0.11) = 2.68

If many, many children were randomly selected for this experiment, the number of toys they played with would average about 2.68 toys per child.

(The expected number of toys a randomly selected young child will play with is 2.68.) This statement is optional.
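The coin-toss distribution and the mean calculation above can be checked with a short program. This is only a sketch: the function names `coin_toss_distribution` and `expected_value` are made up for illustration, and the toy probabilities are the ones used in the mean calculation above.

```python
from fractions import Fraction
from itertools import product


def coin_toss_distribution(n_tosses):
    """Return {number of heads: probability} for n fair coin tosses."""
    # Every sequence of H/T is equally likely, so count heads in each one.
    outcomes = list(product("HT", repeat=n_tosses))
    dist = {}
    for outcome in outcomes:
        heads = outcome.count("H")
        dist[heads] = dist.get(heads, 0) + Fraction(1, len(outcomes))
    return dist


def expected_value(dist):
    """Mean of a discrete random variable given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


# Toss 4 times: add up the probabilities for X = 0, 1, 2, 3.
coin = coin_toss_distribution(4)
p_at_most_3 = sum(prob for heads, prob in coin.items() if heads <= 3)

# Kids and toys: probabilities taken from the mean calculation above.
toys = {0: 0.03, 1: 0.16, 2: 0.30, 3: 0.23, 4: 0.17, 5: 0.11}
toys_mean = expected_value(toys)

print(p_at_most_3, float(p_at_most_3))
print(round(toys_mean, 2))
```

Exact fractions are used for the coin so that 15/16 appears without rounding; the toy probabilities are decimals, so that mean is rounded before printing.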
Kids and toys

Refer to Exercise 4. Calculate and interpret the standard deviation of the random variable X. Show your work.

σ_X² = (0 - 2.68)²(0.03) + (1 - 2.68)²(0.16) + (2 - 2.68)²(0.30) + (3 - 2.68)²(0.23) + (4 - 2.68)²(0.17) + (5 - 2.68)²(0.11) = 1.7176

σ_X = √1.7176 = 1.31057

The standard deviation of X is σ_X = 1.31057. The number of toys a randomly selected young child plays with will typically differ from the mean (2.68) by about 1.31 toys.

Benford’s law

Faked numbers in tax returns, invoices, or expense account claims often display patterns that aren’t present in legitimate records. Some patterns, like too many round numbers, are obvious and easily avoided by a clever crook. Others are more subtle. It is a striking fact that the first digits of numbers in legitimate records often follow a model known as Benford’s law. Call the first digit of a randomly chosen record X for short. Benford’s law gives this probability model for X (note that a first digit can’t be 0):

First digit x:  1      2      3      4      5      6      7      8      9
Probability:    0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046

(a) Show that this is a legitimate probability distribution.

All individual probabilities are between 0 and 1, and they sum to 1:
0.301 + 0.176 + 0.125 + 0.097 + 0.079 + 0.067 + 0.058 + 0.051 + 0.046 = 1

(b) Make a histogram of the probability distribution. Describe what you see.

[Histogram of the Benford’s law distribution]

The distribution is NOT symmetric. It is skewed to the right. The data should be analyzed using the five-number summary.

(c) Describe the event X ≥ 6 in words. What is P(X ≥ 6)?

The event X ≥ 6 means that the first digit of a randomly chosen legitimate record is 6 or greater.
P(X ≥ 6) = 0.067 + 0.058 + 0.051 + 0.046 = 0.222

(d) Express the event “first digit is at most 5” in terms of X. What is the probability of this event?

The event is X ≤ 5.
P(X ≤ 5) = 1 - P(X ≥ 6) = 1 - 0.222 = 0.778

Benford’s law

Refer to Exercise 5.
The first digit of a randomly chosen expense account claim follows Benford’s law. Consider the events A = first digit is 7 or greater and B = first digit is odd.

(a) What outcomes make up the event A? What is P(A)?

A = {7, 8, 9}
P(A) = P(X ≥ 7) = 0.058 + 0.051 + 0.046 = 0.155

(b) What outcomes make up the event B? What is P(B)?

B = {1, 3, 5, 7, 9}
P(B) = P(X is odd) = 0.301 + 0.125 + 0.079 + 0.058 + 0.046 = 0.609

(c) What outcomes make up the event “A or B”? What is P(A or B)? Why is this probability not equal to P(A) + P(B)?

A or B = {1, 3, 5, 7, 8, 9}
P(A or B) = P(X ≥ 7 or X is odd) = 0.609 + 0.155 - (0.058 + 0.046) = 0.660

Both 7 and 9 are included in each event, so their probabilities are counted twice in P(A) + P(B) and must be subtracted once (the general addition rule).

Benford’s law and fraud

A not-so-clever employee decided to fake his monthly expense report. He believed that the first digits of his expense amounts should be equally likely to be any of the numbers from 1 to 9. In that case, the first digit Y of a randomly selected expense amount would have the probability distribution shown in the histogram.

(a) Explain why the mean of the random variable Y is located at the solid red line in the figure.

The mean is the balance point of the distribution. Because this distribution is uniform, and therefore symmetric, the balance point is at the center of the histogram, which in this case is 5.

(b) The first digits of randomly selected expense amounts actually follow Benford’s law (Exercise 5). According to Benford’s law, what’s the expected value of the first digit? Explain how this information could be used to detect a fake expense report.

µ_X = 1(0.301) + 2(0.176) + 3(0.125) + 4(0.097) + 5(0.079) + 6(0.067) + 7(0.058) + 8(0.051) + 9(0.046) = 3.441

To detect a fake expense report, compute the sample mean of the first digits of the numbers on the report. A value close to 3.441 suggests a truthful report, but a value close to 5 (the mean of the uniform distribution) suggests a fake report.

(c) What’s P(Y > 6) in the above distribution?
According to Benford’s law, what proportion of first digits in the employee’s expense amounts should be greater than 6? How could this information be used to detect a fake expense report?

For the uniform distribution, P(Y > 6) = 3/9 ≈ 0.333.
Under Benford’s law, P(X > 6) = 0.058 + 0.051 + 0.046 = 0.155.

To detect a fake expense report, compute the proportion of first digits on the report that are greater than 6. A value close to 0.155 suggests a truthful report, but a value close to 0.333 (the uniform distribution) suggests a fake report.

Benford’s law and fraud

Refer to Exercise 13. It might also be possible to detect an employee’s fake expense records by looking at the variability in the first digits of those expense amounts.

(a) Calculate the standard deviation σ_Y. This gives us an idea of how much variation we’d expect in the employee’s expense records if he assumed that first digits from 1 to 9 were equally likely.

Each of the nine digits has probability 1/9, and the mean is 5.

σ_Y² = (1 - 5)²(1/9) + (2 - 5)²(1/9) + (3 - 5)²(1/9) + (4 - 5)²(1/9) + (5 - 5)²(1/9) + (6 - 5)²(1/9) + (7 - 5)²(1/9) + (8 - 5)²(1/9) + (9 - 5)²(1/9) = 60/9 ≈ 6.67

σ_Y = √6.67 ≈ 2.58

(b) Now calculate the standard deviation of first digits that follow Benford’s law (Exercise 5). Would using standard deviations be a good way to detect fraud? Explain.

σ_X² = (1 - 3.441)²(0.301) + (2 - 3.441)²(0.176) + (3 - 3.441)²(0.125) + (4 - 3.441)²(0.097) + (5 - 3.441)²(0.079) + (6 - 3.441)²(0.067) + (7 - 3.441)²(0.058) + (8 - 3.441)²(0.051) + (9 - 3.441)²(0.046) ≈ 6.0605

σ_X = √6.0605 ≈ 2.46

Because the standard deviations are so close (2.58 and 2.46), it would be difficult to distinguish fake reports from legitimate reports using the standard deviation.

Finish Stuff from Yesterday

1. Write in words what the meaning of P(X ≥ 3) is. What is this probability?
2. Write the event “the student got a grade worse than C” in terms of values of the random variable X.
What is the probability of this event?
3. Sketch a graph of the probability distribution. Describe what you see.
[Blank axes: vertical scale from 0.1 to 0.5]
4. Find the expected value of the distribution. Interpret this value in context.
5. Find the standard deviation of the distribution. Interpret this value in context.

1. Write in words what the meaning of P(X ≥ 3) is. What is this probability?

P(X ≥ 3) is the probability that a randomly selected student in online Statistics 101 earned a grade of B or higher.
P(X ≥ 3) = 0.42 + 0.26 = 0.68

2. Write the event “the student got a grade worse than C” in terms of values of the random variable X. What is the probability of this event?

The event is X < 2.
P(X < 2) = 0.02 + 0.10 = 0.12

3. Sketch a graph of the probability distribution. Describe what you see.

[Histogram: vertical axis labeled "frequency of letter grade earned" (0.1 to 0.5), horizontal axis labeled "Value of letter grade"]

This distribution of letter grade probabilities is not symmetric; it is skewed left. The center is at the median, which appears to be about 3. The values spread from 0 to 4, giving a range of 4. The data should be analyzed using the five-number summary, and there are no notable departures from the overall pattern.

4. Find the expected value of the distribution. Interpret this value in context.

µ_X = 0(0.02) + 1(0.10) + 2(0.20) + 3(0.42) + 4(0.26) = 2.8

If many, many Stat 101 students were randomly selected, their grades would average about 2.8 points.

5. Find the standard deviation of the distribution. Interpret this value in context.

σ_X² = (0 - 2.8)²(0.02) + (1 - 2.8)²(0.10) + (2 - 2.8)²(0.20) + (3 - 2.8)²(0.42) + (4 - 2.8)²(0.26) = 1

σ_X = √1 = 1

The standard deviation of X is σ_X = 1. A randomly selected Stat 101 grade will typically differ from the mean (2.8) by about 1 point.

Area above the x axis

CONTINUE ...
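The expected value and standard deviation in the grade exercise above follow the same recipe as the earlier exercises, so they can be checked with two small functions. This is a sketch only: the names `mean` and `standard_deviation` are illustrative, and the probabilities are the ones that appear in the expected value calculation above.

```python
from math import sqrt


def mean(dist):
    """Expected value: sum of value * probability."""
    return sum(x * p for x, p in dist.items())


def standard_deviation(dist):
    """Square root of the probability-weighted squared deviations from the mean."""
    mu = mean(dist)
    variance = sum((x - mu) ** 2 * p for x, p in dist.items())
    return sqrt(variance)


# Grade values 0 through 4 with the probabilities used in the notes.
grades = {0: 0.02, 1: 0.10, 2: 0.20, 3: 0.42, 4: 0.26}

# A legitimate distribution: the probabilities must sum to 1.
total = sum(grades.values())

grade_mean = mean(grades)
grade_sd = standard_deviation(grades)

print(round(total, 2), round(grade_mean, 2), round(grade_sd, 2))
```

The printed values should agree with the hand calculations above, a mean of 2.8 and a standard deviation of 1, up to floating-point rounding.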
CONTINUOUS RANDOM VARIABLES AND PROBABILITY

Why does P(X = #) = 0 in the continuous "world"?

P(0 < X < 1) = (1)(1) = 1.0

The probability distribution for a continuous random variable assigns probabilities to intervals of outcomes rather than to individual outcomes. In fact, all continuous probability models assign probability 0 to every individual outcome. Only intervals of values have positive probability. To see that this is true, consider a specific outcome from the random number generator of the previous example, such as P(Y = 0.7). The probability of this event is the area under the density curve that’s above the point 0.70000… on the horizontal axis. But this vertical line segment has no width, so the area is 0. For that reason,

P(0.3 ≤ Y ≤ 0.7) = P(0.3 ≤ Y < 0.7) = P(0.3 < Y < 0.7) = 0.4

ALL continuous probability models assign probability 0 to every individual outcome.

P(X = 3) = 0

In many cases, discrete random variables arise from counting something, for instance, the number of siblings that a randomly selected student has. Continuous random variables often arise from measuring something, for instance, the height or time to run a mile for a randomly selected student.

DISCRETE

#22 Random numbers

a) P(Y ≤ 0.4) =
b) P(Y < 0.4) =
c) P(0.1 < Y ≤ 0.15 or 0.77 ≤ Y < 0.88) =
d) What important fact about continuous random variables does comparing your answers to a and b illustrate?
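The area idea behind these problems can be sketched in code. The sketch assumes that Y is the random number generator output from the notes, uniform between 0 and 1, so the density curve is a rectangle of height 1. The function name `uniform_probability` is made up for illustration.

```python
def uniform_probability(a, b, low=0.0, high=1.0):
    """Area under a uniform density between a and b: width times height.

    Assumes Y is uniform on [low, high]. Including or excluding the
    endpoints makes no difference, because a single point has zero width.
    """
    left = max(a, low)
    right = min(b, high)
    width = max(right - left, 0.0)
    height = 1.0 / (high - low)
    return width * height


# The worked example from the notes: P(0.3 < Y < 0.7).
p_example = uniform_probability(0.3, 0.7)

# A single outcome such as Y = 0.7 is an interval of width 0.
p_point = uniform_probability(0.7, 0.7)

# Two non-overlapping intervals: add the two areas.
p_two_intervals = uniform_probability(0.1, 0.15) + uniform_probability(0.77, 0.88)

print(round(p_example, 2), p_point, round(p_two_intervals, 2))
```

Because the function only ever uses the width of the interval, P(Y ≤ 0.4) and P(Y < 0.4) come out the same, which is the fact that part d) asks about.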
Oh normdist program
Oh normdist program
I will be true
I'm blue
Oh normdist program