AP Stats General Random Variable Distribution Review
Name_________________________ Per______

1. Spell-checking software catches “nonword” errors, which result in a string of letters that is not a word, such as “het” instead of “the”. When students are asked to write a 250-word essay without spell checking, the number X of nonword errors has the following distribution:

Value of X:  0    1    2    3    4
P(X):        0.1  0.2  0.3  0.3  0.1

a) Write the event “at least one nonword error” in terms of X. What is the probability of this event?
b) Describe the event X ≤ 2 in words. What is its probability? What is the probability that X < 2?
c) Calculate the mean of the random variable X and interpret this result in context.

2. Faked numbers in tax returns, invoices or expense account claims often display patterns that aren’t present in legitimate records. Some patterns, like too many round numbers, are obvious and easily avoided by a clever crook. Others are more subtle. It is a striking fact that the first digits of numbers in legitimate records often follow a model known as Benford’s law. Call the first digit of a randomly chosen record X for short. Benford’s law gives this probability model for X (note that the first digit can’t be 0):

Value of X:  1      2      3      4      5      6      7      8      9
P(X):        0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046

a) Show that this is a legitimate probability distribution.
b) Make a histogram of this probability distribution and describe what you see.
c) Describe the event X ≥ 6 in words. What is P(X ≥ 6)?
d) Express the event “first digit is at most 5” in terms of X. What is the probability of this event?
e) What is the expected value of the first digit?
f) What is the standard deviation of the first digit?

3. A not-so-clever employee decided to fake his monthly expense report. He believed that the first digits of his expense amounts should be equally likely to be any of the numbers from 1 to 9.
a) Create a probability distribution table for this event.
b) Make a histogram of this probability distribution and describe what you see.
c) Explain why the mean is 5.
d) Using the expected value from problem 2, explain how this information could be used to detect a fake expense report.
e) Using the “not-so-clever” approach, what is P(X > 6)? How could this information be used to detect a fake expense report?
f) What is the standard deviation of this distribution? Would comparing standard deviations be a good way of detecting a fake? Explain.

4. A study of 12,000 able-bodied male students at the University of Illinois found that their times for the mile run were approximately Normal with a mean of 7.11 minutes and a standard deviation of 0.74 minutes. If a student is chosen at random from this group, find the probability that his time is less than 6 minutes. Use all proper notation and state your conclusion in context.

5. Many chess advocates believe that chess play develops general intelligence, analytical skill and the ability to concentrate. According to such beliefs, improved reading skills should result from efforts to play chess. A study was conducted. All subjects in the study participated in a comprehensive chess program, and their reading performance was measured before and after the program. The graphs and numerical summaries below provide information on the subjects’ pretest scores, posttest scores and the difference (post – pre) between these two scores.
a) Did the students have higher reading scores after participating in the chess program? Give appropriate statistical evidence to support your answer.
b) If the study found a statistically significant improvement in the reading scores, could you conclude that playing chess causes an increase in reading skills? Justify your answer.
c) What is the equation of the linear regression model relating posttest and pretest scores? Define any variables used.
d) Discuss what r² and the residual plot tell you about this linear regression model.

6.
Rotter Partners is planning a major investment. The amount of profit X (in millions of dollars) is uncertain, but an estimate gives the following probability distribution:

Profit:       1    1.5  2    4    10
Probability:  0.1  0.2  0.4  0.2  0.1

Based on this estimate, what are the mean and standard deviation of the profit? Rotter Partners owes its lender a fee of $200,000 plus 10% of the profits X. So the firm actually retains Y = 0.9X – 0.2 from the investment. Find the mean and standard deviation of Y. Show your work.

7. A company’s single-serving cereal boxes advertise 9.63 ounces of cereal. In fact, the amount of cereal X in a randomly selected box follows a Normal distribution with a mean of 9.70 ounces and a standard deviation of 0.03 ounces.
a. Let Y = the excess amount of cereal beyond what’s advertised in a randomly selected box, measured in grams (1 ounce = 28.35 grams). Find the mean and standard deviation of Y.
b. Find the probability of getting at least 3 grams more cereal than advertised. Show your work.

8. The design of a toaster calls for a 100-ohm resistor and a 250-ohm resistor connected in series so that their resistances add. The resistance of the 100-ohm resistor is Normally distributed with a mean of 100 ohms and a standard deviation of 2.5 ohms, while the resistance of the 250-ohm resistor is Normally distributed with a mean of 250 ohms and a standard deviation of 2.8 ohms.
a. Describe the distribution of the total resistance.
b. What is the probability that the total resistance lies between 345 and 355 ohms? Show your work.

9. The amount a life insurance company earns on a 5-year term life policy is labeled X. Calculations reveal that µX = $303.35 and σX = $9707.57. The risk of insuring one person’s life is reduced if more persons are insured.
a. Suppose that two 21-year-old males are insured, and their ages at death are independent.
If X1 and X2 are the insurer’s income from the two insurance policies, the insurer’s average income W can be expressed as:

W = (X1 + X2)/2 = 0.5X1 + 0.5X2

Find the mean and standard deviation of W.
b. If four men are insured and the amount of income earned on each policy is independent, find the mean and standard deviation of V, the average income of the four policies. Show your work. Why is the risk of insurance reduced with more people being insured?

10. The Transportation Security Administration (TSA) is responsible for airport safety. On some flights, TSA officers randomly select passengers for an extra security check before boarding. One such flight had 76 passengers: 12 in first class and 64 in coach. Some passengers were annoyed that the 7 passengers chosen were all from coach. What is the probability that all passengers chosen for the random check were from coach?

11. In baseball, a 0.300 hitter gets a hit in 30% of his times at bat. When a baseball player hits 0.300, fans tend to be impressed. Typical major leaguers bat about 500 times a season and hit about 0.260. A hitter’s successive tries seem to be independent. Could a typical major leaguer hit 0.300 just by chance? Compute an appropriate probability to support your answer.

12. Ed and Adelaide attend the same high school, but are in different math classes. The time “E” that it takes Ed to do his math homework follows a Normal distribution with a mean of 25 minutes and a standard deviation of 5 minutes. Adelaide’s math homework time “A” follows a Normal distribution with a mean of 50 minutes and a standard deviation of 10 minutes.
a. Describe the distribution of the difference in the amount of time each student spent on their assignments. (D = A – E)
b. Find the probability that Ed spent longer on his assignment than Adelaide did on hers. Show your work.

13. Determine whether each of the following is geometric, binomial or neither:
a. The number of 6s I get if I roll a die 10 times
b.
The number of times I have to roll a die to get two 6s
c. The number of cards I deal from a deck of 52 cards until I get a heart
d. The number of digits I read in a randomly selected row of the random number table until I find a 7
e. The number of 7s in a row of 40 random digits

14. The weight of tomatoes chosen at random from a bin at the farmer’s market is a random variable with a mean of 10 ounces and a standard deviation of 1 ounce. Suppose we pick four tomatoes at random from the bin.
a. Find the mean of their total weight “T”.
b. Find the standard deviation of the total weight of the four tomatoes.

15. According to the Census Bureau, 13% of American adults are Hispanic. An opinion poll plans to contact an SRS of 1200 adults.
a. What is the mean number of Hispanics in such a sample? What is the standard deviation?
b. Should we be suspicious if the sample selected for the poll contains 15% Hispanic people? Compute an appropriate probability to support your answer.

Answers:

1 a) The event {X ≥ 1} or {X > 0}. P = 0.9
b) No more than two nonword errors. P(X ≤ 2) = 0.6, P(X < 2) = 0.3
c) The mean is 2.1. On average, students make 2.1 nonword errors per 250-word essay.

2 a) All probabilities are between 0 and 1 and they add up to 1.
b) This is a right-skewed distribution with a mode of 1.
c) The first digit in a randomly chosen record is a 6 or higher. p = 0.222
d) The event {X ≤ 5}. p = 0.778
e) µX = 3.441
f) σX = 2.4618

3 a)
Value of X:  1    2    3    4    5    6    7    8    9
P(X):        1/9  1/9  1/9  1/9  1/9  1/9  1/9  1/9  1/9   (each about 0.111)
b) The distribution would have a uniform, symmetric shape.
c) Since the distribution is uniform and symmetric, the mean is exactly in the middle, which is 5.
d) To detect a fake, compute the sample mean of the first digits and see if it is near 3.441 or near 5.
e) p = 0.333. Using Benford’s law, the same probability is 0.155. When looking at the suspect report, find the percent of figures that start with a digit higher than 6. If that percent is closer to 33% than to 15%, the report is probably fake.
f) σ = 2.58. No, this would not be the best way to find a fake because the two standard deviations (2.46 and 2.58) are not very different from each other.

4) X is N(7.11, 0.74); find P(X < 6). z = (6 − 7.11)/0.74 = −1.50, so P(X < 6) = 0.0668. There is about a 6.7% chance that a randomly chosen student runs the mile in under 6 minutes.

5) a) Yes, students did have higher scores in general after participating in the chess program. The mean difference was 5.38 and the median was 3. This means that at least half of the students (though less than three quarters of them, since Q1 was negative) improved their reading scores.
b) No, we cannot conclude that chess causes an increase in reading scores. There was no control group that did not participate in the chess program, so we have nothing to compare against. It may be that children naturally improve their reading scores over that time period.
c) Predicted posttest = 17.897 + 0.78301(pretest)
d) The residual plot does not show a pattern and the scatterplot shows a positive linear relationship. r calculates to 0.746, which is a moderate, positive linear association. Therefore, the LSRL is an appropriate model.

6) µX = 3 and σX = 2.52 (millions of dollars). µY = $2.5 million and σY = $2.268 million

7 a) µY = 1.985 grams, σY = 0.8505 grams
b) P(Y ≥ 3) = P(Z ≥ 1.19) = 0.1170

8 a) T = R1 + R2 is Normal with a mean of 350 ohms and a standard deviation of √(2.5² + 2.8²) = 3.754 ohms.
b) P(345 < T < 355) = P(−1.33 < Z < 1.33) = 0.8164

9 a) µW = $303.35 and σW = $6,864.29
b) V = (X1 + X2 + X3 + X4)/4. µV = $303.35 and σV = $4,853.79. The standard deviation is smaller than σW by a factor of 1/√2: averaging over more independent policies reduces the variability of the insurer’s average income.

10) Binomial: X = number of passengers chosen from coach out of 7, n = 7, p = 64/76 = 0.8421. P(X = 7) = (7 choose 7)(0.8421)^7(1 − 0.8421)^0 = 0.30

11) Let X be the number of hits out of 500 times at bat. P(X ≥ 150) = 0.0207. Only about 2% of typical players would hit 0.300 or better by chance alone, so it is probably not just chance when a player hits 0.300.
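
The means and standard deviations quoted for problems 2, 3 and 6 can be checked with a few lines of Python. This is only a sketch: the helper name mean_sd and the variable names are made up for illustration, and the "not-so-clever" distribution is taken to give each digit probability 1/9.

```python
import math

def mean_sd(values, probs):
    """Return (mean, standard deviation) of a discrete random variable."""
    # A legitimate distribution has probabilities that add up to 1.
    assert math.isclose(sum(probs), 1.0, abs_tol=1e-9)
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, math.sqrt(var)

digits = range(1, 10)

# Problem 2: Benford's law for the first digit.
benford = [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
mu_benford, sd_benford = mean_sd(digits, benford)

# Problem 3: every first digit equally likely (assumed probability 1/9 each).
uniform = [1 / 9] * 9
mu_uniform, sd_uniform = mean_sd(digits, uniform)

# Problem 6: profit X in millions of dollars, then Y = 0.9X - 0.2.
mu_x, sd_x = mean_sd([1, 1.5, 2, 4, 10], [0.1, 0.2, 0.4, 0.2, 0.1])
mu_y = 0.9 * mu_x - 0.2   # adding a constant shifts the mean
sd_y = 0.9 * sd_x         # only the multiplier affects the spread

print(f"Benford: mean {mu_benford:.3f}, sd {sd_benford:.4f}")
print(f"Uniform: mean {mu_uniform:.3f}, sd {sd_uniform:.4f}")
print(f"Profit X: mean {mu_x:.3f}, sd {sd_x:.4f}")
print(f"Retained Y: mean {mu_y:.3f}, sd {sd_y:.4f}")
```

The same helper works for problem 1 by passing the values 0 to 4 with their probabilities.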

12 a) The difference D = A – E is Normal with a mean of 25 minutes and a standard deviation of √(10² + 5²) = 11.18 minutes.
b) P(D < 0) = P(Z < −2.24) = 0.0125

13 a) binomial b) neither c) neither (the cards are dealt without replacement, so the trials are not independent) d) geometric e) binomial

14 a) µT = 4(10) = 40 ounces
b) σT = √(1² + 1² + 1² + 1²) = 2 ounces

15 a) µ = 156, σ = 11.6499
b) If the sample were 15% Hispanic, there would be 1200(0.15) = 180 Hispanics in the sample. The problem does not say whether to use the binomial distribution or a Normal approximation. Using the binomial, P(X ≥ 180) = 0.0235. Using the Normal approximation with a continuity correction, P(X ≥ 179.5) = 0.0218. In either case, the probability is low, so we should be suspicious of the opinion poll.
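
Several of the Normal and binomial probabilities above can be reproduced with the Python standard library (3.8 or later). This is a rough sketch, not part of the original key: the helper names binom_pmf and binom_upper_tail are invented here, the resistors in problem 8 are assumed to be independent, and the "exact" line for problem 10 counts samples drawn without replacement, which the key's binomial model only approximates. Software values can differ slightly from table values because z is not rounded.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))

# Problem 4: mile time X ~ N(7.11, 0.74); P(X < 6).
p4 = NormalDist(7.11, 0.74).cdf(6)

# Problem 8: T = R1 + R2; variances add (assuming independent resistors).
sd_total = math.hypot(2.5, 2.8)          # sqrt(2.5^2 + 2.8^2)
total = NormalDist(100 + 250, sd_total)
p8 = total.cdf(355) - total.cdf(345)

# Problem 10: the key's binomial model vs. counting without replacement.
p10_binomial = (64 / 76) ** 7
p10_exact = math.comb(64, 7) / math.comb(76, 7)

# Problem 12: D = A - E ~ N(50 - 25, sqrt(10^2 + 5^2)); P(D < 0).
diff = NormalDist(50 - 25, math.hypot(10, 5))
p12 = diff.cdf(0)

# Problem 15: X ~ Binomial(1200, 0.13); P(X >= 180) two ways.
n, p = 1200, 0.13
p15_binomial = binom_upper_tail(180, n, p)
p15_normal = 1 - NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(179.5)

for label, value in [("4", p4), ("8b", p8), ("10 binomial", p10_binomial),
                     ("10 exact", p10_exact), ("12b", p12),
                     ("15b binomial", p15_binomial), ("15b normal", p15_normal)]:
    print(f"Problem {label}: {value:.4f}")
```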