Practice Problems for Midterm #2

1. According to scientists, asteroids 500 meters in diameter or larger are expected to strike the earth once every 10,000 years (on average). Fill in the blank: “Mathematically, there is a 10% chance that the earth will be struck by a large asteroid within the next __________ years.”

Answer: \( P(x \le x_0) = 0.1 = 1 - e^{-x_0/10{,}000} \), so \( x_0 = -10{,}000 \ln(0.9) = 1053.6 \) years.

2. You have collected the following data from a population that is approximately normal in distribution:

41.19 40.04 35.27 18.91 15.60 47.02 58.67 39.38 35.04 47.16 30.60 45.50 64.91 39.95 20.61

The sample has a mean of 38.66 and a standard deviation of 13.70. Suppose you plan to collect one additional data point.

a) What is the probability that the sample point will be less than 13.452?

Answer: \( z_{13.452} = (13.452 - 38.66)/13.70 = -1.84 \). From the normal table, we see that -1.84 corresponds to \( P(-1.84 < z < 0) = 0.4671 \). Therefore, \( P(x < 13.452) = 0.5 - 0.4671 = 0.0329 \).

b) Find the point \( x_0 \) such that \( P(x > x_0) = 1.1\% \).

Answer: We want the upper tail to have a probability of 0.011. We must therefore find 0.5 - 0.011 = 0.489 in the table. This corresponds to a z of 2.29. Therefore \( x_0 = 38.66 + 2.29 \times 13.70 = 70.03 \).

3. Suppose there is a test for a disease which affects 1 person out of every 2000. The test correctly identifies a sick person 99% of the time (i.e., 1% of the time, the test says that the sick person is actually healthy). Unfortunately, the test returns false positives (a healthy person is identified as being sick) 2% of the time. A friend tests positive for the disease. What is the probability that your friend actually has the disease?

Answer: Here “failed test” means a positive test result. P(sick) = 1/2000 = 0.0005; P(failed test | sick) = 0.99; P(failed test | healthy) = 0.02. Using Bayes’ Rule,

\[ P(\text{sick} \mid \text{failed test}) = \frac{P(\text{sick})\,P(\text{failed test} \mid \text{sick})}{P(\text{sick})\,P(\text{failed test} \mid \text{sick}) + P(\text{healthy})\,P(\text{failed test} \mid \text{healthy})} = \frac{0.0005 \times 0.99}{0.0005 \times 0.99 + 0.9995 \times 0.02} = 0.0242 \]

4. Serious traffic accidents occur, on average, twice per weekday in a city.
The city has emergency services capable of handling four serious accidents in a day. When more than four accidents occur, emergency services are requested from surrounding communities. What percentage of weekdays will the city need help from surrounding communities?

Answer: We want P(x > 4) = f(5) + f(6) + f(7) + f(8) + ... and can use the Poisson distribution with a mean of 2 accidents per weekday. This is an infinite sum, so we can simplify it by calculating P(x > 4) = 1 - f(4) - f(3) - f(2) - f(1) - f(0).

\[ f(4) = \frac{2^4 e^{-2}}{4!} = 0.0902; \quad f(3) = \frac{2^3 e^{-2}}{3!} = 0.1804; \quad f(2) = \frac{2^2 e^{-2}}{2!} = 0.2707; \quad f(1) = \frac{2^1 e^{-2}}{1!} = 0.2707; \quad f(0) = \frac{2^0 e^{-2}}{0!} = 0.1353 \]

So, P(x > 4) = 1 - 0.0902 - 0.1804 - 0.2707 - 0.2707 - 0.1353 = 0.0527 = 5.27%.

5. BRIEFLY evaluate the validity of the following statement: “Even if a variable is continuous in nature, we cannot measure it in a continuous way. For example, we cannot measure things out to an infinite number of decimal places. An implication of this is that continuous probability distributions are only useful in theory and are not useful when applied to real-world situations.”

Answer: There are two strong arguments against this statement. First, many distributions have so many discrete possibilities that they are nearly continuous (suppose we measure temperature to three decimal places, for instance). In such cases, a continuous distribution is a reasonable approximation. Second, the Central Limit Theorem says that the sample mean from any distribution (even if it is discrete) will be approximately normal in distribution if n is large. Thus a continuous distribution is the theoretical distribution for sample means. For both these reasons, continuous distributions are of critical importance in real-world settings.

6. You are interested in selecting a simple random sample of 200 households from the phone book. You know the number of entries in the book, but some of them are for businesses. A colleague suggests the following method.
You randomly select 200 numbers from a discrete uniform distribution between 1 and the number of entries in the phone book. You then select the entries corresponding to those numbers. If a selection happens to be a business, you simply move to the next item in the phone book and include that in your sample. Does this methodology meet the criteria for a simple random sample? BRIEFLY explain your answer.

Answer: Households immediately following businesses are more likely to be chosen than those not following businesses. The strategy therefore does not meet the criteria.

7. Explain the difference between unbiasedness and efficiency. Which is more important in choosing a point estimate? BRIEFLY justify your answer.

Answer: An unbiased estimate is one that is correct in expectation. An estimate is more efficient than another estimate if its standard deviation is lower. Having an unbiased estimate is usually considered more important than having the most efficient estimate.

8. You have collected the following data from an unknown population:

47 44 42 0 8 4 1 49 4 2 43 43 42 5 42 46 5 8 45 3 49 3 9 46 5 48 44 9 1 47

BRIEFLY explain how you would estimate the distribution of the sample mean. IN ONE SENTENCE, justify your method. There is no need to do any calculations here.

Answer: The expected value of the sample mean would be the average of the numbers. The standard deviation of the sample mean would be the sample standard deviation divided by the square root of 30. The sample mean would be approximately normal in distribution. This approach is justified by the Central Limit Theorem.

9. Suppose that you plan to flip a fair coin 1000 times and record the number of heads. What is the probability that at least 520 heads will be recorded?

Answer: By the Central Limit Theorem, the sample proportion will be approximately normal in distribution.

\[ \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.5(1-0.5)}{1000}} = 0.0158 \]

Then, \( z = (0.52 - 0.5)/0.0158 = 1.27 \). From the normal table, we see that z = 1.27 corresponds to an area of 0.3980 between z = 1.27 and z = 0. \( P(\bar{p} > 0.52) = 0.5 - 0.3980 = 0.1020 \).

10. A company purchases electronic switches from three different suppliers. 55% of the switches come from supplier A, 30% from supplier B, and 15% from supplier C. Supplier A switches are defective with probability 1.0%. Supplier B switches are defective with probability 2.0%. Supplier C switches are defective with probability 3.0%.

a) Assuming that the company manufactures a radio that uses exactly one of the switches, what is the probability that a randomly chosen radio will have a defective switch?

Answer: \( P(\text{def}) = P(A)P(\text{def} \mid A) + P(B)P(\text{def} \mid B) + P(C)P(\text{def} \mid C) = 0.55 \times 0.01 + 0.3 \times 0.02 + 0.15 \times 0.03 = 0.016 \)

b) Suppose that a radio is found to have a defective switch. Which supplier is most likely to have provided the switch?

Answer: Using Bayes’ Rule,

\[ P(A \mid \text{def}) = \frac{P(A)P(\text{def} \mid A)}{P(A)P(\text{def} \mid A) + P(B)P(\text{def} \mid B) + P(C)P(\text{def} \mid C)} = \frac{0.55 \times 0.01}{0.55 \times 0.01 + 0.3 \times 0.02 + 0.15 \times 0.03} = 0.344 \]

\[ P(B \mid \text{def}) = \frac{P(B)P(\text{def} \mid B)}{P(A)P(\text{def} \mid A) + P(B)P(\text{def} \mid B) + P(C)P(\text{def} \mid C)} = \frac{0.3 \times 0.02}{0.55 \times 0.01 + 0.3 \times 0.02 + 0.15 \times 0.03} = 0.375 \]

\[ P(C \mid \text{def}) = \frac{P(C)P(\text{def} \mid C)}{P(A)P(\text{def} \mid A) + P(B)P(\text{def} \mid B) + P(C)P(\text{def} \mid C)} = \frac{0.15 \times 0.03}{0.55 \times 0.01 + 0.3 \times 0.02 + 0.15 \times 0.03} = 0.281 \]

Supplier B is the most likely to have supplied the switch.

c) What is the probability that the supplier in b) provided the switch?

Answer: See answer to b).

11. On average, 10 people go through a supermarket checkout line every hour. The probability of someone entering the checkout line is the same for any two time intervals of equal length.

a) What is the probability that between 2 and 5 (inclusive) people will approach the checkout line during a 30 minute period?

Answer: The average number of people in a 30 minute period is 5. Using the Poisson distribution,

\[ f(2) + f(3) + f(4) + f(5) = \frac{5^2 e^{-5}}{2!} + \frac{5^3 e^{-5}}{3!} + \frac{5^4 e^{-5}}{4!} + \frac{5^5 e^{-5}}{5!} = 0.576 \]

b) What is the probability that no customers will approach the checkout line in the next 6 minutes?

Answer: A 6 minute period is one tenth of an hour, so \( \mu = 10/10 = 1 \).

\[ f(0) = \frac{1^0 e^{-1}}{0!} = 0.368 \]

12.
You have conducted a study of spring term behavior and have found the following: 60% of all students both attend class regularly and go to Goshen every week. Of the students who do not go to Goshen every week, 85% attend class regularly. 75% of all students go to Goshen every week. What percentage of students attend class regularly?

Answer: \( P(\text{attend} \cap \text{Goshen}) = 0.6 \); \( P(\text{attend} \mid \text{no Goshen}) = 0.85 \); \( P(\text{Goshen}) = 0.75 \).

\[ P(\text{attend}) = P(\text{attend} \cap \text{Goshen}) + P(\text{attend} \cap \text{no Goshen}) \]

\[ P(\text{attend} \cap \text{no Goshen}) = P(\text{attend} \mid \text{no Goshen})\,P(\text{no Goshen}) = 0.85 \times 0.25 = 0.2125 \]

\[ P(\text{attend}) = 0.6 + 0.2125 = 0.8125 \]

13. As part of a promotion, the supermarket randomly chooses customers who then receive a 20% discount on their purchases. The randomization process is such that the probability that a given customer receives the discount is 0.12. What is the probability that more than 4 of the first 30 customers will receive the discount?

Answer: Using the binomial distribution,

\[ f(5) + f(6) + f(7) + \dots = 1 - f(0) - f(1) - f(2) - f(3) - f(4) \]

\[ = 1 - \binom{30}{0} 0.12^0\, 0.88^{30} - \binom{30}{1} 0.12^1\, 0.88^{29} - \binom{30}{2} 0.12^2\, 0.88^{28} - \binom{30}{3} 0.12^3\, 0.88^{27} - \binom{30}{4} 0.12^4\, 0.88^{26} = 0.288 \]

14. You work in a building that has a notoriously slow elevator. On average, the elevator arrives at a floor 3 minutes after the button is pressed. The probability that the elevator will arrive during a given 5-minute interval is the same as for any other 5-minute interval.

a) What is the probability that you will have to wait longer than 5 minutes for the elevator to arrive?

Answer: The exponential distribution applies. \( P(x > 5) = e^{-5/3} = 0.1889 \).

b) What is the probability that you will have to wait for between 1 and 2 minutes?

Answer: \( P(x > 2) = e^{-2/3} = 0.5134 \) and \( P(x > 1) = e^{-1/3} = 0.7165 \), so \( P(1 < x < 2) = 0.7165 - 0.5134 = 0.2031 \).

15. An oil well has produced an average of 100 barrels per day over the last 30 days. The standard deviation of daily production is 10 barrels per day. The data appears to be normal in distribution.

a) What is the probability that the well will produce at least 110 barrels tomorrow?

Answer: \( z = (110 - 100)/10 = 1 \) gives an area of 0.3413 between z = 1 and z = 0. P(x > 110) = 0.5 - 0.3413 = 0.1587.

b) What is the probability that the well will produce between 90 and 120 barrels tomorrow?

Answer: \( z = (90 - 100)/10 = -1 \) and \( z = (120 - 100)/10 = 2 \) give areas of 0.3413 and 0.4772. So P(90 < x < 120) = 0.3413 + 0.4772 = 0.8185.

c) Suppose that a geologist originally told you that the well should produce an average of 105 barrels per day. Evaluate the validity of his assessment.

Answer: The standard error of the 30-day mean is \( 10/\sqrt{30} = 1.826 \), so \( z = (105 - 100)/1.826 = 2.74 \) for 105, which would not necessarily be considered an outlier. It is far enough away from the mean that we should be concerned, however.

16. On average, it rains 55 days per year (assume 365 days per year). What is the probability that it will rain 2 days or less over the next 7 days? Be sure to list any assumptions that you make in answering the problem (i.e., What distribution did you use? What assumptions are needed so that it is reasonable to use that distribution?).

Answer: P(rain) = 55/365 = 0.1507. Using the binomial distribution,

\[ f(0) = \binom{7}{0} 0.1507^0\, 0.8493^7 = 0.3187 \]

\[ f(1) = \binom{7}{1} 0.1507^1\, 0.8493^6 = 0.3959 \]

\[ f(2) = \binom{7}{2} 0.1507^2\, 0.8493^5 = 0.2107 \]

f(0) + f(1) + f(2) = 0.9254. The binomial is a reasonable assumption when we have a two-state outcome and observations are independent.

17. On average, the February temperature is 40 degrees with a standard deviation of 8 degrees. What is the probability that the temperature will be between 36 and 50 degrees on a given day in February? Be sure to list any assumptions that you make in answering the problem (i.e., What distribution did you use? What assumptions are needed so that it is reasonable to use that distribution?).

Answer: Using the normal distribution, we have \( z_{36} = (36 - 40)/8 = -0.5 \) and \( z_{50} = (50 - 40)/8 = 1.25 \). From the normal table, we see that \( z_{36} \) corresponds to an area of 0.1915 and \( z_{50} \) to an area of 0.3944. P(36 < x < 50) = 0.1915 + 0.3944 = 0.5859. In using the normal distribution, we assume that the distribution of temperature is shaped, at least approximately, like the bell curve.

18.
Consider the following data:

Day             Monday   Tuesday   Wednesday   Thursday   Friday
Week 1 Sales      43       50         57          47        55
Week 2 Sales      37       58         56          44        42
Week 3 Sales      34       59         54          44        44
Week 4 Sales      32       54         59          57        49
Week 5 Sales      43       54         59          52        41
Week 6 Sales      31       54         51          58        59
Week 7 Sales      42       42         55          59        52
Week 8 Sales      49       45         54          47        42
Week 9 Sales      48       57         42          42        55
Week 10 Sales     33       45         41          54        41

Your boss asks you to calculate the average daily sales. How do you respond? What is the probability of getting sales between 38 and 41 (inclusive) next Monday? Be sure to list any assumptions that you make in answering the problem (i.e., What distribution did you use? What assumptions are needed so that it is reasonable to use that distribution?).

Answer: The mean daily sales during the period is 48.44. This might be misleading because Monday may be an outlier (it is the only day with any sales in the 30s and it is the only day without sales in the 50s). The mean sales figures by day are

Day          Mean Sales
Monday         39.2
Tuesday        51.8
Wednesday      52.8
Thursday       50.4
Friday         48.0

One approach in addressing the potential outlier is to calculate the standard deviation of the five daily means. In this case, s = 5.47. Monday is therefore (39.2 - 48.44)/5.47 = -1.69 standard deviations away from the mean. For a small sample, this suggests (although it is far from conclusive) that Monday may be an outlier. Coupling this with the intuition above (which basically says that the ranges are vastly different) leads me to believe that Monday is quite likely an outlier. I would therefore tell my boss that although the mean daily sales is 48.44, Monday averages 39.2 while the rest of the week averages 50.75.

There are several reasonable approaches to estimating the probability of getting between 38 and 41 sales next Monday. Treating Monday as an outlier, we must use Monday’s mean and standard deviation (6.66) in the analysis. The data is clearly discrete, but we may choose to use the normal distribution as an approximation.
\( z_{37.5} = (37.5 - 39.2)/6.66 = -0.255 \) and \( z_{41.5} = (41.5 - 39.2)/6.66 = 0.345 \). From the normal table, \( z_{37.5} \) corresponds to an area of 0.10065 and \( z_{41.5} \) to an area of 0.13495 (in both cases, I interpolated between the numbers). \( P(38 \le x \le 41) \approx 0.10065 + 0.13495 = 0.2356 \).

An alternative approach is to assume that the data follows a discrete uniform distribution. This seems reasonable when we observe that the numbers are spread pretty evenly between 30 and 50. Given a 20 unit distribution and four outcomes of interest (38, 39, 40, and 41), we estimate \( P(38 \le x \le 41) = 4/20 = 0.20 \).

The two estimates differ because the assumptions behind the discrete uniform and normal distributions are different. Ideally, we would gather as much data as possible to determine what distribution is most appropriate. Note that we might also consider a binomial distribution if we can get data on the number of sales attempted, etc.

19. Your company recently manufactured a lot of 1000 units. Those units were tested and 8 were found to be defective. Unfortunately, the defective units were subsequently mixed in with the good units. Your task is to separate the good units from the bad. What is the probability that you will have to test 999 units to separate the units? (Hint: once you test 999 units, you know whether the last one is good or not. If you have 7 bad units out of 999, you know that last one is bad. If you have 8 bad units out of 999, you know the last one is good.)

Answer: You would have to test 999 only if there is one defective unit left after you have tested the first 998 units (if you find 6 bad units out of the first 998 tested, you would stop knowing that the last two were bad). So, the probability of having to test 999 units is equal to the probability that exactly 1 unit is bad out of the last 2. Here, p = 8/1000 = 0.008.

\[ f(1) = \binom{2}{1} 0.008^1 (1 - 0.008)^1 = 0.01587 \]

So, there is a 1.587% chance that you would have to test the 999 units.

20.
A bank screens credit applicants based on three factors: current debt, income, and prior payment history. 40% of all applicants are rejected. 15% of applicants fail the debt test. 20% of applicants fail the income test. 5% of applicants fail the payment history test. You know that a certain customer applied and was rejected. What is the probability that the customer was rejected due to low income? Comment on your ability to answer the question if 30% of all applicants are rejected (and the other numbers are the same).

Answer: We want P(low income | rejection). We know from Venn diagrams and the derivation of Bayes’ Rule that

\[ P(\text{income} \mid \text{rejection}) = \frac{P(\text{income} \cap \text{rejection})}{P(\text{income} \cap \text{rejection}) + P(\text{debt} \cap \text{rejection}) + P(\text{history} \cap \text{rejection})} = \frac{0.20}{0.20 + 0.15 + 0.05} = 0.5 \]

So there is a 50% chance that the applicant failed due to low income. We can use this approach because the factors are independent. If 30% of all applicants are rejected, then there must be some applicants that were rejected for multiple reasons. To answer the question, we must clarify whether we are interested in “rejected due only to low income” or “rejected due to low income and/or other reasons”. If the former, the answer is 0.20/0.30 = 66.67%. If the latter, we cannot answer without additional information.

21. Suppose that telemarketing sales are dependent on two factors: weather (when it’s raining, more people are home) and time of day (if you call during prime time, people are less likely to answer the phone). Those factors are independent. It rains with probability 0.1 and prime time constitutes 40% of the normal calling hours. A telemarketer can make 10 calls per hour. The net profit (including everything except telemarketer wages) per successful call is $9 and the probabilities of success on a given call are as follows.

                  Raining   Not Raining
Prime Time         0.25        0.15
Not Prime Time     0.3         0.2

Telemarketers charge $15 per hour. What is the expected profit per hour of calling? Should you implement a restricted calling plan?
If so, what would you recommend? What is the expected profit per hour of calling under the new plan?

Answer: The probabilities for the possible scenarios are

                  Raining              Not Raining
Prime Time        0.1 × 0.4 = 0.04     0.9 × 0.4 = 0.36
Not Prime Time    0.1 × 0.6 = 0.06     0.9 × 0.6 = 0.54

P(success on a randomly chosen call)
= P(prime time & raining) P(success | prime time & raining)
+ P(not prime time & raining) P(success | not prime time & raining)
+ P(prime time & not raining) P(success | prime time & not raining)
+ P(not prime time & not raining) P(success | not prime time & not raining)
= 0.04 × 0.25 + 0.06 × 0.3 + 0.36 × 0.15 + 0.54 × 0.2 = 0.19

The expected profit per hour of calling is then 10 × 0.19 × $9 - $15 = $2.10. On average, 1.9 calls per hour are successful, giving a net profit of $17.10 less the $15 paid to the telemarketer.

The lowest probabilities of success occur during prime time, so we might consider not making calls during prime time. To answer this, we consider whether prime time calling is profitable or not.

P(success on a randomly chosen call during prime time) = P(raining) P(success | raining) + P(not raining) P(success | not raining) = 0.1 × 0.25 + 0.9 × 0.15 = 0.16

The expected profit per hour of calling during prime time is then 10 × 0.16 × $9 - $15 = -$0.60. So, we should not make calls during prime time.

P(success on a randomly chosen call not during prime time) = 0.1 × 0.3 + 0.9 × 0.2 = 0.21. Expected profit under the new plan = 10 × 0.21 × $9 - $15 = $3.90.

One might also consider not calling unless it is raining, but that would be difficult to implement. It also might result in low morale because the employees would be on a very uncertain work schedule. I therefore chose not to consider that possibility.

22. Briefly discuss the importance of the Central Limit Theorem.

Answer: The CLT is the cornerstone of most of statistics. It is important because the distribution of the average of any sample is approximately normal in distribution for large n.
Even if we have no idea what the underlying distribution is, we can test certain hypotheses because of the CLT.

23. A city is investigating traffic patterns and observes that, on average, 20 cars go through an intersection per hour.

a) What is the probability that no car will go through during the next five minutes? Answer that question twice, using a different distribution each time. Be sure to list any assumptions that you make in answering the problem (i.e., What distribution did you use? What assumptions are needed so that it is reasonable to use that distribution?).

Answer: The average # of cars during a 5 minute period is 20 × 5/60 = 1.6667. Using the Poisson distribution,

\[ f(0) = \frac{1.6667^0 e^{-1.6667}}{0!} = 0.1889 \]

The average time between cars is 60/20 = 3 minutes. Using the exponential distribution, \( P(x > 5) = e^{-5/3} = 0.1889 \).

b) What is the probability that no more than three cars will go through the intersection during the next ten minutes?

Answer: Average # of cars during a 10 minute interval = 3.3333. Using the Poisson distribution, f(0) + f(1) + f(2) + f(3) = 0.0357 + 0.1189 + 0.1982 + 0.2202 = 0.573.

24. Briefly explain how you would choose a random sample of 10 items from a group of 100.

Answer: Assign a random number to each of the 100 items (using the Excel RAND() function, for example) and then select the ten with the lowest random numbers.

25. Conceptually, why is the normal distribution a reasonable approximation to the binomial under some circumstances?

Answer: Consider an experiment in which we flip a coin 100 times and record the number of heads. We then repeat that experiment many times and produce a histogram of the results. Intuitively, we believe that the histogram would show a peak around 50, with symmetric, declining tails on both sides. I.e., the histogram would have a bell-shaped curve. Mathematically, it is a reasonable approximation because of the Central Limit Theorem. We could rescale the histogram to be the sample mean with a head=1 and a tail=0.
By the CLT, we know that the average would be approximately normal in distribution.

26. Briefly explain why the normal distribution might be considered the most important of all distributions.

Answer: From the CLT, we know that the sample mean of any distribution is approximately normal in distribution when n is large. The normal distribution therefore appears naturally in nearly every scenario. It is consequently the most used and most important distribution.

27. What is the standard error of the mean and why is it an important concept?

Answer: The standard error of the mean is the standard deviation of the sample mean. It is important because we can use it in conjunction with the sample mean to test hypotheses about population parameters.
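
As a rough check on the arithmetic in some of the answers above, and as an illustration of the standard error from problem 27, the following Python sketch recomputes a few results using only the standard library. The function names and the selection of problems are illustrative assumptions and are not part of the original solutions. The standard error is computed for the problem 2 data as the sample standard deviation divided by the square root of n.

```python
import math
from statistics import stdev

# Data from problem 2 (15 observations).
PROBLEM_2_DATA = [
    41.19, 40.04, 35.27, 18.91, 15.60, 47.02, 58.67, 39.38,
    35.04, 47.16, 30.60, 45.50, 64.91, 39.95, 20.61,
]


def poisson_pmf(k, mean):
    """Poisson probability of exactly k events when the mean count is `mean`."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def binomial_pmf(k, n, p):
    """Binomial probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def exponential_quantile(prob, mean_wait):
    """Value x0 with P(x <= x0) = prob for an exponential with mean `mean_wait`."""
    return -mean_wait * math.log(1 - prob)


def bayes_posterior(prior, p_pos_if_true, p_pos_if_false):
    """P(condition | positive result) from Bayes' Rule."""
    numerator = prior * p_pos_if_true
    return numerator / (numerator + (1 - prior) * p_pos_if_false)


def standard_error(data):
    """Sample standard deviation divided by sqrt(n)."""
    return stdev(data) / math.sqrt(len(data))


def check_answers():
    """Recompute selected answers; keys name the problem being checked."""
    return {
        "p1_years": exponential_quantile(0.10, 10_000),
        "p3_posterior": bayes_posterior(1 / 2000, 0.99, 0.02),
        "p4_more_than_4": 1 - sum(poisson_pmf(k, 2) for k in range(5)),
        "p13_more_than_4": 1 - sum(binomial_pmf(k, 30, 0.12) for k in range(5)),
        "p2_standard_error": standard_error(PROBLEM_2_DATA),
    }


if __name__ == "__main__":
    for name, value in check_answers().items():
        print(f"{name}: {value:.4f}")
```

Running the script prints each recomputed value so it can be compared against the corresponding answer. Small differences in the last digit are expected because the written answers round intermediate values from the normal table or to four decimal places.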