Math 251, Review for Final, Autumn 2002 (Rough Answers)

The following questions are samples of the types of questions that may be on the final. There may be questions on the final from topics not represented here. For further review, look at your old tests and reviews, assigned homework, etc. Material covered since the 3rd test will probably make up about 30% of the points on the final. This material includes hypothesis tests for means (large samples and small samples), hypothesis tests for proportions, hypothesis tests for the difference of two population means, the chi-square test for goodness of fit, and analysis of variance. The rest of the test will consist of questions chosen from the other material covered throughout the quarter.

1. (a) Which type of random variable is the number of consumers refusing to answer a telephone survey, and what possible values can it take?
Discrete: it can take any nonnegative integer value, i.e., {0, 1, 2, 3, 4, 5, ...}.
(b) How many bridge hands are there that have 4 aces? What is the probability of getting such a hand?
The total number of bridge hands is N(S) = 52C13 = 52!/(13! 39!) = 635,013,559,600.
The number of hands with four aces is N(A) = 4C4 × 48C9 = 1 × 48!/(9! 39!) = 1,677,106,640.
The probability of four aces is N(A)/N(S) ≈ .00264.

2. For events A and B in a sample space S, we are told P(A) = .5, P(B) = .3 and P(A and B) = .15. Which of the following is true?
(a) A and B are independent events.
(b) P(A or B) = .8
(c) A and B are mutually exclusive events.
(d) All of the above.
ANSWER: A [Check that P(A and B) = P(A) × P(B) = .15. Neither (b) nor (c) is true because P(A and B) > 0, which gives P(A or B) = .5 + .3 - .15 = .65.]

3. Which of the following is true about a binomial random variable for n trials with probability of success p on each trial?
(a) The probability of n successes is p^n.
(b) Its variance is equal to np(1-p).
(c) The probability of no successes is (1-p)^n.
(d) All of the above.
ANSWER: D [Look at the formulas for binomial random variables!]

4.
An hypothesis test on the mean reports a P-value of .031. Which of the following is true?
(a) The null hypothesis should be accepted if the level of significance is .03.
(b) The null hypothesis should be rejected if the level of significance is .05.
(c) There is almost a 97% chance of making a Type I error.
(d) All of the above.
ANSWER: B (Reject the null hypothesis if P-value ≤ level of significance.)

5. If a 95% confidence interval for the population mean has length 12 when the sample size is 100, what would the length of a 95% confidence interval from the same population be if the sample size were 1600?
(a) 12 (b) 48 (c) 3 (d) 6
ANSWER: C (The length decreases by a factor of 4, which is the square root of 1600 divided by the square root of 100.)

6. A two-tailed hypothesis test on the mean of an approximately normal population is conducted with a sample size of n = 10. For what t-values should the null hypothesis be rejected given that the level of significance is .05?
(a) t ≤ -1.96 or t ≥ 1.96
(b) t ≤ -2.262 or t ≥ 2.262
(c) t ≤ -1.833 or t ≥ 1.833
(d) t ≤ -2.228 or t ≥ 2.228
ANSWER: B (Two-tailed test; the critical value is 2.262 with d.f. = 9.)

7. (a) Given the data 9, 12, 15, 17, 17, 19, 23, 44, 57, 61, 63, 70, find the mean, median, range, and mode.
ANSWERS:
i) Σx = 407, therefore the mean is 407/12 ≈ 33.92.
ii) The median is in the 6.5th place, therefore the median is (19+23)/2 = 21.
iii) The range is 70 - 9 = 61.
iv) The mode is 17, which is the most common data value.
(b) If your score is at the 81st percentile on a national exam which was taken by 200,000 people, approximately how many of those 200,000 test takers scored higher than you?
ANSWER: Approximately 19% of 200,000, which is 38,000.

8. In a state with 459,341 voters, a poll of 2300 voters finds that 45 percent support the Republican candidate, whereas in reality, unknown to the pollster, 42 percent support the Republican candidate.
(a) What is the value of the statistic of interest?
ANSWER: 45% (a statistic is a numerical property of the sample)
(b) What is the value of the parameter of interest?
ANSWER: 42% (a parameter is a numerical property of the population)
(c) Describe the population of interest.
ANSWER: The population is all voters in the state.
(d) In general, is it true that given a certain population, the parameter of interest will not change under repeated sampling? Explain.
ANSWER: True. The parameter does not depend on a specific sample, so it doesn't change when the sample changes.

9. (a) According to Chebychev's theorem, how much data from any distribution can be more than 3 standard deviations from the mean?
ANSWER: At most 1/9 of the data.
(b) Given a population of size 4,800 with unknown distribution, at least how many data values are within 4 standard deviations of the mean?
ANSWER: At least (15/16) × 4,800 = 4,500.

10. The following ranked data represent the number of miles driven each day by a salesman over a 30-day period.
31 37 43 44 44 55 58 65 65 66
71 74 75 75 78 81 81 81 82 84
86 86 87 89 89 92 92 93 99 101
Construct a relative frequency histogram for these data whose first class has class limits 30-44.
ANSWER:
Class limits   30-44   45-59   60-74   75-89   90-104
Frequency        5       2       5      13       5
Rel. Freq.     .167    .067    .167    .433    .167
Class width = 15. See your text for constructing the relative frequency histogram.

11. Consider the sample of 30 numbers
31 37 43 44 44 55 58 65 65 66
71 74 75 75 78 81 81 81 82 84
86 86 87 89 89 92 92 93 121 133
for which Σx = 2258 and Σx² = 184670 (or Σ(x - x̄)² = 14,717.86667). Find:
(a) the sample mean: 2258/30 ≈ 75.27
(b) the sample variance: 507.51 [Note: s² = (30(184670) - 2258²)/(30 × 29)]
(c) the sample standard deviation: 22.53 (the square root of the answer in (b))
(d) Given that Q1 = 65, Q2 = 79.5 and Q3 = 87, construct a boxplot for the data.
For the box plot, see the text, noting that the lower whisker starts at 31, the box has hinges at 65 and 87 with the vertical line in the box at 79.5 (the median), and the upper whisker ends at 133.
(e) Find the interquartile range for the data.
IQR = Q3 - Q1 = 87 - 65 = 22

12. (True or False)
(a) True. The median is a resistant measure because it is not influenced by extreme observations.
(b) False. The mean is a resistant measure because extreme measures on one side average out with those on the other side.
(c) True. The mean and median are equal in a symmetric distribution.
(d) True. The mean is usually to the right of the median in a distribution that is skewed to the right.

13. The following represent scores of a group of 15 students on Math and English tests.
Scores on English test: 73 75 77 77 78 79 80 81 82 83 84 85 85 86 89
Scores on Math test: 72 75 79 83 84 85 87 88 90 91 92 93 93 97 98
(a) Construct stem-and-leaf plots, splitting stems 7, 8, 9 into two parts with leaves 0-4 on one part and 5-9 on the other part.
Math:
7 | 2
7 | 5 9
8 | 3 4
8 | 5 7 8
9 | 0 1 2 3 3
9 | 7 8
English:
7 | 3
7 | 5 7 7 8 9
8 | 0 1 2 3 4
8 | 5 5 6 9
9 |
9 |
Key: 9|8 = 98
(b) Which test scores seem to have a higher standard deviation? Explain. Don't compute!
The math test scores appear to have a higher standard deviation because they are more spread out (the English test scores are bunched much more closely together than the math test scores).

14. Suppose the distribution of test scores for a certain test is normal with μ = 70 and σ = 12. Suppose that 500 students wrote the test.
(a) What test score would have a z-score of -2.25?
ANSWER: x = 43 [Solve -2.25 = (x - 70)/12]
(b) What score would put a student at the 90th percentile?
ANSWER: The 90th percentile has a z-score of approximately 1.28, thus x ≈ 85.36, i.e. P90 ≈ 85.36 [x was found by solving 1.28 = (x - 70)/12]
(c) Approximately what number of students would have scores between 60 and 90?
ANSWER: About 375 students.
z = (60 - 70)/12 ≈ -.833
z = (90 - 70)/12 ≈ 1.67
P(-.833 < z < 1.67) ≈ .9525 - .2033 = .7492, and .7492 × 500 ≈ 375.

15.
A study of the behavior of a large number of drug offenders after treatment for drug abuse suggests that the likelihood of conviction within a two-year period after treatment may depend on the offender's education. The proportions of the total number of cases falling into four education/conviction categories are shown in the following table:
                 10 or more years    9 or less years
                 of education        of education
Convicted             .1                  .27
Not Convicted         .3                  .33
Suppose a single offender is selected from the treatment program. Define the events:
A: The offender has 10 or more years of education.
B: The offender is convicted within 2 years of completion of treatment.
ANSWERS:
(a) P(A or B) = P(A) + P(B) - P(A and B) = .4 + .37 - .1 = .67
(b) P(A and B) = .1
(c) P(B|A) = P(A and B)/P(A) = .1/.4 = .25
(d) The probability that neither A nor B occurs is: 1 - P(A or B) = 1 - .67 = .33
(e) A and B are not independent because P(A)P(B) = .148 ≠ P(A and B).
(f) A and B are not mutually exclusive because P(A and B) ≠ 0.
(Note that answers (a)-(d) can be found in different ways using the table above.)

16. A business employs 600 men and 400 women. Five percent of the men and 10% of the women have been working there for more than 20 years. If an employee is selected by chance, what is the probability the employee is male, given that the length of employment is more than 20 years?
ANSWER: Let A = the event that the employee is male and B = the event that the employee has worked there more than 20 years. Then
P(B) = [.05 × 600 + .10 × 400]/1000 = .07
P(A and B) = (.05)(600)/1000 = .03
P(A|B) = P(A and B)/P(B) = .03/.07 ≈ .4286
which is the probability an employee is male given that the length of employment is more than 20 years. (Alternatively, note that 30 out of the 70 employees who have worked there for more than 20 years are male.)

17. (a) How many permutations are there of 30 objects taken 3 at a time?
30P3 = 30!/27!
= 30 × 29 × 28 = 24,360
(b) In how many ways can a gold medal, silver medal and bronze medal be awarded to 30 competitors in a fencing competition?
30P3 = 30!/27! = 30 × 29 × 28 = 24,360
(c) How many menu possibilities are there in a restaurant that offers 5 different appetizers, 6 salads, 12 main dishes and 10 desserts if one choice is made from each category?
ANSWER: 5 × 6 × 12 × 10 = 3,600
(d) Suppose that a large shipment of CDs contains 5% defective CDs. Suppose a customer chooses 2 of these CDs at random. What is the probability that:
i) Both CDs will be good? ANSWER: (.95)(.95) = .9025
ii) Both CDs will be defective? ANSWER: (.05)(.05) = .0025
iii) Exactly one CD is defective? ANSWER: 1 - (.9025 + .0025) = .095
iv) At least one CD is defective? ANSWER: 1 - .9025 = .0975 (or .095 + .0025 = .0975)
v) At least one CD is good? ANSWER: 1 - .0025 = .9975 (or .9025 + .095 = .9975)

18. A jury pool consists of 13 men and 15 women. What is the probability that a randomly chosen jury from this pool will consist of 5 men and 7 women?
Number of ways of choosing a jury: 28C12 = 30,421,755
Number of ways of choosing a jury with 5 men and 7 women from the pool: 13C5 × 15C7 = 1287 × 6435 = 8,281,845
Probability of choosing a jury with 5 men and 7 women from the pool: 8,281,845/30,421,755 ≈ .2722

19. Let x be the random variable that represents the number of heads observed when 5 fair coins are tossed. Make a probability distribution for x, and find the probability that one will get more than 3 heads when tossing five fair coins.
x       0        1        2       3       4        5
p(x)  .03125   .15625   .3125   .3125   .15625   .03125
This was computed by using the binomial probability formula, i.e. p(x) = 5Cx (.5)^x (.5)^(5-x).
Thus, P(x > 3) = .15625 + .03125 = .1875, so there is a probability of .1875 of getting more than 3 heads when tossing 5 fair coins. Hence if one were to toss 5 fair coins 10,000 times, one would expect to get more than 3 heads (on average) 1875 of those times.

20.
Consider the random variable whose probability distribution is given by the following table.
x      3    7    8    11
p(x)  .1   .3   .45   ?
(a) Is this a discrete or continuous random variable?
ANSWER: Discrete
(b) P(x = 11) = 1 - (.1 + .3 + .45) = .15
(c) Construct a probability histogram for p(x), and compute the expected value of x and the standard deviation of x.
The expected value is: E(x) = 3(.1) + 7(.3) + 8(.45) + 11(.15) = 7.65
The variance is: σ² = 3²(.1) + 7²(.3) + 8²(.45) + 11²(.15) - 7.65² = 4.0275
Thus the standard deviation is σ = √4.0275 ≈ 2.0069
See the text for constructing histograms (Section 4.1, p. 164ff).

21. The following sample data concern the number of years a student studied German in school versus their score on a proficiency test.
Years (x)         3   4   4   2   5   3   4   5   3   2
Test Score (y)   57  78  72  58  89  63  73  84  75  48
Note: Σx = 35, Σy = 697, Σx² = 133, Σy² = 50085, Σxy = 2554
(a) Find the equation of the least squares line for this data.
slope = (10 × 2554 - 35 × 697)/(10 × 133 - 35²) = 1145/105 ≈ 10.90476
y-intercept = (697/10) - 10.90476(35/10) ≈ 31.533
Thus the equation of the line is: y ≈ 10.905x + 31.533
(b) Use your line from (a) to predict the score on the proficiency test of a person who had 3.5 years of German.
y ≈ 10.905 × 3.5 + 31.533 ≈ 69.7
(c) Use the regression line in (a) to predict the number of years of German required to achieve a proficiency score of 75.
x ≈ (75 - 31.533)/10.905 ≈ 3.99 years
(d) Compute the correlation coefficient r for this data. What does this coefficient suggest about a linear relationship between number of years German was studied in school and test scores for this sample? That is, determine whether it is a good fit, and whether it indicates a positive or negative linear relationship.
r = [(10)(2554) - (35)(697)] / [√(10(133) - 35²) × √(10(50085) - 697²)] = 1145/(√105 × √15041) ≈ .911
This value is reasonably close to +1, which means it represents a good linear relation with positive slope (i.e. as x increases so does y). The closer r is to +1 (or to -1), the better the linear fit will be.

22. Cascade Airlines (a.k.a.
“Crashcade” and now defunct) records showed that on average 10% of prospective passengers will not claim their reservations on a certain flight. Suppose that they booked 21 passengers for 20 seats on that flight.
(a) Find the mean and standard deviation for the number of passengers who will claim a reservation.
μ = (21)(.9) = 18.9
σ² = 21(.9)(.1) = 1.89
Thus the mean number of passengers showing up is 18.9 with standard deviation σ = √1.89 ≈ 1.375.
(b) Find the probability that all passengers who show up for the flight will receive a seat.
P(x ≤ 20) = 1 - P(x = 21) = 1 - (.9)^21 ≈ 1 - .1094 = .8906
[We have solved for P(x ≤ 20) because if 20 or fewer passengers show up, they will all have seats.]

23. A developer wishes to test whether the mean depth of water below the surface in a large development tract is less than 500 feet. For the sample data, n = 32 test holes, the sample mean was 486 feet, and the standard deviation was s = 53 feet. Complete the test using the P-value approach, and report the conclusion for a 1% level of significance.
Null Hypothesis: μ ≥ 500
Alternative Hypothesis: μ < 500
Standardized Test Statistic: z = [486 - 500]/[53/√32] ≈ -1.49
P-value: P(z < -1.49) = .5 - .4319 = .0681
We would not reject the null hypothesis at a level of significance of .01 because the P-value is larger than .01.

24. A vendor was concerned that a soft drink machine was not dispensing 6 ounces per cup, on average. A sample of size 40 gave a mean amount per cup of 5.95 ounces and a standard deviation of .15 ounce.
(a) Find the P-value.
This is a two-tailed test:
Null Hypothesis: μ = 6
Alternative Hypothesis: μ ≠ 6
The observed value is: z = [5.95 - 6]/[.15/√40] ≈ -2.11
The P-value is: P(z < -2.11) + P(z > 2.11) = 2(.0174) = .0348
(b) For which of the following levels of significance would the null hypothesis be rejected?
(i) α = .10 (ii) α = .05 (iii) α = .01
Reject the null hypothesis in (i) and (ii) since the P-value is smaller than α; do not reject the null hypothesis in (iii).
(c) For each case in part (b), what type of error has possibly been committed?
A Type I error is possible in (i) and (ii), while a Type II error is possible in (iii).
(d) Find a 98% confidence interval for the mean amount of soda dispensed per cup.
For c = .98, z_c ≈ 2.33 (look up the z value corresponding to an area of .99 in the table). The large sample method (n is at least 30) gives endpoints 5.95 ± 2.33 × .15/√40 = 5.95 ± .0553, so the confidence interval is (5.8947, 6.0053).
(e) Supposing that the population standard deviation is σ = .15, what sample size would be needed so that the margin of error in a 98% confidence interval is E = .01?
ANSWER: The formula to use is E = z_c σ/√n, which implies n = (z_c σ/E)², and so we compute
n = (2.33 × .15/.01)² = 1221.5025
thus we should use a sample size of 1222.

25. On June 7, 1999 a poll on the USA Today website showed that out of 2000 respondents, 71% felt that Andre Agassi deserved to be ranked among the greatest tennis players ever.
(a) Assuming that the 2000 respondents form a random sample of the population of tennis fans, construct a 95% confidence interval for the proportion of all tennis fans who feel that Andre Agassi should be ranked among the greatest tennis players ever.
ANSWER: (.6901, .7299)
To find this confidence interval we computed .71 ± 1.96 × √(.71 × .29/2000) = .71 ± .0199, where 1.96 = z_c for c = .95 and p-hat is .71.
(b) Based on (a), would you be comfortable in saying that the poll is accurate to within plus or minus 2 percent 19 times out of 20? Explain.
Yes. The 95% confidence interval is .71 ± .0199, hence intervals based on a random sample of this size with the given proportion should be accurate to within 2%, on average, 19 times out of 20.
(c) In actuality, the survey was based on voluntary responses from readers of the USA Today sports website. Do you think the 2000 respondents actually formed a random sample? Explain.
No. The readership of the website is limited to those who have access to the site and chose to visit it; moreover, the survey was not based on a random selection of even those users of the website, but on those who chose to respond to the poll.

26. (a) Suppose that a February Gallup poll of 1200 randomly selected voters found that 53 percent support George W. Bush's energy policy. Conduct an hypothesis test at a level of significance of α = .01 to test whether the true voter population support for George W. Bush's energy policy in February was greater than 50 percent.
ANSWER:
Null Hypothesis: p ≤ .50
Alternative Hypothesis: p > .50
Critical Region: z ≥ 2.33
Test Statistic: z = (.53 - .5)/√(.5 × .5/1200) ≈ 2.08
Conclusion: Because 2.08 does not fall in the critical region, there is not sufficient evidence to reject the null hypothesis at a level of significance of 1%.
(b) Report the P-value of the test in (a) and give a practical interpretation of it.
P-value = P(z > 2.08) = 1 - .9812 = .0188. Thus, if true support were only 50 percent, a sample proportion of 53 percent or more would occur less than 2% of the time. This is fairly strong evidence that more than 50% of all voters support President G.W. Bush's energy policy, although it is not strong enough to reject the null hypothesis at the 1% level.

27. A brand of paint claims that in one coat, 1 gallon will cover at least 350 square feet on average. A random sample of ten 1-gallon cans produced the following data.
Area Covered (Square Feet): 342, 378, 358, 364, 381, 392, 339, 356, 386, 347
Note: for this data Σx = 3643 and Σx² = 1330395
(a) Conduct the hypothesis test H0: μ = 350 vs. Ha: μ > 350 at a level of significance of α = .05. Be sure to state the critical region, test statistic and conclusion in your answer.
We assume the distribution is approximately normal, so we use the Student's t-distribution with d.f. = 9.
The critical region is: t ≥ 1.833
From the data, we calculate the sample mean x̄ = 3643/10 = 364.3 and the standard deviation s ≈ 19.0032 (using the same method as we used in 11(c)), and so the test statistic is
t ≈ (364.3 - 350)/(19.0032/√10) ≈ 2.380
Because 2.380 is in the critical region, we reject the null hypothesis and conclude that on average a gallon of paint covers more than 350 square feet.
(b) Construct a 99% confidence interval for the mean.
t_c = 3.25 for c = .99 and d.f. = 9, thus the confidence interval is 364.3 ± 3.25 × 19.0032/√10, i.e. (344.8, 383.8).

28. In a 1993 survey of 50 Education graduates and 50 Social Science graduates, the following data were obtained for their average starting salaries.
Major              Mean      St. Dev
Education          22,554    2225
Social Sciences    20,348    2375
(a) Find a point estimate for the difference in average starting salaries for Education and Social Science majors.
ANSWER: 22,554 - 20,348 = 2206
(b) Let μ1 be the population mean salary for the Education graduates and μ2 be the population mean salary for the Social Science graduates. Report the P-value for the hypothesis test H0: μ1 - μ2 = 1200 versus Ha: μ1 - μ2 > 1200.
First, √(2225²/50 + 2375²/50) ≈ 460.24. Thus we compute the test statistic
z ≈ (2206 - 1200)/460.24 ≈ 2.19
Therefore, the P-value is: P(z > 2.19) = .5 - .4857 = .0143
(c) Based on (b), do you think there is sufficient evidence to believe that μ1 is more than $1200 greater than μ2? Explain.
Yes, at the usual 5% level of significance. The P-value of .0143 means that a sample difference this large would occur less than 1.5% of the time if μ1 - μ2 were only 1200, so the data support the alternative hypothesis that the mean salary for Education graduates is more than $1200 above the mean salary for Social Science graduates. (The evidence would not be sufficient at the 1% level.)

29. Suppose that the probability is .91 that a person who has reservations for a certain opera will show up, and the decision of one person is independent from that of another. Suppose the opera has sold 1243 tickets. What is the probability that at least 1140 people will show up for the opera?
ANSWER: Use the normal approximation to the binomial distribution (Section 5.6). First, we check that np = 1243 × .91 = 1131.13 ≥ 5 and nq = 1243 × .09 = 111.87 ≥ 5, so that it is valid to use the normal approximation to the binomial distribution. Then
μ = 1131.13
σ = √(1243 × .91 × .09) ≈ 10.0897
and, using the continuity correction, we compute
P(x > 1139.5) = P(z > (1139.5 - 1131.13)/10.0897) = P(z > .82956) ≈ 1 - .7967 = .2033
That is, there is a 20.33% chance (approximately) that at least 1140 people will show up.

30. (a) If you were to conduct an hypothesis test to determine if the means from several different populations are equal using the method of analysis of variance, what assumptions would you make on the populations? What distribution would you use to conduct your test?
The populations are normal with equal variances and the samples are independent. The test uses the F distribution. (See text p. 524)
(b) Problem 3, p. 532. (See answer in text)

31. (a) A local radio station claims that 15 percent of all people in Riverside say it is their favorite station, 65 percent of all people in Riverside listen to it occasionally, while 20 percent never listen to it. Suppose you surveyed 200 randomly selected people in Riverside and found that of those 200 people, 20 claimed it was their favorite station, 131 said they listen to it occasionally, while 49 never listen to it. Conduct an hypothesis test at a level of significance of .05 to determine whether the station's claim is correct. Make sure to state the rejection region for your test.
ANSWER:
             Favorite   Occasional   Never
%               15          65         20
O               20         131         49
E               30         130         40
(O - E)²       100           1         81
Because the expected number is at least 5 in each category, we can proceed with the chi-square goodness of fit test as in Section 10.1. The rejection region with d.f. = 2 and α = .05 is χ² ≥ 5.991. The observed value is χ² = Σ(O - E)²/E = 100/30 + 1/130 + 81/40 ≈ 5.366.
Because the observed value does not fall in the rejection region, we don't have enough evidence at the .05 level of significance to reject the radio station's claim.
(b) What are the assumptions one must make when using the chi-square test for goodness of fit?
ANSWER: The sample must be randomly selected, and the expected frequency for each category must be at least 5.
(c) For further practice, see, e.g., problem 3, p. 500.

32. List conditions that are needed on the population and on the random sample(s) in order to make inferences in the following settings. In some cases, there may be no conditions required, so just list none.
(a) Confidence interval for a mean from a large sample.
ANSWER: The sample size is at least 30.
(b) Hypothesis test on a mean using a small sample.
ANSWER: The population must be approximately normal. If the population standard deviation is known, use the normal distribution. If the population standard deviation is not known, use the sample standard deviation and the Student's t-distribution.
(c) Hypothesis test on a proportion.
ANSWER: np and nq must be at least 5.
(d) Hypothesis test concerning two means from large independent samples.
ANSWER: The samples from each population must be independent (as suggested) and each must be of size at least 30.

33. Confidence intervals for variance and standard deviation. Do problem #11 on p. 307. See text for answer.
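
As a check on the arithmetic in the chi-square goodness of fit test of problem 31(a), the expected counts and the test statistic can be recomputed with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the critical value 5.991 (d.f. = 2, level of significance .05) is taken from the table value quoted in the answer rather than computed.

```python
# Check of problem 31(a): chi-square goodness-of-fit statistic.
# Variable names are illustrative; the data come from the problem statement.
claimed = [0.15, 0.65, 0.20]   # favorite, occasional, never (station's claim)
observed = [20, 131, 49]       # counts from the survey
n = sum(observed)              # 200 respondents

# Expected count in each category if the station's claim is true.
expected = [p * n for p in claimed]

# Chi-square statistic: sum of (O - E)^2 / E over the categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Table value quoted in the answer for d.f. = 2 and alpha = .05 (not computed here).
critical_value = 5.991
reject = chi_sq >= critical_value

print("expected counts:", [round(e, 2) for e in expected])
print("chi-square = %.3f, reject the claim: %s" % (chi_sq, reject))
```

The statistic comes out to about 5.366, which is below 5.991, so the claim is not rejected, in agreement with the conclusion of 31(a).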