252x0721 3/19/07
ECO252 QBA2
SECOND HOUR EXAM
March 23, 2007
Name

Show your work! Make Diagrams! Exam is normed on 50 points. Answers without reasons are not usually acceptable.

I. (8 points) Do all the following. Make diagrams! $x \sim N(10, 7)$ - If you are not using the supplement table, make sure that I know it.

1. P7 x 25
2. Px 15
3. P5 x 0
4. $x_{.085}$ (Do not try to use the t table to get this.)

II. (22+ points) Do all the following. (2 points each unless noted otherwise). Look them over first – the computer problem is at the end. Show your work where appropriate.

Note the following:
   1. This test is normed on 50 points, but there are more points possible including the take-home. You are unlikely to finish the exam and might want to skip some questions.
   2. A table identifying methods for comparing 2 samples is at the end of the exam.
   3. If you answer ‘None of the above’ in any question, you should provide an alternative answer and explain why. You may receive credit for this even if you are wrong.
   4. Use a 5% significance level unless the question says otherwise.
   5. Read problems carefully. A problem that looks like a problem on another exam may be quite different.
   6. Make sure that you state your null and alternative hypotheses, and that I know what method you are using and what the conclusion is when you do a statistical test.

1. (Anderson, Sweeney, Williams) We wish to compare miles per gallon of two similar automobiles. A random sample of 8 automobiles is chosen and 8 drivers are asked to drive the cars on identical roads. The data is as follows.

   Row  Driver  Model 1  Model 2  difference
                x1       x2       d
   1    1       28       26        2
   2    2       23       22        1
   3    3       25       27       -2
   4    4       23       22        1
   5    5       24       23        1
   6    6       26       25        1
   7    7       29       27        2
   8    8       24       26       -2

   I have computed $\bar{x}_1 = 25.25$, $s_1 = 2.2520$, $\bar{x}_2 = 24.75$ and $s_2 = 2.1213$.
   a. Compute the sample variance for the d column – Show your work! (2)
   b. Is there a significant difference between the gas consumption in the two models? State your hypotheses! (2)
   c.
Test to see if the variances of the two cars’ gas consumption are similar. (2) [6]

Exhibit 1: A quality control engineer is in charge of manufacture of computer disks. Two different processes can be used to manufacture the disks. The engineer suspects that the Kohler method produces a greater proportion of defective disks than the Russell method. Out of a sample of 150 Kohler disks, 27 are defective. Out of a sample of 200 Russell disks, 18 are defective. If Kohler disks are sample 1 and Russell disks are sample 2, test the engineer’s suspicion at the 1% level.

2. The hypotheses that should be tested in exhibit 1 are
   a. H 0 : p1 p 2 0 and H 1 : p1 p 2 0
   b. H 0 : p1 p 2 0 and H 1 : p1 p 2 0
   c. H 0 : p1 p 2 0 and H 1 : p1 p 2 0
   d. H 0 : p1 p 2 0 and H 1 : p1 p 2 0
   e. H 0 : p1 p 2 0 and H 1 : p1 p 2 0
   f. H 0 : p1 p 2 0 and H 1 : p1 p 2 0
   g. None of the above. (Write in correct answer.) [8]

3. For exhibit 1, find the value of the test ratio. (3) [11]

4. For exhibit 1, the hypotheses in 2 and the test ratio in 3, draw an approximately normal curve and show the ‘reject’ region by shading it. (3) [14]

5. For exhibit 1 and the hypotheses in 2, find a p-value for the test. (2) [16]

5a. For exhibit 1, find a 17% .17 2-sided confidence interval for the difference between the 2 proportions. (4)

Exhibit 2: A data entry operation sends a group of its employees to a typing course. The table below shows their speed before ($x_1$) and after ($x_2$). $d = x_1 - x_2$. $r_1$ and $r_2$ represent the ranks of the numbers when the before and after speeds are ranked between 1 and 16. $|d|$ is the absolute value of the items in the $d$ column. $r_{|d|}$ drops the zero and ranks the numbers in $|d|$ from 1 to 7 and $r_{|d|}^*$ is the ranks with their signs added.
   Row  Processor  Before  rB    After  rA    d    abs d  rank   sRank
                   x1      r1    x2     r2    d    |d|    r|d|   r|d|*
   1    1          59       5.5  57      3.5   2   2      2.5     2.5
   2    2          57       3.5  62      9.0  -5   5      7.0    -7.0
   3    3          60       7.5  60      7.5   0   0      *       *
   4    4          66      12.0  63     10.5   3   3      4.0     4.0
   5    5          68      13.0  69     14.0  -1   1      1.0    -1.0
   6    6          59       5.5  63     10.5  -4   4      5.5    -5.5
   7    7          72      15.0  74     16.0  -2   2      2.5    -2.5
   8    8          52       1.0  56      2.0  -4   4      5.5    -5.5

6. Assume that exhibit 2 represents the scores of one sample of eight employees before and after the training. Can we say that the median speed has risen? Do an appropriate statistical test. (3) [19]

7. Assume that instead the before and after columns represent independent samples. Can we say that the median speed has risen? Do an appropriate statistical test. (3) [22]

8. The owner of Mother Truckers (which actually moved me once) wants to prove that her firm is superior to her arch rival Wallflower Van Lines and wants to use the proportion of shipments with claims filed as a way of doing that. She assembles the following data.

                                                     Mother Truckers  Wallflower
   Total Shipments Sampled                           900              750
   Total number of shipments with claims over $50    162               60

   Which would be proper to analyse the data?
   a. $\chi^2$ test for independence.
   b. $\chi^2$ test for homogeneity
   c. ANOVA
   d. z test for comparing 2 proportions.
   e. Sign test
   f. The McNemar Test
   g. None of the above.

9. Which is the closest to the probability that a $\chi^2$ random variable with 4 degrees of freedom will be greater than 10?
   a. .01
   b. .05
   c. .10
   d. .99
   e. .95
   f. .90

10. During a period of 20 days 720 patients arrive at a hospital, or an average of 1.5 per hour over 480 hours. For example, during 106 of the 480 hours there were no arrivals. See if a Poisson distribution fits these data. (6) [32]

   Row  x          O    xO
   1    0          106    0
   2    1          140  140
   3    2          125  250
   4    3          106  318
   5    4            3   12
   6    5 or more    0    0
                   480  720

11. Computer question.
   a. Turn in your first computer output. Only do b, c and d if you did. (3)
   b. A researcher believes that bank CEOs are paid more than utility CEOs.
A random sample of eight salaries (in thousands) is collected for each industry. What were the null and alternative hypotheses tested? At the 95% confidence level could the researcher state that bank CEOs are paid more than utility CEOs? Why? How would the results be affected if we insist on a 99% confidence level? (2)
   c. What is the difference between the two hypothesis tests that were done with the salary data? (1)
   d. (Lee) A manufacturer is afraid that the company is producing slow egg timers. A sample of 12 timers is chosen and the time in seconds that was needed for the timers to run out was recorded. What hypotheses were tested? Can the manufacturer conclude that the timers are slow if a 95% confidence level is used? Why? How would the results be affected if we insist on a 99% confidence level? (2) [40 actually 44]

————— 3/19/2007 7:55:12 PM ————————————————————
Welcome to Minitab, press F1 for help.

MTB > print c1 c2

Data Display

Row  Banks  Utilities
  1    755        620
  2    712        395
  3    845        653
  4    985       1050
  5   1300       1030
  6   1143        528
  7    733        610
  8   1189        964

MTB > describe c1 c2

Descriptive Statistics: Banks, Utilities

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median      Q3
Banks      8   0  957.8     81.3  230.0    712.0  738.5   915.0  1177.5
Utilities  8   0  731.3     87.9  248.6    395.0  548.5   636.5  1013.5

Variable   Maximum
Banks       1300.0
Utilities   1050.0

MTB > TwoSample c1 c2;
SUBC> Alternative 1.

Two-Sample T-Test and CI: Banks, Utilities

Two-sample T for Banks vs Utilities

           N  Mean  StDev  SE Mean
Banks      8   958    230       81
Utilities  8   731    249       88

Difference = mu (Banks) - mu (Utilities)
Estimate for difference: 226.500
95% lower bound for difference: 14.437
T-Test of difference = 0 (vs >): T-Value = 1.89  P-Value = 0.041  DF = 13

MTB > TwoSample c1 c2;
SUBC> Pooled;
SUBC> Alternative 1.

Two-Sample T-Test and CI: Banks, Utilities

Two-sample T for Banks vs Utilities

           N  Mean  StDev  SE Mean
Banks      8   958    230       81
Utilities  8   731    249       88

Difference = mu (Banks) - mu (Utilities)
Estimate for difference: 226.500
95% lower bound for difference: 15.589
T-Test of difference = 0 (vs >): T-Value = 1.89  P-Value = 0.040  DF = 14
Both use Pooled StDev = 239.4934

MTB > print c6

Data Display

Seconds
190 199 198 176 180 174 181 183 208 188 198 165

MTB > describe seconds

Descriptive Statistics: Seconds

Variable   N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3
Seconds   12   0  186.67     3.60  12.47   165.00  177.00  185.50  198.00

Variable  Maximum
Seconds    208.00

MTB > Onet c6;
SUBC> Test 180;
SUBC> Alternative 1.

One-Sample T: Seconds

Test of mu = 180 vs > 180

Variable   N     Mean   StDev  SE Mean  95% Lower Bound     T      P
Seconds   12  186.667  12.471    3.600          180.202  1.85  0.046

The methods were listed in the outline in the following table.

                                          Paired Samples  Independent Samples
   Location - Normal distribution.
     Compare means.                       Method D4       Methods D1-D3
   Location - Distribution not Normal.
     Compare medians.                     Method D5b      Method D5a
   Proportions                            Method D6b      Method D6a
   Variability - Normal distribution.
     Compare variances.                                   Method D7

ECO252 QBA2
SECOND EXAM
March 23, 2007
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________

III. Neatness Counts! Show your work! Always state your hypotheses and conclusions clearly. (19+ points). In each section state clearly what number you are using to personalize data (Your Version number). There is a penalty for failing to include your student number on this page and for not stating the version number in each section. Please write on only one side of the paper.

1. A bicycle manufacturer wishes to test the proposition that bicycle buyers are older in mountain biking country than in flatter land.
In the course of a few hours in Mountain City and Flatland City two sets of customer data are collected - 11 ages in Mountain City and 9 in Flatland City. Personalize the data as follows. The manufacturer’s researcher brings his little brother along. The brother is 10 + x years old, where x is the second to last digit of your student number. The brother puts his age in as a last item in both columns. So now the researcher has one column of 12 ages and another of 10 ages. Example: Ima Badrisk has the number 375290, so the 12th number in the ‘Mtn’ column is 19 as is the 10th number in the ‘Fltlnd’ column.

   Row  Mtn  Fltlnd
   1    29   11
   2    38   14
   3    31   15
   4    17   12
   5    36   14
   6    28   25
   7    44   14
   8     9   11
   9    32    8
   10   23
   11   35

   a. You are the data analyst and you are fairly clueless. So you compare the ages every way possible. First you compute means and standard deviations for both columns. (Show your work!) (3)
   b. With no good reason to do so, you compare the mean ages assuming a Normal distribution with equal variances (4). You may use a test ratio, a critical value or a confidence interval (2 points extra if you use all three and get the same result each time).
   c. Now you are not sure that was right and repeat the analysis while dropping the assumption of equal variances. (4 extra credit)
   d. But you are not really sure that that was right either, so repeat the analysis by comparing medians. (3) [10]
   e. So now you have three different sets of results and you have to decide which one to present to your boss. To decide whether you should have used the method in b) or in c) you compare variances. (2)
   f. But since, perhaps, you should have compared medians instead, you use a test to see if the data in Mountain City were Normally distributed. (4)
   g. So, on the basis of these tests, which method should you have used? Make a decision and present your results. (1) [17]

2. A corporate president is beginning to worry that his customer representatives are dressing too informally.
A sample of 11 representatives are selected and told not to wear a suit the first week and then told to wear a suit the following week. Customers are asked to rate the representatives according to how professionally they were treated, and from their questionnaires, each representative is given a rating. The ratings appear below. Personalize the data as follows. The 10 in the ‘Without’ column is an obvious error. ‘Correct’ it by adding the last digit of your student number to it, and make a corresponding correction in the difference column. If your student number ends in zero add 10. Example: Ima Badrisk has the number 375290, so the 11th number in the ‘Without’ column is 20 and the 11th number in the ‘Difference’ column is 2.
   a. Test to see if the Reps received significantly higher ratings when wearing suits assuming that the samples come from the Normal distribution. (3)
   b. Test to see if the Reps received significantly higher ratings when wearing suits without assuming a Normal distribution. (3)
   c. So, given the source of the data, which of the two is the correct method to use? Why? (1) [24]

   Row  Rep  With  Without  Difference
   1    A    27    22        5
   2    B    23    16        7
   3    C    25    25        0
   4    D    22    19        3
   5    E    25    21        4
   6    F    26    24        2
   7    G    21    20        1
   8    H    25    19        6
   9    I    26    23        3
   10   J    28    26        2
   11   K    22    10       12

   For your convenience, the following sums have been calculated for the first 10 numbers in each column.

               Sum  Sum of squares
   With        248  6194
   Without     215  4709
   Difference   33   153

3. The table below is data that were assembled to see if there is a difference in numbers of children among students of various types of higher education institutions. Samples were taken in community colleges (CC), large universities (LU) and small colleges (SC). Personalize the data by adding the third to last digit of your student number to the 25 in the upper left-hand corner.
   a. Is the number of children independent of the type of institution? (5)
   b.
Divide each sample into those with children and those without and use a Marascuilo procedure to find the three possible differences between the proportions with kids and tell which pairs have significant differences. (3)

   Row  Number     CC   LU   SC
   1    0 Kids     25  178   31
   2    1 Kid      49  141   12
   3    2 Kids     31   54    8
   4    More Kids  22   14    6

   c. A sample of customers’ purchases at a dollar store appears below. Personalize the data by adding the third to last digit of your student number to the 7 at the end. If that digit is zero add 10. Calculations of the mean and variance have been done to remind you how to work with frequencies. f is the frequency of the class and x is the class midpoint.

   Row  Class           f    midpoint x  fx    fx^2
   1    0 to under 10     6   5            30     150
   2    10 to under 20   14  15           210    3150
   3    20 to under 30   29  25           725   18125
   4    30 to under 40   38  35          1330   46550
   5    40 to under 50   25  45          1125   50625
   6    50 to under 60   10  55           550   30250
   7    60 to under 70    7  65           455   29575
                        129              4425  178425

   $n = \sum f = 129$, $\bar{x} = \frac{\sum fx}{n} = \frac{4425}{129} = 34.3023$ and $s^2 = \frac{\sum fx^2 - n\bar{x}^2}{n-1} = \frac{178425 - 129(34.3023)^2}{128} = 208.10$. Check to see if the sample follows the Normal distribution. (4)

   d. The following data should be personalized by adding the third to last digit of your student number divided by 100 to .621. Example: Ima Badrisk has the number 375290, so she adds .02 to .621 and gets .641. The formula for the cumulative function for the continuous uniform distribution between c and d is $F(x) = \frac{x - c}{d - c}$. Check to see if these data are uniformly distributed between zero and 1. (3) [40]

   0.621 0.503 0.203 0.477 0.710 0.581 0.329 0.480 0.554 0.382
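
The reminder in part c on working with frequencies can be checked with a short script. What follows is only a sketch in Python, assuming the unpersonalized dollar-store table; the function name `grouped_mean_var` is made up for illustration and is not part of the exam or the course materials. Replace the last frequency with your personalized value before relying on the numbers.

```python
# Frequency table copied from part c (before personalization).
midpoints = [5, 15, 25, 35, 45, 55, 65]   # class midpoints x
freqs = [6, 14, 29, 38, 25, 10, 7]        # class frequencies f


def grouped_mean_var(x, f):
    """Return (n, mean, sample variance) for grouped data.

    Uses the computational formula s^2 = (sum(f*x^2) - n*mean^2) / (n - 1).
    """
    n = sum(f)
    sum_fx = sum(fi * xi for fi, xi in zip(f, x))
    sum_fx2 = sum(fi * xi ** 2 for fi, xi in zip(f, x))
    mean = sum_fx / n
    var = (sum_fx2 - n * mean ** 2) / (n - 1)
    return n, mean, var


n, mean, var = grouped_mean_var(midpoints, freqs)
print(n, round(mean, 4), round(var, 4))
```

Note that the mean is squared inside the variance formula; leaving out the square is an easy slip when doing this by hand.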