252x0761 10/29/07

ECO252 QBA2
SECOND EXAM
Nov 1-5 2007

Name ________________________
Class ________________________
Student Number _______________

Show your work! Make Diagrams! Exam is normed on 50 points. Answers without reasons are not usually acceptable.

I. (8 points) Do all the following. Make diagrams! $x \sim N(11, 9)$. If you are not using the supplement table, make sure that I know it.

1. Px 12
2. P21 x 21
3. P0 x 10.05
4. $x_{.065}$ (Do not try to use the t table to get this.)

II. (5+ points) Do all the following. Look them over first: there is a section III in the in-class exam, and the computer problem is at the end. Show your work where appropriate. There is a penalty for not doing Problem 1a.

Note the following:
1. This test is normed on 50 points, but there are more points possible including the take-home. You are unlikely to finish the exam and might want to skip some questions.
2. A table identifying methods for comparing 2 samples is at the end of the exam.
3. If you answer ‘None of the above’ in any question, you should provide an alternative answer and explain why. You may receive credit for this even if you are wrong.
4. Use a 5% significance level unless the question says otherwise.
5. Read problems carefully. A problem that looks like a problem on another exam may be quite different.
6. When you do a statistical test, make sure that you state your null and alternative hypotheses and that I know what method you are using and what your conclusion is.

1. (Groebner) We wish to compare the amount of time men and women spend in the supermarket. The two columns below, $x_1$ and $x_2$, represent two independent samples with 7 shoppers in each sample. You may assume that the parent distributions are Normal. $d = x_1 - x_2$.

Row   Men x1   Women x2   Difference d
 1      31        33          -2
 2      41        33           8
 3      21        26          -5
 4      27        41         -14
 5      31        33          -2
 6      35        48         -13
 7      24        44         -20

Minitab computes the following.
Variable   N    Mean   SE Mean   StDev   Minimum      Q1   Median      Q3   Maximum
x1         7   30.00      2.55    6.76     21.00   24.00    31.00   35.00     41.00
x2         7   36.86      2.91    7.69     26.00   33.00    33.00   44.00     48.00
d          7   -6.86                      -20.00  -14.00    -5.00   -2.00      8.00

a. Compute the sample variance for the d column. Show your work! (2)
b. Is there a significant difference between the variances for men and women? State your hypotheses and your conclusion clearly! (2)
c. Test to see if there is a difference between the average amount of time men and women shop. (3)
d. Using the sample means and standard deviations computed above and changing each sample size from 7 to 100, find an 87% 2-sided confidence interval for the difference between the amount of time men and women shop. Does it indicate a significant difference between men’s and women’s times? Why? (3) [10]

III. (18+ points) Do as many of the following as you can. (2 points each unless noted otherwise.) Look them over first; the computer problem is at the end. Show your work where appropriate.

Note that if you have a table like this

   [    ]  [    ]    .10
   [    ]  [    ]    .90
     .20     .80    1.00

and if you know one number on the inside of the table, you can get the rest by subtracting.

1. A professor wishes to see if the variability of scores for people taking the introductory accounting course is different. He takes a sample of the scores of 13 non-accounting students and 10 accounting students and gets the following results: $n_1 = 13$, $n_2 = 10$, $s_1^2 = 210.2$ and $s_2^2 = 36.5$. Though this is a 2-sided test with a 95% confidence level, he can actually do the entire test by comparing
   a) $s_1^2 / s_2^2$ against $F_{.05}^{(9,12)}$
   b) $s_1^2 / s_2^2$ against $F_{.025}^{(9,12)}$
   c) $s_1^2 / s_2^2$ against $F_{.05}^{(12,9)}$
   d) $s_1^2 / s_2^2$ against $F_{.025}^{(12,9)}$
   e) $s_2^2 / s_1^2$ against $F_{.05}^{(9,12)}$
   f) $s_2^2 / s_1^2$ against $F_{.025}^{(9,12)}$
   g) $s_2^2 / s_1^2$ against $F_{.05}^{(12,9)}$
   h) $s_2^2 / s_1^2$ against $F_{.025}^{(12,9)}$
   _______

2. $F_{.05}^{(150,250)}$ is, at most, ______. If you did not get this from the Supplementary Tables, you must explain how you found this.
Exhibit 1:

            Sample size   Mean       Std Deviation
Males            12       55000      11741.29
Females          18       48266.7    13577.63

Difference between means   6733.3
Std error                  4764.82
z   1.4528     p-value   .0743
t   1.4221     p-value   .0787

(Berenson et al.) A researcher randomly samples 30 graduates of an MBA program and records data on their starting salaries. Let $x_1$ represent the salaries of men and $x_2$ represent the salaries of women. Sample data is above.

3. The difference in starting salaries is $6733.3 \pm 9758.4$.
   a. The difference is statistically significant because 9758.4 is larger than 6733.3
   b. The difference is statistically significant because the confidence interval supports a null hypothesis.
   c. The difference is statistically insignificant because 6733 is smaller than 9758.4
   d. The difference is statistically insignificant because the confidence interval would lead us to reject a null hypothesis.

4. If the researcher is trying to show that the starting salaries for men are greater than the starting salaries for women, and assuming that the population mean is appropriate to compare salaries, with $D = \mu_1 - \mu_2$, her null hypothesis should be:
   a. D 0
   b. D 0
   c. D 0
   d. D 0
   e. D 0
   f. D 0
   g. None of the above

5. If the researcher in Exhibit 1 is attempting to show that starting salaries for women are significantly below starting salaries for men, the appropriate test ratio is:
   a. 4764.83
   b. 1.4221
   c. 1.4528
   d. -1.4221
   e. -1.4528
   f. .0743

6. If the researcher knows the population standard deviations for men and women are both 11000, the appropriate degrees of freedom for the test are:
   a. 28
   b. Gotten by the formula $DF = \dfrac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
   c. 12 (The smaller of 12 and 18)
   d. 30
   e. None of the above.

7. I am testing the hypothesis H 0 : 300 . I get a value of x 162 , which results in a p-value of .076. What are the p-values for H 0 : 300 and (Note Error!) H 0 : 300 ?
   a. Both are .076
   b. Both are .038
   c. The first is .038 and the second is .962
   d.
The first is .962 and the second is .038
   e. Both are .962
   f. Not enough information
[14]

8. A 2007 survey (Baltic Surveys for the International Republican Institute) of a sample of 1062 opinions in Moldova asked what countries are the greatest social and economic threats to Moldova. The data presented said that 23% (244 people) said that Russia was one of those countries and that 25% (265) said that the US was one.
   a) If 15% (159 people) said that both of the countries are in this threatening group, and country 1 is Russia and country 2 is the US, and we wish to test the statement that the US is significantly less popular than Russia in Moldova, what would our null and alternative hypotheses be? Answer in terms of $\mu_1$, $\mu_2$ and $D = \mu_1 - \mu_2$, or 1 and 2 or $p_1$, $p_2$ and p p1 p 2 as appropriate. (Example if 1 , 2 and F 1 2 were a reasonable answer, you might get two points for saying H 0 : 1 2 , H 1 : 1 2 and wrongly saying that this becomes H 1 : F 0 and the corresponding H 0 : F 0 ) (3)
   b) Test this hypothesis using the data above. (4) (Note that it is possible to get a) right and still choose the wrong method. If this happens, there will be partial credit.) [21]

9. A survey of 2714 respondents, of whom 55% (1493) were women, conducted by the International Republican Institute in Iraq in 2005, says that 12% of men and 15% of women believed that Iraqi women had sufficient rights, opportunities and protections under the new constitution. Does this show a significant difference between attitudes of men and women? Assume that the men and women are two independent samples. (State and test your null and alternate hypotheses.) (5) [28]

10. Computer question.
   a. Turn in your first computer output. Only do b, c and d if you did. (3)
   b. (Groebner) A manufacturer of parts is afraid that machine 2 is making bolts with larger diameter than machine 1. 100 bolts are made with each machine.
The first machine produces bolts with a mean of 0.501 inches and a standard deviation of .025 inches, while the second produces bolts with a mean of .509 inches and a standard deviation of .034 inches. Does the second machine need adjustment? What are our null and alternative hypotheses? (1)
   c. Is the appropriate computer run A or B below? Why? (1)
   d. What is our conclusion: do we reject the null hypothesis using a 5% significance level? Why? Does this mean that we should adjust machine 2? (2)
   e. On the basis of run C, could we have made things easier by assuming equal variances? Why? (1) [36]

A)
    MTB > TwoT 100 0.5010 .025 100 0.5090 .034;
    SUBC> Alternative -1.

    Two-Sample T-Test and CI
    Sample    N    Mean   StDev  SE Mean
    1       100  0.5010  0.0250   0.0025
    2       100  0.5090  0.0340   0.0034
    Difference = mu (1) - mu (2)
    Estimate for difference: -0.008000
    95% upper bound for difference: -0.001023
    T-Test of difference = 0 (vs <): T-Value = -1.90  P-Value = 0.030  DF = 181

B)
    MTB > TwoT 100 0.5010 .025 100 0.5090 .034;
    SUBC> Alternative 1.

    Two-Sample T-Test and CI
    Sample    N    Mean   StDev  SE Mean
    1       100  0.5010  0.0250   0.0025
    2       100  0.5090  0.0340   0.0034
    Difference = mu (1) - mu (2)
    Estimate for difference: -0.008000
    95% lower bound for difference: -0.014977
    T-Test of difference = 0 (vs >): T-Value = -1.90  P-Value = 0.970  DF = 181

C)
    MTB > VarTest 100 .000625 100 .001156.

    Test for Equal Variances
    95% Bonferroni confidence intervals for standard deviations
    Sample    N      Lower  StDev      Upper
    1       100  0.0215543  0.025  0.0296942
    2       100  0.0293139  0.034  0.0403842
    F-Test (normal distribution)
    Test statistic = 0.54, p-value = 0.002

ECO252 QBA2
SECOND EXAM
March 23, 2007
TAKE HOME SECTION

Name: _________________________
Student Number: _________________________
Class hours registered and attended (if different): _________________________

IV. Neatness Counts! Show your work! Always state your hypotheses and conclusions clearly. (19+ points).
In each section state clearly what number you are using to personalize data. There is a penalty for failing to include your student number on this page, for not stating the version number in each section, and for not including your class hour somewhere. Please write on only one side of the paper. You must do 3a (penalty).

1. (Groebner et al.) Your company produces hair driers for retailers to sell as house brands. A design change will produce considerable savings, but the new design will not be adopted unless it is more reliable. For a sample of 250 hair driers with the old design, 75 failed in a simulated one-year period. For a sample of 250 driers with the new design, $50 + a$ fail in a simulated one-year period, where $a$ is the second-to-last digit of your student number. Use a 90% confidence level. Make sure that I know what value you are using for $a$.
   a) Can we say that the proportion of the redesigned driers that fail is significantly lower than that of the driers with the current design? (3)
   b) Do a 95% two-sided confidence interval for the difference between the two proportions. (1)
   c) After you have implemented your decision on using the new design, a newly hired engineer recommends another design change (the newest design) that she claims will decrease the proportion that fail even further. For a sample of 100 driers, 18 fail in a simulated one-year period. Do a test of the equality of the three proportions, again using a 90% confidence level. (4)
   d) Follow your results in c) with a Marascuilo procedure for finding pairwise differences between the proportions for the three designs. Assuming that there is no cost saving in going to the newest design, would you recommend going to it? Write a paragraph-long report on your conclusions from the two hypothesis tests and what decisions these imply. (4) [12]

2.
(Groebner et al.) A ripsaw is cutting lumber into narrow strips and should be set to produce a product whose width differs from the width specified by an amount given by a Normal distribution with a mean of zero and a standard deviation of 0.01 inch. Because we have been getting complaints about the uniformity of our product, we wish to verify that the Normal distribution specified is correct. We cut $600 + b$ pieces (where $b$ is the last digit of your student number). Our results are as follows.

Deviation from specified width   Number of pieces
Below -0.02                          0
-0.02 to -0.01                      84
-0.01 to 0                         266
0 to 0.01                          150 + b
0.01 to 0.02                        94
0.02 and above                       6

   a) To use a chi-squared procedure to check the distribution, find the values of E. (3)
   b) State and test the null hypothesis. (2)
   c) We have learned another procedure that can be used to test for a Normal distribution when the parameters are given. Use it now to verify your results. Can you say that the saw is working as advertised? (4) [21]

3. (Groebner et al.) Two groups of 16 individuals were asked to do their income taxes using two tax preparation software packages. The data is below (in number of minutes required) and may be considered two independent random samples. To personalize the data, add the last digit of your student number to every number in the TT00 column. Use 10 if your number ends in 0. Label the column clearly as TT1, TT2 through TT10 according to the number used. Let $d = TTa - TC$.
Row   TT00   TC
 1     65    88
 2     51    71
 3     74    89
 4     89    66
 5     88    78
 6     96    64
 7     37    74
 8     66    99
 9     86    79
10     54    68
11     60    93
12     45    93
13     42    86
14     55    86
15     58    81
16     38    83

Minitab has given us the following results.

Variable    N   N*    Mean   SE Mean   StDev   Minimum      Q1   Median      Q3   Maximum
TC         16    0   81.13      2.60   10.40     64.00   71.75    82.00   88.75     99.00
TT1        16    0   63.75      4.76   19.05     38.00   47.50    60.00   84.00     97.00
TT2        16    0   64.75      4.76   19.05     39.00   48.50    61.00   85.00     98.00
TT3        16    0   65.75      4.76   19.05     40.00   49.50    62.00   86.00     99.00
TT4        16    0   66.75      4.76   19.05     41.00   50.50    63.00   87.00    100.00
TT5        16    0   67.75      4.76   19.05     42.00   51.50    64.00   88.00    101.00
TT6        16    0   68.75      4.76   19.05     43.00   52.50    65.00   89.00    102.00
TT7        16    0   69.75      4.76   19.05     44.00   53.50    66.00   90.00    103.00
TT8        16    0   70.75      4.76   19.05     45.00   54.50    67.00   91.00    104.00
TT9        16    0   71.75      4.76   19.05     46.00   55.50    68.00   92.00    105.00
TT10       16    0   72.75      4.76   19.05     47.00   56.50    69.00   93.00    106.00

   a) Find the mean and standard deviation of $d$. (1)
   Assume the Normal distribution in b), c), and e).
   b) Find out if there is a significant difference between the mean times for the two packages, using a test ratio, a critical value or a confidence interval. (4) (2 extra points if you use all three methods and get the same results on all three, 3 extra (extra) points if you do not assume equal variances)
   c) Test the variances of the two samples for equality on the assumption that they come from the Normal distribution. (2)
   d) Test the $d$ column to see if the data was Normally distributed. (5)
   e) Actually the data above was taken from only 16 people, who were randomly assigned to use one of the methods first. Would this mean that what you did above was correct? If not, do b) over again. (3)
   f) In view of the fact that the data was taken from only 16 people and dropping the assumption of Normality, find out if there is a significant difference between the medians of the two packages. (3)
   g) If you did d), e) and f), can you report on your results, indicating which result was correct?
(1) [40]

Be prepared to turn in your Minitab output for the first computer problem and to answer the questions on the problem sheet about it or a similar problem.

4. (Extra Credit) Check your work on Minitab. Remind me that you did extra credit on your front page.

For a Chi-squared test of Independence or Homogeneity, put your observed data in adjoining columns. Use the Stat pull-down menu. Choose Tables and then Chi Squared Test. Your output will show O and E as a single table. You will be given a p-value for the hypothesis of Independence or Homogeneity.

For a test of Normality, when the sample mean and variance are to be computed from the sample, put your complete set of numbers in one column. Use the Stat pull-down menu. Choose Basic Statistics and then Normality Test. Check Kolmogorov-Smirnov to get a Lilliefors test. You will be given a p-value for the hypothesis of Normality.

For a Chi-squared test of goodness of fit, put your observed data in C1 and your expected data or frequencies in C2. The expected data may be proportions adding to 1 or counts adding to $n$. Use the Stat pull-down menu. Choose Tables and then Chi Squared Test of Goodness of Fit. Pick specific proportions or historic counts. Observed counts is C1 and the other column requested will be C2. The computed degrees of freedom will have to be reduced if you computed any statistics from the data before setting up the expected count or frequency. You are warned not to use expected counts below 5.

For a test of Two Proportions, use the Stat pull-down menu. Choose Basic Statistics and then Two Proportions. Check Summarized Data and then enter $n_1$, $x_1$, $n_2$ and $x_2$. Use Options to set the inequality in the alternate hypothesis and check Pooled Estimate unless you are doing a confidence interval.
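As a cross-check outside Minitab, the pooled two-proportion test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_proportion_z` and the counts in the usage lines are hypothetical (they are not exam data), and the Normal approximation is assumed to be adequate for the sample sizes used.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z test for H0: p1 = p2 (hypothetical helper, not a Minitab routine).

    x1, x2 are success counts; n1, n2 are sample sizes.
    Returns the z ratio and the lower-tail, upper-tail and two-sided p-values.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled estimate of the common proportion under the null hypothesis
    p0 = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Standard Normal CDF written with the error function
    lower = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    upper = 1 - lower
    two_sided = 2 * min(lower, upper)
    return z, lower, upper, two_sided

# Hypothetical summarized data: 40 of 200 versus 60 of 200
z, p_lower, p_upper, p_two = two_proportion_z(40, 200, 60, 200)
print(round(z, 4), round(p_lower, 4), round(p_two, 4))
```

Pick the p-value that matches the inequality in your alternate hypothesis: the lower tail for $H_1: p_1 < p_2$, the upper tail for $H_1: p_1 > p_2$, and the two-sided value for $H_1: p_1 \ne p_2$. For a confidence interval the standard error should use the separate sample proportions rather than the pooled estimate, which is why Pooled Estimate is left unchecked in that case.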
To fake computation of a sample variance or standard deviation of the data in column c1 using column c2 for the squares,

    MTB > let C2 = C1*C1                    * performs multiplication. ** would do a power, but multiplication is more accurate.
    MTB > name k1 'sum'
    MTB > name k2 'sumsq'
    MTB > let k1 = sum(c1)
    MTB > let k2 = sum(c2)                  This is equivalent to let k2 = ssq(c1)
    MTB > print k1 k2                       This is a progress report for my data set.

    Data Display
    sum      3047.24
    sumsq    468657

    MTB > name k1 'meanx'
    MTB > let k1 = k1/count(c1)             / means division. Count gives n.
    MTB > let k2 = k2 - (count(c1))*k1*k1
    MTB > print k1 k2

    Data Display
    meanx    152.362
    sumsq    4372.53

    MTB > name k2 'varx'
    MTB > let k2 = k2/((count(c1))-1)
    MTB > print k1 k2

    Data Display
    meanx    152.362
    varx     230.133

    MTB > name k2 'stdevx'
    MTB > let k2 = sqrt(k2)                 Sqrt gives a square root.
    MTB > print k1 k2

    Data Display
    meanx    152.362
    stdevx   15.1701

    Print C1, C2

To check your mean and standard deviation, use

    MTB > describe C1

To check for equal variances for data in C1 and C2, use

    MTB > VarTest c1 c2;
    SUBC> Unstacked.

Both an F test and a Levene test will be run.

To put the items in column C1 in order in column C2, use

    MTB > Sort c1 c2;
    SUBC> By c1.

Commands like Count C1, Sum C1 and SSq C1 can be used alone without Let if the values don’t need to be stored.

In the above I have continuously named and renamed the constants k1 and k2. There are many constants in Minitab on an invisible worksheet (k1 through k100 at least), so you can preserve your results by using separate locations for subsequent computations.
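The session above builds the variance from the sum and the sum of squares, that is, $s^2 = \dfrac{\sum x^2 - n\bar{x}^2}{n - 1}$. If you want to check that arithmetic outside Minitab, the same steps can be sketched in Python. The function name `shortcut_variance` and the data in `sample` are hypothetical and are used only for illustration; they are not the data set behind the Minitab output.

```python
import math

def shortcut_variance(data):
    """Mean, sample variance and standard deviation by the computational formula.

    Mirrors the Minitab steps: sum, sum of squares, mean,
    then (sumsq - n * mean^2) / (n - 1), then the square root.
    """
    n = len(data)                      # Count gives n
    total = sum(data)                  # like sum(c1)
    sumsq = sum(x * x for x in data)   # like ssq(c1)
    mean = total / n
    var = (sumsq - n * mean * mean) / (n - 1)
    return mean, var, math.sqrt(var)

# Hypothetical data, not from the exam
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, var, sd = shortcut_variance(sample)
print(mean, var, sd)
```

With these eight numbers the sum is 40 and the sum of squares is 232, so the mean is 5 and the variance is $(232 - 8 \cdot 25)/7 = 32/7$. Applying the same function to the seven values in a $d$ column is one way to confirm hand work, but the hand computation still has to be shown.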