252y0521 4/07/05 (Page layout view!)
ECO252 QBA2    Name: KEY
SECOND HOUR EXAM    Hour of Class Registered (Circle): 9am 10am
April 4, 2005

Show your work! Make Diagrams! Exam is normed on 50 points. Answers without reasons are not usually acceptable.

I. (8 points) Do all the following. Make diagrams! $x \sim N(21, 6)$. If you are not using the supplement table, make sure that I know it.

1. $P(0 \le x \le 21.00) = P\left(\frac{0-21}{6} \le z \le \frac{21-21}{6}\right) = P(-3.50 \le z \le 0) = .4998$
Make a diagram! For z draw a Normal curve with a vertical line at zero in the middle. Shade the area between -3.50 and 0 and note that it begins or ends at zero, so that you can just look up a single number on the table.
[Figure: Normal curve with mean 21 and standard deviation 6, "The Area Between 0 and 21 is 0.4998"; Normal curve with mean 0 and standard deviation 1, "The Area Between -3.5 and 0 is 0.4998". Axes: Data Axis, Density.]

2. $P(x \le 10.22) = P\left(z \le \frac{10.22-21}{6}\right) = P(z \le -1.80) = P(z \le 0) - P(-1.80 \le z \le 0) = .5 - .4641 = .0359$
Make a diagram! For z draw a Normal curve with a vertical line at zero in the middle. Shade the entire area below -1.80, and note that it is on one side of the mean, so that you subtract the area between -1.80 and zero from the entire area below zero.
[Figure: Normal curve with mean 21 and standard deviation 6, "The Area to the Left of 10.22 is 0.0362"; Normal curve with mean 0 and standard deviation 1, "The Area to the Left of -1.8 is 0.0359". Axes: Data Axis, Density.]

3. $P(7.00 \le x \le 30.4) = P\left(\frac{7-21}{6} \le z \le \frac{30.4-21}{6}\right) = P(-2.33 \le z \le 1.57) = P(-2.33 \le z \le 0) + P(0 \le z \le 1.57) = .4901 + .4418 = .9319$
Make a diagram! For z draw a Normal curve with a vertical line at zero in the middle. Shade the area between -2.33 and 1.57 and note that it is on both sides of the mean, so that you add the area between -2.33 and zero to the area between zero and 1.57.
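The three probabilities above can be cross-checked without a table. This is only a minimal sketch using Python's standard library (the variable names are illustrative, not part of the exam); because it does not round z to two decimals, its results agree with the figure captions (for example 0.0362 and 0.9316) rather than with the table lookups (.0359 and .9319).

```python
from statistics import NormalDist

# x ~ N(21, 6): mean 21, standard deviation 6, as stated in Part I.
x = NormalDist(mu=21, sigma=6)

# 1. P(0 <= x <= 21)
p1 = x.cdf(21) - x.cdf(0)
# 2. P(x <= 10.22)
p2 = x.cdf(10.22)
# 3. P(7 <= x <= 30.4)
p3 = x.cdf(30.4) - x.cdf(7)

for label, p in [("P(0<=x<=21)", p1), ("P(x<=10.22)", p2), ("P(7<=x<=30.4)", p3)]:
    print(f"{label} = {p:.4f}")
```

The small differences from the hand answers come only from rounding z before using the Normal table.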
[Figure: Normal curve with mean 21 and standard deviation 6, "The Area Between 7 and 30.4 is 0.9316"; Normal curve with mean 0 and standard deviation 1, "The Area Between -2.33 and 1.57 is 0.9319". Axes: Data Axis, Density.]

4. $x_{.035}$
First we must find $z_{.035}$. This is the value of z that has $P(z \ge z_{.035}) = .035$ or $P(0 \le z \le z_{.035}) = .5 - .035 = .4650$. On the Normal table, the closest we can find to .4650 is $P(0 \le z \le 1.81) = .4649$. So $z_{.035} = 1.81$ and $x_{.035} = \mu + z_{.035}\sigma = 21 + 1.81(6) = 31.86$.
Check: $P(x \ge 31.86) = P\left(z \ge \frac{31.86-21}{6}\right) = P(z \ge 1.81) = .5 - .4649 = .0351 \approx .035$
Make a diagram! For z draw a Normal curve with zero in the middle. Divide the area above zero into 3.5% above $z_{.035}$ and 50% - 3.5% = 46.5% below $z_{.035}$.
[Figure: Normal curve with mean 0 and standard deviation 1, "The Area to the Right of 1.81 is 0.0351"; Normal curve with mean 21 and standard deviation 6, "The Area to the Right of 31.86 is 0.0351". Axes: Data Axis, Density.]

How the graphs were made. The program (macro) that I wrote for this is called Normarea5A, and it is called in the usual Minitab way, by typing its name with a % in front of it. To enable Minitab to find Normarea5A, a nonsense worksheet called 'notmuch' is placed in the same file as NormArea5A and loaded first. It does not affect the results. The dialog below creates the first graph.

----- 4/6/2005 10:59:45 PM -----
Welcome to Minitab, press F1 for help.
MTB > WOpen "C:\Documents and Settings\rbove\My Documents\Minitab\notmuch.MTW".
Retrieving worksheet from file: 'C:\Documents and Settings\rbove\My Documents\Minitab\notmuch.MTW'
Worksheet was saved on Fri Jan 21 2005
Results for: notmuch.MTW
MTB > %normarea5a
Executing from file: normarea5a.MAC
Graphic display of normal curve areas
Finds and displays areas to the left or right of a given value or between two values.
(This macro uses C100-C116 and K100-K116)
Enter the mean and standard deviation of the normal curve.
DATA> 21
DATA> 6
Do you want the area to the left of a value? (Y or N)
n
Do you want the area to the right of a value? (Y or N)
n
Enter the two values for which you want the area between.
DATA> 0
DATA> 21
...working...
Normal Curve Area

II. (24+ points) Do all the following. (2 points each unless noted otherwise). Note the following:
1. This test is normed on 50 points, but there are more points possible including the take-home. You are unlikely to finish the exam and might want to skip some questions.
2. A table identifying methods for comparing 2 samples is at the end of the exam.
3. If you answer 'None of the above' in any question, you should provide an alternative answer and explain why. You may receive credit for this even if you are wrong.
4. Many of you are still wasting both our time by making statements without statistical tests to back them up.

1. Computer Problem (Bassett et al.) 400 children were divided into two groups. The first and larger group was taught Mathematics by traditional methods. The second group was taught by an experimental method. Test scores were recorded and are available. The computer analysis of the data is shown in the three tests below. The First Test was done using method 2. The researcher was reprimanded by her supervisor for assuming that the population variances were equal, so she ran the Second Test without assuming equal variances. Because she was very annoyed at her supervisor, she ran the Third Test for equal variances. The output is below.
Assume a significance level of 1%. Do not do any unnecessary computations.

#First Test:
MTB > TwoSample c1 c2;
SUBC> Pooled;
SUBC> Alternative -1.

Two-Sample T-Test and CI: x1, x2
Two-sample T for x1 vs x2
      N   Mean  StDev  SE Mean
x1  250  68.45   7.96     0.50
x2  150  70.62   7.06     0.58
Difference = mu (x1) - mu (x2)
Estimate for difference: -2.17521
95% upper bound for difference: -0.87525
T-Test of difference = 0 (vs <): T-Value = -2.76  P-Value = 0.003  DF = 398

#Second Test:
MTB > TwoSample c1 c2;
SUBC> Alternative -1.

Two-Sample T-Test and CI: x1, x2
Two-sample T for x1 vs x2
      N   Mean  StDev  SE Mean
x1  250  68.45   7.96     0.50
x2  150  70.62   7.06     0.58
Difference = mu (x1) - mu (x2)
Estimate for difference: -2.17521
95% upper bound for difference: -0.91322
T-Test of difference = 0 (vs <): T-Value = -2.84  P-Value = 0.002  DF = 343

#Third Test:
MTB > VarTest c1 c2;
SUBC> Unstacked.

Test for Equal Variances: x1, x2
F-Test (normal distribution)
Test statistic = 1.27, p-value = 0.107

a) Turn in your first computer assignment (2)
b) Look only at the first test. (i) What are the null and alternative hypotheses? (1) (ii) Can we conclude that the experimental method is better? What are the numbers in the output that bring you to this conclusion? (2)
c) Make a drawing of an (almost) Normal curve. Label the center of the curve with a zero and show the area under the curve that is the p-value. (1) Easiest question on the exam!

Note: (From the outline) A p-value is a measure of the credibility of the null hypothesis and is defined as the probability that a test statistic or ratio as extreme as or more extreme than the observed statistic or ratio could occur, assuming that the null hypothesis is true. This means that if we have calculated a t-ratio with a value of $t_{calc}$, and we have a left-sided test, p-value $= P(t \le t_{calc})$. If we have a right-sided test, p-value $= P(t \ge t_{calc})$.
If we have a 2-sided test, p-value $= 2P(t \le t_{calc})$ or p-value $= 2P(t \ge t_{calc})$, whichever is smaller. So, for a one-sided test make a diagram of the t distribution with a mean of zero, find the value of $t_{calc}$ and shade the appropriate side of $t_{calc}$. For a 2-sided test, find both $t_{calc}$ and $-t_{calc}$ and shade the tail above whichever is positive and below whichever is negative.

Solution: The important line here is
T-Test of difference = 0 (vs <): T-Value = -2.76  P-Value = 0.003  DF = 398
This tells us that the alternative hypothesis is $H_1: \mu_1 < \mu_2$, that $t = -2.76$ and that p-value $= .003$. So we can say
(i) The null and alternative hypotheses are $H_0: \mu_1 \ge \mu_2$ (the opposite of our alternate hypothesis) and $H_1: \mu_1 < \mu_2$.
(ii) We can conclude that the experimental method is better: we reject the null hypothesis at the 1% significance level because the p-value is 0.3%, which is below 1%.
To make the diagram, draw a curve with a vertical line at its mean of zero and shade the area under the curve below -2.76. To get this area using Minitab I used my program tareaA as below. But since the degrees of freedom are so large, why not substitute z for t and note that $P(z \le -2.76) = .5 - .4971 = .0029$?

MTB > %tareaA
Executing from file: tareaA.MAC
Graphic display of t curve areas
Finds and displays areas to the left or right of a given value or between two values.
(This macro uses C100-C116 and K100-K120)
Enter the degrees of freedom.
DATA> 398
Do you want the area to the left of a value? (Y or N)
y
Enter the value for which you want the area to the left.
DATA> -2.76
...working...
t Curve Area

[Figure: t curve with 398 degrees of freedom and standard deviation 1.00252, "The Area to the Left of -2.76 is 0.0030". Axes: Data Axis, Density.]

Data Display
mode 0
median 0

d) Look at the results of the second test. Do they look different to you from the results of the first test? Why? (2)
Solution: If we look at the same line as in the first test we see the following:
T-Test of difference = 0 (vs <): T-Value = -2.84  P-Value = 0.002  DF = 343
The value of t hasn't changed much and we still have a tiny p-value. It doesn't look different to me, but if you expressed a good reason to disagree, fine!

e) Look at the results of the third test. What do you think were the null and alternative hypotheses? Was her supervisor right that she should not have assumed equal variances? Why? (2) [8]
Solution: If we look at the output of the third test we find the following:
Test for Equal Variances: x1, x2
F-Test (normal distribution)
Test statistic = 1.27, p-value = 0.107
If it is a test for equal variances and there is no indication of a 1-sided test, it should have the hypotheses $H_0: \sigma_1^2 = \sigma_2^2$ and $H_1: \sigma_1^2 \ne \sigma_2^2$. The p-value is well above the significance level of 1%, so we cannot reject the assumption of equal variances.

Questions 2-5 refer to Exhibit 1.

Exhibit 1: (Schiffler, Adams) A new feed is supposedly superior to what you have used in the past to feed your pigs. You divide your pigs into two troops of 60 pigs each. After one month the results are as follows. You want to decide if the new feed is actually better. Assume that the sample data comes from two Normal populations with equal population variance.
Old Feed: $\bar{x}_1 = 175.9$, $s_1 = 12$, $n_1 = 60$
New Feed: $\bar{x}_2 = 180.2$, $s_2 = 19$, $n_2 = 60$
Note: $\bar{d} = 175.9 - 180.2 = -4.3$

2. What is the alternative hypothesis?
a) to c) other relations between $\mu_1$ and $\mu_2$ (the relation symbols are not legible in this copy)
d) *$\mu_1 < \mu_2$
e) None of the above.

3. What is $s_d$? (2 points; 3 if I have evidence that you got it correctly)
a) 0.5
b) 0.7
c) 1.5
d) *2.9
e) 8.4
Explanation: From Table 5 of the syllabus supplement:
Interval for: Difference between Two Means ($\sigma$ unknown, variances assumed equal)
Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_d$
Hypotheses: $H_0: D = D_0$*, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $t = \dfrac{\bar{d} - D_0}{s_d}$
Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_d$
where $s_d = \hat{s}_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$, $\hat{s}_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ and $DF = n_1 + n_2 - 2$.
* Same as $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ if $D_0 = 0$.

$\hat{s}_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \dfrac{59(12)^2 + 59(19)^2}{118} = \dfrac{144 + 361}{2} = 252.5$
$s_d = \sqrt{\hat{s}_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = \sqrt{252.5\left(\dfrac{2}{60}\right)} = \sqrt{8.41} = 2.9$

4. If we do not reject the Null hypothesis, do we decide that there is a reason to switch to the new feed? (1)
Solution: If the alternate hypothesis says $\mu_1 < \mu_2$, it says that the new piggy porridge is better than the old one. If we do not reject the null hypothesis, we can't say that the new sow slop is better and we cannot justify a switch.

5. Change your assumptions to assume that both samples have a population standard deviation of 15. Find a 93% two-sided confidence interval for the difference between the means. (3) Is there a significant difference between the means? Why? (1)
Solution: From Table 5 of the syllabus supplement:
Interval for: Difference between Two Means ($\sigma$ known)
Confidence Interval: $D = \bar{d} \pm z_{\alpha/2}\, \sigma_d$
Hypotheses: $H_0: D = D_0$*, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $z = \dfrac{\bar{d} - D_0}{\sigma_d}$
Critical Value: $\bar{d}_{cv} = D_0 \pm z_{\alpha/2}\, \sigma_d$
where $\sigma_d = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ and $\bar{d} = \bar{x}_1 - \bar{x}_2$.

$\sigma_d = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{15^2}{60} + \dfrac{15^2}{60}} = \sqrt{\dfrac{2(225)}{60}} = \sqrt{7.5} = 2.739$ and $\alpha = .07$. From the statement of Exhibit 1, $\bar{d} = 175.9 - 180.2 = -4.3$. From Part I of this solution, $z_{.035} = 1.81$. So $D = \bar{d} \pm z_{\alpha/2}\,\sigma_d = -4.3 \pm 1.81(2.739) = -4.3 \pm 4.96$, or -9.26 to 0.66. Since the interval includes zero (or because the value of the error part of the expression exceeds the absolute value of the difference between the sample means), the difference is not statistically significant at the 93% confidence level. [17]

Questions 6 and 7 refer to Exhibit 2.
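The arithmetic in Questions 3 and 5 (Exhibit 1) can be reproduced with a few lines of Python. This is a sketch only: the variable names are mine, and the 93% multiplier is taken from the exact Normal quantile rather than the table value 1.81, so the interval limits may differ from the hand answer in the third decimal.

```python
from math import sqrt
from statistics import NormalDist

# Exhibit 1 summary statistics as given in the text.
n1, n2 = 60, 60
s1, s2 = 12.0, 19.0
d_bar = 175.9 - 180.2  # difference between the sample means

# Question 3: pooled variance and standard error of the difference.
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
s_d = sqrt(pooled_var * (1 / n1 + 1 / n2))

# Question 5: both population standard deviations are assumed to be 15.
sigma = 15.0
sigma_d = sqrt(sigma**2 / n1 + sigma**2 / n2)
z = NormalDist().inv_cdf(1 - 0.07 / 2)  # two-sided 93% interval
lower, upper = d_bar - z * sigma_d, d_bar + z * sigma_d

print(f"pooled variance = {pooled_var:.1f}, s_d = {s_d:.3f}")
print(f"sigma_d = {sigma_d:.3f}, 93% interval = ({lower:.2f}, {upper:.2f})")
```

If the printed interval contains zero, the difference is not significant at the 93% confidence level, which is the criterion used in the solution to Question 5.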
Exhibit 2: (Lees) The net income figures for seven regions in which Smelly-Welly Dirt Devourer is sold are given before and after a reorganization.

Region   Before Reorganization   After Reorganization   Difference
1        40                      62                     -22
2        35                      49                     -14
3        42                      39                       3
4        30                      28                       2
5        55                      55                       0
6        63                      66                      -3
7        36                      40                      -4

The researcher decides that the Wilcoxon Signed Rank test for paired samples is appropriate. The region with a tie is dropped from consideration, leaving 6 pairs for this test only. The test is one-sided. Minitab gives the following sample statistics for the data for use in Problem 7. (Use $\alpha = .05$)

Description   Variable   n   Mean    SE Mean   StDev
Before        x1         7   43.00   4.46      11.80
After         x2         7   48.43   5.15      13.62
Difference    d          7   -5.43   3.49       9.24

Comment: Note that you were told that the data was paired.

6. In the Wilcoxon Signed Rank Test, the number that you compare to the values in the Wilcoxon Signed Rank Test Table is (3)
a. 3
b. *3.5
c. 17
d. 17.5
e. 18
f. 18.5
g. None of the above. Write in the correct number and show your work. [20]
Explanation: The original data is repeated with the absolute differences $|D|$, their ranks $r$ and the corrected (tie-averaged, signed) ranks $r^*$.

Region   Before   After   D     |D|   r   r*
1        40       62      -22   22    6   6-
2        35       49      -14   14    5   5-
3        42       39        3    3    3   2.5+
4        30       28        2    2    1   1+
5        55       55        0   (dropped)
6        63       66       -3    3    2   2.5-
7        36       40       -4    4    4   4-

The sum of the positive ranks is 3.5 and the sum of the negative ranks is 17.5. We can check this by recalling that the sum of the first 6 numbers is $\frac{6(7)}{2} = 21 = 3.5 + 17.5$. The smaller of the two rank sums is 3.5. (If we compare this number with Table 7, we find that for a one-sided 5% test the critical value is 2. Since 3.5 is above it, we cannot reject the null hypothesis of equal medians.)

7. If we change our assumptions to state that the underlying distribution is Normal, we should not be using the Wilcoxon Signed Rank Test. If we use a test based on the mean we have all of the following:
a. Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$.
b. Test Ratio: $t = \dfrac{\bar{d} - D_0}{s_{\bar{d}}}$
c. Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$.
On the basis of the information above, find $\bar{d}$, $s_{\bar{d}}$ and the number of degrees of freedom. If you do any calculations make sure that I know what they are. (4) [24]
Solution: From Table 5 of the syllabus supplement. Since we are dealing with paired data, the relevant part of the table is stated below.
Interval for: Difference between Two Means (paired data)
Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$
Hypotheses: $H_0: D = D_0$*, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $t = \dfrac{\bar{d} - D_0}{s_{\bar{d}}}$
Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$
where $\bar{d} = \bar{x}_1 - \bar{x}_2$, $s_{\bar{d}} = \dfrac{s_d}{\sqrt{n}}$, $n_1 = n_2 = n$ and $df = n - 1$.
The relevant part of the output is stated below.

Description   Variable   n   Mean    SE Mean   StDev
Difference    d          7   -5.43   3.49      9.24

If we just read this we find $\bar{d} = 43 - 48.43 = -5.43$, $s_{\bar{d}} = \dfrac{s_d}{\sqrt{n}} = \dfrac{9.24}{\sqrt{7}} = 3.49$ and $df = n - 1 = 6$.
Of course, if you are compulsive enough to finish the problem, $t = \dfrac{\bar{d} - D_0}{s_{\bar{d}}} = \dfrac{-5.43 - 0}{3.49} = -1.556 > -1.943 = -t_{.05}^{(6)}$ so, apparently, we do not reject the null hypothesis, which was never stated.

8. A marble machine is recalibrated, and the owner is afraid that it is producing marbles that are too small. The standard size is 12mm. The following results pop out after 105 diameters are fed to the computer.

One-Sample T: x1
Test of mu = 12 vs < 12
                                         95% Upper
Variable    N     Mean   StDev  SE Mean      Bound     T      P
x1        105  12.0150  0.0498   0.0049    12.0231  3.09  0.999

Make a diagram showing the p-value. Suppose that you were doing a 2-sided test with the same numbers; what would the p-value be? (Credit raised to 3) [27]
Solution: a) The output tells us two things: 1) Because the alternative hypothesis is $\mu < 12$, this is a left-sided test, and 2) $t = 3.09$. To make the diagram, draw an almost Normal curve with a center and vertical line at zero and shade the entire area below 3.09. p-value $= P(t \le 3.09) = .999$.
[Figure: t curve with 104 degrees of freedom and standard deviation 1.00976, "The Area to the Left of 3.09 is 0.9987". Axes: Data Axis, Density.]

b) If we want a two-sided test of $H_0: \mu = 12$ vs. $H_1: \mu \ne 12$, the p-value is defined as the probability of getting a result as extreme as or more extreme than our actual results.
p-value $= 2P(t \ge 3.09) = 2[1 - P(t \le 3.09)] = 2(1 - .999) = 2(.001) = .002$

9. (Lee) The number of people calling in sick during a certain week is below.
M 42   T 33   W 35   R 25   F 45
The null hypothesis is that people are equally likely to call in sick on each day. This is a chi-square test of (1)
a. Homogeneity
b. Independence
c. *Uniformity
d. Normal distribution
e. Poisson Distribution
Explanation: This is the simplest of chi-squared goodness-of-fit tests. Uniformity is the assumption that every class is of equal size.

10. (Lee) The number of people calling in sick during a certain week is below.
M 42   T 33   W 35   R 25   F 45
The null hypothesis is that people are equally likely to call in sick on each day. What is the chi-square value? (3)
a. 11.9023
b. *6.889
c. 3.296
d. 4.198
e. 7.895
Solution: The sum of the numbers above is $n = 180$. Uniformity means that each of the $r = 5$ classes is $\frac{1}{r} = \frac{1}{5} = .2$ of the data. We thus have the following:

Row    O     E    E-O   (E-O)^2   (E-O)^2/E    O^2/E
1      42    36   -6     36       1.00000      49.0000
2      33    36    3      9       0.25000      30.2500
3      35    36    1      1       0.02778      34.0278
4      25    36   11    121       3.36111      17.3611
5      45    36   -9     81       2.25000      56.2500
Sum   180   180    0              6.88889     186.8889

So $\chi^2 = \sum \dfrac{(E-O)^2}{E} = 6.8889$ or $\chi^2 = \sum \dfrac{O^2}{E} - n = 186.8889 - 180 = 6.8889$.

11. (Lee) The number of people calling in sick during a certain week is below.
M 42   T 33   W 35   R 25   F 45
The null hypothesis is that people are equally likely to call in sick on each day. Do we reject the null hypothesis at a 5% significance level? No answer will be accepted without a reason.
Solution: $Df = r - 1 = 5 - 1 = 4$ and from Table 1 we get $\chi^{2(4)}_{.05} = 9.4877$. Since our computed $\chi^2 = 6.8889$ is smaller than the table value, we do not reject the null hypothesis.

12. (Lee) The number of people calling in sick during a certain week is below.
M 42   T 33   W 35   R 25   F 45
The null hypothesis is that people are equally likely to call in sick on each day. Can we do this by another method than Chi Squared? Do it! (4) [36]
Solution: This is a Kolmogorov-Smirnov test. $H_0$: Uniform, $\alpha = .05$

Day      E/n    Fe     O     cum O   Fo       D
M        .2     .20    42     42     .2333    .0333
T        .2     .40    33     75     .4167    .0167
W        .2     .60    35    110     .6111    .0111
R        .2     .80    25    135     .7500    .0500
F        .2    1.00    45    180    1.0000    .0000
Total   1.0           180

$F_e$ comes from adding .2s. $F_o$ comes from dividing the cumulative O by $n = 180$. $\max D$, the maximum difference, is .0500. The Kolmogorov-Smirnov table of critical values gives us, for $n > 35$:

alpha   .20           .10           .05           .01
CV      1.07/sqrt(n)  1.22/sqrt(n)  1.36/sqrt(n)  1.63/sqrt(n)

If we substitute $n = 180$, we get the table below.

alpha   .20     .10     .05     .01
CV      .0797   .0909   .1014   .1214

Since $\max D$ is smaller than any of the critical values, we conclude that p-value $> .20$. For $\alpha = .05$, we cannot reject $H_0$.

Table of methods for comparing two samples:
Location - Normal distribution. Compare means. Paired samples: Method D4. Independent samples: Methods D1-D3.
Location - Distribution not Normal. Compare medians. Paired samples: Method D5b. Independent samples: Method D5a.
Proportions: Method D6.
Variability - Normal distribution. Compare variances: Method D7.

ECO252 QBA2 SECOND EXAM April 4, 2005
TAKE HOME SECTION
Name: _________________________
Student Number: _________________________

III. Neatness Counts! Show your work! Always state your hypotheses and conclusions clearly. (19+ points)

1) An industrial plant is trying to figure out whether gas or electric fuel is cheaper per delivered quadrillion btus. Random samples of 11 electricity-using plants and 15 gas-using plants are taken. The results appear below. The columns labeled x1 and x2 are the data, and the rx1 and rx2 columns are the ranks of the numbers within their own column, which you may find useful.
Row   x1      x2      rx1   rx2
1     45.14    9.55   10     2
2     10.11   38.76    2    15
3     29.38   16.65    7     9
4     19.65   19.00    5    12
5     16.25   17.00    4    10
6     29.46   29.01    8    13
7      8.13   12.34    1     6
8     45.63   11.18   11     4
9     24.49   12.15    6     5
10    12.71   14.40    3     7
11    37.04    8.00    9     1
12            16.19          8
13            33.46         14
14            18.37         11
15             9.86          3

Minitab gives the following information, which also may help you.

Descriptive Statistics: x1, x2
Variable   N   N*   Mean    SE Mean   StDev   Minimum   Q1      Median   Q3      Maximum
x1         11  0    25.27   4.02      13.33   8.13      12.71   24.49    37.04   45.63
x2         15  0    17.73   2.35       9.11   8.00      11.18   16.19    19.00   38.76

Before you start, personalize the data as follows: If the second to last number of your student number is 0-4, add it to the second to last digit of the numbers in x1. If the second to last number of your student number is 5-9, divide it by 100 and subtract it from x1. (If the number is 2, the first two numbers become 45.14 + .20 = 45.34 and 10.31.) (If the number is 6, the first two numbers become 45.14 - .06 = 45.08 and 10.05.) Use $\alpha = .10$.
a) Compute a (mean and) standard deviation for the electric plants, show your work. Excessive rounding will be penalized throughout this exam. (1)
b) Test to see if x1 is Normally distributed. (3)
c) Test to see if the standard deviations of the two samples are equal. (1)
d) Test to see if the means of the two samples differ significantly on the assumption that your answers to b) and c) showed equal variances and Normal distributions. Use a test ratio, critical value or a confidence interval (4) or all three (6). Your answers to all three should be almost identical. [9]
e) Assume that the tests showed unequal variances but Normal distributions, repeat the test (4 extra credit)
f) Assume that the tests showed that the distributions were not Normal, repeat the test. (4) [13]

Solution: To save time, I will use the original numbers. It won't change your results much.
a) Compute a (mean and) standard deviation for the electric plants, show your work. Excessive rounding will be penalized throughout this exam. (1)
From the computations in the table in b) below, $\sum x_1 = 277.99$, $\sum x_1^2 = 8802.55$ and $n_1 = 11$. So
$\bar{x}_1 = \dfrac{\sum x_1}{n_1} = \dfrac{277.99}{11} = 25.2718$,
$s_1^2 = \dfrac{\sum x_1^2 - n_1\bar{x}_1^2}{n_1 - 1} = \dfrac{8802.55 - 11(25.2718)^2}{10} = \dfrac{1777.24}{10} = 177.724$ and $s_1 = \sqrt{177.724} = 13.3313$.

b) Test to see if x1 is Normally distributed. (4) Assume $\alpha = .10$. $H_0$: Normal
The only practical method is the Lilliefors method. The numbers must be in order before we begin computing cumulative probabilities! From above, remember that $\bar{x} = 25.2718$ and $s = 13.3313$. We compute $z = \dfrac{x - \bar{x}}{s}$. (This is really a t.) $F_e$ is the cumulative distribution, gotten from the Normal table by adding or subtracting 0.5. $F_o$ comes from the fact that there are 11 numbers, so that each number is one eleventh of the distribution. For $\alpha = .10$ and $n = 11$ the critical value from the Lilliefors table is 0.230. Since the largest deviation here is .1179, we do not reject $H_0$.

Row   x        x^2       z       O    Ocum   Fo        Fe         D
1       8.13     66.10   -1.29    1    1     0.09091   0.099251   0.008342
2      10.11    102.21   -1.14    1    2     0.18182   0.127705   0.054114
3      12.71    161.54   -0.94    1    3     0.27273   0.173025   0.099702
4      16.25    264.06   -0.68    1    4     0.36364   0.249286   0.114351
5      19.65    386.12   -0.42    1    5     0.45455   0.336622   0.117924
6      24.49    599.76   -0.06    1    6     0.54545   0.476617   0.068837
7      29.38    863.18    0.31    1    7     0.63636   0.621020   0.015344
8      29.46    867.89    0.31    1    8     0.72727   0.623301   0.103972
9      37.04   1371.96    0.88    1    9     0.81818   0.811314   0.006868
10     45.14   2037.62    1.49    1   10     0.90909   0.931932   0.022842
11     45.63   2082.10    1.53    1   11     1.00000   0.936631   0.063369
Sum   277.99   8802.55           11

c) Test to see if the standard deviations of the two samples are equal. (2) The easiest part of the take-home.
From Table 5 of the syllabus supplement:
Interval for: Ratio of Variances
Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$
Test Ratio: $F^{(DF_1, DF_2)} = \dfrac{s_1^2}{s_2^2}$ and $F^{(DF_2, DF_1)} = \dfrac{s_2^2}{s_1^2}$
Confidence Interval: $\dfrac{s_2^2}{s_1^2} F^{(DF_1, DF_2)}_{1-\alpha/2} \le \dfrac{\sigma_2^2}{\sigma_1^2} \le \dfrac{s_2^2}{s_1^2} F^{(DF_1, DF_2)}_{\alpha/2}$, where $F^{(DF_1, DF_2)}_{1-\alpha/2} = \dfrac{1}{F^{(DF_2, DF_1)}_{\alpha/2}}$
$DF_1 = n_1 - 1$ and $DF_2 = n_2 - 1$
From our computations of the variance, $s_1^2 = 177.724$. From the computer printout, $s_2^2 = 9.11^2 = 82.992$, $n_1 = 11$ and $n_2 = 15$.
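The sample statistics quoted for parts a) and c) can be recomputed directly from the raw data. This is a minimal sketch with the standard library; the variable names are illustrative. Note that it computes $s_2^2$ from the raw x2 values rather than from the rounded printout value 9.11, so the variance ratio may differ from the hand computation in the third decimal.

```python
from statistics import mean, variance

# Raw data from the take-home problem (original, unpersonalized numbers).
x1 = [45.14, 10.11, 29.38, 19.65, 16.25, 29.46, 8.13, 45.63, 24.49, 12.71, 37.04]
x2 = [9.55, 38.76, 16.65, 19.00, 17.00, 29.01, 12.34, 11.18, 12.15, 14.40,
      8.00, 16.19, 33.46, 18.37, 9.86]

mean1, var1 = mean(x1), variance(x1)  # sample variance uses n - 1
mean2, var2 = mean(x2), variance(x2)

# Variance ratio for the F test, larger sample variance on top.
f_ratio = var1 / var2

print(f"x1: mean = {mean1:.4f}, variance = {var1:.3f}, sd = {var1 ** 0.5:.4f}")
print(f"x2: mean = {mean2:.4f}, variance = {var2:.3f}, sd = {var2 ** 0.5:.4f}")
print(f"F({len(x1) - 1},{len(x2) - 1}) = {f_ratio:.3f}")
```

The ratio is then compared with the F table value for $(n_1 - 1, n_2 - 1)$ degrees of freedom, as in the solution to part c).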
$F^{(10,14)} = \dfrac{s_1^2}{s_2^2} = \dfrac{177.724}{82.992} = 2.141$. For a two-sided test with $\alpha = .10$ we compare this to $F^{(n_1-1,\,n_2-1)}_{\alpha/2} = F^{(10,14)}_{.05} = 2.60$. We should also compare $F^{(14,10)} = \dfrac{s_2^2}{s_1^2} = \dfrac{1}{2.141}$ against $F^{(14,10)}_{.05} = 2.86$, but this test ratio is below 1 and cannot possibly be above the critical value. Since both ratios are below their critical values, we cannot reject the null hypothesis.
We can also do this by a confidence interval. In "Confidence Limits and Hypothesis testing for Variances," in the syllabus supplement, the formula given is
$\dfrac{s_1^2}{s_2^2}\,\dfrac{1}{F^{(n_1-1,\,n_2-1)}_{\alpha/2}} \le \dfrac{\sigma_1^2}{\sigma_2^2} \le \dfrac{s_1^2}{s_2^2}\,F^{(n_2-1,\,n_1-1)}_{\alpha/2}$, which becomes $\dfrac{s_1^2}{s_2^2}\,\dfrac{1}{F^{(10,14)}_{.05}} \le \dfrac{\sigma_1^2}{\sigma_2^2} \le \dfrac{s_1^2}{s_2^2}\,F^{(14,10)}_{.05}$ or
$2.141\left(\dfrac{1}{2.60}\right) \le \dfrac{\sigma_1^2}{\sigma_2^2} \le 2.141(2.86)$ and, finally, $0.823 \le \dfrac{\sigma_1^2}{\sigma_2^2} \le 6.123$. This interval includes one, so we cannot reject the null hypothesis.

d) Test to see if the means of the two samples differ significantly on the assumption that your answers to b) and c) showed equal variances and Normal distributions. Use a test ratio, critical value or a confidence interval (4) or all three (6). Your answers to all three should be almost identical. [9]
From our computations of the variance, $n_1 = 11$, $\bar{x}_1 = 25.2718$ and $s_1^2 = 177.724$. From the computer printout, $n_2 = 15$, $\bar{x}_2 = 17.73$ and $s_2^2 = 9.11^2 = 82.992$. $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$, $\bar{d} = \bar{x}_1 - \bar{x}_2 = 25.27 - 17.73 = 7.54$ and $\alpha = .10$. If we assume that the variances are equal,
$\hat{s}_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \dfrac{10(177.724) + 14(82.992)}{24} = 122.464$, so that
$s_d^2 = \hat{s}_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right) = 122.464\left(\dfrac{1}{11} + \dfrac{1}{15}\right) = 122.464\left(\dfrac{26}{165}\right) = 122.464(0.157576) = 19.297$ and $s_d = \sqrt{19.297} = 4.3929$.
$df = n_1 + n_2 - 2 = 11 + 15 - 2 = 24$.
Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_d = 7.54 \pm t^{(24)}_{.05}\, s_d = 7.54 \pm 1.711(4.3929) = 7.54 \pm 7.52$. Since 7.52 is smaller than 7.54, the interval does not include zero, so reject the null hypothesis.
Test Ratio: $t = \dfrac{\bar{d} - D_0}{s_d} = \dfrac{7.54}{4.3929} = 1.716$. Make a diagram: Show an almost Normal curve with a center at zero and critical values at $t^{(24)}_{.05} = 1.711$ and $-t^{(24)}_{.05} = -1.711$. Since the computed value of t is not between these, reject the null hypothesis.
Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_d = 0 \pm 1.711(4.3929) = \pm 7.52$. Make a diagram: Show an almost Normal curve with a center at zero and critical values at 7.52 and -7.52. Since the computed value of $\bar{d} = \bar{x}_1 - \bar{x}_2 = 7.54$ is not between these critical values, reject the null hypothesis.

e) Assume that the tests showed unequal variances but Normal distributions, repeat the test (4 extra credit)
From our computations of the variance, $n_1 = 11$, $\bar{x}_1 = 25.2718$ and $s_1^2 = 177.724$. From the computer printout, $n_2 = 15$, $\bar{x}_2 = 17.73$ and $s_2^2 = 9.11^2 = 82.992$. $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$, $\bar{d} = \bar{x}_1 - \bar{x}_2 = 25.27 - 17.73 = 7.54$, $\alpha = .10$.
If we do not assume equal variances, use the following worksheet:
$s_{\bar{x}_1}^2 = \dfrac{s_1^2}{n_1} = \dfrac{177.724}{11} = 16.1567$
$s_{\bar{x}_2}^2 = \dfrac{s_2^2}{n_2} = \dfrac{82.992}{15} = 5.5328$
$s_d^2 = \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} = 21.6895$, so $s_d = \sqrt{21.6895} = 4.6571$
$\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} = \dfrac{(16.1567)^2}{10} = 26.1039$
$\dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1} = \dfrac{(5.5328)^2}{14} = 2.1866$
$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1-1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2-1}} = \dfrac{(21.6895)^2}{26.1039 + 2.1866} = \dfrac{470.4344}{28.2905} = 16.6287$
Round this down and use 16 degrees of freedom. $t = \dfrac{\bar{d} - D_0}{s_d} = \dfrac{7.54}{4.6571} = 1.619$. Make a diagram: Show an almost Normal curve with a center at zero and critical values at $\pm t^{(16)}_{.05} = \pm 1.746$ for $\alpha = .10$ (or $\pm t^{(16)}_{.025} = \pm 2.120$ if $\alpha = .05$ is used). In either case the computed value of t is between the critical values, so do not reject the null hypothesis.

f) Assume that the tests showed that the distributions were not Normal, repeat the test. (4) [13]
Solution: If the parent distribution is not Normal, we can use a Wilcoxon-Mann-Whitney test of the equality of two medians. If we use a Wilcoxon-Mann-Whitney rank test, we get the following.
Row   x1      r1    x2      r2
1     45.14   25     9.55    3
2     10.11    5    38.76   24
3     29.38   20    16.65   13
4     19.65   17    19.00   16
5     16.25   12    17.00   14
6     29.46   21    29.01   19
7      8.13    2    12.34    8
8     45.63   26    11.18    6
9     24.49   18    12.15    7
10    12.71    9    14.40   10
11    37.04   23     8.00    1
12                  16.19   11
13                  33.46   22
14                  18.37   15
15                   9.86    4
Sum          178           173

$n_1 = 11$ and $n_2 = 15$. Note that $n_1 + n_2 = 11 + 15 = 26$ and the sum of the first 26 numbers is $\dfrac{26(27)}{2} = 351$, so that, as a check on my ranking, 178 + 173 = 351.
According to the outline, for values of $n_1$ and $n_2$ that are too large for the tables, $W$, the smaller of the two rank sums, has the Normal distribution with mean $\mu_W = \frac{1}{2} n_1 (n_1 + n_2 + 1)$ and variance $\sigma_W^2 = \frac{1}{6} n_2\, \mu_W$. If the significance level is 10% and the test is two-sided, we reject our null hypothesis if $z = \dfrac{W - \mu_W}{\sigma_W}$ does not lie between $\pm z_{.05} = \pm 1.645$.
We have $n_1 = 11$ and $n_2 = 15$. $W = 173$ is the smaller of the rank sums. $\mu_W = \frac{1}{2} n_1 (n_1 + n_2 + 1) = \frac{1}{2}(11)(11 + 15 + 1) = 5.5(27) = 148.5$ and $\sigma_W^2 = \frac{1}{6} n_2\, \mu_W = \frac{1}{6}(15)(148.5) = 371.25$. Thus $z = \dfrac{W - \mu_W}{\sigma_W} = \dfrac{173 - 148.5}{\sqrt{371.25}} = 1.272$. Since this is between the critical values, do not reject the null hypothesis of equal medians. We could also say p-value $= 2P(z \ge 1.27) = 2(.5 - .3980) = 2(.1020) = .2040 > .10$. Since the p-value is above the significance level, do not reject the null hypothesis of equal medians.

2) A national survey categorized 600 responses about federal government regulation according to the income of the respondents.

                 Too little    Just enough   Too much
                 regulation    regulation    regulation
Low income       125           48            27
Medium income    103           58            39
High income       72           69            59

Personalize the data as follows: Subtract the second digit of your student number from the upper left-hand number and add it to the lower right-hand number. Do the following ($\alpha = .05$):
a) Is there a relation between incomes and views of government regulation? (4)
b) Use a method for comparing two proportions to compare the proportion of low income people and medium income people who feel that there is too little regulation. (3)
c) Use the Marascuilo procedure to compare the proportions of the three income groups that say there is just enough regulation. (Note that you are now dividing responses into 'just enough' and 'not just enough' and that this cuts down your degrees of freedom) (4)

Solution: a) Is there a relation between incomes and views of government regulation? (4)
This is a chi-squared test of homogeneity or independence. First we must complete the table by adding sums and compute the proportion of the $n = 600$ people in each row. The O (observed) table is the original data with row and column totals added.

O                Too little    Just enough   Too much     Total   p_r
                 regulation    regulation    regulation
Low income       125           48            27           200     1/3
Medium income    103           58            39           200     1/3
High income       72           69            59           200     1/3
Total            300           175           125          600     1.00

$n = 600$. The proportions in rows, $p_r$, are used with column totals to get the items in the E (expected) table. For example, the second number in each row is $\frac{1}{3}(175) = 58.33$. Note that row and column sums in E are the same as in O except for a possible small rounding error.

E        TL       JE       TM       Total   p_r
L        100.00    58.33    41.67   200     1/3
M        100.00    58.33    41.67   200     1/3
H        100.00    58.33    41.67   200     1/3
Total    300.00   175.00   125.00   600     1.00

(Note that $\chi^2$ is computed two different ways here - only one way is needed.)

Row     O     E        E-O      (E-O)^2   (E-O)^2/E   O^2/E
1       125   100.00   -25.00   625.000   6.25000     156.250
2       103   100.00    -3.00     9.000   0.09000     106.090
3        72   100.00    28.00   784.000   7.84000      51.840
4        48    58.33    10.33   106.709   1.82940      39.499
5        58    58.33     0.33     0.109   0.00187      57.672
6        69    58.33   -10.67   113.849   1.95181      81.622
7        27    41.67    14.67   215.209   5.16460      17.495
8        39    41.67     2.67     7.129   0.17108      36.501
9        59    41.67   -17.33   300.329   7.20732      83.537
Total   600   600.00     0.00             30.50608    630.506

$H_0$: Opinion homogeneous by income class
$DF = (r-1)(c-1) = 2(2) = 4$ and $\chi^{2(4)}_{.05} = 9.4877$.
$\chi^2 = \sum \dfrac{(E-O)^2}{E} = 30.506$ or $\chi^2 = \sum \dfrac{O^2}{E} - n = 630.506 - 600 = 30.506$
Since this is more than 9.4877, reject $H_0$. Make a diagram!
Try the one below, which has the rejection region in blue. Show that 30.5 falls in the rejection region.

[Figure: ChiSquare Curve with 4 Degrees of Freedom and Standard Deviation 2.82843. The Area to the Right of 9.4877 is 0.0500. Horizontal axis: Data Axis; vertical axis: Density.]

Even better, how about a p-value? The diagram below tells us that the p-value $\approx 0 < .05$, so we reject $H_0$.

[Figure: ChiSquare Curve with 4 Degrees of Freedom and Standard Deviation 2.82843. The Area to the Right of 30.466 is 0.0000. Horizontal axis: Data Axis; vertical axis: Density.]

b) Use a method for comparing two proportions to compare the proportion of low income people and medium income people who feel that there is too little regulation. (3)

                 Too little    Just enough   Too much     Total
                 regulation    regulation    regulation
Low income       125           48            27           200
Medium income    103           58            39           200
High income       72           69            59           200
Total            300           175           125          600

125 out of 200 low income people say that there is too little regulation. 103 out of 200 medium income people say that there is too little regulation. The observed proportions are $\bar p_1 = \frac{125}{200} = .625$ and $\bar p_2 = \frac{103}{200} = .515$. $n_1 = 200$, $n_2 = 200$, $\bar q_1 = 1 - \bar p_1 = 1 - .625 = .375$, and $\bar q_2 = 1 - \bar p_2 = 1 - .515 = .485$.

Let $\Delta p = p_1 - p_2$. So $\Delta \bar p = \bar p_1 - \bar p_2 = .625 - .515 = .110$ and our hypotheses become $H_0: p_1 - p_2 = 0$ and $H_1: p_1 - p_2 \ne 0$, or $H_0: \Delta p = 0$ and $H_1: \Delta p \ne 0$. $\Delta p_0 = 0$ is the value of $\Delta p = p_1 - p_2$ from $H_0$.

$s_{\Delta p} = \sqrt{\frac{\bar p_1 \bar q_1}{n_1} + \frac{\bar p_2 \bar q_2}{n_2}} = \sqrt{\frac{.625(.375)}{200} + \frac{.515(.485)}{200}} = \sqrt{.001172 + .001249} = \sqrt{.002421} = .04920$

$\bar p_0 = \frac{n_1 \bar p_1 + n_2 \bar p_2}{n_1 + n_2} = \frac{200(.625) + 200(.515)}{200 + 200} = \frac{125 + 103}{400} = \frac{228}{400} = .570$ and $\bar q_0 = 1 - \bar p_0 = 1 - .570 = .430$.

$\alpha = .05$, $z_{\alpha/2} = z_{.025} = 1.960$. Note that $q = 1 - p$ and that $q$ and $p$ are between zero and one.

$\sigma_{\Delta p} = \sqrt{\bar p_0 \bar q_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{.570(.430)\left(\frac{1}{200} + \frac{1}{200}\right)} = \sqrt{.2451(.0100)} = \sqrt{.002451} = .04951$

Use one of the following:

Confidence interval: Since the alternate hypothesis is $H_1: \Delta p \ne 0$, the confidence interval will be $\Delta p = \Delta \bar p \pm z_{\alpha/2}\, s_{\Delta p} = .110 \pm 1.960(.04920)$ or $\Delta p = .110 \pm 0.0964$.
Since the error part of the confidence interval is smaller in absolute value than $\Delta \bar p = \bar p_1 - \bar p_2 = .110$, the interval does not include zero. This contradicts $H_0: \Delta p = 0$, so reject $H_0$. Make a diagram of a Normal curve with $\Delta \bar p = .110$ in the middle. The area described by the confidence interval is between $.110 - 0.096 = .014$ and $.110 + 0.096 = .206$. Show zero on the graph. Since zero does not fall in the confidence interval, reject $H_0$.

Test ratio: With $\sigma_{\Delta p} = \sqrt{.570(.430)\left(\frac{1}{200} + \frac{1}{200}\right)} = \sqrt{.002451} = .04951$, $z = \frac{\Delta \bar p - \Delta p_0}{\sigma_{\Delta p}} = \frac{.110 - 0}{.04951} = 2.222$. Make a diagram of a Normal curve with zero in the middle. The 'reject' zone is the area below $-z_{\alpha/2} = -z_{.025} = -1.960$ or above $z_{\alpha/2} = z_{.025} = 1.960$. Since the test ratio is in the upper part of the 'reject' zone, reject $H_0$.

Critical value: Because the alternate hypothesis is $H_1: \Delta p \ne 0$, we need two critical values. Use $\Delta p_{cv} = \Delta p_0 \pm z_{\alpha/2}\, \sigma_{\Delta p} = 0 \pm 1.960(.04951) = \pm .097$, or -0.097 to 0.097. Make a diagram of a Normal curve with zero in the middle. The 'reject' zones are the area below -.097 and the area above .097. Since $\Delta \bar p = .110$ is in the upper 'reject' zone, reject $H_0$.

The p-value for this problem is $2P(\Delta \bar p \ge .110) = 2P(z \ge 2.22) = 2(.5 - .4868) = .0264$. Since this is below .05, reject $H_0$.

c) Use the Marascuilo procedure to compare the proportions of the three income groups that say there is just enough regulation. (Note that you are now dividing responses into 'just enough' and 'not just enough' and that this cuts down your degrees of freedom) (4)

The Marascuilo procedure says that, for 2 by c tests, if (i) equality is rejected and (ii) $|\bar p_a - \bar p_b| > \sqrt{\chi^2}\, s_{\Delta p}$, where $a$ and $b$ represent 2 groups, the chi-squared has $c - 1$ degrees of freedom and the standard deviation is $s_{\Delta p} = \sqrt{\frac{\bar p_a \bar q_a}{n_a} + \frac{\bar p_b \bar q_b}{n_b}}$, you can say that you have a significant difference between $p_a$ and $p_b$.
This is equivalent to using a confidence interval of
$p_a - p_b = (\bar p_a - \bar p_b) \pm \sqrt{\chi^2_{\alpha}(c - 1)} \sqrt{\frac{\bar p_a \bar q_a}{n_a} + \frac{\bar p_b \bar q_b}{n_b}}$

                 Too little    Just enough   Too much
                 regulation    regulation    regulation
Low income       125           48            27
Medium income    103           58            39
High income       72           69            59

Actually, we should make this into a 2 by 3 table before we start and repeat the chi-squared test.

O                  Low       Medium    High      Total   p_r
                   Income    Income    Income
Just enough        48        58        69        175     .2917
Not just enough    152       142       131       425     .7083
Total              200       200       200       600     1.0000
Proportion         .240      .290      .345

We create E using the row proportions.

E                  Low       Medium    High      Total   p_r
                   Income    Income    Income
Just enough        58.34     58.34     58.34     175     .2917
Not just enough    141.66    141.66    141.66    425     .7083
Total              200.00    200.00    200.00    600     1.0000

Row      O      E         E-O       (E-O)^2    (E-O)^2/E    O^2/E
1         48     58.34     10.34    106.916    1.83263       39.493
2        152    141.66    -10.34    106.916    0.75473      163.095
3         58     58.34      0.34      0.116    0.00198       57.662
4        142    141.66     -0.34      0.116    0.00082      142.341
5         69     58.34    -10.66    113.636    1.94782       81.608
6        131    141.66     10.66    113.636    0.80217      121.142
Total    600    600.00      0                  5.34015      605.340

$H_0$: Opinion homogeneous by income class.
$DF = (r - 1)(c - 1) = 1(2) = 2$ and $\chi^2_{.05}(2) = 5.9915$.
$\chi^2 = \sum \frac{O^2}{E} - n = 605.340 - 600 = 5.340$ or $\chi^2 = \sum \frac{(O - E)^2}{E} = 5.3402$.

Hey! Much to my total surprise, we really shouldn't go on, since 5.340 is below 5.9915 and the hypothesis of homogeneity is not rejected! However, I assume that most of you did and some of you would have gotten a rejection if you did this, so here we go!

If $s_{\Delta p} = \sqrt{\frac{\bar p_a \bar q_a}{n_a} + \frac{\bar p_b \bar q_b}{n_b}}$ and we are going to use $\chi^2_{.05}(2) = 5.9915$ in a confidence interval of $p_a - p_b = (\bar p_a - \bar p_b) \pm \sqrt{\chi^2_{\alpha}(c - 1)} \sqrt{\frac{\bar p_a \bar q_a}{n_a} + \frac{\bar p_b \bar q_b}{n_b}}$, we should compute $s_a^2 = \frac{\bar p_a \bar q_a}{n_a}$ for $a = 1, 2, 3$.

Group            Just enough   Not just enough   Total n_a   p_a    q_a    s_a^2
Low income       48            152               200         .240   .760   .0009120
Medium income    58            142               200         .290   .710   .0010295
High income      69            131               200         .345   .655   .0011299

$p_1 - p_2 = (\bar p_1 - \bar p_2) \pm \sqrt{\chi^2_{.05}(2)\,(s_1^2 + s_2^2)} = (.240 - .290) \pm \sqrt{5.9915(.0009120 + .0010295)} = -0.050 \pm 0.108$
$p_1 - p_3 = (\bar p_1 - \bar p_3) \pm \sqrt{\chi^2_{.05}(2)\,(s_1^2 + s_3^2)} = (.240 - .345) \pm \sqrt{5.9915(.0009120 + .0011299)} = -0.105 \pm 0.111$
$p_2 - p_3 = (\bar p_2 - \bar p_3) \pm \sqrt{\chi^2_{.05}(2)\,(s_2^2 + s_3^2)} = (.290 - .345) \pm \sqrt{5.9915(.0010295 + .0011299)} = -0.055 \pm 0.114$

So, the chi-square test spoke the truth and, because the $\Delta \bar p$ part of each confidence interval is smaller in absolute value than the error part, there is no significant difference between these proportions. Who woulda guessed it!
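The arithmetic in part c) can be checked with a short script. What follows is only a sketch in Python, not the Minitab routine used for the graphs in this key; the function names (`chi2_crit_2df`, `chi2_statistic`, `marascuilo`) are hypothetical helpers made up for illustration. It assumes the 2 by 3 'just enough' table above and uses the fact that a chi-squared variable with exactly 2 degrees of freedom has upper-tail area $e^{-x/2}$, so the .05 critical value is $-2\ln(.05) = 5.9915$.

```python
import math
from itertools import combinations

# Data from the 2-by-3 table: 'just enough' counts and group sizes.
just_enough = {"Low": 48, "Medium": 58, "High": 69}
group_size = {"Low": 200, "Medium": 200, "High": 200}
ALPHA = 0.05


def chi2_crit_2df(alpha):
    """Upper-tail critical value for chi-squared with exactly 2 df.

    Valid only for 2 df, where P(X > x) = exp(-x / 2).
    """
    return -2.0 * math.log(alpha)


def chi2_statistic(successes, sizes):
    """Pearson chi-squared for a 2-by-c table (success/failure by group)."""
    n = sum(sizes.values())
    p_row = sum(successes.values()) / n  # overall 'just enough' share
    stat = 0.0
    for g in sizes:
        cells = (
            (successes[g], p_row * sizes[g]),                   # just enough
            (sizes[g] - successes[g], (1 - p_row) * sizes[g]),  # not just enough
        )
        for observed, expected in cells:
            stat += (observed - expected) ** 2 / expected
    return stat


def marascuilo(successes, sizes, crit):
    """Return {(a, b): (difference, error part, significant?)} for each pair."""
    out = {}
    for a, b in combinations(sizes, 2):
        pa = successes[a] / sizes[a]
        pb = successes[b] / sizes[b]
        var_sum = pa * (1 - pa) / sizes[a] + pb * (1 - pb) / sizes[b]
        error = math.sqrt(crit * var_sum)
        out[(a, b)] = (pa - pb, error, abs(pa - pb) > error)
    return out


crit = chi2_crit_2df(ALPHA)
chi2 = chi2_statistic(just_enough, group_size)
pairs = marascuilo(just_enough, group_size, crit)

print(f"chi-squared = {chi2:.3f}, critical value = {crit:.4f}")
for (a, b), (diff, error, significant) in pairs.items():
    print(f"{a} vs {b}: {diff:+.3f} +/- {error:.3f}, significant: {significant}")
```

The script uses unrounded expected values, so its chi-squared can differ from the hand computation in the last decimal place. For tables with more than two rows or other degrees of freedom, the closed-form critical value no longer applies and a chi-squared table or a statistics library would be needed.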