3/29/99 252z9922

2. a. A Gallup survey of 100 US entrepreneurs asks about the origin of the car that they drive most frequently. The answers are below. (This material will be covered in the third exam in Spring 2000)

    US    Europe    Japan
    45    46        9

(i) If there is no preference among entrepreneurs as to the origin of the car that they drive, the proportions in the population with each type should be equal. Test this hypothesis using $\alpha = .10$. (4)
(ii) Redo this test using another method. (3)

b. A researcher wishes to test whether a set of data fits the distribution $z \sim N(0,1)$. The researcher observes the following: (This material will be covered in the third exam in Spring 2000)

    Interval            O
    Below -1.282        7
    -1.282 to -0.842    5
    -0.842 to -0.524    9
    -0.524 to -0.253    6
    -0.253 to 0.000     2
    0.000 to 0.253      5
    0.253 to 0.524      2
    0.524 to 0.842      5
    0.842 to 1.282      5
    Above 1.282         4
    Total               50

(i) This problem isn't as hard as it looks. Set up $E$. You might find this much easier to do if you look at the bottom of the t-table rather than using the normal table. (4)
(ii) Do the $\chi^2$ test and explain why this grouping might be superior to the one suggested in class for Normal data. (4)

Solution: a. (i) $H_0$: Uniformity

    O      f        E = fn      O - E       (O - E)^2/E    O^2/E
    45     .3333    33.3333     11.6667     4.0833         60.75
    46     .3333    33.3333     12.6667     4.8133         63.48
    9      .3333    33.3333     -24.3333    17.7633        2.43
    100    1.000    100.000     0.000       26.6600        126.66

Depending on which method we use, $\chi^2 = \sum \frac{(O-E)^2}{E} = 26.66$ or $\chi^2 = \sum \frac{O^2}{E} - n = 126.66 - 100 = 26.66$. We have 3 cells, so 2 degrees of freedom. Since $\chi^{2(2)}_{.10} = 4.6052$ is less than the chi-square we computed, reject $H_0$.

(ii) Use the Kolmogorov-Smirnov Method.

    O      O/n     F_o     E           E/n = f    F_e       D
    45     0.45    .45     33.3333     .3333      .3333     .1167
    46     0.46    .91     33.3333     .3333      .6667     .2433
    9      0.09    1.00    33.3333     .3333      1.0000    .0000
    100    1.00            100.000     1.000

Since the critical value is $\frac{1.22}{\sqrt{100}} = 0.122$ and the maximum $D$ (.2433) is larger, reject $H_0$.

b. (i) $H_0: N(0,1)$. According to the t-table, $1.282 = z_{.10}$, $0.842 = z_{.20}$, $0.524 = z_{.30}$, etc.
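As a rough check on the table values just quoted, the upper-tail points $z_{.10}$, $z_{.20}$, $z_{.30}$ and $z_{.40}$ can be recomputed from the inverse of the standard Normal cumulative distribution. This is only a sketch using Python's standard library; the helper name `upper_tail_point` is an illustrative assumption, and the exam itself expects the values to be read from the bottom row of the t-table.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def upper_tail_point(tail_prob):
    """Return z such that P(Z > z) = tail_prob for a standard Normal Z."""
    return std_normal.inv_cdf(1.0 - tail_prob)

# Cut points for the ten equal-probability intervals used in problem 2b
# (the negative cut points follow by symmetry).
cut_points = {p: round(upper_tail_point(p), 3) for p in (0.10, 0.20, 0.30, 0.40)}
print(cut_points)
```

Because each of the ten intervals then has probability .10, every expected count is $E = .10(50) = 5$.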
So $P(z \le -1.282) = .10$, $P(-1.282 \le z \le -0.842) = .10$, etc.

(ii)

    O     f       E = fn    O - E    (O - E)^2/E    O^2/E
    7     .10     5         2        0.8            9.8
    5     .10     5         0        0.0            5.0
    9     .10     5         4        3.2            16.2
    6     .10     5         1        0.2            7.2
    2     .10     5         -3       1.8            0.8
    5     .10     5         0        0.0            5.0
    2     .10     5         -3       1.8            0.8
    5     .10     5         0        0.0            5.0
    5     .10     5         0        0.0            5.0
    4     .10     5         -1       0.2            3.2
    50    1.00    50        0        8.0            58.0

We have 10 cells, so 9 degrees of freedom. Since $\chi^{2(9)}_{.10} = 14.6837$ is greater than the chi-square we computed, accept $H_0$. This grouping is superior to the one taught in class because if $n \ge 50$ there will be no items in the $E$ column that are below 5. Items this small usually complicate the test by compelling one to merge cells.

3. a. In an ad that appeared in the Sunday Inquirer Parade Magazine (numbers slightly modified), Astra Pharmaceuticals reported that the most frequent adverse reaction to its heartburn drug, Prilosec, was a headache. In the reported test 35 out of 465 getting Prilosec reported a headache, while 5 out of 73 getting a placebo reported a headache as did 15 out of 195 getting Ranitidine, another heartburn medication. Test if there is a significant difference between these three proportions. $\alpha = .05$ (6)
b. Why can't the problem above be done by the Kolmogorov-Smirnov method? (1)
c. (Extra Credit) I lied. Actually the number that received the placebo was 62 and 4 had an adverse reaction. Why did I have to change the numbers? (2)
d. Test the hypothesis that the proportion reporting headaches with Prilosec was lower than with Ranitidine. (4)

Solution: (An awful lot of people tried to use the method for d. in this part of the problem - it could never work!)

a. $H_0$: Homogeneous, $H_1$: Not homogeneous. $DF = (r-1)(c-1) = (1)(2) = 2$, $\alpha = .05$, $\chi^{2(2)}_{.05} = 5.9915$.

    O              Prilosec    Placebo    Ranitidine    sum    p_r
    Headache       35          5          15            55     .075034
    No headache    430         68         180           678    .924966
    sum            465         73         195           733    1.00000

    E              Prilosec    Placebo    Ranitidine    p_r
    Headache       34.891      5.477      14.632        .075034
    No headache    430.109     67.523     180.368       .924966
    sum            465.000     73.000     195.000       1.00000

The proportions in rows, $p_r$, are used with column totals to get the items in $E$. Note that row and column sums in $E$ are the same as in $O$.
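The construction of $E$ described above, and the resulting $\chi^2$ statistic, can be sketched in a few lines of Python. The function names `expected_counts` and `chi_square` and the nested-list layout are illustrative assumptions, not part of the original solution; the critical value 5.9915 is the one quoted from the chi-square table for 2 degrees of freedom.

```python
# Observed counts: rows = (headache, no headache),
# columns = (Prilosec, placebo, Ranitidine).
observed = [[35, 5, 15],
            [430, 68, 180]]

def expected_counts(table):
    """E[i][j] = (proportion of all items in row i) * (total of column j)."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    return [[sum(row) / n * c for c in col_totals] for row in table]

def chi_square(table):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))

expected = expected_counts(observed)
chi2 = chi_square(observed)
critical_value = 5.9915  # chi-square table value, alpha = .05, DF = 2
reject_h0 = chi2 > critical_value

print([[round(e, 3) for e in row] for row in expected])
print(round(chi2, 3), reject_h0)
```

The computed statistic is far below the critical value, so the sketch agrees with the hand calculation that $H_0$ is not rejected.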
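(Note that $\chi^2 = .055$ is computed two different ways here - only one way is needed.)

    O      E          O^2/E      O - E     (O - E)^2/E
    35     34.891     35.109     0.109     .00034
    5      5.477      4.565      -0.477    .04154
    15     14.632     15.377     0.368     .00926
    430    430.109    429.891    -0.109    .00003
    68     67.523     68.480     0.477     .00337
    180    180.368    179.633    -0.368    .00075
    733    733.000    733.055    0.000     .05530

$\chi^2 = \sum \frac{(O-E)^2}{E} = .055$ or $\chi^2 = \sum \frac{O^2}{E} - n = 733.055 - 733 = 0.055$. Since this is less than 5.9915, do not reject $H_0$. (Diagram!)

b. (This material will be covered in the third exam in Spring 2000) The Kolmogorov-Smirnov test can only be used when the parameters are known. In tests of independence or homogeneity, the proportions in each row and column are the parameters and are estimated in the process of putting together $E$.

c. The problem is partially set up with the correct values below.

    O      Prilosec    Placebo    Ranitidine    sum    p_r
           35          4          15            54     .07479
           430         58         180           668    ?
    sum    465         62         195           722    1.00000

    E      Prilosec    Placebo    Ranitidine    p_r
           34.779      4.637      14.584        .07479
           ?           ?          ?             ?
    sum    465.000     62.000     195.000       1.00000

We get a cell with a value below 5 (the 4.637 expected for the placebo group), which can complicate the solution.

d. From Table 3 of the Syllabus Supplement, for the difference between proportions ($q = 1 - p$):

Confidence Interval: $\Delta p = \Delta\bar{p} \pm z_{\alpha/2}\, s_{\Delta p}$, where $\Delta\bar{p} = \bar{p}_1 - \bar{p}_2$ and $s_{\Delta p} = \sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_2\bar{q}_2}{n_2}}$.
Hypotheses: $H_0: \Delta p = \Delta p_0$, $H_1: \Delta p \ne \Delta p_0$, where $\Delta p_0 = p_{01} - p_{02}$ or $\Delta p_0 = 0$.
Test Ratio: $z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}}$. If $\Delta p_0 = 0$, $\sigma_{\Delta p} = \sqrt{p_0 q_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ with $p_0 = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$. If $\Delta p_0 \ne 0$, $\sigma_{\Delta p} = \sqrt{\frac{p_{01}q_{01}}{n_1} + \frac{p_{02}q_{02}}{n_2}}$, or use $s_{\Delta p}$.
Critical Value: $\Delta\bar{p}_{cv} = \Delta p_0 \pm z_{\alpha/2}\, \sigma_{\Delta p}$.

Here $H_0: p_1 \ge p_3$, $H_1: p_1 < p_3$, or $H_0: p_1 - p_3 \ge 0$, $H_1: p_1 - p_3 < 0$. This is the same as $H_0: \Delta p \ge 0$, $H_1: \Delta p < 0$.

$\bar{p}_1 = \frac{35}{465} = .07527$, $\bar{p}_3 = \frac{15}{195} = .076923$, $\Delta\bar{p} = \bar{p}_1 - \bar{p}_3 = -.00165$, $p_0 = \frac{35 + 15}{465 + 195} = .07576$, $\alpha = .05$, $z_{\alpha} = 1.645$.

$\sigma_{\Delta p} = \sqrt{p_0 q_0\left(\frac{1}{n_1} + \frac{1}{n_3}\right)} = \sqrt{.07576(.92424)\left(\frac{1}{465} + \frac{1}{195}\right)} = .02258$

(Only one of the following methods is needed!)

Test Ratio: $z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}} = \frac{-.00165 - 0}{.02258} = -0.0733$. This is above $-1.645$. (Diagram!)
or Critical Value: $\Delta\bar{p}_{cv} = \Delta p_0 - z_{\alpha}\sigma_{\Delta p} = 0 - 1.645(.02258) = -.03714$. $\Delta\bar{p} = -.00165$ is above this value.
or Confidence Interval: $\Delta p \le \Delta\bar{p} + z_{\alpha}\, s_{\Delta p}$, where $s_{\Delta p} = \sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_3\bar{q}_3}{n_3}}$. (I'll do it if you do it!) The interval includes 0.

In all cases do not reject $H_0$.

4.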
Two fuel additives are being tested to see whether there is a significant difference in miles per gallon for the two additives. The data is below.

    x1      x2      difference
    16.7    21.2    -4.5
    17.3    18.6    -1.3
    17.5    19.7    -2.2
    18.2    22.0    -3.8
    18.4    17.7    0.7
    18.4    18.1    0.3
    18.6    18.7    -0.1
    19.1    21.2    -2.1

You may need some of the following numbers: $\alpha = .05$, $\bar{x}_1 = 18.025$, $s_1 = 0.788$, $n_1 = 8$, $\bar{x}_2 = 19.650$, $s_2 = 1.627$, $n_2 = 8$, and $\bar{d} = -1.625$, $s_d = 1.893$.

a. Test for a significant difference in miles per gallon if these are independent samples and the underlying distribution is not Normal. (5)
b. Test for a significant difference in miles per gallon if each line represents results for a single vehicle and the underlying distribution is not Normal. (5)
c. Test for a significant difference in miles per gallon if each line represents results for a single vehicle and the underlying distribution is Normal. (5)

Solution: a. Wilcoxon-Mann-Whitney Method. $H_0: \eta_1 = \eta_2$, $H_1: \eta_1 \ne \eta_2$ (equal medians against unequal medians). This is not paired data! The 16 pooled values are ranked from 1 to 16. The tied values (18.4 twice at ranks 7 and 8, 18.6 twice at ranks 9 and 10, 21.2 twice at ranks 14 and 15) are starred in the original ranking; if we correct the starred items by averaging the tied ranks we get the following:

    x1      r1      x2      r2
    16.7    1       17.7    4
    17.3    2       18.1    5
    17.5    3       18.6    9.5
    18.2    6       18.7    11
    18.4    7.5     19.7    13
    18.4    7.5     21.2    14.5
    18.6    9.5     21.2    14.5
    19.1    12      22.0    16
    Sum     48.5    Sum     87.5

Check: $48.5 + 87.5 = 136 = \frac{16(17)}{2}$. For a 5% two-tailed test, Table 6 says that the lower critical value is 49. The lower of the two rank sums, $W = 48.5$, is below this value, so reject $H_0$.

b. Wilcoxon Signed rank test for paired data. $H_0: \eta_1 = \eta_2$.

    difference    rank
    -4.5          8-
    -1.3          4-
    -2.2          6-
    -3.8          7-
    0.7           3+
    0.3           2+
    -0.1          1-
    -2.1          5-

If we add items with + and - signs separately, we find $T^- = 31$, $T^+ = 5$. To check this, compute $T^+ + T^- = 31 + 5 = 36 = \frac{8(9)}{2}$. From Table 7 with $n = 8$ and $\alpha/2 = .025$, $T_L = 4$, and since the smaller $T$ (5) is above the critical value, do not reject $H_0$.

c. Test of equality of means for paired data. $H_0: D = 0$, $H_1: D \ne 0$ (where $D = \mu_1 - \mu_2$), or $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$, or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$.

$\bar{d} = -1.625$, $s_d = 1.893$, $s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \sqrt{\frac{1.893^2}{8}} = \sqrt{\frac{3.5834}{8}} = 0.66927$, $DF = n - 1 = 7$, $t^{(7)}_{.025} = 2.365$.

(Only one of the following methods is needed!)

Test Ratio: $t = \frac{\bar{d} - 0}{s_{\bar{d}}} = \frac{-1.625 - 0}{0.66927} = -2.428$. This is not on the interval between $-2.365$ and $+2.365$.
or Critical Value: $\bar{d}_{cv} = 0 \pm t_{\alpha/2}\, s_{\bar{d}} = 0 \pm 2.365(0.66927) = \pm 1.583$. $\bar{d} = -1.625$ is not on this interval.
or Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}} = -1.625 \pm 1.583$, or $-3.208$ to $-0.042$. This interval does not include 0.

With all methods reject $H_0$. Note that this method is more powerful than the one in b. However, it still should not be used unless the conditions justify it.

5. The second column from the last page is repeated. (Use $\alpha = .05$)

    x2
    21.2
    18.6
    19.7
    22.0
    17.7
    18.1
    18.7
    21.2

You may need some of the following numbers: $\bar{x}_2 = 19.650$, $s_2 = 1.627$, $n_2 = 8$.

a. Test these data to see if the distribution is Normal. (5) (This material will be covered in the third exam in Spring 2000)
b. Test these data to see if the distribution is Normal with a mean of 17 and a standard deviation of 0.5. (5) (This material will be covered in the third exam in Spring 2000)
c. Assume that the distribution is not normal and test whether these data have a median of 17.2 (3 - 5 if you use a method learned recently)

Solution: a. $H_0: N(?, ?)$, $H_1$: Not Normal. Because the mean and standard deviation are unknown, this is a Lilliefors problem. The $x$ values must be in order. From the data we find that $\bar{x} = 19.650$ and $s = 1.627$, so $t = \frac{x - \bar{x}}{s} = \frac{x - 19.650}{1.627}$. This is often called $z$ as in a K-S problem, and $F(t)$ is a cumulative Normal probability computed just like $F(z)$ below.

    x       t        F(t)     O    O/n      F_o      D
    17.7    -1.20    .1151    1    0.125    0.125    .0099
    18.1    -0.95    .1711    1    0.125    0.250    .0789
    18.6    -0.65    .2578    1    0.125    0.375    .1172
    18.7    -0.58    .2810    1    0.125    0.500    .2190
    19.7    0.03     .5120    1    0.125    0.625    .1130
    21.2    0.95     .8289    1    0.125    0.750    .0789
    21.2    0.95     .8289    1    0.125    0.875    .0461
    22.0    1.44     .9251    1    0.125    1.000    .0749

$MaxD = .2190$. Since the Critical Value for $\alpha = .05$, $n = 8$ is .285, do not reject $H_0$.

b.
$H_0: N(17, 0.5)$, $H_1$: Not $N(17, 0.5)$. Because the mean and standard deviation are known, this is a Kolmogorov-Smirnov problem. The $x$ values must be in order. $z = \frac{x - \mu}{\sigma} = \frac{x - 17}{0.5}$.

    x       z        F(z)     O    O/n      F_o      D
    17.7    1.40     .9192    1    0.125    0.125    .7942
    18.1    2.20     .9861    1    0.125    0.250    .7361
    18.6    3.20     .9993    1    0.125    0.375    .6243
    18.7    3.40     .9997    1    0.125    0.500    .4997
    19.7    5.40     1.000    1    0.125    0.625    .3750
    21.2    8.40     1.000    1    0.125    0.750    .2500
    21.2    8.40     1.000    1    0.125    0.875    .1250
    22.0    10.00    1.000    1    0.125    1.000    .0000

$MaxD = .7942$. Since the Critical Value for $\alpha = .05$, $n = 8$ is .454, reject $H_0$. Note that it might be better to group the repeated values together with $O = 2$ and $\frac{O}{n} = .25$. It would not affect the results.

c. Wilcoxon Signed rank test (the method used for paired data, applied here to the differences from the hypothesized median). $H_0: \eta = 17.2$, $H_1: \eta \ne 17.2$. The $x$ values need not be in order (but it makes things easier). 'difference' below is $x - \eta_0$. $\alpha = .05$

    x       difference    rank
    17.7    0.5           1+
    18.1    0.9           2+
    18.6    1.4           3+
    18.7    1.5           4+
    19.7    2.5           5+
    21.2    4.0           6.5+
    21.2    4.0           6.5+
    22.0    4.8           8+

So $T^- = 0$, $T^+ = 36$. To check this, compute $T^+ + T^- = 0 + 36 = 36 = \frac{8(9)}{2}$. From Table 7 with $n = 8$ and $\alpha/2 = .025$, $T_L = 4$, and since the smaller $T$ (0) is below the critical value, reject $H_0$.

An alternative and less powerful test is the sign test. Here, note that all 8 numbers are above the hypothesized median. With $n = 8$ and $p = .5$, $p\text{-value} = 2P(x \ge 8) = 2[1 - P(x \le 7)] = 2[1 - .99609] = 2(.00391) = .00782$. Since this is below the significance level, reject $H_0$.

6. In a wage discrimination case the following hourly wage data is reported. $\alpha = .05$. A Normal distribution is assumed.

    Men                   Women
    n_1 = 31              n_2 = 15
    x-bar_1 = 9.25        x-bar_2 = 8.70
    s_1 = 0.90            s_2 = 1.35

a. Test the statement that the variance for women is greater than the variance for men. (2)
b. Test the statement that the variance for women is not equal to the variance for men. (2)
c. Test the hypothesis that men have a wage less than or equal to that of women, assuming that the variances differ between the two populations.
(6) (Note - this method was not covered in Spring 2000 - If you have studied this material and want an extra credit question on it, please tell me in advance.)
d. Repeat the test in c using the same sample means and variances, but assuming that $n_1 = 310$ and $n_2 = 150$. (4)

Solution: a. $H_0: \sigma_1^2 \ge \sigma_2^2$, $H_1: \sigma_1^2 < \sigma_2^2$. $\frac{s_2^2}{s_1^2} = \frac{1.35^2}{0.90^2} = 2.25$. Since $F^{(14,30)}_{.05} = 2.04$ is smaller than our ratio, reject $H_0$.

b. $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$. $\frac{s_1^2}{s_2^2} = \frac{0.90^2}{1.35^2} < 1$ and $\frac{s_2^2}{s_1^2} = 2.25$. Since the first ratio is below 1, the only one we need check is $\frac{s_2^2}{s_1^2}$. Since $F^{(14,30)}_{.025} = 2.34$ is larger than our ratio, do not reject $H_0$.

c. $H_0: D \le 0$, $H_1: D > 0$, or $H_0: \mu_1 \le \mu_2$, $H_1: \mu_1 > \mu_2$, or $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$, where $D = \mu_1 - \mu_2$, $\alpha = .05$. See problem II for formulas.

$n_1 = 31$, $\bar{x}_1 = 9.25$, $s_1 = 0.90$, $n_2 = 15$, $\bar{x}_2 = 8.70$, $s_2 = 1.35$, $\bar{d} = \bar{x}_1 - \bar{x}_2 = 0.55$.

$\frac{s_1^2}{n_1} = \frac{0.90^2}{31} = 0.02613$, $\frac{s_2^2}{n_2} = \frac{1.35^2}{15} = 0.12150$, $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = 0.14763$, $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{0.14763} = 0.3842$

$DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{(0.14763)^2}{\frac{(0.02613)^2}{30} + \frac{(0.12150)^2}{14}} = 20.233$, so use 20 degrees of freedom. $t^{(20)}_{.05} = 1.725$.

(Only one of the following methods is needed!)

Test Ratio: $t = \frac{\bar{d} - 0}{s_{\bar{d}}} = \frac{0.55 - 0}{0.3842} = 1.432$. This is below 1.725.
or Critical Value: $\bar{d}_{cv} = 0 + t_{\alpha}\, s_{\bar{d}} = 0 + 1.725(0.3842) = 0.663$. 0.55 lies below this value.
or Confidence Interval: $D \ge \bar{d} - t_{\alpha}\, s_{\bar{d}} = 0.55 - 1.725(0.3842) = -0.113$. This interval includes 0.

In all cases do not reject $H_0$.

d. $H_0: D \le 0$, $H_1: D > 0$, or $H_0: \mu_1 \le \mu_2$, $H_1: \mu_1 > \mu_2$, or $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$, where $D = \mu_1 - \mu_2$.

$n_1 = 310$, $\bar{x}_1 = 9.25$, $s_1 = 0.90$, $n_2 = 150$, $\bar{x}_2 = 8.70$, $s_2 = 1.35$, $\bar{d} = \bar{x}_1 - \bar{x}_2 = 0.55$. Because of the large sample size, we can act as if the variances were known. From Table 3 in the syllabus supplement, for the difference between two means ($\sigma$ known):

Confidence Interval: $D = \bar{d} \pm z_{\alpha/2}\, \sigma_{\bar{d}}$, where $\sigma_{\bar{d}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ and $\bar{d} = \bar{x}_1 - \bar{x}_2$.
Hypotheses: $H_0: D = D_0$, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$.
Test Ratio: $z = \frac{\bar{d} - D_0}{\sigma_{\bar{d}}}$.
Critical Value: $\bar{d}_{cv} = D_0 \pm z_{\alpha/2}\, \sigma_{\bar{d}}$.

$z_{.05} = 1.645$, $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{0.90^2}{310} + \frac{1.35^2}{150}} = 0.1215$

(Only one of the following methods is needed!)

Test Ratio: $z = \frac{\bar{d} - 0}{s_{\bar{d}}} = \frac{0.55 - 0}{0.1215} = 4.527$. This is above 1.645.
or Critical Value: $\bar{d}_{cv} = 0 + z_{\alpha}\, s_{\bar{d}} = 0 + 1.645(0.1215) = 0.1999$. 0.55 lies above this value.
or Confidence Interval: $D \ge \bar{d} - z_{\alpha}\, s_{\bar{d}} = 0.55 - 1.645(0.1215) = 0.350$. This interval does not include 0.

In all cases reject $H_0$.

IV. Computer Problem. There is no computer problem on the Spring 2000 exam, but you are responsible for the rule on p-value below.

1. Hand in your first problem (3 - 2 point penalty for not handing in).
2. Assume that your output is: (Don't do this problem unless you handed in the computer problem.)

    MTB > ttest mu = 30 'glop';
    SUBC > alt=1.

    TEST OF MU=30 VS MU > 30

    Variable    N     Mean     StDev    SE Mean    T       P-Value
    glop        20    38.00    9.23     2.06       3.88    0.0025

a. Show how the value of t was computed from the values of the mean and standard deviation. (1)
b. Give the null hypothesis and tell, using the p-value, whether (and why) you would accept it if $\alpha = .020$. (1)
c. What would the p-value be for the following tests (2):
(i) MTB > ttest mu = 30 'glop'
(ii) MTB > ttest mu = 30 'glop';
     SUBC > alt = -1.

The rule on p-value: if the p-value is less than the significance level, reject the null hypothesis; if the p-value is greater than or equal to the significance level, do not reject the null hypothesis.

Solution: a. $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{38.00 - 30}{2.06} = 3.88$, where $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{9.23}{\sqrt{20}} = 2.06$.
b. $H_0: \mu \le 30$, $H_1: \mu > 30$. Since the p-value of .0025 is less than the significance level ($\alpha = .020$), reject $H_0$.
c. See diagrams.
(i) Since this is a 2-sided test, double the probability between t and the nearest corner. Thus the p-value is 2(.0025) = .005. (If $\alpha = .020$, reject $H_0$.)
(ii) This is the opposite test to the test in b., so the p-value is 1 - .0025 = .9975. (If $\alpha = .020$, do not reject $H_0$.)
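The rule on p-value and the conversions used in part c can be summarised in a short Python sketch. The function names are hypothetical, and the one-sided p-value 0.0025 is simply copied from the Minitab output above rather than recomputed; only the t ratio is recalculated from the printed mean and standard deviation.

```python
import math

def t_ratio(sample_mean, mu0, sample_sd, n):
    """t = (x-bar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))

def decide(p_value, alpha):
    """Rule on p-value: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

def two_sided_p(one_sided_p):
    """p-value of the two-sided test, given the smaller one-sided tail."""
    return 2 * one_sided_p

def opposite_tail_p(one_sided_p):
    """p-value of the one-sided test in the opposite direction."""
    return 1 - one_sided_p

t = t_ratio(38.00, 30, 9.23, 20)
p_upper = 0.0025  # copied from the Minitab output (alt = 1)
alpha = 0.020

print(round(t, 2))
print("alt = 1 :", p_upper, decide(p_upper, alpha))
print("2-sided :", two_sided_p(p_upper), decide(two_sided_p(p_upper), alpha))
print("alt = -1:", opposite_tail_p(p_upper), decide(opposite_tail_p(p_upper), alpha))
```

Note that `decide` treats a p-value exactly equal to the significance level as "do not reject", matching the wording of the rule.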