252y0221 3/29/02
ECO252 QBA2                 Name: KEY
SECOND HOUR EXAM            Hour of Class Registered (Circle): MWF 10 11 TR 12:30 2:00
March 19, 2002              Hour of Class Attended (If Different): __
(Open this document in 'Page Layout')

I. (14 points) Do all the following. Diagrams will help! $x \sim N(5, 9)$ (mean 5, standard deviation 9). Probabilities still can't be negative!

1. $P(-3 \le x \le 16) = P\left(\frac{-3-5}{9} \le z \le \frac{16-5}{9}\right) = P(-0.89 \le z \le 1.22) = P(-0.89 \le z \le 0) + P(0 \le z \le 1.22) = .3133 + .3888 = .7021$

2. $P(0 \le x \le 5) = P\left(\frac{0-5}{9} \le z \le \frac{5-5}{9}\right) = P(-0.56 \le z \le 0) = .2123$

3. $P(-25 \le x \le 14) = P\left(\frac{-25-5}{9} \le z \le \frac{14-5}{9}\right) = P(-3.33 \le z \le 1.00) = P(-3.33 \le z \le 0) + P(0 \le z \le 1.00) = .4996 + .3413 = .8409$

4. $P(-2 \le x \le 4) = P\left(\frac{-2-5}{9} \le z \le \frac{4-5}{9}\right) = P(-0.78 \le z \le -0.11) = P(-0.78 \le z \le 0) - P(-0.11 \le z \le 0) = .2823 - .0438 = .2385$

5. $F(2)$ (the cumulative probability up to 2). $P(x \le 2) = P\left(z \le \frac{2-5}{9}\right) = P(z \le -0.33) = P(z \le 0) - P(-0.33 \le z \le 0) = .5 - .1293 = .3707$

6. A symmetrical interval about the mean with 37% probability. We want two points $x_{.685}$ and $x_{.315}$, so that $P(x_{.685} \le x \le x_{.315}) = .3700$. Make a diagram showing 5 in the middle, at the center of a 37% region split into two areas with probabilities of 18.5%. From the diagram, if we replace $x$ by $z$, $P(0 \le z \le z_{.315}) = .1850$. The closest we can come is $P(0 \le z \le 0.48) = .1844$ or $P(0 \le z \le 0.49) = .1879$. Since neither of these is much closer than the other, use $z_{.315} = 0.485$, and $x = \mu \pm z_{.315}\sigma = 5 \pm 0.485(9) = 5 \pm 4.365$, or 0.635 to 9.365. To check this note that $P(0.635 \le x \le 9.365) = P\left(\frac{0.635-5}{9} \le z \le \frac{9.365-5}{9}\right) = P(-0.485 \le z \le 0.485) = 2P(0 \le z \le 0.485) = 2\left(\frac{.1844 + .1879}{2}\right) = .3723 \approx 37\%$. Of course $z_{.315} = 0.48$ or $z_{.315} = 0.49$ is perfectly acceptable.

7. $x_{.02}$. We want a point $x_{.02}$, so that $P(x > x_{.02}) = .02$. Make a diagram of $z$ showing zero in the middle, .4800 between 0 and $z_{.02}$ and .02 above $z_{.02}$. From the diagram, if we replace $x$ by $z$, $P(0 \le z \le z_{.02}) = .4800$. The Normal table says $P(0 \le z \le 2.05) = .4798$, which is the closest we can come to .4800. So $z_{.02} = 2.05$, and $x = \mu + z_{.02}\sigma = 5 + 2.05(9) = 23.45$. To check this note $P(x \ge 23.45) = P\left(z \ge \frac{23.45-5}{9}\right) = P(z \ge 2.05) = P(z \ge 0) - P(0 \le z \le 2.05) = .5 - .4798 = .0202 \approx .02$.

II. (6 points - 2 point penalty for not trying part a.) Show your work! Do not answer part b with a yes or no unless you have stated your hypotheses!
According to your text, a study was made to compare ages of purchasers of Crest with nonpurchasers, yielding the following results. These are two independent samples taken from an approximately normal population.

Row   crest ($x_1$)   nocrest ($x_2$)
1     34              28
2     35              22
3     23              44
4     44              33
5     52              55
6     46              63
7     28              45
8     48              31

The Minitab 'describe' function gave the following results for the 'nocrest' column.

Variable   N   Mean    Median   StDev
nocrest    8   40.12   38.50    14.11

a. Compute the standard deviation, $s_1$, for the 'crest' column. Show your work! (3)
b. Compute a 90% confidence interval for the difference between the two population means $\mu_1$ and $\mu_2$ on the assumption that these are independent samples taken from approximately normal populations with similar variances. According to your confidence interval, is there a significant difference between the population means? Why? (3)

Solution: a) $n_1 = 8$.

Row   $x_1$   $x_1^2$
1     34      1156
2     35      1225
3     23      529
4     44      1936
5     52      2704
6     46      2116
7     28      784
8     48      2304
Sum   310     12754

$\bar x_1 = \frac{\sum x_1}{n_1} = \frac{310}{8} = 38.75$

$s_1^2 = \frac{\sum x_1^2 - n_1 \bar x_1^2}{n_1 - 1} = \frac{12754 - 8(38.75)^2}{7} = \frac{741.5}{7} = 105.92857$

$s_1 = \sqrt{105.92857} = 10.2922$

b) From Table 3 of the Syllabus Supplement:

Interval for: Difference between Two Means ($\sigma$ unknown, variances assumed equal)
Confidence Interval: $D = \bar d \pm t_{\alpha/2} s_d$
Hypotheses: $H_0: D = D_0$, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test Ratio: $t = \frac{\bar d - D_0}{s_d}$
Critical Value: $\bar d_{cv} = D_0 \pm t_{\alpha/2} s_d$
with $s_d = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, $\hat s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $DF = n_1 + n_2 - 2$.

$\bar x_1 = 38.75$, $s_1^2 = 105.92857$, $s_1 = 10.2922$
$\bar x_2 = 40.12$, $s_2 = 14.11$, $s_2^2 = 199.0921$
$\bar d = \bar x_1 - \bar x_2 = 38.75 - 40.12 = -1.37$
$DF = n_1 + n_2 - 2 = 8 + 8 - 2 = 14$

$\hat s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{7(105.92857) + 7(199.0921)}{14} = \frac{105.92857 + 199.0921}{2} = 152.510335$

$\alpha = .10$, $t_{.05}^{(14)} = 1.761$

$s_d = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{152.510335\left(\frac{1}{8} + \frac{1}{8}\right)} = \sqrt{152.510335(.25)} = \sqrt{38.12758375} = 6.17475$

Confidence Interval: $D = \bar d \pm t_{\alpha/2} s_d = -1.37 \pm 1.761(6.17475) = -1.37 \pm 10.87$, or -12.24 to 9.50. The interval includes 0, so there is no significant difference between the means. Formally, our hypotheses are $H_0: D = 0$, $H_1: D \ne 0$, or $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$, or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$. We do not reject $H_0$.

III.
Do at least 3 of the following 4 problems (at least 10 each) (or do sections adding to at least 30 points. Anything extra you do helps, and grades wrap around). You must do problem 2a! Show your work! State $H_0$ and $H_1$ where applicable. Do not answer a question 'yes' or 'no' without citing a statistical test. Use a 95% confidence level unless another level is specified.

1. For your convenience, data are repeated from the previous page. Use a 90% confidence level in this problem.

Row   crest ($x_1$)   nocrest ($x_2$)
1     34              28
2     35              22
3     23              44
4     44              33
5     52              55
6     46              63
7     28              45
8     48              31

Variable   N   Mean    Median   StDev
nocrest    8   40.12   38.50    14.11

a. Test the hypothesis that the mean age of Crest buyers is lower* than the mean for those who did not buy Crest. Assume that these are independent samples taken from approximately normal populations with similar variances.
   (i) State your null and alternate hypotheses. (2)
   (ii) Find a critical value for the difference between the sample means and use it to test your hypothesis. (2)
   (iii) Repeat the test using a test ratio and find an approximate p-value for the hypothesis. (3)
   (iv) Repeat the test using a confidence interval. (2)
b. Test the equality of the standard deviations of the two samples.
   (i) Do the test without using a confidence interval. (2)
   (ii) Create a confidence interval for the variance ratio and use it to find a confidence interval for the ratio of the standard deviations. (3.5)
c. (Extra credit) Repeat the tests in a(ii) - a(iv) dropping the assumption of equal variances. (6)

Solution: a. (i) Because we are being asked if the mean of Crest users is less than the mean for non-Crest users, we are asking if $\mu_1 < \mu_2$. Because this does not contain an equality, it must be an alternate hypothesis. Thus we are testing $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$, or $H_0: D \ge 0$, $H_1: D < 0$, or $H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$, with $\alpha = .10$. From the previous page, $\bar x_1 = 38.75$, $\bar x_2 = 40.12$, $\bar d = \bar x_1 - \bar x_2 = 38.75 - 40.12 = -1.37$ and $s_d = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 6.17475$.
$DF = n_1 + n_2 - 2 = 8 + 8 - 2 = 14$. Because this is a one-sided hypothesis, we use $t_{.10}^{(14)} = 1.345$.

(ii) The formula for a Critical Value is $\bar d_{CV} = D_0 \pm t_{\alpha/2} s_d$ or $(\bar x_1 - \bar x_2)_{CV} = (\mu_{10} - \mu_{20}) \pm t_{\alpha/2} s_d$. Because this is a one-sided test, we want one critical value below zero. The critical value formula becomes $\bar d_{CV} = D_0 - t_\alpha s_d = 0 - 1.345(6.175) = -8.305$. Make a diagram showing an almost Normal curve with a mean at zero and a 'reject' region below -8.305. Since -1.37 is not in this region, we cannot reject $H_0$.

*Note: Because I confused a few people on this, I tried to figure out what you thought I was asking before grading this. Your tests had to agree with your null hypothesis.

(iii) The formula for a Test Ratio is $t = \frac{\bar d - D_0}{s_d}$ or $t = \frac{(\bar x_1 - \bar x_2) - (\mu_{10} - \mu_{20})}{s_d}$. Here $t = \frac{\bar d - D_0}{s_d} = \frac{-1.37 - 0}{6.17475} = -0.222$. To do a conventional test make a diagram showing an almost Normal curve with a mean at zero and a 'reject' region below $-t_{.10}^{(14)} = -1.345$. Since $t = -0.222$ is not in this region, we cannot reject $H_0$. To find an approximate p-value, compare this value of $t$ with the values on the $DF = 14$ line of the t table. Because this is a left-sided test, we want to know the area below -0.222. Since $t_{.40}^{(14)} = 0.258$ and $t_{.45}^{(14)} = 0.128$, we can say that $.40 < pval < .45$.

(iv) The formula for a Confidence Interval is $D = \bar d \pm t_{\alpha/2} s_d$ or $\mu_1 - \mu_2 = (\bar x_1 - \bar x_2) \pm t_{\alpha/2} s_d$. Since the alternate hypothesis is $H_1: \mu_1 < \mu_2$ or $D < 0$, this interval becomes $D \le \bar d + t_\alpha s_d = -1.37 + 1.345(6.17475) = -1.37 + 8.305 = 6.935$. Since this interval includes zero and numbers above zero, we cannot reject $H_0$.

b. The best place to find the formulas for comparing variances to test $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$ is the outline. Recall that $s_1^2 = 105.9285$, $s_2^2 = 199.0921$, $n_1 = n_2 = 8$, and $\alpha = .10$.

(i) If we want to do a 2-sided test where $DF_1 = n_1 - 1 = 7$ and $DF_2 = n_2 - 1 = 7$, we compare $\frac{s_1^2}{s_2^2}$ against $F_{\alpha/2}^{(DF_1, DF_2)} = F_{.05}^{(7,7)} = 3.79$ and $\frac{s_2^2}{s_1^2}$ against $F_{\alpha/2}^{(DF_2, DF_1)} = F_{.05}^{(7,7)}$. $\frac{s_2^2}{s_1^2} = \frac{199.0921}{105.9285} = 1.879$, so $\frac{s_1^2}{s_2^2}$ must be below one.
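Since neither ratio is above the corresponding table value of $F$, we cannot reject the null hypothesis of equality.

(ii) A 2-sided confidence interval is $\frac{s_2^2}{s_1^2} \cdot \frac{1}{F_{\alpha/2}^{(n_2-1,\, n_1-1)}} \le \frac{\sigma_2^2}{\sigma_1^2} \le \frac{s_2^2}{s_1^2} F_{\alpha/2}^{(n_1-1,\, n_2-1)}$, which becomes $1.879\left(\frac{1}{3.79}\right) \le \frac{\sigma_2^2}{\sigma_1^2} \le 1.879(3.79)$ or $0.496 \le \frac{\sigma_2^2}{\sigma_1^2} \le 7.12$. The opposite interval is $\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2}^{(n_1-1,\, n_2-1)}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2} F_{\alpha/2}^{(n_2-1,\, n_1-1)}$, which becomes $\frac{1}{1.879}\left(\frac{1}{3.79}\right) \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{1}{1.879}(3.79)$ or $0.140 \le \frac{\sigma_1^2}{\sigma_2^2} \le 2.02$. If we take the square roots, we get confidence intervals for the ratios of standard deviations, $0.704 \le \frac{\sigma_2}{\sigma_1} \le 2.67$ or $0.375 \le \frac{\sigma_1}{\sigma_2} \le 1.42$. Since this interval includes one, we cannot reject $H_0$.

c. We are testing $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$, or $H_0: D \ge 0$, $H_1: D < 0$, or $H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$, with $\alpha = .10$.

From the formula table:

Interval for: Difference between Two Means ($\sigma$ unknown, variances assumed unequal)
Confidence Interval: $D = \bar d \pm t_{\alpha/2} s_d$
Hypotheses: $H_0: D = D_0$, $H_1: D \ne D_0$ (same as $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ if $D_0 = 0$)
Test Ratio: $t = \frac{\bar d - D_0}{s_d}$
Critical Value: $\bar d_{cv} = D_0 \pm t_{\alpha/2} s_d$
with $s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$.

From the previous pages $\bar x_1 = 38.75$, $\bar x_2 = 40.12$, $\bar d = \bar x_1 - \bar x_2 = 38.75 - 40.12 = -1.37$, $s_1^2 = 105.9285$, $s_2^2 = 199.0921$, $n_1 = n_2 = 8$. For these calculations, done by Minitab, I used $s_1 = 10.2916$ and $s_2 = 14.1100$, so that Minitab reported $s_1^2 = 105.917$ and $s_2^2 = 199.092$.

$\frac{s_1^2}{n_1} = \frac{105.917}{8} = 13.2397$, $\frac{s_2^2}{n_2} = \frac{199.092}{8} = 24.8865$, $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = 38.1262$

$s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{38.1262} = 6.17464$

$DF = \frac{(38.1262)^2}{\frac{(13.2397)^2}{7} + \frac{(24.8865)^2}{7}} = \frac{1453.60}{25.0412 + 88.4769} = \frac{1453.60}{113.518} = 12.8050$

So use 12 degrees of freedom. Because this is a one-sided hypothesis, we use $t_{.10}^{(12)} = 1.356$.

(ii) The formula for a Critical Value is $\bar d_{CV} = D_0 \pm t_{\alpha/2} s_d$ or $(\bar x_1 - \bar x_2)_{CV} = (\mu_{10} - \mu_{20}) \pm t_{\alpha/2} s_d$. Because this is a one-sided test, we want one critical value below zero. The critical value formula becomes $\bar d_{CV} = D_0 - t_\alpha s_d = 0 - 1.356(6.17464) = -8.373$. Make a diagram showing an almost Normal curve with a mean at zero and a 'reject' region below -8.373.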
Since -1.37 is not in this region, we cannot reject $H_0$.

(iii) The formula for a Test Ratio is $t = \frac{\bar d - D_0}{s_d}$ or $t = \frac{(\bar x_1 - \bar x_2) - (\mu_{10} - \mu_{20})}{s_d}$. Here $t = \frac{\bar d - D_0}{s_d} = \frac{-1.37 - 0}{6.17464} = -0.222$. To do a conventional test make a diagram showing an almost Normal curve with a mean at zero and a 'reject' region below $-t_{.10}^{(12)} = -1.356$. Since $t = -0.222$ is not in this region, we cannot reject $H_0$.

(iv) The formula for a Confidence Interval is $D = \bar d \pm t_{\alpha/2} s_d$ or $\mu_1 - \mu_2 = (\bar x_1 - \bar x_2) \pm t_{\alpha/2} s_d$. Since the alternate hypothesis is $H_1: \mu_1 < \mu_2$ or $D < 0$, this interval becomes $D \le \bar d + t_\alpha s_d = -1.37 + 1.356(6.17464) = -1.37 + 8.373 = 7.003$. Since this interval includes zero and numbers above zero, we cannot reject $H_0$.

Two reminders
1) A one-sided hypothesis is tested by a one-sided test which includes a one-sided critical value or a one-sided confidence interval.
2) A table from the outline: Methods for Comparing Two Samples.

                                                        Paired Samples   Independent Samples
Location - Normal distribution. Compare means.          Method D4        Methods D1-D3
Location - Distribution not Normal. Compare medians.    Method D5b       Method D5a
Proportions                                                              Method 6
Variability - Normal distribution. Compare variances.                    Method 7

2. a. Turn in your computer output from computer problem 1 only, tucked inside this exam paper. (3 - 2 point penalty for not handing this in.)
b. A new battery is being tested for use in a tiny stuffed animal. We will use the new battery if it is longer-lasting than the old one. Use a 95% confidence level. Slightly edited Minitab output is below. The assumption was that the old battery had an average life of 4.6 hours and this is tested for both the old and new battery before they are compared. Can we say that either battery has a life that is significantly different from 4.6 hours? What evidence in what tests led you to your conclusion? (3)
c. Continuing to use the data below, which one of these tests would you use to decide whether to switch batteries? What would you do? Why? (2)
d.
Using an almost Normal curve, and the appropriate values of t, shade the areas represented by the p-values in tests 3, 4 and 5. (3)
e. Using means and standard deviations from the printout, explain how the computer got the values of t in tests 3, 4 and 5. (2)
Part f is at the end of the printout.

Worksheet size: 100000 cells
MTB > RETR 'C:\MINITAB\2X0221-2.MTW'.
Retrieving worksheet from file: C:\MINITAB\2X0221-2.MTW
Worksheet was saved on 3/13/2002
MTB > print 'new'
Data Display
new
   3.3 6.4 3.9 5.4 5.1 7.3 4.5 5.8 6.4 4.5 4.6 4.9 7.2 6.1 3.9 5.1 3.2 4.0 3.5 5.3 4.0 3.5 5.1 4.5 3.8
MTB > print 'old'
Data Display
old
   4.2 2.9 4.5 4.9 5.0 4.0 5.2 3.5 3.0 5.2 4.4
MTB > describe 'new' 'old'
Descriptive Statistics
Variable    N    Mean     Median   TrMean   StDev    SEMean
new         18   5.106    5.000    5.081    1.215    0.286
old         18   4.233    4.300    4.250    0.793    0.187

Variable    Min      Max      Q1       Q3
new         3.300    7.300    3.975    6.175
old         2.900    5.300    3.500    5.025

Test 1:
MTB > ttest mu=4.6 'new'
T-Test of the Mean
Test of mu = 4.600 vs mu not = 4.600
Variable    N    Mean     StDev    SE Mean   T       P-Value
new         18   5.106    1.215    0.286     1.76    0.096

Test 2:
MTB > ttest mu=4.6 'old'
T-Test of the Mean
Test of mu = 4.600 vs mu not = 4.600
Variable    N    Mean     StDev    SE Mean   T       P-Value
old         18   4.233    0.793    0.187     -1.96   0.066

b) Solution: The two tests above are $H_0: \mu_{new} = 4.6$, $H_1: \mu_{new} \ne 4.6$ and $H_0: \mu_{old} = 4.6$, $H_1: \mu_{old} \ne 4.6$. We are using $\alpha = .05$. In both cases the p-value is above the significance level, so do not reject the null hypothesis.

Test 3:
MTB > twosamplet 'new' 'old'
Two Sample T-Test and Confidence Interval
Twosample T for new vs old
       N    Mean     StDev    SE Mean
new    18   5.11     1.22     0.29
old    18   4.233    0.793    0.19
95% C.I. for mu new - mu old: ( 0.17, 1.57)
T-Test mu new = mu old (vs not =): T= 2.55 P=0.016 DF= 29

Test 4:
MTB > twosamplet 'new' 'old';
SUBC> alt= 1.
Two Sample T-Test and Confidence Interval
Twosample T for new vs old
       N    Mean     StDev    SE Mean
new    18   5.11     1.22     0.29
old    18   4.233    0.793    0.19
95% C.I.
for mu new - mu old: ( 0.17, 1.57)
T-Test mu new = mu old (vs >): T= 2.55 P=0.0082 DF= 29

Test 5:
MTB > twosample t 'new' 'old';
SUBC> alt = -1.
Two Sample T-Test and Confidence Interval
Twosample T for new vs old
       N    Mean     StDev    SE Mean
new    18   5.11     1.22     0.29
old    18   4.233    0.793    0.19
95% C.I. for mu new - mu old: ( 0.17, 1.57)
T-Test mu new = mu old (vs <): T= 2.55 P=0.99 DF= 29

c. Solution: If the new battery is 'longer-lasting,' we will find that $\mu_{new} > \mu_{old}$, so that our hypotheses are $H_0: \mu_{new} \le \mu_{old}$, $H_1: \mu_{new} > \mu_{old}$. These are the hypotheses tested in test 4. The p-value reported in this test is .0082, which is certainly less than $\alpha = .05$. So reject the null hypothesis and buy the new battery.

d. Solution: Make a diagram showing an almost-Normal curve with a mean at zero. In test 3, a 2-sided test, the p-value is twice the probability that $t \ge 2.55$, so shade the area above 2.55 and below -2.55 and label it 1.6%. In test 4, a 1-sided test, the p-value is the probability that $t \ge 2.55$, so shade the area above 2.55 and label it 0.82%, which must be half the p-value in test 3. In test 5, a 1-sided test, the p-value is the probability that $t \le 2.55$, so shade the area below 2.55 and label it 99%. Except for rounding, this is one minus the p-value in test 4.

e. Solution: As shown in the solution to problem 1c above, the formulas are $s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $t = \frac{\bar d - D_0}{s_d}$. The 'describe' printout shows that $\bar x_{new} = 5.106$, $\bar x_{old} = 4.233$, $\bar d = \bar x_{new} - \bar x_{old} = 5.106 - 4.233 = 0.873$, $\frac{s_{new}}{\sqrt{n_{new}}} = 0.286$ and $\frac{s_{old}}{\sqrt{n_{old}}} = 0.187$. So $s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{0.286^2 + 0.187^2} = 0.34171$. Finally $t = \frac{\bar d - D_0}{s_d} = \frac{0.873 - 0}{0.34171} = 2.5548$.

f. Assume that you got the following output from a 'describe' command.

Descriptive Statistics
Variable    N     Mean     Median   TrMean   StDev    SEMean
new         150   5.106    5.000    5.081    1.500    0.122
old         150   4.233    4.300    4.250    0.800    0.065

Construct a 96% confidence interval for $\mu_{new} - \mu_{old}$. (3)

Solution: The formula that is used in the large sample case is $s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{0.122^2 + 0.065^2} = 0.13824$. $\bar d = 5.106 - 4.233 = 0.873$.
In problem I.7 we found $z_{.02} = 2.05$. Since this is a large-sample interval with $\alpha = .04$, the Confidence Interval formula is $D = \bar d \pm z_{\alpha/2} s_d = 0.873 \pm 2.05(0.13824) = 0.873 \pm 0.283$, or 0.590 to 1.156.

3. (McClave et al.) I am a razor manufacturer and claim that my disposable razor gives more shaves than my competitor's. A sample is taken, with $x_m$ representing the number of shaves per razor on my product and $x_c$ representing the number of shaves on my competitor's product.

Row   me ($x_m$)   rankm ($r_m$)   compet ($x_c$)   rankc ($r_c$)   diff ($d$)
1     8                            10                               -2
2     17           16              6                                11
3     9                            3                1               6
4     11                           7                                4
5     15           15              13               13              2
6     10                           14               14              -4
7     6                            5                2               1
8     12           12              7                                5

For your convenience some calculations have been made. The columns $r_m$ and $r_c$ represent the beginning and end of the ranking of the numbers in $x_m$ and $x_c$. $d$ represents the difference between the numbers in $x_m$ and $x_c$. For the three data columns we have the following.

Variable          Sample Mean   Sample Std Dev
me ($x_m$)        11.00         3.63
compet ($x_c$)    8.12          3.87
diff ($d$)        2.88          4.73

Note that you will probably not need all the information that I am giving you.

a. The authors say that $x_m$ and $x_c$ represent two independent samples. Because of uncertainty about the underlying distribution, the authors imply that you should compare medians to see if my blade is better. State your hypotheses and your conclusion after doing an appropriate test. Use a 95% confidence level. (5)
b. The authors then concede that this was an inefficient way to do the problem and that we ought to compare medians using paired samples. Assume that each line represents the experience of one shaver and repeat the test. (5)
c. Now assume that we find out that the underlying distribution is Normal after all and repeat the test in b, using means instead of medians. (5)

Solution: The data are repeated below with the ranks filled in. Rank totals are found for m and c. Absolute values of $d$ are found and $|d|$ is ranked. The ranks are then corrected for ties and marked by the sign of the difference. $\alpha = .05$.
Row   me ($x_m$)   rankm ($r_m$)   compet ($x_c$)   rankc ($r_c$)   diff ($d$)   $|d|$   $r_d$   $r_d^*$
1     8            7               10               9.5             -2           2       3       2.5-
2     17           16              6                3.5             11           11      8       8+
3     9            8               3                1               6            6       7       7+
4     11           11              7                5.5             4            4       5       4.5+
5     15           15              13               13              2            2       2       2.5+
6     10           9.5             14               14              -4           4       4       4.5-
7     6            3.5             5                2               1            1       1       1+
8     12           12.0            7                5.5             5            5       6       6+
Sum                82.0                             54.0                                         T+ = 29, T- = 7

a. $H_0: \eta_m \le \eta_c$, $H_1: \eta_m > \eta_c$, with $\eta$ denoting a population median. To check the ranking, note that the rank sums add to 82 + 54 = 136. This should be the same as the total of the numbers 1 through 16, $\frac{16(17)}{2} = 136$. For a 1-sided test use the Mann-Whitney-Wilcoxon rank sum test. For $n_1 = n_2 = 8$, Table 6b gives critical values of 52 and 84. Since our rank sums fall between these, we do not reject the null hypothesis.

b. $H_0: \eta_m \le \eta_c$, $H_1: \eta_m > \eta_c$. For a test of the correctness of the ranking, note that the two Ts in b add to 29 + 7 = 36. This should be the same as the sum of the first eight numbers, $\frac{8(9)}{2} = 36$. For a Wilcoxon Signed Rank Sum Test, compare $T = 7$, the smaller of the totals, with Table 7. For $n = 8$, the 5% critical value is 6. Since $T$ is above this value, we do not reject the null hypothesis.

c. If the parent distribution is Normal, we use Method D4. From the outline, there are three ways of approaching a problem involving two means. You should have chosen one! We know that $\bar d = \bar x_m - \bar x_c = 2.88$, $s_d = 4.73$ and $s_{\bar d} = \frac{s_d}{\sqrt{n}} = \frac{4.73}{\sqrt{8}} = 1.6723$. With $\mu_1 = \mu_m$ and $\mu_2 = \mu_c$, we are testing $H_0: \mu_1 \le \mu_2$, $H_1: \mu_1 > \mu_2$, or $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$, or $H_0: D \le 0$, $H_1: D > 0$. $df = n - 1 = 7$, $t_\alpha^{(n-1)} = t_{.05}^{(7)} = 1.895$.

(i) Confidence Interval: $D = \bar d \pm t_{\alpha/2} s_{\bar d}$ or $\mu_1 - \mu_2 = (\bar x_1 - \bar x_2) \pm t_{\alpha/2} s_{\bar d}$. This interval becomes $D \ge \bar d - t_\alpha s_{\bar d} = 2.88 - 1.895(1.6723) = 2.88 - 3.17 = -0.29$. Since this interval includes zero, we cannot reject $H_0$.

(ii) Test Ratio: $t = \frac{\bar d - D_0}{s_{\bar d}}$ or $t = \frac{(\bar x_1 - \bar x_2) - (\mu_{10} - \mu_{20})}{s_{\bar d}}$. Here $t = \frac{\bar d - D_0}{s_{\bar d}} = \frac{2.88 - 0}{1.6723} = 1.722$. Make a diagram showing an almost Normal curve with a mean at zero and a 'reject' region above $t_\alpha^{(n-1)} = t_{.05}^{(7)} = 1.895$. Since 1.722 is not in this region, we cannot reject $H_0$.

(iii) Critical Value: $\bar d_{CV} = D_0 \pm t_{\alpha/2} s_{\bar d}$ or $(\bar x_1 - \bar x_2)_{CV} = (\mu_{10} - \mu_{20}) \pm t_{\alpha/2} s_{\bar d}$. Because this is a one-sided test, we want one critical value above zero.
The critical value formula becomes $\bar d_{CV} = D_0 + t_\alpha s_{\bar d} = 0 + 1.895(1.6723) = 3.169$. Make a diagram showing an almost Normal curve with a mean at zero and a 'reject' region above 3.169. Since 2.88 is not in this region, we cannot reject $H_0$.

4. According to your authors, when a sample of 74 women students were asked whether they would consent to an interview with a travelling recruiter in a local office building, 73 said yes. However, when another sample of 74 women were asked whether they would consent to a similar interview in a hotel room, only 46 said yes. This represents a tremendous difference in the proportion that said yes and the researcher was asked to verify her findings by repeating the question about the hotel room interview to a second sample of 100 students. This time 64 out of the 100 students said yes.

a. Was the proportion that consented to the hotel interview significantly different between the sample of 74 and the sample of 100? State your hypotheses and test them using a 90% confidence level. (4)
b. The sponsoring firm was so upset by the results that the researcher was asked to interview another sample of 200 women. This time, out of the 200 women, 132 said that they would consent. Check to see if the proportions of the samples of 74, 100 and 200 women who consented differ. Again use a 90% confidence level. (6)
c. Do a 2-sided confidence interval for the difference between the proportion of women in the samples of 74 who will consent to interviews in an office building and a hotel room. (3)
d. A business has just completed a switch to a new invoicing system. Previously the number of errors per invoice seemed to follow a Poisson distribution with a mean of 0.2. After the switch to the new system was made, a sample of 500 invoices was taken. Of these 479 had no errors, 10 had one error, 8 had 2 errors, 2 had 3 errors and one had 4 errors. Does the Poisson(0.2) distribution still apply? (5)
e.
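The business wants to know if some Poisson distribution works for the data in part d. Using your Poisson table as best you can, how do you go about this and what do you conclude? (3)

Solution: To summarize the information in parts a and b, $\alpha = .10$ and

Hotel Room               Sample 1   Sample 2   Sample 3
Yes                      46         64         132
No                       28         36         68
Total                    74         100        200
Proportion saying yes    .6216      .6400      .6600

We are comparing $\bar p_1 = .6216$, $n_1 = 74$ and $\bar p_2 = .6400$, $n_2 = 100$.

Interval for: Difference between proportions, $\Delta p = p_1 - p_2$, with $q = 1 - p$
Confidence Interval: $\Delta p = \Delta \bar p \pm z_{\alpha/2} s_{\Delta p}$, where $\Delta \bar p = \bar p_1 - \bar p_2$ and $s_{\Delta p} = \sqrt{\frac{\bar p_1 \bar q_1}{n_1} + \frac{\bar p_2 \bar q_2}{n_2}}$
Hypotheses: $H_0: \Delta p = \Delta p_0$, $H_1: \Delta p \ne \Delta p_0$, where $\Delta p_0 = p_{01} - p_{02}$ or $\Delta p_0 = 0$
Test Ratio: $z = \frac{\Delta \bar p - \Delta p_0}{\sigma_{\Delta p}}$
   If $\Delta p_0 = 0$: $\sigma_{\Delta p} = \sqrt{\bar p_0 \bar q_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ with $\bar p_0 = \frac{n_1 \bar p_1 + n_2 \bar p_2}{n_1 + n_2}$
   If $\Delta p_0 \ne 0$: $\sigma_{\Delta p} = \sqrt{\frac{p_{01} q_{01}}{n_1} + \frac{p_{02} q_{02}}{n_2}}$, or use $s_{\Delta p}$
Critical Value: $\Delta \bar p_{cv} = \Delta p_0 \pm z_{\alpha/2} \sigma_{\Delta p}$

$s_{\Delta p} = \sqrt{\frac{\bar p_1 \bar q_1}{n_1} + \frac{\bar p_2 \bar q_2}{n_2}} = \sqrt{\frac{.6216(.3784)}{74} + \frac{.6400(.3600)}{100}} = \sqrt{.00317856 + .00230400} = \sqrt{.00548256} = .074044$

$\Delta \bar p = \bar p_1 - \bar p_2 = -.0184$, $\bar p_0 = \frac{n_1 \bar p_1 + n_2 \bar p_2}{n_1 + n_2} = \frac{74(.6216) + 100(.6400)}{74 + 100} = \frac{46 + 64}{74 + 100} = .6322$, $\alpha = .10$, $z_{\alpha/2} = z_{.05} = 1.645$. Note that $q = 1 - p$ and that $q$ and $p$ are between 0 and 1.

$\sigma_{\Delta p} = \sqrt{\bar p_0 \bar q_0 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{.6322(.3678)\left(\frac{1}{74} + \frac{1}{100}\right)} = \sqrt{.005467} = .073942$

a) $H_0: \Delta p = 0$, $H_1: \Delta p \ne 0$. This is the same as $H_0: p_1 = p_2$, $H_1: p_1 \ne p_2$ or $H_0: p_1 - p_2 = 0$, $H_1: p_1 - p_2 \ne 0$. There are three ways to do this problem. Only one is needed.

(i) Test Ratio: $z = \frac{\Delta \bar p - \Delta p_0}{\sigma_{\Delta p}} = \frac{-.0184 - 0}{.073942} = -0.2488$. Make a diagram showing 'reject' regions above 1.645 and below -1.645. Since -0.2488 is between these values, do not reject $H_0$.

(ii) Critical Value: $\Delta \bar p_{cv} = \Delta p_0 \pm z_{\alpha/2} \sigma_{\Delta p} = 0 \pm 1.645(.073942) = \pm 0.1216$. Make a diagram showing a 'reject' region above 0.1216 and below -0.1216. Since -0.0184 is between these values, do not reject $H_0$.

(iii) Confidence Interval: $\Delta p = \Delta \bar p \pm z_{\alpha/2} s_{\Delta p} = -.0184 \pm 1.645(.074044) = -.0184 \pm .1218$, or -0.1402 to 0.1034. Since zero is between these values, do not reject $H_0$.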
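The three approaches in part a) can be checked numerically. The following is a minimal sketch, not part of the original key: the function name `two_proportion_test` and its arguments are hypothetical, and it assumes the same pooled standard error for the test ratio and the unpooled standard error for the confidence interval that the hand calculation uses.

```python
import math

def two_proportion_test(x1, n1, x2, n2, z_crit):
    """Two-sided z test and confidence interval for p1 - p2 (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Pooled proportion: used for the standard error under H0: p1 = p2
    p0 = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    # Unpooled standard error: used for the confidence interval
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    half_width = z_crit * se_unpooled
    return z, diff - half_width, diff + half_width

# Hotel-room samples 1 and 2: 46 of 74 and 64 of 100 said yes; z_.05 = 1.645
z, lower, upper = two_proportion_test(46, 74, 64, 100, z_crit=1.645)
reject = abs(z) > 1.645
print(f"z = {z:.4f}, interval = ({lower:.4f}, {upper:.4f}), reject H0: {reject}")
```

Because the sketch carries unrounded proportions, its results can differ from the hand calculation in the third or fourth decimal place, but the conclusion (do not reject $H_0$) is the same.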
b) $DF = (r - 1)(c - 1) = (1)(2) = 2$. $H_0$: Homogeneous (or $p_1 = p_2 = p_3$). $H_1$: Not homogeneous (not all $p$s are equal). $\chi^{2(2)}_{.10} = 4.6052$.

O        1        2        3        Total    $p_r$
Yes      46       64       132      242      .647
No       28       36       68       132      .353
Total    74       100      200      374      1.000

E        1        2        3        Total    $p_r$
Yes      47.88    64.70    129.40   241.98   .647
No       26.12    35.30    70.60    132.02   .353
Total    74.00    100.00   200.00   374.00   1.000

The proportions in rows, $p_r$, are used with column totals to get the items in $E$. Note that row and column sums in $E$ are the same as in $O$. (Note that $\chi^2 = 0.379 = 374.379 - 374$ is computed two different ways here - only one way is needed.)

Row   $O$    $E$      $E - O$    $(E - O)^2$   $\frac{(E - O)^2}{E}$   $\frac{O^2}{E}$
1     46     47.88    1.88000    3.53440       0.073818                44.194
2     28     26.12    -1.88000   3.53440       0.135314                30.015
3     64     64.70    0.70000    0.49000       0.007573                63.308
4     36     35.30    -0.70000   0.49000       0.013881                36.714
5     132    129.40   -2.60001   6.76003       0.052241                134.652
6     68     70.60    2.60000    6.75999       0.095751                65.496
Sum   374    374.00   0.00001                  0.378576                374.379

Since the $\chi^2$ computed here is less than the $\chi^2$ from the table, we do not reject $H_0$.

c) Let us call the proportion of women who consented to the office interview $\bar p_4 = \frac{73}{74} = .9864$. Recall that $\bar p_1 = \frac{46}{74} = .6216$, so $\Delta \bar p = \bar p_4 - \bar p_1 = .9864 - .6216 = .3648$. $\alpha = .10$, $z_{\alpha/2} = z_{.05} = 1.645$.

$s_{\Delta p} = \sqrt{\frac{\bar p_4 \bar q_4}{n_4} + \frac{\bar p_1 \bar q_1}{n_1}} = \sqrt{\frac{.9864(.0136)}{74} + \frac{.6216(.3784)}{74}} = \sqrt{.00018128 + .00317856} = \sqrt{.00335984} = .057964$

$\Delta p = \Delta \bar p \pm z_{\alpha/2} s_{\Delta p} = .3648 \pm 1.645(.057964) = .3648 \pm .0954$

d) If we take the column in the Poisson table for the Poisson distribution with a mean of .2 and multiply it by 500, we get $E$. $O$ is given in the problem.

$x$   Poisson(0.2)   $E$       $O$
0     0.818731       409.366   479
1     0.163746       81.873    10
2     0.016375       8.187     8
3     0.001092       0.546     2
4     0.000055       0.027     1

But the $E$ column has items in it below 5 (and 2) and thus can be used only if the last three cells are added together. $DF = 3 - 1 = 2$, $\chi^{2(2)}_{.10} = 4.6052$.
$x$   $O$    $E$       $E - O$    $(E - O)^2$   $\frac{(E - O)^2}{E}$   $\frac{O^2}{E}$
0     479    409.366   -69.6345   4848.96       11.8451                 560.480
1     10     81.873    71.8730    5165.73       63.0944                 1.221
2+    11     8.761     -2.2390    5.01          0.5722                  13.811
Sum   500    500.000   0.0005                   75.5117                 575.512

Since $\chi^2 = 75.512 = 575.512 - 500$ is larger than the table value of 4.6052, we reject the null hypothesis.

e) There isn't much choice here, but our estimate of the mean is $\bar x = \frac{479(0) + 10(1) + 8(2) + 2(3) + 1(4)}{500} = .072$. The closest we can come on the table is Poisson(0.1). If we do the same thing we did in d), we get a ridiculous situation, since the only numbers in $E$ that are above 5 are the first two, and scrunching the bottom 3 cells together produces a number that results in a gigantic contribution to $\chi^2$.

$x$   Poisson(0.1)   $E$       $O$
0     0.904837       452.419   479
1     0.090484       45.242    10
2     0.004524       2.262     8
3     0.000151       0.075     2
4     0.000004       0.002     1

Our table reads:

$x$   $O$    $E$       $E - O$    $(E - O)^2$   $\frac{(E - O)^2}{E}$   $\frac{O^2}{E}$
0     479    452.419   -26.5815   706.575       1.5618                  507.143
1+    21     47.581    26.5810    706.550       14.8494                 9.268
Sum   500    500.000   0.0005                   16.4112                 516.412

But the degrees of freedom seem to be 2 - 1 - 1 = 0. (The second -1 is because we estimated a parameter from the data.) This is worse than the dreaded $2 \times 2$ case, which needs corrections or should be done with proportions directly. However, given the fact that $\chi^{2(1)}_{.10} = 2.70554$ is smaller than the $\chi^2$ that we computed, we don't have a very good fit here.

© 2002 Roger Even Bove
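The goodness-of-fit calculation in part d) and the mean estimate in part e) can be reproduced with a short script. This is a sketch under stated assumptions, not the method used to prepare the key: the helper names `poisson_pmf` and `chi_square_poisson` are hypothetical, the Poisson probabilities are computed from the formula rather than read from a table, and the last cell is treated as "2 or more" so that the expected counts sum to 500.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of exactly k events (hypothetical helper)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def chi_square_poisson(observed, mean):
    """Chi-square statistic for counts at x = 0, 1, ...; the last cell is 'that value or more'."""
    n = sum(observed)
    probs = [poisson_pmf(k, mean) for k in range(len(observed) - 1)]
    probs.append(1.0 - sum(probs))  # remaining upper-tail probability
    expected = [n * p for p in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected

# Errors per invoice from part d): 479, 10, 8, 2 and 1 invoices with 0..4 errors
counts = {0: 479, 1: 10, 2: 8, 3: 2, 4: 1}
sample_mean = sum(k * c for k, c in counts.items()) / sum(counts.values())

# Combine the last three cells, as in the solution, because their E values are below 5
observed = [counts[0], counts[1], counts[2] + counts[3] + counts[4]]
chi2, expected = chi_square_poisson(observed, mean=0.2)
reject = chi2 > 4.6052  # table value for alpha = .10 with 2 degrees of freedom
print(f"mean = {sample_mean:.3f}, chi-square = {chi2:.3f}, reject H0: {reject}")
```

Calling `chi_square_poisson([479, 21], mean=0.1)` gives the two-cell comparison from part e) in the same way.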