252y0222 3/29/02

ECO252 QBA2                 Name
SECOND HOUR EXAM            Hour of Class Registered (Circle): MWF 10 11 TR 12:30 2:00
March 19, 2002              Hour of Class Attended (If Different) __
(Open this document in 'Page Layout')

I. (14 points) Do all the following. Diagrams will help! $x \sim N(7, 11)$. Probabilities still can't be negative!

1. $P(-3 \le x \le 16) = P\left(\frac{-3-7}{11} \le z \le \frac{16-7}{11}\right) = P(-0.91 \le z \le 0.82) = P(-0.91 \le z \le 0) + P(0 \le z \le 0.82) = .3186 + .2939 = .6125$

2. $P(0 \le x \le 7) = P\left(\frac{0-7}{11} \le z \le \frac{7-7}{11}\right) = P(-0.64 \le z \le 0) = .2389$

3. $P(-30 \le x \le 14) = P\left(\frac{-30-7}{11} \le z \le \frac{14-7}{11}\right) = P(-3.36 \le z \le 0.64) = P(-3.36 \le z \le 0) + P(0 \le z \le 0.64) = .4996 + .2389 = .7385$

4. $P(-3 \le x \le 4) = P\left(\frac{-3-7}{11} \le z \le \frac{4-7}{11}\right) = P(-0.91 \le z \le -0.27) = P(-0.91 \le z \le 0) - P(-0.27 \le z \le 0) = .3186 - .1064 = .2122$

5. $F(2)$ (the cumulative probability up to 2). $P(x \le 2) = P\left(z \le \frac{2-7}{11}\right) = P(z \le -0.45) = P(z \le 0) - P(-0.45 \le z \le 0) = .5 - .1736 = .3264$

6. A symmetrical interval about the mean with 33% probability. We want two points $x_{.665}$ and $x_{.335}$, so that $P(x_{.665} \le x \le x_{.335}) = .3300$. Make a diagram showing 7 in the middle, at the center of a 33% region split into two areas with probabilities of 16.5%. From the diagram, if we replace $x$ by $z$, $P(0 \le z \le z_{.335}) = .1650$. The closest we can come is $P(0 \le z \le 0.42) = .1628$ or $P(0 \le z \le 0.43) = .1664$. Since neither of these is much closer than the other, a compromise between them such as $z_{.335} = 0.423$ is reasonable, and $x = \mu \pm z_{.335}\sigma = 7 \pm 0.423(11) = 7 \pm 4.653$, or 2.347 to 11.653. To check this, note that $P(2.347 \le x \le 11.653) = P\left(\frac{2.347-7}{11} \le z \le \frac{11.653-7}{11}\right) = P(-0.42 \le z \le 0.42) = 2P(0 \le z \le 0.42) = 2(.1628) = .3256 \approx 33\%$. Of course $z_{.335} = 0.42$ and $z_{.335} = 0.43$ are perfectly acceptable.

7. $x_{.04}$. We want a point $x_{.04}$, so that $P(x \ge x_{.04}) = .04$. Make a diagram of $z$ showing zero in the middle, .4600 between 0 and $z_{.04}$, and .04 above $z_{.04}$. From the diagram, if we replace $x$ by $z$, $P(0 \le z \le z_{.04}) = .4600$. The Normal table says $P(0 \le z \le 1.75) = .4599$, which is the closest we can come to .4600. So $z_{.04} = 1.75$, and $x = \mu + z_{.04}\sigma = 7 + 1.75(11) = 26.25$. To check this, note $P(x \ge 26.25) = P\left(z \ge \frac{26.25-7}{11}\right) = P(z \ge 1.75) = P(z \ge 0) - P(0 \le z \le 1.75) = .5 - .4599 = .0401 \approx .04$.

II. (6 points; 2-point penalty for not trying part a.) Show your work!
Do not answer part b with a yes or no unless you have stated your hypotheses!

According to your text, a study was made to compare ages of purchasers of Crest with nonpurchasers, yielding the following results. These are two independent samples taken from an approximately normal population.

Row   crest $x_1$   nocrest $x_2$
1        34            60
2        33            54
3        52            53
4        41            58
5        32            52
6        34            52
7        49            66
8        50            35

The Minitab 'describe' function gave the following results for the 'nocrest' column.

Variable   N   Mean    Median   StDev
nocrest    8   53.75   53.50    8.99

a. Compute the standard deviation, $s_1$, for the 'crest' column. Show your work! (3)
b. Compute a 90% confidence interval for the difference between the two population means $\mu_1$ and $\mu_2$ on the assumption that these are independent samples taken from approximately normal populations with similar variances. According to your confidence interval, is there a significant difference between the population means? Why? (3)

Solution:

a) $n_1 = 8$.

Row    $x_1$   $x_1^2$
1       34     1156
2       33     1089
3       52     2704
4       41     1681
5       32     1024
6       34     1156
7       49     2401
8       50     2500
Total  325    13711

$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{325}{8} = 40.625$

$s_1^2 = \frac{\sum x_1^2 - n_1\bar{x}_1^2}{n_1 - 1} = \frac{13711 - 8(40.625)^2}{7} = \frac{507.875}{7} = 72.553571$, so $s_1 = 8.51784$.

b) From Table 3 of the Syllabus Supplement, for the difference between two means ($\sigma$ unknown, variances assumed equal):

Confidence interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $s_{\bar{d}} = \hat{s}_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ and $DF = n_1 + n_2 - 2$
Hypotheses: $H_0: D = D_0$, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$
Test ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
Critical value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$
Pooled variance: $\hat{s}_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

$\bar{x}_1 = 40.625$, $s_1^2 = 72.553571$, $s_1 = 8.51784$; $\bar{x}_2 = 53.75$, $s_2 = 8.99$, $s_2^2 = 80.8201$.

$\bar{d} = \bar{x}_1 - \bar{x}_2 = 40.625 - 53.75 = -13.125$, $DF = n_1 + n_2 - 2 = 8 + 8 - 2 = 14$, $\alpha = .10$.

$\hat{s}_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{7(72.553571) + 7(80.8201)}{14} = \frac{72.553571 + 80.8201}{2} = 76.6868$

$s_{\bar{d}} = \hat{s}_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{76.6868\left(\frac{1}{8} + \frac{1}{8}\right)} = \sqrt{76.6868(.25)} = \sqrt{19.1717} = 4.37855$

$t_{.05}^{(14)} = 1.761$

Confidence interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}} = -13.125 \pm 1.761(4.37855) = -13.125 \pm 7.711$, or -20.836 to -5.414.

The interval does not include 0, so there is a significant difference between the means.
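The arithmetic in parts a) and b) can be checked with a short script. This is only a sketch: the variable names are made up for illustration, the 'nocrest' summary values are copied from the Minitab output rather than recomputed, and the critical value 1.761 is typed in from the t table (it is an assumption of the script, not something it calculates).

```python
import math
import statistics

# Ages of Crest purchasers (sample 1) from the problem statement.
crest = [34, 33, 52, 41, 32, 34, 49, 50]
n1, n2 = len(crest), 8

mean1 = statistics.mean(crest)       # sample mean of x1
var1 = statistics.variance(crest)    # sample variance, divisor n1 - 1

# Summary values for 'nocrest' are taken from the Minitab output.
mean2, sd2 = 53.75, 8.99
var2 = sd2 ** 2

# Pooled variance and standard error of the difference between means.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Assumption: t critical value copied from a t table (alpha/2 = .05, 14 df).
T_CRIT = 1.761

d_bar = mean1 - mean2
lower = d_bar - T_CRIT * std_err
upper = d_bar + T_CRIT * std_err

print(f"s1 = {math.sqrt(var1):.5f}, std error = {std_err:.5f}")
print(f"90% CI for mu1 - mu2: {lower:.3f} to {upper:.3f}")
```

Because both limits are negative, the script leads to the same conclusion as the hand calculation.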
Formally, our hypotheses are $H_0: D = 0$, $H_1: D \ne 0$, or $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$, or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$. We reject $H_0$.

III. Do at least 3 of the following 4 Problems (at least 10 each) (or do sections adding to at least 30 points. Anything extra you do helps, and grades wrap around.) You must do problem 2a! Show your work! State $H_0$ and $H_1$ where applicable. Do not answer a question 'yes' or 'no' without citing a statistical test. Use a 95% confidence level unless another level is specified.

1. For your convenience, data is repeated from the previous page. Use a 90% confidence level in this problem.

Row   crest $x_1$   nocrest $x_2$
1        34            60
2        33            54
3        52            53
4        41            58
5        32            52
6        34            52
7        49            66
8        50            35

The Minitab 'describe' function gave the following results for the 'nocrest' column.

Variable   N   Mean    Median   StDev
nocrest    8   53.75   53.50    8.99

a. Test the hypothesis that the mean age of Crest buyers is lower than the mean for those who did not buy Crest. Assume that these are independent samples taken from approximately normal populations with similar variances.
(i) State your null and alternate hypotheses. (2)
(ii) Find a critical value for the difference between the sample means and use it to test your hypothesis. (2)
(iii) Repeat the test using a test ratio and find an approximate p-value for the hypothesis. (3)
(iv) Repeat the test using a confidence interval. (2)
b. Test the equality of the standard deviations of the two samples.
(i) Do the test without using a confidence interval. (2)
(ii) Create a confidence interval for the variance ratio and use it to find a confidence interval for the ratio of the standard deviations. (3.5)
c. (Extra credit) Repeat the tests in a(ii) - a(iv) dropping the assumption of equal variances. (6)

Solution:

a. (i) Because we are being asked if the mean of Crest users is less than the mean for non-Crest users, we are asking if $\mu_1 < \mu_2$.
Because this does not contain an equality, this must be an alternate hypothesis. Thus we are testing $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$, or $H_0: D \ge 0$, $H_1: D < 0$, or $H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$, with $\alpha = .10$.

From the previous page, $\bar{x}_1 = 40.625$, $\bar{x}_2 = 53.75$, $\bar{d} = \bar{x}_1 - \bar{x}_2 = 40.625 - 53.75 = -13.125$, $DF = n_1 + n_2 - 2 = 8 + 8 - 2 = 14$ and $s_{\bar{d}} = \hat{s}_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 4.37855$. Because this is a one-sided hypothesis, we use $t_{.10}^{(14)} = 1.345$.

(ii) The formula for a Critical Value is $\bar{d}_{CV} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$ or $(\bar{x}_1 - \bar{x}_2)_{CV} = (\mu_{10} - \mu_{20}) \pm t_{\alpha/2}\, s_{\bar{d}}$. Because this is a one-sided test, we want one critical value below zero. The critical value formula becomes $\bar{d}_{CV} = D_0 - t_{\alpha}\, s_{\bar{d}} = 0 - 1.345(4.37855) = -5.889$. Make a diagram showing an almost Normal curve with a mean at zero and a 'reject' region below -5.889. Since -13.125 is in this region, we reject $H_0$.

(iii) The formula for a Test Ratio is $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$ or $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_{10} - \mu_{20})}{s_{\bar{d}}}$. Here $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{-13.125 - 0}{4.37855} = -2.998$. To do a conventional test, make a diagram showing an almost Normal curve with a mean at zero and a 'reject' region below $-t_{.10}^{(14)} = -1.345$. Since -2.998 is in this region, we reject $H_0$. To find an approximate p-value, compare this value of $t$ with the values on the $DF = 14$ line of the t table. Because this is a left-sided test, we want to know the area below -2.998. Since $t_{.005}^{(14)} = 2.977$ and $t_{.001}^{(14)} = 3.787$, we can say that $.001 < pval < .005$.

(iv) The formula for a Confidence Interval is $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$ or $\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_{\bar{d}}$. Since the alternate hypothesis is $H_1: \mu_1 < \mu_2$ or $D < 0$, this interval becomes $D \le \bar{d} + t_{\alpha}\, s_{\bar{d}} = -13.125 + 1.345(4.37855) = -13.125 + 5.889 = -7.236$. Since this interval does not include zero and numbers above zero, we reject $H_0$.

b. The best place to find the formulas for comparing variances is the outline. To test $H_0: \sigma_1 = \sigma_2$ against $H_1: \sigma_1 \ne \sigma_2$, recall that $s_1^2 = 72.553571$, $s_2^2 = 80.8201$, $n_1 = n_2 = 8$, and $\alpha = .10$.

(i) If we want to do a 2-sided test where $DF_1 = n_1 - 1 = 7$ and $DF_2 = n_2 - 1 = 7$, we compare $\frac{s_1^2}{s_2^2}$ against $F_{\alpha/2}^{(DF_1, DF_2)} = F_{.05}^{(7,7)} = 3.79$ and $\frac{s_2^2}{s_1^2}$ against $F_{\alpha/2}^{(DF_2, DF_1)} = F_{.05}^{(7,7)}$. $\frac{s_2^2}{s_1^2} = \frac{80.8201}{72.553571} = 1.114$, so $\frac{s_1^2}{s_2^2}$ must be below one.
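Since both ratios are not above the corresponding table values for $F$, we cannot reject the null hypothesis of equality.

(ii) A 2-sided confidence interval is $\frac{s_2^2}{s_1^2}\,\frac{1}{F_{\alpha/2}^{(n_2-1,\,n_1-1)}} \le \frac{\sigma_2^2}{\sigma_1^2} \le \frac{s_2^2}{s_1^2}\,F_{\alpha/2}^{(n_1-1,\,n_2-1)}$, which becomes $1.114\left(\frac{1}{3.79}\right) \le \frac{\sigma_2^2}{\sigma_1^2} \le 1.114(3.79)$ or $0.2939 \le \frac{\sigma_2^2}{\sigma_1^2} \le 4.2221$. The opposite interval is $\frac{s_1^2}{s_2^2}\,\frac{1}{F_{\alpha/2}^{(n_1-1,\,n_2-1)}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2}\,F_{\alpha/2}^{(n_2-1,\,n_1-1)}$, which becomes $\frac{1}{1.114}\left(\frac{1}{3.79}\right) \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{1}{1.114}(3.79)$ or $0.2369 \le \frac{\sigma_1^2}{\sigma_2^2} \le 3.4022$. If we take the square roots, we get confidence intervals for the ratios of standard deviations: $0.542 \le \frac{\sigma_2}{\sigma_1} \le 2.05$ or $0.487 \le \frac{\sigma_1}{\sigma_2} \le 1.84$. Since this interval includes one, we cannot reject $H_0$.

c. (i) We are testing $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$, or $H_0: D \ge 0$, $H_1: D < 0$, or $H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$, with $\alpha = .10$.

From the formula table, for the difference between two means ($\sigma$ unknown, variances assumed unequal):

Confidence interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, where $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
Hypotheses: $H_0: D = D_0$, $H_1: D \ne D_0$, where $D = \mu_1 - \mu_2$ (same as $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ if $D_0 = 0$)
Test ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$
Critical value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$

From the previous pages, $\bar{x}_1 = 40.625$, $\bar{x}_2 = 53.75$, $\bar{d} = \bar{x}_1 - \bar{x}_2 = 40.625 - 53.75 = -13.125$, $s_1^2 = 72.553571$, $s_2^2 = 80.8201$, $n_1 = n_2 = 8$. For these calculations, done by Minitab, I used $s_1 = 8.51784$ and $s_2 = 8.9900$, so that Minitab reported $s_1^2 = 72.5536$ and $s_2^2 = 80.8201$.

$\frac{s_1^2}{n_1} = \frac{72.5536}{8} = 9.0692$, $\frac{s_2^2}{n_2} = \frac{80.8201}{8} = 10.1025$, $\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} = 19.1717$, $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{19.1717} = 4.37855$

$DF = \frac{(19.1717)^2}{\frac{(9.0692)^2}{7} + \frac{(10.1025)^2}{7}} = \frac{367.555}{11.7501 + 14.5801} = \frac{367.555}{26.3302} = 13.9594$, so use 13 degrees of freedom.

Because this is a one-sided hypothesis, we use $t_{.10}^{(13)} = 1.350$.

(ii) The formula for a Critical Value is $\bar{d}_{CV} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$ or $(\bar{x}_1 - \bar{x}_2)_{CV} = (\mu_{10} - \mu_{20}) \pm t_{\alpha/2}\, s_{\bar{d}}$. Because this is a one-sided test, we want one critical value below zero. The critical value formula becomes $\bar{d}_{CV} = D_0 - t_{\alpha}\, s_{\bar{d}} = 0 - 1.350(4.37855) = -5.911$. Make a diagram showing an almost Normal curve with a mean at zero and a 'reject' region below -5.911.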
Since -13.125 is in this region, we reject $H_0$.

(iii) The formula for a Test Ratio is $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$ or $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_{10} - \mu_{20})}{s_{\bar{d}}}$. Here $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{-13.125 - 0}{4.37855} = -2.998$. To do a conventional test, make a diagram showing an almost Normal curve with a mean at zero and a 'reject' region below $-t_{.10}^{(13)} = -1.350$. Since -2.998 is in this region, we reject $H_0$.

(iv) The formula for a Confidence Interval is $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$ or $\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_{\bar{d}}$. Since the alternate hypothesis is $H_1: \mu_1 < \mu_2$ or $D < 0$, this interval becomes $D \le \bar{d} + t_{\alpha}\, s_{\bar{d}} = -13.125 + 1.350(4.37855) = -13.125 + 5.911 = -7.214$. Since this interval does not include zero and numbers above zero, we reject $H_0$.

Two reminders:
1) A one-sided hypothesis is tested by a one-sided test, which includes a one-sided critical value or a one-sided confidence interval.
2) A table from the outline:

Methods for Comparing Two Samples
                                                        Paired Samples   Independent Samples
Location - Normal distribution. Compare means.          Method D4        Methods D1-D3
Location - Distribution not Normal. Compare medians.    Method D5b       Method D5a
Proportions                                                              Method 6
Variability - Normal distribution. Compare variances.                    Method 7

2. a. Turn in your computer output from computer problem 1 only, tucked inside this exam paper. (3; 2-point penalty for not handing this in.)
b. A new battery is being tested for use in a tiny stuffed animal. We will use the new battery if it is longer-lasting than the old one. Use a 95% confidence level. Slightly edited Minitab output is below. The assumption was that the old battery had an average life of 4.7 hours, and this is tested for both the old and new battery before they are compared. Can we say that either battery has a life that is significantly different from 4.7 hours? What evidence in what tests led you to your conclusion? (3)
c. Continuing to use the data below, which one of these tests would you use to decide whether to switch batteries? What would you do? Why? (2)
d.
Using an almost Normal curve, and the appropriate values of t, shade the areas represented by the p-values in tests 3, 4 and 5. (3)
e. Using means and standard deviations from the printout, explain how the computer got the values of t in tests 3, 4 and 5. (2)
Part f is at the end.

Worksheet size: 100000 cells
MTB > RETR 'C:\MINITAB\2X0222-2.MTW'.
Retrieving worksheet from file: C:\MINITAB\2X0222-2.MTW
Worksheet was saved on 3/14/2002

MTB > print 'new'
Data Display
new
   4.2   7.3   3.9   5.4   5.1   7.3   4.5   5.8   6.4   4.5
   4.6   4.9   7.2   6.1   3.9   4.0   3.5   5.1

MTB > print 'old'
Data Display
old
   5.1   3.8   4.5   4.9   5.0   4.0   5.2   3.5   3.0   5.2
   5.1   3.2   4.0   3.5   5.3   4.4   4.5   3.8

MTB > describe 'new' 'old'
Descriptive Statistics
Variable    N    Mean    Median   TrMean   StDev   SEMean
new        18   5.206    5.000    5.181    1.228    0.289
old        18   4.333    4.450    4.356    0.755    0.178

Variable    Min     Max      Q1      Q3
new        3.500   7.300   4.150   6.175
old        3.000   5.300   3.725   5.100

Test 1:
MTB > ttest mu=4.7 'new'
T-Test of the Mean
Test of mu = 4.700 vs mu not = 4.700
Variable    N    Mean    StDev   SE Mean     T     P-Value
new        18   5.206    1.228    0.289     1.75    0.099

Test 2:
MTB > ttest mu=4.7 'old'
T-Test of the Mean
Test of mu = 4.700 vs mu not = 4.700
Variable    N    Mean    StDev   SE Mean     T     P-Value
old        18   4.333    0.755    0.178    -2.06    0.055

b) Solution: The two tests above are $H_0: \mu_{new} = 4.7$, $H_1: \mu_{new} \ne 4.7$ and $H_0: \mu_{old} = 4.7$, $H_1: \mu_{old} \ne 4.7$. We are using $\alpha = .05$. In both cases the p-value is above the significance level, so do not reject the null hypothesis.

Test 3:
MTB > twosamplet 'new' 'old'
Two Sample T-Test and Confidence Interval
Twosample T for new vs old
        N    Mean    StDev   SE Mean
new    18    5.21    1.23     0.29
old    18   4.333   0.755     0.18
95% C.I. for mu new - mu old: ( 0.18, 1.57)
T-Test mu new = mu old (vs not =): T= 2.57 P=0.016 DF= 28

Test 4:
MTB > twosamplet 'new' 'old';
SUBC> alt=1.
Two Sample T-Test and Confidence Interval
Twosample T for new vs old
        N    Mean    StDev   SE Mean
new    18    5.21    1.23     0.29
old    18   4.333   0.755     0.18
95% C.I.
for mu new - mu old: ( 0.18, 1.57)
T-Test mu new = mu old (vs >): T= 2.57 P=0.0079 DF= 28

Test 5:
MTB > twosamplet 'new' 'old';
SUBC> alt=-1.
Two Sample T-Test and Confidence Interval
Twosample T for new vs old
        N    Mean    StDev   SE Mean
new    18    5.21    1.23     0.29
old    18   4.333   0.755     0.18
95% C.I. for mu new - mu old: ( 0.18, 1.57)
T-Test mu new = mu old (vs <): T= 2.57 P=0.99 DF= 28

c. Solution: If the new battery is 'longer-lasting,' we will find that $\mu_{new} > \mu_{old}$, so that our hypotheses are $H_0: \mu_{new} \le \mu_{old}$, $H_1: \mu_{new} > \mu_{old}$. These are the hypotheses tested in test 4. The p-value reported in this test is .0079, which is certainly less than $\alpha = .05$. So reject the null hypothesis and buy the new battery.

d. Solution: Make a diagram showing an almost-normal curve with a mean at zero. In test 3, a 2-sided test, the p-value is twice the probability that $t \ge 2.57$, so shade the area above 2.57 and below -2.57 and label it 1.6%. In test 4, a 1-sided test, the p-value is the probability that $t \ge 2.57$, so shade the area above 2.57 and label it 0.79%, which must be half the p-value in test 3. In test 5, a 1-sided test, the p-value is the probability that $t \le 2.57$, so shade the area below 2.57 and label it 99%. Except for rounding, this is one minus the p-value in test 4.

e. Solution: As shown in the solution to problem 1c above, the formulas are $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$. The 'describe' printout shows that $\bar{x}_{new} = 5.206$, $\bar{x}_{old} = 4.333$, $\bar{d} = \bar{x}_{new} - \bar{x}_{old} = 5.206 - 4.333 = 0.873$, $\frac{s_{new}}{\sqrt{n_{new}}} = 0.289$ and $\frac{s_{old}}{\sqrt{n_{old}}} = 0.178$. So $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{0.289^2 + 0.178^2} = 0.3394186$. Finally, $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{0.873 - 0}{0.3394186} = 2.5720$.

f. Assume that you got the following output from a 'describe' command.

Descriptive Statistics
Variable     N    Mean    Median   TrMean   StDev   SEMean
new        150   5.106    5.000    5.081    1.500    0.122
old        150   4.233    4.300    4.250    0.800    0.065

Construct a 92% confidence interval for $\mu_{new} - \mu_{old}$. (3)

Solution: The formula that is used in the large sample case is $s_{\bar{d}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{0.122^2 + 0.065^2} = 0.13824$. $\bar{d} = 5.106 - 4.233 = 0.873$.
In Part I, problem 7, we found $z_{.04} = 1.75$. The Confidence Interval formula is $D = \bar{d} \pm z_{\alpha/2}\, s_{\bar{d}} = 0.873 \pm 1.75(0.13824) = 0.873 \pm 0.242$.

3. (McClave et al.) I am a razor manufacturer and claim that my disposable razor gives more shaves than my competitor's. A sample is taken, with $x_m$ representing the number of shaves per razor on my product and $x_c$ representing the number of shaves on my competitor's product.

Row   me $x_m$   rank $r_m$   compet $x_c$   rank $r_c$   diff $d$
1        8                       10                          -2
2       16         16             6                          10
3        9                        4            1              5
4       11                        7                           4
5       15         15            13           13              2
6       10                       14           14             -4
7        6                        5            2              1
8       12         12             7                           5

For your convenience some calculations have been made. The columns $r_m$ and $r_c$ represent the beginning and end of the ranking of the numbers in $x_m$ and $x_c$. $d$ represents the difference between the numbers in $x_m$ and $x_c$. For the three data columns we have the following.

Variable          Sample Mean   Sample Std Dev
Me ($x_m$)           10.88          3.40
Compet ($x_c$)        8.25          3.69
$d$                   2.62          4.41

Note that you will probably not need all the information that I am giving you.

a. The authors say that $x_m$ and $x_c$ represent two independent samples. Because of uncertainty about the underlying distribution, the authors imply that you should compare medians to see if my blade is better. State your hypotheses and your conclusion after doing an appropriate test. Use a 95% confidence level. (5)
b. The authors then concede that this was an inefficient way to do the problem and that we ought to compare medians using paired samples. Assume that each line represents the experience of one shaver and repeat the test. (5)
c. Now assume that we find out that the underlying distribution is Normal after all and repeat the test in part b using means instead of medians. (5)

Solution: The data are repeated below with the ranks filled in. Rank totals are found for $m$ and $c$. Absolute values of $d$ are found and $|d|$ is ranked. The ranks are then corrected for ties and marked by the sign of the difference. $\alpha = .05$.
Row    me $x_m$   rank $r_m$   compet $x_c$   rank $r_c$   diff $d$   $|d|$   $r_{|d|}$   $r^*$
1         8          7            10            9.5         -2         2        3         2.5-
2        16         16             6            3.5         10        10        8         8+
3         9          8             4            1            5         5        7         6.5+
4        11         11             7            5.5          4         4        5         4.5+
5        15         15            13           13            2         2        2         2.5+
6        10          9.5          14           14           -4         4        4         4.5-
7         6          3.5           5            2            1         1        1         1+
8        12         12.0           7            5.5          5         5        6         6.5+
Total               82.0                       54.0                                      T+ = 29, T- = 7

a. $H_0: \eta_m \le \eta_c$, $H_1: \eta_m > \eta_c$. For a test of the correctness of the ranking, note that the two rank sums in a add to 82 + 54 = 136. This should be the same as the sum of the first 16 numbers, $\frac{16(17)}{2} = 136$. For a 1-sided test use the Mann-Whitney-Wilcoxon rank sum test. For $n_1 = n_2 = 8$, Table 6b gives critical values of 52 and 84. Since our rank sums fall between these, we do not reject the null hypothesis.

b. $H_0: \eta_m \le \eta_c$, $H_1: \eta_m > \eta_c$. For a test of the correctness of the ranking, note that the two Ts in b add to 29 + 7 = 36. This should be the same as the sum of the first eight numbers, $\frac{8(9)}{2} = 36$. For a Wilcoxon Signed Rank Sum Test, compare $T = 7$, the smaller of the totals, with Table 7. For $n = 8$, the 5% critical value is 6. Since $T$ is above this value, we do not reject the null hypothesis.

c. If the parent distribution is Normal, we use Method D4. From the outline, there are three ways of approaching a problem involving two means. You should have chosen one! We know that $\bar{d} = \bar{x}_m - \bar{x}_c = 2.62$, $D = \mu_m - \mu_c$, $s_d = 4.41$, and $s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{4.41}{\sqrt{8}} = 1.5592$. We are testing $H_0: \mu_1 \le \mu_2$, $H_1: \mu_1 > \mu_2$, or $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$, or $H_0: D \le 0$, $H_1: D > 0$, with $df = n - 1 = 7$ and $t_{\alpha}^{(n-1)} = t_{.05}^{(7)} = 1.895$.

(i) Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$ or $\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_{\bar{d}}$. This interval becomes $D \ge \bar{d} - t_{\alpha}\, s_{\bar{d}} = 2.62 - 1.895(1.5592) = 2.62 - 2.95 = -0.33$. Since this interval includes zero, we cannot reject $H_0$.

(ii) Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}}$ or $t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_{10} - \mu_{20})}{s_{\bar{d}}}$. Here $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{2.62 - 0}{1.5592} = 1.680$. Make a diagram showing an almost Normal curve with a mean at zero and a 'reject' region above $t_{\alpha}^{(n-1)} = t_{.05}^{(7)} = 1.895$. Since 1.680 is not in this region, we cannot reject $H_0$.

(iii) Critical Value: $\bar{d}_{CV} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}}$ or $(\bar{x}_1 - \bar{x}_2)_{CV} = (\mu_{10} - \mu_{20}) \pm t_{\alpha/2}\, s_{\bar{d}}$.
Because this is a one-sided test, we want one critical value above zero. The critical value formula becomes $\bar{d}_{CV} = D_0 + t_{\alpha}\, s_{\bar{d}} = 0 + 1.895(1.5592) = 2.955$. Make a diagram showing an almost Normal curve with a mean at zero and a 'reject' region above 2.955. Since 2.62 is not in this region, we cannot reject $H_0$.

4. According to your authors, when a sample of 74 women students were asked whether they would consent to an interview with a travelling recruiter in a local office building, 73 said yes. However, when another sample of 74 women were asked whether they would consent to a similar interview in a hotel room, only 47 said yes. This represents a tremendous difference in the proportion that said yes, and the researcher was asked to verify her findings by repeating the question about the hotel room interview to a second sample of 100 students. This time 66 out of the 100 students said yes.
a. Was the proportion that consented to the hotel interview significantly different between the sample of 74 and the sample of 100? State your hypotheses and test them using a 90% confidence level. (4)
b. The sponsoring firm was so upset by the results that the researcher was asked to interview another sample of 200 women. This time, out of the 200 women, 140 said that they would consent. Check to see if the proportions of the samples of 74, 100 and 200 women who consented differ. Again use a 90% confidence level. (6)
c. Do a 2-sided confidence interval for the difference between the proportions of women in the samples of 74 who will consent to interviews in an office building and in a hotel room. (3)
d. A business has just completed a switch to a new invoicing system. Previously the number of errors per invoice seemed to follow a Poisson distribution with a mean of 0.2. After the switch to the new system was made, a sample of 500 invoices was taken. Of these, 461 had no errors, 28 had one error, 8 had 2 errors, 2 had 3 errors and one had 4 errors.
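Does the Poisson(0.2) distribution still apply? (5)
e. The business wants to know if some Poisson distribution works for the data in part d. Using your Poisson table as best you can, how do you go about this and what do you conclude? (3)

Solution: To summarize the information in parts a and b, $\alpha = .10$ and

Hotel Room                Sample 1   Sample 2   Sample 3
Yes                          47         66        140
No                           27         34         60
Total                        74        100        200
Proportion saying yes      .6351      .6600      .7000

We are comparing $\bar{p}_1 = .6351$, $n_1 = 74$ and $\bar{p}_2 = .6600$, $n_2 = 100$.

From the formula table, for the difference between proportions ($\Delta p = p_1 - p_2$, $q = 1 - p$):

Confidence interval: $\Delta p = \Delta\bar{p} \pm z_{\alpha/2}\, s_{\Delta p}$, where $\Delta\bar{p} = \bar{p}_1 - \bar{p}_2$ and $s_{\Delta p} = \sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_2\bar{q}_2}{n_2}}$
Hypotheses: $H_0: \Delta p = \Delta p_0$, $H_1: \Delta p \ne \Delta p_0$, where $\Delta p_0 = p_{01} - p_{02}$ or $\Delta p_0 = 0$
Test ratio: $z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}}$. If $\Delta p_0 = 0$, $\sigma_{\Delta p} = \sqrt{\bar{p}_0\bar{q}_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$ with $\bar{p}_0 = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$. If $\Delta p_0 \ne 0$, $\sigma_{\Delta p} = \sqrt{\frac{p_{01}q_{01}}{n_1} + \frac{p_{02}q_{02}}{n_2}}$, or use $s_{\Delta p}$.
Critical value: $\Delta\bar{p}_{cv} = \Delta p_0 \pm z_{\alpha/2}\, \sigma_{\Delta p}$

$s_{\Delta p} = \sqrt{\frac{\bar{p}_1\bar{q}_1}{n_1} + \frac{\bar{p}_2\bar{q}_2}{n_2}} = \sqrt{\frac{.6351(.3649)}{74} + \frac{.6600(.3400)}{100}} = \sqrt{.0031317296 + .00224400} = \sqrt{.0053757} = .0733191$

$\Delta\bar{p} = \bar{p}_1 - \bar{p}_2 = -.0249$, $\bar{p}_0 = \frac{47 + 66}{74 + 100} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2} = \frac{74(.6351) + 100(.6600)}{74 + 100} = .6494$, $\alpha = .10$, $z_{\alpha/2} = z_{.05} = 1.645$. Note that $q = 1 - p$ and that $q$ and $p$ are between 0 and 1.

$\sigma_{\Delta p} = \sqrt{\bar{p}_0\bar{q}_0\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{.6494(.3506)\left(\frac{1}{74} + \frac{1}{100}\right)} = \sqrt{.005354} = .073167$

a) $H_0: p_1 = p_2$, $H_1: p_1 \ne p_2$. Same as $H_0: p_1 - p_2 = 0$, $H_1: p_1 - p_2 \ne 0$, or $H_0: \Delta p = 0$, $H_1: \Delta p \ne 0$. There are three ways to do this problem. Only one is needed.

(i) Test Ratio: $z = \frac{\Delta\bar{p} - \Delta p_0}{\sigma_{\Delta p}} = \frac{-.0249 - 0}{.073167} = -0.3403$. Make a diagram showing 'reject' regions above 1.645 and below -1.645. Since -0.3403 is between these values, do not reject $H_0$.

(ii) Critical Value: $\Delta\bar{p}_{cv} = \Delta p_0 \pm z_{\alpha/2}\, \sigma_{\Delta p} = 0 \pm 1.645(.073167) = \pm 0.1204$. Make a diagram showing a 'reject' region above 0.1204 and below -0.1204. Since -0.0249 is between these values, do not reject $H_0$.

(iii) Confidence Interval: $\Delta p = \Delta\bar{p} \pm z_{\alpha/2}\, s_{\Delta p} = -.0249 \pm 1.645(.0733191) = -.0249 \pm .1206$, or -0.1455 to 0.0957. Since zero is between these values, do not reject $H_0$.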
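The test ratio in a)(i) can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original solution: the function name `two_proportion_z` is hypothetical, and the critical value 1.645 is simply copied from the Normal table.

```python
import math

def two_proportion_z(yes1, n1, yes2, n2):
    """Pooled z statistic for H0: p1 = p2 (hypothetical helper)."""
    p1, p2 = yes1 / n1, yes2 / n2
    p0 = (yes1 + yes2) / (n1 + n2)          # pooled proportion under H0
    sigma = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    return (p1 - p2) / sigma, sigma

# Assumption: two-sided 10% critical value copied from the Normal table.
Z_CRIT = 1.645

# Hotel-room samples 1 and 2: 47 of 74 and 66 of 100 said yes.
z, sigma = two_proportion_z(47, 74, 66, 100)
reject = abs(z) > Z_CRIT

print(f"sigma = {sigma:.6f}, z = {z:.4f}, reject H0: {reject}")
```

The script works from the raw counts rather than the rounded proportions, so its z differs from the hand value only in the later decimal places.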
b) $DF = (r - 1)(c - 1) = (1)(2) = 2$. $H_0$: Homogeneous, or $p_1 = p_2 = p_3$. $H_1$: Not homogeneous, or not all $p$s are equal. $\chi^{2(2)}_{.10} = 4.6052$.

O        1       2       3      Total    $p_r$
Yes      47      66     140      253     .6765
No       27      34      60      121     .3235
Total    74     100     200      374    1.0000

E        1        2        3       Total     $p_r$
Yes     50.06    67.65   135.30   253.01    .6765
No      23.94    32.35    64.70   120.99    .3235
Total   74.00   100.00   200.00   374.00   1.0000

The proportions in rows, $p_r$, are used with column totals to get the items in $E$. Note that row and column sums in $E$ are the same as in $O$. (Note that $\chi^2 = 1.207 = 375.207 - 374$ is computed two different ways here; only one way is needed.)

Row      $O$      $E$      $E - O$   $(E - O)^2$   $(E - O)^2 / E$   $O^2 / E$
1         47     50.06      3.06       9.3636        0.187048         44.127
2         27     23.94     -3.06       9.3636        0.391128         30.451
3         66     67.65      1.65       2.7225        0.040244         64.390
4         34     32.35     -1.65       2.7225        0.084158         35.734
5        140    135.30     -4.70      22.0900        0.163267        144.863
6         60     64.70      4.70      22.0900        0.341422         55.641
Total    374    374.00      0.00                     1.207267        375.206

Since the $\chi^2$ computed here is less than the $\chi^2$ from the table, we do not reject $H_0$.

c) Let us call the proportion of women who consented to the office interview $p_4$, so $\bar{p}_4 = \frac{73}{74} = .9865$. Recall that $\bar{p}_1 = \frac{47}{74} = .6351$, so $\Delta\bar{p} = \bar{p}_4 - \bar{p}_1 = .9865 - .6351 = .3514$. $\alpha = .10$, $z_{\alpha/2} = z_{.05} = 1.645$.

$s_{\Delta p} = \sqrt{\frac{\bar{p}_4\bar{q}_4}{n_4} + \frac{\bar{p}_1\bar{q}_1}{n_1}} = \sqrt{\frac{.9865(.0135)}{74} + \frac{.6351(.3649)}{74}} = \sqrt{.0001799695 + .0031317296} = \sqrt{.0033116} = .0575465$

$\Delta p = \Delta\bar{p} \pm z_{\alpha/2}\, s_{\Delta p} = .3514 \pm 1.645(.0575465) = .3514 \pm .0947$

d) If we take the column in the Poisson table for the Poisson distribution with a mean of .2 and multiply it by 500, we get $E$. $O$ is given in the problem.

$x$    Poisson(0.2)      $E$       $O$
0       0.818731       409.366     461
1       0.163746        81.873      28
2       0.016375         8.187       8
3       0.001092         0.546       2
4       0.000055         0.027       1

But the $E$ column has items in it below 5 (and 2) and thus can be used only if the last three cells are added together. $DF = 3 - 1 = 2$, $\chi^{2(2)}_{.10} = 4.6052$.
$x$      $O$      $E$        $E - O$    $(E - O)^2$   $(E - O)^2 / E$   $O^2 / E$
0        461    409.366    -51.6345     2666.12         6.5128         519.147
1         28     81.873     53.8730     2902.30        35.4488           9.576
2+        11      8.761     -2.2390        5.01         0.5722          13.811
Total    500    500.000     -0.0005                    42.5338         542.534

Since $\chi^2 = 42.534 = 542.534 - 500$ is larger than the table value of 4.6052, we reject the null hypothesis.

e) There isn't much choice here, but our estimate of the mean is $\bar{x} = \frac{461(0) + 28(1) + 8(2) + 2(3) + 1(4)}{500} = .108$. The closest we can come on the table is Poisson(0.1). If we do the same thing we did in d), we get a ridiculous situation, since the only numbers in $E$ that are above 5 are the first two, and scrunching the bottom 3 cells together produces a number that results in a gigantic contribution to $\chi^2$.

$x$    Poisson(0.1)      $E$       $O$
0       0.904837       452.419     461
1       0.090484        45.242      28
2       0.004524         2.262       8
3       0.000151         0.075       2
4       0.000004         0.002       1

Our table reads:

$x$      $O$      $E$        $E - O$    $(E - O)^2$   $(E - O)^2 / E$   $O^2 / E$
0        461    452.419    -8.58148     73.6418         0.16277        469.744
1+        39     47.581     8.58100     73.6336         1.54754         31.967
Total    500    500.000    -0.00048                     1.71031        501.711 (less 500 gives 1.711)

But the degrees of freedom seem to be 2 - 1 - 1 = 0. (The second -1 is because we estimated a parameter from the data.) This is worse than the dreaded $2 \times 2$ case, which needs corrections or should be done with proportions directly. However, given the fact that $\chi^{2(1)}_{.10} = 2.70554$ is larger than the $\chi^2$ that we computed, we have a very good fit here.
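The goodness-of-fit calculation in part d can be checked with the sketch below. It is an illustration under stated assumptions, not the method used to produce the tables above: the Poisson probabilities are computed from the formula instead of being read from the Poisson table, the variable names are invented, and the critical value uses the fact that a chi-square variable with exactly 2 degrees of freedom has upper-tail probability $e^{-x/2}$, so the shortcut $-2\ln\alpha$ applies only to this 2-df case.

```python
import math

# Observed invoices with 0, 1, 2, 3 and 4 errors (part d).
observed = [461, 28, 8, 2, 1]
n = sum(observed)
lam = 0.2                      # hypothesized Poisson mean

# Poisson probabilities for x = 0 and x = 1; the rest is pooled as "2+".
p0 = math.exp(-lam)
p1 = lam * math.exp(-lam)
expected = [n * p0, n * p1, n * (1 - p0 - p1)]
pooled_obs = [observed[0], observed[1], sum(observed[2:])]

chi2 = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, expected))

# Valid only for 2 degrees of freedom: P(chi2 > x) = exp(-x / 2).
alpha = 0.10
crit = -2 * math.log(alpha)

print(f"chi-square = {chi2:.3f}, critical value = {crit:.4f}")
print("reject H0" if chi2 > crit else "do not reject H0")
```

Pooling the last three cells mirrors the hand solution, which combines them because their expected counts fall below 5.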