Handout 13.5 Mixed Inference

1. Some people think that chemists are more likely than other parents to have female children. (Perhaps chemists are exposed to something in their laboratories that affects the sex of their children.) The Washington State Department of Health lists the parents' occupations on birth certificates. Between 1980 and 1990, 555 children were born to fathers who were chemists. Of these births, 273 were girls. During this period, 48.8% of all births in Washington State were girls. Is there evidence that the proportion of girls born to chemists is higher than the state proportion?

A:
$H_0: p = .488$ The proportion of girls born to fathers who are chemists is 48.8%.
$H_a: p > .488$ The proportion of girls born to fathers who are chemists is greater than 48.8%.

We do not have an SRS, and this may constitute an assumption violation. We do not know how large the population of chemists who have fathered children is, so we do not know whether it exceeds $10n = 5550$.

$np_0 = 555(.488) = 270.8 \ge 10$
$n(1 - p_0) = 555(1 - .488) = 284.2 \ge 10$

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{\dfrac{273}{555} - .488}{\sqrt{\dfrac{.488(1 - .488)}{555}}} = .1834$$

$P(z \ge .1834) = .427$

Fail to reject $H_0$ at $\alpha = .05$; a value this extreme may occur by chance alone about 43% of the time. We lack evidence that chemists have a higher proportion of daughters than the general population.

2. During 14 years of follow-up to the 1976 Nurses Health Study, the relationship between nut consumption (true nuts, not peanuts) and risk of coronary heart disease was examined in a group of 86,016 female nurses aged 34 to 59 years without a prior diagnosis of coronary heart disease.
The data for 1255 of the nurses are given in the following table:

Frequency of nut consumption    Fatal coronary heart disease    Non-fatal myocardial infarction (heart attack)    Total cases of coronary heart disease
Almost never                    197                             345                                               542
Once a week                     161                             423                                               584
2-4 times per week              22                              63                                                85
>4 times per week               14                              30                                                44
Total                           394                             861                                               1255

Do the data give evidence that coronary heart disease is independent of nut consumption?

A:
$H_0$: Nut consumption and heart disease are independent of one another.
$H_a$: Nut consumption and heart disease are not independent of one another.
or
$H_0$: There is no relationship between nut consumption and heart disease.
$H_a$: There is a relationship between nut consumption and heart disease.

We have no evidence of a random sample, so our results may not be representative. All expected counts are greater than 5.

Observed and expected counts (rows in the order of the table; columns are fatal and non-fatal):

$$\text{obs} = \begin{bmatrix} 197 & 345 \\ 161 & 423 \\ 22 & 63 \\ 14 & 30 \end{bmatrix} \qquad \text{exp} = \begin{bmatrix} 170 & 372 \\ 183 & 401 \\ 27 & 58 \\ 14 & 30 \end{bmatrix}$$

$$\chi^2 = \sum \frac{(o - e)^2}{e} = 11.344, \qquad df = 3$$

$P(\chi^2 \ge 11.344) = .0100$

Reject $H_0$ at $\alpha = .05$; a value this extreme may occur by chance alone only about 1% of the time. We have evidence of a relationship between nut consumption and the occurrence of fatal and non-fatal heart disease, but recall that there was an assumption violation.

3. Researchers at the National Cancer Institute released the results of a study that examined the effect of weed-killing herbicides on house pets. The following data are compatible with summary values given in the report. Dogs, some of whom were from homes where the herbicide was used on a regular basis, were examined for the presence of malignant lymphoma. Below are the data:

Group        Sample size    Number with lymphoma    $\hat{p}$
Exposed      827            473                     0.572
Unexposed    130            19                      0.146

Estimate the difference by which the proportion of exposed dogs that develop lymphoma exceeds that for unexposed dogs.

Answer: 2-proportion z-interval

We are uncertain of having SRSs. The samples can reasonably be expected to be independent.
Let $p_1$ = proportion of dogs exposed to herbicide that develop lymphoma.
Let $p_2$ = proportion of dogs not exposed to herbicide that develop lymphoma.

The populations of dogs exposed to herbicide and those not exposed to herbicide are each well over 10 times the sample sizes.

$n_1\hat{p}_1 = 827(.572) = 473 \ge 5$        $n_2\hat{p}_2 = 130(.146) = 19 \ge 5$
$n_1(1 - \hat{p}_1) = 827(1 - .572) = 354 \ge 5$        $n_2(1 - \hat{p}_2) = 130(1 - .146) = 111 \ge 5$

$$\hat{p}_1 - \hat{p}_2 \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

$$.572 - .146 \pm 1.96 \sqrt{\frac{.572(1 - .572)}{827} + \frac{.146(1 - .146)}{130}} \quad\Rightarrow\quad (.3563, .4953)$$

We are 95% confident that the proportion of exposed dogs that develop lymphoma is between about 35.6 and 49.5 percentage points higher than the proportion for dogs not exposed to herbicide. In repeated random sampling this method captures the true difference in proportions 95% of the time.

4. A distributor of raisins claims that the average box contains 36 raisins. The stem-and-leaf plot displays the number of raisins found in 30 randomly selected 1/2 oz. boxes. Test the claim that the mean number of raisins is actually less than 36.

Number of raisins
2 | 6 7 9
3 | 1 3 3 3 4 4 4 4
3 | 5 5 5 5 6 6 7 7 7 7 8 8 8 9 9
4 | 0 0 1 3

A:
$H_0: \mu = 36$ The mean number of raisins in a box is 36.
$H_a: \mu < 36$ The mean number of raisins in a box is less than 36.

We are given an SRS. Our sample is large (30), so by the Central Limit Theorem the normal approximation is useful.

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{35.466 - 36}{3.8661 / \sqrt{30}} = -.7556, \qquad df = n - 1 = 29$$

$P(t \le -.7556) = .228$

Fail to reject $H_0$; a value this extreme may occur by chance alone about 23% of the time. We lack strong evidence that there are fewer than 36 raisins per box.

5. An experiment on the side effects of pain relievers assigned arthritis patients to one of several over-the-counter pain medications. Of the 440 patients who took one brand of pain reliever, 23 suffered some "adverse symptom." Does the experiment provide strong evidence that fewer than 10% of patients who take this medication have adverse symptoms?

A:
$H_0: p = 0.10$ The proportion of patients who suffer adverse symptoms when taking the medicine is 0.10.
$H_a: p < 0.10$ The proportion of patients who suffer adverse symptoms when taking the medicine is less than 0.10.

The data came from an experiment, and presumably the patients were randomly assigned to treatments.

$np_0 = 440(.1) = 44 \ge 10$        $n(1 - p_0) = 440(.9) = 396 \ge 10$
$10n = 4400$. The population of patients taking the medicine is likely to be greater than 4400.

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{\dfrac{23}{440} - .1}{\sqrt{\dfrac{.1(.9)}{440}}} = -3.337$$

$P(Z < -3.337) = 4.2 \times 10^{-4}$

Reject $H_0$: the P-value of 0.00042 is less than $\alpha = .01$, so a test statistic this small may occur by chance alone well less than 1% of the time. We have strong evidence that the true proportion of adverse symptoms is less than 10%.

6. A department store stocks blue jeans that are identical except for color. A random sample of 32 sales showed the following purchases:

Color                     Number sold
Faded blue denim          13
Darker blue denim         8
Traditional blue denim    6
Black                     5

Do these data indicate that one color of jeans is preferred over the others, or are consumers buying the jeans in equal proportions?

A:
$H_0$: Consumers show equal preference for the various jean colors.
$H_a$: Consumers show unequal preference for the various jean colors.

We are given a random sample. All expected cell counts are greater than 5. Expected counts are 8, 8, 8, 8.

$$\chi^2 = \sum \frac{(o - e)^2}{e} = 4.75, \qquad df = 3$$

$P(\chi^2 \ge 4.75) = .191$

Fail to reject $H_0$ at $\alpha = .05$; a value this extreme may occur by chance alone about 19% of the time. We lack evidence that consumers have different preferences for various jean colors.

7. Poisoning by the pesticide DDT causes tremors and convulsions. In a study of DDT poisoning, researchers fed several rats a measured amount of DDT. They then made measurements on the rats' nervous systems that might explain how DDT poisoning causes tremors. One important variable was the "absolute refractory period," the time required for a nerve to recover after a stimulus. This period varies normally. Measurements on ten rats gave the data below (in milliseconds).
1.5 2.0 1.7 1.9 1.8 1.6 2.15 1.75 1.50 3.01

(a) Give a 90% confidence interval for the mean absolute refractory period for all rats of this strain when subjected to the same treatment.

A: 1-sample t-interval

We are uncertain that this is an SRS. We are told that the refractory period varies normally.

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 1.891 \pm 1.833 \cdot \frac{.4455}{\sqrt{10}}, \qquad df = 9 \quad\Rightarrow\quad (1.63, 2.15)$$

We are 90% confident that the mean refractory period is between 1.633 and 2.149 ms. In repeated random samples this method captures the true mean approximately 90% of the time.

(b) Does this differ significantly from the published value of 1.88 ms for rats of this strain?

A:
$H_0: \mu = 1.88$ The mean refractory period is 1.88 ms.
$H_a: \mu \ne 1.88$ The mean refractory period is not 1.88 ms.

Using the confidence interval already constructed, we fail to reject $H_0$ at $\alpha = .10$, because 1.88 lies inside the 90% interval. Alternatively, we calculate the test statistic and P-value.

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{1.891 - 1.88}{.4455 / \sqrt{10}} = 0.0781, \qquad df = 9$$

$P(t \le -.0781 \text{ or } t \ge .0781) = .939$

8. An educator believes that new reading activities in the classroom will help elementary school pupils improve their reading ability. She arranges for a third-grade class of 21 students to follow these activities for an 8-week period. A control classroom of 23 third graders follows the same curriculum without the activities. At the end of the 8 weeks, all students are given the Degree of Reading Power (DRP) test, which measures the aspects of reading ability that the treatment is designed to improve. Here are the data:

Treatment: 24 43 58 71 43 57 49 61 44 67 49 43 53 56 59 52 62 46 54 57 33
Control:   42 43 55 26 62 48 37 33 41 19 54 28 20 85 46 10 17 55 60 53 48 37 42

Is there good evidence that the new activities improve the mean DRP score?

A: Let 1 = Treatment, 2 = Control.
$H_0: \mu_1 = \mu_2$ The mean DRP score is the same for both treatment and control.
$H_a: \mu_1 > \mu_2$ The mean DRP score is greater for the treatment.

We do not have SRSs. The samples are independent.
[Normal probability plot of the treatment data] This normal probability plot of the treatment data is roughly linear, suggesting a normal model.
[Normal probability plot of the control data] This normal probability plot of the control data is roughly linear, suggesting a normal model.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{51.476 - 41.782}{\sqrt{\dfrac{11.007^2}{21} + \dfrac{17.201^2}{23}}} = 2.245, \qquad df = 37 \text{ (calculator, round down)}$$

$P(t \ge 2.245) = .0153$

Reject $H_0$ at $\alpha = .05$; a value this extreme may occur by chance alone about 1.5% of the time. We have strong evidence that the mean treatment DRP score is higher than the mean control score. Note that our samples may not have been SRSs, so our results may be in question.

9. The National Assessment of Educational Progress (NAEP) Young Adult Literacy Assessment Survey interviewed a random sample of 1917 people 21 to 25 years old. The sample contained 840 men, of whom 775 were fully employed. There were 1077 women, and 680 of them were fully employed.

(a) Use a 99% confidence interval to describe the difference between the proportions of young men and young women who are fully employed. Is the difference statistically significant at the 1% significance level?

A:
Let $p_1$ = proportion of young men who are fully employed.
Let $p_2$ = proportion of young women who are fully employed.

We are given two independent SRSs. The populations are large.

$n_1\hat{p}_1 = 840\left(\frac{775}{840}\right) = 775 \ge 5$        $n_2\hat{p}_2 = 1077\left(\frac{680}{1077}\right) = 680 \ge 5$
$n_1(1 - \hat{p}_1) = 65 \ge 5$        $n_2(1 - \hat{p}_2) = 397 \ge 5$

$$\hat{p}_1 - \hat{p}_2 \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

$$.9226 - .6313 \pm 2.576 \sqrt{\frac{.9226(1 - .9226)}{840} + \frac{.6313(1 - .6313)}{1077}} \quad\Rightarrow\quad (.2465, .3359)$$

We are 99% confident that the proportion of young men who are fully employed exceeds the proportion of young women who are fully employed by between about 25 and 34 percentage points. In repeated random sampling, this method captures the true difference in proportions about 99% of the time.

To answer the question of whether the difference is significant at the 1% level, we use our 99% confidence interval instead of starting over with a test. Looking at the interval, we see that 0 is not in the interval.
Our null hypothesis is $H_0: p_1 - p_2 = 0$, and the alternative hypothesis is $H_a: p_1 - p_2 \ne 0$. Zero is not in the interval, so we reject the null hypothesis. We have strong evidence that the proportions of men and women who are fully employed are not the same.

(b) The mean and standard deviation of scores on the NAEP's test of quantitative skills were $\bar{x}_1 = 272.40$ and $s_1 = 59.2$ for the men in the sample. For the women, the results were $\bar{x}_2 = 274.73$ and $s_2 = 57.5$. Is the difference between the mean scores for men and women significant at the 1% level?

A: Let 1 = Men, 2 = Women.
$H_0: \mu_1 = \mu_2$ The mean NAEP score is the same for men and women.
$H_a: \mu_1 \ne \mu_2$ The mean NAEP score is not the same for men and women.

We are given independent SRSs. We do not know if the NAEP scores vary normally, and we lack data to investigate; however, both samples are large.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{272.40 - 274.73}{\sqrt{\dfrac{59.2^2}{840} + \dfrac{57.5^2}{1077}}} = -.866, \qquad df = 1777 \text{ (calculator, round down)}$$

$P(t \le -.866 \text{ or } t \ge .866) = .387$

Fail to reject $H_0$ at $\alpha = .01$; a value this extreme may occur by chance alone about 39% of the time. We lack strong evidence that men and women have different mean NAEP scores.
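The two-sample t statistic and its calculator degrees of freedom in 9(b) can be checked from the summary statistics alone. The following is a minimal sketch, not the handout's own method: the function name welch_from_summary is hypothetical, the degrees of freedom use the Welch-Satterthwaite formula (assumed here to be what the calculator reports), and the P-value uses a normal approximation in place of the t distribution, which is an assumption that is reasonable only because the degrees of freedom are very large.

```python
import math
from statistics import NormalDist


def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic and Welch-Satterthwaite df from summary statistics.

    Hypothetical helper for illustration; not part of the handout.
    """
    v1 = sd1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2  # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)  # standard error of the difference in means
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics from problem 9(b): men first, then women.
t, df = welch_from_summary(272.40, 59.2, 840, 274.73, 57.5, 1077)

# Assumption: with df this large, the standard normal curve is used as a
# stand-in for the t distribution when computing the two-sided P-value.
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"t = {t:.4f}, df = {df:.1f}, approximate two-sided P-value = {p_approx:.3f}")
```

For small samples, such as problem 8, the normal stand-in would understate the P-value, so a t-distribution table or calculator should be used there instead.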