Chapter 12: More Confidence Intervals

*Four common research situations can be analyzed with confidence intervals.

1. $p$ = a population proportion
Ex. What proportion of college-age females perceive themselves as being overweight?
Population Parameter: $p$ = population proportion
Sample Estimate: $\hat{p}$ = sample proportion

2. $\mu$ = a population mean
Ex. What is the mean number of alcoholic drinks a college student has in one week?
Population Parameter: $\mu$ = population mean
Sample Estimate: $\bar{x}$ = sample mean

3. $p_1 - p_2$ = the difference between two population proportions
Ex. How much difference is there between college men and college women with regard to the proportion who use RTD transportation?
Population Parameter: $p_1 - p_2$, where $p_1$ and $p_2$ are the proportions in populations 1 and 2, respectively.
Sample Estimate: $\hat{p}_1 - \hat{p}_2$ = difference between the two sample proportions.

4. $\mu_1 - \mu_2$ = the difference between two population means
Ex. How much difference is there between the grade point averages of UCD females and UCD males?
Population Parameter: $\mu_1 - \mu_2$ (difference between the two population means)
Sample Estimate: $\bar{x}_1 - \bar{x}_2$ (difference between the two sample means)

Special Case: Paired Data
Ex. To compare individuals' driving speeds in the morning with their driving speeds in the evening, a random sample of 100 individuals was selected. Each individual was observed, and a morning driving speed and an evening driving speed were recorded. The difference between each individual's two speeds was then analyzed.
*This matched-pairs experiment is analyzed by estimating a single mean rather than a difference of two means.
*We are analyzing a population of differences: each pair yields a single difference, and we estimate the mean of those differences.

Standard Errors: For each of the four research scenarios we can calculate a standard error (s.e.) for the sample statistic. The s.e. measures the typical size of the difference between the statistic and the population parameter it estimates, that is, roughly how much the statistic would vary from sample to sample.
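Returning to the paired-data special case: each pair is reduced to one difference, and the differences are then treated as a single sample, so the standard error is that of one sample mean, $s_d/\sqrt{n}$, with $n$ the number of pairs. The Python sketch below is illustrative only; the function name `paired_differences_summary` and the four toy speeds in the usage line are hypothetical and are not data from the driving-speed study.

```python
import math
import statistics


def paired_differences_summary(morning, evening):
    """Reduce matched pairs to one sample of differences (hypothetical helper).

    Returns the per-pair differences, their mean, and the standard
    error of that mean, s_d / sqrt(n), where n is the number of pairs.
    """
    if len(morning) != len(evening):
        raise ValueError("each individual needs both a morning and an evening value")
    # One difference per individual: this is the "population of differences".
    diffs = [m - e for m, e in zip(morning, evening)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 divisor)
    return diffs, mean_d, s_d / math.sqrt(n)


# Hypothetical toy speeds (mph) for four drivers, for illustration only.
diffs, mean_d, se_d = paired_differences_summary([62, 58, 65, 60], [60, 57, 61, 60])
print(diffs, round(mean_d, 2), round(se_d, 3))  # prints [2, 1, 4, 0] 1.75 0.854
```

Once the differences are formed, everything that follows for a single mean applies unchanged; the two original columns are no longer analyzed separately.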
*The standard error formula varies across the four scenarios.

Standard Error of a Sample Proportion:
$s.e.(\hat{p}) = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p}$ is the sample proportion.

Standard Error of the Sample Mean:
$s.e.(\bar{x}) = \dfrac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation.

Standard Error of the Difference Between Two Sample Proportions:
$s.e.(\text{Difference}) = \sqrt{[s.e.(\text{Statistic 1})]^2 + [s.e.(\text{Statistic 2})]^2}$
$s.e.(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Standard Error of the Difference Between Two Sample Means:
$s.e.(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

*Notice one common theme: each standard error formula has the sample size in the denominator. Therefore, as the sample size increases, the standard error decreases.

Ex. The Physicians' Health Study Research Group at Harvard Medical School conducted a five-year randomized study of the relationship between aspirin and heart disease. The study subjects were 22,071 male physicians. Every other day, study participants took either an aspirin tablet or a placebo tablet. The physicians were randomly assigned to the aspirin group or to the placebo group, and the study was double-blind. The following table shows the results:

Group      Heart Attack   No Heart Attack   Total
Placebo    189            10,845            11,034
Aspirin    104            10,933            11,037

a) What is the sample proportion suffering a heart attack in each group?
$\hat{p}_1 = 189/11{,}034 = 0.017$ for the placebo group
$\hat{p}_2 = 104/11{,}037 = 0.009$ for the aspirin group

b) What is the estimated difference?
$\hat{p}_1 - \hat{p}_2 = 0.017 - 0.009 = 0.008$

c) What is the standard error of this estimate?
$s.e.(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \sqrt{\dfrac{(0.017)(1-0.017)}{11{,}034} + \dfrac{(0.009)(1-0.009)}{11{,}037}} = 0.0015$

Approximate 95% confidence intervals for all four scenarios:
Sample Estimate +/- 2 x Standard Error

d) From the Physicians and Aspirin example above, compute a 95% confidence interval for the difference between the two proportions.
Our sample estimate is $\hat{p}_1 - \hat{p}_2 = 0.008$.
We calculated the standard error as 0.0015.
95% C.I.
$= 0.008 \pm 2(0.0015) = (0.005,\ 0.011)$

Interpretation: Since both endpoints of the confidence interval are positive, we can infer that $p_1 - p_2$ is positive. This implies that $p_1$ is larger than $p_2$: the population proportion of heart attacks is larger when subjects take the placebo than when they take aspirin.

Researcher Insight: Although the population difference is small, it may be important in public health terms. For instance, projected over a population of 200 million adults (as in the United States), a decrease of 0.01 over a five-year period in the proportion of people suffering heart attacks would mean two million fewer people having heart attacks ($0.01 \times 200{,}000{,}000 = 2{,}000{,}000$)!

Ex. A 30-month study evaluated the degree of addiction to nicotine that teenagers form once they begin experimenting with smoking (Archives of Pediatric and Adolescent Medicine). Random numbers were used to sample 679 seventh-grade students in two Massachusetts cities. Of them, the 332 students who had ever used tobacco by the start of the study were the subjects evaluated. The response variable was constructed using a questionnaire developed for the study. It included questions such as “Have you ever tried to quit but couldn’t?”, “Do you ever have strong cravings to smoke?” and “Is it hard to keep from smoking in places where you are not supposed to, like school?” There were 10 questions in total, and each student was given a score from 0 to 10 (called a HONC score) based on how many questions they answered “yes” to. (The higher the score, the more hooked on nicotine the student is.) The participants were then analyzed by gender, and the following results were obtained:

Group     Sample Size   Mean   Standard Deviation
Females   150           2.8    3.6
Males     182           1.6    2.9

Let females be Group 1 and males be Group 2.

a) Calculate the standard error for the difference of the two means.
$s.e.(\bar{x}_1 - \bar{x}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{3.6^2}{150} + \dfrac{2.9^2}{182}} = 0.364$

b) Construct a 95% C.I.
for the difference between the mean strength of nicotine addiction (HONC score) for females and males.
Our sample estimate is $\bar{x}_1 - \bar{x}_2 = 2.8 - 1.6 = 1.2$.
95% C.I. $= 1.2 \pm 2 \times s.e. = 1.2 \pm 2(0.364) = (0.472,\ 1.928)$

c) Interpret the results with a sentence.
We are 95% confident that the interval from 0.472 to 1.928 covers the difference between the population mean HONC scores of females and males ($\mu_1 - \mu_2$), that is, how much higher the mean score is for females. Since the interval does not contain 0, we can be fairly confident that this difference is not just a feature of this particular sample but also holds for the population.

12.4 General Confidence Intervals for One Mean

t-interval: A confidence interval estimate of the mean of a population.
The general C.I. formula is the same:
Sample Estimate +/- Multiplier x Standard Error

*For the mean:
Sample Estimate = $\bar{x}$
Standard Error = $s.e.(\bar{x}) = \dfrac{s}{\sqrt{n}}$
Multiplier = $t^*$
Degrees of Freedom = $n - 1$

$t^*$ can be determined using the Student's t-distribution table on page 614.

So any C.I. for a population mean is:
$\bar{x} \pm t^* \left(\dfrac{s}{\sqrt{n}}\right)$

Ex. In yesterday's statistics class I surveyed the students to determine the average number of times they ate fast food in the last week. The sample mean was $\bar{x} = 1.4$, the sample standard deviation was $s = 0.45$, and the sample size was $n = 28$. Construct a 98% C.I. for the population mean number of times a statistics student eats fast food in a given week.

To find $t^*$, use the table on page 614 with df = 27 and a 98% confidence level: $t^* = 2.47$.

98% C.I. $= \bar{x} \pm t^*\left(\dfrac{s}{\sqrt{n}}\right) = 1.4 \pm 2.47\left(\dfrac{0.45}{\sqrt{28}}\right) = 1.4 \pm 0.21 = (1.19,\ 1.61)$
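Every interval in this chapter has the same shape: Sample Estimate +/- Multiplier x Standard Error. The Python sketch below encodes that recipe and re-runs the three worked examples as a check. It is a minimal sketch under stated assumptions: the helper names (`se_proportion`, `se_mean`, `se_difference`, `confidence_interval`) are hypothetical, the two-sample cases assume independent samples, the aspirin proportions are used unrounded, and $t^* = 2.47$ is copied from the t table rather than computed.

```python
import math


def se_proportion(p_hat, n):
    """Standard error of a sample proportion: sqrt(p_hat(1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def se_mean(s, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def se_difference(se1, se2):
    """Standard error of a difference of two independent statistics."""
    return math.sqrt(se1 ** 2 + se2 ** 2)


def confidence_interval(estimate, multiplier, se):
    """Sample estimate +/- multiplier x standard error, returned as (lower, upper)."""
    margin = multiplier * se
    return estimate - margin, estimate + margin


# Aspirin study: approximate 95% C.I. for p1 - p2 (multiplier 2).
p1, p2 = 189 / 11034, 104 / 11037
se_aspirin = se_difference(se_proportion(p1, 11034), se_proportion(p2, 11037))
ci_aspirin = confidence_interval(p1 - p2, 2, se_aspirin)

# HONC scores: approximate 95% C.I. for mu1 - mu2 (multiplier 2).
se_honc = se_difference(se_mean(3.6, 150), se_mean(2.9, 182))
ci_honc = confidence_interval(2.8 - 1.6, 2, se_honc)

# Fast-food survey: 98% t-interval; t* = 2.47 read from the t table (df = 27).
ci_fast_food = confidence_interval(1.4, 2.47, se_mean(0.45, 28))

for name, (lo, hi) in [("aspirin", ci_aspirin), ("HONC", ci_honc), ("fast food", ci_fast_food)]:
    print(f"{name}: ({lo:.3f}, {hi:.3f})")
```

Because the aspirin proportions are not rounded to 0.017 and 0.009 first, that interval differs slightly from the hand calculation before rounding, but all three intervals agree with the worked examples once rounded. If SciPy is available, `scipy.stats.t.ppf(0.99, 27)` gives the same multiplier (about 2.473) without a printed table.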