AP Statistics Chapter 23 Notes
“Inference about Sample Means”

Introduction to Sample Means

We’ve been working with confidence intervals and hypothesis tests about proportions (% of the population, p); now we want to do the same thing for means (average amount for the population, μ).

The Central Limit Theorem tells us that we can still use a Normal Model for means, no matter what shape population the data came from (as long as the conditions are met).

The assumptions and conditions

Independence – must have independent events
Random sample – must be a simple random sample
10% condition – population must be at least 10 times the sample size
Nearly Normal condition
– For very small samples (<40), always check your data in a histogram to make sure the data is unimodal and roughly symmetric.
– For larger samples (>40), proceed with hypothesis testing even if the data is skewed or has outliers.
– You do not have to check a histogram.

Standard Error

For proportions we used: $SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
For means we will use: $SE(\bar{y}) = \dfrac{s}{\sqrt{n}}$, where s is the standard deviation (std) of the sample

The T-Model

With smaller samples we need a little extra variation and a little more margin of error than the Normal Model allows.

The T-Model, created by William Gosset, uses a whole family of corrected “Normal Models” with fatter tails to correct this problem of a small sample size.

(Figure: solid curve = t-model with 2 degrees of freedom; dotted curve = Normal model)

More about the T-Model

Gosset’s T-Model uses degrees of freedom to determine how fat the tails should be. The smaller the sample size, the fatter the tails. As the sample size increases, the tails shrink closer to the tails of the Normal Model, and as the sample size approaches infinity, the T-Model becomes the same as the Normal Model.
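To see the fatter tails in numbers, here is a minimal Python sketch that computes the area to the right of 1.645 under the Normal Model and under T-Models with several degrees of freedom. The helper names (`normal_upper_tail`, `t_pdf`, `t_upper_tail`) are made up for this illustration, and the t area is approximated with Simpson's rule rather than the calculator's built-in tcdf, so treat it as a rough check and not as the calculator's method.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard Normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def t_pdf(x, df):
    """Height of the t-model curve with df degrees of freedom at x."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t): 0.5 minus the area from 0 to t (Simpson's rule, steps must be even)."""
    if t < 0:
        return 1 - t_upper_tail(-t, df, steps)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 0.5 - total * h / 3)

# Area to the right of 1.645: Normal model first, then t-models
print("Normal model:", round(normal_upper_tail(1.645), 4))
for df in (2, 12, 25, 100):
    print("t-model, df =", df, ":", round(t_upper_tail(1.645, df), 4))
```

The printed t areas should all be larger than the Normal area and should shrink toward it as the degrees of freedom increase, which is the "fatter tails" idea in numerical form.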
Compare the Normal Model to the T-Model (n – 1 degrees of freedom)

Find normalcdf(1.645, 99)
Find tcdf(1.645, 99, 12)
Find tcdf(1.645, 99, 25)
Find tcdf(1.645, 99, 100)

Example – With Data

In 2000 the Census Bureau reported that the average life expectancy for a person in the United States had increased beyond 77 years. Insurance companies track life expectancy information to assist in determining the cost of life insurance policies. The insurance company wants to know if its clients have also started living longer, so it randomly selects a sample of recently paid policies to see if the mean life expectancy of those policyholders has increased. The insurance company will change its premium rates if there is evidence that people who buy its policies are living longer. Does this sample indicate that the insurance company should increase its premiums? Test the hypotheses and state your conclusion.

86 76 75 85 83 70 84 76 81 79 77 81 78 73 79 74 79 72 81 83

The Solution Process (STAT TESTS #2 T-Test)

State Ho and Ha

Check the conditions
– It is a random sample – given information
– The events are independent – we assume the length of one person’s life is not affected by the length of another person’s life in this situation
– The population of all policyholders with this insurance company must be at least 200 people.
– The sample size is under 40 so we need to check a histogram of the data. The histogram of the data looks unimodal and roughly symmetric. (draw it here)

Use a T-model with how many degrees of freedom?

Calculations:
SE =
p-value =

Conclusion:

Confidence Intervals (STAT TESTS #8 T-Interval)

Find the 95% confidence interval and explain what it means in context. Does the confidence interval support your conclusion? Why?

Example – With Stats

According to a newspaper article, the national average math SAT score in 2010 was 516.
A certain teacher wants to see if the students in her school are performing higher than the national average, so she collects a random sample of 50 students’ SAT math scores in her school. She calculates the average from her sample to be 550 with a standard deviation of 36 points. Is this sufficient evidence to conclude that students in her school are performing higher than the national average?

The Solution Process (STAT TESTS #2 T-Test)

State Ho and Ha

Check the conditions
– It is a random sample – given information
– The events are independent – we assume one student’s SAT math score is not influenced by another student’s SAT math score
– The population of all students at this high school must be at least 500.
– The sample size is over 40 so we don’t need to check a histogram of the data.

Use a T-model with how many degrees of freedom?

Calculations:
SE =
p-value =

Conclusion:
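One way to check the calculator work for both examples is a short Python sketch of the one-sample t-test. It assumes an upper-tailed alternative (Ha: mean is greater than the stated value), which matches the wording of both questions. The function names (`t_pdf`, `t_upper_tail`, `one_sample_t`) are invented for this illustration, and the p-value is approximated with Simpson's rule instead of the calculator's T-Test routine.

```python
import math
import statistics

def t_pdf(x, df):
    """Height of the t-model curve with df degrees of freedom at x."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t): 0.5 minus the area from 0 to t (Simpson's rule, steps must be even)."""
    if t < 0:
        return 1 - t_upper_tail(-t, df, steps)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 0.5 - total * h / 3)

def one_sample_t(xbar, s, n, mu0):
    """Upper-tailed one-sample t-test from summary statistics (assumed Ha: mean > mu0)."""
    se = s / math.sqrt(n)      # standard error of the sample mean
    t = (xbar - mu0) / se      # test statistic
    df = n - 1                 # degrees of freedom
    return {"se": se, "t": t, "df": df, "p_value": t_upper_tail(t, df)}

# Example with stats: SAT math scores (n = 50, mean 550, std 36, Ho mean 516)
sat = one_sample_t(xbar=550, s=36, n=50, mu0=516)
print("SAT:", sat)

# Example with data: ages from the 20 sampled policies (Ho mean 77)
ages = [86, 76, 75, 85, 83, 70, 84, 76, 81, 79,
        77, 81, 78, 73, 79, 74, 79, 72, 81, 83]
life = one_sample_t(statistics.mean(ages), statistics.stdev(ages), len(ages), mu0=77)
print("Life expectancy:", life)
```

Compare each printed p-value with the significance level you chose before writing the conclusion in context. The values should agree closely with STAT TESTS #2 T-Test, up to rounding.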