Chapter 18 Sampling Distribution Models

Review

Notation
µ : population mean (real or assumed average)
x̄ : sample mean (estimated average)
p : population proportion (real or assumed proportion of successes)
p̂ : sample proportion (estimated proportion of successes)

• Why do we use a sample?
- To draw general conclusions about the whole population.
- A census is impractical.
• Which has a larger standard deviation, individual grades on a test or average grades over several tests?
- Individual grades have more variation.
- Average grades cluster more tightly around the center (they look more like the Normal model and have a smaller SD).

Review drawing the 68-95-99.7 rule for this distribution of test grades: N(70, 10)

Dice Experiment
http://www.math.uah.edu/stat/apps/DiceExperiment.html

Means – Averaging More Dice
• The average of two dice after a simulation of 10,000 tosses looks like: [figure]
• The average of three dice after a simulation of 10,000 tosses looks like: [figure]

Means – Averaging Still More Dice
• The average of 5 dice after a simulation of 10,000 tosses looks like: [figure]
• The average of 20 dice after a simulation of 10,000 tosses looks like: [figure]

The Fundamental Theorem of Statistics
• The sampling distribution (shape and spread) of a statistic that is an average or a proportion becomes more nearly Normal as the sample size grows.
• This is called the Central Limit Theorem (CLT).

Sampling Distribution for p̂ : % of Americans who believe in ghosts
• Distribution Model for p̂
– Mean: $E(\hat{p}) = p$
– Standard deviation: $SD(\hat{p}) = \sqrt{\frac{pq}{n}}$, where $q = 1 - p$
– Model: $N\left(p, \sqrt{\frac{pq}{n}}\right)$

A picture of what we just discussed is as follows: [figure]

Assumptions for proportions
1. Randomization Condition: the sample should be a simple random sample.
2. 10% Condition: the sample size, n, must be no larger than 10% of the population.
3. Success/Failure Condition: both np (expected number of successes) and nq (expected number of failures) are at least 10.
4. Independence: the sampled values must be independent of each other.

Sampling Distribution for x̄ : SAT scores for all HS students in US
• Distribution Model for x̄
– Mean: $E(\bar{x}) = \mu$
– Standard deviation: $SD(\bar{x}) = \frac{\sigma}{\sqrt{n}}$
– Model: $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

Assumptions for means
1. Randomization Condition: the data values must be sampled randomly.
2. 10% Condition: the sample size, n, should be no more than 10% of the population.
3. Large Enough Sample Condition: the sample must be large enough. The CLT doesn't tell us how large, so this is a judgment call.
4. Independence: the sampled values must be independent of each other.

The Process Going Into the Sampling Distribution Model

Why use these models?

What Can Go Wrong?
• Don't confuse the sampling distribution with the distribution of the sample.
– When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics.
– The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn't get.
• Beware of observations that are not independent.
– The CLT depends crucially on the assumption of independence.
– You can't check this with your data. You have to think about how the data were gathered.
• Watch out for small samples from skewed populations.
– The more skewed the distribution, the larger the sample size we need for the CLT to work.

Chapter 18 Assignment
Pg. 432: #1, 3, 11, 15, 17, 23, 25, 31, 33, 37, 43 (omit b), 45, 47
Show work!
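
Optional: trying the dice experiment yourself

The dice simulation from the Means slides can be sketched in Python. This is a minimal sketch, not the applet's own code: the function names (simulate_dice_means, summarize), the fixed seed, and the use of fair six-sided dice are assumptions made for illustration. It averages n dice 10,000 times and compares the simulated standard deviation of those averages with the model value σ/√n, where σ = √(35/12) is the standard deviation of a single fair die.

```python
import random
import statistics

# Population values for one fair six-sided die (assumed fair).
DIE_MEAN = 3.5
DIE_SD = (35 / 12) ** 0.5  # sqrt of the variance (6**2 - 1) / 12


def simulate_dice_means(num_dice, num_trials=10_000, seed=0):
    """Return num_trials averages, each taken over num_dice fair dice."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_trials)
    ]


def summarize(num_dice, num_trials=10_000, seed=0):
    """Compare simulated mean and SD of the averages with the model."""
    means = simulate_dice_means(num_dice, num_trials, seed)
    return {
        "n": num_dice,
        "simulated_mean": statistics.fmean(means),
        "simulated_sd": statistics.pstdev(means),
        "model_sd": DIE_SD / num_dice ** 0.5,  # sigma / sqrt(n)
    }


if __name__ == "__main__":
    # Same numbers of dice as the slides: 2, 3, 5 and 20 (plus 1 for reference).
    for n in (1, 2, 3, 5, 20):
        s = summarize(n)
        print(
            f"n={s['n']:>2}  mean={s['simulated_mean']:.3f}  "
            f"simulated SD={s['simulated_sd']:.3f}  model SD={s['model_sd']:.3f}"
        )
```

If the model holds, the simulated mean should stay near 3.5 for every n, while the simulated SD should shrink roughly like σ/√n as more dice are averaged. A histogram of the returned averages for n = 20 should look much more Normal than the one for n = 2.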