Name__________________________________ AP Statistics Chapter 18: Sampling Distribution Models Notes Suppose I randomly select 100 seniors in Howard County and record each one’s GPA. 1.95 1.98 1.86 2.04 2.75 2.72 2.06 3.36 2.09 2.06 2.33 2.56 2.17 1.67 2.75 3.95 2.23 4.53 1.31 3.79 1.29 3.00 1.89 2.36 2.76 3.29 1.51 1.09 2.75 2.68 2.28 3.13 2.62 2.85 2.41 3.16 3.39 3.18 4.05 3.26 1.95 3.23 2.53 3.70 2.90 2.79 3.08 2.79 3.26 2.29 2.59 1.36 2.38 2.03 3.31 2.05 1.58 3.12 3.33 2.04 2.81 3.94 0.82 3.14 2.63 1.51 2.24 2.22 1.85 1.96 2.05 2.62 3.27 1.94 2.01 1.68 2.01 3.15 3.44 4.00 2.33 3.01 3.15 2.25 3.34 2.22 3.29 3.90 2.96 2.61 3.01 2.86 1.70 1.55 1.63 2.37 2.84 1.67 2.92 3.29 -These 100 seniors make up one possible sample. All seniors in Howard County make up the population. -The sample mean, ðĨĖ is 2.5470 and the sample standard deviation, s, is 0.7150. -The population mean, µ, and the population standard deviation, ð, are unknown. -We can use ðĨĖ to estimate µ and we can use s to estimate ð. These estimates may or may not be reliable. -A number that describes the population is called a parameter. Hence, ï and ïģ are both parameters. A parameter is usually represented by p. -A number that is computed from a sample is called a statistic. Therefore, x and s x are both statistics. A statistic is usually represented by pĖ. **If I had chosen a different 100 seniors, then I would have a different sample, but it would still represent the same population. A different sample almost always produces different statistics.** Example: Let pĖ represent the proportion of seniors in a sample of 100 seniors whose GPA is 2.0 or higher. pˆ1 ï― .78 pˆ 2 ï― .72 p3 ï― .81 pˆ 4 ï― .70 pˆ 5 ï― .68 pˆ 6 ï― .75 pˆ 7 ï― .79 pˆ 8 ï― .72 pˆ 9 ï― .83 pˆ10 ï― .76 If I compare many different samples and the statistic is very similar in each one, then the sampling variability is low. If I compare many different samples and the statistic is very different in each one, then the sampling variability is high. Sampling Model The sampling model of a statistic is a model of the values of the statistic from all possible samples of the same size from the same population. Example: Suppose the sampling model consists of the samples pˆ1 , pˆ 2 ,..., pˆ 9 , pˆ10 . (Note: There are actually many more than ten possible samples.) This sampling model has mean 0.754 and standard deviation 0.049. Sampling Distribution Model for a Proportion Assumptions: 1) Sampled values are independent Condition: 10% Condition 2) Sample size is large enough Condition: Success/Failure Condition (i.e. np ïģ 10 and nq ïģ 10 ) If these assumption/conditions are met, then we can model this with a Normal model were the mean and standard deviation are as follows: ðpĖ = ð ðð ðpĖ = √ ð . Examples: 1. Recent estimates suggest that 16.9% of young people (aged 2–19) are obese. A researcher into childhood obesity obtains a random sample of 60 young people. (a) What is the probability that fewer than 10 of these young people are obese? (b) What is the probability that between 9 and 11 of these young people are obese? 2. A CBS poll asked people if they thought that high school students should be required to learn a foreign language in order to graduate from high school. 51% of the respondents answered “yes.” As a follow-up, a research group has randomly selected 25 people and asked the same question. (a) What is the probability that more than 13 people will answer yes? (b) What is the probability that fewer than 10 people will answer yes? Sampling Distribution Model for Sample Means Must check: 1) Random Sampling Condition: values must be randomly sampled 2) Independence Assumption: sampled values must be mutually independent 3) 10% Condition Central Limit Theorem: As the sample size, n, increases, the mean of n independent values has a sampling distribution that tends toward a Normal model with the following mean and standard deviation: ð ððĨ = ð ððĨ = ð √ Examples: 1.) A new drug designed boost the immune system of HIV infected patients is being tested. Using the drug for three years results in a mean change in CD4+ cells (more of these cells indicate a healthier immune system) was 94.9 million cells per liter of blood, with a standard deviation of 169.0414. A sample of 127 HIV infected patients is selected to use this drug for a three year period, and their CD4+ cell count change is measured at the conclusion. What is the probability that between the patients will have a mean change of 110 million cells per liter of blood or more? 2.) Studies suggest that adult males are missing 3.49 teeth on average, with a standard deviation of 12.459. The population distribution of number of missing teeth amongst males is clearly skew right. A random sample of 100 males is obtained and the number of missing teeth measured. (a) What is the probability that group will have a mean of 1 missing tooth or less? (b) 10% of samples will have a mean of ____ missing teeth or more. What number fills in the blank?