Chapter 18: Sampling Distribution Models
Bin Zou (bzou@ualberta.ca), STAT 141, University of Alberta, Winter 2015

Population VS Sample

Example 18.1
Suppose a hospital has a total of 10,000 patients, and 7,000 of them like to play basketball. A sample of 200 patients is selected from this hospital, and 128 of them like to play basketball. Find the proportion of patients who like to play basketball in the population and in the sample.
(Population: p = 7,000/10,000 = 0.70. Sample: p̂ = 128/200 = 0.64.)

A population parameter is a numerical measure of the given population, e.g., its mean, median, or variance.
A sample statistic is a summary measure calculated from a sample data set.
Remark: a sample statistic is a random variable, since its value varies from sample to sample. The distribution of a sample statistic is called its sampling distribution.

Sample Proportion

Population parameter: proportion p, e.g., the proportion of students in a class who pass the final.
Sample statistic: proportion p̂. Instead of considering all students, we select a sample to investigate, so p̂ depends on the chosen sample.
Assume we select a sample of n students, and k of them passed the exam. Hence, $\hat{p} = \frac{k}{n}$. (Different k and/or n give different p̂.)
Question: assuming the true population parameter p is known, what is the distribution of p̂ (that is, its sampling distribution)?

Central Limit Theorem for Sample Proportion

CLT: Given p (population proportion) and n (sample size), the sampling distribution of p̂ is approximately
$N\left(p, \sqrt{\frac{pq}{n}}\right)$, where q = 1 − p,
that is, p̂ approximately follows a normal distribution with mean p and standard deviation $\sqrt{\frac{pq}{n}}$.

Assumptions and Conditions for CLT

1 Independence Assumption: The sampled values must be independent of each other.
2 Sample Size Assumption: The sample size, n, must be large enough.
3 Randomization Condition: The sample should be a simple random sample of the population.
4 10% Condition: The sample size, n, must be no larger than 10% of the population.
5 Success/Failure Condition: n · p ≥ 10 and n · q ≥ 10.

Example 18.2
A study shows that the proportion of people aged 20 to 34 with an IQ over 120 is about 0.35. We randomly choose a sample of 50 people aged between 20 and 34. What is the probability that more than 30 of them have an IQ over 120?
Check the assumptions (independence, large sample) and the conditions (randomization; 10%; success/failure, where n · p = 17.5 and n · q = 32.5 are both at least 10).
Identify: p = 0.35 and n = 50 (q = 1 − p = 0.65).
Sampling distribution: $\hat{p} \sim N\left(0.35, \sqrt{0.35 \times 0.65 / 50} = 0.06745\right)$.
“More than 30 people with IQ over 120” corresponds to p̂ > 30/50 = 0.6; under the normal model this has the same probability as p̂ ≥ 0.6.
$P(\hat{p} \ge 0.6) = P\left(Z \ge \frac{0.6 - 0.35}{0.06745} = 3.71\right) \approx 0.0001$.

Sample Mean

Sampling distribution for Mean: When a simple random sample of size n is drawn from a population with mean µ and standard deviation σ, its sample mean ȳ has a sampling distribution with the same mean µ but a new standard deviation, $\frac{\sigma}{\sqrt{n}}$.
If the population is normally distributed, the sampling distribution is exactly normal. If not, the sampling distribution is only approximately normal, and the larger the sample size, the closer the approximation.
Remark: when we estimate the standard deviation of a sampling distribution, we call the estimate a standard error.

Assumptions and Conditions for CLT

1 Independence Assumption: The sampled values must be independent of each other.
2 Sample Size Assumption: The sample size, n, must be large enough.
3 Randomization Condition: The sample should be a simple random sample of the population.
4 10% Condition: The sample size, n, must be no larger than 10% of the population.
5 Large Enough Sample Condition: n ≥ 30.

Example 18.3
The scores of students on the ACT college entrance exam have a normal distribution with µ = 18.6 and σ = 5.9.
(a) What is the probability that one randomly chosen student scores 21 or higher?
(b) Now take a simple random sample of 50 students who took the test. What are the mean and standard deviation of ȳ, and what is the shape of its sampling distribution?
(c) What is the probability that the sample mean is 21 or higher?

Example 18.4
The duration of a disease from the onset of symptoms until death ranges from 3 to 20 years. The mean is 8 years and the standard deviation is 4 years. Looking at the average duration for 30 randomly selected patients, calculate the mean and standard deviation of ȳ and describe the shape of its sampling distribution. What is the probability that the average duration of those 30 patients is less than 7 years?
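
Examples 18.3 and 18.4 can be checked numerically. The following is a minimal sketch, not part of the original slides, that uses `NormalDist` from Python's standard library (Python 3.8 or later). The helper names `sampling_dist_of_mean` and `upper_tail` are hypothetical and introduced only for illustration. For Example 18.4 the population is not stated to be normal, so the result relies on the CLT approximation with n = 30.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def sampling_dist_of_mean(mu: float, sigma: float, n: int) -> NormalDist:
    """Normal model for the sample mean: mean mu, standard deviation sigma / sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n))


def upper_tail(dist: NormalDist, x: float) -> float:
    """P(X >= x) under the given normal model."""
    return 1.0 - dist.cdf(x)


# Example 18.3: ACT scores, normal population with mu = 18.6 and sigma = 5.9.
act = NormalDist(18.6, 5.9)
p_one_student = upper_tail(act, 21)               # (a) one student scores 21 or higher
act_mean = sampling_dist_of_mean(18.6, 5.9, 50)   # (b) model for the sample mean, n = 50
p_mean_21 = upper_tail(act_mean, 21)              # (c) sample mean is 21 or higher

# Example 18.4: disease duration with mu = 8 and sigma = 4, sample of n = 30.
# Assumption: the normal model is only approximate here (CLT, n >= 30).
duration_mean = sampling_dist_of_mean(8, 4, 30)
p_under_7 = duration_mean.cdf(7)                  # P(sample mean < 7)

if __name__ == "__main__":
    print(f"18.3(a) P(y >= 21)     = {p_one_student:.4f}")
    print(f"18.3(b) mean, SD       = {act_mean.mean:.1f}, {act_mean.stdev:.4f}")
    print(f"18.3(c) P(ybar >= 21)  = {p_mean_21:.4f}")
    print(f"18.4    mean, SD       = {duration_mean.mean:.1f}, {duration_mean.stdev:.4f}")
    print(f"18.4    P(ybar < 7)    = {p_under_7:.4f}")
```

Under these assumptions the probabilities come out to roughly 0.34 for 18.3(a), 0.002 for 18.3(c), and 0.085 for 18.4. The drop from (a) to (c) shows how the standard deviation σ/√n shrinks the spread of ȳ relative to a single observation.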