Sampling Distributions
Central Limit Theorem

Distribution of Sample Means
• Consider the following data as a population: 2, 4, 6, 8
  – The population mean is 5
  – The population standard deviation is 2.236
• Now we are going to take ALL possible samples of n = 2 from this population.
• We will calculate the mean for each sample.

Sampling Distribution of Means for Samples of n = 2

Pick 1   Pick 2   Mean   Mean²   Variance   Standard Deviation
2        2        2      4       0          0.000
2        4        3      9       2          1.414
2        6        4      16      8          2.828
2        8        5      25      18         4.243
4        2        3      9       2          1.414
4        4        4      16      0          0.000
4        6        5      25      2          1.414
4        8        6      36      8          2.828
6        2        4      16      8          2.828
6        4        5      25      2          1.414
6        6        6      36      0          0.000
6        8        7      49      2          1.414
8        2        5      25      18         4.243
8        4        6      36      8          2.828
8        6        7      49      2          1.414
8        8        8      64      0          0.000
Total             80     440

Central Limit Theorem Applied (page 337)
Results from a survey of students who were asked how many hours they spend per week using a search engine on the Internet.
Population: 400 students, μ = 3.88, σ = 2.40

Sample 1
A sample of 32 students selected from the 400 students surveyed.
1.1 7.0 6.8 7.8 3.8 6.5 6.8 5.7
1.7 4.9 6.5 2.1 3.0 2.7 1.2 6.5
2.6 0.3 5.2 1.4 0.9 2.2 7.1 2.4
5.1 5.5 2.5 3.4 3.1 7.8 4.7 5.0
The mean of this sample is x̄ = 4.17.

Sample 2
A different sample of 32 students selected from the 400.
1.8 3.6 5.0 0.4 5.2 3.1 4.0 5.7
0.5 2.4 6.5 3.9 0.8 1.2 3.1 6.2
5.4 5.8 0.8 5.7 2.9 6.6 7.2 7.2
5.7 5.1 0.9 7.9 3.2 4.0 2.5 3.1
For this sample, x̄ = 3.98.

Now you have two sample means that don't agree with each other, and neither one agrees with the true population mean. Figure 8.6 shows a histogram that results from 100 different samples, each with 32 students. Notice that this histogram is very close to a normal distribution and its mean is very close to the population mean, μ = 3.88.

Figure 8.6 A distribution of 100 sample means, with a sample size of n = 32, appears close to a normal distribution with a mean of 3.88.
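The exhaustive sampling of the population 2, 4, 6, 8 shown in the n = 2 table can be reproduced with a short script. This is only a sketch in Python (the slides themselves do not use code, and the variable names are illustrative). It treats the 16 rows as ordered picks with replacement, which is what the table implies, and checks that the mean of the sample means equals the population mean and that their standard deviation equals σ/√n.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]
n = 2  # sample size used in the table

# All ordered samples of size n drawn with replacement: 4 * 4 = 16 rows.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(population)        # population mean (5)
sigma = pstdev(population)   # population standard deviation (about 2.236)

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)  # spread of the 16 sample means

print("number of samples:", len(samples))
print("sum of sample means:", sum(sample_means))
print("mean of sample means:", mean_of_means)
print("SD of sample means:", round(sd_of_means, 3))
print("sigma / sqrt(n):", round(sigma / sqrt(n), 3))
```

The last two printed values agree, which is the relationship the Central Limit Theorem slides rely on when they divide σ by √n.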
Central Limit Theorem application
If we were to include all possible samples of size n = 32, this distribution would have these characteristics:
• The distribution of sample means is approximately a normal distribution.
• The mean of the distribution of sample means is 3.88 (the mean of the population).
• The standard deviation of the distribution of sample means depends on the population standard deviation and the sample size. The population standard deviation is σ = 2.40 and the sample size is n = 32, so the standard deviation of sample means is
\[ \frac{\sigma}{\sqrt{n}} = \frac{2.40}{\sqrt{32}} \approx 0.42 \]

Margin of Error for the Mean
The margin of error for the 95% confidence interval is
\[ \text{margin of error} = E \approx \frac{2s}{\sqrt{n}} \]
where s is the standard deviation of the sample and n is the sample size.
We find the 95% confidence interval by adding and subtracting the margin of error from the sample mean. That is, the 95% confidence interval ranges from (x̄ – margin of error) to (x̄ + margin of error).
We can write this confidence interval more formally as
\[ \bar{x} - E < \mu < \bar{x} + E \]
or more briefly as
\[ \bar{x} \pm E \]

Interpreting the Confidence Interval
Figure 8.10 This figure illustrates the idea behind confidence intervals. The central vertical line represents the true population mean, μ. Each of the 20 horizontal lines represents the 95% confidence interval for a particular sample, with the sample mean marked by the dot in the center of the confidence interval.
With a 95% confidence interval, we expect that 95% of all samples will give a confidence interval that contains the population mean. That is the case in this figure: 19 of the 20 confidence intervals do contain the population mean. We expect that the population mean will not be within the confidence interval in 5% of the cases; here, 1 of the 20 confidence intervals (the sixth from the top) does not contain the population mean.
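The two formulas in this section, σ/√n for the spread of sample means and E ≈ 2s/√n for the margin of error, can be sketched in Python as follows. The function names are illustrative assumptions, not part of the course materials, and the data are the 32 values of Sample 1 from the search-engine survey.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(sigma, n):
    """Standard deviation of the distribution of sample means: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def confidence_interval_95(sample):
    """Approximate 95% interval x_bar +/- E, with E = 2s / sqrt(n)."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)            # sample standard deviation (n - 1 divisor)
    margin = 2 * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Sample 1: hours per week spent using a search engine (32 students).
sample_1 = [
    1.1, 7.0, 6.8, 7.8, 3.8, 6.5, 6.8, 5.7,
    1.7, 4.9, 6.5, 2.1, 3.0, 2.7, 1.2, 6.5,
    2.6, 0.3, 5.2, 1.4, 0.9, 2.2, 7.1, 2.4,
    5.1, 5.5, 2.5, 3.4, 3.1, 7.8, 4.7, 5.0,
]

print("standard error for sigma = 2.40, n = 32:", round(standard_error(2.40, 32), 2))
low, high = confidence_interval_95(sample_1)
print("sample mean:", round(mean(sample_1), 2))
print("95% confidence interval:", round(low, 2), "to", round(high, 2))
```

For this particular sample the interval contains the population mean μ = 3.88, so it would be one of the "19 out of 20" intervals described for Figure 8.10.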
Using StatCrunch: Confidence Intervals
• In the data set, select:
  – STAT
  – Z Statistics
  – One-Sample
  – With Data
  – Select Variable
  – Click Next
  – Select confidence interval and percent
  – Calculate

Determine Minimum Sample Size
• Solve the margin of error formula [E = 2s/√n] for n:
\[ n = \left( \frac{2s}{E} \right)^2 \]
• When the sample standard deviation s is not yet available, an estimate of the population standard deviation σ is used in its place, as in the example below.

EXAMPLE Constructing a Confidence Interval
You want to study housing costs in the country by sampling recent house sales in various (representative) regions. Your goal is to provide a 95% confidence interval estimate of the housing cost. Previous studies suggest that the population standard deviation is about $7,200. What sample size (at a minimum) should be used to ensure that the sample mean is within
a. $500 of the true population mean?
b. $100 of the true population mean?

Solution:
a. With E = $500 and σ estimated as $7,200, the minimum sample size that meets the requirements is
\[ n = \left( \frac{2\sigma}{E} \right)^2 = \left( \frac{2 \times 7{,}200}{500} \right)^2 = 28.8^2 \approx 829.4 \]
Because the sample size must be a whole number, we conclude that the sample should include at least 830 prices.
b. With E = $100 and σ = $7,200, the minimum sample size that meets the requirements is
\[ n = \left( \frac{2\sigma}{E} \right)^2 = \left( \frac{2 \times 7{,}200}{100} \right)^2 = 144^2 = 20{,}736 \]
Notice that to decrease the margin of error by a factor of 5 (from $500 to $100), we must increase the sample size by a factor of 25. That is why achieving greater accuracy generally comes with a high cost.
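The sample size calculation in the housing example can be checked with a few lines of Python. This is a minimal sketch; the function name `min_sample_size_mean` is an assumption made for illustration, and the rounding up reflects the slide's point that the sample size must be a whole number.

```python
from math import ceil

def min_sample_size_mean(sigma, margin):
    """Smallest whole n with 2 * sigma / sqrt(n) <= margin, i.e. n >= (2 * sigma / margin)^2."""
    return ceil((2 * sigma / margin) ** 2)

sigma = 7200  # estimated population standard deviation, in dollars

n_500 = min_sample_size_mean(sigma, 500)  # part a
n_100 = min_sample_size_mean(sigma, 100)  # part b

print("within $500:", n_500)
print("within $100:", n_100)
print("ratio of exact values:", (2 * sigma / 100) ** 2 / (2 * sigma / 500) ** 2)
```

Because n depends on 1/E², shrinking the margin of error by a factor of 5 multiplies the required sample size by about 25.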
Distribution of Sample Proportions (page 340)

Sample Proportions
In a survey where 400 students were asked if they own a car, 240 replied that they did. The exact proportion of car owners is
\[ p = \frac{240}{400} = 0.6 \]
This population proportion, p = 0.6, is another example of a population parameter.

95% Confidence Interval for a Population Proportion
For a population proportion, the margin of error for the 95% confidence interval is
\[ E \approx 2 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \]
where p̂ is the sample proportion and n is the sample size.
The 95% confidence interval ranges from (p̂ – margin of error) to (p̂ + margin of error).
We can write this confidence interval more formally as
\[ \hat{p} - E < p < \hat{p} + E \]

Choosing the Correct Sample Size
In order to estimate a population proportion with a 95% degree of confidence and a specified margin of error of E, the size of the sample should be at least
\[ n \ge \frac{1}{E^2} \]

EXAMPLE TV Nielsen Ratings
The Nielsen ratings for television use a random sample of households. A Nielsen survey results in an estimate that a women's World Cup soccer game had 72.3% of the entire viewing audience. Assuming that the sample consists of n = 5,000 randomly selected households, find the margin of error and the 95% confidence interval for this estimate.

Solution:
The sample proportion, p̂ = 72.3% = 0.723, is the best estimate of the population proportion. The margin of error is
\[ E \approx 2 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 2 \sqrt{\frac{0.723(1 - 0.723)}{5{,}000}} \approx 0.013 \]
The 95% confidence interval is
0.723 – 0.013 < p < 0.723 + 0.013, or
0.710 < p < 0.736
With 95% confidence, we conclude that between 71.0% and 73.6% of the entire viewing audience watched the women's World Cup soccer game.

EXAMPLE Minimum Sample Size for Survey
You plan a survey to estimate the proportion of students on your campus who carry a cell phone regularly. How many students should be in the sample if you want (with 95% confidence) a margin of error of no more than 4 percentage points?

Solution:
Note that 4 percentage points means a margin of error of 0.04.
From the given formula, the minimum sample size is
\[ n = \frac{1}{E^2} = \frac{1}{0.04^2} = 625 \]
You should survey at least 625 students.

Core Logic of Hypothesis Testing
• Considers the probability that the result of a study could have come about if the experimental procedure had no effect
• If this probability is low, the scenario of no effect is rejected and the theory behind the experimental procedure is supported

Hypothesis Testing Using Confidence Intervals
• State the claim about the population mean.
• Determine the desired confidence level.
• Select a random sample from the population.
• Calculate the confidence interval for the desired level of confidence.
• If the claim is contained within the interval, the claim is reasonable; if it is not within the interval, the claim is not reasonable, at the given level of confidence.
See the Testing a Claim document in Doc Sharing.
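The confidence-interval test described above can be sketched in Python. The example reuses the Nielsen proportion from earlier in these notes (p̂ = 0.723, n = 5,000) because the same "is the claim inside the interval?" check applies to a proportion as to a mean. The two claimed values (0.72 and 0.75) are hypothetical, chosen only to show one claim inside the interval and one outside, and the function names are illustrative assumptions.

```python
from math import sqrt

def proportion_margin_95(p_hat, n):
    """Approximate 95% margin of error for a proportion: 2 * sqrt(p_hat * (1 - p_hat) / n)."""
    return 2 * sqrt(p_hat * (1 - p_hat) / n)

def claim_is_reasonable(claimed_value, estimate, margin):
    """True when the claimed value lies strictly inside estimate +/- margin."""
    return estimate - margin < claimed_value < estimate + margin

p_hat = 0.723   # sample proportion from the Nielsen example
n = 5000        # number of households sampled

margin = proportion_margin_95(p_hat, n)
print("margin of error:", round(margin, 3))
print("interval:", round(p_hat - margin, 3), "to", round(p_hat + margin, 3))

# Hypothetical claims, for illustration only.
for claim in (0.72, 0.75):
    verdict = "reasonable" if claim_is_reasonable(claim, p_hat, margin) else "not reasonable"
    print("claim p =", claim, "is", verdict, "at the 95% level")
```

A claim that falls outside the interval is judged not reasonable at the 95% confidence level, not proven false; a different sample could produce a different interval.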