"It has been proved beyond a shadow of a doubt that smoking is one of the leading causes of statistics." Fletcher Knebel Chapter 18: SAMPLING DISTRIBUTION MODELS (Pages 410 - 431) OVERVIEW: One examines samples in order to come to reasonable conclusions about the population from which the sample is chosen. One must be statistically literate in order to glean meaningful information from a sample. This involves an awareness of what the sample results tell us, along with what they don't tell us. A statistic calculated from a sample may suffer from bias or high variability, and hence not represent a good estimate of a population parameter. Ideal Situation: Another Route: Parameter: Statistic: Quanity Mean Std. Deviation Proportion Statistic Parameter Will all samples of the same size give us the same statistics? Sampling Variability( Sampling Error): Why we consider a sample statistic to be a random variable w/ a distribution. Sampling distribution of a statistic: The following example demonstrates some of the statistical concepts developed in this section. Consider the three element population P ={1,2,3}. The mean of P is The standard deviation of P is = The variance of P is 2 = These values are parameters, since they are derived from a population. Now, consider all possible samples of size 2, with replacement. There would be 32 = 9 such samples. The Sampling Distribution Model, which contains the stats for every possible sample of size 2, looks like this: Sample Sample Mean = x Sample Var. = s2 Sample St.Dev.= s 1,1 1 0 0 1,2 1.5 .5 .707107 1,3 2 2 1.4142 2,1 1.5 .5 .707107 2,2 2 0 0 2,3 2.5 .5 .707107 3,1 2 2 1.4142 3,2 2.5 .5 .707107 3,3 3 0 0 MEANS A statistic is unbiased if the mean of the sampling model is equal to the true value of the parameter being estimated. Dist. of Means The table shows that... -the mean of the sampling distribution model means is the mean ( )of the population. 
This illustrates that a sample mean is an unbiased statistic/estimator of the population mean. (The distribution of sample means "centers" around the mean of the population.)
- The mean of the sampling model variances (s²) is equal to the variance (σ²) of the population. This illustrates that a sample variance (s²) is an unbiased statistic/estimator of the population variance. (The distribution of sample variances "centers" around the variance of the population.)
- A sample standard deviation is not an unbiased estimator of the population standard deviation. In this example, the mean of the sample standard deviations is 0.628539, and the standard deviation of the population is σ = 0.81649658. (The distribution of sample standard deviations does "not center" around the standard deviation of the population.)

* Note: An individual value of an unbiased statistic may fall above or below the true value of the parameter, but the mean of the statistic's sampling distribution is equal to the true value of the parameter.

How does knowing how sampling distribution models behave help us? If a few assumptions are met by checking some conditions, then we are allowed to apply the Normal Model to our information.

Assumptions and Conditions:
1. Independence Assumption: The sampled values must be independent. It is hard to be sure of this unless the problem tells us, so we check two conditions instead:
   Randomization Condition:
   10% Condition:
2. Large Enough Sample Condition: If the original population distribution is Normal, then the sample can be small. If the original population distribution is skewed, then we need a bigger sample.

When these conditions are met, we can apply the Normal Model. The justification is the Central Limit Theorem (CLT), which is often referred to as the Fundamental Theorem of Statistics.

CLT: The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model.
The larger the sample, the better the approximation will be.

*Even cooler than that, the CLT says that no matter what the shape of the original distribution is, as the sample size increases, the sampling distribution model of the mean gets closer and closer to Normal.

Let's investigate some pennies!

So the Central Limit Theorem says that the sampling distribution of any mean is approximately Normal, provided the sample is large enough.

The sampling distribution model for a mean
When a random sample of size n is drawn from any population with mean μ and standard deviation σ, its sample mean, x̄, has a sampling distribution with the same mean μ, but with standard deviation SD(x̄) = σ/√n. Therefore, the model we use is N(μ, σ/√n).

Ex1. SAT scores should have mean 500 and standard deviation 100. What about the mean of random samples of 20 students?

Solution:
Think – We are interested in the distribution of possible means from samples of SAT scores from 20 students. SAT scores have a mean of 500 and a standard deviation of 100, and since the SAT is standardized, it's reasonable to assume that the model for all SAT scores is Normal.
Independence Assumption: It's reasonable to think that the SAT scores of the 20 randomly sampled students will be independent, as long as the students weren't all from the same university.
Random Sampling Condition: The 20 students were sampled randomly.
10% Condition: 20 students represent less than 10% of all students.
Big Enough Sample Assumption: The histogram of the population is unimodal and roughly symmetric, so n = 20 is large enough for the CLT to take effect.
Under these conditions, the sampling distribution of x̄ has a Normal model with mean 500 and standard deviation SD(x̄) = 100/√20 ≈ 22.36.
Show –
Tell –

Ex2. Speeds of cars on a highway have mean 52 mph and standard deviation 6 mph, and are likely to be skewed to the right (a few very fast drivers). Describe what we might see in random samples of 50 cars.

Solution:
Think – We are interested in the distribution of possible means from samples of speeds from 50 cars on the highway.
Speeds have a mean of 52 mph and a standard deviation of 6 mph, with a distribution that is skewed to the right.
Independence Assumption: It's reasonable to think that the speeds of the 50 randomly sampled cars will be independent.
Random Sampling Condition: The 50 speeds were sampled randomly.
10% Condition: 50 cars represent less than 10% of all cars.
Big Enough Sample Assumption: Even though the distribution is skewed, the Central Limit Theorem applies, since the sample size, 50 cars, is large.
Under these conditions, the sampling distribution of x̄ has a Normal model with mean 52 mph and standard deviation SD(x̄) = 6/√50 ≈ 0.85 mph.
Show –
Tell –

Ex3. At birth, babies average 7.8 pounds, with a standard deviation of 2.1 pounds. A random sample of 34 babies born to mothers living near a large factory that may be polluting the air and water shows a mean birthweight of only 7.2 pounds. Is that unusually low?

Solution:
Think – We are interested in the probability that a sample of 34 babies has a mean birthweight of less than 7.2 pounds. At birth, babies average 7.8 pounds, with a standard deviation of 2.1 pounds. The model for birthweights should be roughly unimodal and symmetric, if not Normal.
Independence Assumption: It's reasonable to think that the weights of the 34 randomly sampled babies will be independent.
Random Sampling Condition: The 34 babies were sampled randomly.
10% Condition: As long as more than 340 babies were born to mothers living in the vicinity of the factory, 34 babies represent less than 10% of all babies.
Since the model for all babies is unimodal and symmetric, the Central Limit Theorem applies, especially since the sample size, 34 babies, is large.
Under these conditions, the sampling distribution of x̄ has a Normal model, N(7.8, 2.1/√34) ≈ N(7.8, 0.36).
Show –
Tell –

Sample Proportions
Consider a simple random sample (SRS) of 1,000 people from a large population. If X represents the number in this sample who are Republicans, then there are 1,001 possible values of X, namely 0, 1, 2, 3, ..., 998, 999, 1000.
If p̂ (p-hat) represents the sample proportion of Republicans in the sample, then there are 1,001 possible values of p̂, namely 0/1000, 1/1000, 2/1000, ..., 998/1000, 999/1000, 1000/1000. For a given sample, we might find p̂ = .56. For another sample, we might find p̂ = .52. We could choose many SRS's and calculate a p̂ for each sample. In general, we would expect the distribution of p̂ to be approximately Normal.

Once again, if a few assumptions are met by checking some conditions, then we are allowed to apply a Normal Model to our information.

Assumptions:
1.
2.

Since it is hard to check assumptions, we verify the following conditions:
1. Randomization Condition:
2. 10% Condition:
3. Success/Failure Condition:

If we choose an SRS of size n from a large population with population proportion p having some characteristic of interest, and if p̂ is the proportion of the sample having that characteristic, then
-
-
- The standard deviation of the sampling distribution is SD(p̂) = √(p(1 − p)/n).
- So, in symbolic notation we have the following distribution: N(p, √(p(1 − p)/n)).

Example: The Census Bureau reports that 40% of the 50,000 families in a particular region have more than one color TV in their household. What is the probability that an SRS of size 100 will indicate 45% or more households with more than one color TV when the population proportion is 40%?

Solution
p = _____, n = _____, p̂ = _____
a. Check assumptions/conditions
   Randomization Condition:
   10% Condition:
   Success/Failure Condition:
b. Calculate: go to the z-chart or use the TI to do a normalcdf( _____ ).
There is a probability of roughly _____ that a sample of size 100 will have a proportion of .45 or more when the population proportion is .40. In other words, a sample proportion of .45 is not necessarily an unexpected event, and could easily occur due simply to sampling variation.

Example: Suppose it is known that 60% of the registered voters in a district of over 20,000 people are Republicans.
If you choose an SRS of 1000 registered voters,
(a) what is the probability that the proportion of Republicans in the sample is between 58% and 62%?
(b) what is the probability that the sample will contain no more than 550 Republicans?

First, note that the assumptions and conditions are satisfied.
The sample proportion p̂ has mean = _____ and standard deviation = _____
Response to (a): Using the TI-83, normalcdf( _____ )
Response to (b):

1. Records at a large university indicated that 20% of all freshmen are placed on academic probation at the end of their first semester. A random sample of 100 of this year's freshmen indicated that 25% of them were placed on academic probation at the end of the first semester. The results of this sample:
(A) are surprising since they indicate that 5% more were placed on probation than was expected.
(B) are surprising since SAT scores have been increasing over the past few years.
(C) are not surprising since the standard deviation of the sampling distribution is 4%.
(D) are surprising since the standard deviation of the sampling distribution is .4%.
(E) are biased since the increase of 5% could not happen without injecting bias into the sample.

2. According to the manufacturer, the average proportion of red candies in a package is 20%. An 8 oz. package contains 250 candies. Find the probability that a randomly selected 8 oz. package contains fewer than 45 red candies.

3. A mathematics department published the claim that a minimum of 70% of students enrolled in their classes receive a final grade of C- or better in any semester. An SRS of 50 students from the department's classes indicated that only 65% of the students had a C- or better. What is the probability that a sample of this size will have a result that differs from the claimed proportion by more than 5% (above or below)? Would such a result surprise you? State a conclusion.

4.
An SRS of size 200 is taken from a population of 1,000 people in a professional organization regarding preferences on the issue of raising dues. What is the probability that this sample will produce a result of 10% or less favoring the raise when it is known that 3.5% of the population favors the raise?

5. Find the size of an SRS needed so that the probability that its sample proportion differs from the population proportion by more than 2% (above or below) is .1. Assume that the population proportion is .63.
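
Checking the P = {1, 2, 3} example by computer: the unbiasedness claims from the start of the chapter can be verified by listing all 9 samples of size 2 and averaging each statistic. The following is a minimal sketch using only Python's standard library. The variable names are illustrative assumptions and not part of the chapter. It assumes the sample variance and sample standard deviation divide by n - 1, while the population versions divide by N, which matches the values in the chapter's table.

```python
from itertools import product
from statistics import mean, pstdev, pvariance, stdev, variance

population = [1, 2, 3]

# Population parameters (these divide by N).
mu = mean(population)
sigma_sq = pvariance(population)
sigma = pstdev(population)

# All 3**2 = 9 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Sample statistics for every sample (variance and SD divide by n - 1).
sample_means = [mean(s) for s in samples]
sample_vars = [variance(s) for s in samples]
sample_sds = [stdev(s) for s in samples]

# Means of the three sampling distributions.
mean_of_means = mean(sample_means)
mean_of_vars = mean(sample_vars)
mean_of_sds = mean(sample_sds)

print("mean of sample means:    ", mean_of_means, "vs mu =", mu)
print("mean of sample variances:", round(mean_of_vars, 6), "vs sigma^2 =", round(sigma_sq, 6))
print("mean of sample SDs:      ", round(mean_of_sds, 6), "vs sigma =", round(sigma, 6))
```

The first two comparisons agree, while the mean of the sample standard deviations falls below σ, which is the bias described in the chapter.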