AP STATISTICS LESSON 9-1 (DAY 1): SAMPLING DISTRIBUTIONS

ESSENTIAL QUESTION: How often would this method give a correct answer if I used it very many times?

Objectives:
To distinguish between parameters and statistics.
To define and recognize sampling distributions, bias, and variability.
To weigh the advantages and disadvantages of different sample sizes.

Introduction
The reasoning of statistical inference rests on asking, “How often would this method give a correct answer if I used it many, many times?” If it doesn’t make sense to imagine repeatedly producing your data in the same circumstances, statistical inference is not possible.

Introduction (continued…)
All agree that inference is most secure when we produce data by random sampling or randomized comparative experiments. The reason is that when we use chance to choose respondents or assign subjects, the laws of probability answer the question “What would happen if we did this many times?”

Parameter, Statistic
A parameter is a number that describes the population. A parameter is a fixed number, but in practice we do not know its value because we cannot examine the entire population.
A statistic is a number that describes a sample. The value of a statistic is known once we have taken a sample, but it can change from sample to sample. We often use a statistic to estimate an unknown parameter.

Example 9.1 Making Money
The mean income of the sample of households contacted by the Current Population Survey was x̄ = $57,045. The number $57,045 is a statistic because it describes this one Current Population Survey sample. The parameter of interest is the mean income of all households in the population. We don’t know the value of this parameter.

Symbols for Populations and Samples
The symbol for the population proportion is p. The symbol for the sample proportion is p̂ (“p-hat”). Because the actual parameters are usually unknown, the mean and standard deviation of a sample are used to estimate the population mean and standard deviation.
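The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population below is simulated with made-up numbers (it is not the Current Population Survey data), and the names `population`, `mu`, and `sample_means` are hypothetical. In real work the parameter is unknown. Here we can compute it only because we built the population ourselves.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000 household incomes (simulated, made-up values).
population = [rng.lognormvariate(10.8, 0.7) for _ in range(100_000)]

# Parameter: describes the whole population. It is one fixed number.
mu = statistics.mean(population)

# Statistic: describes one sample. Each new sample of 500 gives a new value.
sample_means = [statistics.mean(rng.sample(population, 500)) for _ in range(5)]

print(f"parameter mu = {mu:,.0f}")
for i, x_bar in enumerate(sample_means, start=1):
    print(f"sample {i}: x-bar = {x_bar:,.0f}")
```

Each printed x-bar should land near mu but not exactly on it, and no two samples should agree exactly.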
Example 9.1 (continued…)
The population mean, a parameter, is written with the Greek letter μ (“mu”). The sample mean, a statistic, is written x̄ (“x-bar”). The basic fact that every sample’s mean will probably be different is called sampling variability: the value of a statistic varies in repeated random samples.

Sampling Variability
To see what would happen if we took many samples:
1. Take a large number of samples from the same population.
2. Calculate the sample mean x̄ or sample proportion p̂ for each sample.
3. Make a histogram of the values of x̄ or p̂.
4. Examine the distribution displayed in the histogram for shape, center, and spread, as well as outliers or other deviations.

Sampling Distribution
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population. Strictly speaking, the sampling distribution is the ideal pattern that would emerge if we looked at all possible samples of the same size from the population.

Describing Sampling Distributions
Describe a sampling distribution by its shape, center, and spread.

Example: Are You a Survivor Fan?
The figure shows the results of drawing 1000 SRSs of size n = 100 from a population with p = 0.37. We see that:
The overall shape of the distribution is symmetric and approximately normal.
The center of the distribution is very close to the true value p = 0.37.

The Bias of a Statistic
Sampling distributions allow us to describe bias more precisely by speaking of the bias of a statistic rather than bias in a sampling method. Bias concerns the center of the sampling distribution.
[Figure: sampling distributions for (a) sample size 100 and (b) sample size 1000. The statistic from the larger sample is less variable.]

Unbiased Statistics
A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
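The four steps listed under Sampling Variability, applied to the Survivor example (1000 samples of size n = 100 with p = 0.37), might be simulated as in the sketch below. It is an approximation under stated assumptions: each respondent is drawn as an independent yes/no with probability p, which stands in for an SRS from a very large population, and the helper `sample_proportion` is a hypothetical name. Because only finitely many samples are drawn, the result approximates the sampling distribution rather than reproducing it exactly.

```python
import random
import statistics
from collections import Counter

rng = random.Random(9)  # fixed seed so the run is repeatable

p = 0.37            # true population proportion (from the example)
n = 100             # size of each sample
num_samples = 1000  # step 1: take a large number of samples

def sample_proportion(n, p, rng):
    """Proportion of 'yes' answers in one simulated sample of size n.

    Assumes independent draws, i.e. a population much larger than n.
    """
    return sum(rng.random() < p for _ in range(n)) / n

# Step 2: calculate p-hat for each sample.
p_hats = [sample_proportion(n, p, rng) for _ in range(num_samples)]

# Step 3: a crude text histogram of the p-hat values (one '#' per 5 samples).
counts = Counter(round(v, 2) for v in p_hats)
for value in sorted(counts):
    print(f"{value:.2f} {'#' * (counts[value] // 5)}")

# Step 4: examine center and spread.
center = statistics.mean(p_hats)
spread = statistics.stdev(p_hats)
print(f"center = {center:.3f}, spread = {spread:.3f}")
```

The histogram should look roughly symmetric and bell-shaped, and the center should sit close to 0.37, which is what the definition of an unbiased statistic leads us to expect.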
An unbiased statistic will sometimes fall above the true value of the parameter and sometimes below if we take many samples. Because its sampling distribution is centered at the true value, however, there is no systematic tendency to overestimate or underestimate the parameter.

The Variability of a Statistic
The sampling distribution of an unbiased statistic is centered at the true population parameter. The sample proportion, p̂, from a random sample of any size is an unbiased estimate of the parameter p.

The Variability of a Statistic (continued…)
The variability of a statistic is described by the spread of its sampling distribution. This spread is determined by the sampling design and the size of the sample. Larger samples give smaller spread. As long as the population is much larger than the sample (say, at least 10 times as large), the spread of the sampling distribution is approximately the same for any population size.

Bias and Variability
Bias means that our aim is off and we consistently miss the bulls-eye in the same direction. Our sample values do not center on the population value.
High variability means that repeated shots are widely scattered on the target.
Notice that low variability (shots close together) can accompany high bias (shots consistently away from the bulls-eye in one direction), and low bias (shots centered on the bulls-eye) can accompany high variability (shots widely scattered).
Properly chosen statistics computed from random samples of sufficient size will have low bias and low variability.

Figure 9.5
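The two claims about variability (larger samples give smaller spread, and population size hardly matters once the population is much larger than the sample) can be checked with a small simulation. This is a sketch under assumptions of our own: the populations are made-up lists of 1s and 0s with p = 0.37, the population sizes and the number of repetitions are arbitrary choices, and `spread_of_p_hat` is a hypothetical helper name.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is repeatable
p = 0.37                   # true population proportion

def spread_of_p_hat(population_size, n, reps, rng):
    """Draw `reps` SRSs of size n (without replacement) from a population
    with proportion p, and return (center, spread) of the p-hat values."""
    successes = round(p * population_size)
    population = [1] * successes + [0] * (population_size - successes)
    p_hats = [sum(rng.sample(population, n)) / n for _ in range(reps)]
    return statistics.mean(p_hats), statistics.stdev(p_hats)

# Same population, different sample sizes.
center_100, spread_100 = spread_of_p_hat(100_000, 100, 500, rng)
center_1000, spread_1000 = spread_of_p_hat(100_000, 1000, 500, rng)

# Same sample size, much smaller (but still large) population.
_, spread_small_pop = spread_of_p_hat(10_000, 100, 500, rng)

print(f"n = 100,  N = 100,000: center {center_100:.3f}, spread {spread_100:.4f}")
print(f"n = 1000, N = 100,000: center {center_1000:.3f}, spread {spread_1000:.4f}")
print(f"n = 100,  N = 10,000:  spread {spread_small_pop:.4f}")
```

Both sample sizes should center near 0.37 (low bias), the n = 1000 spread should be clearly smaller than the n = 100 spread, and the two n = 100 spreads should be nearly the same even though one population is ten times the size of the other.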