Sampling Distributions and the Central Limit Theorem

Whenever we select a random sample from a population, collect data from the members of the sample, and summarize the data values in the form of a statistic, that statistic is a random variable (its value depends on which random sample we happen to choose from the population) and thus has an associated probability distribution, called a sampling distribution. The form of the sampling distribution will, in general, depend on the type of statistic we are using. However, there are certain general properties shared by all sampling distributions. There is also a rather remarkable fact from probability theory that says that, under very general conditions and for large sample sizes, all sampling distributions tend to have approximately the same form.

Sampling distribution principles:

- Even if the underlying distribution isn’t normal, the sampling distribution can be close enough to normal that the normal distribution can be used as an approximation. (This is a consequence of the Central Limit Theorem. A good approximation requires that the sample size be large. Samples of size 30 or larger generally work well.)
- Assume each sample is exactly the same size.
- Assume you take samples over and over, billions of times.
- Assume each sample is chosen at random (any sample has an equal probability of being selected).
- The samples will usually differ slightly from one another. The value of the statistic you compute (e.g., the proportion of males in the sample) will vary from sample to sample.
- The mean of the sampling distribution of the sample mean, $\mu_{\bar{X}}$, will equal the population mean, $\mu_X$.
- The standard deviation of the sampling distribution depends on the size of the sample you’re working with. The bigger the sample, the narrower the sampling distribution. In particular, the standard deviation of the sampling distribution will be smaller than the population standard deviation by a factor of $1/\sqrt{n}$. In other words, $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$, where $n$ is the sample size.

The principles stated above are illustrated in the following diagram.
In this case, the population distribution (shaded curve) is actually normal. This is the histogram that we would get if we measured X for each member of the entire population. The other curves represent the histograms that we would get from selecting all possible random samples of a certain size from the population, calculating the sample mean for each sample, and constructing a histogram of the resulting values.
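
The same behavior can be checked numerically. The following is a minimal simulation sketch, not part of the original notes. It uses only the Python standard library, and everything in it is an assumption chosen for illustration: a hypothetical right-skewed (exponential) population of 50,000 values, sample sizes of 5, 30, and 100, and 5,000 repeated samples per size (far fewer than the "billions" imagined above, but enough to see the pattern). For each sample size it compares the mean and standard deviation of the simulated sample means with the population mean and with $\sigma_X / \sqrt{n}$.

```python
import random
import statistics


def sample_means(population, n, num_samples, rng):
    """Return the means of num_samples simple random samples of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption for this sketch):
# 50,000 right-skewed exponential values, clearly not normal.
population = [rng.expovariate(1.0) for _ in range(50_000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

results = {}
for n in (5, 30, 100):
    means = sample_means(population, n, 5_000, rng)
    results[n] = {
        "mean_of_means": statistics.fmean(means),  # should be close to mu
        "sd_of_means": statistics.stdev(means),    # should be close to sigma / sqrt(n)
        "sigma_over_sqrt_n": sigma / n ** 0.5,
    }

print(f"population: mean = {mu:.3f}, sd = {sigma:.3f}")
for n, r in results.items():
    print(
        f"n = {n:3d}: mean of sample means = {r['mean_of_means']:.3f}, "
        f"sd of sample means = {r['sd_of_means']:.3f}, "
        f"sigma/sqrt(n) = {r['sigma_over_sqrt_n']:.3f}"
    )
```

In a run of this sketch, the mean of the sample means should stay close to the population mean for every sample size, while the standard deviation of the sample means should shrink roughly in proportion to $1/\sqrt{n}$. Because each sample is drawn without replacement from a finite population, the match with $\sigma_X / \sqrt{n}$ is only approximate, but with a population this much larger than the samples the difference is negligible. Plotting a histogram of `means` for each sample size would give curves analogous to the unshaded ones described above.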