Chapter 6 – Sampling Distributions

In most experiments, we have more than one measurement for any given variable, each measurement being associated with one randomly selected member of a population. Hence we need to examine probabilities associated with events that specify conditions on two or more random variables.

Defn: A set of n random variables X1, X2, ..., Xn constitutes a random sample of size n from a finite population of size N if each member of the sample, Xi, is chosen in such a way that every member of the population has the same probability of being chosen.

Defn: A set of (continuous or discrete) random variables X1, X2, ..., Xn is called a random sample if the r.v.'s have the same distribution and are independent. We say that X1, X2, ..., Xn are independent and identically distributed (i.i.d.).

Note: We will also apply the term random sample to the set of observed values x1, x2, ..., xn of the random variables. Prior to selecting the sample and making the measurements, we have X1, X2, ..., Xn, with each Xi being an (unknown) random quantity having associated probability distribution f(x). After selecting the sample and making the measurements, we have the fixed values x1, x2, ..., xn.

Note: In practice, it is often difficult to do random sampling. However, random sampling is basic to the use of the statistical inference procedures that we will discuss later. These procedures are used for analyzing experimental data, for testing hypotheses, for estimating parameters (numerical characteristics of populations), and for performing quality control in manufacturing. In each situation, we must somehow obtain convincing evidence that the data collected do approximate the conditions of randomness.

Example: In a manufacturing situation, we have manufactured items coming off an assembly line. These items are in continuous production, being finished, packaged, and shipped. We do not at any given instant have access to the entire population of items, so if we want to do sampling for quality control, we must do systematic sampling. Assume that at any given instant, there are 36 items that have been completed but are not yet packaged and shipped. We number these items from 1 to 36, in order of production, choose a random integer between 1 and 36, and select that item for inspection. Assume that we have selected item number 23 out of that batch. Then, from the next batch of 36 items coming off the assembly line, we would also inspect item number 23, and likewise for each succeeding batch. Do we have confidence that this type of systematic sampling will give us a representative sample of the population of items? Why or why not?

If the manufacturing process is operating consistently, so that any variation from item to item is due to chance, then the above method of sampling has a good chance of yielding a representative sample from the population. However, if there are occurrences in the process that, for example, lead to flawed items at repeated, regularly spaced time intervals, then this cyclical pattern would tend to prevent systematic sampling from giving a representative sample.

The Sampling Distribution of the Sample Mean

Defn: A statistic is a random variable which is a function of a random sample. The probability distribution associated with a statistic is called its sampling distribution.

Example: Let X1, X2, ..., Xn be a random sample from a population (probability distribution). The statistic

\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i

is called the sample mean.
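As a supplement (not in the text): a minimal Python sketch, using numpy only, illustrating that the sample mean is itself a random variable. The exponential population with mean 5 is an arbitrary assumption chosen purely for illustration; drawing many samples and computing X̄ for each traces out its sampling distribution empirically.

import numpy as np

rng = np.random.default_rng(seed=1)

n = 25          # sample size
reps = 10_000   # number of independent samples drawn

# Hypothetical population: exponential with mean 5 (so mu = 5, sigma = 5).
# Each row is one random sample X1, ..., Xn; each row mean is one observed
# value of the statistic X-bar.
samples = rng.exponential(scale=5.0, size=(reps, n))
xbars = samples.mean(axis=1)

print("mean of the X-bar values:", xbars.mean())  # close to mu = 5
print("s.d. of the X-bar values:", xbars.std())   # close to sigma/sqrt(n) = 1

The 10,000 computed means vary from sample to sample; their histogram approximates the sampling distribution of X̄, which Theorem 6.1 below describes.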
Since the Xi's are random variables, X̄ is also a random variable, with a sampling distribution. Some other examples of statistics are:

1) the sample variance,

S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 ;

2) the sample median, X̃.

Theorem 6.1: Let X1, X2, ..., Xn be a random sample from a distribution having mean µ and standard deviation σ. Then the mean of the sampling distribution of X̄ is

\mu_{\bar{X}} = E[\bar{X}] = E\left[ \frac{1}{n} \sum_{i=1}^{n} X_i \right] = \frac{1}{n} \sum_{i=1}^{n} \mu = \mu .

The variance of the sampling distribution depends on the size of the population from which the sample is drawn. If the population is of infinite size, then

\sigma_{\bar{X}}^2 = \sum_{i=1}^{n} \frac{1}{n^2} \sigma^2 = \frac{\sigma^2}{n} ,

or, if the population is finite, of size N, then

\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \left( \frac{N-n}{N-1} \right) .

Note: The quantity (N − n)/(N − 1) is called the finite population correction factor. Note that if N is many times the size of n (as is often the case), then (N − n)/(N − 1) ≅ 1.

Note: The quantity σ_X̄ (the standard deviation of the sampling distribution of the sample mean) is also called the standard error of the mean. It provides us with a measure of the reliability of the sample mean as an estimate of the population mean. This term will be important when we discuss statistical inference.

Note: If the random sample was selected from a normal distribution (we write X1, X2, ..., Xn ~ Normal(µ, σ)), then it can be shown that

\bar{X} \sim \text{Normal}\left( \mu, \frac{\sigma}{\sqrt{n}} \right) .

Example: On page 134, Exercise 5.33. If I randomly select a single assembled piece of machinery from the population of assembled pieces, the time for assembly will be a random variable X having a Normal(µ = 12.9 min., σ = 2.0 min.) distribution. On the other hand, if I select a random sample of size 64 from the population, then X̄, the average assembly time for the sample of pieces, will have a distribution that is Normal(µ_X̄ = 12.9 min., σ_X̄ = 0.25 min.). Note that the variability in the distribution of X̄ is only one-eighth the variability in the distribution of X. This is an important concept.

Theorem 6.2 (Law of Large Numbers): Let X1, X2, ..., Xn be a random sample from a distribution having mean µ and standard deviation σ. Then for any positive number ε,

P(|\bar{X} - \mu| > \varepsilon) \to 0 \quad \text{as } n \to \infty .

In other words, as the sample size increases without bound, the probability that the sample mean differs from the population mean by more than an arbitrary amount ε decreases to 0. For larger samples, the sample mean is more likely to be close to the population mean.

The Law of Large Numbers and Relative Frequency

Example: (page 182) Let our random experiment consist of flipping a fair coin twice. Let the event A = {at least one head}. What is P(A)? We define a sequence of random variables X1, X2, ..., Xn by saying that Xi = 1 if the ith performance of the experiment results in at least one head (event A occurs), and Xi = 0 if event A does not occur (the result of the ith performance is two tails). The Xi's are i.i.d. random variables whose distribution has mean p = 0.75 and variance p(1 − p) = 0.1875. (Why?) Let

relative frequency of A = (number of times A occurs in n trials) / n.

Now, in this case, the sample mean is the relative frequency of occurrence of A, and the probability of occurrence of event A is 0.75. The sampling distribution of the sample mean for n repetitions of the experiment will have mean µ_X̄ = p = 0.75 and variance

\sigma_{\bar{X}}^2 = \frac{p(1-p)}{n} = \frac{(0.75)(0.25)}{n} = \frac{0.1875}{n} .

Note that, as n increases, the variance of the sampling distribution decreases.
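A small simulation sketch of this coin-flip experiment (not in the text; plain numpy, with an arbitrary seed) shows the relative frequency of A settling toward P(A) = 0.75 as the number of trials grows:

import numpy as np

rng = np.random.default_rng(seed=2)

n = 100_000
# Each trial: flip a fair coin twice; event A = {at least one head}.
flips = rng.integers(0, 2, size=(n, 2))       # 1 = head, 0 = tail
x = (flips.sum(axis=1) >= 1).astype(float)    # X_i = 1 if A occurs, else 0

# Running relative frequency of A after 1, 2, ..., n trials
rel_freq = np.cumsum(x) / np.arange(1, n + 1)

for k in (10, 100, 1_000, 100_000):
    print(f"after {k:>6} trials: relative frequency = {rel_freq[k - 1]:.4f}")

The printed relative frequencies drift toward 0.75, which is exactly the behavior the Law of Large Numbers guarantees.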
The Law of Large Numbers tells us that, as we perform the random experiment repeatedly, the value of the sample mean will get closer and closer to the population mean. In this case, the relative frequency of occurrence of event A will get closer and closer to the probability of occurrence of event A.

The following theorem is EXTREMELY important (as well as astonishing). This theorem provides the basis for our procedures for doing statistical inference.

Theorem 6.3 (Central Limit Theorem): If X1, X2, ..., Xn are a random sample from any distribution with mean µ and standard deviation σ < +∞, then the limiting distribution of

Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}

as n → +∞ is the standard normal distribution.

Note: Nothing was said about the distribution from which the sample was selected except that it has finite standard deviation. The sample could be selected from a normal distribution, or from an exponential distribution, or from a Weibull distribution, or from a Bernoulli distribution, or from a Poisson distribution, or from any other distribution with finite standard deviation. See, e.g., the example on pages 179-180. See also the illustration on page 184.

Note: For what n will the normal approximation be good? For most purposes, if n ≥ 30, we will say that the approximation given by the Central Limit Theorem (CLT) works well.

Example: p. 187, Exercise 6.15.

Example: The fracture strength of tempered glass averages 14 (measured in thousands of p.s.i.) and has a standard deviation of 2. What is the probability that the average fracture strength of 100 randomly selected pieces of tempered glass will exceed 14,500 p.s.i.?

Example: Shear strength measurements for spot welds have been found to have a standard deviation of 10 p.s.i. If 100 test welds are to be measured, what is the approximate probability that the sample mean will be within 1 p.s.i. of the true population mean?

The t Distribution

Using the above discussion (Central Limit Theorem, etc.) to draw conclusions about the value of the population mean, µ, from a measured value of the sample mean, x̄, has a flaw: if we have to depend on sample data for information about the population mean, then we typically do not know the value of the population standard deviation either, and we must also estimate σ. We need to modify our theory somewhat to take this complication into account. We introduce another probability distribution that allows us to use sample data alone to make inferences about the population mean.

Theorem 6.4: If X̄ is the mean of a random sample of size n taken from a normal distribution having mean µ and standard deviation σ, and if

S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2

is the sample variance, then the random variable

t = \frac{\bar{X} - \mu}{S / \sqrt{n}}

has a t-distribution with degrees of freedom ν = n − 1.

The t-distribution (which is actually a family of distributions, characterized by the degrees of freedom) has characteristics similar to those of the standard normal distribution, as we can see from the figure on page 187. Note that for large d.f., the t(n−1) distribution is very close to the standard normal distribution. In fact, the standard normal distribution provides a good approximation to the t(n−1) distribution for n of size 30 or more.

Note: Cut-off values and various tail probabilities for the t-distribution, with various values for ν, may be found in Table 4 on page 516. Note that in order to use this table, we must know the degrees of freedom in the particular exercise. (The same values can be computed in software; see the sketch below.)
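As a supplementary sketch (not in the text), the probabilities asked for in the two CLT examples above, and the kind of cut-off values that Table 4 tabulates, can be computed directly with scipy.stats, assuming scipy is available:

from scipy.stats import norm, t

# Fracture strength: mu = 14, sigma = 2 (thousands of p.s.i.), n = 100,
# so the standard error of the mean is 2 / sqrt(100) = 0.2.
mu, sigma, n = 14.0, 2.0, 100
se = sigma / n ** 0.5
print(norm.sf(14.5, loc=mu, scale=se))     # P(X-bar > 14.5) ~= 0.0062

# Spot welds: sigma = 10 p.s.i., n = 100, so the standard error is 1 p.s.i.
print(norm.cdf(1) - norm.cdf(-1))          # P(|X-bar - mu| <= 1) ~= 0.6827

# t-distribution vs. standard normal: upper 2.5% cut-offs by degrees of freedom.
for df in (5, 15, 30, 100):
    print(df, t.ppf(0.975, df))            # shrinks toward 1.96 as df grows
print("normal:", norm.ppf(0.975))          # ~= 1.96

The loop at the end makes the earlier remark concrete: the t cut-offs approach the standard normal cut-off as the degrees of freedom grow past about 30.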
Example: page 188.

The Sampling Distribution of the Variance

The above discussion provides us with the tools to do inference about the value of a population mean. If we want to do inference about the value of a population variance, σ², then we need to discuss the sampling distribution of the sample statistic, S², that we use to estimate the population variance. For this, we need to introduce another family of probability distributions, the chi-square family.

Theorem 6.5: If S² is the variance of a random sample of size n taken from a normal distribution with variance σ², then the random variable

\chi^2 = \frac{(n-1) S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}

has a chi-square distribution with degrees of freedom ν = n − 1.

Note: Cut-off values and various tail probabilities for the chi-square distribution, with various values for ν, may be found in Table 5 on page 517. Note that in order to use this table, we must know the degrees of freedom in the particular exercise.

Example: p. 190.

The F-Distribution

When we do analysis of experimental data, our conclusions about whether the experimental treatments had an effect will be based on a statistic which may be imagined as a "signal-to-noise" ratio, with the "signal" being the treatment effect (differences among the treatment groups) and the "noise" being the variability of the data within treatment groups. The sampling distribution of this statistic is given in the following theorem. This statistic may also be used to do inference comparing two population variances.

Theorem 6.6: If S1² and S2² are the variances of independent random samples of sizes n1 and n2, respectively, taken from two normal distributions having the same variance, then the random variable

F = \frac{S_1^2}{S_2^2}

has an F distribution with parameters ν1 = n1 − 1 (the numerator degrees of freedom) and ν2 = n2 − 1 (the denominator degrees of freedom).

Note: Cut-off values and various tail probabilities for the F distribution, with various values for ν1 and ν2, may be found in Table 6 on pages 518-519 (note that this table is an abbreviated version of an F table that would be used in practical situations). Note that in order to use this table, we must know the values of the two degrees-of-freedom parameters in the particular exercise. We will come back to the F distribution later in the course.
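As with the t table, the chi-square and F cut-off values tabulated in Tables 5 and 6 can be reproduced in software. A brief sketch (not in the text; assumes scipy, and the sample sizes are arbitrary choices for illustration):

from scipy.stats import chi2, f

# Chi-square: upper 5% cut-off with nu = n - 1 degrees of freedom.
n = 10
print(chi2.ppf(0.95, df=n - 1))             # ~= 16.92 for nu = 9

# F: upper 5% cut-off with nu1 = 9 numerator and nu2 = 14 denominator d.f.
n1, n2 = 10, 15
print(f.ppf(0.95, dfn=n1 - 1, dfd=n2 - 1))  # ~= 2.65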