Chapter 7: The Distribution of Sample Means

Samples and Populations
• Samples provide an incomplete picture of the population.
• There are aspects of the population that may not be included within a sample.
• The sampling error is the discrepancy between a sample statistic and the corresponding population parameter.
  – It measures how far the sample is from the population.

Sampling Error
By definition, sampling is used to calculate sample statistics, which are estimates of population parameters. So there will always be a difference (usually an unknown difference) between the sample statistic and the population parameter. This difference is called sampling error.
Examples: X̄ - μ, s - σ, s² - σ², p - π

Sampling Distribution
• A sampling distribution is a distribution of statistics (e.g. M or s) obtained by selecting all of the possible samples of a specific size (n) from the population.
• General characteristics of sampling distributions:
  1. P(|M - μ| ≤ ε) is quite large, i.e. M ≈ μ
  2. M ~ Normal distribution
  3. n↑ ⇒ σM↓, i.e. M → μ, or P(|M - μ| ≤ ε)↑

Example 7.1: N = 4, X = 2, 4, 6, 8
• Fig 7.1: population (p. 203): evenly distributed
• Table 7.1: the 16 possible samples with n = 2 (p. 204); the sixteen values of M form the distribution of M
• Fig 7.2: sampling distribution of those 16 sample means with n = 2 (p. 204): approximately normal
• P(M > 7) = ? Only one of the 16 sample means is greater than 7, so P(M > 7) = 1/16

The Distribution of Sample Means
• The distribution of sample means is defined as the set of means from all the possible random samples of a specific size (n) selected from a specific population.
• This distribution has well-defined (and predictable) characteristics that are specified in the Central Limit Theorem.

Sampling Distribution of the Sample Mean
The sampling distribution of the sample mean is a probability distribution consisting of all possible sample means of a given sample size selected from a population.
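The enumeration in Example 7.1 can be sketched in a few lines of Python. This is only an illustration: it assumes the 16 samples are ordered pairs drawn with replacement (which is what the count of 16 implies), and the variable names are made up for this sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]   # the N = 4 scores from Example 7.1
n = 2                       # sample size

# Assumption: ordered samples drawn with replacement, giving 4**2 = 16 samples.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of M: each distinct mean with its probability.
counts = Counter(means)
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m, p in distribution.items():
    print(f"M = {float(m):.0f}  p = {p}")

# P(M > 7): add up the probabilities of all means above 7.
p_above_7 = sum(p for m, p in distribution.items() if m > 7)
print("P(M > 7) =", p_above_7)

# The mean of all 16 sample means, to compare with the population mean.
mean_of_means = sum(means) / len(means)
print("mean of M =", mean_of_means)
```

The distribution piles up in the middle (M = 5) and thins out toward M = 2 and M = 8, which is the roughly normal shape shown in Fig 7.2.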
The sampling distribution of the sample mean summarizes the probabilities of sampling error: X̄ - μ

Sampling Distribution of the Sample Mean
The mean of the distribution of sample means will be exactly equal to the population mean if we are able to select all possible samples of the same size from a given population.

The Standard Error of the Mean
• There will be less dispersion in the sampling distribution of the sample mean than in the population.
• As the sample size n increases, the standard error of the mean decreases.

The Central Limit Theorem
1. The mean of the distribution of sample means is called the Expected Value of M and is always equal to the population mean μ: E(M) = μ
2. The standard deviation of the distribution of sample means is called the Standard Error of M and is computed by σM = σ/√n, or equivalently σM² = σ²/n
3. The shape of the distribution of sample means tends to be normal. It is almost perfectly normal if either
   a) the population from which the samples are obtained is normal, or
   b) the sample size is n = 30 or more.

Central Limit Theorem
If all samples of a particular size are selected from any population, the sampling distribution of the sample mean is approximately a normal distribution. This approximation improves with larger samples.
• If the population follows a normal probability distribution, then for any sample size the sampling distribution of the sample mean will also be normal.
• If the population distribution is symmetrical (but not normal), the normal shape of the distribution of the sample mean emerges with samples as small as 10.
• If a distribution is skewed or has thick tails, it may require samples of 30 or more to observe the normality feature.
• The mean of the sampling distribution is equal to μ. The variance is equal to σ²/n and the standard deviation is equal to σ/√n.
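The two numerical claims of the Central Limit Theorem, E(M) = μ and σM = σ/√n, can be checked roughly by simulation. The sketch below is not from the textbook: it assumes a hypothetical skewed population (exponential with rate 1, so μ = 1 and σ = 1), n = 30, and 20,000 random samples in place of "all possible samples".

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

n = 30          # sample size
reps = 20_000   # number of random samples (a simulation cannot draw every possible sample)

# Hypothetical skewed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

# Draw `reps` samples of size n and keep only each sample's mean.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

expected_value = statistics.fmean(sample_means)    # should land close to mu
standard_error = statistics.pstdev(sample_means)   # should land close to sigma / sqrt(n)
theory_se = sigma / n ** 0.5

print(f"mean of sample means: {expected_value:.3f} (mu = {mu})")
print(f"sd of sample means:   {standard_error:.3f} (sigma/sqrt(n) = {theory_se:.3f})")
```

Because the samples are random, the simulated values only approximate μ and σ/√n; they match exactly only when every possible sample is used.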
The Expected Value of M: E(M) = μ
• If two (or more) samples are selected from the same population, the two samples probably will have different means.
• Although the samples will have different means, you should expect the sample means to be close to the population mean.
• The mean of the distribution of sample means is equal to the mean of the population of scores (μ): that is the expected value of M.

Standard Error: σM
• The standard error (also known as the standard deviation of the distribution of sample means, σM) provides a measure of the average distance between M (sample mean) and μ (population mean).
• Standard error describes the variability of the distribution of sample means.
• Law of large numbers: the larger the sample size (n), the more probable it is that M is close to μ.
  – Inverse relationship: the larger the sample size, the smaller the standard error.

The Standard Error of M
• The standard error of M is defined as the standard deviation of the distribution of sample means and measures the standard distance between a sample mean and the population mean.
• Thus, the Standard Error of M provides a measure of how accurately, on average, a sample mean (M) represents its corresponding population mean (μ).

pp. 208-210 (σ = 10)
• Fig 7.3: n↑ ⇒ σM↓
• Table 7.2: n↑ ⇒ σM↓
Note: by around n = 30, σM is already fairly small and changes little as n grows further.

Fig. 7.4: population → a sample → sampling distribution (M)

p. 211
1. μ = 50, σ = 12
   a. n = 4: E(M) = ? σM = ?
   b. If the population is not normal and n = 4, what is the shape of the distribution of M?
   c. n = 36: E(M) = ? σM = ?
   d. If the population is not normal and n = 36, what is the shape of the distribution of M?
2. As n increases, E(M) also increases. (true or false?)
3. As n increases, σM also increases. (true or false?)

Probability and Sample Means
• Because the distribution of sample means tends to be normal, the z-score value obtained for a sample mean can be used with the unit normal table to obtain probabilities.
• The procedures for computing z-scores and finding probabilities for sample means are essentially the same as we used for individual scores.

Probability and Sample Means (cont'd.)
• However, when you are using sample means, you must remember to consider the sample size (n) and compute the standard error (σM) before you start any other computations.
• Also, you must be sure that the distribution of sample means satisfies at least one of the criteria for normal shape before you can use the unit normal table, i.e.
  1. the population is normally distributed, or
  2. n ≥ 30

Using the Sampling Distribution of the Sample Mean
• If a population follows the normal distribution, the sampling distribution of the sample mean will also follow the normal distribution.
• If the shape is known to be non-normal but the sample contains at least 30 observations, the central limit theorem says the sampling distribution of the mean is approximately normal.
• When the population standard deviation is known, a z-statistic for the sampling distribution of the sample mean is calculated as: z = (X̄ - μ) / (σ/√n)

z-Scores and Location within the Distribution of Sample Means
• Within the distribution of sample means, the location of each sample mean can be specified by a z-score: z = (M - μ) / σM

z-Scores and Location within the Distribution of Sample Means (cont'd.)
• As always, a positive z-score indicates a sample mean that is greater than μ and a negative z-score corresponds to a sample mean that is smaller than μ.
• The numerical value of the z-score indicates the distance between M and μ measured in terms of the standard error.

Example 7.2 (p. 211)
Whenever you have a probability question about a sample mean, you must use the distribution of sample means.
• The population of SAT scores is normal with μ = 500, σ = 100.
• Take a random sample of n = 25. P(M > 540) = ?
• M has a probability distribution, and that distribution is normal because the population is normally distributed.
• E(M) = μ = 500, σM = 100/√25 = 100/5 = 20
• z = (540 - 500)/20 = 2
• P(M > 540) = P(z > 2) = 0.5 - P(0 < z < 2) = 0.5 - 0.4772 = 0.0228

Example 7.3 (p. 213)
• Computing z for a single score: use σ
• Computing z for a sample mean: use σM
• The population of SAT scores is normal with μ = 500, σ = 100; a random sample has n = 25. What range contains the middle 80% of sample means?
• E(M) = μ = 500, σM = 100/5 = 20
• P(|z| < z0) = 0.8, so find z0 with P(0 < z < z0) = 0.4
• From the z table, P(0 < z < 1.28) = 0.3997, so z0 = 1.28
• (M0 - 500)/20 = ±1.28, so M0 = 500 ± 1.28 × 20 = 500 ± 25.6
• The middle 80% of sample means lies between 474.4 and 525.6

Standard deviation vs. standard error: Box 7.2 (p. 214)
• Standard deviation measures the standard distance between X and μ.
• Standard error measures the standard distance between M and μ.

p. 214
1. μ = 40, σ = 8, M = 44, z = ?
   a. n = 4: σM = 8/2 = 4, z = (44 - 40)/4 = 1
   b. n = 16: σM = 8/4 = 2, z = (44 - 40)/2 = 2
2. Normal distribution: μ = 65, σ = 20, n = 16, p(M > 60) = ?
   σM = 20/4 = 5, z = (60 - 65)/5 = -1
   p(z > -1) = 0.5 + 0.3413 = 0.8413
3. Positively skewed: μ = 60, σ = 8
   a. n = 4, p(M > 62) = ? Not enough information: with a skewed population and a small n, the distribution of M is not normal.
   b. n = 64, p(M > 62) = ? σM = 8/8 = 1, z = (62 - 60)/1 = 2, p(M > 62) = p(z > 2) = 0.5 - 0.4772 = 0.0228

Example 7.4 (pp. 216-217)
• Population: students in a local college
• Survey question: number of minutes spent watching video per day
• μ = 80, σ = 20
• n = 1, n = 4, n = 100 (Figure 7.8, p. 217)
• Because the population is normal, all three distributions of M are normal.
• All three are centered at μ = 80, but they have different σM: 20, 10, and 2.

p. 219
1. σ = 10. On average, how much difference is there
   a. between μ and a single score? σ = 10
   b. between μ and M (n = 4)? σM = 10/2 = 5
   c. between μ and M (n = 25)? σM = 10/5 = 2
2. Can σM > σ?
3. σ = 12, random sampling
   a. If σM ≤ 6, n = ? 12/√n ≤ 6 ⇒ √n ≥ 2 ⇒ n ≥ 4
   b. If σM ≤ 4, n = ? 12/√n ≤ 4 ⇒ √n ≥ 3 ⇒ n ≥ 9

Example 7.5
Evaluate the effect of a new growth hormone. Population: normal, μ = 400, σ = 20; n = 25, so σM = 20/5 = 4.
• Compare the weight of treated rats (treatment group) with untreated rats (control group).
• If the treated sample is noticeably different from untreated samples → the treatment is effective.
• The middle 95% of sample means is treated as an acceptable difference (i.e. random error).
• If z > 1.96 or z < -1.96 → noticeable difference → the treatment is effective.
• Boundaries of the middle 95% = (400 - 1.96 × 4, 400 + 1.96 × 4) = (392.16, 407.84)
• A sample mean outside these boundaries → significant (effective).

σM as a measure of reliability
• σM indicates the degree of confidence about the accuracy of M: is it OK to use M as a representative of μ?
• Large σM → less confident about M
• Small σM → more confident about M
• The size of σM can be controlled by selecting a large sample: n↑ ⇒ σM↓

p. 223
1. Normal, μ = 80, σ = 20
   a. Select 1 score: how much distance would you expect, on average, between X and μ? σ = 20
   b. Select 100 scores: how much distance between M and μ? σM = 20/10 = 2
2. Normal, μ = 40, σ = 8, M = 36
   a. n = 16: relatively typical or extreme? σM = 8/4 = 2, z = (36 - 40)/2 = -2
   b. n = 4: typical or extreme? σM = 8/2 = 4, z = (36 - 40)/4 = -1

p. 224
3. Normal, μ = 530, σ = 80
   a. n = 16, 95% range? σM = 80/4 = 20, 530 ± 1.96 × 20 = (490.8, 569.2)
   b. n = 100, 95% range? σM = 80/10 = 8, 530 ± 1.96 × 8 = (514.32, 545.68)
4. Claimed: μ = 45, σ = 4. Sample: n = 16, M = 43. Is this sample mean likely to occur if the claim is true?
   σM = 4/4 = 1, z = (43 - 45)/1 = -2
   Is the sample mean within the range of values that would be expected 95% of the time (assume a normal population)? The 95% range is z = ±1.96, so z = -2 falls just outside it.
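
The z-score calculations in Examples 7.2, 7.3, and 7.5 all follow the same pattern: compute σM first, then convert between M and z. A minimal Python sketch of that pattern follows. The helper names `standard_error` and `z_for_mean` are invented for this illustration, and `statistics.NormalDist` stands in for the unit normal table, so results can differ from table values in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma, n):
    """Standard error of M: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def z_for_mean(m, mu, sigma, n):
    """z-score locating a sample mean within the distribution of sample means."""
    return (m - mu) / standard_error(sigma, n)


unit_normal = NormalDist()  # mean 0, sd 1: plays the role of the unit normal table

# Example 7.2: mu = 500, sigma = 100, n = 25, P(M > 540)
z = z_for_mean(540, 500, 100, 25)
p_above = 1 - unit_normal.cdf(z)
print(f"z = {z:.2f}, P(M > 540) = {p_above:.4f}")

# Example 7.3: boundaries of the middle 80% of sample means.
# The middle 80% leaves 10% in each tail, so z0 is the 90th percentile.
z0 = unit_normal.inv_cdf(0.90)
se_sat = standard_error(100, 25)
print(f"middle 80%: {500 - z0 * se_sat:.1f} to {500 + z0 * se_sat:.1f}")

# Example 7.5: boundaries of the middle 95% for mu = 400, sigma = 20, n = 25
z95 = unit_normal.inv_cdf(0.975)
se_rats = standard_error(20, 25)
low, high = 400 - z95 * se_rats, 400 + z95 * se_rats
print(f"middle 95%: {low:.2f} to {high:.2f}")
```

These steps assume the distribution of sample means is normal, so they apply only when the population is normal or n is at least 30.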