Central Limit Theorem: Definition and Examples

The Central Limit Theorem states that the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger, no matter what the shape of the population distribution (as long as the population has a finite variance). As a rule of thumb, the approximation is usually considered good for sample sizes over 30. All this is saying is that as you take more samples, especially large ones, your graph of the sample means will look more like a normal distribution.

Here’s what the Central Limit Theorem is saying, graphically. The picture on the left shows one of the simplest types of test: rolling a fair die. The more times you roll the die, the more the distribution of the means tends to look like a normal distribution graph.

The Central Limit Theorem and Means

An essential component of the Central Limit Theorem is that the average of your sample means will be the population mean. In other words, add up the means from all of your samples, find the average, and that average will approximate your actual population mean, more closely the more samples you take. Similarly, if you find the average of all of the standard deviations of your samples, you’ll get a value close to the actual standard deviation for your population. It’s a pretty useful phenomenon that can help accurately predict characteristics of a population.

Definition using Calculus

If you’ve taken some calculus, you can define the CLT more precisely using the definition of a limit. The CDF of the standardized sample mean √n(X̄n – μ)/σ converges pointwise to the CDF (Φ) of the standard normal distribution. This is shown with the integral:

$$\lim_{n \to \infty} \mathbb{P}\left( \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le z \right) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$

Where:
• Xn is an IID sequence with mean μ and standard deviation σ, and X̄n is the mean of its first n terms,
• Φ(z) = ℙ(Z ≤ z), where Z is a standard normal random variable.

Note: An assumption is that the expected values of X and X² are finite (E[X²] < ∞).

Central Limit Theorem Examples

A Central Limit Theorem word problem will most likely contain the phrase “assume the variable is normally distributed”, or one like it. With these central limit theorem examples, you will be given:
1. A population (e.g. 29-year-old males, seniors between 72 and 76, all registered vehicles, all cat owners)
2. An average (e.g. 125 pounds, 24 hours, 15 years, $15.74)
3. A standard deviation (e.g. 14.4 lbs, 3 hours, 120 months, $196.42)
4. A sample size (e.g. 15 males, 10 seniors, 79 cars, 100 households)

Central Limit Theorem Examples: Greater than

Use these steps for Central Limit Theorem word problems that contain the phrase “greater than” (or a similar phrase such as “above”).

1. General Steps

Step 1: Identify the parts of the problem. Your question should state:
1. the mean (average or μ)
2. the standard deviation (σ)
3. population size
4. sample size (n)
5. a number associated with “greater than” (x̄). Note: this is the sample mean. In other words, the problem is asking you “What is the probability that a sample mean of n items will be greater than this number?”

Step 2: Draw a graph. Label the center with the mean. Shade the area roughly above x̄ (i.e. the “greater than” area). This step is optional, but it may help you see what you are looking for.

Step 3: Use the following formula to find the z-score. Plug in the numbers from step 1.

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

1. Subtract the mean (μ in step 1) from the ‘greater than’ value (x̄ in step 1). Set this number aside for a moment.
2. Divide the standard deviation (σ in step 1) by the square root of your sample size (n in step 1). For example, if thirty six children are in your sample and your standard deviation is 3, then 3 / √36 = 0.5
3. Divide your result from step 1 by your result from step 2 (i.e. step 1/step 2)

Step 4: Look up the z-score you calculated in step 3 in the z-table. If you don’t remember how to look up z-scores, you can find an explanation in step 1 of this article: Area to the right of a z-score.

Step 5: Subtract the area you found in the z-table from 0.5. For example, if the area is 0.1554, then 0.5 – 0.1554 = 0.3446. (This assumes a z-table that lists the area between the mean and a positive z.)

Step 6: Convert the decimal in Step 5 to a percentage. In our example, 0.3446 = 34.46%.

That’s it!

2. Specific Example

Q.
A certain group of welfare recipients receives SNAP benefits of $110 per week with a standard deviation of $20. If a random sample of 25 people is taken, what is the probability their mean benefit will be greater than $120 per week?

Step 1: Insert the information into the z-formula: z = (120 – 110) / (20/√25) = 10 / (20/5) = 10/4 = 2.5.

Step 2: Look up the z-score in a table (or calculate it using technology). A z-score of 2.5 has an area of roughly 49.38% between the mean and z. Because the question asks for “greater than”, subtract that area from 50% (the right half of the curve): 50% – 49.38% = 0.62%.

Central Limit Theorem Examples: Less than

Use these steps to solve Central Limit Theorem word problems that contain the phrase “less than” (or a similar phrase such as “lower”).

1. General Steps

Step 1: Identify the parts of the problem. Your question should state:
1. the mean (average or μ)
2. the standard deviation (σ)
3. population size
4. sample size (n)
5. a number associated with “less than” (x̄)

Step 2: Draw a graph. Label the center with the mean. Shade the area roughly below x̄ (i.e. the “less than” area). This step is optional, but it may help you see what you are looking for.

Step 3: Use the following formula to find the z-score. Plug in the numbers from step 1.

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

1. Subtract the mean (μ in step 1) from the ‘less than’ value (x̄ in step 1). Set this number aside for a moment.
2. Divide the standard deviation (σ in step 1) by the square root of your sample size (n in step 1). For example, if thirty six children are in your sample and your standard deviation is 3, then 3 / √36 = 0.5
3. Divide your result from step 1 by your result from step 2 (i.e. step 1/step 2)

Step 4: Look up the z-score you calculated in step 3 in the z-table. If you don’t remember how to look up z-scores you can find an explanation in step 1 of this article on area to the right of a z-score in a normal distribution curve.

Step 5: Add the area you found in the z-table to 0.5. For example, if the area is 0.1554, then 0.5 + 0.1554 is 0.6554. (This applies when the z-score is positive; if the z-score is negative, subtract the area from 0.5 instead.)

Step 6: Convert the decimal in Step 5 to a percentage. In our example, 0.6554 = 65.54%.
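The z-score steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article: the function names are hypothetical, and it computes the normal CDF with math.erf rather than a z-table, so it returns the full area to the left of z and the add-or-subtract-0.5 step is not needed. The standard deviation of 3 and sample size of 36 are borrowed from the illustration in Step 3; the mean of 10 and cutoff of 10.5 are made-up values.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_mean_z(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """z-score of a sample mean: (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)  # Step 3, part 2
    return (x_bar - mu) / standard_error   # Step 3, parts 1 and 3


def prob_mean_less_than(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """P(sample mean < x_bar) under the normal approximation."""
    return standard_normal_cdf(sample_mean_z(x_bar, mu, sigma, n))


def prob_mean_greater_than(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """P(sample mean > x_bar) under the normal approximation."""
    return 1.0 - prob_mean_less_than(x_bar, mu, sigma, n)


# Hypothetical inputs: mu = 10 and cutoff 10.5 are invented for illustration;
# sigma = 3 and n = 36 come from the Step 3 illustration (3 / sqrt(36) = 0.5).
z = sample_mean_z(10.5, mu=10.0, sigma=3.0, n=36)
print(f"z = {z:.2f}")
print(f"P(mean < 10.5) = {prob_mean_less_than(10.5, 10.0, 3.0, 36):.4f}")
print(f"P(mean > 10.5) = {prob_mean_greater_than(10.5, 10.0, 3.0, 36):.4f}")
```

Because the two probabilities are complements, a "greater than" answer and a "less than" answer for the same cutoff should always add up to 1, which is a quick check on a hand calculation.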
That’s it!

2. Specific Example

A population of 29 year-old males has a mean salary of $29,321 with a standard deviation of $2,120. If a sample of 100 men is taken, what is the probability their mean salaries will be less than $29,000?

Step 1: Insert the values into the z-formula: z = (29,000 – 29,321) / (2,120/√100) = -321/212 = -1.51.

Step 2: Look up the z-score in the left-hand z-table (or use technology). The area to the right of z = -1.51 is 93.45%. However, this is not the answer, as the question is asking for LESS THAN, and 93.45% is the area “greater than”, so you need to subtract from 100%. 100% – 93.45% = 6.55% or about 0.07.

Central Limit Theorem Examples: Between

Example problem: There are 250 dogs at a dog show who weigh an average of 12 pounds, with a standard deviation of 8 pounds. If 4 dogs are chosen at random, what is the probability they have an average weight of greater than 8 pounds and less than 25 pounds?

Step 1: Identify the parts of the problem. Your question should state:
• mean (average or μ)
• standard deviation (σ)
• population size
• sample size (n)
• number associated with “less than” (x̄1)
• number associated with “greater than” (x̄2)

Step 2: Draw a graph. Label the center with the mean. Shade the area between x̄1 and x̄2. This step is optional, but it may help you see what you are looking for.

Step 3: Use the following formula to find the z-scores.

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

All this formula is asking you to do is:
a) Subtract the mean (μ in Step 1) from the “less than” value (x̄1 in Step 1): 25 – 12 = 13.
b) Divide the standard deviation (σ in Step 1) by the square root of your sample size (n in Step 1): 8 / √4 = 4
c) Divide your result from a by your result from b: 13 / 4 = 3.25

Step 4: Use the formula from Step 3 to find the second z-value. This time, use x̄2 from Step 1 (8).
a) Subtract the mean (μ in Step 1) from the “greater than” value (x̄2 in Step 1): 8 – 12 = -4.
b) Divide the standard deviation (σ in Step 1) by the square root of your sample size (n in Step 1): 8 / √4 = 4
c) Divide your result from a by your result from b: -4 / 4 = -1

Step 5: Look up the value you calculated in Step 3 in the z-table. A z-value of 3.25 corresponds to .4994.

Step 6: Look up the value you calculated in Step 4 in the z-table. A z-value of 1 corresponds to .3413. Note that the bell curve is symmetrical, so if you want to look up a negative value like -1, then just look up the positive counterpart. The area will be the same.

Step 7: Add Step 5 and 6 together, because the two z-values lie on opposite sides of the mean: .4994 + .3413 = .8407

Step 8: Convert the decimal in Step 7 to a percentage: .8407 = 84.07%

That’s it!

Central Limit Theorem on the TI 89

Example problem: A population of community college students includes inner city students (p = .33). What is the probability that a random sample of 45 students from the population will have from 20% to 40% inner city students?

Step 1: Press APPS. Highlight the Stats/List Editor by using the scroll keys. Press ENTER.
Step 2: Press F5 and scroll down to C: BinomialCdf.
Step 3: Enter 45 in the Num Trials box.
Step 4: Scroll down and enter .33 in the Prob Success box.
Step 5: Scroll down and enter 9 in the Lower Value box (because 20% of 45 = 9).
Step 6: Scroll down and enter 18 in the Upper Value box (because 40% of 45 = 18). Press ENTER.
Step 7: Read the Result: Cdf = .857142. This means that the probability your random sample will have 20% to 40% inner city students is 85.71%.

TI 83 Central Limit Theorem: Overview

The TI 83 calculator has a built-in function that can help you calculate probabilities for Central Limit Theorem word problems, which usually contain the phrase “assume the distribution is normal” (or a variation of that phrase). The function, normalcdf, requires you to enter a lower bound, upper bound, mean, and standard deviation.
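For readers without a calculator at hand, the two calculator functions described above can be approximated in Python. This is a rough sketch using only the standard library; the function names are hypothetical stand-ins, not the calculators' own implementations. It is applied to the dog show example and the TI 89 example from earlier, and the results should land close to the 84.07% and 85.71% reported in the text (small differences can come from z-table rounding).

```python
import math


def normal_cdf_between(lower: float, upper: float, mean: float, std_dev: float) -> float:
    """Area under a normal curve between lower and upper.

    Takes the same four inputs the text lists for normalcdf.
    """
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mean) / (std_dev * math.sqrt(2.0))))

    return cdf(upper) - cdf(lower)


def binomial_cdf_between(trials: int, p_success: float, lower: int, upper: int) -> float:
    """Exact P(lower <= X <= upper) for X ~ Binomial(trials, p_success)."""
    return sum(
        math.comb(trials, k) * p_success**k * (1.0 - p_success) ** (trials - k)
        for k in range(lower, upper + 1)
    )


# Dog show example: mean 12, standard deviation 8, sample of 4 dogs,
# so the standard deviation of the sample mean is 8 / sqrt(4) = 4.
dog_probability = normal_cdf_between(8, 25, 12, 8 / math.sqrt(4))
print(f"P(8 < mean weight < 25) = {dog_probability:.4f}")

# TI 89 example: 45 students, p = .33, between 9 and 18 inner city students.
student_probability = binomial_cdf_between(45, 0.33, 9, 18)
print(f"P(9 <= count <= 18) = {student_probability:.4f}")
```

Note that the standard deviation passed to normal_cdf_between is the standard deviation of the sample mean (σ / √n), not the population standard deviation.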
Example problem: A fertilizer company manufactures organic fertilizer in 10 pound bags with a standard deviation of 1.25 pounds per bag. What is the probability that a random sample of 15 bags will have a mean between 9 and 9.5 pounds?

Step 1: Press 2nd, then VARS, then 2 to select normalcdf.
Step 2: Enter your variables (lower bound, upper bound, mean, and standard deviation). Separate each variable by a comma: normalcdf(9,9.5,10,1.25/√(15)).
Step 3: Press ENTER. This returns the probability of .05969, or 5.969%.

Tip: If you have a question that asks for “greater than” or “less than” a certain number, enter 999999999 for the upper bound or -999999999 for the lower bound. For example, if you wanted to know the probability of greater than 8 pounds you would enter: NORMALCDF(8,999999999,10,1.25/√(15)). For less than 8 pounds you would enter: NORMALCDF(-999999999,8,10,1.25/√(15)).

Tip: Sampling distributions require that the standard deviation of the mean is σ / √(n), so make sure you enter that as the standard deviation.

Chi squared Distribution (Transcript of the Video)

The chi square or chi squared distributions describe the variance of samples from Normal populations. Consider the n-1 version of the variance formula:

$$S_{n-1}^2 = \frac{1}{n-1}\left[(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + (x_3-\bar{x})^2 + \dots + (x_n-\bar{x})^2\right]$$

If the population from which the sample elements are drawn is Normal, then each of the data, xi, will be Normally distributed, as will their mean, x-bar, since it is simply a scaled sum of variables that have Normal distributions. So, if the sample variance is the sum of a bunch of Normally-distributed quantities squared, what shape will its distribution take? (0:48/4:16)

To answer that question, let’s look at the shape we get if we square a single normal distribution. For simplicity, we will make its mean zero and its variance one. If we square the values of points that fall between 1 and 1.5 in this distribution, they will map to points that range from 1 to 1.5 squared, which is 2.25.
When squared, corresponding negative points will map to the same final range as their positive counterparts. The members of the distribution that fall between 2 and 2.5 will map to the new range 4 to 6.25, the squares of their original values, as will the matching negative numbers. Thus they will be more spread out and so the likelihood of getting a particular value in the set of squared values will be less. Next, points that started between zero and ½, and between zero and -½, will map to the narrow zone between their squared values, namely between zero and ¼. If we continue this process for all ranges in the original graph, we obtain the graph shown. This figure is the chi squared distribution for k=1 degree of freedom. (2:03/4:16)

Here is a histogram of elements from a random Normal distribution with a mean of zero and variance of one, along with its theoretical Normal curve. If we now square each of the data values, and plot a histogram of those values, we obtain this histogram, which matches well with the chi squared distribution for k=1.

If you calculate the variance for a sample size n=2, then the variance calculation simplifies as shown in this slide, and as you can see, you really have only one independent Normal distribution squared, namely the one associated with x1 minus x2. The variance associated with a sample size of n=3 can be simplified to the sum of two squared quantities, each of which has the same variance. (2:49/4:16)

You might not be surprised to learn that the variance associated with a sample size of n gives rise to the sum of n-1 independent squared normal distributions and is characterized by the chi squared distribution with k=n-1 degrees of freedom. You can see a progression in these shapes as you increase n from 2 to 3 to 5 to 30. When n is large, the curve approaches a normal distribution.
This last result should not be surprising, since the Central Limit Theorem says that when you add a sufficient number of independent random variables with finite variance together, their sum will be approximately Normal. These chi squared curves also match well with histograms from our Sampling Distributions Spreadsheet.

In case you are curious, the general formula for the chi squared family of distributions is the one shown here, and the distribution for k degrees of freedom has a mean of k and variance of 2k. But don’t worry, most spreadsheets and computer libraries have built-in functions for the chi squared distribution, so you will probably never have to use this expression to program your own function. (3:58/4:16)

If the mean of the chi square distribution for k=1 is equal to 1, can you explain why the mean for the chi square distribution with k degrees of freedom is equal to k? Can you do the parallel calculation that relates their variances?
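The claims in the transcript can be checked numerically with a small simulation. The following is a minimal Python sketch using only the standard library; the function names, the seed, and the number of draws are arbitrary choices made for illustration and are not from the video. It compares two things for several values of k: the sum of k squared standard Normal draws, and the n-1 sample variance of n = k+1 Normal draws scaled by (n-1)/σ². If the transcript's description holds, both should have a mean near k and a variance near 2k.

```python
import random
import statistics


def sum_of_squared_normals(k, draws, rng):
    """Return `draws` values, each the sum of k squared standard Normal variates."""
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(draws)]


def scaled_sample_variances(n, draws, rng, sigma=1.0):
    """Return (n - 1) * S^2 / sigma^2 for `draws` Normal samples of size n.

    S^2 is the n-1 version of the sample variance.
    """
    results = []
    for _ in range(draws):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        results.append((n - 1) * statistics.variance(sample) / sigma**2)
    return results


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
for k in (1, 2, 4):
    direct = sum_of_squared_normals(k, 10000, rng)
    from_variance = scaled_sample_variances(k + 1, 10000, rng)
    print(
        f"k={k}: "
        f"mean {statistics.fmean(direct):.2f} vs {statistics.fmean(from_variance):.2f} "
        f"(expected about {k}), "
        f"variance {statistics.pvariance(direct):.2f} vs "
        f"{statistics.pvariance(from_variance):.2f} (expected about {2 * k})"
    )
```

Because the values are simulated, the printed means and variances will only be close to k and 2k, not exactly equal; increasing the number of draws tightens the match.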