Stat 220, Parts IV-V: Probability and Chance Variability
Lecture 17: The Standard Error (Ch. 17); Normal Approximation for Averages and Sums (Ch. 18)

Outline
• Example 3: Kerrich's Coin Tossing Experiment
• The Standard Error: Definition, Example, The Short Cut
• Normal Approximation: Central Limit Theorem: Overview, Probability Histograms, The Central Limit Theorem
• Summary

Example 3: Kerrich's Coin Tossing Experiment

A coin lands heads or tails with equal chances of 50%. In the long run, should the number of heads equal the number of tails? John Kerrich tossed a coin 10,000 times and counted the number of heads (his story is told in Section 16.1 of the textbook). Before knowing the outcomes, what statements can we make based on our current knowledge?

Coin tossing experiment

If we can describe the toss outcomes with numbers – random variables – then we can invoke the Law of Averages, since 10,000 is a pretty large number. A common trick with processes whose outcome can be seen as "yes" or "no" ("yes" = heads, "no" = tails) is to represent "yes" as 1 and "no" as 0. Then our box model for the fair-coin toss has one ticket with 1 and one with 0. The number of heads in 10,000 tosses is like the sum of 10,000 independent draws from the box.
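The box model above can be tried out directly. Below is a quick Python sketch (not part of the course materials; the variable names are mine): drawing 10,000 times with replacement from a box with one 0 and one 1, and summing the draws.

```python
import random

random.seed(0)  # any seed; for reproducibility only

# Box model for a fair coin: one ticket marked 1 (heads), one marked 0 (tails).
box = [0, 1]

# The number of heads in 10,000 tosses is like the sum of
# 10,000 independent draws with replacement from this box.
draws = [random.choice(box) for _ in range(10_000)]
n_heads = sum(draws)                    # the sum of the draws
pct_heads = 100 * n_heads / len(draws)  # the percent of heads

print(n_heads, pct_heads)
```

Each run gives a number of heads close to, but almost never exactly, half the number of tosses, which is what the rest of this lecture quantifies.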
Recall: the expected value of a sum of draws is

  (number of draws) × (average of box).

In our case, the average of the box is 1/2, so the expected value for Kerrich's experiment is 5,000 heads. The Law of Averages tells us that the actual fraction of heads (which is the average of all the draws) should be pretty close to 1/2. But how close? It is not very likely that John Kerrich got exactly 5,000 heads; we only expect that he got about 5,000 heads.

10,000 tosses: A Table

  nr of    observed      nr of heads −           observed     % of heads
  tosses   nr of heads   half the nr of tosses   % of heads   − 50%
  50       24            −1                      48.00%       −2.00%
  100      47            −3                      47.00%       −3.00%
  500      252           +2                      50.40%       +0.40%
  1,000    508           +8                      50.80%       +0.80%
  5,000    2,514         +14                     50.28%       +0.28%
  10,000   4,986         −14                     49.86%       −0.14%

10,000 tosses: A Graph

[Figure: two panels plotted against the number of tosses (0 to 10,000) — the number of heads minus half the number of tosses, and the percent of heads minus 50%.]

The difference between the observed and expected number of heads seems to increase, but the difference in percents seems to decrease.

The standard error for the sum of the draws

The actual sum will likely be different from the expected value. It will be off by the chance error:

  sum = expected value + chance error

The chance error is the amount above (+) or below (−) the expected value.

Definition: The standard error (SE) for the sum tells us how big the chance error is likely to be. The SE has a lot in common with the r.m.s. error we learned about in regression. A sum is likely to be around its expected value, but to be off by an amount similar in size to the SE.

To compute the SE for a sum, we use the following law:

Theorem (The square root law for sums)
When drawing at random with replacement from a box of numbered tickets, the standard error for the sum of the draws is

  √(number of draws) × (SD of the box).
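The square root law is easy to turn into code. This is a sketch in Python (the helper names `sd_of_box` and `se_for_sum` are mine, not the book's), applied to Kerrich's coin box:

```python
from math import sqrt

def sd_of_box(box):
    """SD of the tickets in the box (r.m.s. deviation from the box average)."""
    avg = sum(box) / len(box)
    return sqrt(sum((t - avg) ** 2 for t in box) / len(box))

def se_for_sum(n_draws, box):
    """Square root law for sums: SE = sqrt(number of draws) x (SD of the box)."""
    return sqrt(n_draws) * sd_of_box(box)

# Kerrich's box (tickets 0 and 1) has SD 0.5, so the sum of
# 10,000 draws is 5,000 give or take about 50.
print(se_for_sum(10_000, [0, 1]))  # 50.0
```

An SE of 50 fits the table above: after 10,000 tosses Kerrich's chance error was −14, well within one SE.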
Example 4

We make 100 draws at random with replacement from the box [ 1 1 1 9 ].

The average of the box is 3. The expected value of the sum is 100 × 3 = 300.

The SD of the box is

  √( ((1−3)² + (1−3)² + (1−3)² + (9−3)²) / 4 ) = √12 ≈ 3.5

The SE for the sum is √100 × 3.5 = 35.

The standard error for the average of the draws

For sums, the SE will increase as we keep drawing. But for the average of all the draws, the opposite happens! The formula for the SE of the average is:

Theorem (The square root law for averages)
When drawing at random with replacement from a box of numbered tickets, the standard error for the average of the draws is

  (SD of the box) / √(number of draws).

Note: the formulas may look similar, but the square root of the number of draws works in opposite directions. For sums it increases the SE, and for averages it reduces the SE.

Short cut for calculating the SE

When the tickets in the box show only two different numbers ('big' and 'small'), the SD of the box is

  (big number − small number) × √( (fraction with big number) × (fraction with small number) ).

Example 4 (continued): the SD by the short cut is

  (9 − 1) × √( (1/4) × (3/4) ) = √12 ≈ 3.5
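A sketch of Example 4 in Python (variable names are mine) checks that the short cut agrees with the full SD formula, and shows the square root law working in both directions:

```python
from math import sqrt

box = [1, 1, 1, 9]
n_draws = 100

# Full SD formula for the box
avg = sum(box) / len(box)                                    # 3.0
sd_full = sqrt(sum((t - avg) ** 2 for t in box) / len(box))  # sqrt(12), about 3.46

# Short cut: the box shows only two distinct numbers
big, small = 9, 1
sd_short = (big - small) * sqrt((box.count(big) / len(box)) *
                                (box.count(small) / len(box)))

# Square root law, both directions
se_sum = sqrt(n_draws) * sd_full  # about 34.6 (the slides round sqrt(12) to 3.5, giving 35)
se_avg = sd_full / sqrt(n_draws)  # about 0.346
```

The same two formulas fill in the blanks of the next example as well.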
Thus, the sum of the draws is likely to be ≈ 300, give or take 35 or so. The average will be ≈ 3, give or take 0.35 or so.

Example 5

We make 25 draws from the box [ 0 2 3 4 6 ]. Fill in the blanks:
• The sum of the draws is around ..., give or take ... or so.
• The average of the draws is around ..., give or take ... or so.

Can we say more?

So probability theory allows us to say that:
• When repeating the same draw independently, the average of the draws will tend to the expected value of a single draw.
• We can also calculate the SE for the average. The actual average will be within about 2-3 SE's (rarely more) of the expected value.
• We have similar results for sums (the book only talks about sums at this point). It's all the same, because the sum is just the average times the number of draws!

Is that all? No. It turns out that we can also estimate the probabilities for how far off the average (or sum) will be. These probabilities follow... you guessed it, the normal curve.

Probability histograms

The histograms that we saw so far are data histograms (a.k.a. "empirical"):
• They are based on data.
• Area under the histogram represents the percents (or counts) of cases.

We will now look at a new type of histogram: probability histograms.
• They are based on theory, not on data.
• Area under the histogram represents chance.

Definition: A probability histogram represents chance by area.
The total area under the histogram is 100%. This type of histogram can also be a smooth curve.

A Law of Nature?

The mathematical result proving that we can use the normal curve for averages, sums, and some other beasts is known as the Central Limit Theorem (CLT). Some people (myself included) think that it is more like a law of nature (e.g., Newton's Laws) than like a piece of math. Never mind. The CLT is the main reason why we put so much effort into learning the normal curve. It is what makes the normal curve so commonly used. But before we present the CLT, we need one new concept.

Probability histograms

We have actually seen at least two probability histograms:

[Figure: "Probabilities of 2-Dice Sums" — a bar chart of the probability (0 to about 0.15) of each sum from 2 to 12.]

The normal curve is a continuous probability histogram; the graph of monopoly-move probabilities is a discrete one.
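The two-dice probability histogram comes from theory, not data: 36 equally likely pairs. A small Python sketch (my own, using exact fractions) builds it and checks that the chances add up to 100%:

```python
from collections import Counter
from fractions import Fraction

# Probability histogram for the sum of two fair dice:
# 36 equally likely pairs, chance = (favorable pairs) / 36.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
probs = {s: Fraction(c, 36) for s, c in counts.items()}

# Chance is represented by area: the total area is 100%.
total = sum(probs.values())
print(probs[7], total)  # 1/6 1
```

No dice were rolled to produce these numbers, which is exactly what distinguishes a probability histogram from a data histogram.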
The Law of Averages for probability histograms

Law of Averages for Probability Histograms: As we draw many independent instances of the same r.v., the data histogram will look more and more like the r.v.'s probability histogram.

This is how it looks for Monopoly moves.

Example 6: Probability Histogram of a Sum

Sum of draws from the box [ 1 2 9 ]. On the next slide we see a histogram of sums of 25, 50 and 100 draws, each repeated a large number of times. This is known as a statistical simulation.

Because of the Law of Averages for probability histograms, when we simulate the same random process many times and make the same calculation each time, the histogram of all the calculation results will be similar to the probability histogram for the corresponding r.v. Statistical simulation can be used (as we do here) to demonstrate a theoretical result about an r.v., or (as statistical researchers do) to explore the probability properties of r.v.'s for which we don't have a theoretical result.
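A simulation like the one on the next slide can be sketched in a few lines of Python (my own sketch; `simulate_sums` is not from the book). The empirical average of the simulated sums should sit close to the expected value, with spread close to the SE:

```python
import random
from math import sqrt

random.seed(1)  # for reproducibility only
box = [1, 2, 9]

avg = sum(box) / len(box)                                # 4.0
sd = sqrt(sum((t - avg) ** 2 for t in box) / len(box))   # about 3.56

def simulate_sums(n_draws, n_reps):
    # Repeat the same random sum many times; by the Law of Averages for
    # probability histograms, the histogram of these results approaches
    # the probability histogram of the sum.
    return [sum(random.choices(box, k=n_draws)) for _ in range(n_reps)]

sums = simulate_sums(25, 10_000)
emp_avg = sum(sums) / len(sums)   # close to the EV of the sum

ev_sum = 25 * avg                 # 100.0
se_sum = sqrt(25) * sd            # about 17.8
print(ev_sum, round(emp_avg, 1))
```

Plotting a histogram of `sums` (e.g. with matplotlib) reproduces the picture on the slide: already at 25 draws the shape is close to the normal curve, despite the lopsided box.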
Central limit theorem

Theorem (The Central Limit Theorem)
When drawing at random with replacement from a box, the probability histogram for the average (and the sum) will follow the normal curve, even if the contents of the box do not. The histogram must be put in standard units, and the number of draws must be reasonably large.

Notes:
1. The theorem applies to averages and sums (also to medians, btw), but not necessarily to every number calculated from the data.
2. There is no clear-cut answer to the question of what 'reasonably large' is. Much depends on the contents of the box, but for the average of 100 draws the probability histogram will usually be very close to the normal curve.

How to use the CLT

On a conceptual level, the CLT is a very fortunate result, because the normal curve has very thin tails compared to most distributions. So any number that obeys the CLT will rarely stray too far from its expected value.

On a practical level, we can now make probability statements about averages and sums, using the normal curve. We are already familiar with normal-curve lookup techniques.
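Those lookup techniques can be automated. Here is a small Python sketch (my own helpers, built on the standard library's error function): convert to standard units, then read off areas under the normal curve.

```python
from math import erf, sqrt

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def area_between(lo, hi, ev, se):
    """Chance that a CLT-governed sum or average lands in [lo, hi]:
    convert the endpoints to standard units, then look up the curve."""
    return phi((hi - ev) / se) - phi((lo - ev) / se)

# Sanity check: about 68% of the area lies within one SE of the EV.
print(round(area_between(-1, 1, 0, 1), 2))  # 0.68
```

This replaces the printed normal table: `area_between(lo, hi, ev, se)` is exactly the "standard units, then table" recipe.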
The average of this normal curve will be the Expected Value (EV) of the average (or sum). The SD of this normal curve will be the Standard Error (SE) of the average (or sum).

Example 7: Using the CLT for Dice Rolls

Roll a die 120 times.
1. Use the normal approximation to estimate the chances of getting between 15 and 25 sixes, inclusive.
2. Use the normal approximation to estimate the chances of getting exactly 20 sixes.

Example 7: solution to part 1

Set up the box model: 120 draws with replacement from a box with tickets 0, 0, 0, 0, 0, 1.
• EV = (nr of draws) × (avg of box) = 120 × 1/6 = 20
• SD of box = (1 − 0) × √( (1/6) × (5/6) ) ≈ .37
• SE for the sum of the draws = √(nr of draws) × (SD of the box) = √120 × .37 ≈ 4.1

Normal approximation: new average = 20, new SD = 4.1.

Oops... the probability histogram for the number of sixes looks like the Monopoly one. It is discrete. But the normal curve is continuous. If we look up an interval from 15 to 25, we are really cutting the rectangles representing 15 and 25 sixes right down the middle. This can distort the results; in fact, we are underestimating the probability.

Continuity correction

The solution: the continuity correction. Use it for the normal approximation whenever the actual data come in whole numbers (integers).
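Example 7, with the continuity correction applied, fits in a few lines of Python (a sketch of my own; `phi` is the standard normal curve via the stdlib error function):

```python
from math import erf, sqrt

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Box model: 120 draws from a box with five 0's and one 1 (1 = "rolled a six").
n = 120
ev = n * (1 / 6)                            # 20.0
sd_box = (1 - 0) * sqrt((1 / 6) * (5 / 6))  # short cut, about 0.373
se = sqrt(n) * sd_box                       # about 4.08 (the slides round to 4.1)

# "Between 15 and 25 sixes, inclusive": widen to 14.5 to 25.5.
p_range = phi((25.5 - ev) / se) - phi((14.5 - ev) / se)  # about 0.82

# "Exactly 20 sixes": the bar for 20 runs from 19.5 to 20.5.
p_exact = phi((20.5 - ev) / se) - phi((19.5 - ev) / se)  # about 0.10
print(round(p_range, 2), round(p_exact, 2))
```

Note that without the half-unit widening, the first lookup would cut the bars for 15 and 25 down the middle and understate the chance.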
For example:
• Exactly 20: look up the interval 19.5 to 20.5.
• Between 15 and 25, inclusive: look up 14.5 to 25.5.
• Between 15 and 25, exclusive: look up 15.5 to 24.5.

Example 7: solution to part 1 (continued)

Now we can continue our solution. "15 to 25 inclusive" becomes 14.5 to 25.5 for the look-up.

  z = (sum − EV) / SE = (25.5 − 20) / 4.1 ≈ 1.34

From the normal table, the area between −1.34 and 1.34 is ≈ 0.82, so the probability is ≈ 82%.

Example 7: solution to part 2

Probability of getting exactly 20 sixes. The EV and SE are the same as before: EV = 20, SE = 4.1. In the probability histogram, the bar for 20 goes from 19.5 to 20.5, so we should find this area under the normal curve.

  z = (sum − EV) / SE = (20.5 − 20) / 4.1 ≈ .12

From the normal table, the area between −.12 and .12 is ≈ 9.5%, so the chance is ≈ 9.5%.

The CLT and Regression

Why did we use the normal curve in regression? Because the CLT affects regression in two ways. First, the data points themselves might represent some sort of averaging process. For example, a person's height is affected by many little things combined. So looking at the distribution of heights in a large population (in a relatively homogeneous ethnic setting) is like looking at a probability histogram of averages.
That explains why the height distribution is approximately normal. Second, the estimated regression line is itself a type of average, which means it is subject to the CLT as well. So even if x or y do not look normal, we can often still make probability statements about the line using the normal approximation. Note: if x and y look very far from normal, we can use robust regression methods that do not assume normality.

Summary

• For averages and sums of draws with replacement, we can often assume that their probabilities follow the normal curve. This is known as the Central Limit Theorem, or CLT.
• Note: the book focuses on sums (the authors think sums are easier); the CLT is actually about averages, and most practical uses involve averages. The results hold for sums because the sum is just the average times the number of draws.
• We use the expected value and the SE for the average (or the sum) in order to convert to and from standard units and make probability statements about the average (or sum).
• If we use the normal curve to estimate a discrete probability histogram, a continuity correction is recommended.